A COMPUTATIONAL DESCRIPTION OF THE ORGANIZATION OF HUMAN REACHING AND PREHENSION

by Bruce Richard Hoff

A Dissertation Presented to the FACULTY OF THE GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (Computer Science)

August 1992

Copyright 1992 Bruce Richard Hoff

UMI Number: DP22847. All rights reserved.

INFORMATION TO ALL USERS: The quality of this reproduction is dependent upon the quality of the copy submitted. In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will be noted. Also, if material had to be removed, a note will indicate the deletion.

Published by ProQuest LLC (2014). Copyright in the Dissertation held by the Author. Microform Edition © ProQuest LLC. All rights reserved. This work is protected against unauthorized copying under Title 17, United States Code. ProQuest LLC, 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106-1346.

UNIVERSITY OF SOUTHERN CALIFORNIA, THE GRADUATE SCHOOL, UNIVERSITY PARK, LOS ANGELES, CALIFORNIA 90007

This dissertation, written by Bruce Richard Hoff under the direction of his Dissertation Committee, and approved by all its members, has been presented to and accepted by The Graduate School, in partial fulfillment of requirements for the degree of DOCTOR OF PHILOSOPHY.

Dean of Graduate Studies
Date: August 11, 1992
DISSERTATION COMMITTEE
Chairperson

This dissertation is dedicated to my parents, Dale and Barbara, for giving inspiration, allowing freedom, having patience, making it possible.

Acknowledgments

This experience has truly been an exercise in the coalescing of a large body of knowledge. I must credit my advisor Prof. Arbib for asking the motivating questions, and pushing my curiosity into a variety of disciplines, including neuroscience, biomechanics, motor behavior, robotics, and engineering. Dr. Thea Iberall gave crucial encouragement in the early years, when our research group was still forming, and peer support was thin. Prof. Hauser is to be commended for generously giving time to discuss control theoretic issues, and for bearing my novice efforts in optimal control, before I engaged in formal instruction from Prof. Pappavassilopoulos, who is to be acknowledged for teaching the subject well. I thank Marc Jeannerod and Claude Prablanc, of INSERM, Lyon for welcoming me into their motor behavior laboratory and for providing fruitful discussions that increased the quality of my modeling work. They are part of the international Human Frontier Science Program group to which I am indebted for funding my research. Additional support came from the Auditory and Visual Perception Research Laboratories (A.T.R.), Kyoto, Japan. Alberto Borghese provided both interesting theoretical discussion and many brute man-hours of programming in neural network implementation. Secretaries tend to be the unspoken heroes in industry, business, and academia. I'd like to thank Paulina Tagle for smoothing potential problems throughout the years.

The camaraderie of fellow students was indispensable. Peter Dominey and Reza Shadmehr have a genuine excitement for their research which is quite contagious and motivating. Alfredo Weitzenfeld and I shared moral support in our parallel struggles as our theses evolved. I appreciate the enjoyable atmosphere created by coworkers Jeff Teeters, Jim Liaw, Fernando Corbacho, Jean Marc Fellous, Nicolas Schweighofer, Irwin King, and Lucia Simo.
I received warm support from many good friends at U.S.C. outside the laboratory, including Greg Friedman, John Knauf, Mike Garmon, and Kirk Brennan. My sister Cathy has been a welcome "E-mail pal" through those long days in the lab. I must thank Jeff Teeters and the "G.P.A.C." for showing me the best way in the world to blow off steam. I want to specifically thank ultramarathon runners Bob "The Monk" Adjemian, Al "Sore Eyes" Solish, Steve Elder, and Richard Velez for taking me under their wing and showing me the beautiful countryside in Griffith Park and the Angeles National Forest, at high speed! Lastly, I want to acknowledge the love, support, and patience given to me by my family, especially my parents Dale and Barbara.

Contents

Acknowledgments
List of Figures
Abstract
1 Introduction: Computational Modeling as a Tool for Understanding Motor Behavioral Processes
1.1 Problem Statement
1.2 The Approach
1.3 Scope of The Work
1.4 Summary of Results and Contributions
1.5 Organization of the Dissertation
2 Modeling Trajectories of Reach under Normal Conditions and Target Perturbations
2.1 Behavioral Phenomena During Reach
2.2 Modeling the Control of Reach
2.3 Simulation Results
2.4 Discussion
3 A Model of Duration in Normal and Perturbed Reaching Movement
3.1 Introduction
3.2 The Minimum Jerk / Minimum Time Model
3.3 Modeling Perturbation Data
3.4 Discussion
4 The Coordination of Reach and Grasp in Prehension: Normal and Perturbed Conditions
4.1 Introduction
4.2 Investigations of Hand Transport and Prehension Interaction
4.3 Modeling Transport and Prehension Interaction
4.3.1 Temporal Interaction of Transport and Prehension
4.3.2 Trajectory Generation for Transport
4.3.3 Trajectory Generation for Preshape and Enclose
4.4 Simulations Using the Transport / Prehension Model
4.4.1 Simulating Perturbed Location and Size
4.4.2 Simulating Jeannerod (1981)
4.4.3 Predictions of the Model
4.5 Discussion
4.5.1 Comments on the Maximum Duration Model
4.5.2 Observations on Enclose Time
4.5.3 Critique of an Alternative Model
4.5.4 Suggested Experiments
5 The Speed / Accuracy Trade-off: A Stochastic Optimal Control Analysis
5.1 Behavioral Background and Past Models of Variability in Reaching
5.2 The Problem to be Studied
5.3 Discrete Time Optimal Control Using Dynamic Programming
5.3.1 Formalization of the Delayed Feedback
5.3.2 Optimization by Dynamic Programming
5.4 Expected Value and Variance of the Trajectory
5.5 Simulation Results
5.6 Discussion
6 Arm Dynamics in Trajectory Formation
6.1 Optimization Choices in Trajectory Modeling
6.2 The Prablanc / Martin Perturbed Pointing Task
6.3 Dynamic Optimization and Perturbation Modeling
6.5 Conclusion
7 Learning Optimization Of Dynamic Processes: A Neural Network Model
7.1 Connectionist Approaches to Trajectory Learning
7.2 An Application of Reinforcement Learning
7.3 Discussion
8 Conclusion
Appendices
Appendix A. Analysis of Variance (ANOVA)
Appendix B. Description of the NSL Simulation of the Transport / Prehension Model
Appendix C. Matrix Differential Calculus
References

List of Figures

1.1. Coordinated control program (CCP) for reach and grasp
2.1. Using feedback to generate a time varying trajectory
2.2. Target perturbation experiment of Pelisson et al. (1986)
2.3. Trajectory reversal experiment of Georgopoulos et al. (1981)
3.1. Movement duration in Pelisson et al. (1986)
3.2. Macaque hand trajectories in target reversal task of Georgopoulos et al. (1981)
3.3. Movement time data and model prediction for the target reversal experiment of Georgopoulos et al. (1981)
3.4. Target location perturbation experiment of Paulignan et al. (1991)
3.5. Movement time data and model prediction for the target perturbation experiment of Paulignan et al. (1991)
3.6. Updated version of trajectory controller from Chapter 2, showing duration dependency on target distance and current limb state
4.1. Coordinated control program for reach and grasp
4.2. Transport and preshape for unperturbed trials of Paulignan et al. (1991a)
4.3. Transport and preshape for perturbed trials of Paulignan et al. (1991a)
4.4. Feedback controllers for transport, preshape, and enclose
4.5. Sample wrist velocity profile fit with a fourth order polynomial
4.6. Ten each of wrist paths for perturbed-left and perturbed-right cases. Modeled wrist paths, assuming various terminal acceleration values
4.7. Unperturbed movement to center target
4.8. Data and simulation for transport and prehension when target location is perturbed to the left
4.9. Data and simulation for transport and prehension when target size is unexpectedly increased (S-L)
4.10. Data and simulation for transport and prehension when target size is unexpectedly decreased
4.11. Results for replication of the kinematics recorded in Jeannerod (1981)
4.12. Results of simulating a short, quick movement
4.13. Results of simulating a movement with perturbation of both target location and size
5.1. Wrist velocity during peg-in-hole insertion, for holes of various sizes
5.2. Variability in wrist, thumb, and index finger during reaching to grasp a small vertical dowel
5.3. Schematic of controller and controlled plant
5.4. Simulated movements at various speeds
5.5. Comparison of simulation to the data of Milner and Ijaz
5.6. Simulated movement with delayed feedback, but no noise
6.1. Human planar point-to-point reaching, for a variety of targets, and trajectories predicted by minimum torque change criterion
6.2. Schematic of the apparatus used by Prablanc and Martin
6.3. Trajectories of the right index finger during pointing
6.4. Perturbed pointing trajectories of the right index finger
6.5. Minimum jerk based trajectories for the unperturbed pointing paradigm of Prablanc and Martin
6.6. Minimum jerk based trajectories for the perturbed pointing paradigm
6.7. Minimum torque change based trajectories for the unperturbed pointing paradigm
6.8. Minimum torque change based trajectories for the perturbed pointing paradigm
6.9. Velocity profile for unperturbed and perturbed reaches toward the 40° target
6.10. Ten superimposed wrist trajectories, predicted minimum jerk trajectory, and predicted minimum torque change trajectory
7.1. Architecture of Werbos' Backpropagated Adaptive Critic
7.2. Three-layer network used for critic and controller in BAC architecture
7.3. NSL simulation of BAC architecture learning minimum jerk trajectory
7.4. Results of reinforcement learning simulation
7.5. Controller with time remaining and with duration as input
A.1. Two gaussian distributions of statistical values
A.2. Graphic representation of individual means, population mean

Abstract

This work bridges the fields of computer science, engineering, and neuroscience to investigate and mathematically describe human movement in reaching and grasping tasks. Control theory and computer simulation are used to build models which reflect kinematic data collected during motor behavioral experiments, both replicating observed behavior and predicting future experimental findings. The models are based on succinct mathematical principles, yet give rise to a variety of results.

A model of control based on continuous afferent integration, tuned with a minimum of supervision, under performance criteria incorporating efficiency of movement, accuracy, and duration, reproduces findings from a variety of motor behavior studies. The transport of the hand during reach and the preshape of the fingers are coordinated via their timing, and this coordination can be explained by Maximum time synchronization and a Constant enclose time constraint. Normal and perturbed transport and preshape trajectories are based in optimality principles for movement efficiency (smoothness), with a penalty for aperture added to preshape, and depend on delays in information flow between sensorimotor programs for reach and grasp. Grasping and pointing tasks put different constraints on the final state of the hand, which affect the entire trajectory. Movement duration results from a trade-off between efficiency and quickness. The speed, accuracy, and velocity profile characteristics during accurate reach are based in combined optimization of accuracy and smoothness.
A single delayed-feedback control model can explain stereotypical reaching movements which were previously thought to be bi-modal in the nature of their control (feedforward followed by feedback). Differing reaction times for target perturbations in different directions are explainable by a model incorporating limb dynamics. Finally, optimality is learnable by a self-organizing neural system.

As the motor control models that allow us to draw these conclusions are developed, a novel fusion of optimization and control is introduced to the field, simultaneously explaining movement kinematics and model-based integration of afferent and efferent signals. It is shown how complex movement patterns may come about as a result of interaction of controller and plant, giving a new perspective on trajectory "planning."

Chapter 1

Introduction: Computational Modeling as a Tool for Understanding Motor Behavioral Processes

We describe some of the problems in understanding motor behavior and the benefits of trajectory modeling using optimal control in addressing these problems. We delineate the scope of the questions to be addressed and the level of modeling to be undertaken. The resulting contributions to the understanding of motor behavior are enumerated.

1.1 Problem Statement

The questions addressed in this thesis are: What are the underlying principles for movement and for coordination of reach and grasp? How can the formation of sensory guided voluntary movement be carried out in a neurally plausible way?
The answer must address a large body of kinematic data from behavioral experiments and express, in a mathematically succinct way, models which both reproduce observed behavior and predict new results in terms of testable hypotheses. The models should be useful when addressing the issue of neural correlates and should suggest how autonomous organization of their behavior may arise.

1.2 The Approach

We begin by examining behavioral data from human and primate subjects in visually guided reaching and prehension. With movement expressed in terms of appropriately chosen variables, we develop mathematical models that produce movement trajectories in terms of the same variables, for the sake of comparison. In doing so we find two useful mathematical tools: Optimality and Control Theory. Optimality allows the terse expression of a performance measure which indicates one of an infinite set of movements satisfying situational constraints. The simplest example is to select the most efficient movement between a pair of start and end points. These terminal points, or boundary conditions, are the constraints, while some formal measure of efficiency is the optimality criterion. Naturally the constraints, the optimality criterion, and the system upon which they act must all be expressed mathematically. Control theory permits the modeling of actuation of a plant based on, among other factors, the plant's current state. As behavioral data reveal, movements do not take place in a pre-programmed ballistic fashion; rather, sensory input and the state of the limb continually influence the movement. Taking these two mathematical tools together, we use optimal control to describe whole families of movements, occurring under a variety of conditions.
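The idea of selecting the most efficient movement between a pair of start and end points can be illustrated with a minimal sketch. It assumes that efficiency is measured by the integral of squared jerk and that the hand is at rest (zero velocity and acceleration) at both ends; both are assumptions made here for illustration. The fifth-order polynomial below is the standard closed-form solution of that particular problem, not a result derived in this section, and the function name `minimum_jerk` and its parameters are hypothetical.

```python
def minimum_jerk(x0, xf, duration, t):
    """Return (position, velocity, acceleration) at time t.

    Assumed optimality criterion: integral of squared jerk.
    Assumed boundary conditions: zero velocity and acceleration at both ends.
    """
    tau = min(max(t / duration, 0.0), 1.0)  # normalized time, clamped to [0, 1]
    d = xf - x0                             # movement amplitude
    position = x0 + d * (10 * tau**3 - 15 * tau**4 + 6 * tau**5)
    velocity = (d / duration) * (30 * tau**2 - 60 * tau**3 + 30 * tau**4)
    acceleration = (d / duration**2) * (60 * tau - 180 * tau**2 + 120 * tau**3)
    return position, velocity, acceleration


# Illustrative values only: a 0.3 m reach lasting 0.5 s, sampled at the midpoint.
mid_position, mid_velocity, mid_acceleration = minimum_jerk(0.0, 0.3, 0.5, 0.25)
```

Under these assumptions the velocity profile is symmetric and bell-shaped, peaking at the midpoint at 1.875 times the average velocity. Changing the boundary conditions or the criterion would select a different member of the infinite set of movements joining the same two points.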
Lastly, we wish to couch our models in a neurally plausible form. There are two concerns. First, that the models operate on variables and parameters which we would expect to see represented in the primate motor control system, and that they do so in a way consistent with neural interactions. Second, that the types of optimality criteria discussed may be learned by autonomous, self-organizing systems, i.e. that such systems may produce movement trajectories optimal with respect to the criteria, without having a priori knowledge of which trajectories are optimal. To accomplish these two goals, we model the internal control in terms of the processing of visual and kinesthetic signals representing target and limb location and movement. Signal pathways are given realistic delays. Trajectories are generated from activity of dynamic internal structures, which have tunable parameters, so that families of movements are created in a realistic way. Autonomous learning methods, expressed in terms of models of dynamic neural systems, are investigated to show that optimal trajectories may be learned in a plausible fashion. Then, to evaluate the developed models, we compare them to motor behavior data, from a variety of experimental paradigms, in terms of movement trajectories, timing, variability, and dependency on parameters of the experiments.

A starting point for our work is the Coordinated Control Program (Arbib, 1981). It shows, at a conceptual level, the computations necessary for visually guided reach and grasp, and the informational dependencies between them. We extend this conceptual model by replacing the conceptual motor processes with explicit control theoretic descriptions of how input information is transformed into output.
Further, we add temporal information pathways to the spatial information ones shown, and we add realistic delays to the information pathways.

[Fig. 1.1. Coordinated control program (CCP) for reach and grasp. Redrawn from Arbib et al. (1985). Box labels in the diagram: Visual Location, Size Recognition, Orientation Recognition, Hand Reaching, Ballistic Movement, Slow Phase Movement, Grasping, Hand Preshape, Hand Rotation, Actual Grasp.]

To take a control theoretic approach in discussing guidance of limb movement, we may think of the arm as a mechanical system which may be described by its state trajectory x(t) (where the word "trajectory" emphasizes that it is a time-varying quantity), and which responds to an input from the nervous system, and/or the external environment, u(t). In general, its response can be described by a system of equations abbreviated

$$\dot{x}(t) = f\bigl(x(t),\, u(t),\, t\bigr) \tag{1.1}$$

i.e. its change in state depends on its current state, its input, and the time t. In a kinematic discussion, x(t) would typically be a vector of joint positions and velocities, with u(t) being a vector of accelerations, or higher kinematic derivatives (as introduced in Chapter 2). If the approach were based on a robotics model, the input might be a vector of applied torques, and the function f() would take into account such dynamic quantities as momenta and inertias in determining the state trajectory (as in Chapter 6). In combining a robotic model with a muscle model, the input might be a vector of time varying muscle equilibrium lengths, and f() would not only take into account the limb's dynamics, but also that of the muscles (Dornay et al., 1992).
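For the kinematic case just described, the state equation dx/dt = f(x, u, t) can be made concrete with a small numerical sketch. It assumes a one-dimensional "limb" whose state is (position, velocity) and whose input is an acceleration, and it integrates the equation with the forward Euler method. The plant, the step size, and the names `kinematic_plant` and `simulate` are illustrative assumptions, not the models developed in later chapters.

```python
def kinematic_plant(x, u, t):
    """f(x, u, t) for an assumed 1-D kinematic plant.

    State x = (position, velocity); input u = acceleration.
    The time argument t is unused here but kept to match dx/dt = f(x, u, t).
    """
    position, velocity = x
    return (velocity, u)


def simulate(f, x0, u_of_t, duration, dt=0.001):
    """Integrate dx/dt = f(x, u(t), t) from t = 0 to duration with forward Euler."""
    x = tuple(x0)
    steps = int(round(duration / dt))
    for k in range(steps):
        t = k * dt
        dx = f(x, u_of_t(t), t)
        x = tuple(xi + dt * dxi for xi, dxi in zip(x, dx))
    return x


# Illustrative input: constant acceleration of 2 units/s^2 applied for 1 s from rest.
final_position, final_velocity = simulate(
    kinematic_plant, (0.0, 0.0), lambda t: 2.0, duration=1.0
)
```

Swapping `kinematic_plant` for a function that maps torques to joint accelerations would give the robotics-style description mentioned above while leaving `simulate` unchanged, which is the practical appeal of writing the plant in this abstract form.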
Finally, as nervous activation models are taken into account, u(t) might be expressed as a vector of efferent nervous signals (Katayama and Kawato, 1991).

In considering different models of movement, given a model of the plant to be controlled (i.e. Eqn 1.1), choosing a movement is a matter of selecting an input u(t). If u is predetermined as a function of time alone, then it is a preprogrammed, feedforward control. If it is given as a function of the current state, u = g(x(t)), it is considered to be a feedback control. It is helpful at this point to discuss different control strategies from the viewpoint of neural motor control, so that this thesis has a clear vocabulary. The common denominator in control is the application of measurements of the state of the environment and the controlled plant to determine future settings of plant control parameters. Where feedback control involves basing control settings on the state of the variable being controlled, feedforward control involves basing control settings on variables which are not being controlled. An analogy would be a home temperature controller which bases its setting on the ambient temperature outside the house, a quantity not under control by the system. Ballistic movement is a form of feedforward control in which the control over the whole movement is based on a sample of environmental variables taken at the beginning of the movement. This is contrasted with continuous feedforward control, where the control signal is adjusted based on continuous sensing of relevant parameters. An example of the latter is the vestibulo-ocular reflex, in which eye movement compensates for head rotation in order to prevent retinal slip of a visual image.
The eye movement is not based on current retinal slip (a feedback strategy); rather it is based on vestibular information regarding head rotation, a quantity not controlled by the vestibulo-ocular reflex. This is an important distinction.

Feedback control may be divided into continuous and intermittent. In continuous feedback, the variable being controlled is continuously sampled, and the information is used to adjust the same variable. The position servo mechanism, in both robotic motors and in the biological stretch reflex, is an example of continuous negative feedback. Passive mechanical systems can also manifest feedback properties. A mass hanging from a spring constitutes a position feedback system. If the mass is tugged downward, for instance, a restoring force immediately pulls it up harder, stabilizing its position. If the spring is linear, i.e., if it follows Hooke's law [$F = -k(x - x_0)$], it constitutes a linear feedback system. In natural systems, such as the musculoskeletal system, it is common to find more subtle relationships between the input and output variables, resulting in nonlinear feedback.

Lastly, intermittent feedback is used when the control signal over some period of time is based on a sample of the controlled variable taken at the beginning of (or prior to) that period. This would be the case in room temperature control if the thermostat took a "snapshot" of the room's state, varied the temperature for some time without monitoring the effects, and then periodically repeated the process. This strategy is useful when high feedback gains coupled with feedback time delays make continuous feedback unstable.
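The remark that high gains combined with feedback delays can destabilize continuous feedback can be seen in a very small simulation. The sketch below assumes the simplest possible case, a scalar error corrected in proportion to a delayed measurement of itself, dx/dt = -gain * x(t - delay), integrated with forward Euler. The plant, the gain and delay values, and the function name `delayed_feedback_response` are illustrative assumptions and are not the delayed-feedback controller developed later in the dissertation.

```python
def delayed_feedback_response(gain, delay, duration=10.0, dt=0.001, x_init=1.0):
    """Simulate dx/dt = -gain * x(t - delay) and return samples for t >= 0.

    The state is assumed to have been held at x_init for all t <= 0.
    """
    lag = int(round(delay / dt))        # delay expressed in time steps
    steps = int(round(duration / dt))
    xs = [x_init] * (lag + 1)           # history covering t in [-delay, 0]
    for _ in range(steps):
        delayed_x = xs[-1 - lag]        # measurement that is `delay` seconds old
        xs.append(xs[-1] + dt * (-gain * delayed_x))
    return xs[lag:]


# Same 0.2 s delay, two illustrative gains.
low_gain = delayed_feedback_response(gain=1.0, delay=0.2)    # settles toward zero
high_gain = delayed_feedback_response(gain=10.0, delay=0.2)  # growing oscillation
```

For this particular scalar system the continuous-time stability boundary is gain × delay = π/2, so the first case (0.2) lies well inside it and the second (2.0) lies outside. A controller facing the second situation must either lower its gain or stop correcting continuously, which is the motivation for sampling intermittently.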
Intermittent feedback control is often more sophisticated than continuous feedback, requiring an internal model of the behavior of the system being controlled. An important point to emphasize is that intermittent feedback control is not to be confused with feedforward control. In intermittent feedback, although control is ballistic between samples, the overall process is based on sampling the controlled parameter and so is distinct from feedforward control.

In the case of voluntary reaching, we observe that if the target moves, the subject's arm trajectory is updated. This is a feedforward process, since the arm's trajectory can be controlled with vision of target position even if the hand is not visible to the subject. However, if vision of the arm is obscured during movement, or if the subject's arm is deafferented, movement is less accurate, which implies that some type of feedback control is present. Further, investigations of variability in reaching (Chapter 5) give evidence that control of reach is a feedback process.

Returning now to the subject of optimization: as stated earlier, optimization analyses are based on models of system dynamics and control. There are several different approaches to optimization used in this thesis. In analyzing deterministic, continuous time systems, we use the minimum principle (Bryson and Ho, 1975), a derivative of variational calculus. In analyzing stochastic systems, we prefer a discrete time formulation of the system and control for which a dynamic programming approach to optimization is useful (Bryson and Ho, 1975; Bertsekas, 1976b). These two approaches will be introduced in Chapters 2 and 5, respectively.

We now summarize the approach this dissertation takes in developing a model of motor behavior.
It begins with the minimum-jerk model for reaching trajectories, which has been shown to be accurate for a limited scope of behavioral situations. We couch the optimality criterion in a feedback controller so that target perturbations may be applied, and we thus reproduce kinematic data from target perturbation paradigms. We then extend both the complexity and capability of the model by adding a duration penalty, an accuracy penalty, arm dynamics, and self-adaptation in Chapters 3, 5, 6, and 7, respectively. We adopt a similar optimization approach for modeling hand preshape in Chapter 4. Throughout the dissertation we take the approach of extending the basic model, in order to encompass additional bodies of behavioral data.

1.3 Scope of The Work

To give perspective on the limits of the problems to be investigated and the models to be developed, we may consider four levels of modeling detail. There is the schema level, reflected in the model of Fig. 1.1. At this level sensory and motor programs and their inputs and outputs are considered in a conceptual way. It is possible to incorporate complex behaviors at this level of detail. The functional level adds detail to schema level models by replacing high level concepts with explicit mathematical formulae. Movement kinematics and concepts of feedback control may be introduced here. At the connectionist level, neural-like structures are introduced to perform the computations of the functional level. Here, issues may be investigated in distributed information representation and computation, and in self-organization. At the physiological level, neural models adhere to what is known of anatomical structure and real neural interactions.
Biomechanics, for example muscular impedance control, are modeled in motor processes. The field of motor behavior usually relates to the schema and functional levels. Complex behavior, analyzed in motor behavioral experiments, can hint at underlying control structure, but neural structure is not revealed. Anatomical and physiological studies (e.g. single cell recording in behaving subjects) reveal much about neural encoding, but the movements are typically simple; often isometric studies are done. In this thesis we will remain at the first three levels, enriching schema models with functionality and self-organization, and hope to give an orientation for studies at the physiological level.

From a robotics perspective, there are many popular problems addressed in modeling sensorimotor control in the nervous system. These include sensorimotor (reference frame) transformations, inverse dynamics, solving motor redundancy, and trajectory optimization. Our work will address the fourth problem. However, there is overlap between these problems and, as will be seen in Chapter 6, our work has implications for dynamics in control as well.

It is helpful to point out some of the related areas of research outside the scope of this thesis. This thesis will not address biomechanics or muscle models. The accurate modeling of biomechanics is in its infancy for as simple a situation as posture, let alone complex, multi-movement behavior. We do not address neuroanatomy: questions such as "How does visual information reach cortical motor centers for reach and grasp?" are still unsettled. We hope that the work in this thesis gives behavioral orientation to those modeling functional neuroanatomy.
In discussing feedback of the state of the limb during control, we do not differentiate between visual and kinesthetic channels, although there is evidence that both are active, and that vision enhances accuracy. In studying grasp, we assume a simple, two surface pinch, discussing aperture formation and enclosure. We acknowledge the large body of knowledge in object recognition and grasp selection, but it is beyond what we address here.

1.4 Summary of Results and Contributions

We find that a model of control based on continuous afferent integration, tuned with a minimum of supervision, under performance criteria incorporating efficiency of movement, accuracy, and duration, can reproduce findings from a variety of motor behavior studies: We find that transport and preshape are coordinated via their timing, and this coordination can be explained by "Maximum time" synchronization and a "Constant enclose time" constraint. Normal and perturbed transport and preshape trajectories are based in optimality principles for movement efficiency (smoothness) with a penalty for aperture added to preshape. Delays in information flow between sensorimotor programs for reach and grasp affect timing and kinematics of these actions. In reaching to grasp and reaching to point, the different tasks put different constraints on the final state of the hand. The differing constraints at the final time affect the entire trajectory. This is modeled in an explicit, formal way. Duration in movement is shown to be a trade-off between efficiency and quickness. The speed/accuracy trade-off and velocity profile characteristics during accurate reach are based on combined optimization of accuracy and smoothness.
A single delayed-feedback control model can explain stereotypical reaching movements which were previously thought to be bi-modal in the nature of their control: feedforward followed by feedback. Differing reaction times in reach to target perturbations in different directions are explainable by limb dynamics. Finally, optimality is learnable by a self-organizing system.

As we develop the motor control models that allow us to draw these conclusions, we introduce to the field a novel fusion of optimization and control, where we simultaneously explain movement kinematics and model-based integration of afferent and efferent signals. We show how complex movement patterns may come about as a result of interaction of controller and plant, giving a new perspective on trajectory "planning."

1.5 Organization of the Dissertation

In chapter 2, we review behavioral phenomena regarding the kinematics of reaching, then propose a model of reach trajectory formation which is based on continuous integration of current state and efferent control information. It is shown that the model predicts well the on-line corrections which occur during target location perturbation. The model is based on an optimality principle for smoothness, which determines a single, efficient trajectory from the many which satisfy a set of constraints. This lays the basis for modeling trajectory formation, control, and optimization of reach and grasp in subsequent chapters.
In chapter 3, we extend a minimum jerk model of reach trajectory planning to include a penalty for duration, and show that it can be used to model and predict the durations of unperturbed and perturbed voluntary reaching movements as a function of movement distance and perturbation. Chapter 4 addresses the modeling of two-dimensional reach and grasp, under normal and perturbed conditions. Transport and preshape are coordinated via their timing, and this coordination can be explained by "Maximum time" synchronization and a "Constant enclose time" constraint. Normal and perturbed transport and preshape trajectories are based in optimality principles for movement efficiency (smoothness) with a penalty for aperture added to preshape. Delays in information flow between sensorimotor programs for reach and grasp affect timing and kinematics of these actions. In reaching to grasp and reaching to point, the different tasks put different constraints on the final state of the hand, and this is reflected in the entire reach trajectory. Chapter 5 integrates a model of variability in movement with a model of continuous control under delayed feedback to reproduce findings on accuracy and trajectory in movements of constrained accuracy. The model combines optimization criteria of smoothness and accuracy in a stochastic model. An explicit model of variability during the trajectory allows novel insight into the properties of reach trajectories. In chapter 6, we review plant models and cost formulations for trajectory modeling, highlighting the advantages of the different approaches.
To model a perturbed pointing task which spans both proximal and distal portions of reachable space, a dynamic arm model and cost formulation are utilized. Comparison is made to the minimum jerk model. A novel analysis of perturbation reaction time is made, based on modeling results. In chapter 7, we review models of trajectory learning in artificial neural networks and stress the problem of discovering trajectories rather than copying them through supervised training. For the task of automatic calculation of optimal trajectories we converge on reinforcement learning as the appropriate paradigm, and show an example of optimizing smoothness in a reaching movement. This is considered an important link in justifying the optimality models of this dissertation as being neurally realizable. In chapter 8, we discuss further work which may be founded on the results of this dissertation, especially modeling efforts at the neural level.

Chapter 2
Modeling Trajectories of Reach under Normal Conditions and Target Perturbations

We review behavioral phenomena regarding the kinematics of reaching, then propose a model of reach trajectory formation which is based on continuous integration of current state and efferent control information. It is shown that the model predicts well the on-line corrections which occur during target location perturbation. The model is based on an optimality principle for smoothness, which determines a single, efficient trajectory from the many which satisfy a set of constraints.
This lays the basis for modeling trajectory formation, control, and optimization of reach and grasp in subsequent chapters.

2.1 Behavioral Phenomena During Reach

We begin by discussing experimental observations of the kinematics of reach which will later be addressed by our trajectory generation model. It is generally accepted that reaching movements involve central programming of motor patterns with, in many cases, some form of feedback modulation. For example, where the equilibrium point hypothesis (Feldman 1986) predicts that at the beginning of a voluntary movement a joint's equilibrium position shifts at once to the desired final position for that joint with the limb following according to the mechanical properties of the limb and its musculature, Bizzi et al. (1984) showed that instead of a simple step to a new value the virtual equilibrium position follows a time varying trajectory (this idea being dubbed the equilibrium trajectory hypothesis by Flash, 1987). In exploring the nature of such motor programs, Hogan (1984) modeled elbow rotations in pointing movements of monkeys toward a visually located target. The movements were in the horizontal plane, about 60° in magnitude, and of intermediate speed (about 700 ms in duration). He proposed the minimum jerk hypothesis to describe the kinematics of such movements.
By applying the calculus of variations, using the optimization criterion that the mean squared jerk (third derivative of position) be minimized during the movement, he derived a position function of time given by a fifth order polynomial, uniquely specified by the initial and final values of position, velocity, and acceleration. If velocity and acceleration are zero at the start and end of the movement, the velocity profile is symmetric and bell-shaped, much like the low-accuracy pointing movements performed by the subjects.

Flash and Hogan (1985) examined subjects performing unconstrained arm movements in the horizontal plane, holding a light weight manipulandum (pantograph). The room was darkened, removing visual feedback of arm location. The light emitting diode (LED) target was illuminated and was 20 to 40 cm distant. No accuracy requirement was stated. Among other experiments, they had subjects move between points in the plane without obstacles. It was found that the hand's path was approximately a straight line (as predicted by the minimum jerk criterion), regardless of the start and end points of the movement. Also, the trajectory of the hand was predicted well by the minimum jerk hypothesis, yielding characteristic symmetric bell-shaped speed profiles. Thus the principle that explains elbow rotations also explains whole arm movements.
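
As an illustration of the rest-to-rest case described above, the following minimal Python sketch evaluates the standard closed form of the fifth order minimum jerk polynomial and its velocity in normalized time. The function names and the amplitude and duration values are assumptions chosen for illustration only; they are not taken from the experiments cited.

```python
def min_jerk(tau):
    """Normalized minimum jerk position and velocity at tau = t/D in [0, 1].

    Assumes zero velocity and acceleration at both ends of the movement.
    """
    pos = 10 * tau**3 - 15 * tau**4 + 6 * tau**5
    vel = 30 * tau**2 - 60 * tau**3 + 30 * tau**4  # d(pos)/d(tau)
    return pos, vel


def min_jerk_profile(amplitude, duration, n=101):
    """Sample (time, position, velocity) along a rest-to-rest movement."""
    samples = []
    for i in range(n):
        tau = i / (n - 1)
        pos, vel = min_jerk(tau)
        # Position scales with amplitude; velocity with amplitude / duration.
        samples.append((tau * duration, amplitude * pos, amplitude * vel / duration))
    return samples


# Illustrative values (assumed): a 0.30 m movement lasting 0.7 s.
profile = min_jerk_profile(amplitude=0.30, duration=0.7)
peak_time, _, peak_vel = max(profile, key=lambda s: s[2])
```

In this sketch the velocity equals 30 tau^2 (1 - tau)^2, so it is symmetric about the midpoint of the movement, where it peaks at 1.875 times the ratio of amplitude to duration.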
In Flash (1987), this kinematic model was married with the equilibrium trajectory hypothesis to suggest that it was the hand's equilibrium point (i.e., the point at which the hand would eventually come to rest, given that the current neural drive to the musculoskeletal plant remained constant) which followed a minimum jerk trajectory. Such a model predicts that the arm's kinematic trajectory will slightly deviate from the minimum jerk trajectory of its equilibrium point in a manner dictated by the arm's dynamic characteristics, such deviations being observed in arm movement experiments.

Atkeson and Hollerbach (1985) examined the kinematics of unrestrained vertical arm movements between point targets. The subjects pointed their finger to a lit target without touching it and had no imposed accuracy constraints. The movements were conducted without visual feedback of arm position (i.e., the room was darkened). Trials were also run with the subjects holding 2, 3, or 4 lb loads. Movement times were roughly 400 to 1200 ms. In addition to being performed at different speeds and with different loads, the movements were done in both the upward and downward directions. The finding was that the time varying speed of hand movement, when normalized for amplitude and duration, consistently followed the same temporal profile, regardless of the load and direction variables. This suggests that a characteristic kinematic movement profile is generated by the CNS at a high level, which is then implemented in terms of the necessary forces and duration. (We note that, although the speed profile had a single characteristic shape, the shape of the hand path was sometimes straight and sometimes curved.
This is in contrast to the observation of Flash and Hogan (1985) that point-to-point arm movements in the horizontal plane consistently have straight line paths.) The above experiments provide strong evidence that a single kinematic pattern for voluntary limb movements exists for a variety of conditions.

Are the characteristic movement profiles observed above predetermined and then executed, or are they developed as the movement unfolds and subject to modification by sensory input during the reaching movement? Evidence for the latter option comes from target perturbation experiments which show that reaching movements can be modified "on the fly." This implies a system which reevaluates its progress as movement proceeds, based on incoming, albeit delayed, sensory information. Pelisson et al. (1986) perturbed the target of a pointing task at movement onset. The initial target was 30, 40, or 50 cm from the hand's starting position, and the perturbed target position was 10% further away. Vision of the subjects' hand was prevented, and they consistently undershot the target. However, when the target was perturbed, the movement distance was increased an amount corresponding to the perturbation. Also, the subject did not stop before moving on to the new target. Rather a smooth transition was made (without secondary accelerations) in midflight to a new trajectory terminating further away. The subject was often unaware that the target had moved.
Since the corrections were performed without vision of the arm, these results also imply that we use kinesthetic information and/or an internal model of the arm to update the motor error and the reaching program in real-time. Georgopoulos et al. (1981) found a prompt transition to a new trajectory after target perturbation when they observed monkeys trained to make planar arm movements. The monkeys made reaching movements to single targets and to targets which switched either to a nearby location or to one on the opposite side of the starting point. The time of target perturbation was varied from 50 to 400 ms after the first target was presented. They found a reaction time of 260 ms both for the initiation of movement to the first target and for the modification of the trajectory to reach a suddenly appearing new target. Thus, as in Pelisson's observations, primary movement completion was not necessary before implementing a novel trajectory.

Paulignan et al. (1990) perturbed the location of a vertically oriented dowel upon initiation of a reaching movement toward the dowel. Recording the kinematics of reaching and grasping under normal and perturbed conditions, they noted trajectory adjustment within 100 ms (on the average) after target location perturbation, without compromise of accuracy (that is, there were no trials where the subject failed to grasp the dowel). The movements which when unperturbed lasted about 500 ms, lasted about 100 ms longer when the target was shifted at movement onset, indicating on-line incorporation of novel sensory information.
Having surveyed properties of visually guided reach under various conditions, including perturbation, we now turn to the review of existing models of trajectory formation, after which we will develop a new model which addresses the phenomena of interest.

2.2 Modeling the Control of Reach

Various models of centrally programmed reaching have been developed to explain motor behavioral data. The work of Hogan, Flash, Atkeson, and Hollerbach, discussed above, suggests that a minimum jerk trajectory template might be scaled for duration and amplitude for a desired movement. Bullock and Grossberg's (1988) vector integration to endpoint (VITE) model of trajectory generation directly addresses the internal computation of the CNS during trajectory generation. Their model is based on a continuous comparison between target location and hand location, provided by efferent copy of the motor command. A "go" signal provides the appropriate temporal scaling as well as a trigger signal to initiate movement. The repeated comparison during movement is reminiscent of the iterative correction model of reach movement generation discussed first by Craik (1947), later by Stark (1968), and Crossman and Goodeve (1983). This will be discussed in depth in the speed/accuracy discussion of Chapter 5. These models consider movement paradigms in which a single target is presented. We wish to consider the case in which a target's location is switched during the movement. Flash and Henis (1992) recorded reaching kinematics for a low accuracy pointing task, under unperturbed and perturbed conditions. They found support for a superposition model, based on the hand-minimum-jerk model.
This model fits perturbed trajectories with a trajectory that is the sum of two minimum jerk functions. The first function corresponds to the unperturbed movement to the initial target, while the second is a minimum jerk trajectory from the initial target to the perturbed target location. Given the correct onset time for the second movement and correct durations for both movements, the result is a smooth, continuous movement toward the new target. In contrast, an abort-replan scheme, also based on the minimum-jerk model, did not fit the data quite as well when compared using a least-squares-fit method. However, in the following discussion we develop a minimum-jerk based, on-line control model of the control of reach which is similar to the abort-replan model, which reproduces several results from low-accuracy, point-to-point reaching, but which also lays the basis for modeling, in later chapters, the duration of movement, the coordination of reach and grasp, and the kinematics of reaching under high accuracy constraints. We proceed with the model development in the following discussion.

A trajectory for a physical system consists of a time series of states, such as the position, velocity, and acceleration of the hand, the momentum and configuration of each joint in the arm, or some other appropriate description of the current condition of the system of interest. Given that the system is to be driven from some initial state to some final state, there are in general an infinite number of trajectories of intermediate states. Often an optimization criterion may select one particular trajectory, which has the least cost.
For the generation of reach trajectories, various optimization criteria may be selected, such as minimum time or minimum mean squared jerk (as discussed above). Uno et al. (1989) suggest that arm dynamics are taken into account during trajectory planning, rather than simply hand kinematics, and suggest a minimum mean-squared joint-torque-change criterion (which we discuss further in Chapter 6). In this discussion we adopt the minimum jerk criterion introduced by Hogan (1984). This hand kinematics criterion predicts, regardless of the start and end points, a straight line path for the hand and a quintic polynomial position (displacement along the straight path) function of time. The coefficients of this polynomial are determined by the boundary conditions: the starting and ending position, velocity, and acceleration of the hand. We will now rederive this result, both for the purposes of the present discussion and to prime the derivations in subsequent chapters.

We begin by defining a state vector for a one dimensional movement: $x(t)$ is a $3 \times 1$ vector of position, velocity, and acceleration. Taking the driving input, $u(t)$, to be the jerk, we have

$$\dot{x} = A x + B u \tag{2.1}$$

where

$$A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}, \qquad B = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.$$

We also have initial and final boundary conditions on the movement, $x(t_0) = x^0$, $x(t_f) = x^f$, where the movement begins at time $t_0$ and ends at time $t_f$. The object is to minimize a quantity which depends on the state and/or the input throughout the movement. In this case the cost function penalizes the squared input, or jerk, integrated along the trajectory.

$$J = \int_{t=t_0}^{t=t_f} u^2 \, dt \tag{2.2}$$

We solve for the optimum trajectory by applying the minimum principle (Bryson and Ho, 1975).
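
To make the formulation concrete, the sketch below integrates (2.1) numerically and accumulates the cost (2.2) for two candidate jerk inputs that both satisfy static boundary conditions. The forward Euler scheme, the step count, the function names, and the cycloidal comparison profile are assumptions introduced here for illustration; the minimum jerk input is simply the third derivative of the quintic of Hogan (1984).

```python
import math


def simulate(jerk, duration, steps=20000):
    """Integrate x' = A x + B u from rest with forward Euler.

    The state is (position, velocity, acceleration) and u is the jerk.
    Returns the final state and the cost J = integral of u^2 dt.
    """
    dt = duration / steps
    pos = vel = acc = 0.0
    cost = 0.0
    for i in range(steps):
        u = jerk(i * dt)
        cost += u * u * dt
        # A shifts velocity into position and acceleration into velocity;
        # B feeds the jerk into the acceleration.
        pos, vel, acc = pos + vel * dt, vel + acc * dt, acc + u * dt
    return (pos, vel, acc), cost


def min_jerk_input(amplitude, duration):
    """Third derivative of the rest-to-rest quintic position function."""
    def u(t):
        tau = t / duration
        return amplitude / duration**3 * (60 - 360 * tau + 360 * tau**2)
    return u


def cycloidal_input(amplitude, duration):
    """Jerk of x = X (tau - sin(2 pi tau) / (2 pi)), an assumed alternative
    profile with the same static boundary conditions."""
    def u(t):
        w = 2 * math.pi / duration
        return amplitude * (2 * math.pi) ** 2 / duration**3 * math.cos(w * t)
    return u


final_state, cost_min_jerk = simulate(min_jerk_input(1.0, 1.0), 1.0)
_, cost_cycloid = simulate(cycloidal_input(1.0, 1.0), 1.0)
```

For unit amplitude and duration, the analytical costs are 720 for the minimum jerk input and about 779 for the cycloidal input, so the numerical values should come out close to these, with the minimum jerk cost the lower of the two.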
We first define a Hamiltonian, which is to be minimized by the choice of input $u(t)$,

$$H = L + p^T \dot{x}$$

where $L$ is the integrand of the cost functional, $x$ is the state and $p$ is the costate, to be defined. Thus the Hamiltonian is

$$H = u^2 + p^T (A x + B u) \tag{2.3}$$

Next we define the costate dynamics equation,

$$\dot{p} = -\left( \frac{\partial H}{\partial x} \right)^T$$

For the Hamiltonian described by (2.3),

$$\dot{p} = -A^T p \tag{2.4}$$

Because the problem formulation consists of a linear system with a quadratic cost functional, we can solve the minimization problem by finding an extremum of the Hamiltonian by differentiating with respect to the control, and setting the derivative equal to zero.

$$\frac{\partial H}{\partial u} = 0 = 2u^* + p^T B, \qquad u^* = -\tfrac{1}{2} p^T B \tag{2.5}$$

Now (2.1), (2.4), and (2.5) define a system of differential equations which may be solved with suitable boundary conditions to yield the optimal trajectory. To solve the problem of finding the solutions for $x(t)$ and $p(t)$ we plug $u^*(t)$ into (2.1). (Note that $u^*(t) = u^*(t)^T$ since $u^*(t)$ is scalar.)

$$\dot{x} = A x + B \left( -\tfrac{1}{2} p^T B \right)^T$$

$$\dot{x} = A x - \tfrac{1}{2} B B^T p \tag{2.6}$$

Combining (2.6) with (2.4) yields

$$\begin{bmatrix} \dot{x} \\ \dot{p} \end{bmatrix} = \begin{bmatrix} A & -\tfrac{1}{2} B B^T \\ 0 & -A^T \end{bmatrix} \begin{bmatrix} x \\ p \end{bmatrix}$$

Plugging in $A$ and $B$, we get the following differential equation.

$$\begin{bmatrix} \dot{x} \\ \dot{p} \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & -\tfrac{1}{2} \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & -1 & 0 & 0 \\ 0 & 0 & 0 & 0 & -1 & 0 \end{bmatrix} \begin{bmatrix} x \\ p \end{bmatrix} \tag{2.7}$$

The next step is to find the state transition matrix, defined by

$$\begin{bmatrix} x(t) \\ p(t) \end{bmatrix} = \Phi(t, t_0) \begin{bmatrix} x(t_0) \\ p(t_0) \end{bmatrix}$$

For convenience, we define four "sub-matrices" of the state transition matrix,

$$\begin{bmatrix} x(t) \\ p(t) \end{bmatrix} = \begin{bmatrix} \Phi_{11}(t, t_0) & \Phi_{12}(t, t_0) \\ \Phi_{21}(t, t_0) & \Phi_{22}(t, t_0) \end{bmatrix} \begin{bmatrix} x(t_0) \\ p(t_0) \end{bmatrix} \tag{2.8}$$

We begin with the fourth row of this system of differential equations.

$$\dot{p}_1 = 0$$

Integrating we obtain

$$p_1(t) = p_1(t_0) \tag{2.9}$$

The fifth row is

$$\dot{p}_2(t) = -p_1(t) = -p_1(t_0)$$

Integrating, we obtain,

$$p_2(t) = -p_1(t_0)(t - t_0) + p_2(t_0) \tag{2.10}$$

Similarly, the sixth row yields

$$p_3(t) = \tfrac{1}{2} p_1(t_0)(t - t_0)^2 - p_2(t_0)(t - t_0) + p_3(t_0) \tag{2.11}$$

Eqns.
(2.9), (2.10) and (2.11) give us the "bottom half" of the state transition matrix

$$\Phi_{21}(t, t_0) = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \qquad \Phi_{22}(t, t_0) = \begin{bmatrix} 1 & 0 & 0 \\ -(t - t_0) & 1 & 0 \\ \tfrac{1}{2}(t - t_0)^2 & -(t - t_0) & 1 \end{bmatrix} \tag{2.12}$$

Because the matrices in (2.12) are dependent on their parameters $t$ and $t_0$ in terms of their difference, we can define them as the following single parameter functions.

$$\Phi_{21}(\Delta t) = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \qquad \Phi_{22}(\Delta t) = \begin{bmatrix} 1 & 0 & 0 \\ -\Delta t & 1 & 0 \\ \tfrac{1}{2}\Delta t^2 & -\Delta t & 1 \end{bmatrix}$$

where $\Delta t = t - t_0$. Taking a similar approach, we integrate the third, second, and first rows of (2.7) to obtain the rest of the state transition matrix.

$$\Phi_{11}(\Delta t) = \begin{bmatrix} 1 & \Delta t & \tfrac{1}{2}\Delta t^2 \\ 0 & 1 & \Delta t \\ 0 & 0 & 1 \end{bmatrix}, \qquad \Phi_{12}(\Delta t) = -\frac{\Delta t}{240} \begin{bmatrix} \Delta t^4 & -5\Delta t^3 & 20\Delta t^2 \\ 5\Delta t^3 & -20\Delta t^2 & 60\Delta t \\ 20\Delta t^2 & -60\Delta t & 120 \end{bmatrix} \tag{2.13}$$

Note that this transition matrix gives the solution trajectory, if the initial conditions, $x(t_0)$ and $p(t_0)$, are known. Since we are given the initial and final states, $x(t_0)$ and $x(t_f)$, we can find $p(t_0)$ as a function of $t_f$: We assign $t = t_f$ in (2.8),

$$\begin{bmatrix} x(t_f) \\ p(t_f) \end{bmatrix} = \begin{bmatrix} \Phi_{11}(t_f, t_0) & \Phi_{12}(t_f, t_0) \\ \Phi_{21}(t_f, t_0) & \Phi_{22}(t_f, t_0) \end{bmatrix} \begin{bmatrix} x(t_0) \\ p(t_0) \end{bmatrix}$$

or

$$\begin{bmatrix} x(t_f) \\ p(t_f) \end{bmatrix} = \begin{bmatrix} \Phi_{11}(D) & \Phi_{12}(D) \\ \Phi_{21}(D) & \Phi_{22}(D) \end{bmatrix} \begin{bmatrix} x(t_0) \\ p(t_0) \end{bmatrix} \tag{2.14}$$

where $D = t_f - t_0$. The first row of this matrix equation is

$$x(t_f) = \Phi_{11}(D)\, x(t_0) + \Phi_{12}(D)\, p(t_0)$$

Solving for $p(t_0)$

$$p(t_0) = \Phi_{12}^{-1}(D) \left( x(t_f) - \Phi_{11}(D)\, x(t_0) \right) \tag{2.15}$$

where,

$$\Phi_{12}^{-1}(\Delta t) = -\frac{6}{\Delta t^5} \begin{bmatrix} 240 & -120\Delta t & 20\Delta t^2 \\ 120\Delta t & -56\Delta t^2 & 8\Delta t^3 \\ 20\Delta t^2 & -8\Delta t^3 & \Delta t^4 \end{bmatrix} \tag{2.16}$$

This formula for $p(t_0)$ provides the information needed for finding the solution trajectory. It is simply entered in the right hand side of (2.8) along with $x(t_0)$. Consider the case of static boundary conditions:

$$x(t_0) = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}, \qquad x(t_f) = \begin{bmatrix} X^f \\ 0 \\ 0 \end{bmatrix}$$

where $X^f$ is the desired position at $t_f$. Plugging the boundary conditions into (2.15), we have

$$p(t_0) = \Phi_{12}^{-1}(D) \begin{bmatrix} X^f \\ 0 \\ 0 \end{bmatrix}$$

Plugging in (2.16),

$$p(t_0) = -\frac{6}{D^5} \begin{bmatrix} 240 & -120D & 20D^2 \\ 120D & -56D^2 & 8D^3 \\ 20D^2 & -8D^3 & D^4 \end{bmatrix} \begin{bmatrix} X^f \\ 0 \\ 0 \end{bmatrix} = -\frac{120 X^f}{D^5} \begin{bmatrix} 12 \\ 6D \\ D^2 \end{bmatrix}$$

From (2.8) we have,

$$x(t) = \Phi_{11}(\Delta t)\, x(t_0) + \Phi_{12}(\Delta t)\, p(t_0)$$

or

$$x(t) = \Phi_{12}(\Delta t) \left( -\frac{120 X^f}{D^5} \right) \begin{bmatrix} 12 \\ 6D \\ D^2 \end{bmatrix}$$

Plugging in the transition matrix from (2.13),

$$x(t) = \frac{\Delta t\, X^f}{2 D^5} \begin{bmatrix} \Delta t^4 & -5\Delta t^3 & 20\Delta t^2 \\ 5\Delta t^3 & -20\Delta t^2 & 60\Delta t \\ 20\Delta t^2 & -60\Delta t & 120 \end{bmatrix} \begin{bmatrix} 12 \\ 6D \\ D^2 \end{bmatrix}$$

This is the trajectory for position, velocity, and acceleration, with static boundary conditions. The first component gives the position trajectory alone:

$$x_1(t) = X^f \left[ 6\left(\frac{\Delta t}{D}\right)^5 - 15\left(\frac{\Delta t}{D}\right)^4 + 10\left(\frac{\Delta t}{D}\right)^3 \right]$$

which is the fifth order polynomial function of time given by Hogan (1984).

But what if the boundary conditions aren't static? This is important for the situation where the target location is unexpectedly perturbed. If the hand is in motion when this occurs, the optimal trajectory to the new target is constrained by a non-static initial boundary condition. Instead of solving the entire state trajectory from the non-static initial condition to the target state, let us simply derive the optimal control at any initial state $x^0$, given a movement duration $D$, and final state $x^f$. From (2.5),

$$u^* = -\tfrac{1}{2} p^T B, \qquad u^*(t_0) = -\tfrac{1}{2} p(t_0)^T B$$

which is, from the definition of $B$,

$$u^*(t_0) = -\tfrac{1}{2} p_3(t_0) \tag{2.17}$$

To find the initial costate, we return to (2.15).

$$p(t_0) = \Phi_{12}^{-1}(D) \left( x(t_f) - \Phi_{11}(D)\, x(t_0) \right)$$

Keeping the static final state, this is

$$p(t_0) = -\frac{6}{D^5} \begin{bmatrix} 240 & -120D & 20D^2 \\ 120D & -56D^2 & 8D^3 \\ 20D^2 & -8D^3 & D^4 \end{bmatrix} \left( \begin{bmatrix} X^f \\ 0 \\ 0 \end{bmatrix} - \begin{bmatrix} 1 & D & \tfrac{1}{2}D^2 \\ 0 & 1 & D \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_1^0 \\ x_2^0 \\ x_3^0 \end{bmatrix} \right)$$

from which we need the third component,

$$p_3(t_0) = \begin{bmatrix} -120D^{-3} & 48D^{-2} & -6D^{-1} \end{bmatrix} \begin{bmatrix} X^f - x_1^0 - D x_2^0 - \tfrac{1}{2} D^2 x_3^0 \\ -(x_2^0 + D x_3^0) \\ -x_3^0 \end{bmatrix}$$

$$p_3(t_0) = -120 D^{-3} \left( X^f - x_1^0 - D x_2^0 - \tfrac{1}{2} D^2 x_3^0 \right) - 48 D^{-2} \left( x_2^0 + D x_3^0 \right) + 6 x_3^0 D^{-1}$$

Multiplying out and simplifying,

$$p_3(t_0) = -120 D^{-3} \left[ X^f - x_1^0 \right] + 72 D^{-2} x_2^0 + 18 x_3^0 D^{-1}$$

Combining with (2.17),

$$u^*(t_0) = 60 \left[ X^f - x_1^0 \right] / D^3 - 36 x_2^0 / D^2 - 9 x_3^0 / D \tag{2.18}$$

Eqn.
(2.18) gives a very useful formulation of the optimal solution, for it applies not only to the trajectory at the time of perturbation ($t_0$) and in the state $x^0$, but to any optimal trajectory to the given target, at any time. If the current state of the system includes velocity $x_2^0 = v$ and acceleration $x_3^0 = a$, and if we let $D = t_f - t$, i.e. $D$ is the time to go, and $\Delta X = X^f - x_1^0$, then we may express $u^*(t)$ as the feedback based control,

$$u^* = 60 \Delta X / D^3 - 36 v / D^2 - 9 a / D \tag{2.19}$$

In this formulation the controller does not store the trajectory. It need only generate $D$, the time remaining, and monitor the current state $x$ and target $X^f$. This is an important difference from the preprogrammed formulation (trajectory purely as a function of time), for here if at any instant the state is perturbed from $x$ to another state $\tilde{x}$ at time $\tilde{t}$, or if the target is perturbed from $X^f$ to a new position $\tilde{X}^f$, the system then proceeds along the (new) optimal trajectory from $\tilde{x}$ to the target state $(\tilde{X}^f, 0, 0)^T$ in time $D = t_f - \tilde{t}$. The significance of this model for simulation of reaching to perturbed targets is shown below.

It is important to mention that this formulation is not dependent on a feedback control structure. If the current state $x(t)$ was produced by an internal model of the controlled plant rather than from the plant itself, then the control would be open loop, or feedforward, in nature. The important advantage of this state-based control structure is that it allows a whole family of trajectories to be encoded in the mathematically simple controller. Further, because of the linear system formulation of (2.19), the resulting kinematic trajectory $x(t)$ scales with the input, i.e., the target distance $\Delta X$. Thus the model duplicates the findings of MacKenzie et al.
(1987) that peak velocity (along with the rest of the kinematic profile) scales with movement amplitude.

In our formulation, the time remaining, D, goes to zero as the movement's end approaches. To avoid instability in the system at this point, we impose a (positive) lower bound on D. Simulation shows that this is a modest constraint.

Fig. 2.1. Using feedback to generate a time varying trajectory. a A mechanism for generating a state trajectory by mapping the current state and target into a control signal, u. Plant and inverse dynamics are "lumped together." Command to the plant is specified in terms of acceleration [a(t)]. (u is in terms of jerk and passes through an integrating filter to yield a). b The "sliding window integrator" (SWI), which integrates the input (u) over time for the past (Δ) time interval. It is used in c as part of the state look-ahead module. c Extension of a to accept delayed feedback by calculating the estimates x(t) and v(t) of the present state from the efferent command to the plant, a(t), and delayed feedback, x(t-Δ) and v(t-Δ). Δ indicates time delay for plant state feedback to the trajectory generator.

The implementation of the feedback trajectory generator mechanism that we have discussed is shown in Fig. 2.1a. We assume that the trajectory for the hand is specified in terms of its acceleration kinematics, shown as the input "a" to the box labeled "inverse dynamics and plant." This label emphasizes that this is a kinematics model, not accounting for the forces involved in moving the arm.
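
The feedback law (2.19), together with the lower bound on D, can be exercised in a short simulation of the loop of Fig. 2.1a with instantaneous feedback. What follows is a sketch under stated assumptions: the plant is taken to be an ideal chain of integrators, integration is forward Euler, and the function names, the step size, the 20 ms bound on D, and the target values are illustrative choices rather than values from this dissertation.

```python
def feedback_jerk(target, pos, vel, acc, time_to_go):
    """Feedback control of (2.19): u = 60 dX / D^3 - 36 v / D^2 - 9 a / D."""
    d = time_to_go
    return 60 * (target - pos) / d**3 - 36 * vel / d**2 - 9 * acc / d


def reach(target_at, duration, dt=0.001, min_time_to_go=0.02):
    """Simulate a reach from rest; target_at(t) returns the target position.

    min_time_to_go is the (assumed) positive lower bound on D that keeps
    the gains finite as the end of the movement approaches.
    """
    pos = vel = acc = 0.0
    trace = []
    steps = int(round(duration / dt))
    for i in range(steps):
        t = i * dt
        d = max(duration - t, min_time_to_go)
        u = feedback_jerk(target_at(t), pos, vel, acc, d)
        # Idealized kinematic plant: jerk -> acceleration -> velocity -> position.
        pos, vel, acc = pos + vel * dt, vel + acc * dt, acc + u * dt
        trace.append((t + dt, pos, vel, acc))
    return trace


# Unperturbed reach to 0.30 m, and a reach whose target steps to 0.33 m
# at 100 ms (both values assumed for illustration).
normal = reach(lambda t: 0.30, duration=0.5)
perturbed = reach(lambda t: 0.30 if t < 0.1 else 0.33, duration=0.5)
```

Because the controller recomputes the jerk from the current state and the time to go at every step, the perturbed run needs no stored trajectory and no explicit replanning step; the same law simply steers toward whichever target is current.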
We assume that within this box, the desired kinematic trajectory (expressed in terms of time-varying acceleration) is translated into the appropriate driving forces to move the arm. The output of this box is then the resulting position ($x$) and velocity ($v$) of the hand. (Although the inverse dynamics subproblem is not addressed here, modeling efforts such as those of Kawato et al. (1988) have studied it at length, and we will address it in Chapter 6.) The box labeled "feedback unit" performs the computation described in (2.19), after which $u$ passes through an integrating filter to yield the driving acceleration signal.

The problem that we must now address is that Fig. 2.1a uses instantaneous feedback, whereas in reality, the current target position and hand velocity and position, $v$ and $x$, are sensed with some latency. We thus turn to the design of a "look-ahead" unit. Fig. 2.1b shows a "sliding window integrator" whose output is the integral of its input over the most recent time period $\Delta$. To emphasize the simplicity of the computation performed, we describe a simple artificial neural network construct and show that it can be adapted for this purpose: The sliding window integrator may be constructed from a linear "leaky integrator" neuron model, with a long time constant. The description of the response of such a neuron model is $\tau\,dy/dt = -y + kx$, where $\tau$ is the neuron's time constant, $y$ is its output, $x$ is its input, and $k$ is a synaptic weight. If $\tau$ is large, and $k = \tau$, then this description can be approximated by $dy/dt = x$: hence the output is the integral of the input.
Let the input be the difference between a signal and a delayed copy of itself:

$$x = u(t) - u(t-\Delta)$$

$$\frac{dy}{dt} = u(t) - u(t-\Delta)$$

Then by integrating we have:

$$y(t) = \int_0^{t} u(s)\,ds - \int_{-\Delta}^{t-\Delta} u(s)\,ds$$

Assuming $u(t) = 0$ for $t < 0$, this simplifies to:

$$y(t) = \int_{t-\Delta}^{t} u(s)\,ds$$

This construction is shown in Fig. 2.1b, where the "$\Delta$" box indicates a temporal delay, and the positive feedback line indicates the integral property of the neuron. It is used in Fig. 2.1c (the boxes labeled "SWI") to estimate the plant's position and velocity from delayed feedback in the presence of noise. If the noise is unbiased, this is the most reasonable estimate of the current state.

The delayed feedback model is now pictured in Fig. 2.1c. Position and velocity feedback are both delayed by an amount $\Delta$. (Note that this delay represents the collective delay for the sensorimotor loop, both the time for external events to affect the internal program, i.e., sensory delay, and the time for the internal program to influence movement kinematics, i.e., motor delay.) To compensate for this delay, we have included a "state look-ahead" module to estimate the current plant state. Essentially, since it is only computing movement kinematics, it performs a double integral of the control signal, $a(t)$, to estimate the change in position, $x(t)$, during the delay period (and also the change in the velocity, $v(t)$, along the way) and uses these values to extrapolate from the delayed state values to an estimate of the present value. With no mechanical noise or unexpected perturbations of the hand, the "look-ahead" module gives a precise prediction. The mathematical discussion in this section assumed an ideal, noiseless system. Note also that the perception of target position encounters delay. To indicate that this delay need not be the same length as the arm state delay, we label it $\Delta_T$.
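The sliding window integrator described above can be illustrated with a minimal discrete-time sketch in Python. This is not the dissertation's implementation: the class name, the fixed time step `dt`, and the buffer used as the delay line are assumptions introduced here for illustration. The update is a forward-Euler version of $dy/dt = u(t) - u(t-\Delta)$ with $u(t) = 0$ for $t < 0$.

```python
from collections import deque


class SlidingWindowIntegrator:
    """Discrete sketch of y(t) = integral of u over the last `window` seconds.

    Hypothetical helper: the step size and the buffer-based delay line are
    illustrative assumptions, not part of the original model description.
    """

    def __init__(self, window, dt):
        self.dt = dt
        self.n = max(1, round(window / dt))  # delay expressed in steps
        # Delay line holding the last n inputs; zeros encode u(t) = 0 for t < 0.
        self.delay_line = deque([0.0] * self.n, maxlen=self.n)
        self.y = 0.0

    def step(self, u):
        """Advance one time step with input u and return the window integral."""
        u_delayed = self.delay_line[0]           # input from n steps ago
        self.y += self.dt * (u - u_delayed)      # dy/dt = u(t) - u(t - window)
        self.delay_line.append(u)                # oldest sample drops out
        return self.y
```

With a constant input, the output ramps up for one window length and then holds at the input times the window length, which is the behavior expected of an integral over the most recent interval $\Delta$.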
The present position estimate, generated by the "look-ahead" module, is subtracted from this delayed target location to yield motor error. The effect of this target location delay is significant in the perturbed target paradigm, where the kinematic response to a shifting target is delayed by an amount $\Delta_T$.

Having presented this feedback control model of reach generation, which operates in the presence of noise and delayed information, in the next section we present the results of simulations of this model to explain behavioral data on accuracy of movement and response to perturbed targets.

2.3 Simulation Results

We simulated the reach generator of Fig. 2.1 for normal and perturbed target paradigms, modeling a one-dimensional movement trajectory. (Multi-dimensional reaching movements are simulated in Chapter 4.) The controllable parameters of the simulation were movement amplitude ($X^f$), movement duration (MT), noise amplitude (NA), the displacement ($\delta X^f$) and time of onset of target perturbation, and the increase in movement time after target perturbation. Noise in the system was simulated as a random value, uniformly distributed in an interval $[-R/2,\ R/2]$, superimposed on the movement velocity, where $R = \mathrm{NA} \times V$, the controllable parameter NA (set at 0.10 in the simulations discussed below) times the current velocity. (This noise model is consistent with that of Schmidt et al., 1977 and Meyer et al., 1988, and is discussed further in Chapter 5.) The simulation output consisted of position, velocity, and acceleration profiles, which we collected for each simulated reach.
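A minimal Python sketch of such a one-dimensional simulation is given below, under several assumptions that are not stated in the text: forward-Euler integration with a 1 ms step, a 10 ms lower bound on $D$, and noise added to the velocity used for the position update. The function names are hypothetical. The control law itself is (2.19).

```python
import random


def feedback_jerk(x, v, a, target, time_to_go):
    """Feedback control law (2.19): jerk command from state, target, and D."""
    D = time_to_go
    return 60.0 * (target - x) / D**3 - 36.0 * v / D**2 - 9.0 * a / D


def velocity_noise(v, noise_amplitude, rng):
    """Uniform noise in [-R/2, R/2] with R = NA * |V| (abs() is an assumption)."""
    R = noise_amplitude * abs(v)
    return rng.uniform(-R / 2.0, R / 2.0)


def simulate_reach(target, movement_time, dt=0.001, d_min=0.01,
                   noise_amplitude=0.0, seed=0):
    """Integrate position, velocity, acceleration under (2.19).

    dt, d_min, and the way noise enters the position update are
    illustrative choices, not values taken from the dissertation.
    """
    rng = random.Random(seed)
    x = v = a = 0.0
    xs, vs = [x], [v]
    for i in range(round(movement_time / dt)):
        D = max(movement_time - i * dt, d_min)   # positive lower bound on D
        u = feedback_jerk(x, v, a, target, D)
        x += (v + velocity_noise(v, noise_amplitude, rng)) * dt
        v += a * dt
        a += u * dt                              # integrating filter: jerk -> acceleration
        xs.append(x)
        vs.append(v)
    return xs, vs


if __name__ == "__main__":
    xs, vs = simulate_reach(target=0.40, movement_time=0.52, noise_amplitude=0.10)
    print(f"final position {xs[-1]:.3f} m, peak velocity {max(vs):.2f} m/s")
```

Because the controller recomputes the command from the current state at every step, the same loop handles noisy and noise-free runs without storing a reference trajectory.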
To simulate target movement, the target was shifted in one step from $X^f$ to $X^f + \delta X^f$, and movement duration extended from MT to MT + $\delta$MT, so that the value of $D$ in (2.19) after the shift was increased by $\delta$MT.

Addressing the target perturbation experiment of Pelisson et al. (1986) discussed above, we simulated the effect of perturbing target position at movement onset, assuming a feedback delay of 100 ms, and a movement amplitude increasing from 40 cm to 44 cm. Allotted movement time was initially 520 ms, increased to 540 ms after perturbation, both time values within the range of measured movement times for the corresponding distance. The result is seen in Figs. 2.2b and c. At the qualitative level, there is the general bell-shaped velocity profile and double-peaked acceleration profile in both the model and in the data. Further, the transition to the new trajectory in Fig. 2.2c is seamless, as seen in the actual data of Fig. 2.2a. Our model of continuous comparison between target position and arm state correctly predicts the smooth transition. (This result is in contrast to the characteristic transitions during backward and sideways target perturbations, seen in the data and reproduced by the model, below and in Chapter 4.) Quantitatively, in both the data and modeling results, peak velocity occurs halfway through the movement (at 260 ms) in the unperturbed case, with a slight increase in its time of occurrence in the perturbed case (at 270 ms in the data, 280 ms in the model). The magnitudes of peak velocity and acceleration are affected very little by perturbation in both the model and data. These values are only slightly lower in the model than in the data.
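The perturbation procedure just described can be sketched in Python as follows. This is an illustrative reconstruction, not the original simulation code: it assumes noise-free dynamics (so that the look-ahead estimate equals the true state), forward-Euler integration, a 10 ms lower bound on $D$, and a single delay after which the controller sees both the new target and the extended movement time. The function name and defaults are hypothetical.

```python
def simulate_perturbed_reach(target0, target1, mt0, mt1, target_delay,
                             dt=0.001, d_min=0.01):
    """Reach under (2.19) with the target stepped at movement onset.

    The controller uses (target0, mt0) until `target_delay` seconds have
    elapsed, then (target1, mt1). All numerical settings are assumptions.
    """
    x = v = a = 0.0
    velocities = []
    for i in range(round(mt1 / dt)):
        t = i * dt
        shifted = t >= target_delay              # has the shift reached the controller?
        target = target1 if shifted else target0
        t_final = mt1 if shifted else mt0
        D = max(t_final - t, d_min)              # time to go, bounded below
        u = 60.0 * (target - x) / D**3 - 36.0 * v / D**2 - 9.0 * a / D
        x, v, a = x + v * dt, v + a * dt, a + u * dt
        velocities.append(v)
    return x, velocities


if __name__ == "__main__":
    # Values quoted in the text: 40 cm -> 44 cm, 520 ms -> 540 ms, 100 ms delay.
    x_end, vel = simulate_perturbed_reach(0.40, 0.44, 0.52, 0.54, 0.10)
    t_peak_ms = vel.index(max(vel)) + 1          # with dt = 1 ms, index maps to ms
    print(f"final position {x_end:.3f} m, peak velocity at about {t_peak_ms} ms")
```

Only the arguments passed to the control law change at the moment the shift is registered; no separate trajectory is planned for the perturbed case.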
Peak velocity is 220 cm/s in the data, 144 cm/s in the model. Peak acceleration is about 1000 cm/s² in the data, 850 cm/s² in the model. The important result to emphasize is the lack of interruption of the trajectory after perturbation of the target location.

Fig. 2.2. Target perturbation experiment of Pelisson et al. (1986). (a) Hand velocity and acceleration profiles for unperturbed reach to 40 cm target (left) and perturbed reach, from 40 to 44 cm (right). (b) Simulation of unperturbed reach, showing position and velocity profiles. (c) Simulation of target perturbation response. When the target is perturbed 10% further from the subject at movement onset and in the same direction as the initial target, neither stopping nor reacceleration is seen. Rather, a seamless transition to a new trajectory is made. Also shown is acceleration for (d) unperturbed and (e) perturbed cases. [Plot axes: time (s); hand speed (cm/s); hand displacement (cm).]

A trajectory transition is observable when the target of reach is perturbed significantly in location and direction from the hand's starting position, as in the trajectory reversal experiment of Georgopoulos et al. (1981). To see whether our model duplicated their subject's movement profile, we ran the target perturbation simulation using the distance and timing parameters from their data. We simulated the movement, shown in Fig. 2.3a, of reaching to a target 8 cm away from the hand's initial position, which then switches to a point 8 cm behind the initial position, 50 ms before the onset of the initial movement.
The trajectory perturbation begins 200 ms into the simulated movement because of sensory delay. In the simulation, the chosen movement time parameter directly affects peak velocity: longer movement time means lower peak velocity. Using an initial movement time of 260 ms (taken from their data for the unperturbed case) and a modified movement time of 350 ms taken from the perturbed case (Fig. 2.3a), we found, as did they, that the secondary velocity peak is higher than the first. The ratio of the second peak height to the first in their data is 1.62, and in our simulation (Fig. 2.3b) it is 1.69. The timing of the occurrence of the peaks is also nearly the same: The first peak occurs 130 ms after movement onset in the data, at 120 ms in the model. The second velocity peak occurs at 340 ms in the data, 350 ms in the model. The time of movement reversal is 210 ms in the data, 230 ms in the model. Thus the kinematics of target reversal in the simulation are similar to those seen in the behaving subject: The response to perturbation is delayed only by the nominal reaction time, and to reverse the trajectory a significantly higher velocity curve is created, with the kinematic landmarks (peak velocity and time of reversal) occurring at realistic times.

2.4 Discussion

This chapter described a limb movement generation model which admits target perturbations during movement. Its output is in terms of hand kinematics, and so is comparable to data from experiments in human and monkey motor behavior. It explicitly shows how a family of movement trajectories might be generated by a controller interacting either with the controlled plant or an internal model of the plant.
Fig. 2.3. Direction reversal experiment of Georgopoulos et al. (1981). (a) Actual data. Left, top view of hand path; right, speed profile, points spaced by 10 ms. (b) Simulation. A monkey moves a planar manipulandum to capture indicated targets. First target is 8 cm further from the subject than the hand's starting position. Second target is 8 cm nearer the subject than the starting position and appears 50 ms before movement begins. Position (terminating at -8 cm, measured from the hand's starting position) and absolute value of velocity (unit: × 10 cm/s) are shown. Detailed comparison of model and data given in text.

Separate circuitry for separate targets, perturbed targets, or different durations is unnecessary. Rather, the single model produces trajectories parametrized by target location and duration.

In the simulations discussed above, one of the independent parameters was the additional movement time ($\delta$MT) allocated after target perturbation. This was adjusted to match the time course of the experimental data, but since this parameter has a profound effect on movement kinematics (reacceleration and peak velocity, as well as total movement time) it would be preferable to specify this simulation parameter as depending on experimental parameters such as (a) time of onset of target perturbation, (b) distance to initial target, (c) distance to perturbed target, and (d) movement time for unperturbed movement.
In Chapter 3 we present a model of duration under normal and perturbed conditions which suggests how duration can emerge as a trade-off between various costs in performing a movement. The result is a relationship between duration and the experimental parameters listed above.

Chapter 3

A Model of Duration in Normal and Perturbed Reaching Movement

We extend a minimum jerk model of reach trajectory planning to include a penalty for duration, and show that it can be used to model and predict the durations of unperturbed and perturbed voluntary reaching movements as a function of movement distance and perturbation, in several bodies of experimental data.

3.1 Introduction

A number of motor behaviorists have investigated the response to moving a visual target while a subject is reaching to grasp it. In the previous chapter we modeled the kinematics of reach under such conditions. Since we set the duration of each movement to that which is observed experimentally, an unaddressed question is what determines the movement duration in unperturbed and perturbed reaching. Pelisson et al. (1986) used a perturbation paradigm in which at movement onset the target was occasionally and unexpectedly perturbed further away from the subject. A small change was seen in movement time. In contrast, in the target reversal experiment carried out with monkeys by Georgopoulos et al. (1981), the movement time was more than doubled after target perturbation. What is needed is a single model of movement time determination which predicts a variety of such data.
Researchers have successfully modeled reaching trajectories by assuming the nervous system plans the trajectory according to the path traced out by the hand in space, and optimizing the trajectory according to a cost function which penalizes the lack of smoothness in the movement (Hogan, 1984; Flash and Hogan, 1985; Hoff and Arbib, 1992a,b). The cost function is the integral over the movement duration of the square of the derivative of acceleration, or jerk. Yet to find the trajectory by this optimization technique, it is necessary to first know its duration. In what follows, we show how to extend the cost function to include a penalty for duration, then allow the duration to emerge from the optimization process. The result is a formula for predicting the movement time based on the geometry of the experiment and the timing of perturbation.

3.2 The Minimum Jerk / Minimum Time Model

The following derivation follows the form of the optimal control derivation of the previous chapter, but with two extensions: We consider two-dimensional, rather than one-dimensional movements, and we let the duration itself be found by optimization. We first define a state vector for each dimension of a two-dimensional movement: $x_i(t)$, $i = 1, 2$, are 3 × 1 vectors of position, velocity, and acceleration. Taking the driving input, $u_i(t)$, to be the jerk, each dimension of the movement is described by

$$\dot{x}_1 = A x_1 + B u_1 \tag{3.1a}$$

$$\dot{x}_2 = A x_2 + B u_2 \tag{3.1b}$$

where, as before,

$$A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}, \qquad B = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix},$$

$x_i(0) = x_i^0$, $x_i(t_f) = x_i^f$, $i = 1, 2$.
The new cost function penalizes duration as well as jerk:

$$J = t_f + R \int_{t=0}^{t=t_f} \left(u_1^2 + u_2^2\right) dt$$

or, equivalently,

$$J = \int_{t=0}^{t=t_f} \left(R u_1^2 + R u_2^2 + 1\right) dt \tag{3.2}$$

where $R$ is the relative weighting between the two factors in the cost (positive and constant), and is to be determined later. The trajectory is to go from the given initial state to the given final state. We solve for the optimum trajectory by applying the minimum principle (Bryson and Ho, 1975). First we define the costate dynamics equations,

$$-\dot{p}_1 = A^T p_1 \tag{3.3a}$$

$$-\dot{p}_2 = A^T p_2 \tag{3.3b}$$

We then define the Hamiltonian, which is to be minimized by the choice of inputs $u_1(t)$, $u_2(t)$. In the minimum principle, the Hamiltonian is defined as

$$H = L + p^T \dot{x}$$

where $L$ is the integrand of the cost functional, $x$ is the state and $p$ is the costate. In our case the state is the 6 × 1 vector produced by concatenating the two 3 × 1 vectors $x_1$ and $x_2$. Similarly, the costate is the concatenation of $p_1$ and $p_2$. Thus the Hamiltonian is

$$H = L + p_1^T \dot{x}_1 + p_2^T \dot{x}_2$$

or

$$H = \left(R u_1^2 + R u_2^2 + 1\right) + p_1^T\left(A x_1 + B u_1\right) + p_2^T\left(A x_2 + B u_2\right) \tag{3.4}$$

Since the formulation is a linear system with a quadratic cost functional, we can minimize the Hamiltonian by differentiating with respect to the control, and setting the derivative equal to zero.

$$\frac{\partial H}{\partial u_1} = 0 = 2R u_1^* + p_1^T B \tag{3.5a}$$

$$\frac{\partial H}{\partial u_2} = 0 = 2R u_2^* + p_2^T B \tag{3.5b}$$

Now (3.1), (3.3), and (3.5) define a system of differential equations which may be solved with suitable boundary conditions to yield the optimal trajectory. While the boundary conditions $x_i^0$, $x_i^f$, $i = 1, 2$, are given, since $t_f$ is being optimized it is not given a priori. Instead, we have one additional constraint, that the Hamiltonian at the final time is zero (Bryson and Ho, 1975).
Since the system dynamics and cost function integrand are autonomous (not explicit functions of time), we have that the time derivative of the Hamiltonian is zero also,

$$\dot{H} = 0$$

Therefore, the Hamiltonian at any time in the trajectory is zero.

$$H(t) = 0 \tag{3.6}$$

This is advantageous, as it allows us to use the constraint at the most convenient time $t$. Setting (3.4) equal to zero, and rearranging terms, we have

$$0 = R u_1^{*2} + p_1^T B u_1^* + R u_2^{*2} + p_2^T B u_2^* + 1 + p_1^T A x_1 + p_2^T A x_2$$

We can simplify the expression by using the definition of $B$,

$$B = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$$

and the optimal control $u_i^* = -\frac{1}{2R} p_i^T B = -\frac{p_{i,3}}{2R}$ from (3.5). Performing the substitution,

$$0 = \frac{p_{1,3}^2}{4R} - \frac{p_{1,3}^2}{2R} + \frac{p_{2,3}^2}{4R} - \frac{p_{2,3}^2}{2R} + 1 + p_1^T A x_1 + p_2^T A x_2$$

and simplifying,

$$0 = -\frac{p_{1,3}^2}{4R} - \frac{p_{2,3}^2}{4R} + 1 + p_1^T A x_1 + p_2^T A x_2 \tag{3.7}$$

Also, we may use the definition of $A$,

$$A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}, \qquad A x_i = \begin{bmatrix} x_{i,2} \\ x_{i,3} \\ 0 \end{bmatrix}, \qquad p_i^T A x_i = p_{i,1} x_{i,2} + p_{i,2} x_{i,3}$$

and plug into (3.7) to obtain

$$0 = -\frac{p_{1,3}^2}{4R} - \frac{p_{2,3}^2}{4R} + 1 + p_{1,1} x_{1,2} + p_{1,2} x_{1,3} + p_{2,1} x_{2,2} + p_{2,2} x_{2,3} \tag{3.8}$$

which, we remind ourselves, applies at every time $t$.

We return now to the problem of finding the solutions for $x_i(t)$, $p_i(t)$, $i = 1, 2$. Plugging $u_i^*(t)$ into (3.1) we have

$$\dot{x}_i = A x_i + B\left(-\frac{B^T p_i}{2R}\right) = A x_i - \frac{B B^T}{2R} p_i \tag{3.9}$$

for $i = 1, 2$. Combining (3.9) with (3.3) yields

$$\frac{d}{dt}\begin{bmatrix} x_i \\ p_i \end{bmatrix} = \begin{bmatrix} A & -\dfrac{B B^T}{2R} \\ 0 & -A^T \end{bmatrix} \begin{bmatrix} x_i \\ p_i \end{bmatrix}$$

Plugging in $A$ and $B$, we get the following differential equation:

$$\frac{d}{dt}\begin{bmatrix} x_i \\ p_i \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & -\frac{1}{2R} \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & -1 & 0 & 0 \\ 0 & 0 & 0 & 0 & -1 & 0 \end{bmatrix} \begin{bmatrix} x_i \\ p_i \end{bmatrix}$$

This is similar to the system in Chapter 2 (Eqn. 2.7).
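The linear state and costate system above can be explored numerically. The following Python sketch (function names are hypothetical, and NumPy is an assumed dependency) builds the 6 × 6 system matrix and computes its state transition matrix from the matrix exponential series. Because the system matrix is nilpotent (its sixth power is zero), the series terminates after the fifth-order term, so every entry is a polynomial in $t$ of degree at most 5.

```python
from math import factorial

import numpy as np


def system_matrix(R):
    """6x6 matrix of the combined state/costate dynamics for one dimension."""
    A = np.array([[0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [0.0, 0.0, 0.0]])
    B = np.array([[0.0], [0.0], [1.0]])
    F = np.zeros((6, 6))
    F[:3, :3] = A                        # state dynamics
    F[:3, 3:] = -(B @ B.T) / (2.0 * R)   # optimal control expressed via the costate
    F[3:, 3:] = -A.T                     # costate dynamics
    return F


def state_transition(t, R):
    """exp(F t) via the series sum_{k=0..5} (F t)^k / k!, exact since F^6 = 0."""
    Ft = system_matrix(R) * t
    phi = np.zeros((6, 6))
    term = np.eye(6)                     # holds (F t)^k
    for k in range(6):
        phi += term / factorial(k)
        term = term @ Ft
    return phi


if __name__ == "__main__":
    phi = state_transition(t=0.5, R=2.0)
    print("upper-right block (maps initial costate to state):")
    print(phi[:3, 3:])
```

The upper-left block of the result is the familiar kinematic propagation of position, velocity, and acceleration, while the upper-right block shows how the initial costate shapes the state through the optimal jerk.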
As before we wish to find the state transition matrix, defined by

$$\begin{bmatrix} x_i(t) \\ p_i(t) \end{bmatrix} = \Phi(t,0) \begin{bmatrix} x_i(0) \\ p_i(0) \end{bmatrix}$$

For convenience, we define four "sub-matrices" of the state transition matrix,

$$\begin{bmatrix} x_i(t) \\ p_i(t) \end{bmatrix} = \begin{bmatrix} \Phi_{11}(t,0) & \Phi_{12}(t,0) \\ \Phi_{21}(t,0) & \Phi_{22}(t,0) \end{bmatrix} \begin{bmatrix} x_i(0) \\ p_i(0) \end{bmatrix} \tag{3.10}$$

The solution consists of matrices of monomials of order up to 5,

$$\Phi_{11}(t,0) = \begin{bmatrix} 1 & t & \frac{t^2}{2} \\ 0 & 1 & t \\ 0 & 0 & 1 \end{bmatrix}, \qquad \Phi_{12}(t,0) = -\frac{t}{240R} \begin{bmatrix} t^4 & -5t^3 & 20t^2 \\ 5t^3 & -20t^2 & 60t \\ 20t^2 & -60t & 120 \end{bmatrix} \tag{3.11a}$$

$$\Phi_{21}(t,0) = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \qquad \Phi_{22}(t,0) = \begin{bmatrix} 1 & 0 & 0 \\ -t & 1 & 0 \\ \frac{t^2}{2} & -t & 1 \end{bmatrix} \tag{3.11b}$$

Note that this result gives the solution trajectory for (3.10) if the initial conditions, $x_i(0)$ and $p_i(0)$, are known. Two problems remain: We do not know $p_i(0)$, and we have still to find the duration $t_f$, using (3.8). Given the initial and final values for the state vector, $x_i(0)$ and $x_i(t_f)$, we can find $p_i(0)$ as a function of $t_f$: We assign $t = t_f$ in (3.10),

$$\begin{bmatrix} x_i(t_f) \\ p_i(t_f) \end{bmatrix} = \Phi(t_f,0) \begin{bmatrix} x_i(0) \\ p_i(0) \end{bmatrix}$$

the first row of which is

$$x_i(t_f) = \Phi_{11}(t_f,0)\, x_i(0) + \Phi_{12}(t_f,0)\, p_i(0)$$

Solving for $p_i(0)$,

$$p_i(0) = \Phi_{12}^{-1}(t_f,0)\, x_i(t_f) - \Phi_{12}^{-1}(t_f,0)\, \Phi_{11}(t_f,0)\, x_i(0) \tag{3.12}$$

where

$$\Phi_{12}^{-1}(t,0) = -\frac{6R}{t^5} \begin{bmatrix} 240 & -120t & 20t^2 \\ 120t & -56t^2 & 8t^3 \\ 20t^2 & -8t^3 & t^4 \end{bmatrix} \tag{3.13}$$

This formula for $p_i(0)$ provides the information needed for finding the solution trajectory. It also allows us to find the duration. Given boundary conditions $x_i(0)$ and $x_i(t_f)$, we find $p_i(0)$ using (3.12) and then plug it, along with $x_i(0)$, into (3.8), the equation for the free end time condition. (Note we are arbitrarily and conveniently choosing to solve this equation at time $t = 0$.) This yields the following sixth order polynomial in $t_f$.

$$\begin{aligned}
0 ={}& -3600R\left(\Delta x_{1,1}^2 + \Delta x_{2,1}^2\right) \\
&+ t_f\, 2880R\left(\Delta x_{1,1}\Delta x_{1,2} + \Delta x_{2,1}\Delta x_{2,2} + 2\Delta x_{1,1} x_{1,2}^0 + 2\Delta x_{2,1} x_{2,2}^0\right) \\
&+ t_f^2 R\left(-576\Delta x_{1,2}^2 - 360\Delta x_{1,1}\Delta x_{1,3} - 576\Delta x_{2,2}^2 - 360\Delta x_{2,1}\Delta x_{2,3}\right. \\
&\qquad\quad \left. - 2160\Delta x_{1,2} x_{1,2}^0 - 2160 (x_{1,2}^0)^2 - 2160\Delta x_{2,2} x_{2,2}^0 - 2160 (x_{2,2}^0)^2\right) \\
&+ t_f^3 R\left(144\Delta x_{1,2}\Delta x_{1,3} + 144\Delta x_{2,2}\Delta x_{2,3} + 240\Delta x_{1,3} x_{1,2}^0 + 48\Delta x_{1,2} x_{1,3}^0 + 240\Delta x_{2,3} x_{2,2}^0 + 48\Delta x_{2,2} x_{2,3}^0\right) \\
&+ t_f^4 R\left(-9\Delta x_{1,3}^2 - 9\Delta x_{2,3}^2 - 12\Delta x_{1,3} x_{1,3}^0 - 12 (x_{1,3}^0)^2 - 12\Delta x_{2,3} x_{2,3}^0 - 12 (x_{2,3}^0)^2\right) \\
&+ t_f^6
\end{aligned} \tag{3.14}$$

where $\Delta x_i = x_i^f - x_i^0$. This polynomial may then be solved numerically, given values for the boundary conditions $x_1^0$, $x_2^0$, $x_1^f$, $x_2^f$, and a value for $R$. We will now solve it analytically for the special case when there are the static boundary conditions,

$$x_i(0) = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}, \qquad x_i(t_f) = \begin{bmatrix} x_i^f \\ 0 \\ 0 \end{bmatrix}$$

for $i = 1, 2$, where $x_i^f$ now denotes the scalar final position in dimension $i$. Then taking (3.8) at $t = 0$, the last four terms vanish leaving

$$4R = p_{1,3}^2 + p_{2,3}^2 \tag{3.15}$$

Plugging $x_i(0) = 0$ into (3.12),

$$p_i(0) = \Phi_{12}^{-1}(t_f,0)\, x_i(t_f)$$

Plugging in the static $x_i(t_f)$, using (3.13), and calculating only the third component of $p_i(0)$,

$$p_{i,3}(0) = -\frac{6R}{t_f^5}\left(20\, t_f^2\, x_i^f\right) = -\frac{120R}{t_f^3}\, x_i^f$$

Combining this with (3.15) yields

$$4R = \frac{120^2 R^2}{t_f^6} (x_1^f)^2 + \frac{120^2 R^2}{t_f^6} (x_2^f)^2$$

$$t_f^6 = 60^2 R \left((x_1^f)^2 + (x_2^f)^2\right) \tag{3.16}$$

We define $D$ to be the distance traveled from $t = 0$ to $t = t_f$:

$$D = \sqrt{(x_1^f)^2 + (x_2^f)^2}$$

We can then write (3.16) as,

$$t_f = (60D)^{1/3} R^{1/6} \tag{3.17}$$

This last equation says that the duration of a movement is proportional to the cubed root of the distance moved, and the constant of proportionality is set by the "arbitrary" weighting of smoothness versus time, $R$. This dependence on $R$ is consistent with our intuition: If smoothness is penalized more than duration, a long, slow movement will result, with longer duration, i.e. increasing $R$ increases $t_f$. This presents both a problem and a solution for modeling duration as an emergent property of an optimization model. The problem is that duration is still essentially a chosen parameter, since the modeler chooses $R$. However, when
H o w e v e r, w h e n m ovem ent distance varies in an experim ent, w e can ex trap o late from one m ovem ent tim e to th e m ovem ent tim es of o th er distances. W e d o so by solving (3.17) for R 2 (60 D) T hen for a chosen m ovem ent distance an d tim e, w e find R, w hich is used alo n g w ith o th er m o v em en t distances in (3.17) to find th eir associated d u ratio n s. F u rth er, th e d u ratio n of a u n p e rtu rb e d m o v em en t can be u se d to p red ict the d u ratio n of a p e rtu rb e d m ovem ent. W e assum e th a t w hen the ta rg e t of reach is p e rtu rb e d th ere is som e d elay w h ile th e v isu al targ et location signal reaches th e n eu ral m o to r circuitry w hich im plem ents the optimal m o v em en t to w a rd the target. W hen th e n ew ta rg e t location is fed to this circuitry, a new o p tim al m o v em en t is in itia te d to w a rd the secondary target. The d u ratio n of this m ovem ent is predictable b y solving th e above o p tim al control problem , b u t w ith novel b o u n d a ry conditions. T he n ew term inal condition is given by the p ertu rb ed targ et location. The n ew initial condition is the state of the system , xj(t), i= l, 2, at the tim e the n ew trajectory begins. This view of p ertu rb atio n resp o n se is discussed in detail in C h ap ter 2. To find the d u ratio n of the p e rtu rb e d m ovem ent, first u se tf an d D from the n o n -p ertu rb ed m ovem ent to find R. T hen ap p ly R to (3.14) an d solve th e polynom ial eq u atio n to find th e d u ra tio n of the second section of the p ertu rb ed m ovem ent. N ote th at xj(0) in (3.14) is the current, nonstatic state at the tim e th e p e rtu rb e d m o v em en t begins. This tech n iq u e is u se d in th e n ext section to accurately m o d el th ree different bodies of targ et p ertu rb a tio n experim ents, an d p red ic t som e n ew results. 
By way of a remark, note that even though the dimensions are dynamically decoupled (Eqn. 3.1), the minimization criterion yields a movement time $t_f$ which depends on the boundary conditions for each dimension, and which in turn affects the trajectory of each dimension. For example, if the target location component is changed in dimension 1 only, then $t_f$ will be affected, as will be the trajectory in dimension 2. This has implications for other multidimensional movements, such as ocular saccades, in which one dimension of eye rotation slows for the other when an oblique movement is being made. Further, this dimensional coupling is in contrast to the coupling of reach and grasp discussed in Chapter 4.

3.3 Modeling Perturbation Data

The minimum jerk/time model was applied to reaching data in an attempt to predict the varying movement times which result from different target location perturbations. The inputs to the model are the initial and perturbed target locations relative to the initial hand location, the non-perturbed movement duration, and the time of reaction to perturbation (which is the time of target perturbation plus sensorimotor delay, or "reaction time").

Pelisson et al. (1986) had subjects reach 30, 40 or 50 cm, sometimes unexpectedly perturbing target location 10% further at movement onset. They found movement times similar to unperturbed movement times when reaching to the slightly more distant target locations, and perturbed movement trajectories which were similar in shape to unperturbed ones (as described and modeled in Chapter 2). Fig.
3.1 shows six measured movement times (averages with associated standard deviations). There are three for the unperturbed movements of amplitude 30, 40, and 50 cm, and three for the perturbed movements of amplitude 33, 44, and 54 cm.

Fig. 3.1. Movement duration in Pelisson et al. (1986). Experimental data and model prediction shown. Note the small increase in duration associated with perturbation in the same direction as the initial movement (i.e. movements of amplitude 33, 44, 54 cm) compared with unperturbed movements (i.e. movements of amplitude 30, 40, 50 cm), when the perturbation occurs early in the movement. Error bars indicate the range of experimentally recorded values. [Plot: measured duration and model prediction versus movement amplitude (cm).]

In the duration model the unperturbed movement times were taken to be the average for each distance and, since the perturbation occurred at movement onset, the time of trajectory perturbation was one reaction time (RT), which was taken to be 200 ms, a typical reaction time value (Stark, 1968). The resulting sixth order polynomials were solved using Mathematica, and the results of estimating the perturbed durations are seen in Fig. 3.1. Clearly, the model predicts the small movement time increase. The duration prediction model takes into account not only the distance from the target at perturbation, but also the direction of movement, a property akin to momentum, but at the kinematic level.
Since the direction of movement is similar to that which we would expect for an unperturbed movement to the nearby target, a duration similar to the original is predicted. The significance of this is seen in the next modeling experiment.

Fig. 3.2. Macaque hand trajectories in target reversal task of Georgopoulos et al. (1981). As duration of presentation of primary target increases, so does total movement time. [Panel labels: first light stays on 50, 100, 150, 200, 250, 300, 400; control.]

Fig. 3.3. Movement time data and model prediction for the target reversal experiment of Georgopoulos et al. (1981). The dotted line indicates the duration of non-perturbed reaching movement. Measured duration is available for one case; the model predicts the durations for the other perturbation cases in the experiment. ISI: interstimulus interval, the time before the target is switched. Note that the model predicts shorter overall movement time when the target is switched sooner. [Plot: modeled and measured duration (ms) versus ISI (ms).]

The duration prediction model was applied to the target reversal experiment of Georgopoulos et al. (1981). An unperturbed reaching movement in the horizontal plane to a target 8 cm distant took 260 ms to complete. In some trials, after some interstimulus interval (ISI), the target was switched to a point 8 cm from the starting point, in a direction opposite the direction of the initial target (Fig. 3.2). In this case there was a significant increase (about 100%) in movement time, due to the movement reversal. The model predicts closely the duration for ISI = 200 ms, which is read from the velocity profile in Fig. 2.3a as 550 ms (with a resolution of 10 ms).
The model predicts 524 ms (Fig. 3.3). It also predicts durations for a variety of ISIs, for which duration information is unavailable in the paper. Note that the model predicts shorter overall movement time when the target is switched sooner. This set of predictions gives a method for testing the validity of the model.

Lastly, the model was applied to the target location perturbation experiment of Paulignan et al. (1991). Subjects reached about 24 cm to grasp a small vertical dowel. During some trials the target location was switched, at movement onset, to one of two dowels slightly to the left or right of the initially indicated dowel. The secondary dowel positions were symmetrically displaced with respect to the initial one; however, the wrist positions in grasping each dowel were not symmetrically offset from these dowel positions, as can be seen in Fig. 3.4. Relative to the initial wrist position, while grasping the center dowel the wrist was at X-Y position (2 cm, 24 cm). The wrist positions for the left and right targets were estimated to be (-6, 27) and (8, 21), respectively. Since the perturbation occurred at movement onset, the time of trajectory perturbation is equal to one reaction time, which we found gave the best fit to the data when taken to be 280 ms. The results of using these parameters are shown in Fig. 3.5. We also found that using the static final state did not produce as good a fit to the data as when a nonzero final acceleration (but zero final velocity) was used to control the target approach during reaching to grasp. (This orientation constraint is discussed in Chapter 4.) To produce the results in Fig. 3.5, we used the modest terminal acceleration values $x_{1,3}(t_f) = 0$, $x_{2,3}(t_f) = -4$ m/s², which, as we will show in Chapter 4, ensure the correct approach direction at the trajectory's end.

Fig. 3.4. Target location perturbation experiment of Paulignan et al. (1991). Object to be grasped is unexpectedly moved left or right. Paths of thumb (T), index finger (I) and wrist (W) are shown for each of the three conditions.

Fig. 3.5. Movement time data and model prediction for the target perturbation experiment of Paulignan et al. (1991). C20: Nonperturbed movement. Modeled duration is identical to measured duration by definition. PL / PR: Perturbed left and perturbed right.

3.4 Discussion

In general, the duration data examined above show an increase in movement time for perturbed movements. This is faithfully reproduced by our model. The exception is the perturbation from 30 to 33 cm in the experiment of Pelisson et al. (1986), in which the average movement time is shorter for the perturbed movement (Fig. 3.1). This is not predicted by the minimum jerk/time model, which never predicts briefer movements to more distant targets. If this discrepancy holds for a much larger experimental data set, a rethinking of the model will be necessary.

Fig. 3.6. Updated version of trajectory controller from Chapter 2, showing duration dependency on target distance and current limb state.
An important question is how the optimal control solution with the modified cost function of this chapter compares to the optimal control solution to the problem as described in Chapter 2. Intuitively, once a duration is chosen, (3.2) becomes simply a minimum jerk cost functional, with a constant (the duration) added on. Thus one would expect that the optimal control would be the same as that found in Chapter 2. We will now formally show this equivalence. Comparing (2.8) and (3.10) we see similar systems based on transition matrices (with the difference that the system of this chapter is two-dimensional instead of one-dimensional). If we use $\Phi_{ij}(\Delta t)$ exclusively to refer to the four transition matrices defined in (2.12) and (2.13), and $\Phi'_{ij}(\Delta t)$ to refer to the transition matrices of (3.11), we may rewrite the latter as

$$\Phi'_{11}(\Delta t) = \Phi_{11}(\Delta t), \quad \Phi'_{12}(\Delta t) = c\,\Phi_{12}(\Delta t), \quad \Phi'_{21}(\Delta t) = \Phi_{21}(\Delta t), \quad \Phi'_{22}(\Delta t) = \Phi_{22}(\Delta t),$$

where $c$ stands for a scalar factor that is not legible in the source, whence

[equation not legible in the source]

Now (3.12) may be written

[equation not legible in the source]

where $\Delta t = t_f - t_0$. The optimal control is, as derived earlier,

[equation not legible in the source]

These two equations may be combined:

[equation (3.18), not legible in the source]

Now, returning to the Chapter 2 derivation, combining (2.15) and (2.17), we have

$$u^*(t_0) = -\tfrac{1}{2}\left[\Phi_{12}^{-1}(D)\,\big(x(t_f) - \Phi_{11}(D)\,x(t_0)\big)\right]_3,$$

which is the same as (3.18), because $D = t_f - t_0 = \Delta t$. Thus, given the same boundary conditions, $x(t_0)$ and $x(t_f)$, the control and resulting trajectories are the same for the problem when optimized according to either the minimum jerk or the minimum jerk/time criterion. The added feature of the optimization of this chapter is that it gives not only the control, but also the movement duration.

Using the model developed in this chapter, we can now modify our reach trajectory generation model (Fig. 2.1), to show the origin of duration.
The duration optimizer in Fig. 3.6 takes as input the target location and the current state of the hand, then computes duration (in the form of time remaining) according to (3.14) and sends this duration to the trajectory generation module. When the target location is perturbed, the appropriately modified duration is generated. We can think of this computation as being carried out by a neural map, so that it occurs in a brief, constant time. In the following chapter we will expand this view of the duration computation to reflect constraints on both reach and grasp, and converge on a model of the temporal coordination of these two motor processes.

Chapter 4

The Coordination of Reach and Grasp in Prehension: Normal and Perturbed Conditions

This chapter addresses the modeling of two-dimensional reach and grasp, under normal and perturbed conditions. Preshape and enclose of the hand are both modeled in grasp. Transport and preshape are coordinated via their timing, and this coordination can be explained by Maximum time synchronization and a Constant enclose time constraint. Normal and perturbed transport and preshape trajectories are based on optimality principles for movement efficiency (smoothness) with a penalty for aperture added to preshape. Delays in information flow between sensorimotor programs for reach and grasp affect timing and kinematics of these actions. In reaching to grasp and reaching to point, the different tasks put different constraints on the final state of the hand.
A mathematical orientation constraint at the end of reach realistically models reaching in the context of prehension.

4.1 Introduction

Our original interest in hand transport and preshaping arose from the work of Jeannerod (1981; see also Jeannerod and Biguer, 1982) who studied the shaping of the hand as it moved from its initial position to pick up a ball. The hand is preshaped so that when it has almost reached the ball, it is of the right shape and orientation to enclose the ball prior to gripping it firmly. By examining consecutive frames of a movie, one could see that the movement may be divided into two parts, a fast initial movement, and a slow approach movement. This led to the coordinated control program for the behavior shown in Fig. 4.1a (Arbib, 1981). In the top half of the figure, we see three perceptual schemas: schemas whose job it is to find information about the environment, rather than to control movement. Solid lines indicate the transfer of data from one schema to another and dashed lines indicate the transfer of activation. Successful completion of locating the object activates schemas for recognizing the size and orientation of the object. The outputs of these perceptual schemas are available on separate channels for the control of the hand movement. This in turn involves the concurrent activation of two motor schemas, or control systems. One moves the arm to transport the hand towards the object, the other preshapes the hand, with the finger separation and orientation guided by the output of the appropriate perceptual schemas. Note that, in this model, once the hand is preshaped, the schema involved in shaping the hand "goes to sleep."
It is only the completion of the fast phase of hand transfer that triggers the slow phase of hand transfer as well as "waking up" the final stage of the grasping schema, which will shape the fingers under control of tactile feedback.

Recent experiments from Jeannerod's laboratory show that the model of Fig. 4.1a is no longer tenable. While this model provides a preliminary explanation of transport/preshape interaction, it does not address the trajectory formation which, as we will see, reflects subtle effects of the interaction of the two motor processes. It is the task of the present chapter to provide an updated model which addresses new data on how hand trajectories vary in visually guided reaching tasks as requirements of speed and accuracy vary, or when the target is perturbed during the movement. The two major changes are: (a) The transport schema cannot be divided into two separate phases, the first of which is ballistic, but must instead be modeled as a single feedback system. It is the latencies within that system that make fast movements appear to be ballistic in all but their final portions. (b) The one-way flow of activation from the transport to the grasp schema must be replaced by a two-way interaction. We postulate that this interaction is mediated solely by timing relations. A high-level overview of the new motor schemas is presented in Fig. 4.1b. The details of their operation and interaction are provided in the remainder of this chapter.

Fig. 4.1. a Coordinated control program for reach and grasp. Redrawn from Arbib et al. (1985). b An overview of the new model of the motor schemas, and their coordination through timing, that is presented in this chapter.

After reviewing behavioral studies of reach and grasp in Sect. 4.2, in Sect. 4.3 we extend the model of reaching to explain both the kinematics of hand preshape for prehension, and the temporal interactions of reach and grasp. We provide an optimization principle for hand preshaping which trades off the cost of maintaining the hand in an open position against the cost of accelerating the change in grip size. This yields a control system for preshaping. We then present a model which uses only expected duration for coordination of transport and preshape. In Sect. 4.4 we show through computer simulation that the model can describe the kinematics of interaction of hand transport and preshape under a variety of circumstances including perturbations of object position and object size.

4.2 Investigations of Hand Transport and Prehension Interaction

We first introduce the experimental paradigms behind the data to be discussed, after which we review the data in detail. Paulignan et al.
(1991a) had subjects reach to grasp a dowel located in the horizontal plane, 35 cm from the hand, in a direction 20° to the right of the mid-sagittal plane. In certain trials the location was unexpectedly shifted, at movement onset, to a second target located either 10° (leftward perturbation) or 30° (rightward perturbation) to the right of the mid-sagittal plane. Perturbation of location had an effect not only on transport of the hand (which corrected to reach the new target), but also on the kinematics and timing of preshape: Maximum aperture (the separation of the thumb and forefinger) was found to occur later in the movement, after the hand temporarily closed to a smaller aperture. Sample results can be seen in Figs. 4.2 and 4.3. Fig. 4.2 shows trajectories of hand displacement, hand speed, and grip aperture during unperturbed transport, preshape, and enclose. Fig. 4.3 shows the same data for the perturbed cases. In Fig. 4.3e,f the partial reclosing of the hand can be seen.

Fig. 4.2. Transport and preshape for unperturbed trials of Paulignan et al. (1991a). Each graph shows the average of ten trials, with standard deviation bars included. Time is normalized. a Displacement of wrist versus time. b Speed of wrist. c Grip aperture. Experimental data courtesy of M. Jeannerod, INSERM, Lyon, France.

Fig. 4.3. Transport and preshape for perturbed trials of Paulignan et al. (1991a). a,c,e Left perturbed trials. b,d,f Right perturbed trials. a,b Hand displacement versus normalized time. Average for ten trials is shown along with standard deviation bars. c,d Hand speed versus normalized time. e,f Grip aperture versus normalized time.

Gentilucci et al. (1992) performed a location perturbation prehension experiment, in which subjects reached to grasp one of three 4 cm diameter spheres positioned on a table 15, 27.5, and 40 cm from the hand's starting position. During some of the trials, while reaching to the nearest sphere, a switch was made to one of the more distant ones. Kinematics of hand movement and finger preshape were recorded to determine the effect of target distance and perturbation on subject performance.

Paulignan et al. (1991b) had subjects reach and grasp dowels of diameters 1.5 cm and 6 cm, and perturbed the target size by switching the dowels at movement onset, both from small to large and large to small. In the perturbation trials, as the hand took additional time to reach the correct aperture, the transport slowed to reach the object at an appropriately later time.

We now turn to the interpretation of these findings and their integration into a cohesive computational model.

4.3 Modeling Transport and Prehension Interaction

4.3.1 Temporal Interaction of Transport and Prehension

In the position perturbation trials of Paulignan et al.
(1991a), a kinematic response was seen within 100 ms¹, in the form of an early peak in acceleration. Defining the quantity $\Delta_l$ as the sensorimotor delay from location perturbation to response in the hand transport process, we can set an upper bound of $\Delta_l < 100$ ms. The movement time for the unperturbed task averaged 510 ms, and was lengthened by location perturbation: On the average, perturbed-left movements were lengthened by 80 ms while perturbed-right movements were lengthened by 112 ms (as discussed and modeled in Chapter 3). In the unperturbed case, as the hand preshaped, the maximum hand aperture was attained at 323 ms. After target location perturbation, there was an abbreviated aperture peak (202 ms L, 232 ms R) followed by a new maximum aperture (420 ms L, 446 ms R). This early aperture peak puts an upper bound of about 200 ms on the reaction time of preshape to location perturbation. Thus if we use $\Delta_{lp}$ to represent the increased delay for the preshape motor process to respond to location perturbation beyond the delay $\Delta_l$, then we can state this upper bound as $\Delta_{lp} + \Delta_l < 200$ ms. Note also that movement duration minus time-to-second-peak-aperture was consistent (170 ms L, 176 ms R). In the unperturbed case this difference is 187 ms. It may be that the maximum aperture is synchronized with the end of the reaching movement, to control interaction with the object to be grasped, with coordination based on keeping consistent the enclose time (ET), the difference between the time at which the movement ends and the time at which the hand achieves maximum aperture.
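The enclose times quoted above follow directly from the reported averages. As a minimal Python sketch (the variable and function names are illustrative and do not come from the source), ET is computed as movement time minus the time of the final maximum aperture, using only the values given in the text:

```python
# Intersubject averages (ms) quoted in the text for Paulignan et al. (1991a).
UNPERTURBED_MT = 510  # movement time without perturbation

# condition -> (movement time, time of final maximum aperture)
conditions = {
    "unperturbed": (UNPERTURBED_MT, 323),
    "perturbed_left": (UNPERTURBED_MT + 80, 420),
    "perturbed_right": (UNPERTURBED_MT + 112, 446),
}


def enclose_time(movement_time_ms, peak_aperture_time_ms):
    """Enclose time (ET): end of movement minus time of maximum aperture."""
    return movement_time_ms - peak_aperture_time_ms


enclose_times = {
    name: enclose_time(mt, peak) for name, (mt, peak) in conditions.items()
}

for name, et in enclose_times.items():
    print(f"{name}: ET = {et} ms")
```

The three results reproduce the 187 ms, 170 ms, and 176 ms figures stated in the paragraph above.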
¹ All time values will be given relative to movement onset, which was coincident with target perturbation. The time values are intersubject averages.

When unperturbed, object size had little effect on reaching kinematics or movement time, which was similar to the control for the target perturbation paradigm (Paulignan et al., 1991b). The difference between end of movement and maximum aperture (ET) was consistent with the earlier experiments: 199 ms (S), 187 ms (L). Object size did, of course, affect the magnitude of peak aperture, which was 9.2 cm (S), or 12.5 cm (L).

In the small-to-large perturbation (S-L) case, movement time increased by 175 ms on the average. The increase was "all in the low velocity phase," since the earlier kinematic landmarks (maximum velocity, maximum deceleration) did not change. In three subjects, two peaks were observed in hand aperture. The first peak occurs slightly earlier (294 ms) than in the unperturbed small target preshape (309 ms). (Note that this implies that reaction time of preshape to size change is less than or equal to 294 ms. We may introduce $\Delta_g$ as this time delay and restate the upper bound, $\Delta_g \le 294$ ms.) The second peak has amplitude corresponding to the large target (12.2 cm), as expected, and occurs 475 ms after movement onset. The transition between the two peaks can be identified by the time when the rate of change of aperture becomes zero, at 330 ms. In two other subjects there was only an inflexion at the transition point; there were not two distinct peaks. Since movement time increased to 684 ms in response to the small-to-large perturbation, ET in (S-L) was 209 ms.
Large-to-small perturbation (L-S) caused an increase in movement time of 85 ms. As in the S-L perturbation, the transport kinematics before maximum deceleration were unchanged. Thus the transport reaction time to the size perturbation is longer than the time to peak deceleration, about 300 ms. We introduce $\Delta_g + \Delta_{gt}$ as this time delay and restate the lower bound, $\Delta_g + \Delta_{gt} > 300$ ms. The aperture trajectory typically showed only a single peak: The enclosure phase was prolonged until the small dowel aperture size was reached.

In the enclose time studies of Gentilucci et al. (1992), ET was, again, approximately 200 ms, corroborating the results of Paulignan et al. (1991a). Although movement time, time to maximum aperture, and other kinematic landmarks varied with distance and perturbation, the enclose time was found to be consistent throughout. It would seem the central nervous system coordinates transport and grasp to allow the hand 200 ms to close on the target, the closure coinciding with the movement's end.

To formally check whether ET varies with perturbation, we examined the data (provided by the Jeannerod lab, from the experiments reported in Paulignan et al., 1991a) from 56 unperturbed trials, 48 perturbed left (PL), and 56 perturbed right (PR). The mean enclose time was 180.9 ms. For the unperturbed trials ET was 185.4 ms, for PL it was 177.4 ms, and for PR it was 179.3 ms. Thus ET varied very little due to perturbation, compared to the changes in other temporal variables, such as movement time. We performed an analysis of variance (ANOVA) on the ET data, resulting in an F-ratio of 0.271.
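The summary statistics above can be cross-checked with a short calculation. The following Python sketch assumes a standard one-way ANOVA across the three trial groups; the names are illustrative, and the within-group spread it derives is only an inference from the rounded values reported in the text, not a figure given in the source.

```python
import math

# Group sizes and mean enclose times (ms) as reported in the text.
groups = {
    "unperturbed": (56, 185.4),
    "perturbed_left": (48, 177.4),
    "perturbed_right": (56, 179.3),
}
REPORTED_F = 0.271  # F-ratio reported in the text

n_total = sum(n for n, _ in groups.values())
grand_mean = sum(n * m for n, m in groups.values()) / n_total

# Degrees of freedom for a one-way ANOVA with k groups and N trials.
df_between = len(groups) - 1
df_within = n_total - len(groups)

# Between-group sum of squares and mean square.
ss_between = sum(n * (m - grand_mean) ** 2 for n, m in groups.values())
ms_between = ss_between / df_between

# F = MS_between / MS_within, so the reported F implies a within-group
# mean square (an inference, assuming the standard one-way design).
implied_ms_within = ms_between / REPORTED_F
implied_sd_within = math.sqrt(implied_ms_within)

print(f"grand mean = {grand_mean:.1f} ms")
print(f"degrees of freedom = ({df_between}, {df_within})")
print(f"implied within-group SD = {implied_sd_within:.1f} ms")
```

The weighted grand mean agrees with the reported 180.9 ms. The differences between group means (a few milliseconds) are small next to the trial-to-trial spread implied by the reported F-ratio, which is why the ratio is so low.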
To conclude that perturbation affects ET at the 99% confidence level, the F-ratio must be at least 4.75. Since the calculated F-ratio falls far below this threshold, the variation in ET is consistent with random fluctuation rather than an effect of the perturbation. A detailed discussion of this analysis is given in Appendix A.

Fig. 4.4. Feedback controllers for transport, preshape, and enclose. Cooperative computation between subprograms determines movement time. Thick lines carry spatial and limb state information, thin lines carry temporal information for synchronization, and dotted lines carry activation signals.

It is now appropriate to introduce the constant enclose time model embodied in Fig. 4.4. Following the schematic approach in the coordinated control program of Fig. 4.1, there are motor activity generators (the Transport, Preshape, and Enclose controllers) which take as input target position values and planned movement durations. The transport trajectory generator takes as input target location and duration (the allotted time for the movement). The inputs for the preshape trajectory generator are maximum aperture (itself a function of object size) and duration. Object location and size provide hypothesized duration estimation modules (the boxes labeled "Transport time needed" and "Preshape time needed") with information for independent judgments of the time necessary for each process. (The "Transport time needed" calculation includes the "Duration optimizer" discussed in Chapter 3.) Although the modules are dependent on experimental parameters, the present model simply uses empirical data to approximate their output. These estimates interact through the elements labeled "MAX" so that the maximum of "Transport time needed" and "Preshape time needed" + "Enclose time needed" (the input ET is the desired enclose time for the grip, as discussed above) is fed to the trajectory generation schemas. The result is that the two processes are each scaled to the longer duration. After this time is determined, ET is subtracted from the "Duration" determined by the "MAX" comparators before being sent to the preshape controller. Thus the hand preshape will always reach its target maximum aperture prior to transport position reaching its target, and the amount of the lead will be the time needed for hand closure.
It is important to emphasize that the constant enclose time model does not claim that hand enclose time is constant across all object sizes and grip types; rather, it says that enclose time is constant across perturbation situations, for a particular task. In Sect. 4.5.2 we will turn to the subject of how object size, as an example of one of many relevant factors, affects enclose time.

The pathways along which spatial and temporal data are passed have intrinsic delays, shown in Fig. 4.4, some values of which we discussed above. Further, $\Delta_t$ and $\Delta_p$ are the lumped sensorimotor feedback delays for transport and preshape, respectively, as introduced in Chapter 2 for the model of reach control.

The enclose controller is, in the model, identical to the preshape controller, but with different inputs: Its duration is ET, and its target value is the object size, rather than the larger maximum aperture. Thus it has the function of closing the hand to the object size at the end of the movement time. The planned movement duration for each controller is presented in the form of the time remaining for the movement, and as such it constantly decreases, once the movement begins, until reaching zero.

The box labeled "Maximum aperture" performs a mapping from object size to the maximum aperture of the grip. From the two object sizes in Paulignan et al. (1991b), we generate the linear mapping,

Max ap = 0.75 × Dowel diameter + 4.55 cm,

which is consistent with the findings of Marteniuk et al. (1990), who had subjects reach to grasp disks of various sizes (5.5 to 12.5 cm in diameter) and derived the relationship

Max ap = 0.77 × Disk size + 4.89 cm.
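The two linear mappings can be compared side by side. In this small Python sketch the function names are illustrative; the coefficients are those stated above, and the two dowel diameters are the ones used in Paulignan et al. (1991b).

```python
def max_aperture_dowel(diameter_cm):
    """Linear mapping used in the model: maximum aperture (cm) from dowel diameter (cm)."""
    return 0.75 * diameter_cm + 4.55


def max_aperture_disk(size_cm):
    """Relationship reported by Marteniuk et al. (1990) for disks, in cm."""
    return 0.77 * size_cm + 4.89


# Dowel diameters (cm) from Paulignan et al. (1991b).
comparison = {
    d: (max_aperture_dowel(d), max_aperture_disk(d)) for d in (1.5, 6.0)
}

for diameter, (model_ap, disk_ap) in comparison.items():
    print(f"{diameter} cm: model {model_ap:.2f} cm, disk relation {disk_ap:.2f} cm")
```

The slopes are nearly equal, so the two mappings differ mainly by a roughly constant offset of a few millimeters over this range of object sizes.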
4.3.2 Trajectory Generation for Transport

In formulating the movement controller for transport, we extend the minimum-jerk based controller described in Chapter 2 from a one-dimensional plant to multiple dimensions. This is straightforward since, as shown in Chapter 3, the dimensions are decoupled. There is simply an independent controller for each movement dimension. The significant addition to be introduced is a change in the final endpoint constraint to allow control of the hand orientation as the target is approached.

In the 2-D system, we may control the orientation of the hand at the trajectory's end, as well as its position. In the 1-D system we had three boundary conditions at the final time, one for each component of the final state. We specified that the final state be static (have zero velocity and acceleration) and be located at the target, $x^f = (X^f, 0, 0)^T$. The extension to two dimensions provides for six final time conditions, on $x^f_i$, $i = 1, 2$. Since we wish the target position to be reached, we specify

$$x^f_{1,1} = X^f_1, \quad x^f_{2,1} = X^f_2, \tag{4.1}$$

where the target is located at $(X^f_1, X^f_2)$. Since the final position is to be static, we specify that the velocity of each component be zero:

$$x^f_{1,2} = 0, \quad x^f_{2,2} = 0. \tag{4.2}$$

We also want the target to be approached from a specific angle. As may be previewed in the unperturbed two-dimensional reach of Fig. 4.7a and the perturbed reach of Fig. 4.8a, the wrist trajectory approaches from the direction of the subject. Intuitively this is necessary because the hand must approach with the finger aperture facing the target.
The most comfortable posture for one's hand to accomplish this, in this experimental paradigm, is with the fingers facing away; thus target approach is from the subject's side of the target. We capture this phenomenon by specifying the following fifth boundary condition,

$$\phi = \lim_{t \to t_f} \frac{x_{1,1}(t) - X^f_1}{x_{2,1}(t) - X^f_2}, \tag{4.3}$$

where $\phi$ is the slope of the trajectory (viewed from above) during approach. Now since (4.1) holds, this is a ratio of vanishingly small quantities, and cannot be evaluated in its given form. Instead we apply L'Hôpital's rule and differentiate the numerator and denominator, obtaining

$$\phi = \lim_{t \to t_f} \frac{\dot{x}_{1,1}(t)}{\dot{x}_{2,1}(t)} = \lim_{t \to t_f} \frac{x_{1,2}(t)}{x_{2,2}(t)}.$$

Now, from (4.2), this too is a ratio of vanishingly small quantities, thus we differentiate once more,

$$\phi = \lim_{t \to t_f} \frac{\dot{x}_{1,2}(t)}{\dot{x}_{2,2}(t)} = \lim_{t \to t_f} \frac{x_{1,3}(t)}{x_{2,3}(t)},$$

or

$$\phi = \frac{x_{1,3}(t_f)}{x_{2,3}(t_f)}. \tag{4.4}$$

Thus the fifth constraint is equivalent to fixing the ratio of the terminal acceleration values. The sixth constraint is the magnitude of one of the terminal acceleration values (with Eqn. 4.4 determining the magnitude of the other). This magnitude is a free parameter of the simulation chosen to best fit the experimental data. We can write these two constraints as

$$x_{2,3}(t_f) = a^f_2, \quad x_{1,3}(t_f) = a^f_1 = \phi\, a^f_2,$$

where $a^f$ stands for "final acceleration."

To find the optimal control using these novel boundary conditions, we need the initial costate, for which we return to (2.15),

$$p_i(t_0) = \Phi_{12}^{-1}(D)\,\big(x_i(t_f) - \Phi_{11}(D)\,x_i(t_0)\big),$$

where the subscript $i$ has been introduced to indicate the movement dimension, $i = 1, 2$. Written out with the boundary values,

$$p_i(t_0) = \Phi_{12}^{-1}(D)\left( \begin{bmatrix} X^f_i \\ 0 \\ a^f_i \end{bmatrix} - \begin{bmatrix} 1 & D & D^2/2 \\ 0 & 1 & D \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x^0_{i,1} \\ x^0_{i,2} \\ x^0_{i,3} \end{bmatrix} \right),$$
ij 0 0 1 x ° _ - i,3. w hich differs from the equivalent expression in C h ap ter 2 in th a t th e final acceleration is nonzero. T aking th e th ird com ponent of the costate, P i i ' o ) [- - 3 -2 120D 48 D 120 D X - x i , l X f - x ° - E ,1 i 1 - 6 D J - x ° - D x° 2 3 a*. _ L i 3 Dx° - “ ° 2x? J i ,2 2 i,3j 4 d 2x° - 48D 2(xO 2 + Dx<? J + 6^x° a* 1 = - 120D_3f x f - x ? 1 + 7 2 D _ 2 x ? _ + 3x? - I i i / U i,2 \ 1,3 l) (4.5) F rom (2.17) w e have, ui( t o) = - ^ p i/ 3 ( t o) (w h e re th e o p tim a lity s u p e rs c rip t o n u j h as b e e n o m itte d for convenience.) C om bining this w ith (4.5) u .(, o ) = - x ° J - 72 /D V 2 - 3 / d ( 3 x ° 3 - . { ) 67 1 2 0 - I 4 °' 5 20- 0.5 0.1 0.2 0.3 0.4 t(s) v = - 352.34 + 7194.2t - 37139tA 2 + 72893tA 3-48597tA 4 Fig. 4.5. S am ple w rist v elocity p ro file fit w ith a fo u rth o rd e r p o ly n o m ial. D ifferen tiatin g a n d ev alu atin g at final tim e y ield s term in al acceleration. A s in C h ap ter 2, w e let th e c u rren t state of th e system (at tim e t) have velocity x °j 2 =vi an d acceleration x °i 3 =aj, w e let D = tf-t (i.e. D is th e tim e remaining) an d A X i= X ^ -x °ij. W e fu rth er a d d th e targ et acceleration, a^. T hen w e m ay express u(t) as the feedback based control, u. = 60AX./D3 - 3 6 v . / D 2 -3 f3 a . - a f.]/D .. i i i V . i i) (4.7) U | a n d U2 are th en used, as w as (2.19) in Sect. 2.2, to co n stru ct th e 2-D tra n sp o rt controller. Since th e o rig in a l m in im u m -je rk m o d el a ssu m e d n o te rm in a l a cceleratio n , w e w ish e d to in v estig ate th e e x ten t to w h ic h n o n -ze ro term inal acceleration is consistent w ith th e data. For ten trials of reaching m o v em en ts, w e p erfo rm ed a least-squares fit of th e p o rtio n of th e w rist velocity p ro file after p eak velocity, u sin g for th e m o d el curve a fo u rth o rd e r p o ly n o m ial (as in th e m inim um -jerk m odel). 
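The fitting step can be sketched in a few lines of Python. This is a minimal illustration rather than the analysis code used for the thesis: the function name `terminal_acceleration`, the use of NumPy, and the synthetic velocity samples are all assumptions. The polynomial is fitted in the shifted variable t - t_end, so the slope at the end of the trajectory is simply the linear coefficient.

```python
import numpy as np

def terminal_acceleration(t, v, order=4):
    """Fit v(t) with a polynomial and return dv/dt at the last sample.

    Hypothetical helper: t and v are samples of wrist speed taken after
    peak velocity, as in the fit described in the text.
    """
    t = np.asarray(t, dtype=float)
    v = np.asarray(v, dtype=float)
    tau = t - t[-1]                      # shift so the endpoint is tau = 0
    coeffs = np.polyfit(tau, v, order)   # coefficients, highest power first
    return coeffs[-2]                    # coefficient of tau**1 = slope at the end

# Synthetic example (made-up numbers, not experimental data):
t = np.linspace(0.20, 0.50, 61)
v = 1.0 - 2.5 * (t - 0.50) + 8.0 * (t - 0.50) ** 2   # slope at the end is -2.5
a_end = terminal_acceleration(t, v)
```

Shifting the time axis before fitting keeps the least-squares problem well conditioned and avoids a separate differentiation step.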
A sample trial with least-squares curve is shown in Fig. 4.5. The calculated velocity profile was then symbolically differentiated and evaluated at the trajectory's end to yield the terminal acceleration. For the ten trials, the average terminal acceleration was -2.55 m/s², and the standard deviation was 1.78 m/s². Fig. 4.6 shows the effect of different values of terminal acceleration on the wrist's path. The value which qualitatively best fits the data is 4 m/s², which is within a standard deviation of the average terminal acceleration, as described above. This is the value used both for trajectory modeling in Sect. 4.4 and for duration modeling in Chapter 3.

Fig. 4.6. Top row: Ten each of wrist paths for a perturbed-left and b perturbed-right cases. Below: Modeled wrist paths, assuming terminal acceleration values of c 0 m/s², d 1 m/s², e 4 m/s², f 10 m/s², and g 20 m/s². (Horizontal axes: x in cm.)

4.3.3 Trajectory Generation for Preshape and Enclose

For the preshape formation controller, we search for the simplest cost function which captures the characteristics of the movement kinematics. As with any continuous system, some smoothness criterion is needed to prevent discontinuous "jumps" in the resulting trajectory. The partial reclosing of the hand during prolonged movement caused by location perturbation implies that there is some "cost" to having the hand open more than a certain amount.
The relative importance of these two criteria is not known a priori, so a weighting parameter is introduced, yielding the following criterion for preshape:

$$J = \int_{t_0}^{t_f} \left[ x(t)^2 + w\,\ddot{x}(t)^2 \right] dt,$$

where $x(t)$ is the hand's aperture, and $w$ is the relative weighting of the two components and is tuned empirically. This cost criterion is used both for preshape and enclose, under the assumption that the same neural control system drives the fingers in each case. The difference is the inputs to the controller for enclose: The target hand aperture is the object size, and the allocated time is simply the enclose time (ET).

To find the trajectories associated with this optimization criterion, the onset time of movement must be known ($t = t_0$), along with the end time ($t = t_f$). Boundary conditions on the system state (e.g., the initial and final values for the aperture of the hand) must also be known. In addition, any weighting parameters must be determined. In this case, the only free parameter in the preshape criterion is $w$. We will discuss how this was empirically determined. Given these parameters, we again employ the minimum principle to derive an optimal trajectory of the system's state based on the optimization criterion and the system's dynamics.

The system's state is described by the separation of the thumb and index finger (the aperture) during the pinch grip. We define the control to be the second derivative of the aperture. Thus we have the dynamics of the system described by

$$\dot{x} = Ax + Bu, \tag{4.8}$$

with state $x = (x_1, x_2)^T$ (the aperture and its rate of change), $A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$ and $B = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$. We also have initial and final boundary conditions on the movement, $x(t_0) = x^0$, $x(t_f) = x^f$, where the movement begins at time $t_0$ and ends at time $t_f$.
The cost function is

$$J = \int_{t_0}^{t_f} \left( x^T Q x + w u^2 \right) dt, \qquad \text{where } Q = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}. \tag{4.9}$$

We solve for the optimum trajectory by applying the minimum principle. We first define the Hamiltonian, which is to be minimized by the choice of input $u(t)$,

$$H = L + p^T \dot{x},$$

where $L$ is the integrand of the cost functional, $x$ is the state and $p$ is the costate, to be defined. Thus the Hamiltonian is

$$H = x^T Q x + w u^2 + p^T (Ax + Bu). \tag{4.10}$$

Next we define the costate dynamics equations,

$$\dot{p} = -\frac{\partial H}{\partial x}.$$

For the Hamiltonian described by (4.10),

$$\dot{p} = -2Qx - A^T p. \tag{4.11}$$

Since the formulation consists of a linear system with a quadratic cost functional, we can minimize the Hamiltonian by differentiating with respect to $u$, and setting the derivative equal to zero:

$$\frac{\partial H}{\partial u} = 0 = 2w u^* + p^T B, \qquad u^* = -\frac{1}{2w}\, p^T B. \tag{4.12}$$

Now (4.8), (4.11), and (4.12) define a system of differential equations which may be solved with suitable boundary conditions to yield the optimal trajectory. To solve the problem of finding the solutions for $x(t)$ and $p(t)$ we plug $u^*(t)$ into (4.8):

$$\dot{x} = Ax + B\left(-\frac{1}{2w} B^T p\right) = Ax - \frac{1}{2w} B B^T p. \tag{4.13}$$

Combining (4.13) with (4.11) yields

$$\begin{bmatrix} \dot{x} \\ \dot{p} \end{bmatrix} =
\begin{bmatrix} A & -\frac{1}{2w} B B^T \\ -2Q & -A^T \end{bmatrix}
\begin{bmatrix} x \\ p \end{bmatrix}.$$

Plugging in $A$ and $B$, we get the following differential equation:

$$\begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \\ \dot{p}_1 \\ \dot{p}_2 \end{bmatrix} =
\begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & -\frac{1}{2w} \\ -2 & 0 & 0 & 0 \\ 0 & 0 & -1 & 0 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ p_1 \\ p_2 \end{bmatrix}. \tag{4.14}$$

Alternatively, we may write the individual equations,

$$\dot{p}_2 = -p_1, \qquad \dot{p}_1 = -2x_1, \qquad \dot{x}_1 = x_2, \qquad \dot{x}_2 = -\frac{1}{2w}\, p_2,$$

or, combining them,

$$p_2^{iv} = -p_1''' = 2x_1'' = 2x_2' = -\frac{1}{w}\, p_2,$$

where the roman superscripts indicate levels of differentiation. Taking just the first and last of the equated terms,

$$p_2^{iv} = -\frac{1}{w}\, p_2, \qquad w\, p_2^{iv} + p_2 = 0. \tag{4.15}$$

From (4.12),

$$u = -\frac{1}{2w}\, p^T B = -\frac{1}{2w}\, p_2$$

(where the optimality superscript on $u$ has been omitted for convenience). Combining this with (4.15),

$$w\left(-2w\, u^{iv}\right) + \left(-2w\, u\right) = 0, \qquad \text{or} \qquad w\, u^{iv} + u = 0, \qquad u^{iv} + \frac{u}{w} = 0.$$

The solution to this fourth order, linear, homogeneous differential equation is

$$u(t) = \frac{1}{\sqrt{w}} \left[ e^{r}\left(-a_1 \sin r + a_2 \cos r\right) + e^{-r}\left(a_3 \sin r - a_4 \cos r\right) \right], \tag{4.16}$$

where $r = \dfrac{t - t_0}{\sqrt{2}\, w^{1/4}}$, and $a_1$-$a_4$ are parameters determined by boundary conditions. To determine these, we write the four boundary equations:

$$x_1(t_0) = x_1^0 \quad \text{(initial finger aperture)} \tag{4.17a}$$
$$x_2(t_0) = x_2^0 \quad \text{(initial finger aperture velocity)} \tag{4.17b}$$
$$x_1(t_f) = x_1^f \quad \text{(final finger aperture)} \tag{4.17c}$$
$$x_2(t_f) = 0 \quad \text{(final finger aperture velocity)} \tag{4.17d}$$

Integrating (4.16) twice (following Eqn. 4.8), we have

$$x_1(t) = e^{r}\left(a_1 \cos r + a_2 \sin r\right) + e^{-r}\left(a_3 \cos r + a_4 \sin r\right), \tag{4.18a}$$

$$x_2(t) = \frac{1}{\sqrt{2}\, w^{1/4}} \left\{ e^{r}\left[(a_2 - a_1)\sin r + (a_1 + a_2)\cos r\right] + e^{-r}\left[(-a_3 - a_4)\sin r + (-a_3 + a_4)\cos r\right] \right\}. \tag{4.18b}$$

At the initial boundary, $t = t_0$, so $r = 0$, $\sin(r) = 0$, and $\cos(r) = 1$. At the final boundary, $t = t_f$, so $r = D'$, where $D' = \dfrac{D}{\sqrt{2}\, w^{1/4}}$ and $D = t_f - t_0$. Combining (4.17) and (4.18),

$$x_1^0 = a_1 + a_3 \tag{4.19a}$$
$$x_2^0 = \frac{1}{\sqrt{2}\, w^{1/4}} \left(a_1 + a_2 - a_3 + a_4\right) \tag{4.19b}$$
$$x_1^f = \alpha a_1 + \beta a_2 + \gamma a_3 + \delta a_4 \tag{4.19c}$$
$$0 = \frac{1}{\sqrt{2}\, w^{1/4}} \left[\alpha (a_1 + a_2) + \beta (a_2 - a_1) + \gamma (a_4 - a_3) + \delta (-a_3 - a_4)\right] \tag{4.19d}$$

where we use as shorthand $\alpha = e^{D'} \cos D'$, $\beta = e^{D'} \sin D'$, $\gamma = e^{-D'} \cos D'$, and $\delta = e^{-D'} \sin D'$. The parameters $a_1$-$a_4$ then come as the solution to the matrix equation,

$$\begin{bmatrix} x_1^f \\ 0 \\ x_1^0 \\ \sqrt{2}\, w^{1/4} x_2^0 \end{bmatrix} =
\begin{bmatrix} \alpha & \beta & \gamma & \delta \\ \alpha - \beta & \alpha + \beta & -\gamma - \delta & \gamma - \delta \\ 1 & 0 & 1 & 0 \\ 1 & 1 & -1 & 1 \end{bmatrix}
\begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \end{bmatrix}. \tag{4.20}$$

For the controller application, we wish to find $u(t_0)$, for then, as before, we may take $t_0$ as the current time, and since $u(t_0)$ is a function of $x(t_0)$ and $D$, the solution will describe a state-based control.
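As a numerical companion to this step, the linear system (4.20) can be solved directly and the control read off from (4.16) at r = 0, where it reduces to u(t_0) = (a_2 - a_4)/sqrt(w). The sketch below is a minimal illustration under stated assumptions: the function name `preshape_control`, the use of NumPy, and the example numbers (aperture in mm, time in s) are not from the thesis, and the simulation software of Appendix B may be organized differently.

```python
import numpy as np

def preshape_control(x0, v0, xf, D, w):
    """u(t0) for the preshape controller, obtained by solving Eq. (4.20).

    x0, v0 : current aperture and aperture velocity
    xf     : target aperture (final velocity is zero)
    D      : remaining movement duration
    w      : weighting parameter of the cost criterion
    """
    s = np.sqrt(2.0) * w ** 0.25          # time scale, r = (t - t0) / s
    Dp = D / s                            # D' in the text
    al = np.exp(Dp) * np.cos(Dp)          # alpha
    be = np.exp(Dp) * np.sin(Dp)          # beta
    ga = np.exp(-Dp) * np.cos(Dp)         # gamma
    de = np.exp(-Dp) * np.sin(Dp)         # delta
    M = np.array([
        [al,      be,      ga,       de],
        [al - be, al + be, -ga - de, ga - de],
        [1.0,     0.0,     1.0,      0.0],
        [1.0,     1.0,     -1.0,     1.0],
    ])
    rhs = np.array([xf, 0.0, x0, s * v0])
    a1, a2, a3, a4 = np.linalg.solve(M, rhs)
    return (a2 - a4) / np.sqrt(w)         # Eq. (4.16) evaluated at r = 0

# Illustrative call (assumed values): 20 mm aperture opening toward 90 mm
u_now = preshape_control(x0=20.0, v0=50.0, xf=90.0, D=0.31, w=0.09)
```

Because the function depends only on the current state and the time remaining, calling it at every time step with an updated D gives a state-based controller of the kind described above. For very long remaining durations the matrix becomes ill-conditioned (it mixes e^{D'} and e^{-D'}), so a closed-form expression is preferable there.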
From (4.16),

$$u(t_0) = \frac{1}{\sqrt{w}} \left(a_2 - a_4\right).$$

Inserting the values for $a_2$ and $a_4$ which come from (4.20),

$$u(t_0) = \frac{-1}{\det \sqrt{w}} \left[ x_1^0 \left(e^{2D'} + e^{-2D'} - 2\cos 2D'\right) + \sqrt{2}\, w^{1/4} x_2^0 \left(e^{2D'} - e^{-2D'} - 2\sin 2D'\right) + x_1^f\, 4 \sin D' \left(e^{-D'} - e^{D'}\right) \right], \tag{4.21}$$

where $\det = e^{2D'} + e^{-2D'} - 2 - 4\sin^2 D'$. Thus, as in Eqn. (4.7) for transport, we have a mapping from the current state of the system and the remaining movement duration ($D$) into the driving function to be emitted by a controller. Eqn. (4.21) describes the operation of the Preshape feedback controller of Fig. 4.4.

4.4 Simulations Using the Transport/Prehension Model

We created a computer simulation of the elements shown in Fig. 4.4 in order to test the model against experimental data. The inputs to the simulation are object location and size, optionally modified during the simulation to reflect perturbations. The graphic output of the simulation shows the kinematics of reach and preshape during the task. Position and velocity of the hand are shown for the X and Y components. In addition, the magnitudes of position and velocity are plotted, for comparison with actual data. An X vs. Y plot of the hand's path with one point for each simulation time step gives a useful visualization of the combined transport data. The time course of the hand's aperture during preshape is plotted as well as its derivative. The simulation software is described in Appendix B.

4.4.1 Simulating Perturbed Location and Size

We begin by reproducing the results of Paulignan et al. (1991a) and then show how the simulation can be used to reproduce other bodies of data. Following the review in Sect. 4.3.1, we set the transport time needed in the simulation to 510 ms, and the preshape time needed to 310 ms.
(We do this because the model says that the movement time, 510 ms, is the maximum of the transport time needed and the sum of the preshape time needed and the enclose time. The outcome will be the same if either the former or the latter time needed value is lowered.) We assign $\Delta_l$ the value of 100 ms based on the timing of location perturbation reaction. $\Delta_{lp}$ is the preshape response latency to location perturbation (200 ms) minus $\Delta_l$, thus it is set to 100 ms. The target approach orientation parameter $\theta$ was set to 90°, the direction directly away from the subject.

Fig. 4.7 shows the kinematics of transport and prehension for unperturbed movement to the center location to grasp the small target. Figs. 4.7a, c, and e show actual data, while b, d, and f show simulation results. Fig. 4.7a shows the path of the wrist during unperturbed reach and grasp of the central dowel. Fig. 4.7b shows the simulation of the unperturbed wrist path when reaching toward the center target. (The left and right target locations are also shown.) Fig. 4.7c shows displacement and speed of the wrist in the unperturbed task. Fig. 4.7d shows the simulation of wrist movement. Fig. 4.7e shows the distance between thumb and forefinger (aperture), and the derivative of this distance (aperture speed). Fig. 4.7f shows the simulation of aperture formation.

Because the minimum-jerk cost functional is used, the simulated hand path and velocity profile of Fig. 4.7b, d capture the roughly straight wrist movement and single-peak velocity profile seen in the data. However, a slight asymmetry is visible in the velocity profile of Fig. 4.7c.
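The behavior of the transport feedback law (4.7) on an unperturbed reach can be illustrated with a short simulation. This is a minimal one-dimensional sketch and not the simulation software of Appendix B: the function names, the 1 ms step, the 30 cm reach distance, the zero terminal acceleration, and the zero-order hold on jerk are all assumptions made for illustration.

```python
import numpy as np

def transport_jerk(dx, v, a, D, a_f=0.0):
    """Feedback law of Eq. (4.7) for one movement dimension."""
    return 60.0 * dx / D**3 - 36.0 * v / D**2 - 3.0 * (3.0 * a - a_f) / D

def simulate_reach(target, T, dt=0.001, a_f=0.0):
    """Integrate x''' = u, recomputing the jerk every dt and holding it constant."""
    n = int(round(T / dt))
    x = v = a = 0.0
    ts, xs, vs = [0.0], [x], [v]
    for k in range(n):
        D = T - k * dt                      # time remaining
        u = transport_jerk(target - x, v, a, D, a_f)
        # exact update of a triple integrator under constant jerk u
        x += v * dt + a * dt**2 / 2 + u * dt**3 / 6
        v += a * dt + u * dt**2 / 2
        a += u * dt
        ts.append((k + 1) * dt)
        xs.append(x)
        vs.append(v)
    return np.array(ts), np.array(xs), np.array(vs)

# 510 ms reach over an assumed distance of 0.30 m, starting at rest
t, x, v = simulate_reach(target=0.30, T=0.510)
t_peak = t[np.argmax(v)]                    # time of peak speed
```

With zero terminal acceleration the law reduces to the standard minimum-jerk feedback form, so the simulated speed profile is bell-shaped with its peak close to the midpoint of the movement. Passing a nonzero `a_f` changes only the last term of the law.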
Whereas the minimum-jerk trajectory specifies a symmetrical velocity profile, such that maximum velocity occurs at half the movement time (255 ms), the actual peak velocity occurs earlier, at 185 ms on the average, meaning that the deceleration phase of the movement is prolonged. This asymmetry is due to the accuracy requirement imposed by the dowel grasping task. The effect of accuracy on the asymmetry of the velocity profile will be modeled in detail in Chapter 5.

The aperture formation model is able to capture the amplitude and timing of peak aperture, as well as the enclose time, seen in the data, Fig. 4.7e. The model predicts a slightly faster opening phase (Fig. 4.7f) than is evident in the data, while the closing phase velocity is accurately reproduced. In order to model higher order derivatives in the trajectory, models more detailed than that described by (4.8) and (4.9) would have to be introduced. Our goal here is to capture the timing and amplitude of the aperture itself, under normal and perturbed conditions, using the simplest possible model.

Fig. 4.7. Unperturbed movement to center target. a,c,e Actual data. b,d,f Simulation results. a Path of wrist during unperturbed reach and grasp of central dowel, in Paulignan et al. (1991a). b Simulation of unperturbed wrist path. c Displacement and speed of wrist in unperturbed task. d Simulation of wrist movement. e Distance between thumb and forefinger (aperture), and the derivative of this distance (aperture speed). f Simulation of aperture formation. Experimental data courtesy of M. Jeannerod, INSERM, Lyon, France.

Fig. 4.8. Data and simulation for transport and prehension when target location is perturbed to the left. Graphs are as described for Fig. 4.7.

We now turn our attention to the perturbed conditions. For target location perturbation, right or left, the transport time needed in the model is increased by 100 ms, again following the data in Sect. 4.3.1. Fig. 4.8 shows the data and simulation results for the perturbed left case. When target location is perturbed, transport has a corrective trajectory, following a velocity peak at about 200 ms, and a prolonged deceleration phase. Note that the target approach direction is maintained despite target location perturbation (due to the terminal orientation constraint on the modeled hand path). While the velocity simulation shown in Fig. 4.8d does not capture the secondary peak of the particular trajectory sample displayed in Fig.
4.8c, the population data shown in Fig. 4.3c show that a reacceleration does not necessarily occur at all. The simulated aperture trajectory shown in Fig. 4.8f exhibits the temporary reclosing seen in the recorded trajectory during location perturbation (Fig. 4.8e). The local minimum in the aperture is 25 mm less than the final peak aperture in both the data and simulation. The double sequence of acceleration followed by deceleration seen in the aperture velocity profile of Fig. 4.8e is also reproduced by the simulation.

For target size perturbation, transport time needed is increased by 175 ms for S-L, 100 ms for L-S, following the empirical data. The values of $\Delta_s$ and $w$ were set experimentally, observing simulated preshape profiles for each combination. Values of $\Delta_s$ from very low, 100 ms, to the upper bound, 294 ms, were entered into the model. $w$ serves as a time constant in the preshape function, controlling the sharpness of the response of the preshape profile to shifting target aperture. Values below .10 give realistic curves. Values from .05 to .10 were tried in combination with the values of $\Delta_s$ above in normal and perturbed simulations, observing peak aperture velocity, timing of perturbation response, and general similarity in shape to experimental curves. The values which gave the best fit are $\Delta_s$ = 250 ms and $w$ = .09. Then to give the empirical value of 300 ms to the timing of transport's response to change in object size, we assigned to $\Delta_{st}$ the remaining 50 ms.

Fig. 4.9 shows the data and simulation results for the case when the target size is unexpectedly increased, while Fig. 4.10 shows the case where the target size is unexpectedly decreased.
As expected, the preshape process makes an on-line correction to a larger aperture in Fig. 4.9d as well as in Fig. 4.9c. The model yields a quantitatively correct reconstruction of aperture formation: As in the data, the simulated aperture levels off at about 90 mm, 300 ms after movement begins, then proceeds to the larger aperture, 120 mm, reaching it at 475 ms. The aperture derivative has two peaks, corresponding to each "opening phase" in the aperture development, but the magnitudes in the simulation do not correspond to the magnitudes in the data. Again, the simple model chosen here captures the essential features of aperture formation. To capture features of higher order derivatives in the trajectory, a more detailed model is needed.

The effect of size perturbation on transport is qualitatively reproduced: The later part of transport (the deceleration phase) is prolonged in Fig. 4.9b as in 4.9a, slowing to synchronize with the preshape process. In Fig. 4.10a,b, the effect on transport is a similar slowing. In Fig. 4.10c,d the enclosing hand, after peaking at 320 ms at or near the unperturbed peak aperture of 120 mm, switches smoothly to a smaller peak aperture and continues into the enclose phase. In Fig. 4.9a,b and Fig. 4.10a,b, the timing of peak velocity is not affected by size perturbation, since the peak occurs prior to the delayed effect of size perturbation on transport. Paulignan et al. (1991b) found that maximum hand velocity occurred around 185 ms both for unperturbed reaches to the small and large objects and for both size perturbation cases (Figs. 4.9a, 4.10a).
The simulations reflect this consistency in time to peak velocity (Figs. 4.9b, 4.10b).

Fig. 4.9. Data and simulation for transport and prehension when target size is unexpectedly increased (S-L). a,c Actual data. b,d Simulation results. a Speed (scale is left vertical axis, in millimeters per second) and acceleration (scale is right vertical axis, in millimeters per second squared) of wrist in S-L task (Paulignan et al., 1991b). b Simulation of wrist movement, showing speed and displacement. c Aperture (scale is left vertical axis, in millimeters) and its derivative (scale is right vertical axis, in millimeters per second) in the S-L task. d Simulation of aperture formation. Experimental data courtesy of M. Jeannerod, INSERM, Lyon, France.

Fig. 4.10. Data and simulation for transport and prehension when target size is unexpectedly decreased. Graphs are as described for Fig. 4.9.

4.4.2 Simulating Jeannerod (1981)

Jeannerod (1981) recorded kinematics of transport, preshape and grasp to targets of varying sizes, and during target size perturbation. Subjects reached to grasp a target object placed on a tabletop, then put the object in a small box.
In one experiment the object was a sphere which could be suddenly modified in apparent size, between 4 and 7 cm in diameter. The size change was produced at movement initiation. Perturbation of object size, while affecting the formation of grip by the fingers, did not affect transport of the hand. The timing and magnitude of various kinematic landmarks were unaffected by perturbation. Movement time was typically 800 ms in both normal trials and perturbed trials.

We wished to test whether our model of transport/preshape interdependence could also address these findings. In the S-L size perturbation simulation discussed above, the "movement time needed" parameter was changed from 510 ms to 800 ms, reflecting the slower movement encountered in Jeannerod's studies. The object sizes were modified in correspondence to the values given for the experiment. Results are shown in Fig. 4.11 for simulated reaches to unperturbed (Fig. 4.11a,b) and perturbed (Fig. 4.11c,d) targets. (Graphs of transport and preshape trajectories from the original experiment are unavailable. We show the simulated trajectories not for direct comparison, but to illustrate the response of each process in the model to the perturbation.) Note that despite the size perturbation, additional time is not needed, since the adjusted preshape fits within the limits of the transport movement time. Specifically, in the MAX comparator shown in Fig. 4.4, the duration sent to the transport controller is 800 ms both before and after object size perturbation.
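The arithmetic of the MAX comparator can be written out in a few lines. This is a sketch of the comparison only, using the durations quoted in this section; the function and variable names are hypothetical and are not taken from the simulation software.

```python
def movement_duration(transport_needed_ms, preshape_needed_ms, enclose_ms):
    """Duration selected by the MAX comparator.

    The movement lasts as long as the slower of the two processes:
    transport alone, or preshape followed by enclose.
    """
    return max(transport_needed_ms, preshape_needed_ms + enclose_ms)

# Jeannerod (1981) setting: slow 800 ms transport, 310 ms preshape, 200 ms enclose
before = movement_duration(800, 310, 200)          # 800 vs 510 -> 800
# Size perturbation adds 175 ms to the preshape time needed
after = movement_duration(800, 310 + 175, 200)     # 800 vs 685 -> 800
# Faster 510 ms movement: the same perturbation now lengthens the movement
fast_after = movement_duration(510, 310 + 175, 200)  # 510 vs 685 -> 685
```

The third call shows the complementary case: when transport needs less time than the perturbed preshape plus enclose, the perturbation does extend the overall movement.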
Before perturbation, movement time needed (800 ms) is compared against preshape time needed plus enclose time (310 ms + 200 ms = 510 ms), and the maximum is 800 ms. After perturbation, the preshape time needed increases by 175 ms, so the sum of preshape and enclose time becomes 685 ms. However this is still dominated by the slower movement time, so the MAX remains 800 ms, and transport is unaffected by the perturbation. The model predicts that only for movement durations under about 685 ms will size perturbation affect overall movement time.

Fig. 4.11. Results for replication of the kinematics recorded in Jeannerod (1981). a Hand displacement and speed for unperturbed reach. b Hand aperture for unperturbed reach. c Hand trajectory for perturbed case. Movement time remains 800 ms. d Hand aperture for perturbed case.

4.4.3 Predictions of the Model

Toward testing the predictiveness of the model described, we simulated a novel variation on the prehension task studied in the above experiments. In particular, we wondered what the effect would be of asking a subject to grasp an object very near to her hand, under a tight time constraint. Thus the transport process would, if unconstrained by the prehension process, operate very quickly.
To simulate this we set, in the simulation, the transport-time-needed parameter to 200 ms, and the movement distance to 2 cm. The results are shown in Fig. 4.12. Since the time needed by the transport process is much less than that needed by the preshape and enclose processes, the movement time is dictated by the latter. Thus transport is extended in time to match the hand's movement, resulting in the low velocity profile seen in Fig. 4.12a. The model used in this chapter predicts that in general, for short reaching movements, there is a lower bound on the movement time when there is an associated grasping task.

Further, as noted in the discussion on the simulation of the experiments of Paulignan et al. (1991a), although we assume the preshape time-needed is 310 ms, this is actually an upper bound on its value. It may be that the transport-time-needed is dominant, overriding a shorter enclose time. By performing the task described here, an experimenter would reveal the true preshape-time-needed value, since it presumably would be the limiting time factor in the overall movement.

Fig. 4.12. Results of simulating a short, quick movement. a Wrist movement. b Hand aperture formation. Since aperture formation is the limiting factor in the movement, transport slows to match it temporally, resulting in a low peak wrist velocity.

The "maximum comparison" element of this model predicts that duration extensions caused by location and size perturbations do not sum, rather the maximum is taken. Fig.
4.13 shows results of simulating a movement with simultaneous perturbation of both target location and size. At the onset of movement, the target is both shifted to the right and enlarged, using the target locations from Paulignan et al. (1991a) and sizes from Paulignan et al. (1991b). Since the increased movement time for leftward perturbation (100 ms) and for increased object size (175 ms) overlap instead of summing, the total movement time increase is 175 ms, the larger of the two changes.

Fig. 4.13. Results of simulating a movement with perturbation of both target location and size. a Wrist movement path. b Wrist displacement and speed. c Hand aperture formation. Since the increased movement time for leftward perturbation (100 ms) and for increased object size (175 ms) overlap instead of summing, the total movement time increase is 175 ms, the larger of the two changes.

4.5 Discussion

We have expanded an existing schema model of the interaction between reach and grasp to give a description at the kinematic level, appropriate for comparing against the output of behavioral experiments. We have built upon the minimum-jerk trajectory formation model to posit a control model which reflects the consequences of sensory input and motor correction during movement. The constant enclose time model for the timing of reach and grasp supports what seems to be a logical planning strategy for the central nervous system, which is to start from the constraints of object interaction (e.g.
the hand's enclose time) and then to plan the preceding segments of the motor task (transport and preshape) in accordance with these constraints.

4.5.1 Comments on the Maximum Duration Model

Regarding the temporal interaction model of Sect. 4.3.1: There are other examples of motor control processes where temporal coordination is produced by slowing or delaying a process which would otherwise be quicker. In oculomotor control it is known that, although there are separate brainstem centers for generating vertical and horizontal saccade components, an oblique saccade is characterized by a roughly straight movement of the pupil. If both components proceeded at maximum speed and the saccade direction was other than 45°, the shorter component would finish first, causing an abrupt discontinuity in the movement direction. Instead, it is observed that the shorter component is slowed in its progression to an extent that the two components terminate together (Fuchs et al., 1985). Rogal and Fischer (1986) found that for combined eye and arm movement, saccade and reach are prepared in parallel, but if the saccade preparation time exceeds that of reach, then reach execution is delayed. This results in higher correlation between reach and saccade onset times when saccade preparation is made to be very time consuming. Thus, in this example, it is the preparation times of two processes which are synchronized in their duration.

With the minimum jerk/time model of Chapter 3, we showed that reach is accurately modeled by the simultaneous optimization of smoothness in one direction, smoothness in the orthogonal direction, and movement duration.
A slight change in distance to move in one dimension will affect the kinematics of the other dimension and the duration of the movement. In contrast to this tight coupling, we have seen that the motor processes of reach and grasp are more loosely coupled, via the maximum duration comparison rather than via combined optimization. We have seen that under some conditions a change in one of these processes will impact the other (i.e. the findings of Paulignan and colleagues), while under other conditions one process may change without affecting the other (i.e. the findings of Jeannerod, 1981). Intuitively, it is reasonable that reach and grasp are more loosely coupled than are the components of reach, since they are more distantly related, having different relationships in various tasks, and being able to occur independently of each other. In contrast, the components of a single motor process, such as the reaching of the arm, operating always in synchrony, are important for the nervous system to co-optimize.

4.5.2 Observations on Enclose Time

Earlier we discussed the lack of an effect of perturbation on enclose time (ET). This motivated consistency of ET as a model of motor task coordination. It is interesting to note what happens in the situation where perturbations are not part of the experiment. "Blocked" trials were performed by Paulignan et al. (1991a), where the subject repeatedly reached to each of the three targets, without perturbation.
Taking ET values for 225 such trials, and performing a six-treatment ANOVA (three blocked cases, one control case, and two perturbation cases), we found an F-ratio of F(5, 379)=3.99, which says that there is an experimental effect on ET at the 99% confidence level. However, analyzing the blocked trials alone yields F(2, 222)=.314, well below the threshold for the 99% confidence level. Thus neither target location (which changes between the blocked trials) nor perturbation has an effect on ET, but the experimental situation does have an effect: Enclose time is consistently longer in blocked trials than in either control or perturbed trials. Since there is nothing mechanically different between reaching to the unperturbed target and reaching to the corresponding target during blocked trials, the difference may be due to a different psychological state when it is known that perturbations are somewhat likely than when they are absent. Thus expectation has an effect on kinematics.

While enclose time (ET) for the hand during grasp is hypothesized to be invariant under perturbation, it certainly is not the case that ET is invariant across all grasping tasks. In Marteniuk et al. (1990), different sized disks were grasped by human subjects. The disks varied in diameter (DIA) from 1 cm to 10 cm. It was found that as disk size decreased, movement time (MT) increased (as predicted by the speed/accuracy trade-off), but also the time to peak aperture (TPA) decreased.
The relationship between TPA and DIA in their data is summarized by the least squares fit linear relationship,

$$\mathrm{TPA} = 491\ \mathrm{ms} + \mathrm{DIA} \cdot 6.10\ \mathrm{ms/cm}$$

Similarly, MT is related to DIA by

$$\mathrm{MT} = 681\ \mathrm{ms} - \mathrm{DIA} \cdot 5.94\ \mathrm{ms/cm}$$

Since ET = MT - TPA, the relationship between ET and DIA is given by

$$\mathrm{ET} = 190\ \mathrm{ms} - \mathrm{DIA} \cdot 12.04\ \mathrm{ms/cm}$$

Thus enclose time increases as disk size decreases. There are two interpretations of this. First, the hand physically needs more time to close to a smaller size. Second, there may be a speed/accuracy trade-off for grasping as well as for reaching. This relationship between DIA and ET predicts ET=172 ms for a disk of diameter 1.5 cm. This is in the ballpark of the ET seen in Paulignan et al. (1991a). However, for a disk of size 6 cm the predicted ET is 118 ms, whereas in Paulignan et al. (1990b) the average enclose time for the 6 cm diameter dowel is 187 ms. There may be other factors (disk height, experimental set-up) that contribute to the discrepancy between ET in the two experiments. Further investigation is needed to isolate the responsible factors.

4.5.3 Critique of an Alternative Model

Flash and Henis (1992) recorded reaching kinematics for a low accuracy pointing task, and compared alternative models of trajectory modification, based on the hand-minimum-jerk model. They found strongest evidence for a superposition model which fits the perturbed trajectory with a model trajectory that is the sum of two minimum jerk functions. The first function corresponds to the unperturbed movement to the initial target, while the second is a minimum jerk trajectory from the initial target to the perturbed target location.
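As a rough one-dimensional illustration of the superposition idea just described, the sketch below sums two minimum-jerk position profiles, using the standard quintic form of a minimum-jerk point-to-point movement. The function names, the scalar (one-axis) setting, and the parameters `t_switch`, `dur1`, and `dur2` are assumptions made for this example; they are not taken from Flash and Henis (1992).

```python
def min_jerk(t, start, duration, amplitude):
    """Minimum-jerk displacement profile.

    Returns 0 before `start`, `amplitude` once the movement has ended,
    and the quintic 10*tau^3 - 15*tau^4 + 6*tau^5 blend in between.
    """
    if duration <= 0:
        raise ValueError("duration must be positive")
    # Normalized time, clipped to the interval [0, 1].
    tau = min(max((t - start) / duration, 0.0), 1.0)
    return amplitude * (10 * tau**3 - 15 * tau**4 + 6 * tau**5)


def superposition(t, x0, target1, target2, dur1, t_switch, dur2):
    """Sum of two minimum-jerk profiles (hypothetical 1-D sketch).

    The first profile is the unperturbed movement from x0 to target1.
    The second starts at t_switch and carries the displacement from
    target1 to target2; it is simply added to the first.
    """
    first = x0 + min_jerk(t, 0.0, dur1, target1 - x0)
    second = min_jerk(t, t_switch, dur2, target2 - target1)
    return first + second
```

Before `t_switch` the summed trajectory is identical to the unperturbed movement, and after both profiles have finished it rests at the perturbed target.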
In particular this model fit their data better than an alternative abort-replan model which terminates an initial minimum-jerk movement at the point of perturbation and replaces it with a new minimum-jerk trajectory. Coincidentally, the latter model is equivalent to the output of the feedback controller of Sect. 4.3.2 when given a perturbed input. Mathematically, the difference between the two compared models is that superposition yields a sequence of three quintic polynomial functions of time for the hand position, whereas abort-replan yields a sequence of only two. Since only alternative variations on the minimum-jerk theme were being compared (in this goodness-of-fit analysis of model correctness) it is not clear whether it is the superposition versus abort-replan or whether it is the choice of optimization criterion which is responsible for the imprecise fit. Chapter 6 will shed further light on the effect of changing optimization criteria. In their model comparison, only the low accuracy planar-target capture task was analyzed. There is no indication about how well the superposition model extends to natural behavioral movements, such as the reach and grasp tasks examined in this chapter. In contrast, chapter 5 will show how the minimum-jerk on-line control model, used up to this point in this dissertation, may be extended under stochastic conditions to model a rich body of data related to the control of movement accuracy.
Lastly, thinking in terms of neural implementation, it is preferable to consider a flexible controller whose activity is modified by changing run time parameters (as in the case of our controller model), rather than a system which instantiates an additional trajectory generator for each perturbation in the system's input. In the former case we can think of a cellular network with varying neural input. In the latter case, we imagine the need for additional neural hardware to be allocated to the control problem during the course of motor activity.

4.5.4 Suggested Experiments

The goal of the modeling work in this thesis is to stimulate further experimentation. With the simulation of imagined experiments in Sect. 4.4.3 we hope to inspire novel variations on the behavioral experiments already performed, both to test and to refine the model presented here. Specifically, the testable hypotheses are 1) that there is a lower bound on movement time, dictated by the prehension process, which limits the speed of transport, and 2) that simultaneous perturbation of size and location of a reaching target will not slow the action by the sum of the individual delays, but by the maximum of the two. Further, the application of the model to the data of Jeannerod (1981) in Sect. 4.4.2 suggests that there is a maximum movement time for which object size perturbation will slow the movement, that maximum being 685 ms.

In the transport and prehension model, the outputs of the two maps labeled "Transport time needed" and "Preshape time needed" are set empirically based on the data.
In reality, they are dependent on target location and size, as well as time, magnitude, and direction of perturbation. We began mapping out this relationship in Chapter 3 when we modeled the dependency of duration on the parameters of perturbation. However, the experimental data only sparsely samples the parameter space. In order to clearly elucidate the dependency of the "Time needed" maps' outputs on the various experimental inputs, further experiments are needed in which a greater variety of these parameters is utilized.

Chapter 5

The Speed/Accuracy Trade-off: A Stochastic Optimal Control Analysis

The work presented here integrates a model of variability in movement with a model of continuous control under delayed feedback to reproduce findings on accuracy and trajectory in movements of constrained accuracy. We offer an alternative to the two-phase, feedforward/feedback model of reaching, and show how a single continuous control process may generate both the initial, fast, "ballistic" phase of reach and the later slow, accurate phase. The model combines optimization criteria of smoothness and accuracy in a stochastic model. A unique model of variability during the trajectory allows novel insight into the properties of reach trajectories.

5.1 Behavioral Background and Past Models of Variability in Reaching

For almost a century, motor control behaviorists have been studying the timing of movement and its relationship to accuracy of control. A body of seminal work was published by Woodworth (1899) on visually guided hand movements and the effect of imposed movement time on accuracy of movement control.
In 1954 Fitts published results of a stylus tapping experiment, where the subject moved back and forth between two squares on a table top, the interiors of which were to be touched with the stylus. He noted a trade-off between the movement time for each tap and the size of the target, in which the movement time (MT) was a linear function of the logarithm of the ratio of movement distance to target width. The logarithm term was called the Index of Difficulty (ID). After the linear relationship was found repeatedly in many experimental paradigms, it was given the name Fitts' Law.

While early experiments on accuracy constraints focused on measured quantities such as movement time and distance, more recent studies have taken advantage of new tools for measuring kinematics of movement at high frequencies. Milner and Ijaz (1990) tracked position and velocity of the wrists of subjects inserting a peg in a hole drilled in a vertical surface. For different size holes, different movement times were measured. As hole size decreased, movement time increased. Further, in examining the speed of movement throughout the task, it is seen that as movement time increased, the speed versus time profile was distorted such that the tail of the movement (the low velocity phase at the end) was extended (Fig. 5.1). This distortion was also found by MacKenzie et al. (1987) who had subjects point to targets of varying size and distance "as quickly and accurately as possible."
They found that peak velocity scaled with distance and that the deceleration phase increased with the required accuracy (as dictated by target size), i.e., increasing the distance scaled the velocity profile, keeping its shape the same, while increasing target accuracy stretched out the trailing deceleration phase. Marteniuk et al. (1987) obtained similar results when they had subjects perform reaching-to-grasp and reaching-to-point movements. Increased accuracy requirements caused decreased maximum velocity and increased total movement time. Although the grasping task took more time than the easier pointing task, the increased duration was confined to the deceleration phase, with both tasks reaching maximum velocity at about the same time. Further, Marteniuk et al. (1987) had subjects grasp a small disk and subsequently either toss it into a large container or tightly fit it into a small one. Tracking the velocity profile before grasp occurred, the velocity skewing effect of accuracy was seen for the smaller target, i.e. the deceleration was lengthened as the disk was approached. This is interesting since the change in accuracy occurs in a subsequent task, but it is not surprising, since for a more accurate task, the object may have to be grasped more precisely.

Fig. 5.1. Wrist velocity during insertion of a 9.5 mm diameter peg into holes of various sizes. Each panel plots tangential velocity against time. a Hole is 50.8 mm wide. b Hole is 25.4 mm. c Hole is 17.5 mm. d Hole is 11.1 mm. From Milner and Ijaz (1990).
Thus the effect can be viewed as another instance of the effect of accuracy on reaching kinematics.

The reach-to-grasp movement has often been characterized as a two-phase process with a quick feedforward phase being followed by a slow feedback phase (Woodworth 1899; Jeannerod 1984). Two parallel subprocesses, reaching and prehension, occur during this two phase process, and each goes through a marked transition at about the same time. According to this view, after about 75% of the movement time the hand completes its feedforward, proprioception-based preshaping process and turns to a tactile-input controlled feedback process, closing until the target object has been grasped. Similarly, the reaching motion goes from a relatively quick feedforward phase to a slower phase which observers have characterized as a feedback phase for accurate positioning. The intuition is that feedback in the slower phase is in terms of visual perception of the hand-target discrepancy, and that the quick initial phase must lack feedback control, because of insufficient time to process such information. With the model to be presented (an extension of the continuous, delayed feedback control model presented in chapter 2) we will argue that for free space pointing and reaching to grasp, although the kinematics of reach may go through a transition, the control process does not change. A single feedback process can be responsible for both the quick and for the slow, accurate phase. [Note, however, the difference between motor control in free space and when interacting with the environment: in grasping and mug placing (Arbib et al. 1985), there is a distinctive contact phase, where the subject makes use of tactile/force feedback.
Here we restrict ourselves to the free space paradigm.] The accuracy of control, in our model, depends on the speed of movement, so by slowing near the end, the same feedback process provides accurate final positioning without changing the control strategy.

Despite the increased information afforded by recording trajectories, the above studies only discuss trends in the average movement path. However, in their work on reaching to perturbed targets, Paulignan et al. (1991) also recorded position variability for wrist and fingers at each point during the movement (Fig. 5.2), as opposed to other studies in which movement variability was only observed at the movement's end, in terms of accuracy. They found that variability increased to about the movement's midpoint (to a maximum of 25-30 mm), then decreased towards the end (to about 5 mm). The authors suggested that the phenomenon comes about because reaching is subserved by two different motor programs, an early one which controls direction precisely, but not position, and a later one which produces final position feedback control. In this chapter we will present a model of a single feedback controller which reproduces this variability phenomenon, as well as other data. Marteniuk et al. (1990) studied, among other variables, within-subject wrist variability during reaching to grasp disks of various sizes. Again it was seen that variability increased early in the reaching movement, and decreased to the end. The data on reducing variability toward the end of movement strongly indicates the presence of a feedback mechanism.

A related issue is the delay and modality of kinematic feedback.
Jeannerod (1988) reviews various studies of the minimum necessary time for visual information to influence movement. The typical reaction time for hand movement to a visual target is 200-300 ms (e.g. Stark, 1968). This corresponds to the shortest duration movement for which vision in a task affects accuracy (Keele and Posner, 1968). But Zelaznik et al. (1983) found an effect of vision on accuracy for movement as short as 120 ms. This is supported by more recent findings of Paulignan et al. (1991) that target movement affects hand trajectory in about 100 ms, when the hand's acceleration profile is observed. Since the reaction time (to initiate a motor response) includes the time needed to overcome inertia, we may rely on the results of Paulignan, Zelaznik, and their colleagues as providing a reliable estimation of the visual sensorimotor delay along internal information pathways. Keeping in mind that proprioceptive feedback plays an important role in the control of reach, and that its feedback time may be different, we accept 100 ms as a good approximation of the minimum lumped sensorimotor delay in the control loop for reaching.

Fig. 5.2. Variability in wrist, thumb, and index finger during reaching to grasp a small vertical dowel. Top: Top view of paths of wrist, thumb, and index finger during reach to each of three targets. Variability for each path is shown in the X and Y directions. Bottom: The variability data plotted as a function of time. (From Paulignan et al., 1991.)

Various computational models have addressed issues in movement variability and the speed/accuracy trade-off.
Bullock and Grossberg's (1988) vector integration to endpoint (VITE) model of trajectory generation directly addresses the internal computation of the CNS during trajectory generation. Their model is based on a continuous comparison between target location and hand location, provided by efferent copy of the motor command. A variable-magnitude go signal provides the appropriate temporal scaling as well as a trigger signal to initiate movement. An iterative correction model, discussed by Craik (1947), Meyer et al. (1982), and Jeannerod (1988), addresses Fitts' logarithmic speed/accuracy trade-off. If motor error is sampled at a constant rate, and after each sampling a corrective movement is generated which covers a constant percentage of the remaining movement distance (with the process being repeated until some threshold of accuracy is reached), then the movement's accuracy is exponential in the number of samplings. Put another way, the movement time is logarithmic in the required accuracy. In the VITE model, with the right parameter settings, the system overshoots the target by an amount exponential in the movement time, hence movement time is, again, logarithmic in accuracy, so Fitts' law is reproduced by the model. We note here that while the iterative correction model predicts a fixed degree of undershoot, the VITE Fitts' law reproduction predicts a fixed degree of overshoot. In reality, both overshoot and undershoot occur in goal directed reach, as reviewed by Jeannerod (1988). Hirayama et al. (1992) present another deterministic model of the speed/accuracy trade-off.
Here a neural network model learns to create arm reaching trajectories. By putting constraints on the convergence of the neural network learning, inaccuracy in final position is created. The shortcoming of these models is in their determinism. That is, they have included no stochastic element in the computational model, essentially modeling inaccuracy without variability, which is an illogical construction.

Variability is considered in the impulse-variability model of Schmidt et al. (1977). The model is based on the variability of muscle generated forces, whose variations in duration and amplitude increase with their intensity. Implicit in this model, however, is the assumption of feedforward control: that variability in the driving input is uncompensated and therefore results in errors in final position. We would prefer a model which takes into account the evidence for afferent influence on movement accuracy.

Perhaps the most advanced model of controlling final limb position with variability involved was put forth by Meyer et al. (1988). They suggest that a reaching movement may be thought of as consisting of a series of independent pulses, each pulse having a standard deviation in the displacement it generates which is inversely proportional to its duration. For the two-pulse case, at the end of the first pulse, a second (smaller) pulse is generated to cover the remaining distance, correcting for the error in the first pulse but introducing errors of its own.
The authors optimize the duration of each pulse to minimize the final variability, and find that total movement time (MT) is approximately a square-root function of the accuracy.¹ Further, they claim that if their optimization approach is applied to an n-pulse movement, then MT is related to accuracy to the 1/n power, and that the limit as n goes to infinity is a logarithmic relationship similar to Fitts' law. The drawback of their concept is that it does not consider the problems associated with implementation as a controller for movement, e.g. delay and noise. At the end of one movement pulse, a second one is immediately generated. This implies precise and immediate feedback about limb position which is instantaneously converted into motor output, or else some method of estimating the error in a submovement before it ends. We prefer to consider an explicit model of how delayed, continuous feedback about the state of the limb is incorporated into the current efferent motor command to optimize movement. Such a model is developed in the following section.

¹ Accuracy is given by D/W, the distance moved divided by the variability at the end point.

5.2 The Problem to be Studied

The phenomena to be modeled in the present study are two-fold. First, we wish to capture in our model Fitts' speed/accuracy trade-off. Second, we wish to capture the fact that when accuracy requirements are increased, reaching movements are modified so that more time is spent in the low velocity, deceleration portion of the movement, i.e. the velocity profile exhibits skewing.
(Intuitively, velocity skewing increases accuracy, since near the end of the movement and at low speed little inaccuracy is introduced, while any existing inaccuracy can be corrected.)

In constructing a model of accuracy during movement, we wish to employ the simplest construction which embodies the properties we hypothesize to be responsible for the phenomena of interest. In this problem the properties are three-fold: First, that there is a stochastic element in the mechanics of the plant, and it is responsible for variability in movement. (This is in contrast to the deterministic models of inaccuracy discussed above.) The second property is that control is based on knowledge of the plant's state and is delayed in its arrival. Other models of variability assume a feedforward model for trajectory generation but, as discussed above, data on variability during movement shows that it decreases toward the end of a trajectory, which is evidence for corrective feedback. Further, to be realistic, sensorimotor delay must be included, since visual and proprioceptive information about unexpected inaccuracy cannot be gathered and applied to the trajectory instantaneously. Thirdly, in modeling noise we adapt the model of variability used by Meyer et al. (1988): The variability for one time step, $\Delta t$, will be proportional to the displacement generated by a pulse during that time step: If the displacement caused by the pulse is $\Delta x$, then since $\Delta x = v(t)\,\Delta t$, this means the variability during one segment is proportional to the velocity during that time segment.
Thus, in our model, the noise added to the plant at any instant will be proportional to the instantaneous velocity of the plant.

The idea is to apply the above properties to a dynamical system, then ask that the best movement trajectory be found (for a measure of goodness to be described), without specifying a priori any particular trajectory characteristics. The emergent trajectory is then compared to actual data on voluntary reaching.

In keeping with the minimalist philosophy for model construction, we choose to simulate the control of the kinematics of a one-dimensional point, whose state is its position, velocity, and acceleration. The goal of the control will be to bring the point through a given distance to come to rest at a target point, in a given amount of time, and with maximum accuracy. Analogies may be drawn to the control of wrist position during a reaching movement, as discussed in the previous chapters. Mathematically, we define the 3 × 1 state vector $x(t)$ to be the triple of position, velocity, and acceleration at any time during the movement. We express the differential relationship between the components by the system of equations

$$\dot{x} = A x, \qquad A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}$$

To drive the system we add to the above equation an input $u(t)$. In order to preserve the defined relationship between position and velocity, and between velocity and acceleration, we do not modify the first two rows of the above equation. Instead we apply the input to the acceleration, in the third row:

$$\dot{x} = A x + B u, \qquad A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}, \qquad B = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$$

so that the input is the derivative of the acceleration, or jerk.
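Written out row by row, the driven system reads as follows. The labels $p$, $v$, and $a$ for the position, velocity, and acceleration components of $x$ are introduced here only for illustration and are not the author's notation.

$$\dot{p} = v, \qquad \dot{v} = a, \qquad \dot{a} = u \quad\Longrightarrow\quad \dddot{p} = u$$

The last implication restates the closing remark above: the input $u$ is the third time derivative of position, i.e. the jerk.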
Further, we choose to express this system in a discrete time form because of the modeling techniques we intend to employ. For any continuous time scalar variable $y$, we define the discrete time variable $y_k$ using the first order approximation to the time derivative of $y$ (while noting that higher order transformations are possible), and using $s$ as the size of the time step:

$$y_{k+1} = y_k + s\,\dot{y}$$

With this definition we derive

$$x_{k+1} = A x_k + B u_k, \qquad A = \begin{bmatrix} 1 & s & 0 \\ 0 & 1 & s \\ 0 & 0 & 1 \end{bmatrix}, \qquad B = \begin{bmatrix} 0 \\ 0 \\ s \end{bmatrix} \tag{5.1}$$

where $k = 0, 1, \ldots, N-1$, $N$ being the preset, fixed number of time steps. The last addition to this model is to define the stochastic element in the plant. We introduce the scalar random variable $n_k$. For simplicity we give it a Gaussian distribution with mean zero and variance $\sigma^2$, and define $n_i$ and $n_j$ to be independent if $i \neq j$. The modification to the plant dynamics is

$$x_{k+1} = \tilde{A}_k x_k + B u_k, \qquad \tilde{A}_k = A + \Xi\, n_k, \qquad \Xi = \begin{bmatrix} 0 & \xi_{12} & 0 \\ 0 & \xi_{22} & 0 \\ 0 & \xi_{32} & 0 \end{bmatrix} \tag{5.2}$$

Note that this formulation adds noise in an amount proportional to the second component of $x$, i.e. the plant's velocity. To remain as general as possible, noise is added to each element of the system state vector. We assume the components of the constant matrix $\Xi$ are arbitrary, but known.

Now the control problem is to find a control sequence $u_k$ which drives the system from a given initial state $x_0$ (at time 0) to a desired final state $x_N$, at time $s \cdot N$. Each control may be based on all past control values, but only on state information up to a fixed time in the past. This is our model of delayed feedback. (In Sect. 5.3.1 it will turn out that the system need not store the extensive history of state and control values.)
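To make the noisy plant of (5.2) concrete, here is a minimal simulation sketch in plain Python. It assumes the noise matrix has a single nonzero column (the one multiplying velocity), as the description above suggests; the function names, the seeded random generator, and the open-loop list of controls are illustrative choices, not part of the original model description.

```python
import random


def plant_step(x, u, s, xi, noise):
    """One step of x_{k+1} = (A + Xi * n_k) x_k + B u_k for the 3-state plant.

    x     : [position, velocity, acceleration]
    u     : scalar jerk input
    s     : time step
    xi    : (xi12, xi22, xi32), the assumed nonzero column of the noise matrix
    noise : the sample n_k for this step
    """
    p, v, a = x
    disturbance = v * noise  # noise enters scaled by the current velocity
    return [
        p + s * v + xi[0] * disturbance,
        v + s * a + xi[1] * disturbance,
        a + s * u + xi[2] * disturbance,
    ]


def simulate(x0, controls, s, xi, sigma, seed=0):
    """Apply a fixed (open-loop) control sequence; return the final state."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    x = list(x0)
    for u in controls:
        x = plant_step(x, u, s, xi, rng.gauss(0.0, sigma))
    return x
```

With `sigma = 0` the sketch reduces to the deterministic system (5.1). A plant at rest (zero velocity) receives no noise at all, which is the property that lets slowing down near the target improve accuracy.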
The value for the delay in this discrete time model, measured in time steps, will be (100 ms)/s, where s is the simulation time step, and the choice of 100 ms was discussed in the previous section.

There are, of course, infinitely many trajectories which bring the system to the desired target point. We further impose a cost function which is to be minimized by the chosen trajectory, and thus would, in the absence of noise and if certain convexity constraints hold, select a single trajectory out of the family of possibilities. The cost function has a combination of terms for final position accuracy and for smoothness. The accuracy term is a way of encoding the accuracy criterion in the experimental data to be modeled, while the smoothness criterion is an adaptation of a cost function used by Hogan (1984) and Flash and Hogan (1985), as discussed in chapter 2. The cost is given by:

$$J_0(x_0) = E\{x_N - x_G \mid x_0\}^T\, Q_N\, E\{x_N - x_G \mid x_0\} + E\Big\{ (x_N - x_G)^T V_N (x_N - x_G) + \sum_{k=0}^{N-1} u_k^T R\, u_k \,\Big|\, x_0 \Big\}$$

where $Q_N$, $V_N$ and $R$ are symmetric, positive semidefinite matrices with appropriate dimensions, $x_N$ is the actual final state, and $x_G$ is the goal state. The target final state for the system is static, i.e. the velocity and acceleration components are zero, since the goal is to stop the system at its target position. For convenience, and without loss of generality, we define the target final position to be zero as well. (Thus the initial position will necessarily be nonzero.) Thus $x_G = 0$ and the cost function becomes

$$J_0(x_0) = E\{x_N \mid x_0\}^T\, Q_N\, E\{x_N \mid x_0\} + E\Big\{ x_N^T V_N x_N + \sum_{k=0}^{N-1} u_k^T R\, u_k \,\Big|\, x_0 \Big\} \tag{5.3}$$

In the face of noise, no control sequence can guarantee that the final state will be the desired one. However, the choice of trajectory will vary the deviation from the target. The first two terms in (5.3) penalize such deviations.
The first term penalizes deviation of the expected final state from the target state (zero). If $x_N$ were scalar, and $Q_N$ were unity, this term would be $E[x_N]^2$. The choice of $Q_N$ weights the penalty on various elements of the final state vector. The second term weights the variance of the final state, as well as its average. If $V_N$ were unity and $x_N$ scalar, this term would be $E[x_N^2]$. From elementary probability we know that $E[x_N^2] = (\sigma_{x_N})^2 + E[x_N]^2$, where $(\sigma_{x_N})^2$ is the variance of $x_N$. Thus the weight of the penalty on $x_N$'s variance is $V_N$, while the weight of the penalty on its average is $Q_N + V_N$.

The third term in the cost function is the smoothness penalty, a function of the jerk input, and hence is the discrete-time version of Eqn. 2.2. While the state is penalized only at the trajectory's end, the jerk is penalized throughout. There are two reasons for adding this term. First, it is common to include in a cost formulation a penalty on the input. This prevents a solution trajectory which makes an unnatural jump in position to the target, using an infinitely large input. Second, as mentioned above, the jerk cost term has been applied by itself (i.e. without the accuracy terms) in successfully modeling a family of low accuracy movements (Hogan, 1984) and in modeling perturbed reaching in chapters 2 and 4 of this dissertation. It is hoped that by adding the accuracy terms to the cost function we can account for a broader family of movements, while still including the low-accuracy results as a special case. The jerk term is written in a general way, as if $u_k$ were a vector of arbitrary dimension. In our case $u_k$ is scalar, so $R$ is scalar also; thus the term could be rewritten $R u_k^2$.
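The scalar argument above carries over to the vector case. As a hedged worked step, write $\mu_N = E\{x_N \mid x_0\}$ and let $\Sigma_N$ denote the covariance of $x_N$ given $x_0$; both symbols are introduced here for illustration and do not appear in the original text. The standard identity for the expectation of a quadratic form then separates the two terminal terms of (5.3) into a penalty on the mean and a penalty on the spread:

```latex
\begin{aligned}
E\{x_N^T V_N x_N \mid x_0\} &= \mu_N^T V_N \mu_N + \operatorname{tr}(V_N \Sigma_N), \\
\mu_N^T Q_N \mu_N + E\{x_N^T V_N x_N \mid x_0\}
  &= \mu_N^T (Q_N + V_N)\, \mu_N + \operatorname{tr}(V_N \Sigma_N).
\end{aligned}
```

With scalar $x_N$ this reduces to the statement above: the mean is weighted by $Q_N + V_N$ and the variance by $V_N$.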
Our problem then is to minimize (5.3), subject to the constraint (5.2). The solution is given in the next section.

5.3 Discrete Time Optimal Control Using Dynamic Programming

Dynamic programming gives an approach to the discrete-time optimal control problem. Bertsekas (1976a) shows how, for a problem such as the one posed in the last section, an optimal control $u_k$ can be found, which is a function of the current state of the system $x_k$. In our case, however, at each stage, where $u_k$ is to be generated, because of delayed feedback the current state is not available. For some stochastic formulations, the solution is one of optimal estimation, where one finds the optimal control for the non-delayed case, then uses as the current state the best estimate of the state, based on available information. (This construct is used in the model of chapter 2, with its "look-ahead" state estimator. That system is designed to be optimal when there is no noise, and the feedback control stabilizes the system in the presence of noise.) In the present problem, however, we wish to explicitly optimize accuracy, taking into account a model of the noise in the system. In this case such separation of estimation and control is not possible, and a different approach is necessary. To solve the optimization problem, we will construct a new dynamical system, whose state consists of the current available information about the original system. By choosing the appropriate construction, the standard optimization approach will be applicable. In short, we will map the delayed feedback problem into a non-delayed problem, then apply a conventional solution.
5.3.1 Formalization of the Delayed Feedback

In non-delayed feedback control, in the discrete-time paradigm, a control $u_k$ is given in terms of the state of the system at the time $u_k$ is determined. That is, the control is given as $u_k(x_k)$. Since this control then determines the state one stage later (i.e. $x_{k+1}$ as in Eqn. 5.2), we define the delay of the system, $\Delta$, to be unity. We wish to discuss control based on the state of the system at a time further in the past than stage $k$, i.e. for $\Delta > 1$. If we assume that the only restriction on information is the feedback latency, and that all of the control information is available, then calculation of $u_k$ may utilize not only the delayed state information, but also the entire history of control signals. Thus the information available to the controller may be written as a vector

$$(x_0, x_1, \ldots, x_{k-\Delta+1}, u_0, u_1, \ldots, u_{k-1})$$

Optimal feedback control is dependent on accurate knowledge of the current state. We will now argue that the necessary information for this knowledge is only

$$(x_{k-\Delta+1}, u_{k-\Delta+1}, u_{k-\Delta+2}, \ldots, u_{k-1})$$

Consider the case of estimating $x_k$ given the information vector $(u_{k-1}, x_{k-1}, x_k)$. The state and control at stage $k-1$ give the expression for $x_k$,

$$x_k = A_{k-1} x_{k-1} + B u_{k-1}$$

where $A_{k-1}$ is random, so $x_k$ is random also. However, since $x_k$ is included in the information vector, its information overrides that provided by $x_{k-1}$ and $u_{k-1}$, yielding a precise value as opposed to a stochastic distribution of values. Clearly, the information from stage $k-1$ can be disregarded, given state information at stage $k$.
Similarly, in our more general example, if $x_{k-\Delta+1}$ is known, then we can disregard state and control information from earlier stages, since they have less information about the state at stage $k-\Delta+1$ than does $x_{k-\Delta+1}$. Lastly, in predicting $x_k$ from the given information, the early state and control information (i.e. that preceding stage $k-\Delta+1$) would only provide additional information if there were some dependence between the system's stochastic elements (the $A_k$'s) after $k-\Delta+1$ and the earlier information. However, we have assumed that the $A_k$'s are independent, so this condition does not apply. Thus we may use the abbreviated information vector in supplying the controller with all available information regarding the system's state.

Following Bertsekas (1976b), let us define the vector $I_k$ which is the information available at stage $k$:

$$I_k = (x_{k-\Delta+1}, u_{k-\Delta+1}, u_{k-\Delta+2}, \ldots, u_{k-1})^T \tag{5.4}$$

where $k = 0, 1, \ldots, N-1$, so we must define $x$ and $u$ before stage 0: Let $x_j = x_0$, $u_j = 0$, $j < 0$. This is consistent with the plant being static, and having no driving input, prior to the initial time. Let $p = n + m(\Delta - 1)$, where $n$ is the state dimension (in our case 3) and $m$ is the control dimension (in our case 1). $I_k$ is then a $p \times 1$ vector.

We now reformulate the problem, reflecting the delayed feedback condition. Let the $p \times p$ matrix $F_k$ and the $p \times m$ matrix $H$ be defined so that $I_k$ evolves according to the state equation

$$I_{k+1} = F_k I_k + H u_k \tag{5.5}$$

Clearly we need to set

$$F_k = \begin{bmatrix} A_{k-\Delta+1} & B & 0 & \cdots & 0 \\ 0 & 0 & \mathrm{ID}_{m \times m} & \cdots & 0 \\ \vdots & \vdots & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \mathrm{ID}_{m \times m} \\ 0 & 0 & 0 & \cdots & 0 \end{bmatrix}, \qquad H = \begin{bmatrix} 0_{n \times m} \\ 0_{m \times m} \\ \vdots \\ 0_{m \times m} \\ \mathrm{ID}_{m \times m} \end{bmatrix} \tag{5.6}$$

where $\mathrm{ID}_{m \times m}$ is the $m \times m$ identity matrix.
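The block structure of (5.6) is easiest to see in code. The sketch below is a minimal illustration assuming a scalar control ($m = 1$) and $\Delta \geq 2$; the names `build_F_H` and `advance` are hypothetical. It assembles $F_k$ and $H$ from a given realization of $A_{k-\Delta+1}$ and applies one step of (5.5).

```python
def build_F_H(A_k, B, delta):
    """Assemble F_k and H of Eqn. 5.6 for n states and a scalar control.

    A_k is the realization of A_{k-delta+1} (n x n list of lists), B is an
    n-vector, delta >= 2 is the feedback delay in stages.
    """
    if delta < 2:
        raise ValueError("this sketch assumes delta >= 2")
    n = len(A_k)
    p = n + (delta - 1)
    F = [[0.0] * p for _ in range(p)]
    for i in range(n):
        for j in range(n):
            F[i][j] = A_k[i][j]
        F[i][n] = B[i]           # B multiplies the oldest stored control
    for r in range(n, p - 1):
        F[r][r + 1] = 1.0        # shift stored controls one slot forward
    H = [0.0] * p
    H[p - 1] = 1.0               # the new control enters the last slot
    return F, H

def advance(I, F, H, u):
    """One step of Eqn. 5.5: I_{k+1} = F_k I_k + H u_k."""
    p = len(I)
    return [sum(F[r][c] * I[c] for c in range(p)) + H[r] * u
            for r in range(p)]
```

For $\Delta = 3$, for example, the information vector holds the delayed state followed by the two most recent controls, and `advance` propagates the state with the older control while shifting the newer one forward.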
We next define the $n \times p$ matrix $G_k$ so that

$$x_k = G_k I_k \tag{5.7}$$

namely,

$$G_k = \big[\, A_{k-1} A_{k-2} \cdots A_{k-\Delta+1},\ \ A_{k-1} \cdots A_{k-\Delta+2} B,\ \ A_{k-1} \cdots A_{k-\Delta+3} B,\ \ \ldots,\ \ A_{k-1} B,\ \ B \,\big]$$

Now that we have recoded the system dynamics, we similarly redefine the cost function in terms of these dynamics. We rewrite the cost function (5.3) to have as its argument the information vector $I_0$:

$$J_0(I_0) = E\{x_N \mid I_0\}^T Q_N\, E\{x_N \mid I_0\} + E\Big\{x_N^T V_N x_N + \sum_{k=0}^{N-1} u_k^T R u_k \,\Big|\, I_0\Big\} \tag{5.8}$$

This is valid since, by the definition of $I_k$, $I_0$ has the same information as $x_0$. Now we wish to eliminate $x_N$ from (5.8), in favor of an expression in $I_N$. We utilize (5.7), noting that $I_k$ and $G_k$ are independent, since $I_k$ depends on $A_0, A_1, \ldots, A_{k-\Delta}$, and $G_k$ depends on $A_{k-\Delta+1}, A_{k-\Delta+2}, \ldots, A_{k-1}$. Therefore,

$$E\{x_N \mid I_0\} = E\{G_N I_N \mid I_0\} = \bar{G}\, E\{I_N \mid I_0\}$$

In computing the mean of $G_N$, we recall that the component $A$ matrices are independent random variables, so their means may be computed separately; hence, with $A = E\{A_k\}$ the noise-free matrix of (5.1),

$$\bar{G} = E\{G_N\} = \big[\, A^{\Delta-1},\ A^{\Delta-2} B,\ A^{\Delta-3} B,\ \ldots,\ A B,\ B \,\big]$$

Applying this to the first term of (5.8) and applying (5.7) to the second term,

$$J_0(I_0) = E\{I_N \mid I_0\}^T \bar{G}^T Q_N \bar{G}\, E\{I_N \mid I_0\} + E\Big\{I_N^T G_N^T V_N G_N I_N + \sum_{k=0}^{N-1} u_k^T R u_k \,\Big|\, I_0\Big\}$$

or, employing again the independence of $G_N$ and $I_N$,

$$J_0(I_0) = E\{I_N \mid I_0\}^T \tilde{Q}_N\, E\{I_N \mid I_0\} + E\Big\{I_N^T \tilde{V}_N I_N + \sum_{k=0}^{N-1} u_k^T R u_k \,\Big|\, I_0\Big\} \tag{5.9}$$

where

$$\tilde{Q}_N = \bar{G}^T Q_N \bar{G}, \qquad \tilde{V}_N = E\{G_N^T V_N G_N\}$$

Eqn. (5.9) is the cost function to be minimized, under the constraint of Eqn. (5.5), in terms of the information vector. Note that (5.5) is a dynamic system whose state vector $I_k$ is, by definition, available at each stage $k$. Thus a feedback controller may provide an optimal $u_k$ based on the current state, as in the standard problem formulation. In the next section we will solve the optimization problem for the formulation given by (5.5) and (5.9).
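To make the structure of $G_k$ in (5.7) concrete, the following sketch builds it from a list of realized plant matrices and recovers $x_k$ from an information vector. It assumes a scalar control, so that $B$ is a single column; the names `build_G` and `state_from_information` are hypothetical.

```python
def matmul(X, Y):
    """Matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

def matvec(X, v):
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in X]

def build_G(A_list, B):
    """Build G_k of Eqn. 5.7.

    A_list = [A_{k-delta+1}, ..., A_{k-1}] (oldest first); B is an n-vector.
    Returns an n x (n + delta - 1) matrix.
    """
    n = len(B)
    prod = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    b_cols = []
    for A in reversed(A_list):           # newest matrix first
        b_cols.append(matvec(prod, B))   # B, A_{k-1}B, A_{k-1}A_{k-2}B, ...
        prod = matmul(prod, A)           # extend the product on the right
    b_cols.reverse()                     # oldest control's column comes first
    return [prod[i] + [col[i] for col in b_cols] for i in range(n)]

def state_from_information(A_list, B, I):
    """x_k = G_k I_k, with I = (x_{k-delta+1}, u_{k-delta+1}, ..., u_{k-1})."""
    return matvec(build_G(A_list, B), I)
```

Applying `state_from_information` should give the same result as stepping the plant forward $\Delta - 1$ times from the delayed state with the stored controls, which is the sense in which the abbreviated information vector suffices.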
5.3.2 Optimization by Dynamic Programming

We begin by defining the cost-to-go for our multistage system, starting with the terminal cost. We first define the function

$$J_N(I) = E\{I_N \mid I\}^T \tilde{Q}_N\, E\{I_N \mid I\} + E\{I_N^T \tilde{V}_N I_N \mid I\} \tag{5.10}$$

where $I$ is an information vector at a stage prior to or at stage $N$. Then the terminal cost is obtained by substituting $I_N$ for $I$:

$$J_N(I_N) = I_N^T \tilde{Q}_N I_N + I_N^T \tilde{V}_N I_N$$

which says that the terminal cost simply reflects deviation from the target state. Now, for earlier stages ($k < N$) the cost-to-go is $J_k(I_k)$, where $J_k(I)$ is defined recursively,

$$J_k(I) = J_{k+1}(I) + E\{u_k^T R u_k \mid I\} \tag{5.11}$$

This says that at each stage the cost-to-go is that of the succeeding stage plus the cost of the input at the current stage. This is consistent with the definition of $J_0(I_0)$, which says the cost-to-go from the beginning of the trajectory is the cost of the input throughout the trajectory, plus a penalty at the end (5.9).

Now consider the problem of optimizing the input at stage $N-1$, $u_{N-1}$, to minimize the cost-to-go at the $(N-1)$st stage, $J_{N-1}(I_{N-1})$. For $k = N-1$, (5.11) becomes

$$J_{N-1}(I) = J_N(I) + E\{u_{N-1}^T R u_{N-1} \mid I\} \tag{5.12}$$

Combining (5.10) and (5.12),

$$J_{N-1}(I) = E\{I_N \mid I\}^T \tilde{Q}_N\, E\{I_N \mid I\} + E\{I_N^T \tilde{V}_N I_N \mid I\} + E\{u_{N-1}^T R u_{N-1} \mid I\} \tag{5.13}$$

Now the cost-to-go at the $(N-1)$st stage is

$$J_{N-1}(I_{N-1}) = E\{I_N \mid I_{N-1}\}^T \tilde{Q}_N\, E\{I_N \mid I_{N-1}\} + E\{I_N^T \tilde{V}_N I_N \mid I_{N-1}\} + u_{N-1}^T R u_{N-1} \tag{5.14}$$

where the conditional expectation is dropped from the last term because $I_{N-1}$ only includes $u_{N-\Delta}$ through $u_{N-2}$, hence $u_{N-1}$ is independent of $I_{N-1}$. In order to manipulate the conditional expectation in (5.14), we assign $k = N-1$ in (5.5) and substitute it into the conditional expectation

$$E\{I_N \mid I_{N-1}\} = E\{F_{N-1} I_{N-1} + H u_{N-1} \mid I_{N-1}\}$$

which is just

$$E\{F_{N-1}\}\, I_{N-1} + H u_{N-1}$$

because $F_{N-1}$ and $u_{N-1}$
are independent of $I_{N-1}$. The above can be abbreviated

$$\bar{F} I_{N-1} + H u_{N-1}$$

where, from (5.2) and (5.6), it is clear that the average is

$$\bar{F} = \begin{bmatrix} A & B & 0 & \cdots & 0 \\ 0 & 0 & \mathrm{ID}_{m \times m} & \cdots & 0 \\ \vdots & \vdots & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \mathrm{ID}_{m \times m} \\ 0 & 0 & 0 & \cdots & 0 \end{bmatrix} \tag{5.15}$$

Substituting this expression into (5.14),

$$J_{N-1}(I_{N-1}) = (\bar{F} I_{N-1} + H u_{N-1})^T \tilde{Q}_N (\bar{F} I_{N-1} + H u_{N-1}) + E\{(F_{N-1} I_{N-1} + H u_{N-1})^T \tilde{V}_N (F_{N-1} I_{N-1} + H u_{N-1}) \mid I_{N-1}\} + u_{N-1}^T R u_{N-1}$$

Now we wish to perform the optimization of this cost function. Since the problem involves a linear system with a quadratic cost function, and no range restrictions on the input, we may do so by differentiating the above and setting the derivative equal to zero.

$$\frac{\partial}{\partial u_{N-1}} J_{N-1}(I_{N-1}) = 0 = 2(\bar{F} I_{N-1} + H u_{N-1})^T \tilde{Q}_N H + 2 E\{(F_{N-1} I_{N-1} + H u_{N-1})^T \tilde{V}_N H \mid I_{N-1}\} + 2 u_{N-1}^T R$$

Now we take the transpose, divide by two, and evaluate the expectation term (remembering the symmetry of $V_N$ and $Q_N$, stated earlier).

$$0 = H^T \tilde{Q}_N (\bar{F} I_{N-1} + H u_{N-1}) + H^T \tilde{V}_N (\bar{F} I_{N-1} + H u_{N-1}) + R u_{N-1}$$

$$0 = H^T (\tilde{Q}_N + \tilde{V}_N)(\bar{F} I_{N-1} + H u_{N-1}) + R u_{N-1}$$

Now, solving for $u_{N-1}$,

$$0 = H^T (\tilde{Q}_N + \tilde{V}_N) \bar{F} I_{N-1} + \big(H^T (\tilde{Q}_N + \tilde{V}_N) H + R\big) u_{N-1}$$

$$u_{N-1} = -\big(H^T (\tilde{Q}_N + \tilde{V}_N) H + R\big)^{-1} H^T (\tilde{Q}_N + \tilde{V}_N) \bar{F} I_{N-1}$$

which can be written

$$u_{N-1} = L_{N-1} I_{N-1}$$

where

$$L_{N-1} = -\big(H^T (\tilde{Q}_N + \tilde{V}_N) H + R\big)^{-1} H^T (\tilde{Q}_N + \tilde{V}_N) \bar{F}$$

Let us define the following symbols:

$$P_k = F_k + H L_k \tag{5.16a}$$

$$\bar{P}_k = \bar{F} + H L_k \tag{5.16b}$$

so that

$$I_{k+1} = F_k I_k + H u_k = F_k I_k + H L_k I_k = P_k I_k$$

(Note that since $P_k$ depends on $A_{k-\Delta+1}$ while $I_k$ depends on $A_0$ through $A_{k-\Delta}$, the two are independent random variables.)
Now we can turn back to the cost-to-go function (5.13), entering its optimal value using the definition of $P_k$,

$$J_{N-1}(I) = E\{P_{N-1} I_{N-1} \mid I\}^T \tilde{Q}_N\, E\{P_{N-1} I_{N-1} \mid I\} + E\{I_{N-1}^T P_{N-1}^T \tilde{V}_N P_{N-1} I_{N-1} \mid I\} + E\{I_{N-1}^T L_{N-1}^T R L_{N-1} I_{N-1} \mid I\}$$

$$= E\{I_{N-1} \mid I\}^T \bar{P}_{N-1}^T \tilde{Q}_N \bar{P}_{N-1}\, E\{I_{N-1} \mid I\} + E\{I_{N-1}^T (P_{N-1}^T \tilde{V}_N P_{N-1} + L_{N-1}^T R L_{N-1}) I_{N-1} \mid I\}$$

Note that in the last term the random variables are $I_{N-1}$ and $P_{N-1}$, and that they are independent, since $I_{N-1}$ only depends on previous stages of the system dynamics, i.e. $P_k$, $k < N-1$. Therefore the above may be written,

$$J_{N-1}(I) = E\{I_{N-1} \mid I\}^T \bar{P}_{N-1}^T \tilde{Q}_N \bar{P}_{N-1}\, E\{I_{N-1} \mid I\} + E\big\{I_{N-1}^T\, E\{P_{N-1}^T \tilde{V}_N P_{N-1} + L_{N-1}^T R L_{N-1}\}\, I_{N-1} \,\big|\, I\big\}$$

$$= E\{I_{N-1} \mid I\}^T \tilde{Q}_{N-1}\, E\{I_{N-1} \mid I\} + E\{I_{N-1}^T \tilde{V}_{N-1} I_{N-1} \mid I\} \tag{5.17}$$

where

$$\tilde{Q}_{N-1} = \bar{P}_{N-1}^T \tilde{Q}_N \bar{P}_{N-1} \tag{5.18}$$

$$\tilde{V}_{N-1} = E\{P_{N-1}^T \tilde{V}_N P_{N-1} + L_{N-1}^T R L_{N-1}\} = E\{P_{N-1}^T \tilde{V}_N P_{N-1}\} + L_{N-1}^T R L_{N-1} \tag{5.19}$$

In simplifying this last expression, we cannot simply pull out each $P_{N-1}$ and replace it with its expectation since it occurs twice, i.e. the factors inside the brackets are not independent random variables. Instead, we proceed by replacing $P_{N-1}$ by the sum of its mean and a stochastic term with zero mean. Let us define the constant $p \times p$ matrix

$$\tilde{F} = \begin{bmatrix} \xi & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix}$$

so that, with (5.2), (5.6), and (5.15),

$$F_k = \bar{F} + \tilde{F} n_{k-\Delta+1}$$

Referring to (5.16),

$$P_k = \bar{P}_k + \tilde{F} n_{k-\Delta+1}$$

Plugging into (5.19),

$$\tilde{V}_{N-1} = E\{(\bar{P}_{N-1} + \tilde{F} n_{N-\Delta})^T \tilde{V}_N (\bar{P}_{N-1} + \tilde{F} n_{N-\Delta})\} + L_{N-1}^T R L_{N-1}$$

$$\tilde{V}_{N-1} = \bar{P}_{N-1}^T \tilde{V}_N \bar{P}_{N-1} + W^T + W + E\{(\tilde{F} n_{N-\Delta})^T \tilde{V}_N \tilde{F} n_{N-\Delta}\} + L_{N-1}^T R L_{N-1}$$

where

$$W = E\{\bar{P}_{N-1}^T \tilde{V}_N \tilde{F} n_{N-\Delta}\} = \bar{P}_{N-1}^T \tilde{V}_N \tilde{F}\, E\{n_{N-\Delta}\} = 0$$

Thus

$$\tilde{V}_{N-1} = \bar{P}_{N-1}^T \tilde{V}_N \bar{P}_{N-1} + E\{n_{N-\Delta}^2\}\, \tilde{F}^T \tilde{V}_N \tilde{F} + L_{N-1}^T R L_{N-1}$$
$$\tilde{V}_{N-1} = \bar{P}_{N-1}^T \tilde{V}_N \bar{P}_{N-1} + \sigma^2 \tilde{F}^T \tilde{V}_N \tilde{F} + L_{N-1}^T R L_{N-1}$$

where $\sigma^2$ is the variance of $n_k$. This concludes the derivation of the optimal control and the cost-to-go for stage $N-1$. To summarize, we began with an expression for the cost-to-go at stage $N$,

$$J_N(I) = E\{I_N \mid I\}^T \tilde{Q}_N\, E\{I_N \mid I\} + E\{I_N^T \tilde{V}_N I_N \mid I\}$$

where

$$\tilde{Q}_N = \bar{G}^T Q_N \bar{G}, \qquad \tilde{V}_N = E\{G_N^T V_N G_N\}$$

We derived the optimal control at stage $N-1$,

$$u_{N-1} = L_{N-1} I_{N-1}$$

where

$$L_{N-1} = -\big(H^T (\tilde{Q}_N + \tilde{V}_N) H + R\big)^{-1} H^T (\tilde{Q}_N + \tilde{V}_N) \bar{F}$$

after which we derived the cost-to-go at stage $N-1$,

$$J_{N-1}(I) = E\{I_{N-1} \mid I\}^T \tilde{Q}_{N-1}\, E\{I_{N-1} \mid I\} + E\{I_{N-1}^T \tilde{V}_{N-1} I_{N-1} \mid I\}$$

where

$$\tilde{Q}_{N-1} = \bar{P}_{N-1}^T \tilde{Q}_N \bar{P}_{N-1}, \qquad \tilde{V}_{N-1} = \bar{P}_{N-1}^T \tilde{V}_N \bar{P}_{N-1} + \sigma^2 \tilde{F}^T \tilde{V}_N \tilde{F} + L_{N-1}^T R L_{N-1}$$

Note that $J_{N-1}(I)$ is in the same form as $J_N(I)$. If we use the above approach for calculating at stage $N-2$ the optimal control ($u_{N-2}$) and cost-to-go ($J_{N-2}(I)$), based on $J_{N-1}(I)$, we will obtain a result similar to (5.17), but based on $\tilde{Q}_{N-1}$ and $\tilde{V}_{N-1}$, rather than $\tilde{Q}_N$ and $\tilde{V}_N$. This logic applies to all preceding stages, 0 through $N-2$, and the result is the following recursive formula. The optimal control for the system is

$$u_k^* = L_k I_k$$

where

$$L_k = -\big(H^T (\tilde{Q}_{k+1} + \tilde{V}_{k+1}) H + R\big)^{-1} H^T (\tilde{Q}_{k+1} + \tilde{V}_{k+1}) \bar{F}$$

and

$$\tilde{Q}_k = \bar{P}_k^T \tilde{Q}_{k+1} \bar{P}_k, \qquad \tilde{V}_k = \bar{P}_k^T \tilde{V}_{k+1} \bar{P}_k + \sigma^2 \tilde{F}^T \tilde{V}_{k+1} \tilde{F} + L_k^T R L_k$$

$$\tilde{Q}_N = \bar{G}^T Q_N \bar{G}, \qquad \tilde{V}_N = E\{G_N^T V_N G_N\}$$

This concludes the derivation of the optimal control for the stated problem. Note that in the solution, $u_k^*$ depends on $I_k$, the current information vector. Fig. 5.3 visualizes the information flow for this type of feedback control. The controller receives not only a delayed version of the state vector, but also a copy of its own output, $u_k$.
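The backward recursion summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the original simulation code: the control is scalar ($m = 1$, so the matrix inverse is an ordinary division), the terminal matrices $\tilde{Q}_N$ and $\tilde{V}_N$ are supplied by the caller rather than computed from $G_N$, and the gain includes the factor $\bar{F}$ as in the stage $N-1$ derivation. The function names are hypothetical.

```python
def transpose(M):
    return [list(r) for r in zip(*M)]

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

def add(X, Y):
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(X, Y)]

def scale(c, X):
    return [[c * v for v in row] for row in X]

def backward_step(Q_next, V_next, F_bar, F_tilde, H, R, sigma2):
    """One stage of the recursion: returns (L_k, Q_k, V_k).

    H is a length-p list (scalar control); R and sigma2 are scalars.
    """
    S = add(Q_next, V_next)
    H_col = [[h] for h in H]                 # p x 1
    HtS = matmul([list(H)], S)               # 1 x p
    denom = matmul(HtS, H_col)[0][0] + R     # scalar H^T S H + R
    L = [[-v / denom for v in matmul(HtS, F_bar)[0]]]   # 1 x p gain
    P_bar = add(F_bar, matmul(H_col, L))
    Q = matmul(transpose(P_bar), matmul(Q_next, P_bar))
    V = add(matmul(transpose(P_bar), matmul(V_next, P_bar)),
            add(scale(sigma2, matmul(transpose(F_tilde),
                                     matmul(V_next, F_tilde))),
                scale(R, matmul(transpose(L), L))))
    return L, Q, V

def solve_gains(Q_N, V_N, F_bar, F_tilde, H, R, sigma2, N):
    """Run the recursion from stage N-1 down to 0; returns (gains, Q_0, V_0)."""
    gains = [None] * N
    Q, V = Q_N, V_N
    for k in range(N - 1, -1, -1):
        gains[k], Q, V = backward_step(Q, V, F_bar, F_tilde, H, R, sigma2)
    return gains, Q, V
```

A quick sanity check is the scalar, noise-free case $x_{k+1} = x_k + u_k$ with terminal cost $x_N^2$ and $R = 1$: over two stages the best strategy splits the correction evenly, $u_0 = u_1 = -x_0/3$, for a total cost of $x_0^2/3$, and the sketch should reproduce the first gain $L_0 = -1/3$ and $\tilde{Q}_0 + \tilde{V}_0 = 1/3$.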
We assume there is some internal buffer in the controller to hold the recent values of $u_k$, corresponding to the elements of $I_k$. Note also that the controller's input-to-output mapping, or transfer function, is given by $L_k$ and is, in general, time varying.

Fig. 5.3. Schematic of controller and controlled plant. Note feedback delay, noise added to plant, and efferent copy of control returning to the controller. (Diagram labels: noise, Plant, Controller.)

Before proceeding to simulation results using this optimal control, we must derive formulas for two quantities: the expected value of the system state at any stage, which is the average trajectory, and the variance of the state, which gives the degree of variability of the trajectory around its average, at any point in time.

5.4 Expected Value and Variance of the Trajectory

To find the expected value and the variance of position at any stage $k$, given an initial state $x_0$ (or $I_0$) and the optimal control sequence $L_k$, we begin by calculating the expected state at any stage,

$$E\{x_k \mid x_0\} = E\{G_k I_k \mid x_0\} = \bar{G}\, E\{I_k \mid x_0\} = \bar{G}\, E\{I_k \mid I_0\}$$

Since the $P$'s are independent of each other, and independent of $I_0$:

$$E\{I_k \mid I_0\} = E\{P_{k-1} P_{k-2} \cdots P_0 I_0 \mid I_0\} = \bar{P}_{k-1} \bar{P}_{k-2} \cdots \bar{P}_0 I_0 \tag{5.20}$$

This is the expected value for the state. The expected value for the position is simply the first component of this vector. Note that $\bar{P}_k$ corresponds to the average dynamics of the system at stage $k$. Thus the expected value of the state at stage $k$ is given by applying to the initial state the average dynamics of stages 0 through $k-1$.

To find the variance in the position at any stage $k$, we start with the following expression for the second moment,
$$E\{(x_1)_k^2 \mid I_0\} = E\{x_k^T M x_k \mid I_0\}, \qquad M = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

We rewrite this in terms of the information vector,

$$E\{(x_1)_k^2 \mid I_0\} = E\{I_k^T G_k^T M G_k I_k \mid I_0\}$$

or, in a more succinct form,

$$E\{(x_1)_k^2 \mid I_0\} = E\{I_k^T M_k I_k \mid I_0\}, \qquad M_k = G_k^T M G_k$$

We expand as in (5.20),

$$E\{(x_1)_k^2 \mid I_0\} = E\{I_0^T P_0^T P_1^T \cdots P_{k-1}^T M_k P_{k-1} P_{k-2} \cdots P_0 I_0 \mid I_0\}$$

To evaluate this expression we first define the following new variable

$$U_j^k = P_j^T P_{j+1}^T \cdots P_{k-2}^T P_{k-1}^T M_k P_{k-1} P_{k-2} \cdots P_{j+1} P_j \tag{5.21a}$$

for $k > j$, and

$$U_k^k = M_k \tag{5.21b}$$

Now

$$E\{(x_1)_k^2 \mid I_0\} = E\{I_0^T U_0^k I_0 \mid I_0\}$$

But if $I_0$ is given, then it is treated as deterministic and pulled from the $E\{\}$:

$$E\{(x_1)_k^2 \mid I_0\} = I_0^T E\{U_0^k\} I_0 \tag{5.22}$$

To calculate $E\{U_0^k\}$, we begin by noting that (5.21) allows us to write the recursive definition

$$E\{U_j^k\} = E\{P_j^T U_{j+1}^k P_j\}, \quad j < k; \qquad E\{U_k^k\} = E\{M_k\} \tag{5.23}$$

To evaluate (5.23) we proceed as in (5.19), by replacing $P_j$ by the sum of its mean and a stochastic term,

$$E\{P_j^T U_{j+1}^k P_j\} = E\{(\bar{P}_j + \tilde{F} n_{j-\Delta+1})^T U_{j+1}^k (\bar{P}_j + \tilde{F} n_{j-\Delta+1})\}$$

where the details of the derivation have been omitted. Since $U_{j+1}^k$ involves only later stages and is therefore independent of $n_{j-\Delta+1}$, we have the recursive formula,

$$E\{U_j^k\} = \bar{P}_j^T E\{U_{j+1}^k\} \bar{P}_j + \sigma^2 \tilde{F}^T E\{U_{j+1}^k\} \tilde{F}, \quad j < k; \qquad E\{U_k^k\} = E\{M_k\} = E\{G_k^T M G_k\}$$

Along with (5.22) this formula gives the second moment $E\{x^2\}$. To calculate the variance, we simply use the formula for $E\{x\}$ given above, and the formula for the variance from elementary probability,

$$\sigma_z^2 = E\{z^2\} - E\{z\}^2$$

where, to extract the position component from the state vector in $E\{x\}$, and square it simultaneously, we can employ the following formula:

$$E\{(x_1)_k\}^2 = E\{x_k\}^T M\, E\{x_k\}$$

5.5 Simulation Results

We applied the optimal control derived in Sect. 5.3 to the problem stated in Sect. 5.2, and used the formulae of Sect. 5.4 to extract the results, shown in Figs. 5.4 through 5.6. The values of $R$, $Q_N$, $V_N$, $\sigma$, and $\xi$ were determined empirically, in order to quantitatively reproduce the data from Milner and Ijaz (1990) (shown in Fig. 5.1), although the qualitative results we set out to produce (i.e.
the effect of speed on variability and velocity skewing) occur with a wide range of parameter settings. We used a simulation time step of 0.02 s, and a feedback delay of 0.100 s as chosen earlier. We found empirically that to obtain the right level of variance in the trajectories, we should set $\sigma = 7$, and the center (non-zero) column of $\xi$ (Eqn. 5.2) was $\xi_{12} = 0.001$, $\xi_{22} = 0.001$, $\xi_{32} = 1$. The trade-off between smoothness and accuracy, which determined the progressive skewing of the velocity profile with movement duration, was controlled by the relative values of $R$, $Q_N$, and $V_N$. $Q_N$ and $V_N$ are $3 \times 3$ matrices and $R$ is scalar, equal to a base level parameter, $R_{wt}$, divided by the movement time. (The movement time normalization was done so that longer movements would not incur larger "jerk" penalties simply because they had more discrete time steps.) The best fit to the data occurred when using

$$R_{wt} = 10^{-4}, \qquad V_N = 100\, \mathrm{ID}_{3 \times 3}, \qquad Q_N = 10^{6}\, \mathrm{ID}_{3 \times 3},$$

where $\mathrm{ID}_{3 \times 3}$ is the $3 \times 3$ identity matrix.

The system and optimal controller were simulated using these parameter settings, for a 20 cm displacement having a variety of durations. Fig. 5.4 shows the results for durations of 350 ms, 800 ms, and 1000 ms. The left column of graphs shows displacement as a function of time, with the standard deviation plotted at each point in time. Note the monotonically increasing variability in Fig. 5.4a. This occurs because the duration is not long compared to the feedback delay. In contrast, variability decreases toward movement's end in Figs. 5.4c and 5.4e, as feedback correction occurs.
This is comparable quantitatively to the variability data in Fig. 5.2, in which the maximum variability is 2.5 - 3.0 cm and the final variability is about 0.5 cm. For the simulated 800 ms movement (Fig. 5.4c), the maximum variability (i.e. one standard deviation to each side of the average) is 4.28 cm. At the end it is 0.40 cm. For the 1000 ms simulated movement these values are 4.12 cm and 0.18 cm, respectively. In studying Fig. 5.2 we see that there are additional kinematic features that might be addressed: the magnitude of peak variability, its time of occurrence during the movement, and the overall shape of the variability trajectory seem dependent on the angular position of the target, which is different for the left, center, and right diagrams. To explore these phenomena, additional detail would have to be added to the model presented here, which captures the basic time course of variability for different movement durations.

The right column of Fig. 5.4 shows velocity as a function of time for the three durations. As one would expect, peak velocity decreases with increasing movement time. Further, the proportion of the movement time after the peak velocity increases as duration increases. This is comparable to the skewing seen in Fig. 5.1. Note, however, that there are other kinematic features, specifically an apparent late constant velocity phase, which are subtleties beyond the scope of our model. Further refinements to the "first order" plant dynamics and cost function used in this chapter are needed to produce an exact match to the data. Our goal here is to explain the basic kinematic phenomena observed across many experimental paradigms.
A quantitative comparison of the Milner and Ijaz (1990) data to our simulation results (for six different durations) is shown in Fig. 5.5. In each plot, experimental data are indicated by closed squares, simulation results by open squares. Fig. 5.5a plots final position accuracy as a function of movement time for the real and simulated movements. In Fig. 5.5b, the accuracy data is graphed in the form of "Index of difficulty." Note the linear shape consistent with Fitts' Law for both the real and simulated movements. Fig. 5.5c plots peak velocity, which decreases with movement time, and Fig. 5.5d shows the skewing effect of increased movement time, evidenced by slow increase in time to peak velocity as movement time increases. In each case there is a close match between the data and simulation. For the sake of comparison, in Figs. 5.5c and d the magnitude and time of peak velocity are shown for the ideal, non-skewed minimum-jerk velocity profile. This is indicated in each figure by a thick line.

To emphasize the importance of using variability in the model, two additional simulations were run. First, the 1000 ms simulated movement was run with feedback delay, but without noise. The result is shown in Fig. 5.6a,b. Note that without noise, prediction of the state is perfect, despite feedback delay, so it is as if the simulation were of a deterministic system without feedback delay.
Fig. 5.4. Simulated movements at various speeds (six panels, each plotted against t (sec)). a. Position as a function of time for a 350 ms movement. Note the monotonically increasing variability. b. Velocity profile for 350 ms movement, nearly bell-shaped. c, d. Position and velocity for 800 ms movement. e, f. Position and speed for 1000 ms movement. Note that in c and e variability decreases toward movement's end, as feedback correction occurs. (Compare to Fig. 5.2.) Note also the increasing skewedness of the velocity with movement time in b, d, and f. (Compare to Fig. 5.1.)

Fig. 5.5. Comparison of simulation to the data of Milner and Ijaz (1990) (four panels, each plotted against MT (sec)). Experimental data are indicated by closed squares, simulation results by open squares. a. Accuracy as a function of movement time. b. The same data as a, graphed in the form of "Index of difficulty." Note the linear shape predicted by Fitts' Law. c. Peak velocity, which decreases with movement time. d. Skewing effect of increased movement time, evidenced by very slow increase in time to peak velocity.

The resulting velocity trajectory is symmetric, as in Hogan (1984) when only a smoothness criterion was
imposed. This reinforces the idea that velocity skewing is due to control in the face of noise and delayed feedback. Next the same movement was simulated with the noise restored, but with $V_N$ set to zero (i.e. no accuracy criterion in the cost function). Again, note the symmetric average velocity profile. Note also that, compared to Fig. 5.4e, the terminal variability is greater (0.79 cm versus 0.18 cm in Fig. 5.4e). This, we feel, is comparable to the class of relatively low accuracy movements modeled in Hogan (1984), where the minimum jerk cost was employed without an accuracy term. Thus the model presented here addresses a larger class of limb movements.

It is important to emphasize that this model is an extension of the model of reaching discussed in previous chapters, not a replacement which addresses a different body of data. This should be intuitively obvious from the cost function, which extends the smoothness penalty to include a penalty for variability.

Fig. 5.6. (Four panels, each plotted against t (sec).) a, b. Simulated movement with delayed feedback, but no noise. Duration is 1000 ms, as in Fig. 5.4e, f. Without noise, prediction of the state is perfect, despite feedback delay, so the resulting velocity trajectory is symmetric, as when a smoothness criterion is imposed on a deterministic system. c, d. Simulated movement with $V_N$ set to zero. Duration is, again, 1000 ms. Note that, compared to Fig. 5.4e, the terminal variability is greater. This is the result of removing the terminal variability term from the cost function.
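To formally show that the model in this chapter reduces to the minimum jerk model when noise is removed, we provide the following derivation. Retaining the dynamic programming approach, which penalizes final state instead of setting a boundary constraint, but switching, for convenience, to the continuous-time paradigm, we have the following optimization problem: Minimize

$$J = \tfrac{1}{2}\, x(t_f)^T Q_N\, x(t_f) + \tfrac{1}{2} \int_0^{t_f} \big( x^T Q x + u^T R u \big)\, dt \tag{5.24}$$

subject to

$$\dot{x} = A x + B u$$

and given an initial state, $x_0$. Omitting the stochastic element, this problem penalizes only the magnitude of the control (smoothness) and deviation from the target state. The solution to this control problem is (Bryson and Ho, 1975),

$$u = -R^{-1} B^T K(t)\, x \tag{5.25}$$

where $K(t)$ is defined by the matrix Riccati equation,

$$-\dot{K} = K A + A^T K + Q - K B R^{-1} B^T K, \qquad K(t_f) = Q_N \tag{5.26}$$

Now, for the continuous-time, deterministic version of the problem described by (5.2) and (5.3), we define the constants $A$, $B$, $Q$ and $R$ to be

$$A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}, \qquad B = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}, \qquad Q = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \qquad R = 1$$

so that $K$ is given by the differential equation

$$-\dot{K} = K \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} + \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} K - K \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \begin{bmatrix} 0 & 0 & 1 \end{bmatrix} K$$

or

$$-\dot{K} = K \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} + \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} K - K \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} K \tag{5.27}$$

Given that $Q_N$ is symmetrical, inspection of (5.26) reveals that $K$ is symmetrical also. Let us rewrite $K$ as

$$K = \begin{bmatrix} a & b & c \\ b & d & e \\ c & e & f \end{bmatrix}$$

whence (5.27) becomes

$$-\begin{bmatrix} \dot{a} & \dot{b} & \dot{c} \\ \dot{b} & \dot{d} & \dot{e} \\ \dot{c} & \dot{e} & \dot{f} \end{bmatrix} = \begin{bmatrix} a & b & c \\ b & d & e \\ c & e & f \end{bmatrix} \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} + \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} a & b & c \\ b & d & e \\ c & e & f \end{bmatrix} - \begin{bmatrix} a & b & c \\ b & d & e \\ c & e & f \end{bmatrix} \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} a & b & c \\ b & d & e \\ c & e & f \end{bmatrix}$$

or, simplifying,

$$-\begin{bmatrix} \dot{a} & \dot{b} & \dot{c} \\ \dot{b} & \dot{d} & \dot{e} \\ \dot{c} & \dot{e} & \dot{f} \end{bmatrix} = \begin{bmatrix} 0 & a & b \\ 0 & b & d \\ 0 & c & e \end{bmatrix} + \begin{bmatrix} 0 & 0 & 0 \\ a & b & c \\ b & d & e \end{bmatrix} - \begin{bmatrix} 0 & 0 & c \\ 0 & 0 & e \\ 0 & 0 & f \end{bmatrix} \begin{bmatrix} a & b & c \\ b & d & e \\ c & e & f \end{bmatrix}$$

$$\begin{bmatrix} \dot{a} & \dot{b} & \dot{c} \\ \dot{b} & \dot{d} & \dot{e} \\ \dot{c} & \dot{e} & \dot{f} \end{bmatrix} = \begin{bmatrix} 0 & -a & -b \\ -a & -2b & -c-d \\ -b & -c-d & -2e \end{bmatrix} + \begin{bmatrix} c^2 & ce & cf \\ ce & e^2 & ef \\ cf & ef & f^2 \end{bmatrix}$$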
The above provides nine scalar equations, three of which are duplicates because of the symmetry of $K$. The six independent equations are

$$\dot{a} = c^2, \quad \dot{b} = ce - a, \quad \dot{c} = cf - b, \quad \dot{d} = e^2 - 2b, \quad \dot{e} = ef - c - d, \quad \dot{f} = f^2 - 2e \tag{5.28}$$

Since they are non-linear, they are difficult to solve. However, we may employ the following ansatz: Plugging $R$, $B$, and $K$ into (5.25), we have

$$u = -\begin{bmatrix} 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} a & b & c \\ b & d & e \\ c & e & f \end{bmatrix} x$$

or

$$u = -\begin{bmatrix} c & e & f \end{bmatrix} x$$

The analytic solution to the minimum jerk problem was, from (2.19),

$$u = -\begin{bmatrix} \dfrac{60}{D^3} & \dfrac{36}{D^2} & \dfrac{9}{D} \end{bmatrix} x$$

where $D = t_f - t$. Thus,

$$c = \frac{60}{D^3}, \qquad e = \frac{36}{D^2}, \qquad f = \frac{9}{D}$$

Do these satisfy the six equations of (5.28)? The first equation becomes

$$\dot{a} = \frac{60^2}{D^6}$$

or, after integration,

$$a = \frac{720}{D^5}$$

Plugging this, along with the functions for $c$ and $e$, into the second equation,

$$\dot{b} = \frac{60}{D^3} \cdot \frac{36}{D^2} - \frac{720}{D^5} = \frac{1440}{D^5}, \qquad b = \frac{360}{D^4}$$

The functions for $c$, $e$ and $f$ are consistent with the third equation in (5.28),

$$\frac{180}{D^4} = \frac{60}{D^3} \cdot \frac{9}{D} - \frac{360}{D^4}, \qquad \frac{180}{D^4} = \frac{180}{D^4}$$

The fourth equation in (5.28) yields $d$,

$$\dot{d} = \frac{36^2}{D^4} - 2 \cdot \frac{360}{D^4} = \frac{576}{D^4}, \qquad d = \frac{192}{D^3}$$

Finally, the functions for $a$, $b$, $c$, $d$, $e$ and $f$ are consistent with the last two equations in (5.28). After substitution, the fifth equation is

$$\frac{72}{D^3} = \frac{36}{D^2} \cdot \frac{9}{D} - \frac{60}{D^3} - \frac{192}{D^3}, \qquad \frac{72}{D^3} = \frac{324 - 60 - 192}{D^3}, \qquad \frac{72}{D^3} = \frac{72}{D^3}$$

The sixth equation is

$$\frac{9}{D^2} = \frac{81}{D^2} - 2 \cdot \frac{36}{D^2}, \qquad \frac{9}{D^2} = \frac{81 - 72}{D^2}, \qquad \frac{9}{D^2} = \frac{9}{D^2}$$

Thus the minimum jerk-based feedback control law satisfies the Riccati equation. This means the resulting control and state trajectory minimize (5.24), and emphasizes that without the noise penalty, the solution to the optimization problem of this chapter is the same as the minimum jerk solution of the previous chapters.

5.6 Discussion

Behavioral studies reveal that movement can be achieved at various speeds, with decreased speed being associated with increased accuracy requirements.
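The substitution check that closes Sect. 5.5 can also be carried out mechanically. The sketch below is an illustrative verification only, with hypothetical names: it stores each entry of $K$ as a coefficient over a power of $D = t_f - t$, differentiates exactly using $dD/dt = -1$, and compares against the right-hand sides of (5.28) in exact rational arithmetic.

```python
from fractions import Fraction

# Candidate entries of K, each stored as (coefficient, power),
# meaning coefficient / D**power with D = tf - t.
K_ENTRIES = {"a": (720, 5), "b": (360, 4), "c": (60, 3),
             "d": (192, 3), "e": (36, 2), "f": (9, 1)}

def value(name, D):
    coef, power = K_ENTRIES[name]
    return Fraction(coef) / Fraction(D) ** power

def time_derivative(name, D):
    # d/dt [coef / D**power] = coef * power / D**(power + 1), since dD/dt = -1.
    coef, power = K_ENTRIES[name]
    return Fraction(coef * power) / Fraction(D) ** (power + 1)

def riccati_rhs(D):
    """Right-hand sides of the six scalar equations (5.28)."""
    a, b, c, d, e, f = (value(n, D) for n in "abcdef")
    return {"a": c * c, "b": c * e - a, "c": c * f - b,
            "d": e * e - 2 * b, "e": e * f - c - d, "f": f * f - 2 * e}

def residuals(D):
    """Left side minus right side of (5.28) at time-to-go D."""
    rhs = riccati_rhs(D)
    return {n: time_derivative(n, D) - rhs[n] for n in "abcdef"}
```

If the candidate entries are correct, every residual is exactly zero for any positive time-to-go $D$.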
A possible explanation for this speed/accuracy trade-off is that the increased time, occupied by a prolonged low velocity phase, allows more monitoring of the available sensory information (a process whose existence is evidenced by target perturbation experiments, where a target shift leads to a smooth, "on-line" transition to a novel trajectory without the hand having to first terminate the initial movement and stop, as discussed in Chapters 2, 3 and 4). During the low velocity phase, little noise is introduced, as noise is proportional to velocity in this model. The mathematical model of this chapter shows that this skewed velocity strategy may emerge simply by specifying accuracy as an optimization criterion: The trajectories in Fig. 5.4 are not pre-designed, rather they emerge as the best way to efficiently produce accurate movement.

The view of Fitts' Law created by this study is that, given a relative trade-off between movement efficiency and accuracy within a fixed movement time (Eqn. 5.3), a particular relationship develops between accuracy and duration, which is summarized as Fitts' Law. This relationship is learned through experience so that duration may be selected to yield a desired degree of accuracy.

The model shows that a two phase model of reach (Woodworth 1899; Jeannerod 1984) is unnecessary: both the kinematic profile and the feedforward/feedback nature of the two phases are in this one model. First, the quick initial phase followed by a slow deceleration phase is apparent in the output of this model. Second, because of the delayed feedback (Fig.
5.3), the initial 100 ms of the movement are independent of feedback signals, so the movement is essentially feedforward. After that 100 ms, feedback begins to influence the movement, so the nature of the control changes.

Some consider the question of whether there is feedback control at all in reaching to be an open issue. The results of this chapter lend credence to the feedback view: The movement kinematics and the variability profiles are dependent on the feedback structure of the model. The evidence is especially strong when the variability profiles (Figs. 5.2 and 5.4) are considered. Decreasing variability toward the end of movement is a signature of a feedback process. From control theory we have that a feedback controller may be replaced by an equivalent feedforward controller which performs the same input to output mapping (called its transfer function). The difference between them is in the corrective response to noise or other perturbations. As most other models (reviewed in Sect. 5.1) do not model noise or perturbation, their models of the control of reach do not thoroughly exercise their control schemes. The model presented here makes explicit predictions about variability in movement. First, if variability is studied during reaching experiments (as in Fig. 5.2) with movements of very short duration, then it should be seen to increase monotonically, as in Fig. 5.4a. Second, for deafferented subjects reaching without vision of their arm or hand, since visual and kinesthetic feedback has been removed, variability should increase monotonically.

We may now relate this model to our schematic of reach and grasp control (Fig. 4.7). Required accuracy in the task influences its duration.
For a grasping task, accuracy is based on the difference between the hand's aperture and the target size. Since the aperture is calculated from the target size, it is the target size which must be sent to the "Transport time needed" module (as shown in the diagram) to determine the movement duration needed for this level of accuracy (presumably learned through experience). Accuracy in reach then emerges as a result of this planned duration.

This investigation explored the origins of kinematic features of movement, in terms of goals of the movement (duration, accuracy, and smoothness). Not addressed here is what the control structure (schematized in Fig. 5.3) says about the nature of motor control in the brain. In terms of the neural instantiation of the control strategy we have assumed, certain features are easily visualized as having neural analogs. Given that the plant is the biomechanics of the arm, along with kinesthetic and visual sensors for state estimation, and that the controller is realized by motor centers in the brain, it is easy to view the delayed state input as being implemented by the afferent nerve pathways, the efference copy by various recurrent neural pathways in the brain, and the internally time varying controller as being implemented by recurrent neural networks with complex dynamics, located in motor centers. There are two other properties of the model controller which suggest new properties experimenters might search for in neural motor control centers. The first is that movement duration is encoded in terms of the time remaining in the planned movement. The controller's behavior is described by the matrix $L_k$.
Its reverse-time recursive definition implies that it is dependent not on the total movement duration (N) but on the number of stages until the end (N-k). A neurophysiologist recording single cells in motor or premotor cortex during limb movements of varying duration might look for neural activity which is dependent not on the time from the beginning of the movement, but rather on the time until the movement's end. Of course it is important to remember that the temporal dependence of the controller in the model is a result of the optimization mathematics, but if optimal control can capture observable motor behavior, it might also capture properties of the internal computation.

The second property of the controller whose consideration is significant to neural motor control is its control of speed and accuracy. In the model it is assumed that the duration is chosen when the trajectory is planned and that variability emerges as a result. The alternative is that movement proceeds without any planned duration, terminating when a degree of accuracy is reached. This is the idea of the iterative correction model discussed earlier. However, there is the powerful counter argument that the velocity profile at the beginning of a movement (i.e. before peak velocity) for tasks of high accuracy is different from the early velocity profile of low accuracy movements. This signifies that the entire movement is planned ahead of time with the accuracy constraint in mind.
Implicit in the model of this chapter is that the brain learns a map from required accuracy to planned duration, to be used when presented with a task having an intrinsic accuracy requirement. Aside from the motor behavioral evidence for this view, a neurophysiologist might examine motor cortex "set" activity prior to limb movement, looking for variations correlated with task accuracy.

Lastly, there is the question of whether a self-organizing system such as the brain can attain performance which is optimal with respect to some cost function. A significant body of research, using artificial neural network models, has shown that it is certainly possible (Barto et al., 1983; Barto et al., 1989; Werbos, 1990): In reinforcement learning a system whose behavior is modifiable by outside influence is given performance feedback (i.e. measurements of cost) without being told explicitly what performance is desirable (as in supervised learning). By a process which is mathematically akin to dynamic programming, the system converges on the optimal behavior. This finding, which lends essential credibility to the concept of optimality in learned motor behavior, is the subject of Chapter 7.

Chapter 6
Arm Dynamics in Trajectory Formation

We review plant models and cost formulations for trajectory modeling, highlighting the advantages of the different approaches. To model a perturbed pointing task which spans both proximal and distal portions of reachable space, a dynamic arm model and cost formulation are utilized. Comparison is made to the minimum jerk model.
A novel analysis of perturbation reaction time is made, based on modeling results.

6.1 Optimization Choices in Trajectory Modeling

Morasso (1981) and Flash and Hogan (1985) found that during target oriented reaching, the hand follows relatively straight lines despite the speed or direction of movement, i.e. despite the movements of the joints to carry the hand over the straight path. Abend et al. (1982) found that when forced to trace a curved path, subjects chained together straight "sub trajectories." This implies that planning is in terms of the hand's endpoint, rather than in terms of joint movement, and lends credence to the minimum jerk reach trajectory modeling of Flash and Hogan (1985; discussed in Chapter 2) and the reach/grasp modeling of Chapter 4, which is based on their model.

Upon closer inspection, subtle deviations from straight lines are noticeable in reaching trajectories. To explain them, Flash (1987) hypothesized that since the arm is driven by spring-like muscles, it might be the equilibrium location of the hand which is planned according to a straight-line, minimum jerk trajectory. The actual hand location would deviate from this path as dictated by the arm's viscoelastic biomechanics. Although the modeling results were promising, Katayama and Kawato (in press) argue that the chosen stiffness values were too large and that for realistic stiffnesses, the deviation from the equilibrium trajectory is so great that unrealistic movement trajectories result.
(That is, in a mass-spring-damper system whose input is the spring's equilibrium length, as the spring's stiffness approaches infinity the spring's actual length will track its equilibrium length almost perfectly. A similar phenomenon may underlie Flash's results.) Uno et al. (1989) proposed that instead of modeling trajectories by optimizing hand kinematics, limb dynamics should be considered. They adopted the two-link planar arm configuration of Flash, but minimized in the trajectory the integral of the sum of the squared joint torque derivatives, applying the appropriate limb dynamics model. They found that for reaches in the proximal region of reachable space, straight line trajectories were generated similar to those of the kinematic model. This is shown in Fig. 6.1. However, for reaches into more eccentric regions, human movements showed characteristic curvature, replicated by this minimum torque change model. The minimum jerk model, however, generates only straight line trajectories, and hence cannot replicate these results. Later, Dornay et al. (1992) used a sophisticated 17 muscle biomechanical model of the macaque arm, attached to the two-link planar manipulator. Using minimum muscle tension change as the penalty function, they also found realistic trajectories, similar to the minimum torque change trajectories. Currently, their work is progressing one additional level inward from the biomechanical periphery, to address the muscle command, rather than muscle tension.
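The straight-path property of the minimum jerk model noted above can be checked directly from its closed-form solution. The following Python sketch evaluates the fifth-order polynomial of Flash and Hogan (1985) for a single planar reach and measures how far the path departs from the straight line joining start and target. The function name and the start, target and duration values are hypothetical and chosen only for illustration.

```python
import math

def minimum_jerk(start, target, duration, n_samples=101):
    """Closed-form minimum jerk point-to-point trajectory (Flash and Hogan, 1985).

    Returns sample times, hand positions and hand velocities.
    """
    times, positions, velocities = [], [], []
    for k in range(n_samples):
        t = duration * k / (n_samples - 1)
        s = t / duration                                   # normalized time in [0, 1]
        shape = 10 * s**3 - 15 * s**4 + 6 * s**5           # position profile
        rate = (30 * s**2 - 60 * s**3 + 30 * s**4) / duration  # its time derivative
        times.append(t)
        positions.append(tuple(a + shape * (b - a) for a, b in zip(start, target)))
        velocities.append(tuple(rate * (b - a) for a, b in zip(start, target)))
    return times, positions, velocities

# Hypothetical reach (metres, seconds); not taken from the data discussed here.
start, target, duration = (0.0, 0.30), (0.22, 0.61), 0.5
times, positions, velocities = minimum_jerk(start, target, duration)

# Perpendicular distance of every sample from the start-to-target line.
dx, dy = target[0] - start[0], target[1] - start[1]
max_deviation = max(
    abs((x - start[0]) * dy - (y - start[1]) * dx) / math.hypot(dx, dy)
    for x, y in positions
)

# Tangential speed profile: symmetric and bell-shaped, peaking at mid-movement.
speeds = [math.hypot(vx, vy) for vx, vy in velocities]
peak_index = speeds.index(max(speeds))
```

Because every coordinate is scaled by the same time profile, the deviation is zero up to rounding whatever the start and target, which is why a purely kinematic minimum jerk criterion cannot by itself produce the curved paths seen for eccentric targets.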
Fig. 6.1. Left: Human planar point-to-point reaching, for a variety of targets (top view). Right: Trajectories predicted by minimum torque change criterion, using a planar, two-link manipulator model. From Uno et al. (1989).

In short, the original minimum jerk model provides a good approximation to free space movement trajectories, but the criticism is that it demands that all trajectories follow straight line paths, whereas paths to some eccentric targets are curved. The other models (minimum torque change, minimum muscle tension change, etc.) provide qualitative improvements in the curved paths generated. Proponents of these alternative models argue that they must be more accurate because of the added realism of adding arm dynamics or muscle biomechanics. However, it is critical to bear in mind the difference between a dynamic limb model and a dynamic optimization criterion. We agree that for all reach trajectories a dynamic limb carries out the movement. This does not imply that the criterion being optimized must be based in joint dynamic parameters rather than, say, endpoint kinematic parameters. In models such as minimum torque change, optimization is carried out in terms of joint torques and joint kinematics. The resulting optimized control is given as joint torques, and an endpoint trajectory is determined when this control is applied to the limb dynamics model. In models such as minimum jerk (a.k.a. minimum hand acceleration change), optimization is carried out in terms of the limb endpoint kinematics.
It is then assumed that some mechanism calculates the inverse dynamics to drive the limb, applying the correct joint torques such that the limb endpoint follows the calculated optimal trajectory. There is no fundamental reason why a kinematic cost function may not be used to optimize the motion of a dynamic limb. In this case, the optimal joint torques may be calculated either as a temporally subsequent stage to the optimization, or in parallel as part of the optimization process. Kawato et al. (1988) imply that optimization in kinematic terms requires multi-stage movement planning: That the CNS would have to plan and store the motion of the hand, then do the same for the motion of the joints, and finally for the applied joint torques. This is in supposed contrast to optimization formulations based on dynamics, which are said to imply parallel determination of kinematics and dynamics of movement. However, by studying models of autonomous optimization, such as the reinforcement learning algorithm (to be introduced in Chapter 7), one immediately understands that the optimization function need not be of the same modality as the control being optimized. For example, it is computationally plausible that visual detection of hand "jerk" during reaching movement serves as a penalty for an adaptive central controller which outputs joint torque (or muscle equilibrium lengths, or some other suitably low level command). Clearly, no hierarchical sequence of planning stages is required.
Thus, in determining the properties of movement optimization we must turn to the observed trajectories, examining the data rather than arguing for a particular modeling choice "a priori." In the following sections we introduce a new set of data to be investigated, and compare different optimization hypotheses.

6.2 The Prablanc / Martin Perturbed Pointing Task

Fig. 6.2. Schematic of the apparatus used by Prablanc and Martin (top view), showing start position and array of targets relative to subject.

Prablanc and Martin (1992) investigated the response of subjects to unexpected perturbations in the location of a target while performing a pointing task. Subjects made planar arm movements along a horizontal table top, pointing at targets located at eccentricities of 20, 30, and 40 degrees, 65 cm from the subject's head (Fig. 6.2). The starting point was on the table-top, in the mid-sagittal plane, 30 cm behind the 0° target. Unexpected perturbations of ten degrees, either left or right, were occasionally introduced at movement onset. As found in earlier studies (Paulignan et al., 1991a; Goodale et al., 1986) corrections were made with extremely short delays from target change to movement onset. This supports the idea that movement control is based on continuous comparison of target and hand position (or its internal representation as updated on the basis of corollary discharge), rather than being based on the execution of a preplanned trajectory.
Further, the time of limb response to visual target perturbation, which was measured in terms of deviation of the perturbed trajectory from the unperturbed trajectory, occurred earlier for leftward than for rightward perturbation, with a differential of about 100 ms.

Fig. 6.3. Trajectories of the right index finger during pointing, recorded by Prablanc and Martin (1992). Unperturbed pointing to targets located at 20°, 30°, and 40° of eccentricity. Note that removing vision of the hand (i.e. "open loop") during reach did not significantly affect the trajectories.

Fig. 6.4. Trajectories of the right index finger during pointing, recorded by Prablanc and Martin (1992). Trajectories in which the target was unexpectedly shifted, at movement onset, 10° leftward or rightward.

Samples of the recorded trajectories are shown in Figs. 6.3 and 6.4. An interesting property of the unperturbed reaches is their curvature, which becomes more pronounced toward eccentric targets, as in the findings of Uno et al. (1989). Another property to note is the approach direction to a single target under various conditions. For example, an unperturbed reach to the 20° target approaches the target directly (Fig. 6.3, left). In contrast, the perturbed reach in which the target switches from 30° to 20° (i.e. the 30- reach) approached from the right side (Fig. 6.4, center). Corresponding differences in approach direction are seen when comparing 20+, 30 and 40- and when comparing 30+ and 40. This result is in contrast to the trajectories seen in the prehension experiments of Paulignan et al.
(1991a) in which approach direction was consistent despite perturbations. This reminds us of the effect of the task on trajectory formation: Approach direction is critical in reaching to grasp, but not in reaching to point. Lastly, movements were made with and without vision of the limb. This factor did not significantly affect the form of the trajectories (as can be seen in the samples of Fig. 6.3), although final position accuracy was reduced in the non-vision case.

To model the unperturbed and perturbed trajectories discussed above, we applied the perturbed reach model of Chapter 4, which is based on the minimum jerk cost function. We omitted the approach direction constraint to reflect the fact that this is a pointing rather than a prehension task. We used the duration values from Prablanc and Martin, given in Table 6.1. The perturbation response onset (which models reaction time) was set to 200 ms. The results are shown in Figs. 6.5 and 6.6. Note especially that the minimum jerk criterion always generates straight paths for the unperturbed trajectories. We next wished to find whether a dynamics based model would generate more realistic trajectories for the eccentric targets. For this we developed a new trajectory generation model, discussed in the next section.

Target    P0     P-     P+
20°       403    499    467
30°       408    480    516
40°       444    476    554

Table 6.1. Movement durations (in ms) used in the simulations. From Prablanc and Martin (1992).

Fig. 6.5. Minimum jerk based trajectories for the unperturbed pointing paradigm of Prablanc and Martin. Note this endpoint based criterion always produces straight line trajectories.

Fig. 6.6. Minimum jerk based trajectories for the perturbed pointing paradigm.

6.3 Dynamic Optimization and Perturbation Modeling

The goal is to see if a dynamics based cost function will give a qualitatively better model of the data. We adopt the minimum torque change criterion and the planar manipulator model of Uno et al. (1989), and apply the optimization procedure described in Chapter 2.

The inverse dynamics equations for the two-link planar manipulator model of the upper limb, schematized in Fig. 6.2, are well known and are as follows:

[equations not legible in the source]

where for each parameter, subscript 1 refers to the shoulder or upper limb and subscript 2 refers to the elbow or lower limb. $\tau_i$ is the torque at each joint, $M_i$ is the mass of each link, $L_i$ is its length, $S_i$ is the distance along each link to its center of mass, $I_i$ is the rotational inertia of each link, $\theta_i$ is angular joint position, and $b_i$ is joint viscosity. Further, $\dot\theta_i$ is joint velocity and $\ddot\theta_i$ is joint acceleration. The parameter values were taken from Uno et al. (1989). The above equations can be written as a system of equations involving a 2 X 2 inertia matrix:

[equation not legible in the source]

To find the forward dynamics (needed for the optimization), we solve for the acceleration vector, which involves inverting the inertia matrix. The result was obtained using Mathematica, will not be shown here, and may be abbreviated (following the notation of Uno et al., 1989) as,

$$\frac{dy}{dt} = h_1(x, y) + h_2(x)\, z \tag{6.1}$$

where

$$x = \begin{bmatrix} \theta_1 \\ \theta_2 \end{bmatrix}, \qquad y = \begin{bmatrix} \dot\theta_1 \\ \dot\theta_2 \end{bmatrix}, \qquad z = \begin{bmatrix} \tau_1 \\ \tau_2 \end{bmatrix},$$

$h_2(x)$ is the inverse of the inertia matrix, and $h_1(x, y)$ contains the velocity dependent terms.
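To make the structure of (6.1) concrete, the following Python sketch implements inverse and forward dynamics for a two-link planar arm. It assumes the standard textbook (Lagrangian) form of the equations, with the rotational inertias I1 and I2 taken about the shoulder and elbow joint axes respectively; this form and the parameter values below are assumptions for illustration, not the equations or the Uno et al. (1989) values used in the simulations.

```python
import math

# Hypothetical parameter values (SI units), for illustration only.
# I1, I2 are assumed to be inertias about the shoulder and elbow axes.
PARAMS = dict(M2=1.0, L1=0.30, S2=0.16, I1=0.06, I2=0.05, b1=0.08, b2=0.08)

def inertia_matrix(theta2, p=PARAMS):
    """2 x 2 inertia matrix; depends only on the elbow angle."""
    k = p["M2"] * p["L1"] * p["S2"] * math.cos(theta2)
    m11 = p["I1"] + p["I2"] + 2 * k + p["M2"] * p["L1"] ** 2
    m12 = p["I2"] + k
    return ((m11, m12), (m12, p["I2"]))

def velocity_terms(x, y, p=PARAMS):
    """Coriolis/centripetal and viscous torques for angles x and velocities y."""
    h = p["M2"] * p["L1"] * p["S2"] * math.sin(x[1])
    c1 = -h * (2 * y[0] + y[1]) * y[1] + p["b1"] * y[0]
    c2 = h * y[0] ** 2 + p["b2"] * y[1]
    return (c1, c2)

def inverse_dynamics(x, y, acc, p=PARAMS):
    """Joint torques z needed to produce joint accelerations acc."""
    (m11, m12), (m21, m22) = inertia_matrix(x[1], p)
    c1, c2 = velocity_terms(x, y, p)
    return (m11 * acc[0] + m12 * acc[1] + c1,
            m21 * acc[0] + m22 * acc[1] + c2)

def forward_dynamics(x, y, z, p=PARAMS):
    """Joint accelerations dy/dt = h1(x, y) + h2(x) z, as in (6.1)."""
    (m11, m12), (m21, m22) = inertia_matrix(x[1], p)
    det = m11 * m22 - m12 * m21
    h2 = ((m22 / det, -m12 / det), (-m21 / det, m11 / det))  # inverse inertia matrix
    c1, c2 = velocity_terms(x, y, p)
    h1 = (-(h2[0][0] * c1 + h2[0][1] * c2),                  # velocity dependent part
          -(h2[1][0] * c1 + h2[1][1] * c2))
    return (h1[0] + h2[0][0] * z[0] + h2[0][1] * z[1],
            h1[1] + h2[1][0] * z[0] + h2[1][1] * z[1])

# Round trip: torques computed for a chosen acceleration reproduce it.
x, y, acc = (0.4, 1.2), (0.5, -0.3), (2.0, -1.0)
z = inverse_dynamics(x, y, acc)
recovered = forward_dynamics(x, y, z)
```

The round trip at the end is a simple consistency check: with this arrangement h2 is the inverse of the inertia matrix and h1 collects everything that depends on joint velocity, matching the roles these terms play in (6.1).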
The minimum torque change problem, for this dynamic system, involves the following description of the system state and dynamics, input, cost function, and boundary conditions. The state of the system is given as the 6 X 1 vector, $X = (x^T\ y^T\ z^T)^T$. We define the input vector $u = (u_1\ u_2)^T$ to be $u = dz/dt$, i.e. u is the derivative of the joint torque, or the torque change, which is to be minimized. The state develops according to a matrix differential equation, $dX/dt = f(X, u)$ or, using (6.1),

$$\frac{dX}{dt} = \frac{d}{dt} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} y \\ h_1(x, y) + h_2(x)\, z \\ u \end{bmatrix} \tag{6.2}$$

The problem is to find the trajectory which takes the state from $X(0) = X_0$ to $X(t_f) = X_f$ and which minimizes the integral of the sum of the squared torque derivatives for the joints, i.e. the cost functional J is,

$$J = \int_{t=0}^{t_f} u^T u \; dt \tag{6.3}$$

As before, we solve for the optimum trajectory by applying the minimum principle. We first define the Hamiltonian, which is to be minimized by the choice of input u(t),

$$H = u_1^2 + u_2^2 + p^T f(X, u) \tag{6.4}$$

where p is the costate, defined by

$$\dot p = -\left( \frac{\partial f}{\partial X} \right)^T p \tag{6.5}$$

The 6 X 6 Jacobian is found by differentiating (6.2),

$$\frac{\partial f}{\partial X} = \begin{bmatrix} 0 & I & 0 \\ \dfrac{\partial h_1}{\partial x} + \dfrac{\partial (h_2 z)}{\partial x} & \dfrac{\partial h_1}{\partial y} & h_2 \\ 0 & 0 & 0 \end{bmatrix} \tag{6.6}$$

The partial derivatives in (6.6) were calculated from the dynamics equations using Mathematica, and are not shown here. Note that each element in (6.6) is a 2 X 2 matrix. If the Hamiltonian is quadratic in the control, we can find the extremum by differentiating with respect to the control, and setting the derivative equal to zero. Examining (6.2), we see that H(u) is in the form,

$$H = u_1^2 + u_2^2 + p_5 u_1 + p_6 u_2 + g(x, y, z, p)$$

Differentiating by u,

$$\frac{\partial H}{\partial u_1} = 0 = 2u_1 + p_5, \quad u_1 = -\frac{p_5}{2}; \qquad \frac{\partial H}{\partial u_2} = 0 = 2u_2 + p_6, \quad u_2 = -\frac{p_6}{2} \tag{6.7}$$

Now (6.2), (6.5), and (6.7) define the system of differential equations:

$$\frac{d}{dt} \begin{bmatrix} x \\ y \\ z \\ p \end{bmatrix} = \begin{bmatrix} y \\ h_1(x, y) + h_2(x)\, z \\ -\frac{1}{2} \begin{bmatrix} p_5 \\ p_6 \end{bmatrix} \\ -\left( \dfrac{\partial f}{\partial X} \right)^T p \end{bmatrix} \tag{6.8}$$

which may be solved with the given boundary conditions to yield the optimal trajectory. Forward integration techniques may solve this system numerically given an initial value for the superstate $(X^T\ p^T)^T$; however, the problem specifies an initial value for X without an initial value for p, and further requires a particular final value for X. This type of problem is called a two point boundary value problem, which we solve using the following local linearization procedure and multidimensional Newton's method to tune the initial value of p so that the trajectory converges to the desired final value of X.

The basic idea is to find a relationship between $\Delta X_f$, the change in final state, and $\Delta p_0$, the change in initial costate, then to change $p_0$ in order to "push" $X_f$ in the right direction. We do this in three steps: First, given $X_0$, an initial approximation to $p_0$, and the associated trajectory $(X(t)^T\ p(t)^T)^T$, we find a linear approximation to (6.8) in the neighborhood of this trajectory. Second, we use this linearized system to build up the local differential relationship between $p_0$ and $X_f$. Third, we use a multidimensional Newton's method to change $p_0$. Iterating these three steps sufficiently yields the desired solution.

Proceeding with the first step, let us rewrite (6.8) in a simpler form:

$$\frac{d}{dt} \begin{bmatrix} X \\ p \end{bmatrix} = f'(X, p) = \begin{bmatrix} \hat f(X, p) \\ -\left( \dfrac{\partial f}{\partial X} \right)^T p \end{bmatrix} \tag{6.9}$$

where, comparing (6.8) with (6.2),

$$\hat f(X, p) = f\!\left( X,\ -\tfrac{1}{2} (p_5\ p_6)^T \right) \tag{6.10}$$

Now consider a small change in the state and costate:

$$\frac{d}{dt} \begin{bmatrix} X + Y \\ p + q \end{bmatrix} = f'(X + Y, p + q) = \begin{bmatrix} \hat f(X + Y, p + q) \\ -\left( \dfrac{\partial f}{\partial X} \right)^T_{X+Y} (p + q) \end{bmatrix}$$

Since Y and q are small, we can approximate the right hand side by a first order differential expansion:

$$\frac{d}{dt} \begin{bmatrix} X + Y \\ p + q \end{bmatrix} \approx \begin{bmatrix} \hat f(X, p) + \dfrac{\partial \hat f}{\partial X} Y + \dfrac{\partial \hat f}{\partial p} q \\ -\left( \dfrac{\partial f}{\partial X} \right)^T p - \dfrac{\partial}{\partial X} \left[ \left( \dfrac{\partial f}{\partial X} \right)^T p \right] Y - \left( \dfrac{\partial f}{\partial X} \right)^T q \end{bmatrix}$$

Subtracting from this (6.9),

$$\frac{d}{dt} \begin{bmatrix} Y \\ q \end{bmatrix} = \begin{bmatrix} \dfrac{\partial \hat f}{\partial X} Y + \dfrac{\partial \hat f}{\partial p} q \\ -\dfrac{\partial}{\partial X} \left[ \left( \dfrac{\partial f}{\partial X} \right)^T p \right] Y - \left( \dfrac{\partial f}{\partial X} \right)^T q \end{bmatrix}$$

Writing as a matrix differential equation,

$$\frac{d}{dt} \begin{bmatrix} Y \\ q \end{bmatrix} = M(X, p) \begin{bmatrix} Y \\ q \end{bmatrix}, \qquad M(X, p) = \begin{bmatrix} \dfrac{\partial \hat f}{\partial X} & \dfrac{\partial \hat f}{\partial p} \\ -\dfrac{\partial}{\partial X} \left[ \left( \dfrac{\partial f}{\partial X} \right)^T p \right] & -\left( \dfrac{\partial f}{\partial X} \right)^T \end{bmatrix} \tag{6.11}$$

From (6.10) we see that

$$\frac{\partial \hat f}{\partial X} = \frac{\partial f}{\partial X}, \qquad \frac{\partial \hat f}{\partial p} = -\frac{1}{2} \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & I \end{bmatrix}$$

where each element of the matrix is, again, a 2 X 2 submatrix. Note that (6.11) is a linear differential equation, as the matrix M is a function of X and p, but not Y or q. The starting approximation of the desired trajectory gives us the time varying matrix M, from which we may consider solutions to this linear system for initial values of $(Y^T\ q^T)^T$. As in Chapter 2, we associate with this system a transition matrix $\Phi(t_f, 0)$, which maps initial values of $(Y^T\ q^T)^T$ into values at the final time, i.e.,

$$\begin{bmatrix} Y(t_f) \\ q(t_f) \end{bmatrix} = \Phi(t_f, 0) \begin{bmatrix} Y(0) \\ q(0) \end{bmatrix}$$

We are interested in how q(0) relates to $Y(t_f)$, since the relationship is equivalent to the relationship between small changes in $p_0$ and $X_f$. Therefore, if we rewrite the above as

$$\begin{bmatrix} Y(t_f) \\ q(t_f) \end{bmatrix} = \begin{bmatrix} \Phi_{11}(t_f, 0) & \Phi_{12}(t_f, 0) \\ \Phi_{21}(t_f, 0) & \Phi_{22}(t_f, 0) \end{bmatrix} \begin{bmatrix} Y(0) \\ q(0) \end{bmatrix}$$

we are only interested in $\Phi_{12}(t_f, 0)$. Viewing Y and q as differentials of X and p respectively, $\Phi_{12}(t_f, 0)$ is the Jacobian of $X(t_f)$ with respect to $p_0$, i.e.

$$dX(t_f) = Y(t_f) = \Phi_{12}(t_f, 0)\, q(0) = \Phi_{12}(t_f, 0)\, dp_0$$

This submatrix is constructed numerically in the following way. Let n be the dimension of the state and the costate. (In our problem, n=6.) With Y(0) set to zero and q(0) set to the n X 1 vector $e_i$ ($e_i(j) = 1$ if j=i, 0 otherwise) for i between 1 and n, (6.11) is integrated, using M from the current approximation of the desired trajectory. The resulting $Y(t_f)$ is the i-th column of $\Phi_{12}(t_f, 0)$.
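The linearization and the column-by-column construction of the sensitivity matrix can be sketched on a much smaller problem than the arm. The Python example below is an illustration under stated assumptions, not the simulation code of this chapter: the plant is a hypothetical torque-driven pendulum with state (theta, omega), dynamics d(omega)/dt = -sin(theta) + u, and cost equal to the integral of u squared. As in (6.7), stationarity of the Hamiltonian gives u = -p2/2, and the costate obeys the analog of (6.5). Forward Euler integration and the names shoot, solve and mu are likewise choices made for the sketch.

```python
import math

DT, T_FINAL = 0.01, 1.0              # integration step and movement time
STEPS = round(T_FINAL / DT)

def shoot(x0, p0):
    """Integrate state, costate and the linearized system from t = 0 to T_FINAL.

    Returns the final state X(tf) and the 2 x 2 sensitivity matrix Phi12(tf, 0),
    whose i-th column is Y(tf) for Y(0) = 0 and q(0) = e_i.
    """
    th, om = x0
    p1, p2 = p0
    # One perturbation solution [Y1, Y2, q1, q2] per unit vector e_i in q(0).
    pert = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
    for _ in range(STEPS):
        c, s = math.cos(th), math.sin(th)
        # Linearized system, evaluated along the current nominal trajectory.
        pert = [[y1 + DT * y2,
                 y2 + DT * (-c * y1 - 0.5 * q2),
                 q1 + DT * (c * q2 - s * p2 * y1),
                 q2 + DT * (-q1)]
                for y1, y2, q1, q2 in pert]
        # Nominal state and costate, with the control u = -p2 / 2 substituted.
        th, om, p1, p2 = (th + DT * om,
                          om + DT * (-s - 0.5 * p2),
                          p1 + DT * (c * p2),
                          p2 + DT * (-p1))
    phi12 = ((pert[0][0], pert[1][0]),
             (pert[0][1], pert[1][1]))
    return (th, om), phi12

def solve(x0, x_target, mu=0.1, iterations=100):
    """Damped Newton iteration on the initial costate p0."""
    p0 = [0.0, 0.0]
    for _ in range(iterations):
        (th, om), ((f11, f12), (f21, f22)) = shoot(x0, p0)
        e1, e2 = x_target[0] - th, x_target[1] - om    # final state error
        det = f11 * f22 - f12 * f21
        p0[0] += mu * (f22 * e1 - f12 * e2) / det      # mu * inverse(Phi12) * error
        p0[1] += mu * (-f21 * e1 + f11 * e2) / det
    return p0

x0, x_target = (0.0, 0.0), (1.0, 0.0)    # rest at 0 rad to rest at 1 rad
p0 = solve(x0, x_target)
(final_th, final_om), _ = shoot(x0, p0)
```

With a fixed coefficient mu the final-state error shrinks by a factor of roughly (1 - mu) per pass, so a small mu trades speed for stability. For a system of dimension n the same loop would carry n perturbation solutions instead of two and invert an n X n matrix.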
Finally, Newton's method says to use the inverse of the local linear relationship to adjust the independent variable ($p_0$) in order to push the dependent variable ($X_f$) in the right direction, i.e.,

$$\Delta p_0 = \Phi_{12}^{-1}(t_f, 0)\, \Delta X_f \tag{6.12}$$

We add a convergence coefficient, $0 < \mu < 1$, to control the rate of progression and stability of the algorithm. If the final state of the current approximation is $X(t_f)$ and the desired final state is $X_f$, then (6.12) can be written

$$\Delta p_0 = \mu\, \Phi_{12}^{-1}(t_f, 0) \left( X_f - X(t_f) \right)$$

To summarize, our algorithm consists of iterating the following three steps:

1) Using $X_0$ and the current approximation of $p_0$, numerically integrate (6.8) to get M.

2) Using M, numerically integrate (6.11), using Y(0)=0 and q(0)=$e_i$, i=1 to n. The n X 1 final state vectors of these n integrations are the columns of the n X n matrix $\Phi_{12}(t_f, 0)$.

3) Change $p_0$ by $\Delta p_0 = \mu\, \Phi_{12}^{-1}(t_f, 0) \left( X_f - X(t_f) \right)$.

This algorithm was run for each trajectory for 100 iterations, using an integration time step of .01 and $\mu = 0.1$. The simulation results are shown in Fig. 6.7 for reaches to the three targets. Comparing them to Fig. 6.3, we see replication of the convex path shape of the recorded human reaching movements, as opposed to the straight paths generated by the minimum jerk model, shown in Fig. 6.5.

We combined the solution to the minimum torque change problem with the perturbed trajectory generation model of Chapter 2. To summarize this model, it says that at the time of perturbation a new optimal trajectory is adopted, taking the system from the state at perturbation time to the new target.
To each of the three simulated trajectories, left and right perturbations were applied, and the modified trajectories were numerically computed, using the same iterative procedure as before. The simulation results are shown in Fig. 6.8.

It is clear that for the unperturbed trajectories, the minimum torque change model gives a qualitatively better result than the minimum jerk based model. It is difficult to see whether either is superior in the perturbation case, as both give satisfactory matches to the data. To further examine the model results we computed the tangential velocity profiles by numerically differentiating the position trajectories, then taking the magnitudes of the resulting velocity vector trajectories. The results are shown in Fig. 6.9 for the unperturbed and perturbed reaches based on the 40° target. In the minimum jerk based model, the velocity profile of the rightward perturbation, "40+," deviates from the unperturbed velocity profile earlier than does the "40-" profile, which remains close to the unperturbed profile for several tens of milliseconds longer. In terms of hand kinematics the left and right perturbations are symmetrical. The reason for the difference in the response is the duration, taken from the data and applied to the model: 476 ms for the "40-" versus 554 ms for the "40+." The longer duration for "40+" means a lower velocity for the perturbed section of the movement, seen in Fig. 6.9. (Note: The minimum jerk view in itself gives no reason for a difference in duration for perturbation to one side or the other from the unperturbed trajectory.)

Fig. 6.7. Minimum torque change based trajectories for the unperturbed pointing paradigm.

Fig. 6.8. Minimum torque change based trajectories for the perturbed pointing paradigm.

Fig. 6.9. Left: Velocity profile for unperturbed and perturbed reaches toward the 40° target, made with the minimum jerk model. Right: Velocity profiles generated from the minimum torque change model. (Axes: time in s, velocity in m/s.)

In contrast to this model, the minimum torque change model shows a more dramatic change for the "40-" velocity profile, with a more immediate onset, seen in Fig. 6.9. This result is consistent with the more radical modification of the motion of the limb in the underlying dynamics model when a leftward perturbation occurs, and gives an interesting perspective to the reaction time analysis of Prablanc and Martin. They had judged reaction time by looking for significant deviation between the perturbed and unperturbed trajectories. Across all targets (20°, 30°, 40°), they found that the deviation occurred in the leftward perturbation at an earlier time than the rightward perturbation. One conclusion is that there is a perceptual asymmetry which causes a quicker reaction. However, the minimum torque change model shows that instead of a sensory difference between the two perturbations, it may be a matter of limb dynamics which causes the early deviation in leftward perturbation. Taking as a difference threshold 10% of the peak velocity, i.e. 20 cm/s, the "40-" trajectory deviated at 250 ms, versus 300 ms for the "40+" trajectory.
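The deviation-time criterion just described (first time at which the perturbed velocity profile differs from the unperturbed one by more than a fraction of the peak velocity) can be stated compactly in code. The following is a minimal sketch under stated assumptions: both profiles are sampled at the same instants with spacing `dt`, the peak is taken over the unperturbed profile, and the function name and the -1 return convention are hypothetical, not taken from the original analysis.

```c
#include <stddef.h>

/* Return the time of the first sample at which the perturbed speed profile
   differs from the unperturbed one by more than frac * (peak unperturbed
   speed). Returns -1.0 if the profiles never differ by that much.
   Assumes both arrays hold n samples taken every dt seconds. */
double deviation_time(const double *v_ref, const double *v_pert, size_t n,
                      double dt, double frac)
{
    double peak = 0.0;
    for (size_t k = 0; k < n; k++) {
        if (v_ref[k] > peak)
            peak = v_ref[k];
    }

    double threshold = frac * peak;
    for (size_t k = 0; k < n; k++) {
        double d = v_pert[k] - v_ref[k];
        if (d < 0.0)
            d = -d;                      /* absolute difference */
        if (d > threshold)
            return (double)k * dt;
    }
    return -1.0;
}
```

With a peak velocity of 2 m/s and `frac = 0.1`, the threshold is the 20 cm/s used in the text. The detected time is quantized to the sampling interval, so a fine time grid is needed to resolve differences of a few tens of milliseconds.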
These deviation times are comparable to Prablanc and Martin's velocity comparison, which showed the "40-" trajectory deviating at 208 ms and the "40+" trajectory deviating at 315 ms, on the average.

6.4 Model Comparison for a Prehension Task

Thinking back to the Paulignan et al. (1991a) data, we now ask whether the minimum torque change model might provide a better trajectory model than the one of Chapter 4. Fig. 6.10a shows ten superimposed, unperturbed trajectories from the prehension task discussed in Chapter 4. Fig. 6.10b and c show a comparison of the minimum jerk and minimum torque change models for this trajectory. Clearly they both approximate the nearly straight hand path for the reach, which occurs in proximal space.

In review, the original minimum jerk model provided a good approximation of free space movement trajectories, but the criticism was that it demands that all trajectory paths be straight, while in reality paths between some eccentric targets are curved. The other models provide qualitative improvements in the curved movements generated. All of the reviewed methods address simple point-to-point, free space reaching movements, not natural, purposeful tasks. In this dissertation, however, we have addressed reaching within the constraints of a prehension task, under perturbation, and with accuracy constraints, and compared different criteria to see which model applies under the given conditions. This last comparison brings up the trade-off between model detail and computational tractability: The more complex models resulting from dynamic cost functions require numerical solution involving hundreds of iterations.
When they have a number of hand-tuned parameters, as in the prehension model of Chapter 4, thousands of iterations, run interactively with the human modeler, may be necessary to arrive at the correct parameter values. This makes critical the question of whether the mathematical model has a closed form solution. When the limb configuration is within the range in which kinematic and dynamic optimization give similar results, the kinematic formulations are preferable because they yield closed form solutions, keeping in mind that the principles derived apply to the more complex models.

6.5 Conclusion

We have discussed the validity of different optimality formulations and compared two, minimum jerk and minimum torque change, for a set of arm reaching data involving perturbation. Analyzing deviation time for perturbed trajectories yielded a novel view of reaction time data, in which motor response, not perceptual delay, is responsible for reaction time discrepancy. We showed that under certain circumstances it is not clear which of alternative optimality formulations best fits a set of data. If the choice is important, other data must be examined. If not, the more tractable, though perhaps only approximate, formulation may be preferable.

Fig. 6.10. a) Ten superimposed wrist trajectories, from Paulignan and Jeannerod data (start and target positions marked). b) Minimum jerk trajectory. c) Minimum torque change trajectory.
Chapter 7

Learning Optimization of Dynamic Processes: A Neural Network Model

We review models of trajectory learning in artificial neural networks and stress the problem of discovering trajectories rather than copying them through supervised training. For the task of automatic calculation of optimal trajectories we converge on reinforcement learning as the appropriate paradigm, and show an example of optimizing smoothness in a reaching movement. This is considered an important link in justifying the optimality models of this dissertation as being neurally realizable.

7.1 Connectionist Approaches to Trajectory Learning

Connectionist networks, also called artificial neural networks, have proven interesting models of computation because their processing behavior develops based on examples of desired performance or evaluation of their performance, unlike conventional computer programs which require an explicit algorithm to have the appropriate behavior. Connectionist networks have found applications in pattern recognition, associative memory, and most relevant to our work, motor trajectory generation and control. In choosing a connectionist architecture appropriate for modeling the issues of the past chapters, we contrast three categories of trajectory generation networks: parametrically tuned networks, supervised learning networks, and reinforcement learning networks.
In the first category, a family of trajectories generated by the network is parametrized by a small number of internal, adjustable values, which are tuned based on some performance measure, such as accuracy in target acquisition. One might think of adjusting the elevation of a gun barrel (the parameter) so that the launched projectile more closely approaches a target (the performance measure). There is little control of the shape of the trajectory, but certain features, such as maximum speed, maximum height, or endpoint can be tuned. Houk et al. (1990) use such a network in a model of spinocerebellar circuitry. The brainstem is modeled as an array of adjustable pattern generators, which generate limb movement. Certain aspects of the patterns are tuned by cerebellar inhibition so that the limb reaches the desired target position. Another type of parametric trajectory generator is a constant feedback control matrix, such as in the cat hindlimb model of Loeb et al. (1989). The interaction of the feedback controller and its controlled plant generates a class of trajectories (based on the initial system state), and while not all of the trajectories' details may be shaped, the parameters in the feedback matrix can be chosen to optimize some performance measure.

More relevant to our modeling needs are classes of trajectory generation networks for which potentially any trajectory of output values may be generated. Most common are temporal sequence learning networks. A desired output pattern is presented to the network, which learns over time to regenerate it. (There are also issues in temporal sequence recognition, beyond the scope of this discussion.) The earliest such network model was Grossberg's outstar avalanche (Grossberg, 1971).
Other notable models include those of Hopfield (1982), Buhmann and Schulten (1988), and Wang and Arbib (1990). Pearlmutter (1989) presents supervised trajectory training in a continuous time domain. Oriented specifically to limb control trajectories are the trajectory generation models of Jordan (1988) and Massone and Bizzi (1989). As with other trajectory generation models, such networks have internal dynamics which allow them to generate desired patterns in state space. A common property of their learning algorithms is that they are supervised in nature. That is, the desired pattern to be reproduced must be presented to the network for it to learn.

Two aspects of the architecture of Jordan (1988) are noteworthy. The first is that it allows excess degrees of freedom in the learned control. That is, the training pattern may incompletely specify the internal representation to be learned. The network selects a pseudoinverse of the many-to-one mapping from internal states to output patterns. The second notable aspect, for the application of limb configuration sequencing, is the application of a smoothness penalty. The evaluation of smoothness is via internal network structure that allows comparison of temporally neighboring configurations. (Pearlmutter, 1989, similarly utilized a smoothness element in the error function.) Shortly we will discuss networks that admit arbitrary trajectory penalties, independent of network structure.

To answer the question of whether a self-organizing system may learn optimal trajectories, based only on an arbitrary optimization criterion (i.e. without being shown the optimal trajectory, as in supervised learning), we consider a third class of trajectory generation neural networks, based on reinforcement learning. Barto et al. (1983) introduce reinforcement learning as a way to learn the solution to a difficult control problem with a minimum of performance feedback. A single neural unit called an associative search element (ASE) is responsible for mapping the state (position and velocity) of an inverted pendulum, and of the rolling cart atop which it is balanced, into a command to move the cart left or right. The goal is to balance the pendulum. The feedback control problem is solvable analytically, but the goal is to learn the solution only from information about when the pendulum has completely fallen over. Clearly this is an example of the temporal credit assignment problem: How does one know, when the pendulum falls over, which command, of the long sequence of motor commands, was erroneous?

To link commands to reinforcement, Barto et al. introduced eligibility. A weighted ASE input pathway has an eligibility which is changed from zero to some nonzero value when it becomes active during a motor command. (The motor command depends on the weighted pathway activities in the normal "neural network sense.") The eligibility value then decays away with time. The weight along the pathway is changed during reinforcement by an amount proportional to this eligibility. The eligibility gives an approximate model of the blame associated with an active pathway: If the pendulum remains balanced for a long time after the action, the blame for that pathway is small, and the pathway's weight is changed little.
If the pendulum falls over immediately after the action, the blame is large, and the pathway's weight is changed by a large amount.

To improve learning in the ASE, the reinforcement input was replaced by the output of an adaptive critic element (ACE). The output of the ACE is the expected reinforcement for a given system state and ASE action. The output is tuned to decrease the discrepancy between the output at the previous time step and the sum of the current output and current reinforcement. Thus reinforcement information bridges time, one time step into the past. Over time, the ACE output values are tuned to be predictors of reinforcement, many steps into the future. This allows learning to take place at the time of action, rather than only when error occurs.

We may relate the reinforcement used by Barto et al. to the cost functions we associate with trajectories: If in a reinforcement based network punishment is equal to trajectory cost, then tuning of the trajectory produced by the network is, according to this cost function, precisely the property we are looking for in a training method.

Neural reinforcement learning bears strong parallels to earlier machine learning work. In his work on learning to play checkers, Samuel (1959) used a linear function to map the state of the playing board (represented as a 16 dimensional feature vector) into an evaluation of its quality. The evaluation of the board's state is analogous to the evaluation carried out by the ACE described above, where the state of the system is mapped into the expected reinforcement. The chosen checkers move was that which would bring the estimated future state of the board to the highest value.
This is much like the output of the ASE controller, in that an estimation of the value of the future state is used to make the decision about present choices (controls). The parallel extends further in that both the ACE and Samuel's evaluation function are adaptive. Comparison of the actual checkers game to the earlier board evaluation led to changes in the weights of Samuel's linear evaluation function, e.g. to reduce the values of weights that had made a positive contribution to an overvalued board. In reinforcement learning, elements of the controller (ASE) whose activity contributes to a later poor state are penalized. Elements of the ACE are penalized according to the discrepancy between the estimated future state value and the actual value, measured at a later time.

The ASE/ACE architecture provides the two crucial elements to solve trajectory optimization problems: 1) to "reach through time" to perform temporal credit assignment, and 2) to express errors with respect to the controller's actions, and ultimately the controller's tunable parameters. We will return to critic based schemes after reviewing several other neural network approaches and describing how they give rise to the above two properties.

Back propagation provides a mapping of a gradient descent convergence procedure onto a neural network topography. In a layered neural network, the dependency of the output neurons on the weights in each layer is given in an explicit mathematical form. Typically, performance error is expressed as discrepancy between desired and actual output values.
A chain of partial derivatives yields the derivative of the error with respect to each weight, the necessary relationship to perform gradient descent on the network's weights, which are its tunable parameters. In a recurrent network, neuron activation levels are dependent not only on the activation of other neurons at the same time, but on their activation at previous cycles of activity flow through the network, even on their own previous activities. Applying reinforcement, rather than desired performance, to the network error input in the algorithm, back propagation through time (BTT; Rumelhart et al., 1986; Nguyen and Widrow, 1989; Werbos, 1990b) provides a way to relate reinforcement input to network activity much earlier in time. Simultaneously, the derivative of the reinforcement error with respect to the tunable parameters is calculated, making clear how to adjust the network for improved performance. Clearly, BTT provides the two elements, discussed above, necessary to solve our class of problems.

Nguyen and Widrow (1989) visualize BTT as unraveling a network as many times as there have been network cycles. Thus the problem of performing BTT on a recurrent neural network with m layers, run through n cycles, becomes the problem of performing normal back propagation through a feedforward network of n*m layers. (A general treatment of BTT is given by Werbos, 1990b.) Nguyen and Widrow use BTT to solve the problem of backing up an articulated tractor trailer truck. The network provides steering controls throughout the operation.
The error information comes only at the end, when the truck hits or misses its parking spot. Over time the network learns to park the truck from a variety of initial locations and configurations.

Kawato et al. (1989) use a trajectory generation network to minimize torque change in the movement of a robotic arm, solving the problem addressed analytically in Chapter 6. Simultaneously, the inverse kinematics and inverse dynamics problems are solved. The learning process is reminiscent of BTT, as back propagation proceeds through the network which utilizes a spatial representation of time: For each time step there is a dedicated network to calculate the torque values to be applied to the robot's joints. Nguyen and Widrow's network "unraveling" is expressed literally in the architecture. While such a structure is computationally useful, it is clearly biologically implausible. Further, the minimization of torque change was enforced by inhibition between torque value generation neurons in neighboring time step networks. Thus the network topography constrains the format of the optimization criterion to be based on temporal smoothness, rather than allowing an arbitrary criterion, as in reinforcement learning.

Kawato compares the optimization process carried out by the network to the gradient descent solution of the minimum principle (which we discussed in Chapter 2). The back propagation through time is analogous to the reverse time integration of the costate.
Recall the Hamiltonian definition for the minimum principle: $H = L + f^T p$, where $f$ is the forward system dynamics equations, $p$ is the costate, and $L$ is the cost functional integrand. In the minimum principle we search for minima by differentiating $H$ with respect to the control, $u$, and setting it equal to zero. If $f$ and $p$ are calculated numerically, however, then gradient descent may be used (Bryson and Ho, 1975): Choose a control $u$, perform the forward integration to obtain $x$ and the reverse integration to obtain $p$, then calculate

$$\frac{\partial H}{\partial u} = \frac{\partial L}{\partial u} + \left(\frac{\partial f}{\partial u}\right)^T p$$

This is the gradient of $H$ with respect to $u$, which can then be used to change $u$ to move it toward a local minimum.

Jordan and Rumelhart (1990) present a recurrent network explicitly based on the minimum principle, in their treatise on learning with a distal teacher. They present their network as a method of learning a desired trajectory of sensations $y^*(t)$, produced by a sequence of actions $u(t)$, where the problem of mapping $y^*(t)$ into $u(t)$ is underconstrained, and it is unknown how $u(t)$ relates to the actual sensations $y(t)$. In their network, component forward models map controls, $u(t)$, and environmental states, $x(t)$, into predicted sensations. Back propagation produces the partial derivatives needed in the gradient descent algorithm discussed above. If one takes the desired sensation trajectory, $y^*(t)$, to be a minimum cost (e.g. the sequence {0,0,0,...}) and the actual sensation trajectory, $y(t)$, to be the incurred cost, then this model is applicable to the class of optimization problems we are interested in.
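The gradient descent procedure attributed above to Bryson and Ho (forward pass for the state, reverse pass for the costate, then a step along the gradient of the Hamiltonian with respect to the control) can be sketched for a small discrete-time problem. Everything specific here is an assumption chosen for illustration: a scalar linear system $x_{k+1} = x_k + h(a x_k + b u_k)$, a cost $\sum_k h u_k^2 + q(x_N - x_{target})^2$, and the names `optimize_control`, `N_STEPS`, and `rate`. It is not the arm dynamics of Chapter 6.

```c
#define N_STEPS 100

/* Gradient descent on a control sequence using a costate (adjoint) pass.
   Dynamics (assumed toy system): x[k+1] = x[k] + h*(a*x[k] + b*u[k])
   Cost:                          sum_k h*u[k]^2 + q*(x[N] - x_target)^2 */
void optimize_control(double u[N_STEPS], double x0, double x_target,
                      double a, double b, double q,
                      double h, double rate, int iters)
{
    double x[N_STEPS + 1];   /* state trajectory   */
    double p[N_STEPS + 1];   /* costate trajectory */

    for (int it = 0; it < iters; it++) {
        /* 1. Forward integration of the state under the current control. */
        x[0] = x0;
        for (int k = 0; k < N_STEPS; k++)
            x[k + 1] = x[k] + h * (a * x[k] + b * u[k]);

        /* 2. Reverse-time recursion for the costate, starting from the
              derivative of the terminal cost. */
        p[N_STEPS] = 2.0 * q * (x[N_STEPS] - x_target);
        for (int k = N_STEPS - 1; k >= 0; k--)
            p[k] = (1.0 + h * a) * p[k + 1];

        /* 3. Gradient of the cost with respect to each u[k], the discrete
              counterpart of dL/du + (df/du)^T p, followed by a descent step. */
        for (int k = 0; k < N_STEPS; k++) {
            double grad = 2.0 * h * u[k] + h * b * p[k + 1];
            u[k] -= rate * grad;
        }
    }
}
```

For the pure integrator case ($a = 0$, $b = 1$) the optimal control is constant and equal to $q(x_{target} - x_0)/(1 + qT)$ with $T = N h$, which gives a simple check on the result. The step size `rate` must be small enough for the iteration to be stable; its admissible range depends on $q$, $h$, and the horizon.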
While BTT is the right variety of algorithm for the class of problems which concern us, it has a number of implementation problems which make it biologically implausible: First, the back propagation stage requires storage of neural activations for each cycle of activity, essentially requiring the physical unraveling depicted by Nguyen and Widrow. This is both neurally implausible and computationally complex, as in the physical representation of time used by Kawato et al. (1989). Second, BTT does not deal well with noise (Werbos, 1990a). Both of these problems are surmounted by reinforcement learning, which uses temporally local training for the critic, and can be couched in a stochastic paradigm and thus admit noise (Werbos, 1990a). We next review the optimization approach of dynamic programming (DP), after which we describe a neural network implementation of reinforcement learning based on DP (which is more general than the ASE/ACE model discussed above) and show its applicability to learning the families of trajectories discussed in this thesis.

In Sect. 5.3.2, we described dynamic programming as a method for doing multistage optimization, and applied it to the problem of finding the optimal control at each time stage in a discrete time trajectory, in order to minimize the overall cost of the trajectory (given a predefined cost function and system dynamics). Dynamic programming gives a recursive algorithm, which is based on optimizing a single stage, assuming all subsequent stages are optimal, and then proceeding backwards in time.
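As a concrete instance of this backward recursion, consider a hypothetical scalar problem, assumed here only for illustration: $x_{k+1} = x_k + u_k$ with stage cost $a u_k^2$ and terminal cost $q x_N^2$. For this problem the optimal cost-to-go is quadratic, $J_k(x) = P_k x^2$, and minimizing $a u^2 + P_{k+1}(x + u)^2$ over $u$ gives $u_k = -K_k x_k$ with $K_k = P_{k+1}/(a + P_{k+1})$ and $P_k = a P_{k+1}/(a + P_{k+1})$. The function names below are hypothetical.

```c
#include <assert.h>

#define N_STAGES 4

/* Backward dynamic programming pass for the assumed toy problem
   x[k+1] = x[k] + u[k], stage cost a*u^2, terminal cost q*x^2.
   P[k]*x^2 is the optimal cost-to-go from stage k; K[k] is the optimal
   feedback gain, so that u[k] = -K[k]*x[k]. */
void dp_backward(double a, double q,
                 double P[N_STAGES + 1], double K[N_STAGES])
{
    assert(a > 0.0 && q >= 0.0);
    P[N_STAGES] = q;                      /* cost-to-go at the final stage */
    for (int k = N_STAGES - 1; k >= 0; k--) {
        /* single-stage optimization given the optimal cost-to-go P[k+1] */
        K[k] = P[k + 1] / (a + P[k + 1]);
        P[k] = a * P[k + 1] / (a + P[k + 1]);
    }
}

/* Forward rollout under the gains from dp_backward, returning the total
   cost actually incurred from initial state x0. */
double rollout_cost(double x0, double a, double q, const double K[N_STAGES])
{
    double x = x0, cost = 0.0;
    for (int k = 0; k < N_STAGES; k++) {
        double u = -K[k] * x;
        cost += a * u * u;
        x += u;
    }
    return cost + q * x * x;
}
```

Calling `dp_backward(1.0, 1.0, P, K)` and then `rollout_cost(1.0, 1.0, 1.0, K)` should return the same number as `P[0]`, since the cost-to-go at the first stage is by construction the cost of the optimal trajectory from there.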
There are two key elements to the process: First, it calculates the optimal cost-to-go, the segment of the cost function from the current stage to the end, assuming an optimal trajectory. Second, it calculates the single stage control which optimizes the sum of the (local) single stage cost and the remaining cost-to-go. Clearly, after the optimal single stage control and cost are determined, the cost-to-go is known one additional stage backward in time, and the procedure may be iterated until the first stage is reached.

Werbos (1990a) presents a general method for tuning a plant controller in the manner of reinforcement learning. The controller produces a control u(t) based on the state of the plant R(t) and the time, t. A critic monitors the state of the plant and produces an output J(t) which is an estimation of the cumulative future cost (or benefit), as the controller-plant system proceeds from the current state. The cost (or benefit) at each time, U(t), is given by the environment. The critic is parametrized by a vector of weights $w$, and the controller is parametrized by a vector of weights $w'$. Werbos' Heuristic Dynamic Programming (HDP) algorithm consists of iterations of the following two adaptation steps. In each step n,

1) Update $w$ such that $J(R(t), w^{(n)})$ equals $J(\langle R(t+1)\rangle, w^{(n-1)}) + U(R(t))$ for all possible vectors R(t), where the expectation value of R(t+1) refers to the expectation assuming that $u(R(t), w'^{(n-1)})$ is used to control system actions.

2) Update $w'$ such that $u(R(t), w'^{(n)})$ maximizes [or minimizes] $\langle J(R(t+1), w^{(n)})\rangle$.
The first step addresses the accuracy of the critic's evaluation of the controller, while the second step addresses the performance of the controller, assuming an accurate evaluation. (Note that the critic is analogous to Barto's ACE and the controller is analogous to the ASE.) As Werbos points out, this is a direct neural implementation of dynamic programming, in which a cost-to-go is calculated for each of a sequence of stages, and the decision at each stage is optimized with respect to this cost-to-go. The critic's output is this cost-to-go, while the tuning of the controller provides the optimization with respect to this cost. We now turn to a neural network implementation of HDP.

In order to implement HDP, we need to implement the two-step procedure described above. Both steps can be viewed as minimization actions, the first step as minimizing the error between the actual and estimated cost-to-go, and the second step as minimizing the cost-to-go. To find a minimum, one might use a heuristic, random, or exhaustive search through the control space. In the method to be described, gradient descent is used. It has the advantage of quickly and directly proceeding to a solution. The disadvantage, as with any gradient method, is that the solution may be only a local minimum, not the global one.

Werbos (1990a) presents a connectionist version of HDP, which we have schematized in Fig. 7.1, called the Backpropagated Adaptive Critic (BAC). The critic and controller, as described for HDP, are implemented as feedforward neural networks.
For reasons that will become clear, there is also a neural network model of the dynamics of the controlled plant, which maps the current state R(t) and control u(t) into a predicted next state for the system, R(t+s), where s is the simulated time step for the system. The networks are trained using gradient descent, applied through back propagation. The gradient descent formula for any network (to be applied iteratively) is,

$$\Delta w = -\mu\,\nabla_w e$$

where $e$ is an error function to be minimized, $\mu$ is a learning rate, and $w$ is the vector of adjustable network parameters (e.g. synaptic weights).

Fig. 7.1. Architecture of Werbos' Backpropagated Adaptive Critic (BAC). (Blocks: Critic with weights $w$, Controller with weights $w'$, Dynamic Model with weights $w''$; signals: U(t), R(t), t, J(t), u(t), R(t+s).)

To adjust the critic, we apply this formula to the critic network weights, using as the error function,

$$e_J = \bigl( J(F(R,u,t),\,t+s) + U(t) - J(R,t) \bigr)^2$$

where F() is the output of the Dynamic Model network and U(t) is the external reinforcement signal. This implements the first HDP rule. To compute the gradient with respect to the critic's weights we employ the differentiation chain rule (developed for matrices in Appendix C),

$$\nabla_w e_J = \frac{\partial e_J}{\partial J(R,t)}\,\nabla_w J(R,t) = -2\,\nabla_w J(R,t)\,\bigl( J(F(R,u,t),\,t+s) + U(R,u,t) - J(R,t) \bigr)$$

The first term is the gradient of a network's output (in this case J(R,t)) with respect to its weights (in this case the vector $w$), and is calculated using back propagation (see for example Rumelhart et al., 1986). Values of J are obtained by applying the appropriate values of state (R) and time (t) to the critic network using its current weights. Note that the state at time t+s is obtained from the Dynamic Model network. The Dynamic Model network is similarly trained, using the discrepancy between the predicted and actual "next states" as an error measure:

$$e_M = \bigl( R(t+s) - F(R(t),u(t),t) \bigr)^2$$

Finally, the modification rule for the Controller is based on the gradient of the cost function with respect to the controller's parameters, denoted by the vector $w'$,

$$\Delta w' = -\mu\,\nabla_{w'} J$$

This realizes the second HDP adaptation rule. The gradient is obtained by back propagating through the cascade of networks: Critic, Dynamic Model, Controller, as shown in Fig. 7.1. Applying the chain rule again,

$$\Delta w' = -\mu\,\nabla_{R(t+s)} J\;\frac{\partial R(t+s)}{\partial u(t)}\;\frac{\partial u(t)}{\partial w'} \tag{7.1}$$

The first term is calculated by back propagating J through the critic to the R(t+s) input. The second term is calculated by continuing the back propagation through the Dynamic Model to the u(t) input. Finally, the result is back propagated through the controller and the weights are adjusted in the standard network tuning fashion.

This network model has the advantage that, because it uses a critic-based approach, no back propagation through time is needed. Thus it is a more neurally plausible model than BTT. If the back propagation from the critic to the controller seems an implausible way to tune behavior, the calculation may be replaced by a strategy such as that of Barto and Sutton (1981). What is needed is a relationship between control choices and critic output. The derivative provides this efficiently. Barto and Sutton instead use a run-and-twiddle method, where a control is chosen stochastically, and its effect on the quality of the subsequent state is observed.
It is a slower process because it requires trial and error, but eventually the needed information is collected to tune the controller.

In the analytic version of dynamic programming, absolute control optima are found through exhaustive techniques, not through iterative numerical procedures. The BAC, in using gradient descent, explores the local control and state space around the current solution, thus finding only neighboring optima, but doing so quickly. Further, the neural implementation of the controller and critic provides the property of generalization: After learning the set of optimal trajectories for corresponding boundary conditions, trajectories then generated for intermediate boundary conditions will be interpolations of the learned optima. Strictly speaking, the interpolated trajectories are not necessarily optimal, but with the continuous nature of many systems of interest, the results are often acceptable.

The cost-to-go is defined by the reverse time recursive equation, J[t]=J[t+s]+U[t], J[tf]=0. There are two approaches to calculate J[t] at each time step. One is to integrate J[t] backwards in time, as illustrated by the following "C" code:

    J[tf]=0;
    for (t=tf-s; t>=0; t=t-s) J[t]=J[t+s]+U[t];

The alternative is to go forward in time:

    for (t=0; t<tf; t=t+s) J[t]=J[t+s]+U[t];
    J[tf]=0;

The sacrifice in the latter strategy is efficiency. The former strategy takes a single pass of, say, n steps. The latter strategy takes considerably more: After the first pass through the block of code shown, only J[tf] is guaranteed to have the correct value.
The second to last cost-to-go, J[tf-s], is computed based on J[tf], but before J[tf] is assigned the correct value, therefore it contains garbage, as do all the earlier cost-to-go values. After the second iteration of the code, correct values are contained in J[tf-s] and J[tf] only. It is only after n iterations of the code, n² steps in all, that all the J[t] contain meaningful values. This is the approach used in critic based reinforcement learning strategies, providing a temporally local, biologically plausible construct, at the expense of efficiency. It should be mentioned that when a neural network is used to calculate J[t], since all the tunable weights are used to generate all the J[t] values there is generalization from one trained J[t] value to another. (This is in contrast to the separately stored critic values in the simplified example given above.) There is the potential for accelerated learning, e.g. training J[tf] may adjust the weights such that J[t] values for t<tf are better than the initial "garbage" generated before J[tf] was trained.

7.2 An Application of Reinforcement Learning

We use the Backpropagated Adaptive Critic (BAC) design to implement connectionist dynamic programming for optimization. The system to be controlled is a one dimensional point, whose state is its position, velocity and acceleration. It is driven by its jerk (the derivative of acceleration). The discrete time system dynamics is given by

$$x_{k+1} = \begin{bmatrix} 1 & s & 0 \\ 0 & 1 & s \\ 0 & 0 & 1 \end{bmatrix} x_k + \begin{bmatrix} 0 \\ 0 \\ s \end{bmatrix} u_k \qquad (7.2)$$

where x is the state, u is the input, and s is the time step. The system runs from k=0 to N. The goal is to learn the minimum jerk trajectory which reaches a specified target state, $\hat{x}_N$, from some initial state, $x_0$.
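One step of the discrete-time dynamics (7.2) can be written out directly. The following is a minimal sketch in C; the type name `PointState` and the function name `point_step` are hypothetical, chosen only for this illustration, and the original simulation was written in NSL rather than in this form.

```c
/* State of the one-dimensional point: position, velocity, acceleration. */
typedef struct {
    double pos;
    double vel;
    double acc;
} PointState;

/* One step of x[k+1] = A x[k] + B u[k], with
 *   A = [1 s 0; 0 1 s; 0 0 1]  and  B = [0 0 s]^T,
 * where u is the jerk input and s is the time step. */
PointState point_step(PointState x, double u, double s)
{
    PointState next;
    next.pos = x.pos + s * x.vel;  /* position integrates velocity   */
    next.vel = x.vel + s * x.acc;  /* velocity integrates acceleration */
    next.acc = x.acc + s * u;      /* acceleration integrates jerk    */
    return next;
}
```

Calling `point_step` repeatedly for k = 0 to N-1, with the input supplied by a controller at each step, produces the trajectory whose cost is being minimized.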
Thus the reinforcement signal combines a penalty for lack of smoothness with a penalty for deviation from the desired final state:

$$U_k = \begin{cases} a\,u_k^2, & k < N \\ (x_k - \hat{x}_N)^T Q\,(x_k - \hat{x}_N), & k = N \end{cases}$$

where a is a positive scalar, and Q is a positive, semidefinite 3x3 matrix. Without loss of generality, we set the desired final state to the zero vector, so that this becomes,

$$U_k = \begin{cases} a\,u_k^2, & k < N \\ x_k^T Q\,x_k, & k = N \end{cases} \qquad (7.3)$$

The system is to learn to generate the control sequence $u_k$ which minimizes the overall cost

$$\sum_{k=0}^{N} U_k = x_N^T Q\,x_N + a \sum_{k=0}^{N-1} u_k^2$$

The system is implemented as a program in the Sun-based NSL simulation environment (Weitzenfeld, 1991). The critic and controller are each implemented as three layer networks, as shown in Fig. 7.2. The four unit input layer encodes the system state (position, velocity, and acceleration) and time as firing rates of single units. The hidden layer contains nine units, and the output layer has a single unit. In the controller, this unit's output is $u_k$. In the critic it is $J_k$. The dynamics model was implemented analytically, using (7.2). For the back propagation stage (Eqn. 7.1), we have, by differentiating (7.2),

$$\frac{\partial R(t+s)}{\partial u(t)} = \frac{\partial x_{k+1}}{\partial u_k} = \begin{bmatrix} 0 & 0 & s \end{bmatrix}^T$$

[Fig. 7.2. Three-layer network used for critic and controller in BAC architecture. Inputs: position, velocity, acceleration, time.]

We ran the simulation with a time step of 2 ms and a duration of 500 ms, for 27 iterations. The initial and final states were static (zero velocity and acceleration) and were separated by 1 m. The weights in the penalty formula were $Q = I_{3\times3}$, with a=0.01 for the first half of the simulation (to force convergence to the desired final state), then a=0.1 for the second half (to penalize lack of smoothness strongly). Fig.
7.3 displays the graphic output of the NSL simulation of the BAC architecture during the learning of the minimum-jerk trajectory. Displayed are the trajectories of the state, the control, and the cost-to-go. Fig. 7.4a shows the convergence during the simulation: The bottom trace is the optimal cost, the upper trace is the convergent cost of the trajectory being learned. The optimal cost changes because the weight, a, changes, as described above. Fig. 7.4b and c show the components of the trajectory after convergence. All four components are indistinguishable from the analytic solution to the minimum-jerk optimization problem.

[Fig. 7.3. NSL simulation of BAC architecture learning minimum jerk trajectory. Six superimposed iterations show convergence to solution. Top, left to right: Position, velocity, acceleration, and time. Middle: Jerk. Bottom: Cost-to-go.]

[Fig. 7.4. Results of reinforcement learning simulation. a Convergence of the simulation, measured in terms of cost being minimized, with comparison to the optimal cost. b Position and velocity trajectories after convergence. c Acceleration and jerk (input) trajectories. All four trajectories are quite close to analytic minimum jerk trajectories.]

7.3 Discussion

Having compared different types of trajectory learning models, we converged on reinforcement learning as being the most autonomous and neurally plausible.
Instead of choosing the network architecture to serve a specific optimization principle (as in Kawato et al., 1989), arbitrary measures of cost are input from the outside. Thus to learn to control the same plant under a different optimization, the same neural hardware may be used. Further, it is not that a solution to the optimization is found analytically and then trained into the network (as in Massone and Bizzi, 1989); instead the network finds the optimum based on internal calculations. The result is that, based only on the penalty given by (7.3), the system generates the trajectories shown in Fig. 7.4b,c which match the analytic solution to the optimization problem, as shown in Chapter 2. Just as in the problems of learning to play checkers and learning to balance an inverted pendulum discussed above, it is not that the system is trained on a solution which it later mimics; rather it discovers a solution through experience. This is the magnitude of the difference between supervised and reinforcement learning in neural systems.

As described earlier, gradient descent finds local optima. The simulation described in Sect. 7.2 optimized the trajectory for the single given initial condition, treating only the states on or near the trajectory of interest. In order to train a true optimal controller, in the sense of the controller described in Chapter 2, the neural controller of this chapter would be trained throughout the state space, so that it generates the optimal trajectory (or an interpolation of optima) from any initial state to the given target state. Then, given the optimal response to any initial state, the correct perturbation response is also generated.
Recurrence giving rise to dynamic pattern production is not shown explicitly in Fig. 7.1; rather it is the reciprocal influence of the controller on the plant (via u(t)) and the plant's input to the controller (via R(t)) which generates dynamic trajectories. If a feedforward system is desired, the output of the forward model R(t+s) may be delayed by one time unit, then input to the controller. Then the internal network recurrence is explicit. In either case, an unrealistic spatial representation of time is unnecessary to generate dynamic patterns.

[Fig. 7.5. a Controller with time remaining (duration minus current time) as input. b Controller with duration only (encoded as frequency of pulse) as input. Internal decrement updates representation of time remaining.]

The optimal controllers of Chapters 2 and 4 had as input the time remaining. The neural controller in the BAC architecture of Fig. 7.1 has current time as an input. In general, there are two pieces of information in the temporal signal: The movement duration and the time for which the movement has progressed, expressed either as current time or as time remaining. Jordan (1988) showed that a trajectory generation network could be parametrized by a set of plan inputs, which specify which of a set of trajectories is to be generated. Different trajectories could be trained into the network with different plan inputs, and later recalled appropriately.
In the case of our BAC application, the plan inputs would specify both the target location and the desired movement duration, which could be either calculated by a dedicated network performing an analysis such as that described in Chapter 3, or learned along with the trajectory in the optimization procedure. (The duration could further be subject to coordination delays as modeled in Chapter 4.) In addition to the planned duration, the current time is generally indicated as having external origin, although the idea of a global synchronizing clock is biologically unpalatable. However, it is straightforward to show how internally a controller might calculate duration remaining from an initial duration signal. Fig. 7.5a shows a controller which depends on a time remaining input. Fig. 7.5b shows a modified structure which initially accepts a duration signal, then decrements it by the time step, s, to maintain an internal representation of time remaining. Clearly, this is an ad hoc internal clock model; in reality the chronological function would likely come from dynamic network properties, with time being implicitly coded in the network state. The point is to illustrate that timing may come from an initial duration instruction; a global clock is unnecessary for the network's function.

Schneider and Zernicke (1989) found that during a reaching task, jerk decreased with practice. The task specified start and end points, and a barrier which the hand had to avoid, but the subject was free to choose any trajectory, and did so such that jerk decreased with time.
This result supports the notion that we learn coordinated movement not only to correctly perform the specified task, but also to be efficient in doing so, and that the tendency toward efficiency is internally motivated, not instructed.

Chapter 8
Conclusion

We summarize the results of this dissertation, emphasizing the cumulative nature of the sequence of developed models.

We found that a model of control based on continuous afferent integration, tuned with a minimum of supervision, under performance criteria incorporating efficiency of movement, accuracy, and speed, can reproduce findings from a variety of motor behavior studies: In Chapter 4, we found that transport and preshape are coordinated via their timing, and this coordination was explained by "Maximum time" synchronization and a "Constant enclose time" constraint; that normal and perturbed transport and preshape trajectories were based in optimality principles for movement efficiency (smoothness) with a penalty for aperture added to preshape; and that delays in information flow between sensorimotor programs for reach and grasp affected timing and kinematics of these actions. In reaching to grasp and reaching to point, the different tasks put different constraints on the final state of the hand. The differing constraints at the final time affected the entire trajectory. We were able to reproduce timing and kinematic data from reaching-to-grasp experiments, and to predict future results.
We were able to model the effect of perturbation of target location and size, and the trajectory of reaching, when the subject grasped with the hand at a particular orientation with respect to the target.

In Chapter 3, we showed that duration in movement was a trade-off between efficiency and quickness, introducing a cost criterion which was an extension of the original efficiency (minimum-jerk) criterion, and hence retained the earlier results. In Chapter 5, we showed that the speed/accuracy trade-off and velocity profile characteristics during accurate reach were based in combined optimization of accuracy and smoothness. Again, the accuracy based criterion was shown to be an extension of the minimum-jerk model to address high-accuracy movements as well as low-accuracy ones. A single delayed-feedback control model explained stereotypical reaching movements which were previously thought to be bi-modal in the nature of their control.

Arm trajectories toward extreme portions of the reachable space, and differing reaction times in reach to target perturbations in different directions, were explained using a model of limb dynamics in Chapter 6. In the proximal workspace, results were similar to those of the kinematic model so, as with the earlier models, the dynamics-based model provides an enlargement of the set of reproducible experimental results. Finally, in Chapter 7 we showed that optimality is learnable by a self-organizing system.

As we developed the motor control models that allowed us to draw these conclusions, we introduced to the field a novel fusion of optimization and control, where we simultaneously explained movement kinematics and model-based integration of afferent and efferent signals.
We showed how complex movement patterns may come about as a result of interaction of controller and plant, giving a new perspective on trajectory "planning."

Appendices

Appendix A. Analysis of Variance (ANOVA)

The Idea Behind ANOVA

The problem to be addressed is that of deciding whether several sets of values were generated by a single process or by different processes. Consider the two gaussian distributions in Fig. A.1a. Since the values in each group cluster around their mean, and the means are far apart, we assume that the groups were generated by different processes. It is unlikely that the values were chosen from a single population and "just happened" to come out so different in each group. Now consider the two groups in Fig. A.1b. Again they each are distributed around their respective means, but now the separation of the means is small compared to the spread of values (or variance) in the groups. It could be that each group was generated by a separate process with its own stochastic distribution, or it might be that the values were randomly chosen from a single population, and the difference between the means is due to random differences in the selected values. The Analysis of Variance (ANOVA) provides an objective method of determining which is the case, and does so for two or more groups of values simultaneously.

[Fig. A.1. a Two gaussian distributions of values, generated by different processes. b Two gaussian distributions of values, possibly generated by the same process.]

Now consider a group of n values $\{x_i\}$ selected from a large population. Further, consider the quantity,

$$\frac{1}{n} \sum_{i=1}^{n} (x_i - z)^2$$

What value of z minimizes this quantity?
To answer the question, we differentiate the expression with respect to z and set the derivative equal to zero,

$$-\frac{2}{n} \sum_{i=1}^{n} (x_i - z) = 0$$

or, simplifying,

$$\sum_{i=1}^{n} x_i - n z = 0$$

so that

$$z = \frac{1}{n} \sum_{i=1}^{n} x_i$$

That is, when z is the mean of the group, the quantity is minimized. Coincidentally, when z is the mean, the expression is identical to the formula for the variance of the group.

[Fig. A.2. Graphic representation of individual means and population mean.]

In Fig. A.2 we show the means for individual groups $\{x_i\}$ and $\{y_i\}$ and the mean of the entire population, T. ANOVA compares the sum of the variances of the groups when computed using their own means to the variance of the entire population using the population's mean. As the simple derivation above demonstrates, the latter variance measure will always be larger. (The difference diminishes, however, as the individual means approach the population mean, as is the case when large groups are chosen from a single population.)

ANOVA judges the significance of the difference between the population variance and the individual variances by comparing this difference to the variance itself. As shown in Fig. A.1b, differences may be insignificant when they are small compared to the variance of values. Lastly, the analysis is done, as is often the case in statistical studies, in terms of rejecting a null hypothesis. The null hypothesis is that the groups are extracted from a single population and that discrepancies in their means come from intrinsic randomness in the sampled values. The experimenter is to choose a p value, typically .05 or .01.
If p is .01, the process seeks to reject the null hypothesis at the .01 level, meaning that there is less than a .01 chance that the discrepancy came about from random fluctuations, versus a genuine difference between the groups. (This is also called rejecting the null hypothesis at the 99% confidence level.) Intuitively, the desired rejection depends on having 1) a large enough difference between the groups, 2) a small enough variance, and 3) a large enough sample size. If few values are examined and the variance of the population is large, then it is quite probable that even though the groups of values are chosen from the same population, the groups may have a large discrepancy. We now turn to the formal treatment of these intuitive ideas.

ANOVA Formulae

The quantities and terms used here follow Hardyck and Petrinovich (1969). There are three key quantities related to the computation of population variance: The total mean square, $s_T^2$, the mean square within groups, $s_W^2$, and the mean square between groups, $s_B^2$. The first is the variance computed using the population mean, the second is the variance computed using the group means, and the third is the difference between the two. The formulae for their computation are,

$$s_T^2 = \frac{1}{N-1} \sum_{i \,\in\, \text{all groups}} (x_i - T)^2$$

$$s_W^2 = \frac{1}{N-k} \sum_{j=1}^{k} \; \sum_{i \,\in\, \text{group } j} (x_i - \bar{x}_j)^2$$

$$s_B^2 = \frac{1}{k-1} \sum_{j=1}^{k} n_j \, (\bar{x}_j - T)^2$$

where N is the total number of sampled values, k is the number of groups (≥2), T is the total population mean, $\bar{x}_j$ is the mean for group j, and $n_j$ is the number of samples in group j. Ideally, if the null hypothesis were true, $s_B^2$ would be zero, for each of the group means would equal the population mean. In reality, $s_B^2$ is greater than zero, and analyzing its significance involves comparing $s_B^2$, the variation between groups, to $s_W^2$, the variation within groups.
This comparison, called the F-ratio, is defined as,

$$F = \frac{s_B^2}{s_W^2}$$

This value is then compared against an ANOVA table (e.g. Table A-4 in Hardyck and Petrinovich, 1969). Given a p level, a number of groups k and a sample size N, this table provides values of the F-ratio whose probability of occurrence is less than p if the null hypothesis is true. (Thus, for a sufficiently low p value, the experimenter can be quite sure that the F-ratio listed in the table, or any higher value, occurs only in the presence of a valid experimental effect.) The listed F-ratio increases as N decreases or p decreases. F-ratio values are typically given for p=.01 and p=.05.

Application to Enclose Time (ET) Analysis

From the experiment of Paulignan et al. (1991a) we obtained timing data for reach and grasp under various experimental conditions. Our hypothesis was that the enclose time (ET) for the hand (defined as movement time, MT, minus time to maximum grip aperture, TGA) was constant despite perturbations in object location, which do affect MT, TGA, and other measured values.

There were six groups of ET values. "Blocked trials" of reach and grasp were performed to targets at 10°, 20°, and 30° of eccentricity. (For further description of the experimental apparatus, see Chapter 4.) Then "control trials" (C20) were performed, where subjects reached to the center (20°) target. Among the C20 trials were interspersed unexpected "perturbation trials," where the target was moved either leftward to the 10° target (PL) or rightward to the 30° target (PR). Group sizes and mean ET values are given in Table A.1.

A six group ANOVA was performed with the 385 samples, yielding an F-ratio of 3.99.
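For readers who want to reproduce an F-ratio of this kind, the computation can be sketched in C as follows. This is an illustrative sketch only: the function name `anova_f_ratio` and its argument layout are hypothetical, and it assumes the usual one-way ANOVA normalization, with k-1 degrees of freedom between groups and N-k within groups. It does not reproduce the original analysis software, and it does not look up significance thresholds, which still come from a table.

```c
#include <stddef.h>

/* One-way ANOVA F-ratio (mean square between / mean square within).
 * groups[j] points to the n[j] samples of group j; k is the number of groups.
 * Assumes k >= 2, more samples than groups, and nonzero within-group
 * variation (otherwise the ratio is undefined). */
double anova_f_ratio(const double *const *groups, const size_t *n, size_t k)
{
    size_t N = 0;
    double total = 0.0;

    /* Grand mean T over all samples. */
    for (size_t j = 0; j < k; j++) {
        for (size_t i = 0; i < n[j]; i++)
            total += groups[j][i];
        N += n[j];
    }
    double T = total / (double)N;

    double ss_between = 0.0;  /* sum over groups of n_j * (mean_j - T)^2 */
    double ss_within = 0.0;   /* sum over samples of (x_i - mean_j)^2    */
    for (size_t j = 0; j < k; j++) {
        double sum = 0.0;
        for (size_t i = 0; i < n[j]; i++)
            sum += groups[j][i];
        double mean_j = sum / (double)n[j];

        ss_between += (double)n[j] * (mean_j - T) * (mean_j - T);
        for (size_t i = 0; i < n[j]; i++) {
            double d = groups[j][i] - mean_j;
            ss_within += d * d;
        }
    }

    double ms_between = ss_between / (double)(k - 1);
    double ms_within = ss_within / (double)(N - k);
    return ms_between / ms_within;
}
```

The returned value is then compared with the tabulated minimum F-ratio for the chosen p level, number of groups, and sample size.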
For the number of groups and samples used, the minimum F-ratio for significance at the 95% confidence level is 2.23 and at the 99% confidence level is 3.06, so there is clearly an effect. However, it is not clear that the effect is due to perturbation instead of some other experimental factor. To resolve this, we inspected the mean values in Table A.1, seeing that they clearly cluster around 200 ms for the blocked trials and 180 ms for the control/perturbed trials.

    Group    n_j    ET (ms)
    B10      70     213.5
    B20      77     204.9
    B30      78     211.8
    C20      56     185.4
    PL       48     177.4
    PR       56     179.3

Table A.1. Group sizes and means for Enclose Time (ET) data.

We performed a three group ANOVA for the 160 samples from the control/perturbed trials, obtaining an F-ratio of .271. The minimum F-ratio values are 3.06 for the 95% level and 4.75 for the 99% level, so the between subject variance is far from significant, i.e. perturbation has no effect on enclose time. Performing the ANOVA on the 225 samples from the blocked trials produced an F-ratio of .314, again well below the 95% level of 3.04 and the 99% level of 4.71. This implies that target location does not affect enclose time. Together, these results imply that the effect seen in the six group ANOVA is due to the blocked versus control factor rather than the perturbation. Chapter 4 further discusses the significance of these results for the model of coordination of reach and grasp.

Appendix B. Description of the NSL Simulation of the Transport / Prehension Model

A NSL simulation is constructed in terms of interacting run modules which collectively produce the activity of a complex, dynamic system. For the model discussed in Chapter 4, the run modules simulate the system controlling reach and prehension, schematized in Fig. 4.4.
The system simulates two-dimensional hand reaching trajectories, and the aperture formation of the hand in preshape and grasp. Target location and size are represented, and optionally perturbed. The needed duration is calculated for each of reach and grasp, and this time information is shared as shown in Fig. 4.4. Both spatial and temporal information are subject to communication delays, also shown in the diagram. Trajectory generation for each of reach, preshape, and enclose is based on the interaction of a simulated feedback controller and a simulated plant, represented kinematically. The input to each controller is the plant state, target location, and planned duration. The output of the simulation is the time course of the kinematics of the wrist and finger aperture, in terms of position, velocity, and acceleration.

The following is a list of the modules which compose the simulation, along with a brief description of their function. Along with each module description are the parameters relevant to that module. All time values have units of seconds. All distance values are in cm.

General Simulation Parameters
    delta          .02       simulation time step
    end_time       .800      simulation duration

RUN_MODULE(location) - sets and perturbs target location, providing simulation input.
    TTARGET        (2,24)    (x, y) location of center target
    LeftTarget     (-6,27)   (x, y) location of target for left perturbation
    RightTarget    (8,21)    (x, y) location of target for right perturbation

RUN_MODULE(size) - sets and perturbs target size.
    Pinitial           4       initial gap between fingers (1)
    SmallObjectSize    6.2     aperture when gripping small object
    LargeObjectSize    10.6    aperture when gripping large object

(1) All aperture values include offsets of IRED markers from fingers, about 4 cm.

RUN_MODULE(TargToMaxaperture) - maps target size to hand maximum aperture, following the formula (from Sect. 4.3.1):

    Maxap = .75 * Dowel diameter + 4.55 cm

Calculated maximum aperture contributes to the target state used by the PreshapeTrajGen RUN_MODULE below.
    ta_b    4.55    intercept of linear relationship between max. aperture and target size
    ta_m    0.75    slope of linear relationship between max. aperture and target size

RUN_MODULE(TtimeNeeded) - calculates time needed by transport process, based on nominal transport time and whether target location has been perturbed. Also it implements the delays $\Delta_l$ and $\Delta_{lp}$ seen in Fig. 4.4.
    TMVMT_TIME    .510     minimum duration needed by transport
    ptrbTime      0.000    time of perturbation (applies to size and location)
    locDelay      0.100    $\Delta_l$, discussed in Sect. 4.4
    LPDelay       0.100    $\Delta_{lp}$, discussed in Sect. 4.4

RUN_MODULE(PtimeNeeded) - calculates time needed by preshape process, based on nominal preshape and enclose times, and whether target size has been perturbed. It implements the delays $\Delta_s$ and $\Delta_{sp}$ seen in Fig. 4.4.
    PMVMT_TIME    .310     minimum duration needed by preshape
    ET            0.200    Enclose time: Time lead of maximum aperture before movement end.
    sizDelay      0.250    $\Delta_s$, discussed in Sect. 4.4
    STDelay       0.050    $\Delta_{sp}$, discussed in Sect. 4.4

RUN_MODULE(TMAX) - calculates the movement time used by the transport controller, based on needed durations for transport and preshape, as shown in the left box labeled "MAX" in Fig. 4.4.

RUN_MODULE(PMAX) - calculates the movement time used by the preshape controller, based on needed durations for transport and preshape, as shown in the right box labeled "MAX" in Fig. 4.4.

RUN_MODULE(TtrajGen) - implements the "Transport Feedback Controller" in Fig. 4.4, mapping the hand's state (position and velocity), target position, terminal acceleration, and the specified duration into the hand's driving input, using the formula of Sect. 4.3.2 (Eqn. 4.7),

$$u_i = \frac{60\,\Delta x_i}{D^3} - \frac{36\,v_i}{D^2} - \frac{3\,(3 a_i - a_{f,i})}{D}$$

Also implements the simple kinematic model for the hand's transfer function (integrating the acceleration command into velocity and position), and the feedback delay, $\Delta_p$.
    appAngl      90    angular direction of hand's approach to target (deg.)
    termAccel    -5    final time constraint on acceleration magnitude (m/s²)

RUN_MODULE(PreshapeTrajGen) - implements the simple kinematic model for the aperture's transfer function (integrating the acceleration command into velocity and position), and the feedback delay, $\Delta_p$. Implements the "Preshape Feedback Controller" in Fig. 4.4, mapping the aperture's state (position and velocity), the goal aperture, and the specified duration into the aperture acceleration, using the formula derived in Sect. 4.3.3 (Eqn. 4.21). Triggers the RUN_MODULE "EncloseTrajGen" when preshape is complete.
    Pwt    0.09    Defines $\tau$ in Eqn. 4.21: Pwt = $\sqrt{2}\,\tau$.

RUN_MODULE(EncloseTrajGen) - similar to "PreshapeTrajGen," it controls the hand's aperture during the enclose phase.
RUN_MODULE(dowelContact) - simulates the inflexible physical target by terminating hand closure when target diameter is reached.

The system's initialization modules are as follows:

INIT_MODULE(TINCinit) - initializes time variables for perturbation simulation.

INIT_MODULE(perturb_init) - sets parameters for movement time increase, for each type of perturbation.

INIT_MODULE(transport_init) - initializes state variables associated with transport controller and plant.

INIT_MODULE(preshape_init) - initializes state variables associated with preshape controller and plant.

INIT_MODULE(delayInit) - initializes latency constructs and associated variables.

INIT_MODULE(XYgraphInit) - initializes interface to graphics display.

Appendix C. Matrix Differential Calculus

We describe the convention used for differentiation of vectors and matrices, and the format for the chain rule under this convention. If z is a function of x, i.e. z(x), z is an n dimensional function (z(x) is an n x 1 vector) and x is an m dimensional vector, then we define the first derivative, or Jacobian, to be:

$$\frac{\partial z}{\partial x} = \left[ \frac{\partial z_i}{\partial x_j} \right]_{i,j}$$

i.e. an n x m matrix. The corollaries are 1) that the derivative of a vector function of a scalar variable has the form of a column vector, and 2) that the derivative of a scalar function of a vector variable, or gradient, is a row vector. For z(y(x)), the chain rule has the form,

$$\frac{\partial z}{\partial x} = \frac{\partial z}{\partial y} \, \frac{\partial y}{\partial x}$$

This can be verified by considering one component of the Jacobian,

$$\left( \frac{\partial z}{\partial x} \right)_{i,j} = \left( \frac{\partial z}{\partial y} \, \frac{\partial y}{\partial x} \right)_{i,j} = \sum_k \frac{\partial z_i}{\partial y_k} \, \frac{\partial y_k}{\partial x_j}$$

where the rightmost expression is the standard formula for the chain rule.
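Under this convention the chain rule is an ordinary matrix product of Jacobians. The following C sketch makes that concrete; the function name `jacobian_chain` and the row-major storage layout are assumptions of the example, not part of the original text.

```c
#include <stddef.h>

/* Chain rule as a Jacobian product: dz_dx = dz_dy * dy_dx.
 * All matrices are stored row-major:
 *   dz_dy is n x p, dy_dx is p x m, and the result dz_dx is n x m.
 * Entry (i, j) is the sum over k of (dz_i/dy_k) * (dy_k/dx_j). */
void jacobian_chain(const double *dz_dy, const double *dy_dx, double *dz_dx,
                    size_t n, size_t p, size_t m)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < m; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < p; k++)
                sum += dz_dy[i * p + k] * dy_dx[k * m + j];
            dz_dx[i * m + j] = sum;
        }
    }
}
```

As a usage example, take y = (x0 + x1, x0 x1) and the scalar z = y0 y1, evaluated at x = (2, 3). Then dy/dx is the 2 x 2 matrix [1 1; 3 2], dz/dy is the row vector [6 5], and the product is the 1 x 2 row vector [21 16], which is the gradient of z with respect to x in row form, as the second corollary requires.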
Applying the chain rule to the derivative of the critic output, J (of Chapter 7), with respect to the weights of the action network, w_A, we have,

$$\frac{\partial J}{\partial w_A} = \frac{\partial J}{\partial R(t+s)}\,\frac{\partial R(t+s)}{\partial u(t)}\,\frac{\partial u(t)}{\partial w_A}$$

and finally, using the gradient notation where appropriate,

$$\nabla_{w_A} J = \nabla_{R(t+s)} J\;\frac{\partial R(t+s)}{\partial u(t)}\,\frac{\partial u(t)}{\partial w_A}$$
Asset Metadata
Core Title
00001.tif
Tag
OAI-PMH Harvest
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-oUC11255735
Unique identifier
UC11255735
Legacy Identifier
DP22847