Minimum Jerk Model for Control and Coarticulation of Arm Movements with Multiple Via-Points

by Oziel de Oliveira Carneiro

Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science
Computer Science – Intelligent Robotics
University of Southern California
May 2015

Dedication

To my late grandfather Oziel R. Carneiro, from whom I’ve learned the value of honesty, knowledge and hard work.

Acknowledgments

I would like to thank many people who helped me finish this thesis and achieve my Master’s degree.

To my parents Clóvis and Angélica, for the support, encouragement and educating me to constantly look for instruction, as The Bible says “Receive ye discipline as a great sum of money, and possess abundance of gold by her.” (Sirach 51:36 Douay-Rheims 1899 American Edition).

To my advisor Professor Michael Arbib, for dedicating time and patience, sharing his knowledge, and guiding me through the dense world of Computational Neuroscience.

To Professors Nicolas Schweighofer and Stefan Schaal, for agreeing to be members of the Master’s Committee, reviewing and approving this Thesis.

To Victor Barrès and Brad Gasser, my colleagues, for discussing my ideas and sharing their experience.

To my brother João Pedro, for the companionship and friendship dedicated throughout the years.

To my roommates and friends, for the non-scientific activities and discussions.

To the USC Triathlon Team and Coach Rad for the many hours spent suffering through the pains of being an amateur athlete while pursuing a degree.

Abstract

This thesis proposes a kinematic model designed to simulate arm movements with one or multiple via-points, in order to achieve continuous trajectories with one or more curves and straight segments (such as the ones present in handwriting). The model's main unit is designed as a feedback minimum jerk control unit, an extension to the Hoff & Arbib (1993) and Flash & Hogan (1985) minimum jerk reach controllers.
In addition to the feedback unit, an automated learning procedure is suggested: a modified Hill-Climbing algorithm that adapts multiple via-point trajectories from highly segmented ones (composed of a sequence of straight movements) to more continuous and smoother trajectories. The development of the model is based on reviewed studies that identify or try to model the many characteristics of human arm control, at both the kinematic level and the level of brain area functionality. Once the model is explained, parallels between it and brain function are drawn.

Table of Contents

List of Tables
List of Figures
List of Algorithms
Chapter 1: Introduction
  1.1 Goal of the thesis
  1.2 Organization of the thesis
Chapter 2: Human Limb Control
  2.1 – Bell-Shaped velocity profiles in movement
  2.2 – Movement Parameter Representation in the Cortex
    2.2.1 – Target Direction and Amplitude
    2.2.2 – Velocity and Acceleration
  2.3 – Sequencing and Via-Point movements
  2.4 – Primitives
  2.5 – Curvature, Tangential Velocity and the Two-Thirds Power Law
Chapter 3: Limb Control Models
  3.1 – Straight Reach Models
    3.1.1 – Flash & Hogan (1985) Min-Jerk model
    3.1.2 – Bullock & Grossberg (1988) Vector-Integration-To-Endpoint (VITE) Model
    3.1.3 – Hoff & Arbib (1993) Reach and Grasp
  3.2 – Curved Movement Models
    3.2.1 – Flash & Hogan (1985) Min-Jerk with via-point
    3.2.2 – Bullock, Grossberg, & Mannes (1993) VITEWRITE
    3.2.3 – Grossberg & Paine (2000) AVITEWRITE
    3.2.4 – Han (2009) Virtual Target Model
Chapter 4: Coarticulation Acquisition Minimum Jerk Model
  4.1 – Introduction to proposed model
  4.2 – Minimum Jerk Feedback Controller
  4.3 – Sequence Queue
  4.4 – Rotation, Size and Speed of Execution Scaling
  4.5 – Learning Mechanism
  4.5 – Long Term and Working Memories
  4.6 – Inverse Dynamics, Plant and Current State Perception
Chapter 5: Simulations and Results
  5.1 – One via-point trajectories simulations
    5.1.1 – Targets aligned in a straight line
    5.1.2 – Targets aligned in a concave curve
    5.1.3 – Targets aligned in a convex curve
    5.1.4 – Targets aligned in a semi-circle
  5.2 – Multiple Via-points trajectories simulations
    5.2.1 – Targets configured to resemble a handwritten number two
    5.2.2 – Targets configured to resemble a lower case letter G
  5.3 – Rotation, size and speed of execution scaling simulations
  5.4 – Tangential Velocity, Curvature and Two-Thirds Power Law
  5.5 – Discussion
Chapter 6 – Conclusions
  6.1 – Computational Characteristics of the Model
  6.2 – How the model relates to the Brain
    6.2.1 – Feedback Module
    6.2.2 – Inverse Dynamics and Plant
    6.2.3 – Current State Perception and Working Memory
    6.2.4 – Evaluator
    6.2.5 – Long-Term Memory
    6.2.6 – Rotation, Size and Speed Scaling
    6.2.7 – Sequence Queue
  6.3 – Future Work
References
Appendix A: Matlab Code

List of Tables

Table 1. Parameters values for targets aligned in a straight line
Table 2. Evolution of state and goal during learning of straight line
Table 3. Parameters values for targets aligned in a concave curve
Table 4. Evolution of state and goal during learning of concave curve
Table 5. Parameters values for targets aligned in a convex curve
Table 6. Evolution of state and goal during learning of convex curve
Table 7. Parameters values for targets aligned in a semi-circle
Table 8. Evolution of state and goal during learning of semi-circle
Table 9. Parameters values for targets aligned to resemble a handwritten number two
Table 10. Evolution of state and goal during learning of handwritten number two
Table 11. Parameters values for targets aligned to resemble a handwritten lower case letter G
Table 12. Evolution of state and goal during learning of handwritten lower-case letter G

List of Figures

Figure 4.1. Hoff & Arbib (1993) reach model schema diagram
Figure 4.2. Proposed model schema diagram
Figure 4.3. Detailed wiring used in simulation between Feedback Unit and Sequence Queue
Figure 5.1.
Velocity profiles of simulations during learning when targets are aligned in a straight line
Figure 5.2. Evolution of trajectory during learning of concave curve
Figure 5.3. Velocity profiles of simulations during learning of concave curve
Figure 5.4. Evolution of trajectory during learning of convex curve
Figure 5.5. Velocity profiles of simulations during learning of convex curve
Figure 5.6. Evolution of trajectory during learning of semi-circle
Figure 5.7. Velocity profiles of simulations during learning of semi-circle
Figure 5.8. Evolution of trajectory during learning with four targets resembling a handwritten number two
Figure 5.9. Velocity profiles of simulations during learning with four targets resembling a handwritten number two
Figure 5.10. Evolution of trajectory during learning with eight targets resembling a handwritten letter G
Figure 5.11. Velocity profiles of simulations during learning with eight targets resembling a handwritten letter G
Figure 5.12. Trajectory and velocity profiles of the movement pattern resembling a handwritten number two when rotated 45° clockwise
Figure 5.13. Trajectory and velocity profiles of the movement pattern resembling a handwritten number two with size scaled by a factor of three
Figure 5.14.
Trajectory and velocity profiles of the movement pattern resembling a handwritten number two with execution speeded up by a factor of two
Figure 5.15. Relationship between tangential velocity and curvature
Figure 6.1. Brain areas overlaid on top of the model’s schema diagram

List of Algorithms

Algorithm 4.1. Modified Hill-Climbing Algorithm for minimum jerk optimization of trajectory with one via-point
Algorithm 4.2. Modified Hill-Climbing Algorithm for minimum jerk optimization of trajectory with multiple via-points

Chapter 1: Introduction

1.1 Goal of the thesis

Human arm movement has been studied for decades with the goal of better understanding human actions and how the brain generates them, or of creating computational systems that are able to replicate these kinds of actions. The motivation behind these studies varies from simply increasing the knowledge of human life to the development of technologies, such as humanoid robots, that behave like humans. These studies often propose computational brain models as a way to test hypotheses suggested by them or by others, or as a way to test the feasibility and efficiency of technologies being developed based on the human brain. Therefore, advances in models may offer further advancements to theoretical neuroscience and to bio-inspired technologies.

The goal of this thesis is to propose a computational model of human arm motion based on neuroscience studies, kinesiology studies and other computational models of human arm control. While developing the model, two concurrent guidelines will be followed: (a) usage of biological characteristics as the main characteristics to be reproduced; and (b) development of a model that is computationally efficient and simple.
These guidelines are designed as an attempt to develop a model that could be useful to both biological studies and technology development.

Most computational models of arm movement proposed in the literature revolve around straight or single curved reaching movements. Reaching movements are one of the most important actions humans can perform with limbs. This type of movement also makes data easier to capture: not only is it easy to design experiments for it, it is also an action common to other mammals, which allows researchers to perform more detailed experiments. On the other hand, more complex movements, defined by Morasso, Ivaldi, & Ruggiero (1983) as continuous movements composed of multiple curved and straight segments, pose more difficulties to researchers, mainly due to the scarcity of animals capable of such behavior. However, these movements also hold great importance in the human repertoire of actions, as they are present in many activities, such as writing, drawing, and the use of sign languages.

The model proposed in this thesis is designed to simulate not only simple reach movements, but also single curved and more complex trajectories. Once developed, the model will be tested by running simulations that perform planar trajectories. The simulations are used to show how the model is able to account for characteristics present in human limb control, while maintaining a low computational cost.

1.2 Organization of the thesis

Chapter 2 is a brief review of biological studies regarding human arm movement. These studies highlight different characteristics of such action. These characteristics vary from kinematic profiles to the representation of motion data by the neural circuits responsible for movement in the brain.

Chapter 3 discusses some models of arm movement that use the characteristics highlighted in chapter 2 as the basis for their development.
Models related to reaching, single curved movements and movements with multiple curves are listed, with emphasis on their contributions to the development of this work.

In chapter 4, the model is proposed, with details of its development. Individual sub-modules are discussed with respect to their responsibilities, how each of them is implemented and simulated, and their interactions.

Chapter 5 presents the simulation results. Each task is explained, and the results and the simulated data are displayed as tables and graphs. These results are then discussed and aligned with the characteristics reviewed in chapter 2.

Chapter 6 concludes the thesis by reflecting on how the proposed model achieved the desired goals. It also discusses further developments and applications of the model.

Chapter 2: Human Limb Control

In order to properly address the task of developing a model that is capable of performing movements based on human limb control, it is necessary to identify the biological aspects of human limb control as it relates to the nervous system. Since movement is one of the most important ways in which humans and other animals interact with and modify their environment, motor control has been studied with goals that span from biological interest, which aims to better understand the processes that underlie movement and how they relate to other biological activities, to robotics, which tries to replicate or find inspiration in animals to implement movement in machines. As this project looks into limb control, most of the studies reviewed here fall into this category.

2.1 – Bell-Shaped velocity profiles in movement

Morasso (1981) studied arm movement in a two-dimensional (2D) reaching task, in various directions. His analysis showed that these movements presented different characteristics when analyzed using joint configuration as reference, but had very homogeneous features when analyzed using tangential velocity and position of the hand.
The tangential velocity profile was bell-shaped and the hand trajectory was fairly straight, which could indicate that the brain plans the trajectory of the hand as opposed to the joint configuration.

Abend, Bizzi, & Morasso (1982) extended Morasso's (1981) studies to include 2D curved movements. This new study pointed out that curved movements appear to be segmented, as if composed of two movements with low curvature. In contrast to straight movements, curved movements had irregular tangential velocity profiles with inflections that were aligned with the point of maximum curvature, suggesting a strong dependency between curvature and velocity. It was also noted that curved movements were slower than straight ones.

Morasso (1983), while studying human arm movement in three dimensions, determined that point-to-point movements (straight or curved) are essentially planar and therefore have the same characteristics as the two-dimensional movements identified by Abend et al. (1982). Additionally, Morasso (1983) identifies that curved point-to-point movements are generally 50% slower than straight ones. However, the presence of a third dimension allows the osculating plane to change during the movement. This torsion characteristic was found to be independent of speed and curvature, which remained coupled for non-planar movements. Therefore, Morasso (1983) suggests that human arm movement is composed by chaining and overlapping strokes, defined as planar low-curvature movements with a bell-shaped velocity profile. The torsion profile would originate from the sum of two concurrent strokes with different osculating planes.

2.2 – Movement Parameter Representation in the Cortex

2.2.1 – Target Direction and Amplitude

Georgopoulos, Schwartz, & Kettner (1986) and Georgopoulos, Kettner, & Schwartz (1988) studied the behavior of cells in the primate Motor Cortex (MC) during arm movement in 3D space.
They indicate that individual cells respond to a particular direction of movement in space. If each cell were represented as a weighted vector, with direction set to the direction of strongest response (the preferred direction) and amplitude proportional to cell activity, the sum of all vectors representing cells that belong to the same population would indicate the overall direction and amplitude of the movement being performed. The paper also indicates that in a different study multiple origins in front of the animal were used and the population coding remained the same, indicating that this representation uses the end-effector position as reference.

Cisek & Kalaska (2005) showed that, during target selection, neurons in the Dorsal Premotor Cortex (PMd) also respond to the direction of the targets being considered for selection, and once the selection is performed the population activity of PMd indicates the direction of the planned movement.

These studies show that movement in reach-to-target tasks is done through feedback control, as the target position is compared to the current hand position to compute the desired movement vector represented by the neurons. Also, the Motor Cortex encodes information during movement execution, and the Premotor Cortex during movement planning.

2.2.2 – Velocity and Acceleration

Ashe & Georgopoulos (1994) investigated the influence of four movement parameters (target direction, velocity, position and acceleration) on the activity of neural populations related to arm movement in both the Motor Cortex and Area 5 of the Parietal Cortex. To this end, they analyzed the activity of cells in the arm area of the Motor Cortex and Area 5 of monkeys performing a reach-to-target task, in which the monkey had to move a handle from the center to eight radially arranged targets with planar movements.
Their findings indicate that even though target direction is the most significant variable in the information represented by the population coding of these areas, the other three parameters also play a significant role. Movement velocity is the parameter that most affects cells after direction, and acceleration is the parameter that affects them least.

2.3 – Sequencing and Via-Point movements

Sosnik, Hauptmann, Karni, & Flash (2004) report that training on a sequence of planar movements passing through different targets eventually yielded coarticulation between the multiple segments, transitioning from a segmented movement of straight lines to smoother, continuous curved movements. The emergence of curved movements from a sequence of discrete movements to targets aligns with how Flash & Hogan (1985) describe curved movements: as movements required to pass through a secondary target between the start and end positions. The data captured by Sosnik et al. (2004) and the Flash & Hogan (1985) model indicate that the via-point target is the point of minimal velocity in the trajectory. This is relevant because the Sosnik et al. data showed that this point is not necessarily one of the targets the task requires the movement to pass through, and with learning it may move along the path so that the trajectory becomes optimal in both time and jerk.

Lashley (1951) theorized that action sequencing is not done as a chain of events, but as a parallel activation of all the elementary units constructing the sequence together with serial highlighting of the current piece in the sequence. Averbeck, Chafee, Crowe, & Georgopoulos (2002) investigated the neural activity in the prefrontal cortex of two monkeys while they drew 2D geometrical shapes (polygons) that were displayed to them as a template.
The activity of the neurons could be directly related to the segment of the shape being drawn, because activity patterns were highly differentiated across segments. These segments therefore seem to be the elementary units represented by the population code. Using this information, it was noted that during the drawing of a shape the population presented activity corresponding to all the segments used in the image, but the strongest activity during a given period was that of the neurons corresponding to the segment being drawn. This supports Lashley's (1951) theory of cotemporal activation of serially ordered action units. Additionally, the Averbeck et al. (2002) data suggest that the order of the sequence is indicated by the strength of these activities before the sequence starts: before the start, the first action unit is the strongest, the second is the “runner-up”, and so forth. As the sequence advanced, the representation of the next segment started rising when the representation of the current segment peaked and started decaying, which was roughly in the middle of its execution, so that the two representations approximately crossed at the end of one segment and the beginning of the other.

2.4 – Primitives

Schaal (2003) points out that the idea of primitives comes from the ability of biological systems to perform such a versatile and creative set of continuous movement patterns in their daily activities that it yields the assumption that these complex movements are performed as a sequence of ordered and overlapped (partially or fully) segments. This description aligns with the results Abend et al. (1982) and Morasso (1983) reported, as mentioned previously. However, Schaal's (2003) primitives are more robust than the strokes model Morasso (1983) suggests, as the primitives can be dynamically changed according to a certain goal (Ijspeert, Nakanishi, Hoffman, Pastor, & Schaal, 2013). Sosnik et al.
(2004) and Sosnik, Flash, Hauptmann, & Karni (2007) show that the coarticulation learned when practicing a sequence of movements carries over to new target configurations that have the same geometrical organization of targets, such as mirrored targets or a rotated sequence. This can be seen as the ability to dynamically modify an already learned primitive. Additionally, the studies performed by Sosnik et al. (2007) indicate that the set of primitives used was related to the specifics of the task: in tasks that explicitly required strict accuracy, even after the curved movements were already in the subject's repertoire, the targets were traversed along the sequential straight path rather than with a smoother curved movement.

Rohrer et al. (2004) ran a study on stroke patients during recovery, with the assumption that complex movements are composed of simpler movements with bell-shaped velocity profiles. Their results show that as patients improve control over their limbs, the number of simple movements decreases, while their duration and the overlap between them increase. This indicates that as the skill in a task increases and it is performed with smoother motion, the sub-movements used become more robust.

2.5 – Curvature, Tangential Velocity and the Two-Thirds Power Law

The study of two-dimensional curved movements by Abend et al. (1982) included an interesting analysis of the curvature of the trajectories. The trajectory curvature is computed using a standard formula defined as:

$$C(t) = \frac{\dot{x}\,\ddot{y} - \dot{y}\,\ddot{x}}{\left(\dot{x}^{2} + \dot{y}^{2}\right)^{3/2}} \qquad (1)$$

where $C(t)$ is the curvature profile of the trajectory, $\dot{x}$ and $\dot{y}$ are the first-order time derivatives of the x and y coordinates, and $\ddot{x}$ and $\ddot{y}$ are the second-order time derivatives of the x and y coordinates. When comparing the curvature over the whole trajectory to the tangential velocity profile, Abend et al. (1982) found that peaks in curvature corresponded directly to valleys in the velocity profile.
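
As a rough illustration of the curvature computation, the following Python sketch evaluates the standard signed planar curvature (the cross product of velocity and acceleration divided by the cube of the speed) along a test path. The ellipse, the function names and the sampling are assumptions chosen only for illustration; they are not data or code from Abend et al. (1982) or from this thesis, whose own code (Appendix A) is written in Matlab.

```python
import math


def curvature(dx, dy, ddx, ddy):
    """Signed planar curvature from first and second time derivatives."""
    speed_sq = dx * dx + dy * dy
    if speed_sq == 0.0:
        raise ValueError("curvature is undefined where the velocity is zero")
    return (dx * ddy - dy * ddx) / speed_sq ** 1.5


def ellipse_sample(t, a=2.0, b=1.0):
    """Speed and curvature on the hypothetical path x = a*cos(t), y = b*sin(t)."""
    dx, dy = -a * math.sin(t), b * math.cos(t)      # first derivatives
    ddx, ddy = -a * math.cos(t), -b * math.sin(t)   # second derivatives
    return math.hypot(dx, dy), curvature(dx, dy, ddx, ddy)


# Sample one full revolution in 1-degree steps.
samples = [ellipse_sample(2.0 * math.pi * i / 360) for i in range(360)]

# Index of the sample with the highest curvature, and the lowest speed overall.
i_max_curv = max(range(len(samples)), key=lambda i: samples[i][1])
min_speed = min(speed for speed, _ in samples)

print("speed at curvature peak:", samples[i_max_curv][0])
print("minimum speed on path:  ", min_speed)
```

On this particular path the sample with the highest curvature is also the slowest one, which is the same qualitative pattern of curvature peaks coinciding with velocity valleys; a real hand trajectory would need numerically differentiated position data instead of closed-form derivatives.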
Lacquaniti, Terzuolo, & Viviani (1983) further investigated the relationship between curvature radius (the inverse of curvature) and tangential velocity. Because the curvature radius is infinite at inflection points, Lacquaniti et al. used angular velocity (tangential velocity divided by radius) and curvature to derive a mathematical relationship between the two measurements. This relationship was then called the two-thirds power law because it has the form:

$$A(t) = k\,C(t)^{2/3} \qquad (2)$$

where $A(t)$ is the angular velocity profile of the trajectory, and $k$ is a proportionality factor. Since angular velocity equals the tangential velocity divided by the curvature radius ($A = V/R$), and the curvature radius is the inverse of the curvature ($C = 1/R$), the two-thirds power law can be rewritten as:

$$V(t) = k\,R(t)^{1/3} \qquad (3)$$

where $V(t)$ is the tangential velocity profile of the trajectory, and $R(t)$ is the curvature radius throughout the trajectory.

Chapter 3: Limb Control Models

Computational models of neural networks serve multiple purposes across the disciplines they are based on. In neuroscience, for example, they serve as a tool for further investigation of theories through simulations. In Computer Science, they offer a bio-inspired method to make computers behave intelligently. There are computational models for many of the neural system's tasks, such as motor control, image processing, language processing, and other brain activities; Arbib (2003) offers a good review and compilation of such models. Although all of them offer interesting insights into brain activity and behavior, this review is limited to models of arm motor control, which is the main interest of this thesis.

3.1 – Straight Reach Models

In this subsection, models that are designed to simulate tasks involving reaching are reviewed. The goals of these models commonly involve minimizing the path to reach a target while trying to replicate other characteristics humans present while performing the tasks.

3.1.1 – Flash & Hogan (1985) Min-Jerk model

This model presents a successful attempt to create a mathematical description of the hand trajectory during direct reaches that maintains the velocity profile displayed by human movement. This was achieved with an optimization approach in which the variable being minimized was the movement jerk (the derivative of the acceleration) in all axes.

The relevance of this model is the indication that movement may be planned in terms of the end-effector position as opposed to joint configuration. This implies a hierarchical organization of planning: the top layers take few variables into account (i.e. end-effector position), and the bottom layers then transform the information to the more complex domain (i.e. multi-joint configuration).

3.1.2 – Bullock & Grossberg (1988) Vector-Integration-To-Endpoint (VITE) Model

The VITE model is a feedback controller model for reaching that builds on the data found by Georgopoulos and his colleagues (Georgopoulos, Kalaska, Caminiti, & Massey, 1982; Georgopoulos, Kalaska, Crutcher, Caminiti, & Massey, 1984) indicating that cell populations in the motor cortex behave as vector differentiators, as their activity responds to the spatial direction in which the target lies relative to the current hand position. The VITE model therefore computes the vector difference between target and current position and integrates it into the current hand position. Before integration, the difference vector is multiplied by an external signal (the GO signal in the model), which can either control the rate at which the integration happens or stop the integration completely. The GO signal is important to the model because it controls the rate of integration of the difference vector into the current position.
Using a GO signal that is either faster than linear (with respect to time) or sigmoidal allows the model to replicate the bell-shaped velocity profiles seen in biological experiments (Morasso, 1981; Abend et al., 1982; Morasso, 1983).

3.1.3 – Hoff & Arbib (1993) Reach and Grasp

This model is an attempt to present a schema-level overview of how the brain controls and temporally coordinates the arm and hand in reach and grasp movements. Like the Flash & Hogan (1985) model, it uses a minimum jerk approach to describe the reach movement, but instead of a feed forward description it uses feedback control, which allows the model to respond to position variations and disturbances. Additionally, it includes an internal feed forward model that acts as a predictor of the current state (position) of the hand to reduce the dependency on delayed perception, improving the model's accuracy in reaching targets.

The pre-shape and enclosing controllers were also modeled with optimal control, but the cost function used is the sum of the square of the hand opening and the acceleration of this opening, as a smoothness criterion. As the relative importance of these criteria is not known, a weighting parameter is introduced.

All three controllers (transport, pre-shape and enclosing) require a time constant indicating the duration of the action being controlled. These constants are assumed to be estimates made by the brain of the duration of each component. As the experimental data showed, the transport movement is properly synchronized with the hand movement. The brain therefore coordinates all actions in time, scaling the duration of each of them so that the reach and the grasp (pre-shape and enclosing) have the same duration. This coordination is modeled as a max function, which computes the maximum of the time estimates for reaching and for grasping (pre-shape plus enclosing time). This maximum is used as the new time constant for the transport controller.
The enclosing time is kept at its original estimate; the pre-shape time constant becomes the maximum value minus the enclosing estimate.

3.2 – Curved Movement Models

3.2.1 – Flash & Hogan (1985) Min-Jerk with via-point

Flash & Hogan (1985) describe single curved movements as point-to-point movements with an intermediate target, called a via-point. To obtain an equation describing this kind of movement, the same optimization approach used for straight reach movements was applied, with the addition of the via-point as a constraint on the trajectory. This model is able to describe curved movements with a bell-shaped velocity profile, which is the velocity profile displayed by humans when performing similar movements.

Since the model involves the analytical solution of an optimization problem, extending it to multiple via-points in order to achieve more elaborate patterns is very complicated. The extension would require a different solution for each number of via-points.

Sosnik, et al. (2004) used this model to explain the appearance of curved movements from a highly practiced sequence of straight movements. They suggest that the creation of new strokes is done by adjusting the movement representation within the brain. This is based on their ability to replicate the evolution of human movement during practice by slowly altering the via-point position in the min-jerk model.

3.2.2 – Bullock, Grossberg, & Mannes (1993) VITEWRITE

As mentioned before, the performance of the VITE model (Bullock & Grossberg, 1988) is extremely dependent on the GO signal. When performing planar movements, if the GO signals for both directions are synchronized, the model traces a straight trajectory. Since the controllers for each direction are independent of each other, the GO signals do not need to be synchronized. If the GO signals are not synchronized, the VITE model is able to perform curved movements.
Using this characteristic, Bullock, Grossberg, & Mannes (1993) extended the VITE model to perform trajectories resembling handwriting. Trajectory control in VITEWRITE is done by a vector plan, which stores target positions, size-scaling factors and the respective GO signals for each of the targets in each direction. The vector plan outputs one target position with its size scaling and GO signal to each direction controller. The controller then starts the trajectory computation by calculating the difference vector between target and current positions, modulating this value with the GO signal and integrating the modulated value into the current position.

To keep movements smooth, the switch to the next target needs to happen before the current target position is reached. This is done by using the peak of the bell-shaped velocity profile as the cue to move on to the next planning vector, which removes the need to store within-stroke delays to perform this change.

Bullock et al. (1993) do not show how to compute the vector plan for a given trajectory. They also do not account for the evolution from a highly segmented to a smoother trajectory.

3.2.3 – Grossberg & Paine (2000) AVITEWRITE

Proposed as an extension to the VITEWRITE model (Bullock et al., 1993), the AVITEWRITE model of Grossberg & Paine (2000) adds learning ability to VITEWRITE. Learning in the model is done via repetitive retracing of a template curve. It is assumed that, while trying to retrace an unpracticed template curve, the brain uses a sequence of targets on the curve, selected one at a time, and performs reach movements between them. As the number of repetitions over the same curve increases, the brain starts to assimilate the pattern of neuromuscular activation needed to replicate the shape. The activation pattern for a learned trajectory is stored in a structure called the Purkinje Cell Spectrum.
The Purkinje Cell Spectrum is a series of phase-delayed depolarizations of Purkinje cells that are triggered by the conditioned stimulus. In the presence of an unconditioned stimulus (learning signal), these depolarizations undergo Long Term Depression (LTD), which after enough experience reduces the inhibition the Purkinje cell applies to the nuclear cells, generating activation in the nuclear cells.

The learning phase in the AVITEWRITE model works by retracing a template curve. The model uses an algorithm to simulate visual-attention target selection on the presented path and then uses the target to perform a straight reach movement towards it. The algorithm constrains the target to be reachable without exceeding a threshold distance from the original template.

As the targets are reached, the Purkinje cell spectrum slowly learns, through LTD, the pattern of activation for the trajectory. A buffer is used to simulate the working memory, which receives the long-term memory signal. When the working memory output is strong enough, it suppresses the activity coming from the reactive system, and the movement is performed based on the memory information. The difference vector used for integration then becomes either the working memory output or the difference between current position and visual target, depending on whether the memory is active or not. This vector is then scaled by a sizing factor and integrated into the current position at the integration rate indicated by the GO signal.

A target estimation model uses the difference vector to create a memorized target, which is compared with the current position to control the read-out of the Working Memory Buffer. If the memory-driven movement exceeds the threshold distance from the template curve, the reactive system takes control again and reactivates learning. After enough repetition the memory is able to control the movement independently.
3.2.4 – Han (2009) Virtual Target Model

The Hoff & Arbib (1993) model, while not designed for curved movements, performs such movements when the target position is altered in the middle of the execution. Based on this, Han (2009) proposes using a virtual target to start the movement and, at a preset switch time, changing to the final target in order to perform curved movements. The switch time refers to how long the detouring process is required to last. To compute the proper values, Han (2009) used curve-fitting strategies to replicate experimental data.

While Han (2009) offers an interesting approach to curved movements, it is not shown how the Virtual Target model is able to evolve from a sequence of segmented straight movements to a smooth trajectory. Additionally, the use of a virtual target requires the trajectory to be straight until the target is switched, which limits its ability to replicate movements that are curved from beginning to end. On the other hand, the model scales easily to multiple sequential curves within the same movement, as it would only require an additional virtual target and switching time pair for each curve.

Chapter 4: Coarticulation Acquisition Minimum Jerk Model

4.1 – Introduction to proposed model

As previously mentioned (Subsection 2.3), Sosnik, et al. (2004) investigated the appearance of coarticulation between segments of movements while performing a task involving reaching to sequenced targets. The results presented in the study align with the idea that minimum jerk optimality is obtained when using coarticulation, in addition to a reduction in the time cost of the task. In a subsequent study, Sosnik, et al.
(2007) obtained data that may lead to the idea that the initial attempt, when a novel target configuration is presented, is to use already learned primitives, and that from that initial performance adjustments are made to find a better pattern. This also appears to be the case in the study by Rohrer, et al. (2004) on rehabilitating stroke patients, as they improved their control over their arms.

It seems from the Sosnik, et al. (2007) studies that the choice of the primitive to be used has to do with the way the targets are presented to the subject. When the target positions were very similar to a previously practiced configuration, one of the subjects appeared to use the coarticulated movement learned for the original target arrangement. As the movement started and the different configuration was recognized, the subject started to correct the movement in order to traverse all targets. This suggests two things: first, a coupling between the position of the targets and the movement pattern used; and second, the use of feedback during motion, allowing online correction of the trajectory.

The coupling between target position and movement pattern, and the use of feedback control, can already be seen in various reach models (Flash & Hogan, 1985; Bullock & Grossberg, 1988; Hoff & Arbib, 1993); however, they are limited to one or two targets only and do not account for the extension of movement via coarticulation.

In an attempt to remove these limitations of reach models, this thesis proposes an extension to the minimum jerk feedback controller suggested by Hoff & Arbib (1993), together with the use of reinforcement learning to perform the adjustments from an initial performance that uses a previously known primitive: straight point-to-point movements.

The Hoff & Arbib (1993) model for reach uses a feedback controller with a state look-ahead component to generate the end-effector control signal, which is then transformed into a joint control signal using inverse kinematics.
The diagram displayed in Figure 4.1 describes the model.

Figure 4.1. Hoff & Arbib (1993) reach model schema diagram.

To incorporate the capability of handling multiple targets and the ability to learn coarticulation between straight segments to create smoother trajectories, the new model extends the Hoff & Arbib (1993) model through the inclusion of a module to handle a sequence of targets, a module to scale, rotate and speed up the movement pattern, and a learning mechanism. These additions and modifications are explained individually in the following subsections. Figure 4.2 shows the block diagram of the proposed model.

Figure 4.2. Schema diagram of the proposed model when operating with a memorized target configuration. The Hoff & Arbib (1993) reach model is represented by the feedback loop between the Feedback Unit and the Current State Perception.

4.2 – Minimum Jerk Feedback Controller

The feedback unit is the main component of the model, since it is responsible for the computation of the trajectory. As mentioned earlier, this unit is derived as an extension of the controller presented by Hoff & Arbib (1993) for the transport phase of a reach and grasp model.

To better understand the Hoff & Arbib (1993) feedback model, and then suggest an extension to it, it is necessary to first study the feed forward minimum jerk model proposed by Flash & Hogan (1985). Since both the Flash & Hogan (1985) and Hoff & Arbib (1993) models describe two-dimensional movement with a one-dimensional function that is replicated for each dimension, the model will be described for one dimension and, during simulations, replicated for each of the dimensions.

The minimum jerk model describes the planar motion of the human arm as a function of time. The function is obtained using dynamic optimization with the total sum of the square of the third derivative of position (jerk) as the criterion.
Two separate sets of equations were derived, one for unconstrained point-to-point movements and one for point-to-point movements with a via-point constraint. Additional parameters of the function are the starting and target positions and a time estimate for completion of the trajectory. Constrained movements also require the via-point coordinates and the time estimate to reach the via-point. The equations (Flash & Hogan, 1985) are defined as:

\[ x(t) = x_0 + (x_0 - x_f)\left(15\tau^4 - 6\tau^5 - 10\tau^3\right) \tag{1} \]

\[ x^{-}(\tau) = \frac{t_f^5}{720}\Big(\pi_1\big(\tau_1^4(15\tau^4 - 30\tau^3) + \tau_1^3(80\tau^3 - 30\tau^4) - 60\tau^3\tau_1^2 + 30\tau^4\tau_1 - 6\tau^5\big) + c_1\left(15\tau^4 - 6\tau^5 - 10\tau^3\right)\Big) + x_0 \tag{2} \]

\[ x^{+}(\tau) = x^{-}(\tau) + \pi_1\,\frac{t_f^5(\tau - \tau_1)^5}{120} \tag{3} \]

\[ c_1 = \frac{1}{t_f^5\,\tau_1^2\,(1-\tau_1)^5}\Big((x_f - x_0)\left(300\tau_1^5 - 1200\tau_1^4 + 1600\tau_1^3\right) + \tau_1^2\left(-720x_f + 120x_1 + 600x_0\right) + (x_0 - x_1)\left(300\tau_1 - 200\right)\Big) \tag{4} \]

\[ \pi_1 = \frac{1}{t_f^5\,\tau_1^5\,(1-\tau_1)^5}\Big((x_f - x_0)\left(120\tau_1^5 - 300\tau_1^4 + 200\tau_1^3\right) - 20(x_1 - x_0)\Big) \tag{5} \]

where x_0 is the initial position, x_f the target position, τ is equal to t/t_f with t_f being the time estimate to reach the target, x_1 the via-point position, t_1 the time estimate to reach the via-point, and τ_1 is equal to t_1/t_f. Equation (1) defines unconstrained movements, while equations (2) and (3) define via-point movements before and after passing through the constraint, respectively.

While the Flash & Hogan (1985) model is able to describe the pattern that seems to underlie human arm movement, its feed forward nature limits its ability to perform properly in the presence of noise or target disturbance. To remove these limitations, Hoff & Arbib (1993) developed a model using the same minimum jerk principle, but instead of assuming that the initial velocity and acceleration are zero, they generalized the equation for unconstrained movement to handle any initial conditions. This generalization allowed the model to be transformed into a feedback controller, and it is described as follows:

\[ a(t) = a_0\left(1 - 9\tau + 18\tau^2 - 10\tau^3\right) + \frac{v_0}{D}\left(-36\tau + 96\tau^2 - 60\tau^3\right) + \frac{x_f - x_0}{D^2}\left(60\tau - 180\tau^2 + 120\tau^3\right) \tag{6} \]

where x_0, v_0 and a_0 are the position, velocity and acceleration at t_0, D is the duration t_f − t_0, and τ is equal to (t − t_0)/D, a generalization for non-zero t_0 of the τ used in the Flash & Hogan (1985) model. This equation can then be translated into a feedback controller as:

\[ \frac{d}{dt}\begin{bmatrix} x \\ v \\ a \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ -\frac{60}{D^3} & -\frac{36}{D^2} & -\frac{9}{D} \end{bmatrix}\begin{bmatrix} x \\ v \\ a \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ \frac{60}{D^3} \end{bmatrix} x_f \tag{7} \]

The Hoff & Arbib (1993) model is limited to unconstrained point-to-point reaches, which naturally yield straight movements, except when the target is perturbed. Target perturbation can be used to describe curved movements, as Han (2009) did; however, this approach requires the use of virtual targets outside the trajectory, which can interfere with the ability of the trajectory to accurately traverse the desired targets.

The Flash & Hogan (1985) model treats constrained movements as a two-piece motion, before reaching the via-point and after it. Additionally, there are no constraints on the velocity and acceleration at the moment of passing through the via-point. Since the Hoff & Arbib (1993) reach model is already able to handle non-zero initial conditions, it is already capable of handling the trajectory after the via-point, but some changes are required for it to reproduce the trajectory before the via-point.

Equation (7) is a state-space representation of the controller, with the state of the system described by the vector [x v a]^T. Since this state allows for any initial conditions during the part of the trajectory that starts at the via-point, the equation also needs to allow any final conditions when the via-point is used as the final state. Recomputing the minimum jerk model while allowing any final state results in the following equation for the acceleration:

\[ a(t) = a_0\left(1 - 9\tau + 18\tau^2 - 10\tau^3\right) + a_f\left(3\tau - 12\tau^2 + 10\tau^3\right) + \frac{v_0}{D}\left(-36\tau + 96\tau^2 - 60\tau^3\right) + \frac{v_f}{D}\left(-24\tau + 84\tau^2 - 60\tau^3\right) + \frac{x_f - x_0}{D^2}\left(60\tau - 180\tau^2 + 120\tau^3\right) \tag{8} \]

Similarly to what Hoff & Arbib (1993) did, equation (8) can be transformed into a feedback controller as:

\[ \frac{d}{dt}\begin{bmatrix} x \\ v \\ a \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ -\frac{60}{D^3} & -\frac{36}{D^2} & -\frac{9}{D} \end{bmatrix}\begin{bmatrix} x \\ v \\ a \end{bmatrix} + \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ \frac{60}{D^3} & -\frac{24}{D^2} & \frac{3}{D} \end{bmatrix}\begin{bmatrix} x_f \\ v_f \\ a_f \end{bmatrix} \tag{9} \]

Equation (8) is able to describe both unconstrained and constrained movements. Straight movements can be described by simply setting the final state to [x_f 0 0]^T, which reduces equation (8) to equation (6). Via-point movements need to be described in two parts. The first part uses the via-point position and the desired velocity and acceleration while passing through the via-point as its final state. The second part, which starts at the state used as the final state of the first part, has a final state equal to [x_f 0 0]^T, indicating the end of movement at the target. The values used as desired velocity and acceleration at the via-point directly affect the shape of the curve, as the simulations in Chapter 5 show. It is also important to note that the time variable D in constrained movements has to be representative of the segment currently being simulated.

4.3 – Sequence Queue

Since the feedback unit handles one target at a time, when dealing with a motion that is described by multiple targets, such as via-point movements, the proper target information must be passed to it. A mechanism is therefore needed that can hold the information on the multiple targets and select the information to be forwarded. This aligns with the data gathered by Averbeck, et al. (2002), which show that the prefrontal cortex stores sequences and highlights, in a winner-take-all fashion, the item of interest in serial order.

This functionality can be simulated simply by an array and an index variable that keeps track of the target that needs to be highlighted. The array stores the state vectors representing each of the targets in the trajectory. As each target is traversed, the index variable is updated to the position in the array of the next target. The end of the array represents the end of movement, so the last entry needs to have zero values for velocity and acceleration.
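A minimal one-dimensional sketch of how the feedback controller of equation (9) and the sequence queue could work together is given below. It is only an illustration under stated assumptions: the function names, the Euler integration, and the parameter margin are hypothetical and not taken from the thesis. In particular, switching to the next queue entry shortly before a via-point time (and holding D at margin for the last entry) is an assumed workaround for the 1/D gains growing without bound as D approaches zero.

```python
def jerk_command(state, goal, remaining):
    """Third row of equation (9): jerk that drives (x, v, a) toward the
    goal state (x_f, v_f, a_f) within the remaining time D."""
    x, v, a = state
    xf, vf, af = goal
    return (60.0 * (xf - x) / remaining ** 3
            - (36.0 * v + 24.0 * vf) / remaining ** 2
            - (9.0 * a - 3.0 * af) / remaining)


def run_sequence(start_state, queue, dt=0.001, margin=0.02):
    """Simulate one dimension of a movement through a queue of targets.

    queue is a list of (time_to_target, (x, v, a)) entries in serial
    order; the last entry should have zero velocity and acceleration.
    margin (hypothetical) bounds the remaining time from below.
    """
    x, v, a = start_state
    index = 0                         # index variable of the sequence queue
    history = [(0.0, x, v, a)]
    steps = int(round(queue[-1][0] / dt))
    for k in range(steps):
        t = k * dt
        # Highlight the next target once the current one is (almost) reached.
        while index < len(queue) - 1 and queue[index][0] - t < margin:
            index += 1
        arrival, goal = queue[index]
        remaining = max(arrival - t, margin)
        j = jerk_command((x, v, a), goal, remaining)
        # Euler integration of the state-space model.
        x, v, a = x + v * dt, v + a * dt, a + j * dt
        history.append(((k + 1) * dt, x, v, a))
    return history


if __name__ == "__main__":
    # Via-point at x = 0.5 (t = 0.5 s), final target at x = 1 (t = 1 s).
    plan = [(0.5, (0.5, 1.875, 0.0)), (1.0, (1.0, 0.0, 0.0))]
    trace = run_sequence((0.0, 0.0, 0.0), plan)
    print("state at via-point time:", trace[500])
    print("final state:", trace[-1])
```

In this example the via-point velocity of 1.875 and zero acceleration were chosen so that the two segments together form a single straight minimum jerk reach from 0 to 1; other via-point velocities and accelerations change the shape of the resulting trajectory.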
4.4 – Rotation, Size and Speed of Execution Scaling

While performing movements, human beings have the ability to modify previously learned trajectories with respect to orientation, size or speed of execution, without the need to relearn the new pattern. This ability to adjust known movement patterns is one of the main characteristics behind the possible use of movement primitives, as indicated by Schaal (2003).

Because all the information related to the movement is coded as vectors, it is very simple to rotate movements that are already encoded. To obtain the rotated pattern, all that is needed is to multiply each of the vectors (targets, velocities and accelerations) by a clockwise two-dimensional rotation matrix defined as:

\[ \mathrm{Rot} = \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} \tag{10} \]

where θ is the desired rotation angle.

To change the size of the movement trajectory while keeping the total time of the full motion and each time to target constant, all that is needed is to scale the targets and the velocities and accelerations through the targets by the same factor. This keeps the trajectory and the velocity profile with the same shape, but with different amplitudes.

In a similar fashion, to perform the same original movement with a different speed of execution, one only needs to multiply the velocities and accelerations through the targets by the desired factor and divide the times to targets by the same value.

4.5 – Learning Mechanism

The Learning Mechanism is responsible for improving the trajectory used for a set of targets. The improvement is done using the minimum jerk principle. The goal is to obtain a smoother trajectory by reducing its overall jerk. Since the quantity being optimized is the jerk, the durations of the segments have to remain constant throughout the whole learning period; otherwise the jerk totals would be skewed.
Since the target positions cannot be changed either, the only values available for adaptation are the velocities and accelerations through the via-points.

The first step is to define the jerk function that will be used as the cost being optimized, which is described by the equation below:

\[ J = \int_0^{T}\left[\left(\frac{da_x(t)}{dt}\right)^2 + \left(\frac{da_y(t)}{dt}\right)^2\right]dt \tag{11} \]

where J is the total jerk, a_x(t) and a_y(t) are the functions describing the acceleration from equation (8) in each direction, and T is the total duration of the movement.

The main interest of the learning algorithm is in movements with via-points. The case with one via-point is studied first and then expanded to multiple via-points. Since the feedback unit treats via-point movements as separate segments, equation (11) can be modified to represent this segmentation:

\[ J = \int_0^{D_1}\left[\left(\frac{da_{x1}(t)}{dt}\right)^2 + \left(\frac{da_{y1}(t)}{dt}\right)^2\right]dt + \int_{D_1}^{T}\left[\left(\frac{da_{x2}(t)}{dt}\right)^2 + \left(\frac{da_{y2}(t)}{dt}\right)^2\right]dt \tag{12} \]

where D_1 is the duration of the segment before the via-point, T is equal to the sum of the durations of both segments, D_1 and D_2, and a_{x1}(t), a_{y1}(t), a_{x2}(t) and a_{y2}(t) are the acceleration functions for each direction in each segment. It is important to note that in equation (12) the velocities and accelerations through the via-point are used as the final values of the first segment and as the initial values of the second segment.

The total jerk function for a movement with one via-point is a convex function of the velocity and acceleration through the via-point. The convexity allows the use of Hill-Climbing learning with no risk of getting stuck in a local optimum. Russell & Norvig (2010) define the Hill-Climbing algorithm as a greedy algorithm that updates the current solution by choosing a neighbor solution that has a better value with respect to the cost function. Arbib (1989) notes that the algorithm is similar to a goal-seeking strategy used by simple organisms.
To use the algorithm it is necessary to define the optimization problem in terms of states, the neighbors of each state and a goal function. A state is an instantiation of the velocity and acceleration vectors through the via-point with any values. The neighbors of a state are created by altering one or both of the vectors through addition or subtraction of a defined step vector. The goal function is the total jerk computed by equation (12).

Because a simulation of the trajectory is necessary to compute the total jerk for each state, and each simulation has a computational cost, it is necessary to minimize the number of simulations during learning. To achieve this, first, a trial-and-compare-to-previous-value approach is used instead of selecting the best neighbor among all the candidates. Second, the number of neighbors to pick from is reduced, avoiding an increase in the number of neighbors that do not improve the cost function, by doing the climb in two phases: the first for changes in the velocity vector and the second for changes in the acceleration vector. The phase changes when none of the neighbor states offers an improvement. After the acceleration phase fails, learning is stopped.

The last addition to the algorithm is a guided pick of the neighbor to be used in the initial trials. For the velocity phase this is done by choosing the change vector that best aligns with the direction of the vector between the initial position and the final target. When the acceleration phase starts, the initial change vector used is the one closest in direction to the average of the accelerations at the previous and next points of the acceleration profile obtained in the simulation.
The modified algorithm is presented below:

inputs: start position; target list; time estimations for targets; velocity at targets; acceleration at targets; initial performance; step size α;
output: velocity at targets; acceleration at targets;

step_vectors := [(1,0); (0.707,0.707); (0,1); (-0.707,0.707); (-1,0); (-0.707,-0.707); (0,-1); (0.707,-0.707)];
initial_dir := (target[2]-start)/|target[2]-start|;
pick := argmin over k of |step_vectors[k]-initial_dir|;
best_jerk := initial performance;
phase := 1;
for N iterations
    if phase==1
        velo_tgt[1] := velo_tgt[1]+α*step_vectors[pick];
    else if phase==2
        acc_tgt[1] := acc_tgt[1]+α*step_vectors[pick];
    total_jerk := min_jerk_model(start position, targets, velocities, accelerations, time estimations);
    if total_jerk < best_jerk
        best_jerk := total_jerk;
    else
        if phase==1
            velo_tgt[1] := velo_tgt[1]-α*step_vectors[pick];
        else if phase==2
            acc_tgt[1] := acc_tgt[1]-α*step_vectors[pick];
        if phase==1
            if all directions have failed sequentially
                phase := 2;
                acc_trend := (acc(ttg-1)+acc(ttg+1))/2;
                acc_trend := acc_trend/|acc_trend|;
                pick := argmin over k of |step_vectors[k]-acc_trend|;
                go for next loop iteration;
            else
                try a new direction;
        if phase==2
            if all directions have failed sequentially
                phase := 0;
                stop learning;
            else
                try a new direction;

Algorithm 4.1. Modified Hill-Climbing Algorithm for minimum jerk optimization of a trajectory with one via-point.

As one of the goals of the model is to handle trajectories with more than one via-point, it is necessary to extend the learning algorithm to multiple via-points. The use of multiple via-points creates a new challenge for the algorithm: the velocities and accelerations at the via-points are not independent of each other. The most direct approach would be to create neighbors that represent all possible combinations of changes to the variables; however, the number of possible neighbors would grow exponentially with the number of via-points used.
When designing the one via-point algorithm it was noted that a large number of neighbors increases the computational cost of learning, because a trajectory simulation must be run to evaluate each of the possible candidates.

One way around this issue is to assume that the variables can be treated independently. If the variables were fully independent, one could isolate the effects on the cost function caused by changes in each of them. Analyzing the total jerk function, equation (11), it can be seen that at a given instant of the trajectory the jerk is more strongly affected by the velocity and acceleration through the via-point that is closest to that instant. With this property, one can roughly approximate the optimization by splitting the trajectory into segments, based on via-point proximity, and performing hill climbing for each segment. Since equation (11) is an integral, it is simple to split it into segments, as shown below:

\[ J = \int_{0}^{\frac{D_1 + D_2}{2}}\left[\left(\frac{da_x}{dt}\right)^2 + \left(\frac{da_y}{dt}\right)^2\right]dt + \sum_{i=2}^{N-2}\int_{\frac{D_{i-1} + D_i}{2}}^{\frac{D_i + D_{i+1}}{2}}\left[\left(\frac{da_x}{dt}\right)^2 + \left(\frac{da_y}{dt}\right)^2\right]dt + \int_{\frac{D_{N-2} + D_{N-1}}{2}}^{D_N}\left[\left(\frac{da_x}{dt}\right)^2 + \left(\frac{da_y}{dt}\right)^2\right]dt \tag{13} \]

where N is the number of targets and D_i is the time estimate value of the i-th target. Each integral corresponds to the cost function of one via-point split used in the learning algorithm. This splitting is not needed for trajectories with only one via-point.

Once the splits are defined, the algorithm can perform local optimization in each split. The hill-climbing algorithm for trajectories with multiple via-points is defined as:

inputs: start position; target list; time estimations for targets; velocity at targets; acceleration at targets; initial performance; step size α;
output: velocity at targets; acceleration at targets;

step_vectors := [(1,0); (0.707,0.707); (0,1); (-0.707,0.707); (-1,0); (-0.707,-0.707); (0,-1); (0.707,-0.707)];
for each via-point i
    initial_dir[i] := (target[i+1]-target[i-1])/|target[i+1]-target[i-1]|;
    pick[i] := argmin over k of |step_vectors[k]-initial_dir[i]|;
    phase[i] := 1;
    if i==1
        split_start := 0;
    else
        split_start := time_to_target[i-1]+(time_to_target[i]-time_to_target[i-1])/2;
    if i==number_targets-1
        split_end := time_to_target[number_targets];
    else
        split_end := time_to_target[i]+(time_to_target[i+1]-time_to_target[i])/2;
    best_jerk[i] := sum of jerk from initial performance where split_start ≤ t < split_end;
for N iterations
    for each via-point i
        if phase[i]==1
            velo_tgt[i] := velo_tgt[i]+α*step_vectors[pick[i]];
        else if phase[i]==2
            acc_tgt[i] := acc_tgt[i]+α*step_vectors[pick[i]];
    split_jerk := per-split jerk sums of min_jerk_model(start position, target list, velocities at targets, acceleration at targets, time estimations for targets);
    for each via-point i
        if split_jerk[i] < best_jerk[i]
            best_jerk[i] := split_jerk[i];
        else
            if phase[i]==1
                velo_tgt[i] := velo_tgt[i]-α*step_vectors[pick[i]];
            else if phase[i]==2
                acc_tgt[i] := acc_tgt[i]-α*step_vectors[pick[i]];
            if phase[i]==1
                if all directions have failed sequentially
                    phase[i] := 2;
                    acc_trend := (acc(time_to_target[i]-dt)+acc(time_to_target[i]+dt))/2;
                    acc_trend := acc_trend/|acc_trend|;
                    pick[i] := argmin over k of |step_vectors[k]-acc_trend|;
                    go for next loop iteration;
                else
                    try a new direction;
            if phase[i]==2
                if all directions have failed sequentially
                    phase[i] := 0;
                    stop learning for via-point i;
                else
                    try a new direction;

Algorithm 4.2. Modified Hill-Climbing Algorithm for minimum jerk optimization of trajectories with multiple via-points.

Splitting the learning procedure increases the chances of getting stuck in local optima.
To allow the algorithm to better account for the influence the other variables have on the overall goal, a second pass through the learning algorithm is performed, so that it can adjust variables that might have been exhausted earlier but can be improved after the remaining variables have been changed.

4.6 – Long Term and Working Memories

Long Term memory was not implemented, but it is conceptualized as behaving like a dictionary data structure, in which a pattern is evoked and the output is the encoding of the target positions and the respective velocities and accelerations to traverse each target. As learning happens, the values encoded in the long-term memory need to be updated to reflect any new improvement in the trajectory.

The working memory is responsible for keeping track of the movement under execution, so that the learning mechanism can evaluate whether any improvement has been achieved after the coarticulation increment has been added to the original movement. This is simulated as a simple historical array that stores the state of the model during each simulation step.

4.7 – Inverse Dynamics, Plant and Current State Perception

Hoff & Arbib (1993) have already shown that a state look-ahead unit is able to reduce the effects of the delays that occur between the moment the feedback unit sends the control signal and the moment it is able to perceive a state change. Since the main interest of this thesis is in the feedback controller and the learning mechanism, the Inverse Dynamics, Plant and Current State Perception are simulated through constant updates of the current state, obtained by integrating the derivatives output by the feedback unit. In the simulation, the current state variable is also used as the perceived state by the other modules that require that information. Figure 4.3 displays in greater detail the connections used in the simulation between the Sequence Queue and the Feedback Unit.

Figure 4.3. Detailed wiring used in simulation between Feedback Unit and Sequence Queue.
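As a recap of the learning mechanism of this chapter, the one via-point case (Algorithm 4.1) can be sketched in runnable form as follows. This is a simplified illustration and not the thesis implementation: instead of simulating the trajectory, the cost of equation (12) is evaluated directly from the derivative of equation (8) with a three-point Gauss-Legendre rule (exact here because the squared jerk is a quartic polynomial in τ), the guided initial pick is omitted, and the names segment_jerk, total_jerk and learn as well as the default max_iter are assumptions.

```python
import math

STEP_VECTORS = [(1, 0), (0.707, 0.707), (0, 1), (-0.707, 0.707),
                (-1, 0), (-0.707, -0.707), (0, -1), (0.707, -0.707)]
# Three-point Gauss-Legendre nodes and weights on [0, 1].
GAUSS = [(0.5 - 0.5 * math.sqrt(0.6), 5 / 18), (0.5, 8 / 18),
         (0.5 + 0.5 * math.sqrt(0.6), 5 / 18)]


def segment_jerk(x0, v0, a0, xf, vf, af, D):
    """Integral of squared jerk over one segment (time derivative of eq. (8))."""
    total = 0.0
    for tau, weight in GAUSS:
        jerk = (a0 * (-9 + 36 * tau - 30 * tau ** 2)
                + af * (3 - 24 * tau + 30 * tau ** 2)
                + v0 / D * (-36 + 192 * tau - 180 * tau ** 2)
                + vf / D * (-24 + 168 * tau - 180 * tau ** 2)
                + (xf - x0) / D ** 2 * (60 - 360 * tau + 360 * tau ** 2)) / D
        total += weight * jerk ** 2 * D
    return total


def total_jerk(start, via, end, vel, acc, d1, d2):
    """Equation (12) for both axes; the movement starts and ends at rest."""
    return sum(segment_jerk(start[k], 0.0, 0.0, via[k], vel[k], acc[k], d1)
               + segment_jerk(via[k], vel[k], acc[k], end[k], 0.0, 0.0, d2)
               for k in range(2))


def learn(start, via, end, d1, d2, alpha=0.05, max_iter=2000):
    """Two-phase trial-and-compare hill climbing (velocity, then acceleration)."""
    params = {1: [0.0, 0.0], 2: [0.0, 0.0]}  # phase 1: velocity, 2: acceleration
    best = total_jerk(start, via, end, params[1], params[2], d1, d2)
    phase, pick, failures = 1, 0, 0
    for _ in range(max_iter):
        if phase > 2:                        # both phases exhausted: stop
            break
        dx, dy = STEP_VECTORS[pick]
        saved = params[phase]
        params[phase] = [saved[0] + alpha * dx, saved[1] + alpha * dy]
        cost = total_jerk(start, via, end, params[1], params[2], d1, d2)
        if cost < best:                      # keep the change, same direction
            best, failures = cost, 0
        else:                                # undo and try a new direction
            params[phase] = saved
            failures += 1
            pick = (pick + 1) % len(STEP_VECTORS)
            if failures == len(STEP_VECTORS):
                phase, pick, failures = phase + 1, 0, 0
    return params[1], params[2], best


if __name__ == "__main__":
    vel, acc, cost = learn((0.0, 0.0), (0.5, 0.5), (1.0, 0.5), 0.5, 0.5)
    print("velocity through via-point:", vel)
    print("acceleration through via-point:", acc)
    print("total jerk:", cost)
```

Because this sketch integrates the squared jerk over time rather than summing it over simulation steps, its cost values are on a different scale from the totals reported in Chapter 5; only the learned velocities and accelerations are directly comparable.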
36 Chapter 5: Simulations and Results 5.1 – One via-point trajectories simulations The first tests executed with the model, were simple one via-point reaching movements. The idea is to verify the ability of the model to learn, as it aims to reduce the total jerk in the trajectory, how to coarticulate a highly segmented chain of straight movements, to achieve a more continuous and smoother trajectory. This initial test was performed using 4 different target configurations: (1) targets aligned in a straight manner from the origin; (2) targets aligned to generate a concave pattern; (3) targets aligned to generate a convex configuration; (4) targets resembling a semi-circle. To validate the results we compare them against the results provided by the Flash & Hogan (1985) model, which describes the optimal trajectory. 5.1.1 – Targets aligned in a straight line The simulation parameters values and evolution of state and goal during learning for this test is displayed in tables 1 and 2, respectively. Additionally Figure 5.1 shows the graphs of how the velocity profile of the trajectory changes during learning. 37 Table 1 Parameters values for targets aligned in a straight line Parameter Value Δt 0.001 Time interval [0, 1.1] (x 0 , y 0 ) (0, 0) (v x0 , v y0 ) (0, 0) (a x0 , a y0 ) (0, 0) Targets [(0.5, 0.5), (1, 1)] Velocity Through Targets [(0, 0), (0, 0)] Acceleration Through Targets [(0, 0), (0, 0)] Time to Targets [0.5, 1] α 0.05 Max. Iterations 1000 Table 2 Evolution of state and goal during learning of straight line Learning Iteration Velocities Through Target Acceleration Through Target Total Jerk 0 [(0, 0), (0, 0)] [(0, 0), (0, 0)] 2.2951e+07 25 [(0.8837, 0.8837), (0, 0)] [(0, 0), (0, 0)] 7.4401e+06 61 [(1.8735, 1.8735), (0, 0)] [(0, 0), (0, 0)] 1.4345e+06 Flash & Hogan Optimal [(1.875, 1.875), (0, 0)] [(0, 0), (0, 0)] 1.4301e+06 38 Figure 5.1. 
Velocity profiles of simulations during learning when targets are aligned in a straight line: (a) before learning; (b) after 25 iterations; and (c) at stopping point (no extra improvement found), after 61 iterations.

5.1.2 – Targets aligned in a concave curve

The simulation parameter values and the evolution of state and goal during learning for this test are displayed in Tables 3 and 4, respectively. Additionally, Figure 5.2 shows the simulated trajectories during learning, and Figure 5.3 graphs how the velocity profile of the trajectory changes during learning.

Table 3. Parameter values for targets aligned in a concave curve
Parameter | Value
Δt | 0.001
Time interval | [0, 1.1]
(x0, y0) | (0, 0)
(vx0, vy0) | (0, 0)
(ax0, ay0) | (0, 0)
Targets | [(0.5, 0.5), (1, 0.5)]
Velocity Through Targets | [(0, 0), (0, 0)]
Acceleration Through Targets | [(0, 0), (0, 0)]
Time to Targets | [0.5, 1]
α | 0.05
Max. Iterations | 1000

Table 4. Evolution of state and goal during learning of concave curve
Learning Iteration | Velocities Through Target | Acceleration Through Target | Total Jerk
0 | [(0, 0), (0, 0)] | [(0, 0), (0, 0)] | 1.7192e+07
100 | [(1.8726, 0.9519), (0, 0)] | [(0, -0.3), (0, 0)] | 3.5657e+06
150 | [(1.8726, 0.9519), (0, 0)] | [(0, -2.8), (0, 0)] | 2.6840e+06
235 | [(1.8726, 0.9519), (0, 0)] | [(0, -6.65), (0, 0)] | 2.1708e+06
Flash & Hogan Optimal | [(1.8750, 0.9375), (0, 0)] | [(0, -6.6665), (0, 0)] | 2.1650e+06

Figure 5.2. Evolution of trajectory during learning with targets at (0.5, 0.5) and (1, 0.5): (a) before learning; (b) after 100 iterations; (c) after 150 iterations; and (d) at stopping point (no extra improvement found), after 235 iterations.

Figure 5.3. Velocity profiles (blue line for horizontal velocity and green line for vertical velocity) of simulations during learning with targets at (0.5, 0.5) and (1, 0.5): (a) before learning; (b) after 25 iterations; (c) after 50 iterations; (d) after 75 iterations; and (e) at stopping point (no extra improvement found), after 85 iterations.
5.1.3 – Targets aligned in a convex curve

The simulation parameter values and the evolution of state and goal during learning for the test with targets aligned in a convex curve are displayed in Tables 5 and 6, respectively. Additionally, Figure 5.4 shows the simulated trajectories during learning, and Figure 5.5 graphs how the velocity profile of the trajectory changes during learning.

Table 5. Parameter values for targets aligned in a convex curve
Parameter | Value
Δt | 0.001
Time interval | [0, 1.1]
(x0, y0) | (0, 0)
(vx0, vy0) | (0, 0)
(ax0, ay0) | (0, 0)
Targets | [(0.5, 0.5), (0.5, 1)]
Velocity Through Targets | [(0, 0), (0, 0)]
Acceleration Through Targets | [(0, 0), (0, 0)]
Time to Targets | [0.5, 1]
α | 0.05
Max. Iterations | 1000

Table 6. Evolution of state and goal during learning of convex curve
Learning Iteration | Velocities Through Target | Acceleration Through Target | Total Jerk
0 | [(0, 0), (0, 0)] | [(0, 0), (0, 0)] | 1.7108e+07
100 | [(0.92, 1.8735), (0, 0)] | [(-0.45, 0), (0, 0)] | 3.5007e+06
150 | [(0.92, 1.8735), (0, 0)] | [(-2.95, 0), (0, 0)] | 2.6449e+06
232 | [(0.92, 1.8735), (0, 0)] | [(-6.65, 0), (0, 0)] | 2.1709e+06
Flash & Hogan Optimal | [(0.9375, 1.8750), (0, 0)] | [(-6.6665, 0), (0, 0)] | 2.1650e+06

Figure 5.4. Evolution of trajectory during learning with targets at (0.5, 0.5) and (0.5, 1): (a) before learning; (b) after 100 iterations; (c) after 150 iterations; and (d) at stopping point (no extra improvement found), after 232 iterations.

Figure 5.5. Velocity profiles (blue line for horizontal velocity and green line for vertical velocity) of simulations during learning with targets at (0.5, 0.5) and (0.5, 1): (a) before learning; (b) after 100 iterations; (c) after 150 iterations; and (d) at stopping point (no extra improvement found), after 232 iterations.
5.1.4 – Targets aligned in a semi-circle

The simulation parameter values and the evolution of state and goal during learning for the test with targets aligned in a semi-circle are displayed in Tables 7 and 8, respectively. Additionally, Figure 5.6 shows the simulated trajectories during learning, and Figure 5.7 graphs how the velocity profile of the trajectory changes during learning.

Table 7. Parameter values for targets aligned in a semi-circle
Parameter | Value
Δt | 0.001
Time interval | [0, 1.1]
(x0, y0) | (0, 0)
(vx0, vy0) | (0, 0)
(ax0, ay0) | (0, 0)
Targets | [(0.5, 0.5), (1, 0)]
Velocity Through Targets | [(0, 0), (0, 0)]
Acceleration Through Targets | [(0, 0), (0, 0)]
Time to Targets | [0.5, 1]
α | 0.05
Max. Iterations | 1000

Table 8. Evolution of state and goal during learning of semi-circle
Learning Iteration | Velocities Through Target | Acceleration Through Target | Total Jerk
0 | [(0, 0), (0, 0)] | [(0, 0), (0, 0)] | 2.2670e+07
50 | [(1.85, 0), (0, 0)] | [(0, 0), (0, -0.25)] | 1.1713e+07
125 | [(1.85, 0), (0, 0)] | [(0, 0), (0, -4.0)] | 8.8150e+06
319 | [(1.85, 0), (0, 0)] | [(0, 0), (0, -13.3)] | 5.8193e+06
Flash & Hogan Optimal | [(1.875, 0), (0, 0)] | [(0, -13.333), (0, 0)] | 5.7998e+06

Figure 5.6. Evolution of trajectory during learning with targets at (0.5, 0.5) and (1, 0): (a) before learning; (b) after 50 iterations; (c) after 125 iterations; and (d) at stopping point (no extra improvement found), after 319 iterations.

Figure 5.7. Velocity profiles (blue line for horizontal velocity and green line for vertical velocity) of simulations during learning with targets at (0.5, 0.5) and (1, 0): (a) before learning; (b) after 50 iterations; (c) after 125 iterations; and (d) at stopping point (no extra improvement found), after 319 iterations.
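The Flash & Hogan optimal entries for the straight-line case (Table 2) can be checked by hand. When the via-point (0.5, 0.5) at t = 0.5 lies on the unconstrained minimum jerk reach from (0, 0) to (1, 1) in one second, the via-point constraint is inactive and the optimum is the plain Flash & Hogan (1985) quintic. The sketch below evaluates that quintic along one axis; the variable names are illustrative, and the summed squared jerk is only a rough analogue of the thesis measure, which differentiates the simulated acceleration numerically.

```matlab
% Unconstrained minimum jerk reach along one axis from 0 to xf in time T
% (Flash & Hogan, 1985), written in terms of normalized time t/T.
xf = 1; T = 1;
pos  = @(t) xf     * (10*(t/T).^3 - 15*(t/T).^4 +   6*(t/T).^5);
vel  = @(t) xf/T   * (30*(t/T).^2 - 60*(t/T).^3 +  30*(t/T).^4);
acc  = @(t) xf/T^2 * (60*(t/T)    - 180*(t/T).^2 + 120*(t/T).^3);
jerk = @(t) xf/T^3 * (60 - 360*(t/T) + 360*(t/T).^2);

t_via = 0.5;
x_via = pos(t_via);   % 0.5: the via-point lies on the unconstrained path
v_via = vel(t_via);   % 1.875, the per-axis velocity listed in Table 2
a_via = acc(t_via);   % 0, the per-axis acceleration listed in Table 2

% Summed squared jerk over both axes on a grid with step dt. The thesis
% code sums squared samples rather than integrating, so this value is
% roughly (integral of squared jerk) / dt.
dt = 0.001;
t  = 0:dt:T;
total_jerk = 2 * sum(jerk(t).^2);
```

The integral of the squared jerk for this quintic is 720·xf²/T⁵ per axis, so the summed value is close to 2·720/dt = 1.44e+06. That is of the same order as the 1.4301e+06 listed for Flash & Hogan in Table 2; an exact match is not expected, since the thesis computes jerk with `gradient` over the interval [0, 1.1].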
5.2 – Multiple Via-points trajectories simulations

To test the multiple via-points algorithm, two target configurations are used: (a) targets positioned to resemble a handwritten number two; and (b) targets positioned to resemble a handwritten lower case letter G. Because it is very complicated to obtain the optimal value analytically when there is more than one via-point, the algorithm results are compared to results obtained using a simple genetic algorithm (GA), a technique that has produced successful approximations to the optimum in many practical problems with real-valued variables (Goldberg, 1990).

5.2.1 – Targets configured to resemble a handwritten number two

The simulation parameter values and the evolution of state and goal during learning for the test with targets aligned to resemble a handwritten number two are displayed in Tables 9 and 10, respectively. Additionally, Figure 5.8 shows the simulated trajectories during learning, and Figure 5.9 graphs how the velocity profile of the trajectory changes during learning.

Table 9. Parameter values for targets aligned to resemble a handwritten number two
Parameter | Value
Δt | 0.001
Time interval | [0, 2.1]
(x0, y0) | (0, 0)
(vx0, vy0) | (0, 0)
(ax0, ay0) | (0, 0)
Targets | [(0.6, 0.66), (1.2, 0), (0, -1.33), (1.2, -1.33)]
Velocity Through Targets | [(0, 0), (0, 0), (0, 0), (0, 0)]
Acceleration Through Targets | [(0, 0), (0, 0), (0, 0), (0, 0)]
Time to Targets | [0.5, 1, 1.5, 2]
α | 0.05
Max. Iterations | 1000

Table 10. Evolution of state and goal during learning of handwritten number two
Learning Iteration | Velocities Through Target | Acceleration Through Target | Total Jerk
0 | [(0, 0), (0, 0), (0, 0), (0, 0)] | [(0, 0), (0, 0), (0, 0), (0, 0)] | 1.410e+08
150 | [(2.3, 1.55), (-0.9191, -3.2691), (1.5198, -0.8663), (0, 0)] | [(0, 0), (0, 0), (0, 0), (0, 0)] | 6.865e+07
969 (end of 1st pass) | [(2.3, 1.55), (-0.9191, -3.2691), (1.4698, -0.8663), (0, 0)] | [(0, 0), (0, 0), (25.4567, 8.4067), (0, 0)] | 3.447e+07
55 (end of 2nd pass) | [(2.6, 1.55), (-1.6984, -3.8691), (1.293, -0.3395), (0, 0)] | [(0, 0), (0, 0), (25.4567, 8.4067), (0, 0)] | 2.981e+07
GA Best | [(2.8210, 1.3973), (-2.4619, -3.4153), (1.5761, -0.9135), (0, 0)] | [(3.8635, -9.0039), (-16.3302, -2.9308), (22.7720, 8.1840), (0, 0)] | 1.767e+07

Figure 5.8. Evolution of trajectory during learning with four targets resembling a handwritten number two: (a) before learning; (b) after 150 iterations; (c) at the end of the first pass, after 969 iterations; and (d) at the end of the second pass, after 55 iterations.

Figure 5.9. Velocity profiles of simulations during learning with four targets resembling a handwritten number two: (a) before learning; (b) after 150 iterations; (c) at the end of the first pass, after 969 iterations; and (d) at the end of the second pass, after 55 iterations.

5.2.2 – Targets configured to resemble a lower case letter G

The simulation parameter values and the evolution of state and goal during learning for the test with targets aligned to resemble a lower case letter G are displayed in Tables 11 and 12, respectively. Additionally, Figure 5.10 shows the simulated trajectories during learning, and Figure 5.11 graphs how the velocity profile of the trajectory changes during learning.
Table 11. Parameter values for targets aligned to resemble a handwritten lower case letter G
Parameter | Value
Δt | 0.001
Time interval | [0, 2.1]
(x0, y0) | (0, 0)
(vx0, vy0) | (0, 0)
(ax0, ay0) | (0, 0)
Targets | [(-0.2, 0.3), (-0.4, 0), (-0.2, -0.3), (0, 0), (0, -0.7), (-0.1, -0.9), (-0.2, -0.7), (0.1, 0)]
Velocity Through Targets | [(0, 0), (0, 0), (0, 0), (0, 0), (0, 0), (0, 0), (0, 0), (0, 0)]
Acceleration Through Targets | [(0, 0), (0, 0), (0, 0), (0, 0), (0, 0), (0, 0), (0, 0), (0, 0)]
Time to Targets | [0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2]
α | 0.05
Max. Iterations | 1000

Table 12. Evolution of state and goal during learning of handwritten lower-case letter G
Learning Iteration | Velocities Through Target | Acceleration Through Target | Total Jerk
0 | [(0, 0), (0, 0), (0, 0), (0, 0), (0, 0), (0, 0), (0, 0), (0, 0)] | [(0, 0), (0, 0), (0, 0), (0, 0), (0, 0), (0, 0), (0, 0), (0, 0)] | 1.208e+09
150 | [(-1.4379, 1.1312), (0, -2), (0.9, 0), (0.3621, -0.7414), (0, -1.95), (-0.6405, 0.1061), (1.2863, 3.7363), (0, 0)] | [(0, -3.7), (0, 0), (0, 0), (0, -3.2), (0, 0), (0, 0), (0.6363, 0.6363), (0, 0)] | 3.984e+08
935 (end of 1st pass) | [(-1.4379, 1.1312), (0, -2), (0.9, 0), (0.3621, -0.7414), (0, -1.95), (-0.6405, 0.1061), (1.2863, 3.7363), (0, 0)] | [(0.4621, -19.7121), (0, 0), (0, 0), (-5.1611, -34.55), (0, 0), (0, 0), (15.873, 24.0231), (0, 0)] | 2.345e+08
40 (end of 2nd pass) | [(-1.3879, 1.0812), (0.1061, -2.2061), (1.0061, 0.2474), (0.4768, -0.9475), (-0.0646, -2.0939), (-0.6405, 0.1061), (1.3363, 3.7863), (0, 0)] | [(0.4621, -19.7121), (0, 0), (-0.0354, -0.0147), (-5.1611, -34.6), (-0.15, -0.05), (0, 0), (15.8584, 24.0584), (0, 0)] | 1.813e+08
GA Best | [(-1.429, 1.1969), (0.1062, -2.7502), (1.1514, 1.1842), (0.2685, -0.9800), (-0.0207, -2.3574), (-1.0823, -0.4898), (1.2887, 3.6402), (0, 0)] | [(0.6672, -17.3814), (7.7491, 0.1803), (-0.3227, 18.0328), (-4.4706, -28.6522), (0.3637, 18.8931), (-4.4646, -3.5378), (14.8688, 23.4592), (0, 0)] | 1.143e+08

Figure 5.10.
Evolution of trajectory during learning with eight targets resembling a handwritten lower case letter G: (a) before learning; (b) after 150 iterations; (c) at the end of the first pass, after 935 iterations; and (d) at the end of the second pass, after 40 iterations.

Figure 5.11. Velocity profiles during learning with eight targets resembling a handwritten lower case letter G: (a) before learning; (b) after 150 iterations; (c) at the end of the first pass, after 935 iterations; and (d) at the end of the second pass, after 40 iterations.

5.3 – Rotation, size and speed of execution scaling simulations

To demonstrate the ability of the model to perform rotation, size and speed of execution scaling, simulations are run using the final values obtained for the trajectory of the handwritten number two described in 5.2.1. First, a clockwise rotation of 45 degrees is applied to the movement pattern, with the resulting trajectory and velocity profiles displayed in Figure 5.12. Next, size scaling is tested by applying a size factor of 3, with the resulting trajectory and velocity profiles graphed in Figure 5.13. Lastly, the speed of execution is changed so that the trajectory is completed twice as fast, with the resulting trajectory and velocity profiles shown in Figure 5.14. While the velocity profiles did change compared to Figure 5.9(d), they kept their bell-shaped characteristic.

Figure 5.12. (a) Trajectory and (b) velocity profiles of the movement pattern resembling a handwritten number two when rotated 45º clockwise.

Figure 5.13. (a) Trajectory and (b) velocity profiles of the movement pattern resembling a handwritten number two with size scaled by a factor of three.

Figure 5.14. (a) Trajectory and (b) velocity profiles of the movement pattern resembling a handwritten number two with execution sped up by a factor of two.
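The three transformations of Section 5.3 can be sketched as operations on the stored pattern. The thesis does not list the code it used for this step, so the following is an assumption based on plain kinematics: a rotation or size factor acts identically on positions, velocities and accelerations, while a speed factor k divides the times by k and multiplies velocities by k and accelerations by k². The pattern values are those of Table 9 and of the end of the second pass in Table 10; all variable names are illustrative.

```matlab
% Learned pattern for the handwritten number two (Tables 9 and 10).
targets = [0.6 0.66; 1.2 0; 0 -1.33; 1.2 -1.33];
v_tgt   = [2.6 1.55; -1.6984 -3.8691; 1.293 -0.3395; 0 0];
a_tgt   = [0 0; 0 0; 25.4567 8.4067; 0 0];
ttg     = [0.5 1 1.5 2];

% Rotation by 45 degrees clockwise (negative angle in the usual
% counter-clockwise convention). Rows are points, hence the transpose.
theta = -pi/4;
R = [cos(theta) -sin(theta); sin(theta) cos(theta)];
targets_rot = targets * R.';
v_rot       = v_tgt   * R.';
a_rot       = a_tgt   * R.';

% Size scaling by a factor s: every kinematic quantity scales by s.
s = 3;
targets_big = s * targets;
v_big       = s * v_tgt;
a_big       = s * a_tgt;

% Speed scaling by a factor k: x(t) becomes x(k*t), so times shrink by k,
% velocities grow by k and accelerations by k^2.
k = 2;
ttg_fast = ttg / k;
v_fast   = k   * v_tgt;
a_fast   = k^2 * a_tgt;
```

Because only the stored targets, velocities, accelerations and times change, the transformed pattern can be fed to the same feedback controller without any relearning.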
5.4 – Tangential Velocity, Curvature and Two-Thirds Power Law

Since the model aims to replicate human arm control as closely as possible, it is worth verifying whether the resulting trajectories possess the same characteristics as trajectories produced by humans. One of the main characteristics of human arm motor control is the relationship between tangential velocity and curvature, as summarized in Section 2.5. Using the trajectory learned for the handwritten letter G example, the tangential velocity and curvature are compared to confirm the presence of this characteristic. Figure 5.15 displays two graphs comparing these metrics: the first is a direct comparison between tangential velocity and curvature, and the second plots the empirical tangential velocity against the tangential velocity predicted by the two-thirds power law.

Figure 5.15. Relationship between tangential velocity and curvature: (a) tangential velocity (blue line) vs. curvature (green line) scaled by a factor of one fourth; and (b) empirical tangential velocity (blue line) vs. the tangential velocity expected from the two-thirds power law (green line) using a scaling factor of two.

5.5 – Discussion

The results of the simulations for one via-point trajectories show that the model is able to replicate curved movements while constrained by target information. Additionally, during the learning phase the movement evolves from a highly segmented motion towards coarticulation, and the velocity profile gradually transforms into a single bell-shaped profile, when possible. As expected from the convexity of the total jerk function, the Hill-Climbing algorithm obtained values very close to the optimal values given by the Flash & Hogan (1985) model. The same behavior is noticeable when multiple via-points are involved. Figures 5.8 and 5.10 show a clear evolution in the trajectories, from highly segmented to much smoother continuous ones.
Figures 5.9 and 5.11, which show the velocity profiles, support this idea: bell-shaped segments tend to be combined, reducing the total number of segments in the whole movement.

The learning algorithm for multiple via-point movements has noticeable shortcomings. They arise from the trade-off made to treat changes to variables as independently as possible, in order to avoid the exponential cost of combining changes to multiple variables at once. Different learning algorithms could have been applied, but Hill-Climbing was chosen for its incremental nature and simplicity, which seems to align with the way humans create coarticulation in movement, as shown in the studies of Sosnik et al. (2004).

The number of iterations needed to obtain the final values is high because of the small learning step used. One way to improve this is to use a varying step size scheme. For example, when learning the letter G trajectory with an initial step size of 0.5, then repeating the learning with a step size of 0.1 once the algorithm fails to improve, and then again with a step size of 0.05, a total of 450 repetitions is needed to obtain results similar to those obtained after the one thousand repetitions needed with a fixed step size of 0.05.

The trajectories resulting from rotation, size and speed of execution scaling show that these operations can be applied to the movement pattern without changing the smoothness and low segmentation obtained from the original training. This characteristic is important because it removes the need to relearn patterns and simplifies the use of previously learned patterns in different situations.

It is also important to note that the relationship between tangential velocity and curvature obtained in the simulations of the model resembles the relationship present in human motion.
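The two quantities compared here can be computed directly from velocity and acceleration histories. The sketch below is not the thesis code: it uses a synthetic ellipse as stand-in data (chosen because an ellipse traced at constant angular rate obeys the two-thirds power law exactly), and the way the gain K is estimated is an assumption of the example. In the model, `vx`, `vy`, `ax`, `ay` would instead be the columns of the simulated velocity and acceleration histories.

```matlab
% Synthetic test trajectory: x = A*cos(t), y = B*sin(t).
% (Stand-in data; hypothetical values chosen for illustration.)
A = 2; B = 1;
t  = linspace(0, 2*pi, 1000).';
vx = -A*sin(t);  vy =  B*cos(t);          % velocity components
ax = -A*cos(t);  ay = -B*sin(t);          % acceleration components

v     = sqrt(vx.^2 + vy.^2);              % tangential velocity
kappa = abs(vx.*ay - vy.*ax) ./ v.^3;     % curvature of a planar path

% Two-thirds power law in its velocity form: v = K * kappa^(-1/3).
% K is estimated as the mean of v .* kappa.^(1/3) (sketch assumption).
K      = mean(v .* kappa.^(1/3));
v_pred = K * kappa.^(-1/3);

% Correlation between empirical and predicted tangential velocity.
c = corrcoef(v, v_pred);
r = c(1,2);
```

For the ellipse the prediction is exact, with K equal to the cube root of A·B, so the correlation is essentially one. Applied to the model output, the same computation yields the kind of comparison plotted in Figure 5.15 (b).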
Figure 5.15 (a) shows that the two properties of the trajectory are inversely coupled, as each peak in the curvature corresponds to a valley in the tangential velocity, and each peak in the tangential velocity corresponds to a valley in the curvature. The graph in Figure 5.15 (b) shows that the two-thirds power law presents a stronger relationship, with a moderate similarity (correlation coefficient of 0.6077) between the tangential velocity of the movement pattern generated by the model and the tangential velocity predicted by the two-thirds power law.

While the use of strict feedback control forces trajectories to pass through the targets at their exact positions, human motion is hardly ever as precise, due to noise and the natural limits of perceptual precision. Sosnik et al. (2007) investigate this issue and point out that when a task requires high accuracy, humans tend to reduce coarticulation and prefer a more segmented trajectory.

Lastly, to force trajectories to have a specific shape it is necessary to add enough targets to constrain the trajectory, so that the learning algorithm does not produce patterns that do not fully correspond to the desired trajectory. This can be seen in the simulations in 5.2.1, where the trajectory formed has an unexpected skewed curve in the top right corner of the figure two.

Chapter 6 – Conclusions

6.1 – Computational Characteristics of the Model

The extended minimum jerk model proposed in this thesis offers a simple and effective way to generate kinematic information describing planar trajectories similar to the ones produced by humans when moving their arms. As the main module of the model is a feedback controller, implementing and deploying such a controller in different contexts is relatively simple. While the thesis proposes a modified Hill-Climbing optimization algorithm for coarticulation learning, other optimization techniques can be used to produce the coarticulation.
The choice of technique should account for the computational complexity and the accuracy it offers. For example, the Genetic Algorithm optimization used to create a close-to-optimal baseline for comparison requires a much larger number of simulations than the Hill-Climbing algorithm to produce similar results; however, if the quantity of simulations and their cost are not an issue, it can produce more accurate results. The other modules simulated in this work were all implemented as simple data structures and logical procedures, which do not add any relevant computational cost to the overall model. The non-simulated modules, such as inverse dynamics, plant and state perception, could increase the total computational cost of the model; however, these modules have been successfully implemented in other models (see the models review in Chapter 3) and therefore should not create any major issues.

6.2 – How the model relates to the Brain

In this section each module of the model is related to the cortical area most responsible for the functionality of that module. The cortical areas and functionalities are listed as indicated by Cortical Functions (2012). After these relationships are described, an overlay of the model's Schema Diagram (Figure 4.2) with the labels of the brain areas associated with each module is used to identify a possible configuration of the brain that underlies motor execution. Figure 6.1 displays the suggested configuration. It is important to note that this configuration is a model, and therefore simplifies brain behavior by assuming that the functionalities of each area are independent of other areas, which is not always true.

6.2.1 – Feedback Module

The feedback module, being responsible for generating the control signal that drives the execution of a movement, can be associated with the Primary Motor Cortex, which is the area of the brain responsible for motor function.
6.2.2 – Inverse Dynamics and Plant

As studies correlate the primary motor cortex with kinematic information about the end-effector trajectory (see Section 2.2), it is assumed that the Inverse Dynamics computation required to transform this kinematic information into joint configurations happens in the sub-network that connects the motor cortex to the muscles, composed of areas of the brainstem and the spinal cord. The plant represents the arm's muscles and joints.

6.2.3 – Current State Perception and Working Memory

Current state perception in the brain is generated by combining somatosensory information processed in the Primary Somatosensory Area with visual information processed in the Visual Cortex. Since the working memory module is responsible for keeping track of the motor pattern executed so far, it is also related to the Primary Somatosensory Area and the Visual Cortex.

6.2.4 – Evaluator

The evaluator is the module responsible for the learning process, as it evaluates the action taken and induces changes in the long-term memory according to its evaluation. Learning in the brain has not been associated precisely with one specific area, and it seems to involve a joint effort of multiple areas of the brain. So instead of relating the evaluator to a brain area, it is related to the ability of the brain to use reinforcement learning to alter how information is processed.

6.2.5 – Long-Term Memory

The long-term memory module is responsible for generating the signal that corresponds to the set of targets and the kinematic information that together compose complex movements. In the brain, the ability to generate extensively practiced movements is associated with the Premotor Cortex.

6.2.6 – Rotation, Size and Speed Scaling

The ability to adjust an already learned pattern so that it is replicated with some of its characteristics altered is directly related to the ability to perform practiced movements.
So this module is associated with the Premotor Cortex.

6.2.7 – Sequence Queue

The ability to process motor sequencing and planning is also associated with the Premotor Cortex, as it is necessary for the transition from coarse movements to coarticulated ones.

Figure 6.1. Brain areas (black dotted and dashed boxes) overlaid on top of the model's schema diagram (gray boxes). Each brain area is placed directly on top of the modules that model its functionalities.

6.3 – Future Work

While the model presented in this thesis is useful in its own right, it can be further developed or used in other work. The first development would be to implement the modules that were not simulated in this work, to obtain a more complete test of the model in more complex situations, which include the delays and noise present in perception. Another interesting experiment would be to use the model as the control mechanism of a robotic arm. This would offer insights into how the model can be useful in the development of robotic technologies. The model was restricted to planar movements; however, most human actions are three-dimensional. Extending the model to allow three-dimensional movements and comparing the simulation results to human data might therefore create new insights regarding the model and its relationship to human movement.

Additionally, neural models offer a way to test and simulate sub-networks of the brain and their connectivity. Transforming the model into a neural network model would strengthen the relationship between the model and brain behavior, while at the same time offering scientists a simulation tool to test some of their hypotheses.

Lastly, the model in this thesis does not fully incorporate the concept of primitives, as it does not address the long-term memorization and reuse of recently learned trajectories as new primitives.
The model simply assumes that straight point-to-point movement is an available primitive, and uses it as the starting point for developing smoother continuous trajectories from segmented ones. Including this ability would be interesting, as it would make the model more robust and could offer some insight into how the brain processes such information.

References

Abend, W., Bizzi, E., & Morasso, P. (1982). Human arm trajectory formation. Brain: A Journal of Neurology, 105(Pt 2), 331-348.
Arbib, M. A. (1989). The metaphorical brain 2: Neural networks and beyond. John Wiley & Sons, Inc.
Arbib, M. A. (Ed.). (2003). The handbook of brain theory and neural networks (2nd ed.). The MIT Press.
Ashe, J., & Georgopoulos, A. P. (1994). Movement parameters and neural activity in motor cortex and area 5. Cerebral Cortex (New York, N.Y.: 1991), 4(6), 590-600.
Averbeck, B. B., Chafee, M. V., Crowe, D. A., & Georgopoulos, A. P. (2002). Parallel processing of serial movements in prefrontal cortex. Proceedings of the National Academy of Sciences of the United States of America, 99(20), 13172-13177. doi:10.1073/pnas.162485599
Bullock, D., & Grossberg, S. (1988). Neural dynamics of planned arm movements: Emergent invariants and speed-accuracy properties during trajectory formation. Psychological Review, 95(1), 49.
Bullock, D., Grossberg, S., & Mannes, C. (1993). A neural network model for cursive script production. Biological Cybernetics, 70(1), 15-28.
Cisek, P., & Kalaska, J. F. (2005). Neural correlates of reaching decisions in dorsal premotor cortex: Specification of multiple direction choices and final selection of action. Neuron, 45(5), 801-814.
Cortical functions. (2012). (Reference). Hong Kong: Trans Cranial Technologies. Retrieved from http://www.trans-cranial.com/local/manuals/cortical_functions_ref_v1_0_pdf.pdf
Flash, T., & Hogan, N. (1985). The coordination of arm movements: An experimentally confirmed mathematical model.
The Journal of Neuroscience: The Official Journal of the Society for Neuroscience, 5(7), 1688-1703.
Georgopoulos, A., Kalaska, J., Crutcher, M., Caminiti, R., & Massey, J. (1984). The representation of movement direction in the motor cortex: Single cell and population studies. Dynamic Aspects of Neocortical Function, 501
Georgopoulos, A. P., Kalaska, J. F., Caminiti, R., & Massey, J. T. (1982). On the relations between the direction of two-dimensional arm movements and cell discharge in primate motor cortex. The Journal of Neuroscience: The Official Journal of the Society for Neuroscience, 2(11), 1527-1537.
Georgopoulos, A. P., Kettner, R. E., & Schwartz, A. B. (1988). Primate motor cortex and free arm movements to visual targets in three-dimensional space. II. Coding of the direction of movement by a neuronal population. The Journal of Neuroscience: The Official Journal of the Society for Neuroscience, 8(8), 2928-2937.
Georgopoulos, A. P., Schwartz, A. B., & Kettner, R. E. (1986). Neuronal population coding of movement direction. Science (New York, N.Y.), 233(4771), 1416-1419.
Goldberg, D. E. (1990). Real-coded genetic algorithms, virtual alphabets, and blocking. Urbana, 51, 61801.
Grossberg, S., & Paine, R. W. (2000). A neural model of cortico-cerebellar interactions during attentive imitation and predictive learning of sequential handwriting movements. Neural Networks, 13(8), 999-1046.
Han, C. E. (2009). Modeling Human Reaching and Grasping,
Hoff, B., & Arbib, M. A. (1993). Models of trajectory formation and temporal interaction of reach and grasp. Journal of Motor Behavior, 25(3), 175-192.
Ijspeert, A. J., Nakanishi, J., Hoffmann, H., Pastor, P., & Schaal, S. (2013). Dynamical movement primitives: Learning attractor models for motor behaviors. Neural Computation, 25(2), 328-373.
Lashley, K. (1951). The problem of serial order in behavior. 1951, , 112-135.
Morasso, P., Ivaldi, F., & Ruggiero, C. (1983).
How a discontinuous mechanism can produce continuous patterns in trajectory formation and handwriting. Acta Psychologica, 54(1), 83-98.
Morasso, P. (1981). Spatial control of arm movements. Experimental Brain Research, 42(2), 223-227.
Morasso, P. (1983). Three dimensional arm trajectories. Biological Cybernetics, 48(3), 187-194.
Rohrer, B., Fasoli, S., Krebs, H. I., Volpe, B., Frontera, W. R., Stein, J., & Hogan, N. (2004). Submovements grow larger, fewer, and more blended during stroke recovery. Motor Control-Champaign-, 8, 472-483.
Russell, S., & Norvig, P. (2010). Artificial intelligence: A modern approach (3rd ed.). Prentice Hall.
Schaal, S. (2003). Dynamic movement primitives: A framework for motor control in humans and humanoid robotics. Paper presented at the International Symposium on Adaptive Motion of Animals and Machines, Kyoto, Japan. Retrieved from http://www-clmc.usc.edu/publications/S/schaal-AMAM2003.pdf
Sosnik, R., Flash, T., Hauptmann, B., & Karni, A. (2007). The acquisition and implementation of the smoothness maximization motion strategy is dependent on spatial accuracy demands. Experimental Brain Research, 176(2), 311-331.
Sosnik, R., Hauptmann, B., Karni, A., & Flash, T. (2004). When practice leads to co-articulation: The evolution of geometrically defined movement primitives. Experimental Brain Research, 156(4), 422-438.

Appendix A: Matlab Code

File 1: min_jerk_va.m

function [x,xd,xdd] = min_jerk_va(x,xd,xdd,goal,vgoal,agoal,tau,dt)
% computes the update of x,xd,xdd for the next time step dt given
% that we are currently at x,xd,xdd, and that we have a time interval
% tau until reaching the goal
if tau<dt, return; end;
dist = goal - x;
a1 = agoal * tau.^2;
a0 = xdd * tau^2;
v1 = vgoal * tau;
v0 = xd * tau;
t1=dt; t2=dt^2; t3=dt^3; t4=dt^4; t5=dt^5;
c1 = (6.*dist + (a1 - a0)/2. - 3.*(v0 + v1))/tau^5;
c2 = (-15.*dist + (3.*a0 - 2.*a1)/2. + 8.*v0 + 7.*v1)/tau^4;
c3 = (10.*dist + (a1 - 3.*a0)/2 ...
    - 6.*v0 - 4.*v1)/tau^3;
c4 = xdd/2.;
c5 = xd;
c6 = x;
x = c1*t5 + c2*t4 + c3*t3 + c4*t2 + c5*t1 + c6;
xd = 5.*c1*t4 + 4*c2*t3 + 3*c3*t2 + 2*c4*t1 + c5;
xdd = 20.*c1*t3 + 12.*c2*t2 + 6.*c3*t1 + 2.*c4;

File 2: learning_multi_tgt_va.m

%MULTIPLE POINT LEARNING%
rng('shuffle') %sets random seed to time value
dt = 0.001; %time step
%testing mode option: 1: pausing and graphing every 150th iteration; 2: no
%stopping and only first and last graphs; 3: only initial iteration, no learning
mode = 2;
t = 0:dt:2.1; %simulation time vector
N = length(t); %size of time simulation
pos = [0,0]; %initial position
v0 = [0,0]; %initial velocity
a0 = [0,0]; %initial acceleration

%long-term memory contents
tgt_list = [-0.2,0.3;-0.4,0;-0.2,-0.3;0,0;0,-0.7;-0.1,-0.9;-0.2,-0.7;0.1,0]; %list of targets in order
v_tgt_list = [0,0;0,0;0,0;0,0;0,0;0,0;0,0;0,0]; %list of desired velocities when crossing each target
a_tgt_list = [0,0;0,0;0,0;0,0;0,0;0,0;0,0;0,0]; %list of desired accelerations when crossing targets
ttg = [0.25,0.5,0.75,1,1.25,1.5,1.75,2]; %expected arrival time at each target (cumulative)
ntgt = 8; %total number of targets
split_jerk = zeros(1,ntgt-1);
split_start = 1;
tau = ttg(1); %estimation of finishing time for the current segment
cursor = 1; %keeps track of which target is current
jerk_ct = 1; %keeps track of which target the jerk is being assigned to
tau_full = tau; %full estimation of time in between targets (will not be modified as tau is)

%working memory
pos_vec = zeros(N,2); %stores position through simulation
vel_vec = zeros(N,2); %stores velocity through simulation
acc_vec = zeros(N,2); %stores acceleration through simulation

%first simulation
for i=1:N
    %current target from sequence queue
    tgt = tgt_list(cursor,:); %"loads" the position of target on focus
    v_tgt = v_tgt_list(cursor,:); %velocity of target on focus
    a_tgt = a_tgt_list(cursor,:); %acceleration of target on focus
    %min-jerk feedback controller step
    [pos(1),v0(1),a0(1)] = min_jerk_va(pos(1),v0(1),a0(1),tgt(1),v_tgt(1),a_tgt(1),tau,dt);
73 [pos(2),v0(2),a0(2)] = min_jerk_va(pos(2),v0(2),a0(2),tgt(2),v_tgt(2),a_tgt(2),tau,dt); %updates simulation history pos_vec(i,:) = pos; vel_vec(i,:) = v0; acc_vec(i,:) = a0; %updates estimation on expected finishing tau = tau-dt; %updates cursor and tau values to reflect a new target in the sequence %queue after finishing a segment, if segment was the last one, no %changes needed. the use of the 1e-10 in the comparison is to avoid %issues with floating point precision. if tau<dt-1e-10 && cursor<ntgt cursor = cursor+1; tau = ttg(cursor)-ttg(cursor-1); tau_full = tau; end %updates jerk split for evaluation and the variable identifying the %current split being evaluated if jerk_ct<cursor && (tau/tau_full)<(0.5-1e-10) && cursor<ntgt split_jerk(jerk_ct) = sum((gradient(acc_vec(split_start:i,1))./dt).^2)+sum((gradient(acc_vec(spl it_start:i,2))./dt).^2); split_start = i+1; jerk_ct = jerk_ct+1; end end %computes last split for jerk split_jerk(jerk_ct) = sum((gradient(acc_vec(split_start:i,1))./dt).^2)+sum((gradient(acc_vec(spl it_start:i,2))./dt).^2); %plotting of trajectory, velocity and acceleration profiles figure plot(pos_vec(:,1),pos_vec(:,2),'- ',[0;tgt_list(:,1)],[0;tgt_list(:,2)],'r*',[-1,-1,1.5,1.5],[1,-1.5,1,- 1.5],'ko'); title('Trajectory');xlabel('X');ylabel('Y'); figure plot(t,[vel_vec(:,1),vel_vec(:,2)]); title('Velocity Profile');xlabel('t');ylabel('V'); figure plot(t,acc_vec); title('Acceleration Profile');xlabel('t');ylabel('A'); %jerk and total squared jerk computation %jerk is the derivative of the acceleration in time jerk = [gradient(acc_vec(:,1))./dt, gradient(acc_vec(:,2))./dt]; total_jerk = sum(jerk(:,1).^2)+sum(jerk(:,2).^2); 74 if mode==3 return end %multi-pass learning total_pass = 2; total_rep = 0; %pause to allow saving of graphs pause for pass=1:total_pass %reinforcement learning trials %set-up %learning factor alpha = 0.05; %variable to store current min jerk value min_jerk = split_jerk; %unity vectors for changing velocity unity = 
[1,0;0.707,0.707;0,1;-0.707,0.707;-1,0;-0.707,-0.707;0,- 1;0.707,-0.707]; %array storing fails to avoid chosing a previously failed direction fails_v = zeros(ntgt-1,8); fail_count_v = zeros(ntgt-1,1);%indicates position in the fail array to insert fails_a = zeros(ntgt-1,8); fail_count_a = zeros(ntgt-1,1); phase = ones(ntgt-1,1); initial_dir = zeros(ntgt-1,2); pick = zeros(ntgt-1,1); for i=1:ntgt-1 if i==1 initial_dir(i,:) = (tgt_list(i+1,:)-[0,0]); else initial_dir(i,:) = (tgt_list(i+1,:)-tgt_list(i-1,:)); end initial_dir(i,:) = initial_dir(i,:)./sqrt(sum(initial_dir(i,:).^2)); %variable storing direction picked for change, initial pick is %chosen as the closest direction to the direction as the movement %would follow if it was to be traced from the previous target to %the next target [~,pick(i)] = min((sum((unity- repmat(initial_dir(i,:),8,1)).^2,2)).^(1/2)); end %learning iterations for rep = 1:1000 %adds changes to not "exausted" targets according to their %respective "learning phase" 75 v_tgt_list = v_tgt_list+alpha.*[unity(pick,:).*repmat((fail_count_v<8).*(phase==1),1,2) ;0, 0]; a_tgt_list = a_tgt_list+alpha.*[unity(pick,:).*repmat((fail_count_a<8).*(phase==2),1,2) ;0, 0]; if sum(phase)==0 break; end %simulation %reseting initial conditions and control variables pos = [0 0]; v0 = [0 0]; a0 = [0 0]; cursor = 1; tau = ttg(cursor); split_start=1; jerk_ct = 1; tau_full = tau; %time simulation for i=1:N tgt = tgt_list(cursor,:);%"loads" the position of target on focus v_tgt = v_tgt_list(cursor,:);%velocity of target on focus a_tgt = a_tgt_list(cursor,:); %min-jerk controller step [pos(1),v0(1),a0(1)] = min_jerk_va(pos(1),v0(1),a0(1),tgt(1),v_tgt(1),a_tgt(1),tau,dt); [pos(2),v0(2),a0(2)] = min_jerk_va(pos(2),v0(2),a0(2),tgt(2),v_tgt(2),a_tgt(2),tau,dt); %updates simulation history pos_vec(i,:) = pos; vel_vec(i,:) = v0; acc_vec(i,:) = a0; %updates estimation on expected finishing tau = tau-dt; %updates cursor and tau values to reflect a new target after finishing 
%segment, if segment was the last one, no changes needed. the use of %the 1e-10 in the comparison is to avoid issues with floating point %precision. if tau<dt-1e-10 && cursor<ntgt cursor = cursor+1; tau = ttg(cursor)-ttg(cursor-1); tau_full = tau; end %updates jerk split control variable if jerk_ct<cursor && (tau/tau_full)<(0.5-1e-10) && cursor<ntgt 76 split_jerk(jerk_ct) = sum((gradient(acc_vec(split_start:i,1))./dt).^2)+sum((gradient(acc_vec(spl it_start:i,2))./dt).^2); split_start = i+1; jerk_ct = jerk_ct+1; end end %computes last split for jerk split_jerk(jerk_ct) = sum((gradient(acc_vec(split_start:i,1))./dt).^2)+sum((gradient(acc_vec(spl it_start:i,2))./dt).^2); %computes total jerk for the current iteration jerk = [gradient(acc_vec(:,1))./dt, gradient(acc_vec(:,2))./dt]; total_jerk = sum(jerk(:,1).^2)+sum(jerk(:,2).^2); %data gathering stops, plots trajectory, velocity and acceleration %profiles. Displays split jerk, velocities and acceleration to %targets if rep==150 && mode==1 figure plot(pos_vec(:,1),pos_vec(:,2),'- ',[0;tgt_list(:,1)],[0;tgt_list(:,2)],'r*',[-1,-1,1.5,1.5],[1,-1.5,1,- 1.5],'ko'); title('Trajectory');xlabel('X');ylabel('Y'); figure plot(t,[vel_vec(:,1),vel_vec(:,2)]); title('Velocity Profile');xlabel('t');ylabel('V'); figure plot(t,acc_vec); title('Acceleration Profile');xlabel('t');ylabel('A'); split_jerk v_tgt_list a_tgt_list rep pause end %looks for local improvements for k=1:ntgt-1 %if failed to improve previously nothing needed skip to next split if phase(k) == 0 continue; end %compares for improvement if split_jerk(k)<min_jerk(k) min_jerk(k) = split_jerk(k); if phase(k)==1 fails_v(k,:) = zeros(1,8); fail_count_v(k) = 0; elseif phase(k)==2 77 fails_a(k,:) = zeros(1,8); fail_count_a(k) = 0; end else %regression to previous iteration value if no improvement if phase(k)==1 fail_count_v(k) = fail_count_v(k)+1; %removes changes made v_tgt_list(k,:) = v_tgt_list(k,:)- alpha.*unity(pick(k),:); fails_v(k,fail_count_v(k)) = pick(k); elseif 
phase(k)==2 fail_count_a(k) = fail_count_a(k)+1; a_tgt_list(k,:) = a_tgt_list(k,:)- alpha.*unity(pick(k),:); fails_a(k,fail_count_a(k)) = pick(k); end %checks if all directions created failure for this split %if so moves on to next split if phase(k)==1 %if all directions for velocity change have been tested, %changes phase to acceleration phase, makes a pick based on %the current trend of the acceleration profile around the %time of crossing the target, and skips back to the next %iteration in the splits loop (variable k) if fail_count_v(k) == 8 phase(k) = 2; tgt_n = int16(ttg(k)/dt)+1; trend = (acc_vec(tgt_n- 1,:)+acc_vec(tgt_n+1,:))./2; trend = trend./sqrt(sum(trend.^2)); [~,pick(k)] = min((sum((unity- repmat(trend,8,1)).^2,2)).^(1/2)); continue; end elseif phase(k)==2 if fail_count_a(k) == 8 phase(k) = 0; continue; end end control = 1;%loop control variable while(control)%keeps randomly selecting direction until not in fail stack pick(k) = ceil(rand(1,1)*8); control = 0;%assumes pick is good if phase(k)==1 78 for i= 1:fail_count_v(k)%checks against all fail stack entries if pick(k)==fails_v(k,i) control = 1;%pick is no good, "stays" in the loop break; end end elseif phase(k)==2 for i=1:fail_count_a if pick(k)==fails_a(k,i) control = 1; break; end end end end end end %if all targets exhausted start a new pass or exits learning if ((sum(fail_count_v<8)==0&&sum(fail_count_a<8)==0)||sum(phase)==0) %last time simulation with "best" results %reseting initial conditions and control variables pos = [0 0]; v0 = [0 0]; a0 = [0 0]; cursor = 1; tau = ttg(cursor); split_start=1; jerk_ct = 1; tau_full = tau; for i=1:N tgt = tgt_list(cursor,:);%"loads" the position of target on focus v_tgt = v_tgt_list(cursor,:);%velocity of target on focus a_tgt = a_tgt_list(cursor,:); %min-jerk controller step [pos(1),v0(1),a0(1)] = min_jerk_va(pos(1),v0(1),a0(1),tgt(1),v_tgt(1),a_tgt(1),tau,dt); [pos(2),v0(2),a0(2)] = min_jerk_va(pos(2),v0(2),a0(2),tgt(2),v_tgt(2),a_tgt(2),tau,dt); %updates 
simulation history pos_vec(i,:) = pos; vel_vec(i,:) = v0; acc_vec(i,:) = a0; %updates estimation on expected finishing tau = tau-dt; 79 %updates cursor and tau values to reflect a new target after finishing %segment, if segment was the last one, no changes needed. the use of %the 1e-10 in the comparison is to avoid issues with floating point %precision. if tau<dt-1e-10 && cursor<ntgt cursor = cursor+1; tau = ttg(cursor)-ttg(cursor-1); tau_full = tau; end %updates jerk split control variable if jerk_ct<cursor && (tau/tau_full)<(0.5-1e-10) && cursor<ntgt split_jerk(jerk_ct) = sum((gradient(acc_vec(split_start:i,1))./dt).^2)+sum((gradient(acc_vec(spl it_start:i,2))./dt).^2); split_start = i+1; jerk_ct = jerk_ct+1; end end %computes last split for jerk split_jerk(jerk_ct) = sum((gradient(acc_vec(split_start:i,1))./dt).^2)+sum((gradient(acc_vec(spl it_start:i,2))./dt).^2); break; end end total_rep = total_rep+rep; %plotting and data display of results from the best iteration if pass==2 figure plot(pos_vec(:,1),pos_vec(:,2),'- ',[0;tgt_list(:,1)],[0;tgt_list(:,2)],'r*',[-1,-1,1.5,1.5],[1,-1.5,1,- 1.5],'ko'); title('Trajectory');xlabel('X');ylabel('Y'); figure plot(t,[vel_vec(:,1),vel_vec(:,2)]); title('Velocity Profile');xlabel('t');ylabel('V'); figure plot(t,acc_vec); title('Acceleration Profile');xlabel('t');ylabel('A'); total_rep min_jerk v_tgt_list a_tgt_list end end
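For readers following the listings, the quantities they compute can be written out as equations. This is a sketch read off the code above, not a derivation taken from the thesis text. It assumes that t1 through t5 in File 1 denote the first through fifth powers of the elapsed time t within one controller step, and that c1, c2 and c3 are the coefficients defined earlier in File 1 (not reproduced here). The subscript 0 marks the state at the start of the step, and S_k is a label introduced here for the set of samples assigned to split k.

```latex
% Quintic update evaluated at the end of File 1
\begin{aligned}
x(t)        &= c_1 t^5 + c_2 t^4 + c_3 t^3 + c_4 t^2 + c_5 t + c_6,
  \qquad c_4 = \tfrac{1}{2}\ddot{x}_0,\quad c_5 = \dot{x}_0,\quad c_6 = x_0,\\
\dot{x}(t)  &= 5 c_1 t^4 + 4 c_2 t^3 + 3 c_3 t^2 + 2 c_4 t + c_5,\\
\ddot{x}(t) &= 20 c_1 t^3 + 12 c_2 t^2 + 6 c_3 t + 2 c_4 .
\end{aligned}

% Per-split squared jerk accumulated in split_jerk(k) in File 2
J_k = \sum_{i \in S_k}
      \left[ \left( \frac{\Delta a_x(i)}{\Delta t} \right)^{2}
           + \left( \frac{\Delta a_y(i)}{\Delta t} \right)^{2} \right]
```

Here \Delta a is the finite difference returned by MATLAB's gradient (central differences at interior samples, one-sided at the two ends of each split), and \Delta t corresponds to dt = 0.001. The sum is not multiplied by \Delta t, so J_k is proportional to, rather than equal to, the time integral of squared jerk over the split.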
Abstract
This thesis proposes a kinematic model that simulates arm movements through one or more via-points, producing continuous trajectories with one or more curves and straight segments (such as those found in handwriting). The model's main unit is a feedback minimum jerk control unit, an extension of the Hoff & Arbib (1993) and Flash & Hogan (1985) minimum jerk reach controllers. In addition to the feedback unit, an automated learning procedure, a modified hill-climbing algorithm, is proposed to adapt multiple via-point trajectories from highly segmented ones (composed of a sequence of straight movements) into more continuous and smoother trajectories. The development of the model is based on a review of studies that identify or try to model the many characteristics of human arm control, at both the kinematic level and the level of brain area function. Once the model is explained, parallels are drawn between it and brain function.
Asset Metadata
Creator
Carneiro, Oziel de Oliveira (author)
Core Title
Minimum jerk model for control and coarticulation of arm movements with multiple via-points
School
Viterbi School of Engineering
Degree
Master of Science
Degree Program
Computer Science
Publication Date
02/09/2015
Defense Date
01/26/2015
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
arm control,brain,coarticulation,minimum jerk,Model,OAI-PMH Harvest
Format
application/pdf (imt)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Arbib, Michael A. (committee chair), Schaal, Stefan (committee member), Schweighofer, Nicolas (committee member)
Creator Email
odeolive@usc.edu,ozielcarneiro@gmail.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c3-530120
Unique identifier
UC11298984
Identifier
etd-CarneiroOz-3167.pdf (filename),usctheses-c3-530120 (legacy record id)
Legacy Identifier
etd-CarneiroOz-3167-0.pdf
Dmrecord
530120
Document Type
Thesis
Rights
Carneiro, Oziel de Oliveira
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA