Evaluating the Dynamics of Agent-Environment Interaction

EVALUATING THE DYNAMICS OF AGENT-ENVIRONMENT INTERACTION

by

Dani Goldberg

A Dissertation Presented to the
FACULTY OF THE SCHOOL OF ENGINEERING
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY
(Computer Science)

May 2001

Copyright 2001 Dani Goldberg

UNIVERSITY OF SOUTHERN CALIFORNIA
THE GRADUATE SCHOOL
UNIVERSITY PARK
LOS ANGELES, CALIFORNIA 90007

This dissertation, written by Dani Goldberg under the direction of his Dissertation Committee, and approved by all its members, has been presented to and accepted by The Graduate School, in partial fulfillment of requirements for the degree of DOCTOR OF PHILOSOPHY.

Dedication

This dissertation is dedicated to the memory of my grandfather Mirza, who died in May of 1998, during my second year as a Ph.D. student.
Acknowledgements

It is well known that dissertations are seldom read, and I do not realistically expect this dissertation to be any different. But in the unlikely event that someone does open these pages, I hope that he or she spends enough time on these words to realize that there are many people who have been important to this work and deserve sincere thanks.

Sincerest thanks first to my advisor, Maja Mataric, for the countless things (both big and small) that have helped make this dissertation and the completion of my Ph.D. studies a reality. Maja's energy, insight, humor, and many talents are an inspiration. Instead of attempting to thank her for everything (assuming that is even possible), I will simply mention, as an example, this: anytime I went to her office uncertain, depressed, or indifferent about the state of my research, she always had advice that revitalized my enthusiasm. How do you thank someone for such a rare talent?

Heartfelt thanks to the other four members of my dissertation defense committee: to Stefan Schaal, for his uncanny ability to concretize difficult concepts and his advice on comparing parametric and nonparametric AMMs; to Gaurav Sukhatme, for detailed draft comments and much-appreciated advice on handling the bureaucracy of graduating; to Kurt Palmer, for excellent feedback that has helped clarify the use of parametric and nonparametric statistics in the dissertation; and to Deborah Estrin, for keen questions and comments that have honed my presentation of the work. Many thanks also to Leslie Pack Kaelbling, whose comments on my dissertation proposal have helped me clarify the presentation of AMMs and their relationship to other models. Thanks also to Milind Tambe for great comments on several of my earliest AMM-related talks. Very special thanks also to Sridhar Mahadevan for help in establishing the connection between AMMs and SMPs.

Thanks to Barry Werger, my oldest labmate and compatriot, for friendship, late-night singing sessions, sleepovers in the Lab, calzones on Fridays, movie nights, racquetball, sword-fighting stress relief, being the only other Macintosh person in the Lab, and many other things, both personal and academic. The past five years would have been much more difficult without Barry's support. I would also like to thank my other labmates for making the Interaction Lab such a great place to work, and just be. Thanks to Monica Nicolescu for always being willing to lend a hand, from reading a paragraph that doesn't sound right to setting up the video camera for my defense. Her positive outlook is contagious, but alas, if my desk is any indication, it seems that her neatness is not. Thanks to Brian Gerkey for great coffee, great humor, and guru-like advice on everything Unix. Few people are so patient and unselfish with their help. Thanks to Chad Jenkins for giving the lab a deep sense of camaraderie and family, and for the ability to be completely serious one moment and utterly hilarious the next. Thanks to Ajo Fod for great conversations, sharing the Simpsons, and especially racquetball without any permanent injuries (though I did get really good at being in the wrong place at the wrong time). Thanks also to Richard Vaughan, Paolo Pirjanian, and Andrew Howard for lab spirit and great comments on my talks.
Thanks and best wishes to Francois Michaud for early days at the Interaction Lab. There are many other people whose encouragement, advice, and support have helped over the past five years, including George Bekey, Maria Gini, Jordan Pollack, and Tucker Balch. Thanks also to the wonderful staff whose dedication and help have made the past five years easier and more enjoyable than they otherwise would have been. These folks include Myrna Fox, Kusum Shori, Julieta de la Paz, Amy Yung, Carol Gordon, Hilda Mauro, Bende Lagua, Laura Lopez, and the rest of the SAL staff. Special thanks to Chuck and Rosario at IS Robotics for helping to keep the R2e's running.

I have saved the last of my acknowledgements for the people whom I can never hope to adequately thank: my family. My deepest love and gratitude to my mother, Monir, the best mother in the world, whose unbound love is the wellspring of my energy. My deepest love and gratitude also to my father, Joel, for always being there when I need him and always on the lookout for things of interest. To my sister, Dori, my warmest love and thanks for dropping by the Lab during my early days at Brandeis, surprising me with food when I was too busy to leave, and, in general, always brightening my day. My warmest love also to my brother, Sina, for companionship and shared times too many to mention. Warmest love to Galit and Roii, for friendship and laughs most precious. Many thanks to my beloved relatives in Boston and Los Angeles for all of their love, support, encouragement, and the best home cooking around. In particular I wish to thank: Mamanjoon, Grandma; my aunts, Iran, Simin, Shamsi, Ester, Dina, Dorothy; my uncles, Asher, Bahram, Saied; and all of my other uncles, aunts, and cousins.

The research reported in this dissertation was conducted at the Interaction Laboratory at the Brandeis University Volen Center for Complex Systems and the Computer Science Department, and at the University of Southern California Robotics Research Labs and Computer Science Department. The work was supported in part by the Office of Naval Research Grant N00014-95-1-0759, by the National Science Foundation Infrastructure Grant CDA-9512448, by the Office of Naval Research Grant N00014-00-1-0140, by the Sandia National Labs Grant 3065, and by a grant from the DARPA TASK program.

Contents

Dedication
Acknowledgements
List of Tables
List of Figures
Abstract

1 Introduction
  1.1 Issues in Agent-Environment Interaction
    1.1.1 Sensing and Hidden State
    1.1.2 Uncertainty of Action
    1.1.3 Evaluating Interaction Dynamics
    1.1.4 Stochasticity and Non-Stationarity
    1.1.5 Timescale of Performance Improvement
  1.2 Learning in Agent-Based Systems
    1.2.1 AMMs in Perspective
    1.2.2 Parametric versus Nonparametric AMMs
  1.3 Behavior-Based Control
  1.4 Contributions
  1.5 Dissertation Outline

2 Motivation: Mobile Robot Foraging
  2.1 Foraging Task Structure
  2.2 The Robots
  2.3 The Behavior-Based Foraging Controller
    2.3.1 Behavior Details
  2.4 Motivational Result: Modeling with BBC
  2.5 Summary

3 Related Work
  3.1 Robotics, Behavior-Based Control, and Foraging
  3.2 Machine Learning and Robotics
  3.3 Statistics: Parametric Approximations
    3.3.1 Binomial Confidence Limits
    3.3.2 Approximating the Cumulative Standard Normal Distribution
    3.3.3 Approximating the Cumulative t Distribution
    3.3.4 Approximating the Cumulative F Distribution
  3.4 Statistics: Nonparametric Tests
    3.4.1 Nonparametric Tests of Location
  3.5 Summary

4 Augmented Markov Models
  4.1 Markov Chains, SMPs, and AMMs
  4.2 AMM Implementation: Overview
    4.2.1 Representation of AMMs
    4.2.2 AMM Construction Algorithm
    4.2.3 Examples of AMM Construction
  4.3 AMM-Based Evaluations
    4.3.1 Mean First Passage
    4.3.2 Variance of Mean First Passage
    4.3.3 Degrees of Freedom
    4.3.4 The t-Test and F-Test
  4.4 AMM Use with Behavior-Based Control
    4.4.1 Examples of AMM Construction with BBC
  4.5 Summary

5 AMMs in Stationary Problem Domains
  5.1 Introduction
  5.2 Individual Performance: Fault Detection
  5.3 Group Affiliation: Membership through Ability and Experience
  5.4 Group Performance: Dynamic Leader Selection
  5.5 Summary

6 AMMs in Non-Stationary Problem Domains: Regime Detection
  6.1 Introduction
  6.2 AMMs for Regime Detection
    6.2.1 Notation
    6.2.2 Algorithm for Regime Detection
  6.3 The Land Mine Collection Task
    6.3.1 Validating the Approach
    6.3.2 Maintaining the Proportion of Mines
    6.3.3 Maximizing Reward
  6.4 Summary

7 AMMs in Non-Stationary Problem Domains: Reward Maximization
  7.1 Introduction and Motivation
  7.2 Dynamic Moving Average Algorithm
  7.3 Experiment 1: Validation of the Reward Maximization Criterion
  7.4 Experiment 2: Abruptly Changing Environment
  7.5 Experiment 3: Gradually Shifting Environment
  7.6 Summary

8 Parametric versus Nonparametric AMMs
  8.1 Introduction
  8.2 Parametric and Nonparametric Node-Splitting
  8.3 Simulation and Evaluation
  8.4 Experimental Results
  8.5 Summary

9 Summary and Future Directions

Reference List

Appendix A: Design and Evaluation of Robust Behavior-Based Controllers
  A.1 Introduction
  A.2 The Collection Task
    A.2.1 Task Structure
    A.2.2 The Robots
    A.2.3 Behavior-Based Control
  A.3 The Homogeneous Controller
    A.3.1 Behaviors
    A.3.2 Robustness
  A.4 Spatio-Temporal Interactions
  A.5 The Pack Controller
    A.5.1 The "message passing" Behavior
    A.5.2 Robustness
    A.5.3 Interference
  A.6 The Caste Controller
    A.6.1 The Search Caste
    A.6.2 The Goal Caste
    A.6.3 Robustness and Interference
  A.7 Analysis
    A.7.1 Interference, Avoiding, and Time
    A.7.2 Distance Traveled
    A.7.3 Robustness
    A.7.4 Evaluation
    A.7.5 Heterogeneity and Performance
  A.8 Summary

Appendix B: Details of AMM Representation and Construction
  B.1 Representation of AMMs
  B.2 AMM Construction Algorithm
    B.2.1 Initialization
    B.2.2 Main Loop
    B.2.3 Calculating Traversal Probabilities
    B.2.4 Calculating Node Probabilities
    B.2.5 Node Splitting
  B.3 Summary

Appendix C: Tables of Critical Points for T
List of Tables

3.1 Pre-calculated values, c, of the inverse cumulative normal distribution for use in the Paulson-Camp-Pratt approximation (Equation 3.2).
5.1 Mean time to completion for the Least Desirable, Dynamic Leader Selection, and Most Desirable experiments.
5.2 Mean positions in the hierarchy at the end of the Dynamic Leader Selection experiments.
5.3 The mean number of pucks collected in the Least Desirable, DLS, and Most Desirable experimental scenarios.
6.1 Pucks remaining in the environment at the end of each trial of the proportion-maintaining mine collection task.
7.1 The average reward points the robot is expected to have accrued during the random, control, and algorithm scenarios (puck point values: black=1, clear=10).
7.2 The average reward points the robot is expected to have accrued during four versions of the random scenario with different collection probabilities (puck point values: black=1, clear=10).
7.3 The average reward points the robot is expected to have accrued during the random, control, and algorithm scenarios with close puck point values (black=1, clear=4).
A.1 Average time of task completion and average time spent in the avoiding behavior for each controller.
A.2 Average amount of interference and average fraction of time spent in the avoiding behavior.
A.3 Average amount of interference per unit time for each controller.
A.4 Average distance (in feet) traveled by the robots for each controller.
C.1 Critical points, Tα{3...6, 3...6}, for the T distribution.
C.2 Critical points, Tα{3...10, 7...10}, for the T distribution.
C.3 Critical points, Tα{3...14, 11...14}, for the T distribution.
C.4 Critical points, Tα{3...14, 15...18}, for the T distribution.
C.5 Critical points, Tα{15...18, 15...18}, for the T distribution.
C.6 Critical points, Tα{3...14, 19...22}, for the T distribution.
C.7 Critical points, Tα{15...22, 19...22}, for the T distribution.
C.8 Critical points, Tα{3...14, 23...25}, for the T distribution.
C.9 Critical points, Tα{15...25, 23...25}, for the T distribution.
List of Figures

1.1 The interaction between an agent and its environment.
1.2 The representational expressiveness of AMMs compared to other models. The horizontal axis shows increasing expressiveness in observations, actions, and state. The vertical axis shows increasing expressiveness in time. AMMs share attributes of MCs, SMPs, and HMMs.
1.3 The basic structure of a behavior-based controller. Behaviors receive input from sensors and other behaviors, process the input, possibly changing internal state, and send outputs to effectors and other behaviors. (Sensors are represented by the Sony pan-tilt-zoom camera on the left, and effectors by the Sarcos Dextrous Arm on the right.)
1.4 The representational expressiveness of AMMs used in conjunction with BBC, compared with other models. The horizontal axis shows increasing expressiveness in observations, actions, and state. The vertical axis shows increasing expressiveness in time. AMMs used with BBC provide a richer representation of observations and actions than AMMs alone.
1.5 Each robot constructs at least one AMM capturing its interaction dynamics with the environment while performing a task. Each state of the AMM represents the execution of a particular behavior.
2.1 Two example region configurations for the foraging task.
2.2 One of the foraging task configurations used in the experiments of the dissertation.
2.3 The four R2e robots used in the foraging experiments.
2.4 The sensor configuration of an R2e robot.
2.5 The homogeneous controller for the foraging task. Rounded rectangles represent the robot's sensors, ellipses represent behaviors, and rectangles represent actuators. Sensor values are transmitted along dotted lines, actuator commands along dashed lines, and inter-behavior control signals along solid lines. One symbol represents behavior selection and the other Subsumption-style precedence.
4.1 An example of a first-order or second-order AMM generated with 100 input symbols from the sequence {3 2 1 4 2 1 3 2 1 4 2 1 ...}.
4.2 An example of a third-order AMM generated with 100 input symbols from the sequence {3 2 1 4 2 1 3 2 1 4 2 1 ...}.
4.3 (Left) A second-order AMM constructed from foraging behavior data; (Right) a first-order AMM constructed with the same data.
5.1 Sample AMM constructed from the wandering and avoiding behaviors of the foraging task.
5.2 Mean time to completion for the Least Desirable, Dynamic Leader Selection, and Most Desirable experimental scenarios.
5.3 Average hierarchy positions at the completion of the Least Desirable, Dynamic Leader Selection, and Most Desirable experiments.
5.4 Mean number of pucks collected at the completion of the Least Desirable, Dynamic Leader Selection, and Most Desirable experiments.
6.1 Two versions of the mine collection task environment: (Left) 11 x 14 foot Corral with 9 clear and 18 black pucks; (Right) 11 x 8 foot Corral with 18 clear pucks.
7.1 The mine collection task: setup for validation of the reward maximization criterion.
7.2 The mine collection task: setup for reward maximization in a non-stationary environment.
7.3 The simulated mine collection task: setup for reward maximization in a gradually shifting non-stationary environment.
7.4 Average accrued reward for the three experimental scenarios (puck point values: black=1, clear=10).
7.5 Average accrued reward for the four versions of the random scenario with different probabilities for collecting pucks (puck point values: black=1, clear=10).
7.6 Average accrued reward for the three experimental scenarios with close point values for pucks (black=1, clear=4).
8.1 The distributions associated with the execution of four behaviors (reverse homing, exiting, homing, creeping) in the foraging controller (Section 2.3). The graphs were generated with real data captured from the physical mobile robots.
8.2 The transition model of the foraging task used by the simulation for evaluating parametric and nonparametric AMMs.
8.3 The performance, A*, of parametric and nonparametric AMMs with a node-splitting significance of 0.05 and nmax = 10. This graph shows the average over 100 trials of 10000 simulation steps. Asterisks indicate a significant difference at a level of 0.01.
8.4 The performance, A*, of parametric and nonparametric AMMs with a node-splitting significance of 0.01 and nmax = 10. This graph shows the average over 100 trials of 10000 simulation steps. Asterisks indicate a significant difference at a level of 0.01.
A.1 Two example region configurations for the collection task.
A.2 Actual configuration used in the collection task.
A.3 The four R2e robots used in the experiments.
A.4 The sensor configuration of an R2e robot.
A.5 The homogeneous controller for the collection task. Rounded rectangles represent the robot's sensors, ellipses represent behaviors, and rectangles represent actuators. Sensor values are transmitted along dotted lines, actuator commands along dashed lines, and inter-behavior control signals along solid lines. One symbol represents behavior selection and the other Subsumption-style precedence.
A.6 This plot shows the characteristic interference pattern for the homogeneous implementation of the collection task on four physical robots. The shading, corresponding to the height of the peaks, is clearer when the data support a very fine mesh.
A.7 The pack variation of the collection task.
A.8 The pack version of the controller for the collection task.
A.9 This plot shows the characteristic interference pattern for the pack implementation of the collection task on four physical robots. The shading corresponds to the height of the peaks.
A.10 The caste variation of the collection task.
A.11 The controller for the Search Caste, the three-robot subgroup that searches for pucks.
A.12 The controller for the Goal Caste, the one-robot subgroup that brings pucks from the Boundary/Buffer line to Home.
A.13 The sweeping behavior of the controller for the Goal Caste.
A.14 This plot shows the characteristic interference pattern for the caste implementation of the collection task on four physical robots. The shading corresponds to the height of the peaks.
A.15 A typical path taken by one physical robot in the homogeneous controller.
A.16 (Left) A typical path of a physical robot in the Search Caste of the caste controller; (Right) a typical path of the robot in the Goal Caste.
A.17 (Left) A typical path of the least dominant robot of the pack controller; (Right) a typical path of the most dominant robot.

Abstract

Improving the performance of agent-based systems is a challenging problem requiring both system evaluation and appropriate modification of the agent's policy or controller.
This dissertation presents work in this problem domain, focusing on the development of an on-line, real-time method for modeling the interaction dynamics between a situated agent and its environment. The encompassing theme is to provide pragmatic, general-purpose, and theoretically sound approaches for improving the performance of agent-based systems.

In order to provide context to the approach and contributions of the dissertation, we first consider some of the many complicating factors that influence a solution to the problem of improving performance. Next, motivation for our on-line modeling approach is provided by a brief examination of off-line evaluation using interference (or collisions) between agents (robots). This work in off-line evaluation presents the unifying experimental theme of the dissertation (mobile robot foraging) and shows how behavior-based control provides a rich substrate for the evaluation of interaction dynamics.

The majority of the dissertation focuses on on-line learning of augmented Markov models (AMMs), a novel version of semi-Markov processes. The approach utilizes AMMs to capture agent-environment interaction dynamics in terms of the history of behaviors executed while performing a task. These models provide the data that are used on-line and in real-time to evaluate the system and suggest task-dependent, performance-improving modifications to the agent's behavior. An AMM construction algorithm is presented that allows incremental generation with little computational overhead, making it feasible for on-line, real-time applications. The algorithm is able to represent non-first-order Markovian systems in first-order form by dynamically adjusting models through the use of higher-order statistics. This ability to represent higher-order Markovian characteristics provides the expressiveness to accommodate systems with rich interaction dynamics.

The on-line, real-time modeling approach using AMMs in conjunction with behavior-based control is demonstrated as effective in both stationary and non-stationary problem domains. Several challenging robotics applications are examined in the stationary domain (fault detection, affiliation determination, hierarchy restructuring) and the non-stationary domain (regime detection, reward maximization). The AMM-based evaluations used in these applications include statistical hypothesis tests and expectation calculations from Markov chain theory. Experimental results are presented for each of the methods and applications discussed. Finally, some of the statistical distribution issues involving AMMs and their utilization in this work are addressed through an empirical comparison with a non-parametric alternative.

The methods and experimentation presented in this thesis aim to show that the evaluation of agent-environment interaction dynamics can be effective and efficient in improving the performance of agents in challenging problem domains.

Chapter 1

Introduction

This chapter provides an overview of the dissertation and its contributions, placed in the context of key ideas and issues that arise when evaluating the performance of agent-based systems.
It establishes the notion of agent-environment interaction and examines evaluation difficulties as a consequence of this interaction, thereby providing perspective on the challenges involved in improving the performance of such systems. This chapter also motivates the remainder of the dissertation by introducing the idea of evaluation using augmented Markov models (AMMs) and behavior-based control (BBC).

Agent-based systems are an active area of research in Artificial Intelligence. These systems generally consist of one or more entities, or agents, that sense and act within an environment that changes (at least in part) as a consequence of those actions. Figure 1.1 illustrates the interaction between an agent and its environment. The agent performs actions (mediated through its effectors) that change the state of the environment; the changed environmental state (mediated through sensors) affects subsequent observations and actions by the agent. It is the details of this interaction that determine exactly how the agent performs its function or task. Brooks (1991) argues further that "intelligence is determined by the dynamics of interaction with the world." The concern in this dissertation is not what the richness of interaction implies about intelligence, but rather how it affects performance.

Figure 1.1: The interaction between an agent and its environment.

It can be extremely difficult to design a complex agent-based system that initially has optimal (or even efficient) performance. In order to improve performance, two key problems must be addressed, namely:

1. how to evaluate the performance of the system, and
2. how to modify the agent's policy or controller [1] to improve that performance.

Performance optimization is a ubiquitous theme in Artificial Intelligence. In this dissertation, we restate the theme as a general challenge for agent-based systems.

Performance Challenge: To improve performance through appropriate system evaluation and modification of the agent's policy or controller.

[1] The agent's policy provides a mapping from the state of the system (a combination of environmental state as perceived by the agent and the agent's internal state) to actions that the agent performs. A policy ideally tells the agent what action to take in each situation in which it finds itself, so as to accomplish its task as well as possible. The policy is generally encoded as a controller, analogous to the way a program codes for an algorithm. A controller often provides a level of abstraction that simplifies and makes concise the specification of the policy.

There are numerous constraints that influence the solution space of this challenge. These include: the sensing capabilities of the agent, the actions it can perform, the function or task it must accomplish, the complexity of the environment, the presence of other agents, and the amount of time available to improve performance. The following sections explore these and other related issues, providing context to the contributions of this dissertation.
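To make the loop of Figure 1.1 and the footnote's notion of a policy concrete, here is a minimal sketch in Python. It is purely illustrative and not from the dissertation: the toy world model, the sensing rule, and the 0.8 success probability are all invented for the example.

```python
import random

def policy(observation):
    """The policy: a mapping from the agent's observation to an action."""
    return "turn" if observation == "obstacle" else "forward"

def environment_step(state, action):
    """The environment changes as a consequence of the agent's action,
    but the realized action may differ from the intended one."""
    if action == "forward" and random.random() < 0.8:   # assumed success rate
        state["position"] += 1
    elif action == "turn":
        state["heading"] = (state["heading"] + 90) % 360
    return state

def observe(state):
    """Sensing is mediated and partial: only a coarse summary is returned."""
    return "obstacle" if state["position"] % 5 == 4 else "clear"

state = {"position": 0, "heading": 0}
for _ in range(20):                        # one short execution cycle
    obs = observe(state)                   # environment -> sensors -> observation
    act = policy(obs)                      # observation -> policy -> intended action
    state = environment_step(state, act)   # effectors -> realized action -> new state
print(state)
```

The gap between the intended and realized action in environment_step anticipates the uncertainty-of-action discussion in Section 1.1.2.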
In addition, this chapter introduces the major research themes of the dissertation, including the use of machine learning techniques, specifically the augmented Markov models (AMMs) developed in this work. Throughout the dissertation, AMMs are used in conjunction with behavior-based control (BBC), a methodology for constructing agent controllers. The use of AMMs with BBC is further motivated with a brief examination of the author's earlier work exploring the off-line evaluation of multi-robot systems using interference in Chapter 2. That chapter also presents the robot foraging task, the main experimental theme of the dissertation, including the basic behavior-based controller that implements the task and the robots that performed it.

This chapter concludes with a summary of the contributions of the dissertation. The main contribution is the development of an effective method for on-line, real-time modeling of the interaction dynamics between an agent and its environment, using AMMs and BBC. The models developed are a foundation for solutions to the Performance Challenge above, providing the data for the evaluations which suggest application-dependent, performance-improving modifications to the agent's policy. Stated more concisely, the thesis of this dissertation is:

Thesis: Augmented Markov models, in conjunction with behavior-based control, enable effective evaluation of agent-environment interaction dynamics and facilitate solutions to the Performance Challenge.

The approach is demonstrated in several challenging problem domains involving embodied agents (i.e., robots). The link between the contributions of this thesis and the nuances of the Performance Challenge will be made clearer in the following sections.

1.1 Issues in Agent-Environment Interaction

When an agent exists in an environment, it is said to be situated in that environment, and consequently can interact with it. A subclass of situated agents is embodied agents, in which sensing of the environment and actions in the environment are mediated through a "body" (Brooks 1991, Mataric 1999). The body can be virtual, as with an animated video game character, or physical, as with a "nuts-and-bolts" robot. The key notion with both virtual and physical embodiment, however, is that the only sensing capabilities and actions to which the agent has access are those afforded by the body, which therefore provides the physical interface that enables interaction with the environment. Physically embodied agents are of particular interest in this dissertation, since the experimental domain is physical mobile robots. As we will see below, many of the issues that complicate the Performance Challenge for agent-based systems are exacerbated by embodiment.

1.1.1 Sensing and Hidden State

A paradox of sensing is that it can simultaneously provide information that is both excessive and insufficient. The true state of the environment is hidden from (or only partially observable to) the agent (Whitehead & Ballard 1991, McCallum 1996). Hidden state is an often-encountered issue in situated agent-based systems, though its manifestation is dependent on the sensing capabilities of the agent and the complexity of the task and environment. If these factors are such that the agent can perceive the exact state of the environment, then there is no hidden state. If the possibility of hidden state does exist, the configuration of the environment might make it a non-issue for a particular task. Additionally, sensing is often local, but the agent's movement, including techniques such as active perception (Bajcsy 1988, Ballard 1991), can help compensate for the hidden state associated with the locality of sensing.
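The hidden-state problem can be stated formally (in generic notation that does not come from the dissertation): the agent's observation function is typically non-injective, so distinct environment states can look identical.

```latex
% O maps environment states S to the agent's percepts Z.
O : S \to Z, \qquad \exists\, s_1 \neq s_2 \in S \ \text{with}\ O(s_1) = O(s_2).
% Aliased states cannot be told apart from the current percept alone,
% so discrimination must condition on a history of percepts and actions:
h_t = (z_0, a_0, z_1, a_1, \ldots, z_t).
```

Conditioning on the history $h_t$ is exactly the "history of sensing" the next paragraph calls for.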
If the possibility of hidden state does exist, the configuration of the environment might make it a non-issue for a particular task. Additionally, sensing is often local, but the agent’s movement, including techniques such as active perception (Bajcsy 1988, Ballard 1991). can help compensate for the hidden state associated with the locality of sensing. Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Hidden state is more likely to be an issue when the environment is very complex and the sensing capabilities of the agent are relatively impoverished by comparison, as is often true in the domain of physical mobile robots. If the agent’s sensing does not allow for full discrimination of the states of the environment, then subsets of environmental state will appear identical, i.e., there will be perceptual aliasing (Chrisman 1992). When hidden state is a factor, discrimination of environmental state must be based in part on a history of sensing. A further complication to state discrimination quite common in mobile robotics is noisy and inaccurate sensing. These sensing difficulties complicate the Performance Challenge by necessitating an evaluation that can suggest policy/controller modifications that accommodate them. Many techniques have been developed that attem pt to compensate for hidden state, partial observability, and inaccuracies in sensing. This dissertation assumes that an effective, basic controller for a task can be specified using the appropriate techniques to handle sensing issues. One methodology for constructing controllers is behavior-based control, which is used in this dissertation and discussed briefly in Section 1.3. (Appendix A provides a more in-depth examination of designing and evaluating robust behavior-based controllers for robots.) In regards to sensing difficulties, the focus of this dissertation is on problems that arise during execution and that can be evaluated through their impact on the interaction dynamics between the agent and its environment. As we shall see, monitoring (or • ‘sensing” ) of interaction dynamics also requires special consideration of hidden state. 1.1.2 Uncertainty of Action In addition to the sensing difficulties that an agent faces, there are difficulties associated with the actions that it takes (Boutilier, Dean Hanks 1999). Even assuming that the agent has in its repertoire a set of actions sufficient for accomplishing its task, it does not necessarily mean th at it will succeed in doing so. A key problem is that the outcome of actions is uncertain, especially for 5 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. physically embodied agents. In other words, the action an agent intends to perform can have an outcome that is highly variable. Consider, for example, a mobile robot that intends to turn 90 degrees in place. If there is no slippage and the robot’s mechanical systems are working properly, then it will likely come close to doing so using only open-loop control (i.e.. with no sensor feedback). The inherent inaccuracies and noise in the robot’s systems, however, make an exact turn nearly impossible. Furthermore, if the floor is dirty, causing the robot to slip, then the turn will be even less accurate. 
Even though sensing has its own associated difficulties, incorporating it into the turning process (for example with a compass, as in the closed-loop variant sketched above) can help to achieve better control by providing feedback, as we will see in Section 1.3 when we look at behavior-based control. Fortunately, reducing the uncertainty associated with the outcome of actions to an acceptable level can be quite manageable in practice, as will be demonstrated by the foraging task in Chapter 2.

In addition to the uncertainty associated with the outcome of actions, there exists the broader uncertainty of what action is appropriate in a particular situation, relating directly to the Performance Challenge. As discussed previously, the agent's policy specifies what action to take in each perceived state of the system. The policy, however, may not be optimal, or even efficient, and thus modification of the policy could improve performance. One approach to policy modification involves the human designer making direct changes to the controller (explored in Appendix A). Alternatively, the agent can learn to improve its performance through its experience in executing a task (Section 1.2).

This dissertation is concerned with the uncertainty of action that arises in the natural variability of controller execution for a specific task in a specific environment. We assume that a basic controller can be designed to accommodate much of the uncertainty associated with the outcome of actions. During execution, however, there are likely to be situations (such as hardware failures) that introduce greater uncertainty. In addition, controllers often have decision points where a choice must be made among alternative actions of uncertain appropriateness. The idea in this work is that an agent can learn a model of its interaction with the environment that captures its specific experience during the current execution. This model can then be used to evaluate the system, providing data to reduce uncertainty and choose an action that helps improve performance.
The robot’s performance may improve, degrade, or remain unchanged, depending on the sensor affected, the task, and the configuration of the environment. The exact affect on performance is likely only to become apparent during execution as a result of the agent-environment interaction dynamics. Evaluating the dynamics can thus provide a way to assess the system and suggest policy modifications as an approach to the Performance Challenge. In this dissertation, the interaction dynamics are captured and evaluated on-line and in real-time using augmented Markov models (AMMs), described in Section 1.2.1. 7 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 1.1.4 Stochasticity and Non-Stationarity Agent-based systems, especially those that are physically embodied, are often replete with noise and uncertainty in sensing and action. As we have discussed, this leads to (potentially high) variability in execution as a result of the agent-environment interaction dynamics. The evolution of these systems is therefore not appropriately described as a deterministic process, but rather as a stochastic (or probabilistic) one. Consequently, probabilistic models, such as the AMMs used in this dissertation, are appropriate for capturing the interaction dynamics of these systems. An issue orthogonal to stochasticity is (non-)stationarity. In a stationary, stochastic system, the probabilistic characteristics do not change over time. To illustrate this point, let us consider a robot charged with finding a hot cup of coffee. If the configuration of the environment and the robot’s controller are stationary (i.e., do not change) then the amount of time it takes the robot to find a cup will follow a particular characteristic probability distribution. The actual time will vary over executions, but the nature of the variability will follow a set pattern. Now. let us consider what happens when the environment is non-stationarv. for example, the number of hot cups of coffee decreases over time as they cool or are drunk. There will still be variability in the amount of time it takes to find a cup, but the average time will increase as the number of cups decreases, i.e., the nature of the stochasticity will change. There are several factors that impact the stochasticity and stationarity of a system. One of these is the structure of the environment. As with the coffee cups, the exact configuration of the environment impacts the stochasticity in the interaction dynamics, and leads to non-stationarity if the configuration is changing. Another factor is learning. As an agent learns, improving its performance by modifying its policy, the nature of its interaction with the environment changes, resulting in non-stationarity. The presence of other agents is also a factor impacting the stochas­ ticity of the system. If these agents are learning or reconfiguring the environment, there will be non-stationarity. In embodied systems, where noise and uncertainty tend to be high, the impact of these factors is exacerbated. 8 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. In this dissertation, AMMs are the stochastic model used to capture agent-environment inter­ action dynamics for solutions to the Performance Challenge. We use AMMs in several applications in both stationary and non-stationary problem domains. 
1.1.5 Timescale of Performance Improvement

When deciding upon a technique appropriate to meeting the Performance Challenge in a particular agent-based system, a key consideration is the time constraint. Perhaps there is no restriction on the amount of time it takes to improve performance, in which case off-line controller modification can be used (as in Appendix A), or a learning technique that requires experience gained over extended or repetitive execution cycles. Alternatively, the need may be for performance improvement occurring on-line (i.e., during the course of a single execution) and in real-time (i.e., with little computational overhead).

The approach explored in this dissertation focuses on on-line, real-time performance improvement. Augmented Markov models are used to learn the agent-environment interaction dynamics during a single execution cycle and provide data used to improve performance during that cycle. Because the time available for improvement is limited, the complexity of the policy modification that can be achieved is also limited. It is not possible, for example, to learn the complete controller for a complex task in a noisy and dynamic environment without extensive time and experience. This work therefore assumes that a basic controller for a task already exists, but that contingencies may arise that require an evaluation of the interaction dynamics, while also providing an opportunity for performance improvement.

In this section, we have considered a number of issues that influence agent-environment interaction dynamics and a solution to the Performance Challenge. The next section explores how an agent can improve its performance through learning, and briefly introduces one of the main contributions of this dissertation: augmented Markov models (AMMs).

1.2 Learning in Agent-Based Systems

Learning allows an agent to reach a solution to the Performance Challenge without it being provided explicitly by an external expert, such as a human designer. One benefit of learning is the ability to accommodate contingencies in the agent's experience that are not known prior to execution. This relates to the specifics of the agent-environment interaction dynamics arising during execution. One caveat of learning is that careful attention is required to make certain that an appropriate technique is applied to the desired problem. The learning technique should have the correct expressiveness for the task and the specific issues surrounding the agent-environment interaction dynamics. In addition, the learning problem should be tractable, and as we will see (in Chapter 3, when we examine some related work in machine learning), there are many techniques that attempt to make complex learning problems more tractable.

In this dissertation, the focus is on learning augmented Markov models (AMMs). The following section provides a brief introduction to AMMs, addressing some of the issues above by showing how AMMs are appropriate to the task of on-line, real-time modeling of agent-environment interaction dynamics. It also places AMMs in the context of other related techniques.

1.2.1 AMMs in Perspective

Augmented Markov models are stochastic models closely related to both Markov chains (MCs) and semi-Markov processes (SMPs), providing a compromise between the two.
In Markov chains, the amount of time spent in a particular state follows a geometric distribution, which can be quite limiting since data are often not geometrically distributed. SMPs are a generalization of Markov chains allowing for state durations that follow arbitrary distributions (Ross 1992). In fact, a different distribution can be used for each outgoing transition from a state. AMMs provide some of the generality of SMPs by allowing arbitrary distributions for state durations, but unlike SMPs, AMMs only allow a single distribution for each state. This restriction facilitates the evaluation of AMMs, enabling the use of standard expectation calculations from Markov chain theory. This type of evaluation is crucial in this dissertation, given the aim of using AMMs for modeling and evaluating agent-environment interaction dynamics. In addition, the capacity to represent non-geometric distributions is justified in that the data modeled in the later experimental chapters are generally not geometrically distributed.

Unlike a straightforward SMP representation, the AMM representation of this dissertation incorporates additional statistics in links and nodes which are used during construction and available for evaluation. These statistics allow the AMM construction algorithm, presented later, to dynamically restructure a model to represent, in first-order form, a second-order, or higher-order, Markovian system by maintaining the appropriate order statistics. These statistics are used in conjunction with node-splitting to "unfurl" the higher-order transitions into first-order transitions. Maintaining a first-order representation greatly simplifies many expectation calculations, again allowing standard Markov chain methods to be employed. Node-splitting captures the hidden state associated with the non-first-order nature of a system. In this way, AMMs are related to hidden Markov models (HMMs), though the hidden state captured by HMMs is different in that it is not associated with higher-order transitions (Rabiner 1989). For the purposes of the applications presented in this dissertation, the higher-order representation of AMMs allows capturing interaction dynamics that are non-first-order, providing a more accurate evaluation of the system, and consequently, more improvement in performance.

A comparison of AMMs and other related stochastic models in terms of representational expressiveness is presented in Figure 1.2. Markov chains are the least expressive of the models, essentially just a stochastic transition matrix (Roberts 1976). As discussed, SMPs are a generalization of Markov chains allowing for a richer representation of time (durations spent in states). Along the horizontal axis, HMMs capture some hidden state, making them more expressive than Markov chains. Unlike HMMs, Markov decision processes (MDPs) allow the explicit representation of actions and their associated rewards, but without hidden state. In addition, each action also has
an associated probability distribution over possible outcomes. Partially observable Markov decision processes (POMDPs) are even more expressive, sharing all of the features of MDPs, but also explicitly incorporating observations and allowing for the possibility of hidden state (Kaelbling, Littman & Moore 1996). Just as SMPs are a more expressive version of Markov chains in time, so are semi-Markov decision processes (SMDPs) to MDPs (Bradtke & Duff 1995, Sutton, Precup & Singh 1999). Figure 1.2 graphically shows that AMMs provide a compromise between SMPs and Markov chains, while also sharing a relationship with HMMs in the ability to capture hidden state.

Figure 1.2: The representational expressiveness of AMMs compared to other models. The horizontal axis shows increasing expressiveness in observations, actions, and state. The vertical axis shows increasing expressiveness in time. AMMs share attributes of MCs, SMPs, and HMMs.

The question naturally arises as to whether AMMs are appropriate to the problem being explored in this dissertation, namely, on-line, real-time modeling of agent-environment interaction dynamics. We will see in Chapter 4 that the AMM construction algorithm developed in this work allows incremental model construction (enabling on-line application) and, in practice, has low computational overhead and gives real-time response (i.e., with no lag in model construction). The question also arises as to whether AMMs, having no explicit representation of observations (sensing) and actions, are sufficiently expressive to represent the interaction dynamics. The ability to capture higher-order dynamics provides part of the expressiveness required to model the full richness of interaction. The use of behavior-based control (BBC), encompassing both sensing and action, provides the remaining representational expressiveness. Section 1.3 describes BBC and Section 2.4 motivates the use of BBC with AMMs. First, however, we touch on the issue of parametric and nonparametric representations for AMMs.

1.2.2 Parametric versus Nonparametric AMMs

An important factor influencing the appropriateness of a modeling technique to a particular application is the assumptions that the model makes about the structure of the system. One such assumption, mentioned earlier, relates to Markovian order. Standard SMP, MDP, and POMDP implementations assume a first-order Markovian system, whereas AMM construction allows for higher-order systems. A second assumption relates to the probability distributions of the data. Markov chains, HMMs, MDPs, and POMDPs assume geometric distributions for the time spent in each state. By contrast, AMMs, SMPs, and SMDPs allow the possibility of arbitrary distributions. As we will see, there exist tradeoffs between different distribution assumptions.

One dichotomy among distribution assumptions is the parametric/nonparametric distinction. Parametric distributions (e.g., normal/Gaussian, binomial, F, t) allow data sets to be summarized in terms of a few parameters that define the exact shape of the distribution. Some parametric distributions, such as the normal, even allow the incremental update of their parameters as new data are added. When the raw data themselves are used to represent the distribution, there is no parameterization involved, and so the distribution is nonparametric.

The advantages of parametric distributions (especially the normal) are that they allow parsimonious representation of the data, and provide the most powerful statistical hypothesis tests when the data conform to the distribution.
Unfortunately, the data often do not conform, resulting in conclusions that are potentially very inaccurate. One solution to this problem is to use robust versions of parametric statistical tests that can accommodate some degree of non-conformity (Wilcox 1997). A second approach is to use nonparametric statistical tests that make fewer assumptions about the structure of the data (Siegel & Castellan 1988, Hettmansperger & McKean 1998). Disadvantages of nonparametric statistics include the need to retain all of the data, and the fact that they are less powerful than their parametric counterparts when the parametric assumptions are not violated. An advantage of nonparametric tests, however, is that they are usually easy to understand and implement.

This dissertation considers both parametric and nonparametric approaches to representing state-duration probability distributions in AMMs. In order to provide parsimony, the majority of the work presented uses parametric AMMs assuming Gaussian state durations. The use of Gaussian distributions also facilitates some of the hypothesis tests used in conjunction with Markov chain expectation calculations in the evaluation of interaction dynamics. In Chapter 8, we will revisit nonparametric AMMs in depth and compare their effectiveness to that of parametric AMMs.

The next section introduces behavior-based control and its use as the representational substrate for AMMs, providing both expressiveness and a reduced state space for learning.

1.3 Behavior-Based Control

Behavior-based control (BBC), a paradigm for constructing controllers for situated agents (Brooks 1991, Mataric 1992), is used extensively in this dissertation. In BBC, a controller is organized as a collection of processing modules, called behaviors, that receive input from sensors and/or other behaviors, process the input (possibly modifying internal state), and send output to effectors and/or other behaviors (Figure 1.3). Each behavior generally serves some coherent, independent goal-achieving or goal-maintaining function, such as avoiding obstacles or homing to a destination. All behaviors in a controller are executed asynchronously and in parallel, simultaneously receiving input and producing output. An action selection mechanism prevents conflicts when signals are simultaneously sent to the same actuators or behaviors (Pirjanian 1998).

Figure 1.3: The basic structure of a behavior-based controller. Behaviors receive input from sensors and other behaviors, process the input, possibly changing internal state, and send outputs to effectors and other behaviors. (Sensors are represented by the Sony pan-tilt-zoom camera on the left, and effectors by the Sarcos Dextrous Arm™ on the right.)

Behavior-based control has proven to be an effective paradigm for developing single-robot and multi-robot controllers (Mataric 1997a, Arkin 1998). Appendix A demonstrates, in detail, the suitability of the behavior-based paradigm for designing robust and modifiable multi-robot controllers. BBC has several characteristics that make it particularly appropriate to the work in this dissertation, and which justify its use as a substrate for learning with AMMs.
First of all, BBC facilitates the accommodation of noisy and inaccurate sensing and action by promoting tight feedback. Noise and inaccuracy tend to be "averaged out" as the controller frequently senses the world to update actions, in essence adhering to the notion that "the world is its own best model" (Brooks 1991). Because a basic behavior-based controller for a task is able to accommodate much of the low-level noise and inaccuracy, it allows us to model interaction dynamics with a focus on higher-level issues (e.g., the non-stationarity of the environment) that affect the agent's performance.

Behavior-based control also provides the representational expressiveness that is crucial to our use of AMMs for modeling agent-environment interaction dynamics. Since augmented Markov models do not explicitly represent observations (sensing) and actions, the use of AMMs with an appropriate representational substrate is necessary to capture the richness of interaction. BBC, encompassing both sensing and action, provides this substrate. As depicted in Figure 1.4, the combination of AMMs with BBC provides more representational expressiveness of sensing and actions than do AMMs alone.

Figure 1.4: The representational expressiveness of AMMs used in conjunction with BBC, compared with other models. The horizontal axis shows increasing expressiveness in observations, actions, and state. The vertical axis shows increasing expressiveness in time. AMMs used with BBC provide a richer representation of observations and actions than AMMs alone.

AMMs with BBC are the synergistic combination integral to this dissertation. AMMs provide the ability for on-line, real-time model construction in higher-order Markovian systems, while BBC provides the representational richness for capturing interaction dynamics. In addition, because BBC abstracts low-level sensing and action into behaviors with coherent functions, it both provides a parsimonious space for AMM construction and facilitates interpretation of the models. This, in turn, facilitates the evaluation of agent-environment interaction dynamics. Throughout this dissertation, we will use the states of an AMM to represent the execution of individual behaviors of a controller (Figure 1.5). One caveat in using BBC is that, because the controllers can carry state with extended history that impacts behavior execution, the interaction dynamics captured in terms of behaviors may not be (first-order) Markovian (Whitehead & Lin 1995). The ability of our AMM construction algorithm to represent higher-order Markovian systems, however, compensates for this.

Figure 1.5: Each robot constructs an AMM capturing its interaction dynamics with the environment while performing a task. Each state of the AMM represents the execution of a particular behavior.

Chapter 2 presents the experimental theme of the dissertation, mobile robot foraging. It describes the robots that performed the foraging task and the behavior-based controller that implements it. It also presents the motivational result for using AMMs with BBC in Section 2.4. First, however, we review the contributions of the dissertation and outline the remaining chapters.
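As a concrete illustration of what "states as behaviors" entails, the following Python sketch maintains transition counts and incrementally updated Gaussian duration statistics for behavior-labeled states. It is a hypothetical, minimal structure invented for illustration; the actual AMM representation and construction algorithm, including the node-splitting that handles higher-order dynamics, are given in Chapter 4 and Appendix B.

    from collections import defaultdict

    class AMMSketch:
        """Minimal, illustrative bookkeeping for an AMM over behaviors."""

        def __init__(self):
            self.counts = defaultdict(lambda: defaultdict(int))  # src -> dst -> n
            self.dur = defaultdict(lambda: [0, 0.0, 0.0])        # state -> [n, mean, M2]
            self.state = None
            self.entered = 0.0

        def observe(self, behavior, t):
            """Record which behavior is executing at time t."""
            if behavior == self.state:
                return
            if self.state is not None:
                self.counts[self.state][behavior] += 1
                n, mean, m2 = self.dur[self.state]
                x = t - self.entered          # completed dwell time
                n += 1
                delta = x - mean              # incremental (Welford) update of
                mean += delta / n             # the Gaussian duration parameters
                m2 += delta * (x - mean)
                self.dur[self.state] = [n, mean, m2]
            self.state, self.entered = behavior, t

        def transition_prob(self, src, dst):
            total = sum(self.counts[src].values())
            return self.counts[src][dst] / total if total else 0.0

    # Fed by the controller's execution trace, e.g.:
    amm = AMMSketch()
    for t, b in enumerate(["wandering", "wandering", "puck detecting",
                           "homing", "wandering", "homing"]):
        amm.observe(b, float(t))
    print(amm.transition_prob("wandering", "puck detecting"))  # 0.5 here

The incremental update is what makes the parametric (Gaussian) representation attractive for on-line use: the raw dwell times never need to be stored.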
1.4 Contributions

The contributions of this dissertation are of two types. There are the main contributions in direct support of the Thesis that augmented Markov models (AMMs) and behavior-based control (BBC) enable effective evaluation of agent-environment interaction dynamics and facilitate performance improvement in both stationary and non-stationary problem domains. There are also the ancillary contributions that do not directly support the Thesis, but help in the development of the main contributions.

The main contributions of the dissertation are as follows:

• The development of augmented Markov models and their relationship to Markov chains and semi-Markov processes. This contribution includes the representation of AMMs, and the model construction algorithm that uses it to enable on-line, incremental generation with dynamic node-splitting for non-first-order Markovian systems. Included in this contribution are also the statistical techniques used to evaluate AMMs.

• The use of AMMs with behavior-based control to capture and evaluate, on-line and in real-time, the interaction dynamics between an agent and its environment.

• The application of this approach to challenging problems involving performance improvement in both stationary and non-stationary mobile robot domains.

• The implementation and experimental evaluation of applications in the stationary problem domain (fault detection, affiliation determination, dynamic leader selection) and the non-stationary domain (regime detection, reward maximization).

• The implementation of a nonparametric version of AMMs and a comparison with the standard parametric version. This contribution includes an extensive empirical study of the statistical distribution issues surrounding the use of AMMs in this dissertation.

The ancillary contributions in support of the main ones are:

• The development of multiple behavior-based controllers for the mobile robot foraging task (the experimental theme), including both individual and group controllers. These controllers are used predominantly with physical mobile robots, but simulated versions are also developed for some of the experimental studies.

• A review of related work. A novelty of this review is a fairly extensive study of computationally efficient approximations from the statistics literature, used in AMM construction and evaluation. These approximations are widely applicable in Computer Science, and in many cases may be more appropriate than the common techniques. This review aims at increasing awareness of and access to these approximations.

1.5 Dissertation Outline

The remainder of the dissertation is organized into eight chapters and three supporting appendices, as follows.

Chapter 2: Motivation: Mobile Robot Foraging
presents the unifying experimental theme of mobile robot foraging, including a description of the robots that perform the task and the behavior-based controller that implements it. Also presented is the experimental result (from the author's earlier work) that motivated the central approach of this dissertation, namely, modeling interaction dynamics by monitoring behavior execution.

Chapter 3: Related Work
presents a review of related work, focusing on the fields of Robotics, Machine Learning, and Statistics.
Chapter 4: Augmented Markov Models
develops the relationship between Markov chains, semi-Markov processes, and augmented Markov models. An overview of the AMM representation and the model construction algorithm is provided, with full details in Appendix B. This chapter also presents the techniques (including Markov chain expectation calculations) used in the evaluation of AMMs, and the details of AMM utilization with behavior-based control.

Chapter 5: AMMs in Stationary Problem Domains
explores mobile robotics applications in the stationary problem domain. Specifically, three applications relevant to group-level coordination are considered: fault detection, affiliation determination, and dynamic leader selection.

Chapter 6: AMMs in Non-Stationary Problem Domains: Regime Detection
examines the use of AMMs in the non-stationary mobile robot problem domain. The focus in this chapter is on detecting significant shifts in the structure of a robot's interaction with the environment that are indicative of environmental regimes, given limited a priori knowledge.

Chapter 7: AMMs in Non-Stationary Problem Domains: Reward Maximization
explores a second application in the non-stationary domain. The consideration here is on how a robot can maximize its reward on a task, given that it has little a priori knowledge of an environment that is changing.

Chapter 8: Parametric versus Nonparametric AMMs
examines some of the statistical issues associated with the use of AMMs. In the preceding chapters, a parametric version of AMMs is used that assumes normal distributions for state durations. This chapter tests the validity of the assumption through an empirical comparison of parametric AMMs and a nonparametric version that does not assume normal distributions.

Chapter 9: Summary and Future Directions
recapitulates the work in this dissertation and suggests possible extensions.

Appendix A: Design and Evaluation of Robust Behavior-Based Controllers
provides extensive supplementary work on the design and evaluation of foraging controllers, in particular, for groups of robots. This appendix provides a more complete understanding of how to construct and quantitatively analyze a behavior-based controller than do the preceding chapters. It thus complements the earlier chapters, which assume the existence of a basic controller that can be adjusted on-line to improve performance.

Appendix B: Details of AMM Representation and Construction
gives the complete, unadulterated details of both the parametric and nonparametric AMM representations, and their respective model construction algorithms.

Appendix C: Tables of Critical Points for T
provides tables of critical points for the nonparametric test of location described in Section 3.4.1, and used in Chapter 8 and in the construction of nonparametric AMMs in Appendix B.

Chapter 2

Motivation: Mobile Robot Foraging

This chapter motivates the remainder of the dissertation in two ways. First, it presents the unifying experimental theme (mobile robot foraging), including the robots that perform the task and the basic behavior-based controller that implements it.
Second, it briefly examines some of the author's earlier work that inspired the main approach in this dissertation: the evaluation of interaction dynamics by modeling behavior execution.

The behavior-based controller presented in this chapter implements a version of a (multi-)robot foraging (collection) task, a prototype for various applications including distributed solutions to de-mining, toxic waste clean-up, and terrain mapping. We present the general structure of the task, the physical robot test-bed used throughout the dissertation, and the details of the behavior-based controller.

2.1 Foraging Task Structure

We define the foraging task as a two-step repetitive process in which:

1. n robots, where n ≥ 1, search designated regions of space for certain objects, and

2. these objects, once found, are brought to a goal region using some form of navigation.

A region in the task is any contiguous, bounded space (in the case of mobile robots, a planar surface) which the robots are capable of moving across. There are three mutually exclusive, non-overlapping types of regions:

• search regions, S, containing a number, p, of objects, a fraction of which must be delivered to a goal region;

• goal regions, G, where objects are delivered;

• and, optionally, empty regions, E, that contain no objects and are not goal regions.

The only restrictions placed on the configuration of regions for the foraging task are: that there be at least one search and one goal region, and that the union of all the regions be contiguous. Figure 2.1 gives two examples of possible valid region configurations for the foraging task.

Figure 2.1: Two example region configurations for the foraging task.

The specific configuration used often in this dissertation is shown in Figure 2.2. Experiments are performed in an 11 x 14 foot (or occasionally smaller) rectangular enclosure (the "Corral"). The search region, S, is approximately 126 square feet and has up to p = 36 small cylinders (pucks) evenly distributed throughout. The goal region G, also called Home, is a ninety-degree sector of a circle with a radius of 2 feet, located in one corner of the Corral. Finally, there is a 25 square foot empty region, E, separating the search and goal regions. E is composed of the Boundary and Buffer zones, whose functions will be described in the next section. n ≤ 4 robots are used in the experiments.

Figure 2.2: One of the foraging task configurations used in the experiments of the dissertation.

2.2 The Robots

Up to four IS Robotics R2e robots are used in the experiments (Figure 2.3). Each is a differentially-steered base equipped with two drive motors and a two-fingered gripper. The sensing capabilities of each robot include piezo-electric contact (bump) sensors around the base and in the gripper, five infrared (IR) sensors around the chassis and one on each finger for proximity detection, a color sensor in the gripper, a radio transmitter/receiver for communication and data gathering, and an ultrasound/radio triangulation system for positioning (Figure 2.4). The robots are programmed in the Behavior Language (Brooks 1990), a parallel, asynchronous, behavior-based programming
language inspired by the Subsumption Architecture (Brooks 1986). The main computational power on each robot is a single Motorola 68332 16-bit microcontroller running at 16 MHz. Even though computationally impoverished by today's standards, the processing capabilities have proven to be adequate for most tasks we have envisioned, helping to show that robust, effective control need not be computationally expensive. Perhaps the greatest drawback of the 68332 is its lack of floating-point computation, which, for example, influences the calculation of heading, described in the following section.

Figure 2.3: The four R2e robots used in the foraging experiments.

The next section presents the basic, homogeneous behavior-based controller for the foraging task, used extensively throughout the dissertation.

2.3 The Behavior-Based Foraging Controller

The controller presented in this section performs a homogeneous version of the foraging task in which, if there are multiple robots, they all have identical behavioral repertoires, and act concurrently and independently.

Figure 2.4: The sensor configuration of an R2e robot.

The overall structure of the controller is presented in Figure 2.5. In the figure, the rounded rectangles represent the robot's sensors, with sensor values being transmitted to behaviors along the dotted lines. The behaviors themselves are drawn as ellipses with text in one of three font styles: italics for behaviors that only receive sensor inputs; bold for behaviors that send actuator outputs; and bold-italics for behaviors that do both. The dashed lines represent commands sent by behaviors to the actuators (rectangles), and the solid lines represent control signals sent between behaviors. These control signals include: inhibition signals that disable behaviors temporarily, or permanently until the inhibition is lifted; information about the state of the behaviors; and signals indicating that a behavior should perform a certain action. These control signals establish the hierarchy of actuator commands shown at the right of the diagram, where behavior-selection nodes indicate that only one of the relevant actuator command pathways is active at any time, and precedence nodes represent a Subsumption-style priority scheme in which the actuator command coming from above takes precedence (Brooks 1986). The hierarchy of command pathways in the diagram illustrates that behavior arbitration is the action selection mechanism for the controller (Pirjanian 1998). The next section presents, in detail, the function of each behavior in the controller and the structure of the inter-behavior command pathways.

Figure 2.5: The homogeneous controller for the foraging task. Rounded rectangles represent the robot's sensors, ellipses represent behaviors, and rectangles represent actuators. Sensor values are transmitted along dotted lines, actuator commands along dashed lines, and inter-behavior control signals along solid lines. Behavior selection activates only one command pathway at a time, and Subsumption-style precedence gives priority to commands from above.
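The precedence element of the arbitration can be sketched compactly. The Python below is illustrative only (the actual controller is written in the Behavior Language, and the behavior names and commands here are invented stand-ins): each behavior either proposes a motor command or stays silent, and the highest-priority proposal wins.

    def arbitrate(commands):
        """commands: (priority, command-or-None) pairs; highest priority wins."""
        for _, cmd in sorted(commands, key=lambda c: c[0], reverse=True):
            if cmd is not None:
                return cmd       # Subsumption-style precedence subsumes the rest
        return "stop"            # no behavior is proposing anything

    # Hypothetical snapshot: avoiding outranks homing, which outranks wandering.
    commands = [(3, None),                 # avoiding: no obstacle, stays silent
                (2, "turn-toward-home"),   # homing: the robot carries a puck
                (1, "move-forward")]       # wandering: always has a suggestion
    print(arbitrate(commands))             # -> "turn-toward-home"

The key design property this captures is that lower-priority behaviors run continuously and need no knowledge of the behaviors above them; precedence is resolved entirely at the actuator command pathway.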
2.3.1 Behavior Details

In order to provide a clear picture of the interaction between behaviors, we describe the individual behaviors of the controller in an order that mirrors the progression of the task as the robot performs it. The following twelve behaviors constitute the foraging task:

1. avoiding: This behavior avoids any object (including other robots) detected by the IR sensors and deemed to be in the path of, or about to collide with, the robot. If the robot has already collided with an object, as detected by the contact sensors, it steers away from it. This behavior is critical to the safety of the robot and therefore takes precedence over most of the behaviors that control the drive motors (puck detecting, wandering, homing, reverse homing).

2. wandering: The robot moves forward and, at random intervals, turns left or right through some random arc. Using this behavior, the robot searches the region for pucks.

3. puck detecting: If an object is detected by the front IR sensors while wandering, this behavior, by lifting the gripper, determines whether the object is short enough to be a puck, or whether it is an obstacle that must be avoided. If it is a puck, the robot carefully approaches the object and attempts to place it between its fingers. Otherwise, the robot performs avoiding.

4. puck grabber: When a puck enters the fingers and is detected by the breakbeam IR sensors, this behavior grasps it and raises the fingers. Raising the fingers above puck height prevents the robot from unnecessarily avoiding pucks while homing, and allows the robot to collect up to about four additional pucks with its base.

5. homing: If carrying a puck, the robot moves towards the designated goal location, Home. While homing, avoiding can take precedence in order to avoid obstacles.

6. boundary: This behavior monitors how the robot enters the Boundary region. If the robot enters this region without a puck, it returns the robot to the search region using reverse homing. If carrying a puck, the robot is allowed to enter this region and proceed towards Home (see Figure 2.2). This behavior prevents the robot from collecting pucks that have already been delivered.

7. buffer: This behavior monitors entry into the Buffer region. Entering this region triggers the activation of the creeping behavior.

8. creeping: A refined combination of the homing and avoiding behaviors designed to carefully bring the robot to the very corner of the Corral where Home is located and where the pucks must be delivered. Under creeping, the robot moves more slowly and uses its IR sensors at a closer range appropriate for working within the corner. The standard versions of homing and avoiding would conflict in a confined corner situation, since avoiding would perceive the goal corner as an obstacle and attempt to move the robot away from it. Creeping takes precedence over avoiding since it already incorporates a version of this behavior.

9. home detector: A monitoring behavior for entry into the Home region. Upon entering this region, home detector sends a signal to puck grabber to release the puck.

10. exiting: Entering the Home region triggers this behavior, which moves the robot several inches backwards, then performs a 180-degree turn in place. This behavior also sends the signal that lowers the gripper.
When exiting terminates, the robot remains within the Boundary region without a puck. This in turn triggers the boundary behavior to begin reverse homing.

11. reverse homing: Starting from within the Boundary region, this behavior performs the opposite of homing, moving the robot out into the search region. This behavior is essentially identical to homing except that the goal location is set to the corner of the Corral opposite Home. Once the Boundary region has been left, reverse homing becomes inactive and the robot once again begins searching for pucks using wandering.

12. heading: This behavior processes the positioning system data and provides approximate heading values for the homing and reverse homing behaviors. The positioning system supplies the robot's current (x, y) position at approximately 1-2 Hz. Consecutive position values, (x0, y0) and (x1, y1), are used in an approximate integer-based calculation of arctan((y1 − y0)/(x1 − x0)), adjusted for the quadrant of the angle, to provide one of sixteen possible sector headings. The accuracy of this heading calculation is usually within one sector of the true heading, but may be far worse when the robot turns in place. Frequent updates of the heading, with little reliance by the other behaviors on any one heading value, help to compensate for the inaccuracies. (An alternative is to use a physical compass for heading data. In our lab, however, the high variance in magnetic fields makes this inviable.)

Now that we have defined the foraging task and the behavior-based controller that implements it, we have the basis for a brief examination of early results that inspired and motivated the notion of modeling interaction dynamics by monitoring the execution of a behavior-based controller.

2.4 Motivational Result: Modeling with BBC

In contrast to this dissertation, which focuses on the on-line evaluation of interaction dynamics, the author's earlier work (Goldberg & Mataric 1997) was concerned with the off-line evaluation of the interaction dynamics in a group of mobile robots performing variations of the foraging task. In particular, this work showed how interference (or collisions between robots) arising during execution could be used in evaluating multi-robot controllers. This evaluation enabled the use of behavior arbitration schemes (i.e., controller modifications) to adjust the characteristics of interference and the performance of the controllers. As an experimental example of the approach, the work demonstrated three different implementations of the foraging task using the four R2e robots, and presented analyses of data gathered from trials of all three implementations. Many of the details of this work are presented in Appendix A.

One of the key experimental results in this early work, and the initial motivation for this dissertation, was the strong correlation (ρ = 0.995) observed between interference and the activation of the avoiding behavior (Table A.2). This result seems fairly intuitive, since the avoiding behavior is expected to be active during collision events. At the time, however, the strength of the result was a revelation. It begged the question: if a simple measure of behavior activation could capture a key aspect of the interaction dynamics (i.e., interference), would not a more sophisticated model of behavior execution enable a richer evaluation of the interaction dynamics?
Further, could not this evaluation be used on-line to enable controller modification and performance improvement during a single execution? The answer to these questions, as is shown in the remainder of the dissertation, is "yes."

2.5 Summary

This chapter has presented the mobile robot foraging task that is the main experimental theme of the dissertation, and to which we will often refer in the following chapters. Also presented was the early motivation for this dissertation and the idea of modeling interaction dynamics using the behavior execution of a behavior-based controller.

Chapter 3

Related Work

This chapter reviews related work, primarily in Robotics, Machine Learning, and Statistics. One of the contributions of this chapter is a compilation of references on computationally efficient approximations to common statistical quantities, and some not-so-common, though excellent, nonparametric tests. These approximations and tests will be used for AMM construction and evaluation in later chapters.

3.1 Robotics, Behavior-Based Control, and Foraging

We begin by considering work related to Robotics and our use of the foraging task. Arkin, Balch & Nitz (1993) demonstrate simulation work studying the issues of density and critical mass in a hoarding task using fully homogeneous robots. Arkin & Hobbs (1993) describe the general schema-based control architecture (which bears some fundamental similarities to the behavior-based control used in this dissertation) and give the critical mass experiments. Finally, Arkin & Ali (1994) present a series of simulation results on related spatial tasks such as foraging, grazing, and herding.

Similar to our behavior-based foraging or mine-collection task, Mataric (1994) describes a behavior-based approach for minimizing complexity in controlling a collection of robots performing various behaviors including following, aggregation, dispersion, homing, flocking, and foraging. The work also includes a simulated dominance hierarchy based on IDs, used to evaluate the performance of homogeneous versus ordered aggregation and dispersion behaviors. There is also a very well explored biological aspect to foraging which has provided some of the inspiration for our experimental foraging task. Gordon (1996) discusses some of the factors that affect task allocation, including foraging, in social insect colonies.

Fontan & Mataric (1998) also worked on multi-robot foraging, but focused on issues of critical mass in task division. Goldsmith, Feddema & Robinett (1998) also present a strategy for multi-robot search, but with each robot able to dynamically switch its team and function. Parker (1992) and Parker (1994) describe multi-robot experiments, also on foraging R2e robots, with a priori hard-wired heterogeneous capabilities using the Alliance architecture. Parker (1994) describes a temporal division that sends one robot to survey and measure the environment for toxic spills, then has the rest of the group use its information to clean up the spill.

Balch (1997) presents Social Entropy, inspired by and based on Information Entropy (Shannon & Weaver 1963), as a metric for describing the heterogeneity of multi-robot systems.
As presented, the metric is used off-line to evaluate the learned structure of a robot team. A potential extension of Information Entropy, however, might be to gauge the changing structure of a group on-line. Cao, Fukunaga & Kahng (1997) offer a view of the multi-agent and multi-robot research applied to 10 ISR R3 mobile platforms, a later generation of our R2e robots. Tan & Lewis (1997) describe an approach to maintaining geometric configurations of a robot group using virtual structures, also tested on the R3's. Similar to our implementation of foraging, this work also exhibits spatial and temporal homogeneity, though the coupling is tighter.

Beckers, Holland & Deneubourg (1994) describe a group of five robots with minimal sensing and no explicit communication effectively clustering pucks through a careful combination of the mechanical design of the robots' puck scoops and the simple controller that moves them forward and in reverse. This work demonstrates a homogeneous controller performing a task similar to our foraging task, but where the goal location is not pre-specified, instead emerging during execution. Holland & Melhuish (2000) present more recent results from an expanded study with essentially the same experimental scenario.

Other work on multi-robot foraging is inspired by trail formation in ants (Drogoul & Ferber 1992). Werger & Mataric (1996) describe a foraging robot chain that is constructed and modified using only contact sensing for communication. Vaughan, Stoy, Sukhatme & Mataric (2000) present multi-robot ant-like foraging in a simulated environment where efficient foraging trails are dynamically constructed using a mechanism analogous to ant pheromones.

Behavior-based control has been used in many applications ranging from multi-robot soccer (Lund & Pagliarini 2000) and service robotics (Lindstrom, Oreback & Christensen 2000), to control of underwater robots (Rosenblatt, Williams & Durrant-Whyte 2000) and ape-like robots (Hasegawa, Ito & Fukuda 2000). In all of these behavior-based systems, some action selection mechanism produces a coherent, global behavior. The work described in this dissertation uses behavior arbitration, in which some (possibly small) subset of the behaviors controls the motors at a given time. Pirjanian (1998) describes a number of action selection mechanisms. Pirjanian, Christensen & Fayman (1998) present a voting-based action selection mechanism which is extended to multi-robot coordination in Pirjanian & Mataric (2000).

3.2 Machine Learning and Robotics

We confine our review to a selection of the most relevant work in mobile robotics and statistical modeling. Various models have been employed on mobile robot platforms to date, a few examples of which we consider here. Cassandra, Kaelbling & Littman (1994) use a partially observable Markov decision process (POMDP) to model uncertainty of location in a robot navigation task for an office environment. This work is similar to ours in that both explore model construction on, and use by, a mobile robot. The high computational complexity and large data requirements for POMDP generation, however, make it necessary to use a number of heuristics to reduce the problem size and facilitate learning, in addition to often running numerous trials, neither of which is the case with our approach.
In more recent work, Koenig & Simmons (1996) use a POMDP to model sensor, actuator, and metric uncertainties in a similar office navigation task. For tractability, the learning system on the robot is initialized with a POMDP compiled off-line using sensor models and a topological map with metric uncertainties. A modified POMDP learning algorithm is used to passively fine-tune the probability distributions. Somewhat similarly, our work uses AMMs passively to capture statistics about the interaction dynamics between a robot and its environment, with the data used to influence the robot's behavior. Both POMDPs and AMMs represent uncertainties using probability distributions, but POMDPs, being decision processes, also enable learning of control policies. Such policies may require sensor and environment state information, making the search space large and requiring more (possibly heuristic) computation. The number of states in a POMDP is usually pre-specified, whereas in an AMM it is learned based on the order of the system.

Michaud & Mataric (1998) also present a learning technique that uses a behavioral model of a robot's interaction with the environment. The model takes the form of a tree capturing the history of behavior use, i.e., specific sequences of behaviors, with nodes representing executed behaviors, and links representing transitions between them. Their approach and our AMMs are both constructive, being generated and modified as needed. Unlike AMMs, which can form arbitrary graphs with alternative behavior choices represented by probability distributions over transitions, their approach stores potentially long linear sequences of behaviors in a tree with branches indicating alternative choices. The result is that the tree representation may require many trials to collect useful statistics, whereas AMMs can generate them during the course of one trial. This, however, is understandable given that the goal of their work is for the robot to learn how to perform a task efficiently. By contrast, in our work the robot has a basic controller for the task, but must make intelligent decisions about how to proceed given its experience in the environment.

Kosecka & Bajcsy (1993) present Discrete Event Systems (DESs), with emphasis on applications to mobile robotics. The structure of this DES approach is also related to AMMs, both being represented as directed graphs with behaviors as states. Unlike AMMs, DESs require a priori specification of all possible states, events, and the full transition function. This specification endows DESs with control-theoretic properties, though a practical specification of these parameters for mobile robots would seem to require heuristic engineering. Mahadevan & Theocharous (1998) have applied the notion of discrete (though time-extended) events to a Markov decision process for modeling industrial manufacturing. The goal is to optimize production using reinforcement learning. Other work has also used such hybrid SMP/MDP models, or semi-Markov decision processes (Sutton et al. 1999, Wang & Mahadevan 1999), as well as dynamical systems approaches (Beer 1993, Smithers 1995), to model the interaction between an agent (robot) and its environment.

The basic structure of augmented Markov models is very similar to that of hidden Markov models (HMMs) (Rabiner 1989). The difference is that in an AMM,
there is only one observation symbol per state, as opposed to a probability distribution over observation symbols in an HMM. In addition, an AMM assumes that the state of the system is known, thereby removing the HMM hidden-state assumption. Our construction algorithm, however, is able to capture hidden state associated with higher-order transitions. Han & Veloso (1999) use HMMs to represent robot behaviors in a real-time behavior recognition system employed in a robot soccer domain.

The notion of splitting AMM states based on local estimates of mean and variance is related to work by Hanson on the stochastic delta rule used in updating mean and variance estimates on the weights of feed-forward neural networks. Meiosis Networks use these mean and variance estimates to decide when to split nodes in the hidden layer (Hanson 1990). Other similar state-splitting approaches have been explored. McCallum (1996) presents Utile Distinction Memory (UDM), an algorithm that splits the states of a POMDP using a method for storing statistics that is almost identical to the statistics in an AMM. The state-splitting tests used in the two approaches are also very similar. UDM performs node splitting based on reward distinctions and not perceptual distinctions. This limits the growth of the POMDP to the complexity of the task and not the complexity of the perceptual space. Our use of behaviors (instead of actions, percepts, etc.) serves a similar function: the AMM is only as complicated as the interaction between the robot, controller, and environment, for a particular task.

3.3 Statistics: Parametric Approximations

This section reviews work in Statistics that is relevant to the use of AMMs in the following chapters. In order to maintain low computational overhead in the construction and evaluation of AMMs (Chapter 4 and Appendix B), it is necessary to use computationally efficient approximations to common parametric quantities, including: binomial confidence limits, the cumulative standard normal distribution, the cumulative t distribution, and the cumulative F distribution. Unfortunately, good approximations for these values do not appear to be in wide use. In the hope of increasing awareness, this section reviews some of the literature for these approximations, considering in depth those that are used in this dissertation.

3.3.1 Binomial Confidence Limits

Let X be a random variable having a binomial distribution, such that

$$\Pr(X = x;\, n, p) = \binom{n}{x} p^{x} (1-p)^{n-x},$$

where n is the number of trials, x is the number of successes, and p is the probability of success (Freund 1992). Given a level of significance, α, we wish to find the upper (1 − α) confidence limit, p^{(1−α)}(x), for p such that

$$\Pr\{X \le x \mid p = p^{(1-\alpha)}(x)\} = \alpha, \qquad (3.1)$$

for x = 0, 1, ..., n − 1. The confidence interval for p is then {0 ≤ p ≤ p^{(1−α)}(x)}, which has an associated infimum coverage probability (i.e., minimum confidence) of (1 − α) (Blyth 1986). The symmetry properties of binomial probabilities allow calculation of the complementary lower confidence limit using

$$p_{(1-\alpha)}(x) = 1 - p^{(1-\alpha)}(n - x), \qquad \text{for } x = 0, \ldots, n.$$

The value of p^{(1−α)}(x) may be calculated from Equation 3.1 using numeric approximation, or exact values may be derived using the inverse beta function, but both methods can be computationally intensive.
Tables of pre-calculated values are also available, but, no matter how extensive, they may not cover the desired values of α and n. A more practical alternative is one of the many approximations that use inverted approximations to the normal and F distributions (Blyth 1986). Ghosh (1979) describes two approximations that are quite inaccurate for small values of n (i.e., n < 30). Johnson, Kotz & Kemp (1992, pp. 124-133) describe an approximate confidence interval used by the MATLAB® binofit.m function, though this is also computationally intensive. Blyth & Still (1983) present a table of values for n ≤ 30 and α = .95, .99, to be used in conjunction with a more accurate version of a common approximation when n > 30. Blyth (1986) compares four different approximations, including the Paulson-Camp-Pratt approximation (Paulson 1942, Camp 1951, Pratt 1968), which has the best accuracy for all values of n and x.

The Paulson-Camp-Pratt approximation is among the best approximations available, being often accurate to several decimal places even for very small values of n (Blyth 1986). Thus, it seems quite reasonable for most practical applications, and it is used extensively in the model construction described in this dissertation. The approximation is given by¹

$$p^{(1-\alpha)}(x) \approx \left[\,1 + \left(\frac{x+1}{n-x}\right)^{\!2} \left(\frac{81(x+1)(n-x) - 9n - 8 - 3c\sqrt{9(x+1)(n-x)(9n+5-c^{2}) + n + 1}}{81(x+1)^{2} - 9(x+1)(2+c^{2}) + 1}\right)^{\!3}\,\right]^{-1}, \qquad (3.2)$$

where c is the inverse normal cumulative distribution function at (1 − α) with μ = 0 and σ = 1. Table 3.1 provides some precomputed values of c for common values of α. An alternative is to calculate c using one of the approximations cited in the next section.

    α:  0.1      0.05     0.025    0.01     0.005    0.0025   0.001
    c:  1.28155  1.64485  1.95996  2.32635  2.57583  2.80703  3.09023

Table 3.1: Pre-calculated values, c, of the inverse cumulative normal distribution for use in the Paulson-Camp-Pratt approximation (Equation 3.2).

It is important that Equation 3.2 only be used for x = 1, ..., n − 2, when it is difficult to calculate exact values of p^{(1−α)}(x). Otherwise, for x = 0, n − 1, n, the following easily computed exact values should be used:

$$p^{(1-\alpha)}(0) = 1 - \alpha^{1/n}, \qquad p^{(1-\alpha)}(n-1) = (1 - \alpha)^{1/n}, \qquad p^{(1-\alpha)}(n) = 1.$$

To calculate p_{(1−α)}(x), x is replaced by x − 1 and c by −c in Equation 3.2. Given p^{(1−α)}(x) and p_{(1−α)}(x), the upper 100(1 − α)% confidence interval for p is given by [0, p^{(1−α)}(x)], and the lower 100(1 − α)% confidence interval is given by [p_{(1−α)}(x), 1]. The two-sided 100(1 − α)% confidence interval is then [p_{(1−α/2)}(x), p^{(1−α/2)}(x)].

¹The equation as presented by Blyth (1986) has a missing right parenthesis, which may cause confusion.
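For reference, Equation 3.2 and the exact endpoint cases translate directly into code. The Python sketch below is a transcription of the formulas above (function name ours), with c supplied from Table 3.1 or from an inverse-normal approximation:

    from math import sqrt

    def upper_binomial_limit(x, n, c, alpha):
        """Upper (1 - alpha) confidence limit for binomial p, x successes in n trials."""
        if x == n:
            return 1.0
        if x == n - 1:
            return (1.0 - alpha) ** (1.0 / n)
        if x == 0:
            return 1.0 - alpha ** (1.0 / n)
        # Paulson-Camp-Pratt approximation (Equation 3.2).
        num = (81 * (x + 1) * (n - x) - 9 * n - 8
               - 3 * c * sqrt(9 * (x + 1) * (n - x) * (9 * n + 5 - c * c) + n + 1))
        den = 81 * (x + 1) ** 2 - 9 * (x + 1) * (2 + c * c) + 1
        return 1.0 / (1.0 + ((x + 1) / (n - x)) ** 2 * (num / den) ** 3)

    # Upper 95% limit after 5 successes in 10 trials (c = 1.64485, Table 3.1):
    print(upper_binomial_limit(5, 10, 1.64485, 0.05))  # ~0.778

Replacing x by x − 1 and c by −c in the same function, per the text above, yields the lower limit (about 0.222 for this example), matching the symmetry relation.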
3.3.2 Approximating the Cumulative Standard Normal Distribution

The positive tail probability of a random variable X having a standard normal distribution is given by

$$\Phi(z) = \Pr(X > z) = \int_{z}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}x^{2}}\, dx,$$

with an inverse Φ⁻¹(x) = z. The positive tail probability is a cumulative distribution, though the standard normal cumulative distribution is often considered to be 1 − Φ(z). Given Φ(z), one may calculate the tail probability at x of a general normal distribution with mean μ and variance σ² by setting z = (x − μ)/σ.

Instead of referring to pre-compiled tables of Φ and Φ⁻¹, which are unlikely to have the accuracy desired and are cumbersome to include in a program, it is often desirable to use one of the many approximations that exist. We briefly discuss a few of the approximations proposed in the literature along with their strengths and weaknesses, then present two of the most accurate.

Page (1977) presents an approximation that is quite accurate for 0 < z < 2, with a maximum absolute error of 0.00014, but with a percentage error that grows large for high values of z (Hawkes 1982). Schmeiser's (1979) approximation is not as accurate as that of Page for small values of z, but it is more accurate for larger values, as well as being simpler to calculate. Hamaker (1978) presents an approximation which is not as good as Page's for small values of z, having a maximum absolute error of 0.0061 and a relative error that grows without bound for larger z (Hawkes 1982). A modification to Hamaker's approximation by Lin (1988) is nearly as accurate, but simpler to calculate. A further, though less accurate, simplification is presented in later work (Lin 1990).

Unlike some of this work, the concern here is not in finding a decent approximation that is easy enough to use with a hand calculator. Rather, the desire is to use the most accurate approximation that is not computationally exorbitant. Bailey (1981) makes use of two relatively complicated approximations to Φ⁻¹(x), one for 0 < z < 4.98 and one for z > 4.75:

$$\Phi^{-1}(x) \approx \begin{cases} t_1\left(1 + 0.0078365\, t_1^{2} - 0.00028810\, t_1^{4} + 0.0000043728\, t_1^{6}\right), & \text{if } x < 0.999999, \\[4pt] t_2 + \dfrac{0.1633}{t_2} + \dfrac{0.5962}{t_2^{3}}, & \text{otherwise,} \end{cases} \qquad (3.3)$$

where

$$t_1 = \sqrt{-\tfrac{\pi}{2}\ln(4x - 4x^{2})}, \qquad t_2 = \sqrt{-2\ln(1-x) - \ln\!\left(-4\pi\ln(1-x)\right)}.$$

The combination of these approximations is fairly accurate, with a maximum absolute error of 0.00022.

Hawkes (1982) also uses the notion of two complementary approximations, but for approximating Φ(z). For 0 ≤ z ≤ 2.2, Hawkes uses a modification of Hamaker's approximation, and for z > 2.2, a minor modification to an approximation in Lew (1981). The combination of these two is remarkable, with a maximum absolute error of 0.000017 and a relative error less than 0.1% for z < 20. The procedure is as follows:

$$\Phi(z) \approx \begin{cases} \dfrac{1}{2}\left[1 - \sqrt{1 - e^{-2t^{2}/\pi}}\,\right], & \text{if } z \le 2.2, \\[6pt] \dfrac{(1.184 + z)\, e^{-z^{2}/2}}{\sqrt{2\pi}\,\left(1.209 + 1.176z + z^{2}\right)}, & \text{if } z > 2.2, \end{cases} \qquad (3.4)$$

where

$$t = z - 0.0075166\, z^{3} + 0.00031737\, z^{5} - 0.0000029657\, z^{7}.$$

This approximation is used in the calculation of the cumulative t and F distributions in the following sections.

3.3.3 Approximating the Cumulative t Distribution

The t distribution is useful in determining the statistical significance of the difference between the means of two normally distributed populations, especially when the sample sizes are small (i.e., < 30) (Freund 1992, pp. 406-407). Values for the cumulative t distribution, or its inverse, are generally available in pre-calculated tables or via the widely used, computationally intensive incomplete beta function expression of the t distribution. Alternatively, it can be quite desirable to use one of the computationally efficient and quite accurate approximations to the cumulative t distribution.
In general, the cumulative t distribution is approximated via a normalizing transformation, i.e., the t distribution is transformed to allow use of the normal cumulative distribution (see previous section). Prescott (1974) compares normalizing transformations of the t distribution by Anscombe (1950), Quenouille (1953), Chu (1956), Wallace (1959), and Scott & c Smith (1970). The approx­ imation by Wallace appears to be the most accurate among these. Mickey (1975) provides a modification of the approximation by Chu. Ling (1978) provides an extremely useful comparison of approximations to the t distribution, including ones by Fisher & Cornish (1960). Gentleman & Jenkins (1968), Peizer & Pratt (1968), and Wallace (1959). The comparison data indicate that an extremely accurate approximation could be constructed using Gentleman-Jenkins up to about 45 degrees of freedom, and Cornish-Fisher at higher degrees of freedom. Considering only a single approximation, the one by Peizer Pratt provides reasonable accuracy and is easy to program. We now present this approximation in more detail. 42 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Let Q{t | v) be the cumulative t distribution with v degrees of freedom. For small degrees of freedom (v = 1, 2,3,4) the following exact values should be used: Q(t I 1) = ^ — arctan(t) 1 7 r (3.5) Q(t I 3) = (3.6) (3.7) Q (t|4 ) 2 2 L 1 + # T 4 ‘ I l f . 2 1 t (3.8) (3.9) When u > 5, the approximation by Peizer & Pratt can be used: (3.10) where $ is the cumulative standard normal distribution, such as is given by Equation 3.4. 3.3.4 Approxim ating the Cumulative F Distribution The F distribution is used in comparing the variances of two normal populations (Freund 1992, p. 316). Ling (1978) presents a comparison of several normalizing approximations to the cumulative F distribution. The one we focus on here is by Peizer & Pratt (1968). Let Q(F | Vi, v2) be the cumulative F distribution with iq and u2 degrees of freedom. The Peizer-Pratt approximation (as presented by Ling (1978)2) is given by: I V 2 Due to typographical error, the equation for g(x) as presented in Ling (1978) is missing the parentheses in the denom inator, thereby leading to incorrect calculations. 43 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. where d Vo H i Vi + Vo a _ j l + q q — 0.5 ) T S Vo ~ 1 2 ui - 1 2 Vi ~ i~ uo — 2 n 2 P V \ F + vo 1 - p I — x 2 + 2x log x 9(x) for x ^ l.x > 0 3(0) I 3(1) = 0. < f > is the cumulative standard normal distribution, or an approximation such as is given by Equa­ tion 3.4. 3.4 Statistics: Nonparam etric Tests In Chapter 8 , we compare the parametric version of AMMs used in much of the dissertation to a nonparametric implementation. One of the key differences between the two versions is the test of location that is used to determine the need for node-splitting. In the parametric version, the location test used is based on the t distribution and assumes normal populations. The idea in the nonparametric version is to use a location test that makes as few distribution assumptions as possible. In the following section, we present a review of nonparametric location tests, focusing on one by Fligner & Rust (1982), which is used in this dissertation. 44 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 
3.4.1 Nonparam etric Tests of Location One of the most commonly used nonparametric tests of location is the Mann-Whitney-Wilcoxon test (Mann & Whitney 1947, Wilcoxon 1945). It allows greater freedom in the characteristics of the two populations being compared than does the parametric t test. The Mann-Whitney-Wilcoxon test, however, requires that the two populations be symmetric and identical in every respect except location. Thus, the spread, or variances, of the populations can not differ. The Behrens-Fisher problem examines location differences between normally distributed samples from populations that may differ in shape, i.e., have different variances (Fenstad 1983). The nonparametric Behrens- Fisher problem allows non-normal populations, and a further generalization allows non-symmetric populations. There are nonparametric tests that handle the symmetric version of the Behrens-Fisher prob­ lem, including Fligner & Policello (1981). There are also tests for the generalized Behrens-Fisher problem with non-symmetric populations. Hettmansperger & Malin (1975) present one such test, a conservative modification of Mood’s (1954) result. Fligner & c Rust (1982) present another modi­ fication of Mood’s test with advantages over the one by Hettmansperger & c Malin. Hettmansperger 8z McKean (1998, pp. 131-133) present a modified version of Mathisen’s (1943) test applicable to the generalized problem. In this dissertation, we have chosen the result by Fligner & Rust as an effective nonparametric test of location with very few distribution assumptions. We now provide the details of the test. Let X t ,... , Xm and Y i,... ,Yn be independent random samples from populations with con­ tinuous cumulative distribution functions F(x) and G(y), respectively. Let 9X and 9y be unique medians of the populations, F(9X) = G{9y) = We wish to test the null hypothesis H0 : 9X = 9y against the alternative Hi : 9X > 9y or Hi :9X < 9y or H\ :9X ^ 9y. We define Z = Z i, ... , Z,v to 45 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. be the sorted list of the X's and V r’s, with N = m + n, and M as the median of Z. V V e calculate a version Mood’s (1954) statistic, T = 52 A/)/n, where < Z > (a , 6) = < 1, if a < b 7 , if a = b 0, if a > b and use this in the modified test statistic T = \fn(T — \)/b. T has a limiting normal distribution with a mean of 0 and variance a1. An estimate of a1 is given by: • • > a" = < if G n ( U s ) — G n { L s ) > 0 if G„(L'.v) — Gn{Lt\) = 0 where p = rn/n and R Ln Un b,\ Fm (U .v) - F m (L .\) G n (f’.v) — Gn(Ly) Z(.v-bv) N + 1 Fm and Gn are the empirical cumulative distribution functions as calculated from the .Y’s and V’s, respectively. 46 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Now, in deciding between the null hypothesis, H q, and an alternate hypothesis, the following decision criteria are used: choose Hi : 8 \ > Oy over Hq if T > Ta{m.n} choose Hi :&x < Oy over Ho if T < —Ta{m,n} choose H i . d x ^ Q y over Hq if T > Tq.{m,n} or T < — T^{m, n} where Ta{m,n} is the critical value of the T statistic at a significance level of a with sample sizes of m and n. Appendix C provides tables of critical values for m.n < 25. When m,n > 25, the inverse cumulative standard normal distribution at a, < f > -1 (a), provides a good approximation to Tn (see Equation 3.3). 
One problem with Fligner Rust’s (1982) test is that, even though the test accommodates non-symmetric distributions, the test fails when the distributions are highly non-symmetric. for example when M = min(Z). The way we have overcome this deficiency in this dissertation is by performing a sanity check, making the alternate hypotheses symmetric Hi '.O x > Oy and 9y < Ox :@ x < & y > 0.Y Hi -Ox ^ Oy and Oy ^ Ox and performing all of the associated calculations. When the test fails, at least one term of these alternate hypotheses does not hold, and thus. H q is correctly not rejected. 47 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 3.5 Sum m ary This chapter reviewed related work in the areas of Robotics, Machine Learning, and Statistics. In particular, the focus was on single-robot and multi-robot foraging, behavior-based control, and Markov-type models as applied to Robotics. An aim of this chapter (and an ancillary contribution of the dissertation) was to increase awareness of and access to some of the Statistics literature on computationally efficient approximations to common statistical quantities, and less common nonparametric tests. The approximations and test presented in detail will be used in many of the remaining chapters. • 1 8 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Chapter 4 A ugm ented M arkov M odels This chapter presents augmented Markov models (AMMs), and details their relationship to Markov chains and semi-Markov processes (SMPs). It provides an overview of the AMM representation and the model construction algorithm used in this dissertation (with details available in Appendix B). This chapter also lays the foundation for the remainder of the dissertation: it formalizes the use of AMMs with behavior-based control to enable capturing of agent-environment interaction dynamics; and it presents the AMM-based calculations necessary to evaluating those dynamics. The chapter also includes examples of AMM construction. In Chapters I and 2, we introduced and motivated the idea of using AMMs with behavior- based control to capture and evaluate agent-environment interaction dynamics. In this chapter, we provide the key tools and methods for actually doing so in the remainder of the dissertation. We describe the representation and construction of higher-order AMMs, their evaluation, and their use with behavior-based control. First, however, we examine the theoretical underpinnings of AMMs. 49 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 4.1 Markov Chains, SM Ps, and A M M s In this section, we develop the relationship between augmented Markov models (AMMs), Markov chains and semi-Markov processes in order to provide theoretical context for our use of AMMs. We begin with Markov chains and work towards AMMs. A discrete, first-order Markov chain is a stochastic process {.Ym, m = 1.2,3,...} with a finite or countable state space adhering to the following property: P { A rri-r-1 = j | A m = i, A m_i — 1. . . . . A2 = *2i -AI = *1 } = P{.Ym+i = j I A'm = i}, (4.1) for all states ii, t >,.. • . im-i, i.j, and all m > 1 (Ross 1992). In other words, the probability that the next state A'm+i is j, given the current state (A'm = i) and any past state (A'[ = t|,... . .Vm_i = im_i), is dependent only upon the current state i. In general, a stochastic process that satisfies Equation 4.1 is said to be first-order Markovian. 
If Equation 4.1 is independent of m, the Markov chain has stationary transition probabilities and is said to be homogeneous. The models described in this section are all homogeneous. In an nth-order Markov chain. Equation 4.1 takes the following form: P{A m+l — j | Am — i■ Am— l = - 1 A-j = i'2 .. A\ = iV } — P { A'm* [ —j | Am — i.Am — t — i-T U — I' - - ■ , Am-n+i — irn — n-f-1 }• (4.2) We assume that Equation 4.2 holds for state j and some n = < m. and that for all states other than j the equation holds for some n < n t . In other words, the probability of the next state is dependent on the current state and at most n — 1 previous states. In a Markov chain, the time spent in each state before a transition to the next state is geomet­ rically distributed. A semi-Markov process (SMP) is a generalization of a Markov chain allowing 50 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. arbitrary state durations. V V e let Qij{t) be the probability of remaining in state i for time < t before transitioning to state j. If we let Pl} — Qij(oc), the PtJ define the transition probabilities of the embedded Markov chain (Ross 1992) and it follows that ^ Pij = 1. V V e let Fij (t) = Qtj(t)/Pij be the conditional probability of remaining in state i for time < t, given the system has just entered state i and will transition to state j. (Ross (1992) provides further details on Markov chains and semi-Markov processes.) An AMM is a sub-class of SMP in which the time spent in a particular state is not dependent upon the next state, i.e., Fij(t) = F,k(t) for all states i, j, and k such that j,k ^ i. In addition, before a self-transition, the system remains in the current state for exactly one time step, giving: 0, if t < 0 1. if t > 0 Though this formulation makes no assumptions about underlying distributions, the statistical tests we present in this Section 4.3.4 do assume normal distributions. This constrains Fij(t) to be < & (fi = 1 /(1 — P,i),<r~), where is the normal cumulative distribution function (with mean and variance determined empirically). Chapter 8 empirically evaluates the violation of this assumption of normality. AMMs provide a compromise between the generality of SMPs and the computational simplicity of Markov chains. They allow standard expectation calculations from Markov chain theory to be easily combined with popular statistical hypothesis tests, such as the t and F tests, that assume normal distributions. Now that we have placed AMMs in the context of Markov chains and SMPs, we provide an overview of our AMM representation and the construction algorithm that uses it. Full details are available in Appendix B. 51 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 4.2 AM M Im plem entation: Overview We have seen that an AMM is essentially a semi-Markov process. Unlike perhaps a straightforward SMP representation, our AMM representation incorporates additional statistics in links and nodes which are used during construction and are available for application-motivated evaluations. These statistics allow the AMM construction algorithm to dynamically restructure a model to represent, in first-order form, a second-order, or higher-order, Markovian system by maintaining the appro­ priate order statistics. These statistics are used in conjunction with node splitting to "unfurl” the higher-order transitions into first-order transitions. 
Maintaining a first-order representation greatly simplifies many expectation calculations, allowing standard Markov chain results to be employed. Before continuing, we clarify what an AMM is not. Unlike a Markov decision process (MDP), an AMM does not explicitly represent actions or local reward. There is also no explicit representation of observations, nor the type of hidden-state-inducing partial-observability that is captured in a partially observable Markov decision process (POMDP). In other words, the system is taken to be fully observable with no hidden state of the type in an HMM. The one exception is that the AMM construction algorithm does not assume a first-order Markovian system, but is able to capture, in first-order form, the hidden state arising from the higher-order nature of the system. This differs from the hidden state captured by HMMs and POMDPS which assume first-order systems. The lack of explicit actions and observations in AMMs may seem limiting, but as we discussed in Section 1.3, the use of behavior-based control as a representational substrate provides the expressiveness that compensates for this lack. The simplicity of AMMs also facilitates on-line, incremental, real-time construction and evaluation — a key consideration in this dissertation. In the next section, we present the representational components of an AMM. 52 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 4.2.1 Representation of AM Ms For the purposes of conciseness and understandability in this section, we do not describe the full AMM representation as used by the model construction algorithm, but rather, consider only the five-tuple ( S, A, B, L, T ) containing much of the information necessary' for incremental model construction. The details of the elements are as follows: 1. S, a set of symbols (si,S2,... , s \i } recognized by the network. The first symbol, s t , is recognized only by the first state, aj. 2. A, a set of states (or nodes) (ai,a2,. .. .ajv}. Each state a* has four attributes: • a", the symbol that the state recognizes, i.e., an element of S: • a f, the average number of time steps that the system remains in a, whenever it enters that state: • nf’ , the variance associated with a f ; • and a f, the probability of remaining in a, in the next time step. The state, a i, represents the initial (unknown) state of the system, which is promptly left upon commencement of model construction and never entered subsequently. 3. B. an X x M transition matrix, where bt(k) contains the value of the state to transition to if the current state is a, and symbol s* is observed. If af = sfc , then 6j(fc) = a*, i.e.. if the observed symbol is identical to the last symbol observed, then the system remains in the current state. 4. L, a set of directed links ... .Ip}, connecting the states. Each link 1, has the following six attributes: • l{, indicates the state from which the link begins, a,/ 6 A: 53 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. • l\, indicates the state to which the link connects, 6 A. The following constraints apply: a link cannot start and end at the same state, l[ 5£ l\; and two links from the same state cannot go to states that accept the same symbol, Vi,_/ s.t. 
l{ = i j , af, ^ a*, ; • if, stores the number of times the link Z , has been traversed; • if, stores the total number of time steps that the system has been in state l\, after first having traversed the link /*; • i f ’, contains the sum of squares of all the durations that comprise if; • and If is the probability of using the link lt at each time step, given tne system is in state l{. Because no two links can have the same value for both their from and to attributes, they cannot represent the same directed transition. Thus, iV - 1 < P < N (N — 1): at least iV — 1 links are needed to connect the non-initial states, and for a fully connected network there are :V (A T — I) links between the non-initial states. The single link from a lt Z t , is traversed exactly one time, giving Z f = 1 and If = 0. 5. T . a set of structures {Tl T rWx_l}. each with elements . tqn } storing infor­ mation on a particular n-link traversal sequence, where 1 < n < nmax - I and nmax is a user-specified maximum order for the model. Each element tf has n + 4 attributes: • t"’1, t”'2, ... , £jl’n+1, the n links comprising tf, stored as indices into L; • tf's, the number of times the n-link sequence has been traversed: • tf'~, the total number of time steps that the system has been in the state that link t" 'L connects to, after first having traversed tf; • , the sum of the squares of all the durations that comprise t"'“ . Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. The bounds of Qn axe given by: 0, if P < n P-n+1, if P > 2 < Q n < iV(.V - 1)". In order for a two-link transition to exist there must be at least two links. If more than two links exist, the fewest n-link transitions (P — n + 1 of them) are created when an Euler path exists and is followed through the network. In a fully connected network, each of the P = iV('V — 1) links has a transition to N — I other links, giving us the upper bound for a sequence of n links. Given this AMM representation, the corresponding probabilistic transition m atrix of a Markov chain could be generated from aP. lp. P . and 1‘. The addition of a" and a°~ provides the more general (normally distributed) state durations of an SMP. Aside from a*, the remaining represen­ tational elements are used in incremental model generation and dynamic model reconfiguration using node splitting. We provide an overview of the model construction algorithm next. 4.2.2 AM M Construction Algorithm The data used for constructing an AMM consist of a continuous stream of symbols belonging to S. Construction of an AMM proceeds according to the following algorithm: • Initialize the system by creating the initial state, at. • If the current input symbol has never been seen before, add it to S and create a state that recognizes this symbol. Create a link from the current state to the new state and make the transition. Add this transition to B and create the corresponding new entries for T. • If the current input symbol is the same as the last input symbol, then remain in the current state. Update the appropriate values in A, L, and T for mean values and length of time in the state. Recalculate the transition probabilities associated with that state and its links. 55 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. • If the current input symbol has been seen before, but is different from the last symbol, transition to a state that accepts the new symbol. 
If the link for this transition does not exists, create it. • When transitioning from one state to another, update the variance for the state being tran­ sitioned from, and the appropriate sum of squares values in L and T . • When about to transition from one state to another using link /*, do the following: 1. calculate the binomial confidence interval (Blyth 1986) for the number of traversals (tn's) of each tn which has t" * ... tn-n+i equal to the n links traversed before lt. If the actual number of traversals that then use I, as fn l falls outside the expected confidence interval, then there is an nth-order inconsistency in the traversals. (Section 3.3.1 details the calculation of binomial confidence intervals.) 2. calculate the t statistic associate with: (1) the mean time spent in the state l{ as calculated from the data in each tn which has tn,~ ... t n-n~ i_ l equal to the n links traversed before /*, and (2) the mean time spent in the state as calculated from all of the data. If the t statistic indicates a significant difference, then there is an nth-order inconsistency in the SMP-like state durations. If either of the previous tests indicates an inconsistency, then the current state is split, at­ taching the current in-link (with its associated out-iinks. as indicated in T) to the new state. This allows an nth-order traversal sequence to be represented in a first-order model. Using T, make all appropriate changes to the two states and their related links, in order to keep all global probabilities consistent. Update T , modifying and/or adding traversal sequences of the appropriate orders to maintain a consistent model. The above rules do not provide the complete details, but capture the general flavor of the model construction process. The final rule of the algorithm describes node splitting and deserves further 56 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. explanation. Since an AMM is constructed incrementally as training data become available, it is important that there be some mechanism for model modification when new data invalidate the current structure of the model. This mechanism is provided by node splitting which utilizes data from T to attempt to ensure that the model remains consistent with the training data. Node splitting allows an nth-order traversal sequence to be represented in a first-order model, making the model intuitively easier to understand and simplifying expectation calculations with the model. Note that there is relatively little computation involved in AMM construction when used with behavior-based control. In a non-optimized implementation, the computational complexity per input symbol is at most 0 ( N nm K X ) (for a fully connected network) when a state is being split, and at most 0 ( N n""*~l) otherwise. In practice, the use of behavior-based control constrains the complexity to be much more reasonable than what is implied by these values. Because execution of a controller must result in some coherent activity, the possible transitions from any one behavior tend to be small (e.g., 1 < N < 4). This essentially brings the maximum computational complexity near 0(1), though there might be significant constant overhead. In practice, we have observed that, even for nm ax as large as 10, the computational overhead is low enough to allow real-time processing of input symbols from the foraging task at a high frequency (e.g., 100’s of Hertz). 
The space complexity is also 0 ('V"m * x) for N fully connected states, but again, in practice, graphs tend to be fairly sparse, implying reasonable overhead. The next section demonstrates the effectiveness of the AMM construction algorithm in a non- first-order Markovian system. 4.2.3 Examples of AM M Construction Consider the sequence of input symbols {32 1 4 2 1 3 2 1 4 2 1 3 2 1 4 2 1...} that alternates between occurrences of (3 2 1} and (4 2 1}. Figure 4.1 gives an example of a first-order or second-order AMM generated with 100 symbols from this sequence. The key item to note in the figure is that from state # 4 (accepting symbol 1) there is a 0.5 probability of transitioning to 57 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. state #2 or state # 5 , and thus generating {3 2 1} or {4 2 1}. We know, however, that this is an inaccurate representation of the system since there can never be two consecutive occurrences of either {3 2 1} or {4 2 1}. Because the next state after # 4 depends on two states prior, the system is third order. In contrast, Figure 4.2 shows the first-order representation generated by a third-order AMM constructed with the same sequence. One third-order node split gives two additional states: one that accepts symbol 1 and one that accepts symbol 2. This new first-order representation accurately represents the symbol sequence. sym=: sym =0 #4 sym = l Figure 4.1: A example of a first-order or second-order AMM generated with 100 input symbols from the sequence {3 2 1 4 2 1 3 2 1 4 2 1 ...} «4 Figure 4.2: A example of a third-order AMM generated with 100 input symbols from the sequence {3 2 1 4 2 1 3 2 1 4 2 1 . . . } This illustration of third-order AMM construction demonstrates the case where there is an in­ consistency just in the link traversals, whereas a more subtle case might also involve inconsistencies in the traversal probabilities. A second type of node-splitting occurs based on inconsistencies in the 58 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. time spent in a particular state. AMM generation for a very complex system may require multiple types of node-splitting at several different Markovian orders. The next sections details the AMM-based evaluations that will be utilized in the applications of the following three chapters. 4.3 AM M -Based Evaluations We wish to derive several useful statistics from an AMM which will be used in the following chapters to evaluate the interaction dynamics captured by the models. One such statistic is the expected number of time steps the system takes to reach a destination state from a given start state. This is known as the mean first passage. The theory of Markov chains provides powerful tools for easily calculating such expectations. We apply these tools to AMMs, then use the results to calculate two other statistics: the total variance associated with the mean first passage, and the accompanying degrees of freedom. (A more detailed treatm ent of Markov chain theory, including proofs and derivations, is available in Roberts (1976) and Kemeny, Snell & Knapp (1966)). 4.3.1 M ean First Passage The first step in calculating the mean first passage is to extract the N x A T transition matrix P of an AMM A/. P is a matrix such that each element pij is the probability of transitioning directly to state j given the system is in state i. Each element of the matrix is given by: p Ptj = a i * if i = j lp kl if 3k s.t. 
/{ = ai and lk = aj 0 . otherwise where ai, af, lp, lk and lk are extracted from A/. 59 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. In order to receive meaningful values from the following calculations, P must represent a Markov chain that is ergodic, i.e., every state can be reached from every other state. We can determine that a directed graph is ergodic by showing that it consists of one strongly connected component (see Cormen, Leiserson Rivest (1990, pp. 488-493) for an algorithm). Given that P is ergodic, we calculate the stationary vector w = w\,wo,... , wt\ giving the probability of being in each state a t , ao,... , a/v of M in the limit, w is found by solving the system of equations wP — w with the constraint that uq + id? + -----!- w.v = I. If w has at least one zero value, then P is not ergodic. Using w, we calculate the fundamental matrix Z = [I — (P — IF)]-1 , where W is the square matrix having w as each row. IF now enters the calculation of the mean first passage matrix E = (/ — Z + JZdg)D, where J is a matrix of l ’s, Zag is the diagonal matrix containing the main diagonal of Z. and D is the diagonal matrix with da = l/u>,. Each element eij of E represents the mean first passage from state n, to as. In our use of AMMs with behavior-based control, we are often interested in the mean first passage from an input symbol s* to another input symbol Sj. Since each input symbol could be recognized by multiple states in the AMM, the mean first passage between two input symbols might not be unique. In general, if n states recognize s, and m states recognize Sj, then nm entries in E represent the mean first passage between these symbols. In such a case, our interest is in the minimum mean first passage between two input symbols and the corresponding states that give this value in E. For symbols Si and Sj, let etJ be this minimum value from E. Let aQ and a$ be the states such that eaj = c,j. Once we know the states associated with the minimum mean first passage, we can calculate the expected amount of time spent in each state before reaching aj. To do so. we first convert our ergodic Markov chain into an absorbing chain with a j as the absorbing state. Essentially, this means modifying P so that once the system enters state a j it does not leave it, i.e., a* = 1.0. The transition matrix P is converted into a new matrix Q containing only transitions to and from the N — 1 non-absorbing states by simply deleting the /3-th row and column from P . Note that a ^ 3. 60 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. The fundamental matrix N for the absorbing chain is given by N = ( / — Q )~l , where each element jiij represents the expected amount of time spent in the j-th non-absorbing state, given that the system starts in the i-th non-absorbing state, and n - i ^ nttJ, i f a < 3 & a d ~ ~ S jV—L 5 1 " ( a - i)7, i f a > 3 . V J = l 4.3.2 Variance of Mean First Passage In order to calculate the variance associated with a value in the mean first passage matrix, we first note that the underlying Markov chain of an AMM may be interpreted as a set of random variables, one for each state of the chain. Generally, it is assumed that these random variables are independent, identically distributed, and follow a normal distribution. Our use of variance also assumes independence and a normal distribution, but to be more general it allows non-identical distributions. 
Given a set of independent random variables {A'i, A '-> , • • • . A'.v} with each A', having an associ­ ated population mean Hi and variance of, we wish to calculate the mean and variance for the linear combination Y = c \A'i + c> A '-> + • • • + cnA'„. Basic results from statistics tell us that the mean of Y is h y = and the variance of V is erij. = c7cri !- In our case, we do not know the population means and variances and instead use the sample means and variances calculated for each state of the AMM, i.e., a'1 and a f'. Given eaij, aa, a j, and the fundamental matrix N generated by making a$ an absorbing state, we are ready to calculate the variance associated with eaa- As shown previously, eQ £ j is equivalent to the sum of the row of N associated with aa. Equivalently, eQ j may be expressed as a linear combination of the extracted from the AMM: ,v ead = y ] i=0 61 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Ci = where nai/a£, if a < 3 and i < 3 na-i'i/a £ , if a > 3 and i < 3 rca ,,_i/a£, if a < 3 and i > 3 n a -i.i-i/a £ , if a > 3 and t > 3 0 , if i = 3 Now that we have expressed eaj as a linear combination of the means of the random variables composing the AMM, we can do similarly for the variance of eaj: .v var(eQ j) = ^ c f a f t = 0 with the Ci's calculated as above. 4.3.3 Degrees of Freedom The number of degrees of freedom associated with the linear combination V ' = c,A', may be expressed as ' /v z .i=L cf var(.Yt) ''.V . ^ [cf var(A'i)/V'.Y. t = i V x. - I where V'v. is the number of values used to calculate var(A',). This is an expanded version of the formula found in Press, Teukolsky, Vetterling & Flannery (1992, p. 617). To apply this to our calculation of the variance of ea& we simply use cfaf* in place of cf var(A't), with the Ci's calculated as above, and with 62 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 4.3.4 The f-Test and F-Test Given the mean first passage values in E and their associated variances and degrees of freedom, we can perform two standard tests of statistical significance: the (one-sample and two-sample) t-test and the F-test (Freund 1992). The t-test uses Student’s t distribution to determine whether the difference between the means of two normally-distributed populations is significant. Calculation of this test requires the two mean values and their combined degrees of freedom derived above. The F-test uses the F distribution to determine if the variances of two normal distributions are significantly different given their degrees of freedom. Either of these two tests can be used to compare values for mean first passage and variance within the same AMM and across AMMs. Press et al. (1992) provide more details on the computation of these tests. See Peizer & Pratt (1968) and Ling (1978) for computationally efficient approximations to F and t tail probabilities for use in these tests. The next section, describing how AMMs are used with behavior-based control, completes the foundation for modeling and evaluating interaction dynamics in this dissertation. 4.4 A M M U se w ith Behavior-Based Control As we have discussed. AMMs can be constructed and utilized on-line and in real-time as an agent (mobile robot) is performing a task, in order to capture the dynamics of its interaction with the environment. 
At each time step, the datum used for model generation consists simply of a symbol indicating which behavior (or subset of behaviors) of the behavior-based controller is currently active. As we have discussed, the use of behaviors as a representational substrate for model construction has major benefits. It provides parsimony in abstracting away low-level sensor readings and motor commands, while at the same time encompassing the richness of sensing and action manifested in behavior activations. 63 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. There is no need to heuristically modify the AMM construction algorithm in order to use behaviors, or provide any application-dependent initialization to the system. W hat is required, however, is the determination of the behavior space, i.e., the set of mutually-exclusive behaviors, which continuously describe the robot’s activity, and are uniquely labeled. One of these labels or symbols (together comprising S) is sent to the model generation algorithm at each time step. Note that the algorithm only accepts one symbol per time step. If the robot’s control system is structured so as to utilize simultaneous execution of two or more behaviors, the AMM algorithm will only be able to consider one of their symbols as input. Unless that one symbol is consistently used to represent the parallel execution of both behaviors, the result will be a model unrepresentative of the actual behavior dynamics. To prevent ill-defined models, it is necessary for the behavior space to be composed of mutually exclusive behaviors which account for all of the robot’ s activity. In other words, in behavior space B = {B i,B -i,.. . ,B k }, no two behavior sets B, and B} may be simultaneously active. In addition, if 7tot is the total time that the robot is active, and 7b, is the total time that behavior set Bi is active, then 7t0t = 7b, • An individual behavior set B, in the behavior space may represent several behaviors in the controller that are executed in parallel. The behavior space B need not contain all of the behaviors that the robot can exhibit, only some composition of behavior sets that meets the preceding constraints. In the worst case scenario where all of the behaviors of a controller are executed in parallel for 7to t, or when there is only a single behavior in the controller, then B also consists of exactly one behavior set. The AMM generated in this degenerate case has one non-initial state that it never leaves, and consequently is of no practical use. Fortunately, the large majority of current approaches to robot control, ranging from hybrid to behavior-based systems, utilize sequences and priorities over the actions and behaviors executed on the robot (Mataric 19976, Gat 1998). Thus, as long as the behavior of the robot can be decomposed into a sequence of mutually-exclusive behavior symbols to be provided to the AMM, this approach can be applied. 64 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Figure 4.3: (Left) A second-order AMM constructed from foraging behavior data; (Right) A first- order AMM, constructed with the same data. 4.4.1 Exam ples of AMM Construction with BBC Using the data generated by a robot performing the foraging task (Section 2.3), the AMM genera­ tion algorithm constructed the model shown in graphical notation in Figure 4.3 (Left). Many of the numerical details are omitted, but the main elements are present. 
The states with their recognized symbols (e.g., avoiding, wandering) are indicated, as well as the initial state represented by 0. Links between states are shown, as are the two-link transitions, represented by dashed lines inside states connecting an in-link and an out-link. For the construction of this particular model, we used approximately 1750 data points sampled at 5 Hz over the course of approximately 7 minutes. Only 7 of the foraging controller behaviors were used as the behavior space. All together, the model required 35,000 flops to construct and 2700 bytes to store. The AMM was validated by visual comparison to the robot's behavior while performing the task. Numerous hours of video footage for the foraging task support the validity of the second-order AMM. Consequently, in the following chapters, the AMMs constructed in the experiments that use foraging are all second-order. The first-order AMM in Figure 4.3 (Right) is not an accurate model of the system. Examining the two-link transitions inside the avoiding state, we note that the transition from wandering to avoiding is followed by a transition from avoiding back to wandering in all cases. A similar situation 65 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. holds for reverse homing and homing. If avoiding were really one state, intuitively we would expect transitions between wandering, homing and reverse homing that pass through avoiding. These transitions, however, are often inappropriate in the foraging task and seldom occur, again arguing in favor of the second-order model. 4.5 Sum m ary This chapter presented the tools for capturing and evaluating agent-environment interaction dy­ namics using AMMs and behavior-based control. Specifically, it outlined the AMM construction algorithm and its associated representation, detailed AMM-based evaluations, and concretized the use of AMMs with behavior-based control. The following three chapters utilize these tools for performance-improving applications in both stationary and non-stationary problem domains. 66 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. C hapter 5 A M M s in Stationary Problem Dom ains The previous chapters laid the foundation for the modeling and evaluation of agent- environment interaction dynamics using AMMs and behavior-based control. This chap­ ter now utilizes the approach to solve specific application challenges, with a focus on stationary problem domains where the stochasticity of agent-environment interaction dynamics is assumed to be non-changing over time. Specifically, this chapter explores group coordination challenges associated with: individual performance, group affilia­ tion, and group performance. Corresponding respectively to these are the three experi­ mental examples — fault detection, group membership based on ability and experience, and dynamic leader selection. 5.1 Introduction This chapter examines how modeling interaction dynamics using AMMs and behavior-based control can help improve the performance of a group of agents in the face of contingencies that arise during execution. V V e consider three issues — individual performance, group affiliation, and group performance — that impact the ability of a group to achieve effective coordination, and present applications of AMMs that help negotiate these issues to improve performance. The assumption 67 Reproduced with permission of the copyright owner. 
Further reproduction prohibited without permission. in these applications is that the interaction dynamics being modeled are stationary, or any non- stationarity does not significantly impact the application-specific evaluations being used. This is a simplifying assumption that allows us to use a single AMM for evaluation. This assumption is relaxed in Chapters 6 and 7 when we explore non-stationary problem domains. To help motivate the AMM-based applications of this chapter, we first discuss the three group coordination issues in more detail. As an example of the impact of individual performance on group coordination, consider a scenario where a single robot develops a hardware failure and is neither able to complete its portion of the group task, nor to inform the other group members of its failure. If the members do not know to compensate for the incapacitated robot, the group as a whole may fail to complete its task. Monitoring individual robot performance, in this case for fault detection (one of our experimental examples), is an important component of group coordination. The ability of a robot to determine what group it belongs to (i.e., its group affiliation) is another important component of group coordination. Suppose a robot were introduced into an environment containing several groups specializing in different tasks. In order to be able to coordinate its activity with the group it fits into best, it must have some mechanism for determining its group affiliation. One way is to compare its abilities and experience with those of other robots — an approach we present in another of the experimental examples. A third issue impacting coordination is group performance. Consider a group of robots or­ ganized in a hierarchy, where the performance of the entire group is strongly dependent on the members in the upper strata. The ability to dynamically reorganize the structure of the group (re-coordinate the individuals) to improve performance is important when unknown or unforeseen circumstances result in poor leaders. We also present experimental results for this type of dynamic leader selection later in the chapter. In the next sections, we apply AMMs to experimental examples exploring the three issues in group coordination. 68 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 5.2 Individual Performance: Fault D etection For this application, we limit our consideration of faults to those that would keep the robot in one behavior for an inordinate period of time. Such faults may include sensor and actuator failures, as well as the robot becoming physically stuck. To detect a potential fault, we compare, at each time step, the total time the robot has spent in the current AMM state ai to the mean and variance calculated from previous data for that state. A simple confidence estimate on the upper bound of the mean can be used to make the comparison: /j. = ai(rnean) + c\/ai{var) where c is a small positive constant (e.g., 1 < c < 3). If the current time spent in the state exceeds /i, the algorithm signals that there might be a fault. V V e tested this algorithm on-line by having the robots perform the wandering and avoiding behaviors of the foraging task (Chapter 2). Figure 5.1 shows a typical AMM that was constructed. If it was detected that the robot had been in one of the behaviors too long, it would send a signal to the robot, which would in turn beep, thereby indicating a potential fault. 
We simulated a fault (the robot getting stuck on a rock) by lifting the drive wheels off the ground. During a dozen trials, the robot never failed to detect the fault. In choosing a particular confidence interval, it is important that one be aware of how rapidly the interval narrows as new data are incorporated into the model. If the interval narrows too quickly, the result will be many false positives, i.e., the robot indicating a fault when none exists. An example of such a confidence interval that narrows too quickly is /r = ai{mean) + t.ooi 69 ai(uar) Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. avoiding f wandering Figure 5.1: Sample AMM constructed from the wandering and avoiding behaviors of the foraging task. where < 001 is Student’s t distribution leaving 0.001 probability in the tail, and n is the total number of times the system has entered a*. The first confidence interval may be interpreted as this one, but with tp = s/n. This allows us to test for a fault with increasing confidence (i.e., with decreasing values of p) as the number of data points n increases, thus effectively reducing the narrowing of the confidence interval to the rate of variance decrease, and greatly reducing false positives. Unfortunately, due to limitations in the programming environment of the robots, the models generated for fault detection could not be constructed on the robots themselves. Instead, behavior data were transmitted via radio modem to a Power Macintosh that constructed the models and communicated model information back to the robots. Due to other system limitations, the com­ munication throughput was approximately 2.5 bytes/sec/robot. Even at such rates, and with lost packets, the system was able to maintain useful models simultaneously for multiple robots. More sophisticated versions of fault detection are also possible using tests based on the mean first passage (Section 4.3). These tests allow the detection of faults that manifest as aberrations in multi-behavior execution sequences and loops. 70 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 5.3 Group Affiliation: M em bership through A bility and Experience Coordinating activity in a behaviorally heterogeneous group of robots may require the robots to know their sub-group affiliations. In a learning system where the robot’s final behavior is not predetermined, group affiliation is not designated a priori. AMMs provide a mechanism for determining group affiliation. Two robots that wish to ascertain whether they belong to the same group can transmit data generated by their AMMs, then determine the probability of the other robot’s data on their respective AMMs. The probability of a sequence is the product of the probabilities on the transitions that would be followed to generate that sequence. If a transition does not exist, then the probability is zero. If each AMM accepts the data generated by the other’s AMM (with probability >0). then the robots are designated as members of the same group. They are considered to have the same ability, or capacity for performing a particular task. In the case of a complex task, such as our foraging example, it may take a significant amount of time for two robot that “should” belong the same group to explore the same interaction dynamics and be determined to actually belong to that group. 
In the context of our experimental example, there might be several different groups in an area, with only one performing foraging. When a new foraging robot is introduced into the environment, it must determine that its abilities coincide with those of the other foraging robots, and join their group. In addition to this coarse “don't accept”/ “accept”, or ability-based, determination of group affiliation, a more refined categorization can be made by considering the actual probabilities of symbol sequences. To test this notion, we ran 2 trials for each of 3 robots performing the wandering- avoiding behaviors. In one trial, the Corrall was empty, in the other, 18 pucks were distributed evenly, though sparsely, on the floor. The robots occasionally avoided the pucks in addition to normally avoiding the walls of the Corrall. For each of the six three-minute trials, an AMM of the robot’s behavior was constructed. One thousand data points were generated by each of the 71 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. six AMMs, and the probability of each data set was calculated on each of the AMMs, resulting in 36 probability values. Our hypothesis was that a data set from an AMM generated in one of the two environments should produce higher probabilities on the remaining two AMMs from that environment than on the three from the other environment. For each data set, we tested all combinations of two AMMs, one taken from each environment, to determine how often the higher probability would be from the same environment. Of the 36 such combinations, in 26 (or 72%) of them, the same environment had the higher probability. These results, produced from little training data and almost identical environments, suggest that AMMs can be used to make subtle behavioral distinctions. These distinctions can be thought of as experience-based. Since the robots are able to and do perform the same task, it is their specific individual experiences that differ, and are the basis for distinction. More sophisticated variations of affiliation determination are also possible using, for example, calculations of isomorphism between the AMMs of different robots. Another metric might be based on the summed differences between corresponding minimum mean first passage values (Section 4.3). A version of this metric is used in Chapter 8. 5.4 Group Performance: Dynam ic Leader Selection Due to inherent variations in sensors and actuators, or inexperience with a specific robotic plat­ form, it may be difficult to accurately assess the ability of a robot at performing a novel task. Alternatively, even if performance history' is availabie, there is no guarantee that future perfor­ mance will neither improve nor degrade. Even though the ability of individuals may change over time, it is important that the performance of the group remain as high as possible. To achieve this, some mechanism for dynamic restructuring based on performance is necessary, especially in social structures such as hierarchies where significant reliance is placed on the most dominant individuals. We present dynamic leader selection using AMMs as an example of such a mechanism. 72 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. In the following experiments, four R2e robots had to perform the foraging task (Chapter 2). 
Collecting 10 of the 27 pucks in the Corrall constituted completion of the task, with a shorter completion time corresponding to better group performance. The robots were organized in a strict dominance hierarchy such that whenever two or more robots simultaneously had pucks to deliver to the goal, the most dominant individual was allowed to proceed, while the less dominant individuals each waited their turn. The four robots, however, were not equally efficient at performing the task. The code for each robot was identical, except that the maximum speed was limited to different values, as follows: RobotO ‘ ‘full-speed” (ss 0.5 ft/sec); Robotl "two-thirds-speed” (« 0.33 ft/sec); Robot2 ‘ ‘half-speed” (=s 0.25 ft/sec); and Robot3 “one-third-speed” (s= 0.17 ft/sec). We conducted three sets of experiments, two with fixed hierarchies as baselines of comparison to the third, which allowed hierarchy restructuring through the use of AMM-based evaluations. The experiments were designated as follows: 1. Least Desirable: The robots were members of a fixed hierarchy with the relative dominance of each inversely proportional to its maximum speed. Thus, Robot3 (the slowest) was the most dominant, and RobotO (the fastest) was the least dominant. 2. M ost Desirable: Complementary to Least Desirable scenario, these experiments had the robots arranged in a fixed hierarchy, with the fastest as most dominant, and slowest as least dominant. 3. D ynam ic Leader Selection (D L S): The hierarchy was initialized to be identical to that of the Least Desirable experiments, but allowed hierarchy restructuring to improve performance. In the DLS experiments, with no a priori information about a robot’s speed provided, an AMM for each robot was constructed at run-time and used to evaluate performance. The metric of evaluation employed was number of transitions to exiting state performance = ---------------- — :-------------------------------- total tune spent homing 73 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Mean T im # to Completion Least Desirable Dynamic Leader Selection Most Desrable * Difference between Least Desirable and Dynamic Leader Selection is significant at p-0.005 Figure 5.2: Mean time to completion for the Least Desirable. Dynamic Leader Selection, and Most Desirable experimental scenarios. Least Desirable DLS Most Desirable Mean time to completion 27.2 23.4 22.4 Standard deviation l.l 1.3 1.1 Table 5.1: Mean time to completion for the Least Desirable. Dynamic Leader Selection, and Most Desirable experiments. where the statistics for the exiting and homing behaviors came directly from the AMMs. The numerator gives a count of the number of pucks that were delivered to the goal, while the denom­ inator measures the total time spent delivering those. The ratio gives the number of pucks per unit time that a robot is able to deliver: the higher this value, the faster the robot delivers pucks, and the better its performance. (An alternate metric would the minimum mean first passage from homing to exiting.) Each robot began a trial with its performance value initialized to zero. As it executed the task, its AMM was continuously updated, as was the performance value derived from it. The robot’s position in the hierarchy was also updated so that it was more dominant than all other robots with lower performance values. 74 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 
Hierarchy Positions: Dynamic Leader Selection Experiments Robot ID 0 1 2 3 Mean position 2.6 1.8 1.4 0.2 Standard deviation 0.9 0.4 1.1 0.4 Table 5.2: Mean positions in the hierarchy at the end of the Dynamic Leader Selection experiments. M mo Hwaucfiy Petition* H Least Desiratfe Dynamic Leader Selection Most Desirable most dominant dominant 0 1 2 3 fastest Robot ID slowest * Oiflerenes between Least Oessable and Oynanw: Laader Selection ts significant at p*0.05 Figure 5.3: Average hierarchy positions at the completion of the Least Desirable. Dynamic Leader Selection, and Most Desirable experiments. We ran five trials of each experiment (Least Desirable, Dynamic Leader Selection, Most De­ sirable) and used a statistical hypothesis test based on Student’s t distribution (Freund 1992) to ascertain the significance of our results. In the figures, statistical significance is indicated by asterisks. Table 5.1 and Figure 5.2 present the average time to completion (i.e., performance) for the three experiments. In the experiments using dynamic leader selection we see a statistically significant improvement in the time to completion over the Least Desirable experiments, thus indicating a successful restructuring of the hierarchies to a more optimal configuration. The Most Desirable time is slightly, though not significantly, lower than the DLS time. This difference may be attributed 75 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Pucks Collected Least Desirable DLS Most Desirable Robot ID 0 1 2 3 0 1 2 3 0 1 2 3 Mean number of pucks collected 1.2 2.6 3.2 3.0 3.2 3.4 2.0 1.4 3.8 3.2 2.2 0.8 Standard deviation 0.4 0.5 0.8 1.0 1.3 0.9 1.2 0.5 0.4 0.8 l.l 0.4 Table 5.3: The mean number of pucks collected in the Least Desirable, DLS, and Most Desirable experimental scenarios. to the fact that the DLS experiments are initially configured with the less efficient Least Desirable hierarchy structure. The successful restructuring of the hierarchies is evident in Figure 5.3 and Table 5.2. We see that the final hierarchy positions in the Least Desirable and Most Desirable experiments are unchanged from the initial positions since no hierarchy restructuring was allowed to take place. Even though the initial hierarchy positions in the DLS experiments are identical to those of the Least Desirable experiments, we see in Figure 5.3 that by the end of the trials, the positions arc almost identical to the Most Desirable ones, though always lying between the Least Desirable and Most Desirable. The fact that the average DLS positions are not exactly equal to the Most Desirable positions (indicating non-optimal restructuring) requires further explanation. At the beginning of each DLS trial, when all of the robots have performance values of zero and the hierarchies are still identical to the Least Desirable, it is very likely that one or more of the slower, more dominant robots will find a puck and be allowed to deliver first, thereby attaining a non-zero performance value. This helps establish its position in the hierarchy and makes it more difficult for the faster, less dominant robots to get a chance to deliver a puck and get a better performance value. In addition, the foraging task is quite stochastic. 
On average, all of the robots will find pucks the same distance from the goal, but in any one trial, a slower, less capable robot may find pucks very close to the goal and deliver them in little time, thereby gaining a higher performance value than a more capable robot finding pucks further away. Similar to the fault detection experiments, the AMMs in these experiments were also generated on a PowerMac, with data transmitted to and from the robots via radio modem. Lost data packets and noisy signals, in addition to occasional robot failures, also hindered optimal restructuring. With all of these complications, the success of the Dynamic Leader Selection experiments is an indication of the robustness of the approach presented here.

Figure 5.4: Mean number of pucks collected at the completion of the Least Desirable, Dynamic Leader Selection, and Most Desirable experiments. (The difference between Least Desirable and Dynamic Leader Selection is significant at p = 0.05.)

                                 Least Desirable           DLS             Most Desirable
Robot ID                         0    1    2    3     0    1    2    3     0    1    2    3
Mean number of pucks collected  1.2  2.6  3.2  3.0   3.2  3.4  2.0  1.4   3.8  3.2  2.2  0.8
Standard deviation              0.4  0.5  0.8  1.0   1.3  0.9  1.2  0.5   0.4  0.8  1.1  0.4

Table 5.3: The mean number of pucks collected in the Least Desirable, DLS, and Most Desirable experimental scenarios.

Figure 5.4 and Table 5.3 present the average number of pucks that each robot collected in each of the three experiments. A comparison with Figure 5.3 shows that in all three experiments, the number of pucks a robot collected is proportional to its final position in the hierarchy. Intuitively, this is consistent with the notion of a dominance hierarchy: the less dominant a robot is, the less often it will be allowed to deliver a puck, since it must wait to be the most dominant individual ready to do so. In a large hierarchy, this may never happen. There are two slight anomalies in the puck data that, although not significant, bear some consideration. In the Least Desirable experiments, Robot3 (the most dominant and slowest) did not collect as many pucks as Robot2, although one might expect it to collect more. One explanation for this inconsistency is that the performance of Robot3 degraded more than could be compensated for by its high position in the hierarchy; Robot3 was simply poor at finding pucks. There is a complementary inconsistency in the DLS data, where Robot0 (the fastest, and on average the most dominant at the end of the trials) collected fewer pucks than Robot1. This seems to be due to the fact that Robot0 began the DLS trials as the least dominant robot, and consequently was sometimes not allowed to deliver a puck and establish its performance value until relatively late in a trial. Finally, it is worth noting that none of the results for the Most Desirable experiments were significantly different from the dynamic leader selection experiments. One implication of this result (and some of the others mentioned above) is that testing the significance of minor differences in the data would require additional, possibly numerous, trials; this would likely be impossible due to time and robot robustness constraints. The complementary implication is that, on average, dynamic leader selection using AMMs yields performance virtually identical to a pre-specified optimal hierarchy, even when starting from a very undesirable hierarchy.
5.5 Summary

This chapter presented three experimental applications focusing on group coordination issues and assuming stationary interaction dynamics. The evaluation of agent-environment interaction dynamics using AMMs and behavior-based control was shown to be effective in fault detection, affiliation determination, and dynamic leader selection, all important to group-level performance. The next chapter explores applications in non-stationary domains.

Chapter 6

AMMs in Non-Stationary Problem Domains: Regime Detection

This chapter relaxes the assumption of stationarity made in Chapter 5, and focuses specifically on detecting significant changes in the agent-environment interaction dynamics. It presents an approach using multiple AMMs to monitor events at different time scales and provide statistics to detect changes at those time scales. The approach is successfully implemented using a physical mobile robot performing a land mine collection task (a variation of foraging), and experimental results are provided.

6.1 Introduction

In certain classes of tasks, it may be necessary for a situated agent to detect significant global changes in the environment and modify its behavior or the task structure accordingly. The environment can be in a particular regime (i.e., a period of steady state) and then switch to a different regime requiring the agent to modify its behavior. Detecting such environmental regime changes may be difficult for a number of reasons:

• The agent may have no a priori knowledge of the environment and thus also lack a baseline for gauging environmental shifts. In a system where the environment is evolving (i.e., a non-stationary system), determining a basis for comparison may be difficult.

• Given only local sensing capabilities, the agent may require a significant amount of time to estimate the state of the environment. Any estimate of state, however, may be outdated in a non-stationary system.

• The nature of the task may be stochastic, with uncertainties large enough to preclude an effective predictive model of environmental state, or dynamics too complex to make the development of such a model feasible or tractable. Alternatively, however potentially simple the system, there may be no a priori data with which to instantiate a model.

• Depending on the task or environment, the time scale of the environmental non-stationarity that must be detected may differ. For example, in one task, the environmental change may be almost instantaneous, detectable between one moment and the next. In another task, the change may be slow and incremental, requiring the examination of a large time interval for detection. Hard-coding the agent with a specific time scale to use for regime detection can be problematic. A time scale that is too small makes the robot incapable of detecting the change. Conversely, a time scale that is unnecessarily large increases the time required to detect the change and may be undesirable in time-critical situations.

As a concrete example, consider the task of collecting undetonated land mines in a field. Assume that there are two types of mines, large and small, with destructive power proportional to their size.
A robot is given the following instructions: "Go out to the field and first collect as many large mines as you can, since they are the more destructive. But don't spend all of your time searching for every last large mine if you discover that there aren't many of them. Instead, start collecting the small mines. After all, we want to clear the field as well as possible." In order for the robot to accomplish this task, it must have enough data about its environment (the mine field) to intelligently switch from collecting large mines to small ones. In this scenario, the robot is only able to carry one mine at a time, producing a large cost (in time) for each mine collected. It is important that the more critical large mines be collected first, but also that the robot be able to decide when to switch to the smaller mines. (Here we assume that the task requires the robot to collect one type of mine at a time. Alternatively, the robot might switch between types as necessary. We explore this alternative when we consider a reward maximization scenario in Chapter 7.) The difficulty of this task is compounded when the issues mentioned above apply. The robot may have no a priori information about the numbers of large and small mines in the field, their distributions, or relative proportions. The robot may also lack global sensing of the mines in the field and may not know the time scale appropriate to its decision for switching between mine types. This decision is dependent on factors including the size of the field and the relative densities of the two types of mines.

In this chapter, we propose a mechanism for regime detection that resolves the above issues. The approach uses multiple AMMs to capture, in real time, the dynamics of a robot interacting with its environment in terms of the behaviors it performs. One AMM is created and maintained at each time scale that is monitored, and statistics about the environment at that time scale are derived from it. As task execution continues, AMMs are dynamically generated to accommodate the increasing time intervals. Sets of statistics from the models are used to determine whether the environmental regime has changed. This approach requires no a priori knowledge, uses only local sensing, and captures the notion of time scale. Additionally, it works naturally with stochastic task domains where variations between trials may change the most appropriate time scale for regime detection. The approach has been physically realized on a mobile robot performing the mine collection task. Experiments and results for this task are presented later in the chapter.

It should be noted that it is difficult to define an absolute notion of regimes, especially since it relates to the dynamics of environmental changes. In a gradually shifting environment, the designation of a regime change can be fairly arbitrary. In this chapter, we propose one principled method for regime detection based on statistical hypothesis tests, and empirically show it to be effective. In the next section we describe how AMMs may be used as part of a mechanism for regime detection.

6.2 AMMs for Regime Detection

Our focus is on the difficult but realistic situation in which a robot lacks a priori information about its environment, an environmental model, and global sensing.
In such a situation, the robot may require a relatively large amount of time to detect a trend that signals a global environmental regime change. This is especially so if the system is noisy and stochastic, as is generally true in mobile robotics. Unless a sufficiently large time scale is employed, the regime change may be lost in the variation of the data. Determining the appropriate time scale, however, may not be possible ahead of time. It may depend on the exact nature of the task, the structure of the environment (including the presence of other robots), and the nature of the system's stochasticity. The time scale may also depend on the specific attribute(s) of the system being monitored for regime changes.

In order to negotiate these challenges and endow a robot with the ability to detect global environmental regime changes, we maintain models (AMMs) of the robot's interaction with its environment at multiple time scales. As the robot performs its task, we extract and store particular statistics from these models, which are used to detect a specific regime change based on a sound criterion of significance. In our first experimental validation, the regime switch is detected as a significant change in the density of mines, while in the proportion-maintaining scenario it is detected as a change in the proportion of mines. Before presenting the algorithm for regime detection, we first introduce some notation used in the algorithm.

6.2.1 Notation

• Let τ > 0 be the minimum time scale, or number of input symbols, used to construct an AMM.

• Let f_i be a positive-valued function of τ returning the size of (the number of input symbols maintained by) the i-th time scale.

• Let k > 0 specify the number of AMM-extracted values used in detecting a regime.

• Let the AMM at time scale f_i be m_i.

• Let Q_i be a sequence of at most k statistics for model m_i.

• Let n be the total number of input symbols that have been used to construct the models.

• Let M be a special AMM that is constructed using all of the input symbols that have been seen.

This notation is now employed in the regime detection algorithm.

6.2.2 Algorithm for Regime Detection

1. Initialize M and m_0, and set n ← 0.

2. Get an input symbol and use it to update M and all m_i.

3. Set n ← n + 1.

4. For all i such that (n mod f_i) = 0:

(a) If no such m_i exists, because a new time scale has been reached, then create m_i and initialize it to equal M.

(b) Call Stat(m_i) to get the statistic for the model and insert that value into Q_i.

(c) If the length of Q_i equals k, then call DetectRegime(Q_i).

(d) If DetectRegime(Q_i) returns true, then the regime has changed; else it has not.

(e) Re-initialize m_i to be an empty model.

Stat() is a function on an AMM, returning application-dependent statistics extracted from the model (e.g., the mean time in a state/behavior). DetectRegime() performs a statistical hypothesis test (such as Student's t or ANOVA) on a list of values, returning true if the result is significant and false otherwise. Essentially, DetectRegime() provides a meta-threshold based on a statistical hypothesis test. The threshold is not a set number, but rather a measure of the statistical significance of the shift in the environment.
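To make the control flow concrete, the following Python sketch implements the loop above with f_i = 2^i τ, the schedule used in the experiments below. The AMM is abstracted behind a make_amm() factory whose models expose an update() method, and stat() and detect_regime() stand in for the application-dependent Stat() and DetectRegime() functions; all of these names are assumptions of the sketch, not part of the algorithm's specification:

import copy

def regime_monitor(symbols, make_amm, stat, detect_regime, tau=5, k=8):
    # Yields (n, i) whenever a regime change is detected at scale f_i.
    M = make_amm()            # special AMM built from the entire history
    models, queues = [], []   # m_i and Q_i, one pair per time scale
    n = 0
    for sym in symbols:
        M.update(sym)                          # step 2: update M ...
        for m in models:
            m.update(sym)                      # ... and all existing m_i
        n += 1                                 # step 3
        i = 0
        while 2 ** i * tau <= n:               # step 4: all i with n mod f_i = 0
            f_i = 2 ** i * tau
            if n % f_i == 0:
                if i == len(models):           # (a) a new time scale is reached:
                    models.append(copy.deepcopy(M))    # initialize m_i to M
                    queues.append([])
                queues[i].append(stat(models[i]))      # (b) insert Stat(m_i) into Q_i
                if len(queues[i]) > k:
                    queues[i].pop(0)           # Q_i holds at most k values
                if len(queues[i]) == k and detect_regime(queues[i]):
                    yield (n, i)               # (c), (d): the regime has changed
                models[i] = make_amm()         # (e) re-initialize m_i to be empty
            i += 1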
The algorithm maintains multiple AMMs at different time scales. At each time step, each AMM (the m_i and M) is updated with a new input symbol. If no m_i exists for a new, larger time scale f_i, then that model is created and initialized to M. If a model m_i has received its maximum number of input symbols (as designated by f_i), then Stat(m_i) is called to extract the appropriate application-dependent data from it, and m_i is reinitialized to be empty. The data from m_i are inserted into a queue Q_i of maximum length k, and if |Q_i| = k then DetectRegime(Q_i) is called to test for significant differences among the values of Q_i. It is just such a significant difference, or shift in the data, that is designated as a regime change. In the next section we describe our experimental setup and example.

6.3 The Land Mine Collection Task

To validate our algorithm for detecting global environmental regime changes, we use a task analogous to the land mine collection example from the beginning of the chapter, which is also a version of the foraging task from Chapter 2. We use one R2e robot (Figure 2.3) in the experiments. The Corrall is adjusted to be either 11 x 14 feet or 11 x 8 feet, depending on the experiment, and contains up to 36 pucks of two different colors: clear (representing large mines) and black (representing small ones). Figure 6.1 shows two experimental configurations. The behavior space (Section 4.4) for this task consists of the following nine behaviors:

• avoiding: avoid any object (detected by IR and contact sensors) deemed to be in the path of the robot.

• wandering: move forward and, at random intervals, turn left or right through some random arc.

• puck detecting: if avoiding is not active, and if an object is detected by the front IRs, lift up the gripper fingers to determine whether the object is short enough to be a puck. If it is, approach the object and try to place it between the fingers and pick it up. If unsuccessful, perform avoiding.

• color detecting: if puck detecting is successful, detect the color of the puck. If it is the desired color, then perform homing; else perform leave puck.

• leave puck: drop the puck and continue searching for more, using avoiding, wandering and puck detecting.

• homing: if carrying a puck, move towards the designated goal, Home.

• creeping: when near Home, perform a slower, more accurate homing behavior.

• exiting: if in the Home region, drop the puck and exit Home.

• reverse homing: move away from the Home region.

Figure 6.1: Two versions of the mine collection task environment: (Left) 11 x 14 foot Corrall with 9 clear and 18 black pucks; (Right) 11 x 8 foot Corrall with 18 clear pucks.

The two behaviors that are new to the land mine collection task are color detecting and leave puck. The color detecting, homing, avoiding, and creeping behaviors are also qualified to indicate the color of puck the robot has found. Control of the robot's drive motors was the basis for selecting the constituent members of this behavior space. When active, each of the behaviors has exclusive control of the motors, and together they account for all activity (or inactivity) of the motors for the duration of the task.
6.3.1 Validating the Approach

In order to validate our approach to regime detection, we show that: (1) regime changes do happen at different time scales, and (2) our algorithm using multiple AMMs can detect such changes, as brought about by shifts in large mine density. We compare results from two versions of the mine collection task that are identical except for the environmental setup. The hypothesis is that the decrease in environment size and the increase in clear puck (large mine) density in the second version pushes the regime change to a different, most likely smaller, time scale.

The first version of the task uses an 11 x 14 foot (large) Corrall with 9 clear and 18 black pucks evenly distributed throughout (Figure 6.1: Left). With no a priori information about the environment, the robot must collect only the clear pucks (i.e., large mines), while executing the regime detection algorithm to determine when to switch to black pucks (i.e., small mines). (In reality, data were sent via a serial radio link to an off-board Power Macintosh G3/266, which performed the regime detection algorithm and notified the robot of any regime changes. This was done because programming limitations of the R2s and the Behavior Language made implementing the algorithm on-board an R2 extremely difficult. These limitations are platform-specific.) In the second version, the Corrall is decreased in size to 11 x 8 feet (small) and only 18 clear pucks are used (Figure 6.1: Right). The key statistic of interest in these two versions is the time scale at which the robot detects a regime change and decides to begin collecting black pucks (small mines).

We complete the description of the validation experiment by presenting the parameter values used in the regime detection algorithm: the minimum time scale τ = 5; the number of statistics kept for each model was k = 8; the function f_i = 2^i τ; Stat(m_i) returned the number of pucks that had been collected during the lifetime of AMM m_i; and DetectRegime(Q_i) performed an analysis of variance (ANOVA) on two groups of data (namely, the first and second k/2 values in Q_i) to determine whether the means were different at a significance level of 10%, indicating a significant environmental shift. Since in each trial the robot was initialized to collect clear pucks (large mines), DetectRegime(Q_i) essentially determined whether the number of clear pucks collected changed significantly enough over k consecutive intervals of size f_i to indicate a regime change.

We conducted five experimental trials in each of the two environments and gathered data about the time scale at which regime detection occurred. In each of the 10 trials, the algorithm successfully detected a regime switch. In the large Corrall environment, the mean time scale of detection was 1024, while in the small Corrall it was 256. (Since data were collected at 2 Hz, this translates to approximately 512 seconds and 128 seconds, respectively.) A hypothesis test based on Student's t distribution (Freund 1992) indicates that the two means in the experiments are statistically different at a significance level of 1%. Thus, we have validated our approach by showing that regime changes do occur at different time scales (even in the same task but with different environments), and that our algorithm is able to detect such changes.
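Under the same assumptions as the sketch in Section 6.2.2, these parameter values correspond to instantiations of Stat() and DetectRegime() along the following lines, here using scipy's one-way ANOVA; entry_count() is again a hypothetical accessor for counting entries into the exiting state:

from scipy.stats import f_oneway

def stat_pucks(m):
    # Stat(m_i): pucks collected during the lifetime of AMM m_i.
    return m.entry_count("exiting")

def detect_regime(q, alpha=0.10):
    # DetectRegime(Q_i): ANOVA between the first and second k/2 values
    # in the queue; True signals a significant environmental shift.
    half = len(q) // 2
    return f_oneway(q[:half], q[half:]).pvalue < alpha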
Next, we describe a more sophisticated use of our approach.

6.3.2 Maintaining the Proportion of Mines

In a more complex version of the mine collection task, the robot is required to maintain the proportion of large to small mines in the environment at a specified value p. A significant switch in this value indicates a non-local regime switch, since p itself is a non-local measure. Once again, the robot begins by collecting large mines, but this time switches to small mines when the observed proportion p_obs is significantly different from p and p_obs < p. Conversely, the robot switches back to large mines when p_obs > p and this difference is significant. The goal of this experiment was to determine whether the robot could detect multiple consecutive regime changes in its environment due to shifts in the proportion of large to small mines.

For this experiment, the Corrall was 11 x 8 feet and contained 18 each of clear and black pucks. The parameter values used in the regime detection algorithm were: τ = 5; k = 4; f_i = 2^i τ; Stat(m_i) returned the proportion of clear to black pucks encountered; and DetectRegime(Q_i) performed an analysis of variance (ANOVA) at a significance level of 10% on Q_i and a list of length k having all values equal to p. The proportion p was set to 1.0, indicating that the robot should try to maintain equal numbers of the two types of mines. Whenever a regime switch was detected, the regime detection algorithm was re-initialized so as to be able to detect the next regime change.

In this experiment, a trial was considered complete when the robot detected two consecutive regime changes. The robot successfully did so in each of the five trials that were conducted. Table 6.1 shows the numbers of clear and black pucks remaining in the environment at the end of each trial. The correlation between the numbers is quite large (ρ = 0.70) and indicates that their proportion tended to be close to 1.0. Thus, not only was the robot able to detect multiple consecutive regime changes, but it was also effective in maintaining the desired proportion of pucks (mines).

Trial #        1    2    3    4    5
Clear pucks    4    8   15   10   14
Black pucks    8    4   16    9   10

Table 6.1: Pucks remaining in the environment at the end of each trial of the proportion-maintaining mine collection task.
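In the same sketch form, and under the same assumptions, the proportion-maintaining DetectRegime(Q_i) compares the observed proportions against a constant baseline list of p values; in intent, this amounts to a one-sample test of location against p:

from scipy.stats import f_oneway

def detect_proportion_shift(q, p=1.0, alpha=0.10):
    # ANOVA between Q_i and a list of length k with all values equal to p.
    baseline = [p] * len(q)
    return f_oneway(q, baseline).pvalue < alpha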
6.3.3 Maximizing Reward

An alternate experimental scenario requires the robot to maximize the expected reward garnered from collecting mines. Instead of designating a priori that the robot begin by collecting large mines (as in the previous experiments), the robot is told the reward value associated with each type of mine and must decide which to collect in order to maximize its total reward. Reward values can be set in proportion to a mine's explosive power, thus making reward maximization identical to minimizing the mine field's destructive potential. Regime detection enters the scenario when the environment is non-stationary, i.e., puck densities shift as pucks are collected or replaced. The following chapter describes this scenario in detail.

6.4 Summary

This chapter presented a novel approach that enables an agent to detect and respond to global environmental regime changes while having no a priori knowledge or models of the environment and being limited to only local sensing. Multiple AMMs were constructed at different time scales and used to derive sets of statistics that were analyzed to detect a regime change. The approach was successfully validated on a physical mobile robot performing a land mine collection task. The next chapter presents another application in the non-stationary domain: reward maximization.

Chapter 7

AMMs in Non-Stationary Problem Domains: Reward Maximization

This chapter explores a second application of AMMs and behavior-based control to modeling agent-environment interaction dynamics in non-stationary problem domains. The problem explored in this chapter is reward maximization. Similar to the approach to regime detection in Chapter 6, the approach here also uses multiple AMMs to monitor the interaction dynamics at different time scales, but for the purposes of estimating the state of the environment. The approach is validated with a real mobile robot performing a mine collection task in both abruptly and gradually changing environments.

7.1 Introduction and Motivation

In certain classes of tasks, an agent may be required to perform optimally with respect to the information it possesses about the structure of its environment. Reward maximization may be used as a means of quantifying performance. In that framework, the agent receives reward (e.g., points) in proportion to its performance. Reward maximization in a non-stationary environment requires the agent to be able to estimate the state of the changing environment. There are a number of issues that can compound the difficulty of this problem:

• The agent may have no a priori knowledge of the environment and thus also lack a baseline for gauging the non-stationarity of the environment.

• Given only local sensing capabilities, the agent may require a significant amount of time to estimate the state of the environment. Any estimate of state, however, may be outdated in a non-stationary system.

• The nature of the task may be stochastic, with uncertainties large enough to preclude an effective predictive model of environmental state, or dynamics too complex to make the development of such a model feasible or tractable. Alternatively, however potentially simple the system, there may be no a priori data with which to instantiate a model.

• Further, in a stochastic system, the variability associated with performing a task (or elements thereof) may be enormous and effectively mask gradual shifts in the environment. Conversely, in a system with very low variability, even minute shifts may be easily detected. Thus, effective estimation of environmental state requires an understanding of the system's variability (as often measured by variances, covariances, etc.).

• Depending on the task or environment, the time scale at which the non-stationarity manifests, and thus can be detected, may differ. For example, in one task, the environmental change may be almost instantaneous, detectable between one moment and the next. In another task, the change may be slow and incremental, requiring the examination of a large time interval for detection. Hard-coding the agent with a specific time scale to use for state estimation can be problematic. A time scale that is too small makes the agent incapable of detecting the change.
Conversely, a time scale that is unnecessarily large increases the time required to detect the change and may be undesirable in time-critical situations.

As a concrete example, consider the task of collecting undetonated land mines in a field. Assume that there are two types of mines, large and small, with destructive power proportional to their size. The robot's goal is to minimize the destructive power of the mine field as much as possible during a given period of time. When the robot is given points in proportion to the destructive power of the mines it collects, the goal becomes equivalent to reward maximization. To accomplish its goal, the robot must have enough data about its environment (the field) to intelligently decide whether it is best to collect large mines or small ones at each point in time.

The difficulty of this task is compounded when the issues mentioned above apply. The task is likely stochastic, with unknown variability. The robot may have no a priori information about the numbers of large and small mines in the field, their distributions, or relative proportions. The robot may also lack global sensing of the mines in the field. These limitations relegate the robot to estimating the environmental state while performing the task. With only an estimate, however, the robot may not perform in a globally optimal manner. The heart of this problem, therefore, is to use the best possible estimate of environmental state given the limitations of the system.

If the task environment is stationary, then all of the data the robot gathers may be used to estimate the state, with more data presumably providing a better estimate. Conversely, for non-stationary environmental state estimation, some mechanism must exist for discarding old data. This is a tricky proposition. If too much data are discarded, the estimate may be susceptible to noise and variance; if too little are discarded, the estimate may be skewed and not accurately represent the current state. This is analogous to the issues of overfitting and underfitting generally encountered in machine learning (Mitchell 1997). The appropriate amount of data to be kept is not necessarily static and pre-determinable; rather, it depends on the variances of the system and the type of non-stationarity exhibited. Low variances require less data (i.e., a smaller time scale) to characterize, as does non-stationarity exemplified by abrupt shifts. Both high variances and gradually shifting non-stationarity require greater amounts of data (i.e., a larger time scale) to characterize. Thus, a mechanism for estimating environmental state must accommodate both the variances and the type of non-stationarity exhibited by the system. Additionally, since multiple types of relevant non-stationarity and variances may exist in the system, a state estimation procedure capable of dynamically compensating is desirable.

We propose an algorithm that provides a moving average estimate of the state of a non-stationary system. The algorithm dynamically adjusts the window size used in the moving average to accommodate the variances and type of non-stationarity exhibited by the system, while discarding outdated and redundant data. Our focus is the application of the algorithm to the problem of reward maximization in a non-stationary environment.
Similar to the algorithm in Chapter 6, the algorithm here also uses multiple AMMs to capture interaction dynamics at different time scales for evaluations at those time scales. The state of the environment is estimated indirectly through the robot's interaction with it. As task execution continues, AMMs are dynamically generated to accommodate the increasing time intervals. Sets of statistics from the models are used to determine whether old data and AMMs are redundant or outdated and can be discarded. This approach requires no a priori knowledge, uses only local sensing, and captures the notion of time scale. Additionally, it works naturally with stochastic task domains where variations between trials may change the most appropriate amount of data for state estimation. In the next section, we use AMMs and the evaluations from Section 4.3 in our dynamic moving average algorithm.

7.2 Dynamic Moving Average Algorithm

To manage the amount of data used to estimate the state of a non-stationary environment, we present an algorithm that essentially computes a moving average with a dynamic window size. The algorithm maintains multiple AMMs for different time intervals, and uses a t-test and an F-test on comparable values from the different AMMs to determine which AMM provides the best information. These tests allow the algorithm to adjust the window size of the moving average to accommodate both the amount of variance in the system and the type of non-stationarity (ranging from abrupt to very gradual). The window size of the moving average is allowed to grow by maintaining and expanding old AMMs, and is shrunk by deleting AMMs. We now present the algorithm.

1. Let L be a queue-like list of AMMs, with L_0 as the first (oldest) element. Initialize L to contain one AMM.

2. Let α_t and α_F be constants specifying the significance levels for the t-test and F-test, respectively.

3. Let mode be a variable designating the two modes of the algorithm.

4. For each new input symbol, do the following:

5. Update each AMM in L with the new input.

6. If it is time to create a new AMM, then:

7. Create a new AMM and add it to L.

8. Compute the mean first passage matrix for L_0, extract the desired values, and calculate their associated variance and degrees of freedom.

9. Do the same for L_1.

10. Perform an F-test between the variances calculated for L_0 and L_1.

11. If the significance level returned by the F-test is less than α_F (i.e., the variances are different), then set mode = 1; else set mode = 2.

12. If mode == 1, then let i be the index of the first AMM in L after L_0 that has either significantly different variances or significantly different means (i.e., significance level < α_F or < α_t, respectively). If such an i exists, delete L_0 through L_{i-2} and use the new L_0 as the best estimate of the state.
13. If mode == 2, then let i be the index of the first AMM in L after L_0 that has neither significantly different variances nor significantly different means (i.e., significance levels ≥ α_F and ≥ α_t, respectively). If such an i exists, delete L_0 through L_{i-1} and use the new L_0 as the best estimate of the state.

14. If no such i exists for either value of mode, then do not delete any AMMs and use the current L_0 as the best estimate.

There are several characteristics of the algorithm worth noting. The decision criterion used to create a new AMM (line 6) is very general. For example, a new AMM might be created after a certain period of time, after a certain number of input symbols, or when a particular input symbol is observed. In the experiments described below, a new AMM is created every time the robot finds an object (puck) to collect.

The algorithm adjusts the amount of data in the moving average to accommodate the variance in the system. This is accomplished by considering deletion of L_0, the AMM representing the largest time window of data, only when its variance is comparable to (i.e., not significantly different from) that of another AMM. When the variability in the system is high, the AMMs require more data (i.e., larger windows) to acquire comparable variances. When variability is low, less data are required to accurately characterize the variances of the system.

The algorithm also has two distinct modes. In the first mode (mode = 1), the algorithm removes redundant or old data. In systems with very gradual non-stationarity, this mode effectively maintains a good moving average estimate of the state. When there is an abrupt change in the system, the means and variances of adjacent AMMs may become increasingly different as more data are collected, causing the first mode to stall (i.e., not delete old AMMs). The second mode (mode = 2) solves this problem by comparing non-adjacent AMMs to find two that are not significantly different. The second mode then "jumps over" the intervening AMMs by deleting them.

One final item of note is the importance of the α_t and α_F thresholds for significance level. Both values must lie in the interval [0, 1.0], with a significance level of 0.05 or less generally considered significant. The effect of α_t and α_F is to adjust the size of the moving average window. Extremely large values of both thresholds produce a very large window with excessive smoothing and a potentially skewed estimate of state. Very small values of the thresholds result in a small window and state estimation that is prone to overfitting. Empirical tests suggest that values in [0.01, 0.1] tend to work fairly well in the algorithm, with relatively little sensitivity. It should also be noted that the experiments described later use this algorithm in real time.
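The deletion logic (steps 8 through 14) can be sketched as follows. The AMMs are abstracted behind an extract() function assumed to return the sample of mean first passage values used in the tests; Welch's form of the t test handles unequal variances, and the F-test p-value is computed directly from the F distribution. This is a sketch of the tests and bookkeeping only, not of AMM construction itself:

import numpy as np
from scipy import stats

def f_test_pvalue(x, y):
    # Two-sided F-test for equality of variances between samples x and y.
    f = np.var(x, ddof=1) / np.var(y, ddof=1)
    dfx, dfy = len(x) - 1, len(y) - 1
    return 2 * min(stats.f.cdf(f, dfx, dfy), stats.f.sf(f, dfx, dfy))

def differ(x, y, alpha_t, alpha_f):
    # True if x and y have significantly different variances or means.
    if f_test_pvalue(x, y) < alpha_f:
        return True
    return stats.ttest_ind(x, y, equal_var=False).pvalue < alpha_t

def prune(L, extract, alpha_t=0.05, alpha_f=0.05):
    # One pass of steps 8-14 over the list of AMMs L (L[0] is the oldest).
    # Returns the possibly shortened list; the new L[0] is the estimate.
    if len(L) < 2:
        return L
    x0, x1 = extract(L[0]), extract(L[1])
    mode = 1 if f_test_pvalue(x0, x1) < alpha_f else 2   # steps 10-11
    for i in range(1, len(L)):
        xi = extract(L[i])
        if mode == 1 and differ(x0, xi, alpha_t, alpha_f):
            return L[i - 1:]          # step 12: delete L_0 .. L_{i-2}
        if mode == 2 and not differ(x0, xi, alpha_t, alpha_f):
            return L[i:]              # step 13: delete L_0 .. L_{i-1}
    return L                          # step 14: no such i; delete nothing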
The experimental verification in this chapter is done using the land mine collection task of Section 6.3. Figure 7.1 shows how the Corrall was set up. We now present the validation experiment and results for the reward maximization criterion.

7.3 Experiment 1: Validation of the Reward Maximization Criterion

Before demonstrating reward maximization in a non-stationary version of the mine collection task, we first validate the reward maximization criterion in a stationary environment. This validation is necessary to ensure that our subsequent results are not biased by an invalid assumption about the value of reward maximization. If it were shown that our reward maximization criterion did not improve performance over random behavior, then the utility of our dynamic moving average algorithm in the non-stationary version of the task would be suspect.

We now define the reward maximization criterion more formally. Let R_s and R_l be the rewards for small and large mines, respectively. Let τ_s^f be the expected time required for the robot to find a small mine, and let τ_s^d be the expected time to deliver the mine to the goal location once it has been found. Similarly, τ_l^f and τ_l^d represent these times for large mines. The robot maximizes its reward by deciding, for each mine found, whether to deliver it or leave it in search of a higher valued mine. The action chosen is the one that maximizes the expected reward per unit time (and thus the overall expected reward). If the robot finds a small mine, and the inequality

    R_s / τ_s^d ≥ R_l / (τ_l^f + τ_l^d)

holds, then delivering the small mine maximizes reward. Otherwise, the small mine should be left and a large mine sought. The complementary inequality is used when a large mine is found.

Figure 7.1: The mine collection task: setup for validation of the reward maximization criterion.

The main issue in evaluating the inequality is calculating τ^f and τ^d for each mine type. One could maintain internal variables that record these values. Our approach, however, is to calculate these values from the robot's AMM. As discussed previously, each element e_ij of E gives the mean first passage from state i to state j. E also contains the values for τ^f and τ^d: τ^f is simply the entry in E associated with the minimum mean time from a wandering state to a puck color state, and τ^d is the minimum mean time from a puck color state to a wandering state. In other words, if i is the input symbol for the wandering behavior and j is the input symbol for the puck color behavior, then τ^f = ε_ij and τ^d = ε_ji. Since these states are qualified by the color of puck the robot possesses, the τ values for small and large mines can be distinguished.
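The resulting decision rule reduces to a comparison of expected reward rates. A minimal sketch, with the τ values assumed to have been extracted from the mean first passage matrix E as just described:

def should_deliver(found, R, tau_f, tau_d):
    # found is 'small' or 'large'; R, tau_f, and tau_d are dicts keyed by
    # mine type, holding the rewards, expected find times, and expected
    # delivery times (the latter two taken from the AMM's matrix E).
    other = 'large' if found == 'small' else 'small'
    rate_deliver = R[found] / tau_d[found]                   # deliver the found mine
    rate_switch = R[other] / (tau_f[other] + tau_d[other])   # leave it, seek the other
    return rate_deliver >= rate_switch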
We performed two sets of experiments in this validation: one control set where the robot collected both types of pucks without discrimination, and one set with reward maximization allowing the robot to decide which pucks to collect. For these experiments, reward values were set in proportion to a mine's explosive power, thus making reward maximization identical to minimizing the mine field's destructive potential. The setup for both sets of experiments was identical, with 18 clear pucks (large mines) and 18 black pucks (small mines) evenly distributed in the Corrall (Figure 7.1). Each clear puck had a reward of 4 points, while each black puck had a reward of 1 point. In addition, the environment was kept stationary by replenishing collected pucks. We performed five one-hour trials for each experiment. Using the reward maximizing criterion, the robot accrued an average of 46.6 points (standard deviation of 6.8), while without reward maximization the robot averaged 37 points (standard deviation of 5.0). A hypothesis test based on Student's t indicates that the means are different at a significance level of 5%, and validates our reward maximization criterion. We now present experimental results applying the dynamic moving average algorithm to an abruptly changing non-stationary version of this task.

7.4 Experiment 2: Abruptly Changing Environment

In this set of experiments, we aim to show that, in an abruptly changing non-stationary version of the mine collection task, reward maximization with the addition of the dynamic moving average algorithm is superior to reward maximization alone. The hypothesis is that when the environment changes, average values of τ^f and τ^d become inaccurate and thus not effective for reward maximization. Using a dynamic moving average allows quicker adaptation to change.

Figure 7.2: The mine collection task: setup for reward maximization in a non-stationary environment.

The environment was first initialized to contain only 18 clear pucks (Figure 7.2). We ran one trial of approximately 20 minutes in this environment, during which the robot collected 4 clear pucks. This result is extremely variable: the actual time to collect 4 pucks could easily range between 10 and 30 minutes. In order to reduce the variability, the data from this one trial were used as a primer for all of the subsequent experiments, in which the clear pucks (large mines) were replaced with black (small) ones. Doing so allowed us to focus on the key experimental parameter: the time required to adapt to the new environment. We considered the robot to have adapted to the new environment when it consistently began collecting black pucks. We ran two experiments using reward maximization with the dynamic moving average algorithm and three experiments using reward maximization alone. In these experiments, the reward for a clear puck was 7 points and the reward for a black puck was 1 point. For reward maximization alone, the mean time to adaptation was 47 minutes, while the mean time with the algorithm was 18.3 minutes. A t-test indicates that these means are different at a significance level of 0.01. This strongly supports our hypothesis that the dynamic moving average algorithm allows quicker adaptation to abrupt non-stationarities.

7.5 Experiment 3: Gradually Shifting Environment

In this set of experiments, we test the effectiveness of the dynamic moving average algorithm for reward maximization in a gradually shifting environment. Instead of abruptly changing the color of the pucks, as in the previous experiment, here the environment shifts slowly as the robot collects pucks. Due to the high degree of variance in the mine collection task using physical robots, we anticipated that the number of experiments we would have to conduct in order to obtain statistical significance in the gradually shifting environment would pose a practical impossibility. The experiments described in this section were therefore conducted in a simulation of the mine collection task.

Figure 7.3: The simulated mine collection task: setup for reward maximization in a gradually shifting non-stationary environment.

The simulation was initialized to contain 18 clear pucks worth 10 points each, and 18 black pucks worth 1 point each (Figure 7.3). Three experimental scenarios were examined:

1. random: the robot collects any puck it encounters regardless of the color.

2. control: the robot uses the reward maximization criterion, but does not employ the dynamic moving average algorithm used to compensate for non-stationarity.

3. algorithm: the robot uses both reward maximization and the dynamic moving average algorithm for state estimation.

Figure 7.4: Average accrued reward for the three experimental scenarios (puck point values: black = 1, clear = 10).
                           random   control   algorithm
Expected average reward     151.6     149.3       153.8
Standard deviation            5.7       5.7         5.5

Table 7.1: The average reward points the robot is expected to have accrued during the random, control, and algorithm scenarios (puck point values: black = 1, clear = 10).

We conducted trials of 50,000 simulation steps for each scenario: 200 trials of the random scenario, 40 of control, and 100 of algorithm, with the actual number determined by the desired level of statistical significance. The data gathered included the time at which each puck was collected, allowing us to calculate the accrued number of reward points. Figure 7.4 presents the average number of reward points accrued at 1000-time-step intervals for each of the three scenarios. The maximum possible accrued reward is 198 points, corresponding to the collection of all 36 pucks. Both the random and algorithm scenarios essentially reach this maximum by the end of each trial (with average accrued points of 197.8 and 197.5, respectively). The control scenario outperforms the other two until around 20,000 time steps, then quickly declines, ending with an average reward of 184.4. The discrepancy between the algorithm and control cases illustrates the importance of eliminating outdated information, and the effectiveness of our algorithm in doing so. As a further comparison, we calculate the number of reward points that the robot is expected to have accrued on average during the course of a trial: 151.6 for random, 149.3 for control, and 153.8 for algorithm (Table 7.1). The pair-wise comparison of the data using a two-sample version of Student's t test indicates significantly different means at p-value < 0.02. The superiority of the algorithm case illustrates the effectiveness of our moving average algorithm for reward maximization in a gradually shifting environment.

Probability of collection    1.0    0.75   0.50   0.25
Expected average reward     151.6  148.5  140.3  118.3
Standard deviation            5.7    6.2    7.3   10.9

Table 7.2: The average reward points the robot is expected to have accrued during four versions of the random scenario with different collection probabilities (puck point values: black = 1, clear = 10).

It should be noted that the random scenario used above lies along a continuum of possible experiments distinguished by the probability with which the robot collects (or discards) each puck it encounters. For comparison with the control and algorithm scenarios, we used the random case in which the robot collects 100% of the pucks it encounters.

Figure 7.5: Average accrued reward for the four versions of the random scenario with different probabilities for collecting pucks (puck point values: black = 1, clear = 10).

Intuitively, one would expect this to be the best performing of the random cases. When the probability of collecting a found puck is less than 1.0, the robot wastes more time searching for pucks but does not improve its reward, since it discards both high-valued and low-valued pucks with equal probability. To verify our intuition, we conducted 80 trials for each of three lower probabilities of collection: 0.75, 0.50, and 0.25.
As expected, the data show a continual, significant decrease in accrued reward as the probability of collection decreases. The expected reward points accrued on average during the course of a trial (in order of decreasing collection probability) are: 151.6, 148.5, 140.3, and 118.3 (Table 7.2). Pairwise t-tests indicate that the four random scenarios are indeed different at a significance level of 0.0001, and Figure 7.5 visually demonstrates this difference. Thus, it might initially seem that the random cases with lower collection probabilities are more appropriate for comparison with the algorithm and control scenarios. It is, however, the collection probability of 1.0 that poses the most difficult challenge for our algorithm, and consequently the one used for the comparisons described here as well as those in Section 7.3.

Figure 7.6: Average accrued reward for the three experimental scenarios with close point values for pucks (black = 1, clear = 4).

                           random   control   algorithm
Expected average reward     69.02     67.99       69.67
Standard deviation           2.28      2.16        2.46

Table 7.3: The average reward points the robot is expected to have accrued during the random, control, and algorithm scenarios with close puck point values (black = 1, clear = 4).

We also tested our algorithm in a more challenging set of experiments where the point values of pucks were closer together (black = 1, clear = 4). Close point values make the difference in reward for collecting the "wrong" versus the "right" colored puck very small, and thus further sensitize the performance of the robot to the accuracy of the estimate of environmental state. In these experiments, we compared data from 140 trials of the control and algorithm scenarios, and 200 trials of the random scenario (all trials being run to 50,000 simulation steps). The expected reward accrued on average during these scenarios was, respectively: 67.99, 69.67, and 69.02 (Table 7.3). A t-test shows these results to be different at a significance level of 0.02. The superior performance of the algorithm scenario, given the closeness of the data (Figure 7.6), helps illustrate the effectiveness of our AMM-based, moving-average state estimation algorithm using dynamic windowing.

7.6 Summary

This chapter explored the use of AMMs and behavior-based control for modeling interaction dynamics in a non-stationary environment. Multiple AMMs were used as part of a moving average algorithm with dynamic windowing, which was applied to estimating the environmental state. This estimate provided a robot performing the land mine collection task with the information to make performance-improving decisions about which type of mine to collect.

Chapter 8

Parametric versus Nonparametric AMMs

This chapter compares the effectiveness of the parametric version of AMMs used thus far (assuming normally distributed state durations) to a nonparametric alternative (using the raw duration data without fitting it to a parametric distribution).
The motivation for this empirical comparison is the observation that the real-world, mobile-robot, behavior-execution data modeled in the previous chapters with parametric AMMs are, in fact, non-normal. The question is how the violation of normality impacts the effectiveness of parametric AMMs. The answer, as we will see, is that the violation makes parametric AMMs less effective than their nonparametric counterpart when the order of the system is unknown.

8.1 Introduction

In this chapter, we examine the effectiveness of modeling when the data used violate underlying assumptions of the model. The impact of such a violation on the desired application is not necessarily obvious. Perhaps the combination of the modeling approach, the training data, and the target application is fairly insensitive to the deviation, which therefore has negligible impact on the performance of the system. Alternatively, this combination of factors might be quite sensitive to the deviation in assumptions, leading to relatively poor performance. Given the potentially complex interaction of the factors involved, an empirical study is a practical option for determining the relative merits of competing approaches to modeling. This chapter presents such an empirical study, focusing specifically on a comparison between parametric (Gaussian/normal) and nonparametric versions of AMMs as applied to the generally non-normal robot data from the foraging task (Chapter 2). The question answered in this chapter is: Might the results of the previous chapters have been different if a nonparametric version of AMMs were used instead of the parametric version?

The assumption of normally distributed data is not uncommon in machine learning and statistics. In our case, this assumption led to a more parsimonious AMM representation by allowing the data to be incrementally summarized in mean and variance values. The nonparametric version, by contrast, must store all of the data at multiple Markovian orders. The use of parametric statistics also simplified certain expectation calculations (Chapter 4). It is intuitively apparent, however, that the robot foraging data violate this assumption, simply by noting that a robot cannot spend negative time in a state. Figure 8.1 shows the actual distribution of data from the execution of four behaviors. It is clear, both graphically and from the chi-square goodness-of-fit test (Freund 1992, pp. 487-489), that this violation is quite severe, and that the robot data do not nicely fit any standard parametric distribution. While this result does not invalidate the work in previous chapters, it does lead us to question whether the results might have been better had we not used a parametric (Gaussian) version of AMMs, and instead used a nonparametric version making as few distribution assumptions as possible.

In order to answer this question, we implemented a nonparametric version of AMMs. To the author's knowledge, this is the first incremental, higher-order, nonparametric, SMP-like model. Details of the model representation and incremental construction algorithm are found in Appendix B. A review of relevant parametric and nonparametric statistics literature is provided in Sections 3.3 and 3.4. The next section describes one of the key differences between parametric and nonparametric AMMs: the test for node-splitting.
Figure 8.1: The distributions associated with the execution of four behaviors (reverse homing, exiting, homing, creeping) in the foraging controller (Section 2.3). Each panel plots frequency of occurrence against duration in the state (in number of input symbols). The graphs were generated with real data captured from the physical mobile robots.

8.2 Parametric and Nonparametric Node-Splitting

There are two types of node-splitting that can occur in the AMM construction algorithm: one is due to link traversal inconsistencies; the other occurs if the duration in a state differs significantly depending upon the particular multi-link transition sequence that enters that state. Whereas the former type of node-splitting relies on the binomial distribution, the latter depends on the SMP-like distribution that characterizes the time spent in a state. Node-splitting based on durations, therefore, encompasses the distinguishing characteristics of parametric and nonparametric AMM construction. It will also be the primary cause of differences between parametric and nonparametric AMMs in the forthcoming evaluation study.

In parametric AMMs, a t test (based on Student's t distribution) is used to determine a significant deviation in the mean time spent in a state. The particular form of the test allows for non-identically distributed populations (Press et al. 1992). The key point regarding the t test is that it assumes that observations are independent and are drawn from normally distributed populations. When these assumptions hold, the t test is a powerful test of location (Siegel & Castellan 1988). It also tends to be sensitive, however, to outliers and deviations from its assumptions.

In nonparametric AMMs, the hypothesis test used to determine a significant discrepancy in state duration is a median test by Fligner & Rust (1982). This test makes few distribution assumptions, allowing the underlying distributions to be both unequal (i.e., of different shapes) and asymmetric. The test can have difficulties, however, if the distributions are severely asymmetric with many values equal to the median. Section 3.4.1 presents the details of the test and Appendix C provides tables of critical points.
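The two duration tests can be contrasted in a short sketch. The parametric side uses the unequal-variance (Welch) form of the t test; for the nonparametric side, scipy provides Mood's median test, which is used here only as a readily available stand-in with the same intent as the Fligner & Rust (1982) test used in our implementation:

from scipy import stats

def split_parametric(d1, d2, alpha=0.05):
    # t test on mean state duration, allowing unequal variances; assumes
    # (as parametric AMMs do) normally distributed durations.
    return stats.ttest_ind(d1, d2, equal_var=False).pvalue < alpha

def split_nonparametric(d1, d2, alpha=0.05):
    # Median test on state duration; d1 and d2 are the raw duration
    # samples associated with two different multi-link entry sequences.
    _, p, _, _ = stats.median_test(d1, d2)
    return p < alpha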
There are two reasons why a simulation, rather than experimentation on real robots, is used in this study. First, since we designed the simulation, we know the precise characteristics of the system that the AMMs attempt to model. This provides exact ("theoretical") baseline data with which to evaluate the performance of the AMMs. Due to the possibility of hidden state, complex stochasticity, and non-stationarity, it would be difficult to derive any such exact system information in the real world. Second, even if sufficiently faithful baseline information about the real-world system were available, the hundreds of hours of trials that would be necessary with real robots would still pose a practical impossibility.

Figure 8.2: The transition model of the foraging task used by the simulation for evaluating parametric and nonparametric AMMs.

Figure 8.2 presents the transition diagram of the foraging task used by the simulation. During each step of the simulation, a transition is made from the current state to a new state according to the probabilities in the graph. In the new state, the simulation randomly samples a value from the population of durations for that behavior, drawn from the real robot data. For example, upon entering wandering, the simulation might pick a value of 9 from the distribution of durations that the real robot spent in the wandering behavior. The simulation then feeds 9 symbols representing the wandering behavior to the parametric or nonparametric AMM construction algorithm. Note that the simulation is second-order Markovian: when transitioning from the avoiding behavior, the system must remember the previous behavior in order to be able to return to it. In other words, a transition from the avoiding behavior depends not only on the current state, but also on the previous one.

Since the simulation has the actual sequence of state transitions that were made, and the duration spent in each state, it is possible to compute the "theoretical" value for the mean first passage between any two states. This can be compared to the corresponding values derived from the AMMs using Markov chain expectation calculations. Our interest is in the mean first passage between two states that represent the execution of particular behaviors, but because multiple states might represent these behaviors, we wish to calculate ε_ij, the minimum mean first passage between two input symbols, i and j (Section 4.3.1). This is one of the key AMM-based evaluations used in the previous chapters, and in particular, Chapter 7. Because there are 7 behaviors (input symbols) in the simulation, there are 49 distinct ε_ij. The evaluation metric, A_ε, for the performance of the parametric and nonparametric AMM implementations relies on the sum of the absolute differences between corresponding values of ε_ij, i.e.,

    A_ε = Σ_{all i,j} |ε_ij − ε̂_ij|,

where ε_ij is the actual ("theoretical baseline") value from the simulation and ε̂_ij is the value calculated from the AMM. Thus, smaller values of A_ε signify better performance by the AMM. The next section presents the simulation results.
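Before turning to those results, the following sketch (assuming NumPy) shows how these quantities might be computed for a plain first-order chain. The AMM calculation differs in two respects this sketch omits: each step is weighted by the expected duration of the state, and ε_ij is minimized over all states sharing an input symbol.

    import numpy as np

    def mean_first_passage(P):
        """Mean first passage times M[i, j], in steps, for an ergodic
        first-order Markov chain with transition matrix P. For each
        target j, solve (I - Q) m = 1, where Q is P with row and
        column j removed (i.e., all paths that avoid j)."""
        n = P.shape[0]
        M = np.zeros((n, n))
        for j in range(n):
            keep = [k for k in range(n) if k != j]
            Q = P[np.ix_(keep, keep)]
            M[keep, j] = np.linalg.solve(np.eye(n - 1) - Q, np.ones(n - 1))
        return M

    def metric_A(eps_baseline, eps_model):
        """A_eps: summed absolute difference over all pairs (i, j)."""
        return np.abs(np.asarray(eps_baseline) - np.asarray(eps_model)).sum()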
8.4 Experimental Results

This section presents results of the study comparing the relative effectiveness of parametric and nonparametric AMMs in modeling the simulated version of the foraging task using real robot data. There are two main factors that influence the performance of the AMM construction algorithm, and which we explore in conjunction with the parametric/nonparametric distinction. One factor is the significance level of the binomial confidence interval test and the location test that are used to determine whether node splitting is required. The higher the significance level, the less confidence there is in the decision to split a node, and thus the more likely it is that the algorithm will incorrectly do so. The second factor is the user-specified maximum order of the AMM, n_max. Each time the algorithm determines whether node splitting is necessary for the current state, it checks T^1, ..., T^(n_max − 1), the n_max − 1 structures containing statistics on multi-link transitions. Thus, the larger n_max is, the more opportunities the algorithm has (and, thus, the more likely it is) to incorrectly split a node. It is these incorrect splits that are the primary cause of poor performance.

We first consider a node-splitting significance level of 0.05 and n_max = 10. Figure 8.3 shows the average performance over 100 trials of 10000 simulation steps, using the A_ε metric described previously. The performance of the parametric version degrades after 4000 time steps, indicating that it does not converge to a stable topology. The data clearly show that nonparametric AMMs model the simulation more faithfully than do parametric ones. This result is not very surprising considering that the non-normal robot data place the t test of the parametric AMM algorithm at a distinct disadvantage to the more robust median test of the nonparametric version. Figure 8.4 shows that this disadvantage is still evident at a node-splitting significance level of 0.01, although the performance of the parametric version is not as deficient as at a level of 0.05. Further simulation results at a node-splitting significance level of 0.005 support the continued superiority of the nonparametric version.

Figure 8.3: The performance, A_ε, of parametric and nonparametric AMMs with a node-splitting significance of 0.05 and n_max = 10. This graph shows the average over 100 trials of 10000 simulation steps. Asterisks indicate a significant difference at a level of 0.01.

We now consider the impact of the user-specified maximum order of the model, n_max. In general, we observe that, for both parametric and nonparametric AMMs, the value of n_max should be greater than or equal to the actual order of the system being modeled. This gives the model construction algorithm the chance to capture the correct order of the system. One caveat, however, is that n_max should also be close to the actual order of the system. If, for example, the system is sixth order, it is better to set n_max to 8 than to 10. Of course, the user may not have a good idea of the order of the system, making it difficult to pick a good n_max.
Unfortunately, an unnecessarily high value of n_max increases the possibility of node-splitting errors at high orders, which can significantly impact the effectiveness of the model. In support of these observations is the following experimental result: when n_max is set to 2, equaling the order of the simulation, there is no significant difference between the performance of parametric and nonparametric AMMs, regardless of the node-splitting significance level. The t test, used in the parametric version, is particularly sensitive to violations of its assumptions, so when the parametric algorithm is limited to a low n_max, it is also limited in the number of node-splitting mistakes it can make.

Figure 8.4: The performance, A_ε, of parametric and nonparametric AMMs with a node-splitting significance of 0.01 and n_max = 10. This graph shows the average over 100 trials of 10000 simulation steps. Asterisks indicate a significant difference at a level of 0.01.

These results suggest that, in the previous chapters, the nonparametric version of AMMs might have been more effective in modeling the robot-environment interaction dynamics if we had had no idea of the order of the system and had had to pick a large n_max. Fortunately, we have extensive experience with the interaction dynamics arising in the variations of the foraging task, and felt confident that they were second order, and not higher. Thus, n_max was set to 2. It is therefore unlikely that the nonparametric implementation would have afforded us better performance in the applications of the previous chapters.

8.5 Summary

This chapter compared the effectiveness of the parametric AMMs (used in the previous chapters) to a nonparametric version making few distribution assumptions about state durations. The results of a simulation study showed that the nonparametric version of AMMs provides more faithful models when the data are not normally distributed and the user-specified order of the system is larger than the actual order. In the applications of the previous chapters, the user-specified order was 2, making it unlikely that nonparametric AMMs would have provided better performance.

Chapter 9

Summary and Future Directions

This dissertation presented a novel approach for capturing and evaluating, on-line and in real-time, the interaction dynamics between an agent and its environment. The approach is based on the synergistic combination of augmented Markov models (AMMs) and behavior-based control (BBC). Augmented Markov models, a contribution of this dissertation, provide a compromise between the generality of semi-Markov processes and the computational simplicity of Markov chains. AMMs allow standard expectation calculations from Markov chain theory to be combined easily with popular statistical hypothesis tests (such as the t and F tests) that assume normal distributions, or their nonparametric counterparts.
This dissertation presented an incremental AMM construction algorithm that dynamically restructures models to represent, in first-order form, non-first-order Markovian systems. AMMs were used with behavior-based control to capture the execution history arising from the interaction between an agent performing a task and its environment. The use of behavior-based control, encompassing and abstracting both sensing and action, provides the representational expressiveness and parsimony necessary for on-line, real-time modeling with AMMs. The combination of AMMs and behavior-based control enables the effective evaluation of interaction dynamics and may be used to suggest application-dependent, performance-improving modifications to an agent's policy.

The effectiveness of the AMM-BBC approach was verified in both stationary and non-stationary mobile robot problem domains. Experimental results were provided for three applications in the stationary domain (fault detection, affiliation determination, dynamic leader selection) and two applications in the non-stationary domain (regime detection, reward maximization), all using variations of a foraging task. The results support the Thesis of this dissertation as presented in Chapter 1: AMMs, in conjunction with behavior-based control, enable effective evaluation of agent-environment interaction dynamics and facilitate performance-improving solutions to application challenges.

Many extensions to the work in this dissertation exist. Several possible ones are:

• Node-merging: This is a complement to node-splitting that would allow incorrect splits to be fixed when sufficient data are available to indicate the error.

• Multiple simultaneous applications: There is no reason why the same AMMs could not be used simultaneously for several non-conflicting applications. One possible example is fault detection concurrent with dynamic leader selection.

• Heterogeneous groups: It would be interesting to see how AMMs could be used to coordinate group activity among heterogeneous agents with highly disparate characteristics and capabilities.

Perhaps these, and others, will see the light of day.

Reference List

Anscombe, F. J. (1950), 'Table of the Hyperbolic Transformation sinh⁻¹√x', Journal of the Royal Statistical Society 113(2), 228-229.

Arkin, R. C. (1998), Behavior-Based Robotics, The MIT Press: Cambridge, Massachusetts.

Arkin, R. C. & Ali, K. S. (1994), Reactive and Telerobotic Control in Multi-Agent Systems, in 'From Animals to Animats: International Conference on Simulation of Adaptive Behavior', MIT Press, pp. 473-478.

Arkin, R. C. & Hobbs, J. D. (1993), Communication and Social Organization in Multi-Agent Systems, in 'From Animals to Animats: International Conference on Simulation of Adaptive Behavior', MIT Press, pp. 486-493.

Arkin, R. C., Balch, T. & Nitz, E. (1993), Communication of Behavioral State in Multi-Agent Retrieval Tasks, in 'IEEE International Conference on Robotics and Automation', IEEE Computer Society Press, pp. 588-594.

Bailey, B. J. R. (1981), 'Alternatives to Hastings' Approximation to the Inverse of the Normal Cumulative Distribution Function', Applied Statistics 30(3), 275-276.

Bajcsy, R. (1988), 'Active Perception', Proceedings of the IEEE 76(8), 996-1005.

Balch, T.
(1997), Social Entropy: a New Metric for Learning Multi-robot Teams, in 'Proceedings of the 10th International Florida Artificial Intelligence Research Society Conference (FLAIRS-97)', AAAI Press, Daytona Beach, FL.

Balch, T. (2000), 'Hierarchical Social Entropy: An Information Theoretic Measure of Robot Group Diversity', Autonomous Robots 8, 209-237.

Ballard, D. H. (1991), 'Animate Vision', Artificial Intelligence 48(1), 57-86.

Beckers, R., Holland, O. & Deneubourg, J. (1994), From Local Actions to Global Tasks: Stigmergy and Collective Robotics, in 'Artificial Life IV, Proceedings of the Fourth International Workshop on the Synthesis and Simulation of Living Systems', MIT Press, pp. 181-189.

Beer, R. D. (1993), 'A Dynamical Systems Perspective on Agent-Environment Interaction', Artificial Intelligence 72, 173-215.

Blyth, C. R. (1986), 'Approximate Binomial Confidence Limits', Journal of the American Statistical Association 81(395), 843-855.

Blyth, C. R. & Still, H. A. (1983), 'Binomial Confidence Intervals', Journal of the American Statistical Association 78(381), 108-116.

Boutilier, C., Dean, T. & Hanks, S. (1999), 'Decision Theoretic Planning: Structural Assumptions and Computational Leverage', Journal of Artificial Intelligence Research 11, 1-94.

Bradtke, S. J. & Duff, M. O. (1995), Reinforcement Learning Methods for Continuous-Time Markov Decision Problems, in G. Tesauro, D. Touretzky & T. Leen, eds, 'Advances in Neural Information Processing Systems', Vol. 7, The MIT Press, pp. 393-400.

Brooks, R. A. (1986), 'A Robust Layered Control System for a Mobile Robot', IEEE Journal of Robotics and Automation RA-2(1), 14-23.

Brooks, R. A. (1990), The Behavior Language; User's Guide, Technical Report AIM-1227, MIT AI Lab.

Brooks, R. A. (1991), Intelligence Without Reason, in 'Proceedings of the Twelfth International Joint Conference on Artificial Intelligence (IJCAI-91)', Morgan Kaufmann, pp. 569-590.

Camp, B. H. (1951), 'Approximation to the Point Binomial', Annals of Mathematical Statistics 22(1), 130-131.

Cao, Y. U., Fukunaga, A. S. & Kahng, A. B. (1997), 'Cooperative Mobile Robotics: Antecedents and Directions', Autonomous Robots 4, 1-23.

Cassandra, A. R., Kaelbling, L. P. & Littman, M. L. (1994), Acting Optimally in Partially Observable Stochastic Domains, in 'Proceedings of the Thirteenth National Conference on Artificial Intelligence (AAAI-94)', Seattle, WA, pp. 1023-1028.

Chrisman, L. (1992), Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach, in W. Swartout, ed., 'Proceedings of the 10th National Conference on Artificial Intelligence', MIT Press, pp. 183-188.

Chu, J. T. (1956), 'Errors in Normal Approximations to the t, r, and Similar Types of Distribution', Annals of Mathematical Statistics 27(3), 780-789.

Cormen, T. H., Leiserson, C. E. & Rivest, R. L. (1990), Introduction to Algorithms, McGraw-Hill Book Company.

Drogoul, A. & Ferber, J. (1992), From Tom Thumb to the Dockers: Some Experiments with Foraging Robots, in 'From Animals to Animats II', The MIT Press: Cambridge, Massachusetts, pp. 451-459.

Fenstad, G. U. (1983), 'A Comparison Between the U and V Tests in the Behrens-Fisher Problem', Biometrika 70(1), 300-302.

Fisher, R. A. & Cornish, E. A.
(1960), 'The Percentile Points of Distributions Having Known Cumulants', Technometrics 2, 209-225.

Fligner, M. A. & Policello, G. E. (1981), 'Robust Rank Procedures for the Behrens-Fisher Problem', Journal of the American Statistical Association 76(373), 162-168.

Fligner, M. A. & Rust, S. W. (1982), 'A Modification of Mood's Median Test for the Generalized Behrens-Fisher Problem', Biometrika 69(1), 221-226.

Fontan, M. S. & Mataric, M. J. (1998), 'Territorial Multi-Robot Task Division', IEEE Transactions on Robotics and Automation 14(5), 815-822.

Freund, J. E. (1992), Mathematical Statistics, fifth edn, Prentice Hall.

Gat, E. (1998), On Three-Layer Architectures, in D. Kortenkamp, R. P. Bonnasso & R. Murphy, eds, 'Artificial Intelligence and Mobile Robotics: Case Studies of Successful Robot Systems', AAAI Press, pp. 195-210.

Gentleman, W. M. & Jenkins, M. A. (1968), 'An Approximation for Student's t-Distribution', Biometrika 55(3), 571-572.

Ghosh, B. K. (1979), 'A Comparison of Some Approximate Confidence Intervals for the Binomial Parameter', Journal of the American Statistical Association 74(368), 894-900.

Goldberg, D. & Mataric, M. J. (1997), Interference as a Tool for Designing and Evaluating Multi-Robot Controllers, in 'Proceedings of the Fourteenth National Conference on Artificial Intelligence (AAAI-97)', AAAI Press, Providence, Rhode Island, pp. 637-642.

Goldsmith, S. Y., Feddema, J. T. & Robinett, R. D. (1998), Analysis of Decentralized Variable Structure Control for Collective Search by Mobile Robots, in 'Sensor Fusion and Decentralized Control in Robotic Systems', Vol. 3523 of SPIE Proceedings, SPIE, Boston, Massachusetts, pp. 40-47.

Gordon, D. M. (1996), 'The organization of work in social insect colonies', Nature 380, 121-124.

Hamaker, H. C. (1978), 'Approximating the Cumulative Normal Distribution and its Inverse', Applied Statistics 27(1), 76-77.

Han, K. & Veloso, M. (1999), Automated Robot Behavior Recognition Applied to Robotic Soccer, in 'Proceedings of the IJCAI-99 Workshop on Team Behaviour and Plan Recognition', Stockholm, Sweden.

Hanson, S. J. (1990), Meiosis Networks, in D. S. Touretzky, ed., 'Advances in Neural Information Processing Systems 2', Morgan Kaufmann, San Mateo, CA, pp. 533-541.

Hasegawa, Y., Ito, Y. & Fukuda, T. (2000), Behavior Coordination and its Modification on Brachiation-type Mobile Robot, in 'Proceedings of the 2000 IEEE International Conference on Robotics and Automation', IEEE, San Francisco, CA, pp. 3984-3989.

Hawkes, A. G. (1982), 'Approximating the Normal Tail', Statistician 31(3), 231-236.

Hettmansperger, T. P. & Malin, J. S. (1975), 'A Modified Mood's Test for Location with no Shape Assumptions on the Underlying Distributions', Biometrika 62(2), 527-529.

Hettmansperger, T. P. & McKean, J. W. (1998), Robust Nonparametric Statistical Methods, Vol. 5 of Kendall's Library of Statistics, John Wiley & Sons.

Holland, O. & Melhuish, C. (2000), 'Stigmergy, Self-Organization, and Sorting in Collective Robotics', Artificial Life 5(2), 173-202.

Johnson, N. L., Kotz, S. & Kemp, A. W. (1992), Univariate Discrete Distributions, Wiley Series in Probability and Mathematical Statistics, second edn, John Wiley and Sons.

Kaelbling, L. P., Littman, M. L. & Moore, A. W.
(1996), 'Reinforcement Learning: A Survey', Journal of Artificial Intelligence Research 4, 237-285.

Kemeny, J. G., Snell, J. L. & Knapp, A. W. (1966), Denumerable Markov Chains, D. Van Nostrand Company, Inc.

Koenig, S. & Simmons, R. G. (1996), Unsupervised Learning of Probabilistic Models for Robot Navigation, in 'Proceedings of the IEEE International Conference on Robotics and Automation', Vol. 3, pp. 2301-2308.

Kosecka, J. & Bajcsy, R. (1993), Discrete Event Systems for Autonomous Mobile Agents, in 'Proceedings of the Symposium on Intelligent Robotic Systems', pp. 21-31.

Lew, R. A. (1981), 'An Approximation to the Cumulative Normal Distribution with Simple Coefficients', Applied Statistics 30(3), 299-301.

Lin, J.-T. (1988), 'Alternatives to Hamaker's Approximations to the Cumulative Normal Distribution and its Inverse', Statistician 37(4/5), 413-414.

Lin, J.-T. (1990), 'A Simpler Logistic Approximation to the Normal Tail Probability and its Inverse', Applied Statistics 39(2), 255-257.

Lindstrom, M., Oreback, A. & Christensen, H. I. (2000), BERRA: A Research Architecture for Service Robots, in 'Proceedings of the 2000 IEEE International Conference on Robotics and Automation', IEEE, San Francisco, CA, pp. 3278-3283.

Ling, R. F. (1978), 'A Study of the Accuracy of Some Approximations for t, χ², and F Tail Probabilities', Journal of the American Statistical Association 73(362), 274-283.

Lund, H. H. & Pagliarini, L. (2000), RoboCup Jr. with LEGO MINDSTORMS, in 'Proceedings of the 2000 IEEE International Conference on Robotics and Automation', IEEE, San Francisco, CA, pp. 813-819.

Mahadevan, S. & Theocharous, G. (1998), Optimizing Production Manufacturing using Reinforcement Learning, in 'Proceedings of the Eleventh International FLAIRS Conference', AAAI Press, pp. 372-377.

Mann, H. B. & Whitney, D. R. (1947), 'On a Test of Whether One of Two Random Variables is Stochastically Larger than the Other', Annals of Mathematical Statistics 18(1), 50-60.

Mataric, M. J. (1992), Behavior-Based Systems: Key Properties and Implications, in 'IEEE International Conference on Robotics and Automation, Workshop on Architectures for Intelligent Control Systems', Nice, France, pp. 46-54.

Mataric, M. J. (1994), Interaction and Intelligent Behavior, PhD thesis, Massachusetts Institute of Technology.

Mataric, M. J. (1997a), 'Behavior-Based Control: Examples from Navigation, Learning, and Group Behavior', Journal of Experimental and Theoretical Artificial Intelligence 9(2-3), 323-336. Special issue on Software Architectures for Physical Agents.

Mataric, M. J. (1997b), 'Behavior-Based Control: Examples from Navigation, Learning, and Group Behavior', Journal of Experimental and Theoretical Artificial Intelligence 9(2-3), 323-336.

Mataric, M. J. (1999), Behavior-Based Robotics, in R. A. Wilson & F. C. Keil, eds, 'The MIT Encyclopedia of Cognitive Sciences', MIT Press, pp. 74-77.

Mathisen, H. C. (1943), 'A Method of Testing the Hypothesis that Two Samples are from the Same Population', Annals of Mathematical Statistics 14(2), 188-194.

McCallum, A. K. (1996), Reinforcement Learning with Selective Perception and Hidden State, PhD thesis, University of Rochester, Department of Computer Science, Rochester, New York.

Michaud, F. & Mataric, M. J.
(1998), 'Learning from History for Behavior-Based Mobile Robots in Non-stationary Conditions', Autonomous Robots 5(3-4), 335-354.

Mickey, M. R. (1975), 'Approximate Tail Probabilities for Student's t Distribution', Biometrika 62(1), 216-217.

Mitchell, T. M. (1997), Machine Learning, The McGraw-Hill Companies, Inc.

Mood, A. M. (1954), 'On the Asymptotic Efficiency of Certain Nonparametric Two-Sample Tests', Annals of Mathematical Statistics 25(3), 514-522.

Page, E. (1977), 'Approximations to the Cumulative Normal Function and its Inverse for Use on a Pocket Calculator', Applied Statistics 26(1), 75-76.

Parker, L. E. (1992), Adaptive Action Selection for Cooperative Agent Teams, in 'From Animals to Animats: International Conference on Simulation of Adaptive Behavior', MIT Press, pp. 442-450.

Parker, L. E. (1994), Heterogeneous Multi-Robot Cooperation, PhD thesis, MIT.

Paulson, E. (1942), 'An Approximate Normalization of the Analysis of Variance Distribution', Annals of Mathematical Statistics 13(2), 233-235.

Peizer, D. B. & Pratt, J. W. (1968), 'A Normal Approximation for Binomial, F, Beta, and Other Common, Related Tail Probabilities, I', Journal of the American Statistical Association 63(324), 1416-1456.

Pirjanian, P. (1998), Multiple Objective Action Selection & Behavior Fusion using Voting, PhD thesis, Institute of Electronic Systems, Aalborg University, Denmark.

Pirjanian, P. & Mataric, M. J. (2000), Multi-Robot Target Acquisition using Multiple Objective Behavior Coordination, in 'Proceedings of the 2000 IEEE International Conference on Robotics and Automation', IEEE, San Francisco, CA, pp. 2696-2702.

Pirjanian, P., Christensen, H. I. & Fayman, J. A. (1998), 'Application of voting to fusion of purposive modules: An experimental investigation', Robotics and Autonomous Systems 23, 253-266.

Pratt, J. W. (1968), 'A Normal Approximation for Binomial, F, Beta, and Other Common, Related Tail Probabilities, II', Journal of the American Statistical Association 63(324), 1457-1483.

Prescott, P. (1974), 'Normalizing Transformation of Student's t Distribution', Biometrika 61(1), 177-180.

Press, W. H., Teukolsky, S. A., Vetterling, W. T. & Flannery, B. P. (1992), Numerical Recipes in C: The Art of Scientific Computing, Cambridge University Press.

Quenouille, M. H. (1953), The Design and Analysis of Experiment, Griffin, London.

Rabiner, L. R. (1989), 'A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition', Proceedings of the IEEE 77(2), 257-285.

Roberts, F. S. (1976), Discrete Mathematical Models: With Applications to Social, Biological, and Environmental Problems, Prentice-Hall, Inc.

Rosenblatt, J., Williams, S. & Durrant-Whyte, H. (2000), Behavior-Based Control for Autonomous Underwater Exploration, in 'Proceedings of the 2000 IEEE International Conference on Robotics and Automation', IEEE, San Francisco, CA, pp. 920-925.

Ross, S. M. (1992), Applied Probability Models with Optimization Applications, Dover Publications, Inc., New York.

Schmeiser, B. W. (1979), 'Approximations to the Inverse Cumulative Normal Function for Use on Hand Calculators', Applied Statistics 28(2), 175-176.

Scott, A. & Smith, T. M. F. (1970), 'A Note on Moran's Approximation to Student's t', Biometrika 57(3), 681-682.

Shannon, C. E. & Weaver, W.
(1963), Mathematical Theory of Communication, University of Illinois Press.

Siegel, S. & Castellan, N. J. (1988), Nonparametric Statistics for the Behavioral Sciences, second edn, McGraw-Hill.

Smithers, T. (1995), What the Dynamics of Adaptive Behavior and Cognition Might Look Like in Agent-Environment Interaction Systems, in 'Practice and Future of Autonomous Agents', Mt. Verita, Switzerland.

Sutton, R. S., Precup, D. & Singh, S. (1999), 'Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning', Artificial Intelligence 112, 181-211.

Tan, K.-H. & Lewis, M. A. (1997), 'Virtual Structures for High-Precision Cooperative Mobile Robot Control', Autonomous Robots 4(4), 387-403.

Vaughan, R. T., Stoy, K., Sukhatme, G. S. & Mataric, M. J. (2000), Whistling in the Dark: Cooperative Trail Following in Uncertain Localization Space, in 'Proceedings of the Fourth International Conference on Autonomous Agents', ACM Press: New York, pp. 187-194.

Wallace, D. L. (1959), 'Bounds on Normal Approximations to Student's and the Chi-Square Distributions', Annals of Mathematical Statistics 30(4), 1121-1130.

Wang, G. & Mahadevan, S. (1999), Hierarchical Optimization of Policy-Coupled Semi-Markov Decision Processes, in 'Proceedings of the Sixteenth International Conference on Machine Learning', Morgan Kaufmann Publishers, Bled, Slovenia, pp. 464-473.

Werger, B. B. & Mataric, M. J. (1996), Robotic "Food" Chains: Externalization of State and Program for Minimal-Agent Foraging, in 'From Animals to Animats 4: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior', The MIT Press: Cambridge, Massachusetts, pp. 625-634.

Whitehead, S. D. & Ballard, D. H. (1991), 'Learning to Perceive and Act by Trial and Error', Machine Learning 7(1), 45-83.

Whitehead, S. D. & Lin, L.-J. (1995), 'Reinforcement Learning of Non-Markov Decision Processes', Artificial Intelligence 73(1-2), 271-306.

Wilcox, R. R. (1997), Introduction to Robust Estimation and Hypothesis Testing, Statistical Modeling and Decision Science, Academic Press.

Wilcoxon, F. (1945), 'Individual Comparisons by Ranking Methods', Biometrics 1, 80-83.

Appendix A

Design and Evaluation of Robust Behavior-Based Controllers

This appendix is designed to complement the main body of the dissertation by demonstrating how robust behavior-based controllers for mobile robots can be designed and evaluated. The preceding chapters of the dissertation assumed the existence of (or the ability to implement) a basic behavior-based controller for the foraging task. This basic controller (Chapter 2) served as the foundation for applications using on-line, AMM-based evaluations of the agent-environment interaction dynamics. This appendix provides a more complete understanding of how to construct and quantitatively analyze a behavior-based controller than do the preceding chapters. It also contrasts with them in its use of off-line, rather than on-line, evaluation. This appendix is designed to be self-contained and can be read independently of the rest of the dissertation.
In this appendix, we demonstrate the effectiveness of behavior-based control in facilitating the development and evaluation of multi-robot controllers that are: (1) robust to robot failures, and (2) easily modified, so that the controller variation that sufficiently satisfies the design requirements for the task can be found. Our experimental focus here is distributed multi-robot collection, a class of tasks that includes de-mining and toxic waste clean-up. (This appendix uses a slightly different terminology, referring to the ethologically-named foraging task of the previous chapters as the collection task.) We demonstrate a basic, homogeneous multi-robot controller for the collection task, then show how to easily derive two heterogeneous, spatio-temporal variations with markedly different performance properties. We evaluate the desirability of these controllers with respect to design requirements involving inter-robot interference, time-to-completion, and energy expenditure. The data for evaluation come from experiments using four physical mobile robots performing the three variations of the collection task.

A.1 Introduction

Designing and implementing robust controllers for multiple interacting mobile robots is considered something of a black art, often involving a great deal of reprogramming and parameter adjustment. It is difficult enough to develop a multi-robot controller that functions only under the ideal conditions of little noise and no robot failures. The fact that such ideal conditions do not often exist, even in a laboratory setting, places certain practical requirements on the multi-robot controller. In particular, the controller must exhibit group-level robustness to noise and robot failures. This is especially important when physical human intervention is difficult (e.g., a toxic waste spill) or impossible (e.g., an extraterrestrial mission).

Additional design requirements for the controller arise from the fundamental, constrained resources of the system, including energy, time, and the number of robots. Untethered mobile robots are generally powered by batteries and can only perform a limited amount of work before needing recharging. Minimizing energy utilization is thus often required in domains, such as space exploration, where recharging is expensive, difficult, or time consuming. In time-critical domains, such as search and rescue, the requirement is for expedient execution of the task. Additionally, regardless of the domain, the fragility of the robots may require the controller to keep both robot-object and inter-robot collisions to a minimum.

For a given task environment and set of robots, the requirements for the controller may not be independent but may instead arise as tradeoffs. For example, minimizing both time and inter-robot collisions may not be possible, since faster-moving robots are less likely to properly sense each other and thus more likely to collide. Different controller variations may have to be tested and compared in order to select one that sufficiently satisfies the requirements, given the tradeoffs among them. This places an additional requirement on the controller, namely that it be easily modifiable.
The testing and comparison of the variations could potentially be accomplished analytically if an adequate model of the system were developed (a significant challenge in itself), or in simulation (potentially less difficult). In either case, the desire to be able to easily modify the controller remains. Our assumption in this work is that neither an adequate (i.e., very high fidelity) model nor a simulation of the physical multi-robot collection task need exist, and thus we performed all tests directly on physical robots.

The controllers we present in this appendix are designed to address the requirements above. Specifically, they exhibit group-level robustness to robot failures and noise, and are easily modified. Our focus is on the domain of distributed multi-robot collection (foraging) tasks, including toxic waste clean-up and de-mining. We present a basic homogeneous controller for the collection task in which all of the robots have identical behavioral repertoires and work concurrently. We then derive two heterogeneous variations, pack and caste, which respectively modify the robots' temporal and spatial interactions. Finally, we evaluate and compare the performance of the controllers using three spatio-temporal criteria: inter-robot collisions, distance traveled by each robot, and time-to-completion for the task. The latter two criteria also provide an indication of the energy expenditure of the robots. The data for evaluation come from experiments we conducted using four physical mobile robots performing the three variations of the collection task.

Section 2 describes the structure of the collection task and the group of physical mobile robots that performed it. Section 3 then presents the details of the homogeneous controller, including the behaviors it contains and how it achieves robustness. Section 4 considers spatio-temporal interactions between robots, especially physical interference, and motivates the two interference-modifying heterogeneous controller versions, pack and caste, presented in Sections 5 and 6. Section 7 presents an analysis of the controllers using data from physical experiments, and provides a comparative evaluation. Finally, a summary is presented in Section 8.

A.2 The Collection Task

The controllers we present implement versions of a multi-robot collection (foraging) task, a prototype for various applications including distributed solutions to de-mining, toxic waste clean-up, and terrain mapping. We present the general structure of the collection task, our multi-robot test-bed, and then the controllers.

A.2.1 Task Structure

We define the collection task as a two-step repetitive process in which:

1. n (n > 1) robots search designated regions of space for certain objects, and

2. once found, these objects are brought to a goal region using some form of navigation.

A region in the task is any contiguous, bounded space (in the case of mobile robots, a planar surface) which the robots are capable of moving across. There are three mutually-exclusive, non-overlapping types of regions:

• search regions, S, containing a number, p, of objects, a fraction of which must be delivered to a goal region;

• goal regions, G, where objects are delivered;

• and, optionally, empty regions, E, that contain no objects and are not goal regions.
The only restrictions placed on the configuration of regions for the collection task are: that there be at least one search and one goal region, and that the union of all the regions be contiguous. Figure A.1 gives two examples of possible valid region configurations for the collection task.

Figure A.1: Two example region configurations for the collection task.

The specific configuration we used is shown in Figure A.2. The experiments were performed in an 11 x 14 foot rectangular enclosure (the Corrall). The search region, S, is approximately 126 square feet and has p = 27 small metal cylinders (pucks) evenly distributed throughout. The goal region, G, also called Home, is a ninety-degree sector of a circle with a radius of 2 feet, located in one corner of the Corrall. Finally, there is a 25 square foot empty region, E, separating the search and goal regions. E is composed of the Boundary and Buffer zones, whose functions will be described in the next section. n = 4 robots are used in the experiments.

Figure A.2: Actual configuration used in the collection task.

A.2.2 The Robots

Four IS Robotics R2e robots were used (Figure A.3). Each is a differentially-steered base equipped with two drive motors and a two-fingered gripper. The sensing capabilities of each robot include piezo-electric contact (bump) sensors around the base and in the gripper, five infrared (IR) sensors around the chassis and one on each finger for proximity detection, a color sensor in the gripper, a radio transmitter/receiver for communication and data gathering, and an ultrasound/radio triangulation system for positioning (Figure A.4). The robots are programmed in the Behavior Language (Brooks 1990), a parallel, asynchronous, behavior-based programming language inspired by the Subsumption Architecture (Brooks 1986). The main computational power on each robot is a single Motorola 68332 16-bit microcontroller running at 16 MHz. Even though computationally impoverished by today's standards, the processing capabilities have proven to be adequate for most tasks we have envisioned, helping to show that robust, effective control need not be computationally expensive. Perhaps the greatest drawback of the 68332 is its lack of floating-point computation, which, for example, influences our calculation of heading, described in the following section.

Figure A.3: The four R2e robots used in the experiments.

Figure A.4: The sensor configuration of an R2e robot.

A.2.3 Behavior-Based Control

The work presented in this appendix is couched in the framework of distributed behavior-based control (Brooks 1991, Mataric 1992). Behavior-based control has proven to be an effective paradigm for developing single-robot and multi-robot controllers (Arkin 1998). In behavior-based control, the robot controller is organized as a collection of event-driven modules, called behaviors, that receive inputs from sensors and/or other behaviors, process the input, and send outputs to actuators and/or other behaviors. Each behavior generally serves some independent function, such as avoiding obstacles or homing to a goal location.
All behaviors in a controller are executed in parallel, simultaneously receiving inputs and producing outputs. An action selection mechanism prevents conflicts when multiple outputs are sent to actuators or other behaviors (Pirjanian 1998). The controllers presented in this appendix demonstrate the suitability of the behavior-based paradigm for designing robust and modifiable multi-robot controllers. In the next section, we present our initial, homogeneous controller for the collection task, followed later by two heterogeneous variations, pack and caste.

A.3 The Homogeneous Controller

In this section, we present the first of three behavior-based controllers we implemented. This first controller performs a homogeneous version of the collection task where the robots' behavioral repertoires are identical, and the robots act concurrently and independently.

The overall structure of the controller is presented in Figure A.5. In the figure, the rounded rectangles represent the robot's sensors, with sensor values being transmitted to behaviors along the dotted lines. The behaviors themselves are drawn as ellipses with text in one of three font styles: italics for behaviors that only receive sensor inputs; bold for behaviors that send actuator outputs; and bold-italics for behaviors that do both. The dashed lines represent commands sent by behaviors to the actuators (rectangles), and the solid lines represent control signals sent between behaviors. These control signals include: inhibition signals that temporarily disable behaviors, or do so permanently until the inhibition is lifted; information about the state of the behaviors; and signals indicating that a behavior should perform a certain action. These control signals establish the hierarchy of actuator commands shown at the right of the diagram. The symbol ⊗ represents behavior selection and indicates that only one of the relevant actuator command pathways is active at any time. The symbol ⊕ represents a Subsumption-style priority scheme, with the actuator command coming from above taking precedence (Brooks 1986).

Figure A.5: The homogeneous controller for the collection task. Rounded rectangles represent the robot's sensors, ellipses represent behaviors, and rectangles represent actuators. Sensor values are transmitted along dotted lines, actuator commands along dashed lines, and inter-behavior control signals along solid lines. The symbol ⊗ represents behavior selection and ⊕ represents Subsumption-style precedence.

The hierarchy of command pathways in the diagram illustrates that behavior arbitration is the action selection mechanism for the controller. The next section presents, in detail, the function of each behavior in the controller, and the structure of the inter-behavior command pathways. The subsequent section discusses the group-level robustness achieved by this controller.

A.3.1 Behaviors

In order to provide a clear picture of the interaction between behaviors, we describe the individual behaviors of the controller in an order that mirrors the progression of the task as the robot performs it.
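As a minimal sketch of the arbitration just described, consider the following illustrative Python (the robots themselves were programmed in the Behavior Language). The relative order of creeping over avoiding, and avoiding over the remaining drive behaviors, follows the precedences described in the behavior list below; the rest of the ordering is our own illustrative choice.

    def arbitrate(commands, priority):
        """Subsumption-style arbitration: of all behaviors currently
        asserting a command for an actuator, the one earliest in the
        priority list wins. `commands` maps behavior name to the
        command it is asserting (or None if inactive)."""
        for behavior in priority:
            command = commands.get(behavior)
            if command is not None:
                return command
        return None  # no behavior is asserting a command

    # Illustrative priority ordering for the drive motors.
    DRIVE_PRIORITY = ["creeping", "avoiding", "reverse homing",
                      "homing", "puck detecting", "wandering"]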
The following twelve behaviors constitute the collection task:

1) avoiding: This behavior avoids any object (including other robots) detected by the IR sensors and deemed to be in the path of, or about to collide with, the robot. If the robot has already collided with an object, as detected by the contact sensors, it steers away from it. This behavior is critical to the safety of the robot and therefore takes precedence over most of the behaviors that control the drive motors (puck detecting, wandering, homing, reverse homing).

2) wandering: The robot moves forward and, at random intervals, turns left or right through some random arc. Using this behavior, the robot searches the region for pucks.

3) puck detecting: If an object is detected by the front IR sensors while wandering, this behavior, by lifting the gripper, determines whether the object is short enough to be a puck, or whether it is an obstacle that must be avoided. If it is a puck, the robot carefully approaches the object and attempts to place it between its fingers. Otherwise, the robot performs avoiding.

4) puck grabber: When a puck enters the fingers and is detected by the breakbeam IR sensors, this behavior grasps it and raises the fingers. Raising the fingers above puck height prevents the robot from unnecessarily avoiding pucks while homing, and allows the robot to collect up to about four additional pucks with its base.

5) homing: If carrying a puck, the robot moves towards the designated goal location, Home. While homing, avoiding can take precedence in order to avoid obstacles.

6) boundary: This behavior monitors how the robot enters the Boundary region. If the robot enters this region without a puck, it returns it to the search region using reverse homing. If carrying a puck, the robot is allowed to enter this region and proceed towards Home (see Figure A.2). This behavior prevents the robot from collecting pucks that have already been delivered.
This behavior is essentially identical to homing except that the goal location is set to the corner of the Corrall opposite Home. Once the Boundary region has been left, reverse homing becomes inactive and the robot once again begins searching for pucks using wandering. 12) heading: This behavior processes the positioning system data and provides approximate heading values for the homing and reverse homing behaviors. The positioning system supplies the robot’s current (x, y) position at approximately 1-2 Hz. Consecutive position values, (xo, y0) and (xi,yi), are used in an approximate integer-based calculation of arctan( ) adjusted for the quadrant of the angle to provide one of sixteen possible sector headings. The accuracy of this heading calculation is usually within one sector of the true heading, but may be far worse when the robot 138 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. turns in place. Frequent updates of the heading, with little reliance by the other behaviors on any one heading value, help to compensate for the inaccuracies. (An alternative is to use a physical compass for heading data. In our lab, however, the high variance in magnetic fields makes this inviable.) A .3.2 Robustness In the above described homogeneous controller, group-level robustness is a direct result of the robots behaving identically and independently. No noise-susceptible, or time-critical, radio communication that could be a source of fragility in the system is necessary. Each robot must individually manage the noise and uncertainty associated with its sensors and actuators, and the complexity of a dynamic and basically unknown environment. (Our controller, as is true for most behavior-based controllers, accommodates noise and uncertainty by tightly coupling sensing to action so that no great reliance is placed on any one sensor reading.) The partial, or even complete, failure of any one robot, or a subset of them, does not debilitate the entire group. As long as there is one functioning robot, the task can be accomplished. As discussed previously, in addition to exhibiting group-level robustness, a multi-robot con­ troller should be easy to modify in order to facilitate the search for an acceptable variation. The desirability of the controller must be measured with respect to any design requirements, such as time-to-completion of the task, energy consumption, or the amount of interference exhibited. Thus, before we present the variations of our homogeneous controller, we discuss the key diagnostic pa­ rameters used in evaluation. Our focus here is on inter-robot interference, specifically physical collisions between robots. The goal that motivates the modification of the homogeneous controller is minimization of such interference. The next section provides a discussion of interference and the two spatio-temporal solutions to it which provide the basis for our heterogeneous controller variations. 139 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. A .4 Spatio-Tem poral Interactions In this section, we discuss the nature of physical inter-robot interference (i.e., collisions), and how a multi-robot system may be modified to manipulate this interference. Our discussion here provides the motivation for the two controller variations, pack and caste, presented later. Multi-robot systems are by definition physically embodied and embedded in dynamic environ­ ments. 
The types of interference they contain can be distinguished along a physical/non-physical dichotomy. Physical interference manifests itself most overwhelmingly in competition for space. Non-physical interference ranges from the sensory (shared radio bandwidth, crossed infrared or ultrasound sensors) to the algorithmic (the goals of one robot undoing the work of another, competing goals, etc.). Here we focus on physical interference and demonstrate that it is an effective tool for system evaluation and design.

We define the characteristic interference of a system at a particular point in space to be the sum, over some finite time period, of all measured interference occurring at that location (see Figure A.6). The result is a surface that can be used to adjust the controller in order to reduce interference and thus modify the system's overall performance. Robot density is a critical factor in characteristic interference. Single-robot systems and systems with density so high as to prevent movement produce no characteristic interference. Systems of interest lie in between the two extremes.

A principled multi-step process of controller modification can be implemented by using characteristic interference as a guide indicating where in the robots' physical interaction, and when, within the lifetime of the task, behaviors should be switched and the task should be divided to modify overall task interference. The multi-robot interactions we focus on are spatio-temporal in nature and fall into four basic categories. Robots may either be in the same place (SP) or in different places (DP), both of which can occur at the same time (ST) or at different times (DT), resulting in four forms of interaction: SPST, SPDT, DPST, and DPDT.

Figure A.6: This plot shows the characteristic interference pattern for the homogeneous implementation of the collection task on four physical robots, over the length and width of the Corrall. The shading, corresponding to the height of the peaks, is clearer when the data support a very fine mesh.

Physical interference fits into the SPST category, covering the case when two or more robots try to occupy the same location at the same time. The other three categories are useful for deriving and fine-tuning controllers that modify SPST interactions. For two of these categories, we implemented and tested a corresponding controller. The SPDT category is associated with our pack controller, a temporal modification to the homogeneous controller, while DPST is associated with our caste controller, a spatial modification of the homogeneous controller scheme. The DPDT category represents the case where there is little possibility of physical interaction. For example, the robots may occupy non-contiguous regions of space, or only one robot at a time may be activated. Since our focus is on controllers for multiple robots interacting to accomplish a task, the DPDT category does not provide an acceptable solution for interference management.

Figure A.6 presents the characteristic interference pattern for the homogeneous implementation, showing the number of collisions between robots within the Corrall. The data for the plot are an average of the collisions observed over five trials, with the completion criterion defined as collecting 14 of the 27 pucks at Home. The figure shows high levels of interference near Home, resulting from multiple robots simultaneously attempting to deliver pucks. We thus seek to modify the controller
We thus seek to modify the controller 141 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. in order to reduce this interference using our two spatio-temporal variations, pack and caste. We present a more detailed comparative evaluation of interference later in the Analysis section. A .5 The Pack Controller In the pack controller, as in the homogeneous version, all individuals have identical behaviors and activation conditions. Unlike the homogeneous controller, however, the robots do not act concur­ rently and independently. Instead, a dominance hierarchy is imposed, based on some functional criterion such as the robots’ different capabilities, or on an arbitrary assignment scheme such as robot ID, if the robots are functionally identical (as are ours). The dominance hierarchy induces a temporal structure on the task by allowing only one of the robots to deliver a puck at any time. All of the robots may search for pucks in parallel, as in the homogeneous implementation, but if two or more robots simultaneously find pucks, the one highest in the hierarchy is allowed to deposit its pucks first. The other robot(s) cannot proceed until the first has finished delivering its pucks and has left the Boundary region (Figure A.7). This scheme introduces temporal heterogeneity to the homogeneous version, and thus corresponds to SPDT (or temporal) arbitration of SPST interactions. The pack strategy requires that some form of dominance hierarchy be assigned and that domi­ nance rank be recognized between the robots. In our case, rank was communicated over the radios, but in other implementations it could be based on physical characteristics that can be sensed directly. A .5.1 The “message passing” Behavior Figure A.8 presents the controller for the pack implementation. This controller is almost identical to the homogeneous controller (Figure A.5), except that it includes a high-precedence message passing behavior. The function of message passing is to send the robot’s status, specifically whether it is 142 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 1 1 feet Dominant robot ♦ Home Buffer Boundary 14 feet $ Waiting for dominant robot ( f t Figure A.7: The pack variation of the collection task. delivering a puck, to the other robots, and in turn monitor the status of the other robots. When a robot finds a puck, message passing places the robot into a wait state with the motors off and enters the following communications routine: 1. Wait two communication cycles (approximately 6 seconds) to accumulate the most current status information from each robot. 2. If after (1) above, no other robot is currently delivering a puck, transm it the desire to do so. Otherwise return to (1). 3. Wait three communication cycles (approximately 10 seconds) for synchronization with the other robots. 4. If after (3) above, no other robot wishes to deliver a puck, or any that do are less dominant, then proceed to deliver the puck and inform the other robots when finished. Otherwise, return to (1). We now consider how message passing ensures robustness, one of the requirements for our multi­ robot controller. 143 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Sensors B ehaviors A ctuators Ralio p u d home Jrtettur Figure A.8: The pack version of the controller for the collection task. 
A.5.2 Robustness

As we have discussed, it is important that multi-robot controllers be robust to noise and robot failures. As with the homogeneous controller, the pack controller accommodates robot failures by having each robot able to accomplish the entire task. Unlike the homogeneous controller, however, the coordinated hierarchy of the pack controller requires special measures by the message passing behavior to ensure robustness. If a robot fails while searching for a puck, no special measures are required, since no other robot is waiting upon its actions. If, however, the robot fails while delivering a puck, the other robots must be informed so as not to wait indefinitely. The failed robot can send such a message if it is able to detect the failure (a difficult problem in itself). Otherwise, some external agent, such as a human operator, can send the message.

We use a somewhat different approach in our experiments. Whenever a robot fails, it is shut down and restarted by a human operator. (In hazardous conditions, it may be possible to repair/restart the robots remotely.) During this restart period, the waiting robots receive no communications from the failed delivering robot. The robots treat such periods of protracted radio silence as an indication of the robot's failure, and once again enter the communications routine above. Once the failed robot has restarted and begins communicating, it is seamlessly incorporated back into the hierarchy. Since the communications routine uses only relative dominance to decide which robot should deliver a puck, it easily accommodates the attrition or addition of robots.

Another advantage of our communications routine is that the use of radio-silence failure detection helps provide group-level robustness to radio noise. As noise levels increase, communication between the robots becomes increasingly difficult. This may lead to protracted periods of radio silence that are incorrectly interpreted as robot failures. In such a situation, two or more robots may deliver pucks at the same time. This degradation of the hierarchy, however, is what prevents the failure of the entire group. Even if the radio system were to fail completely, the task would still be accomplished, because every robot would consider every other robot as having failed: the pack controller would degenerate into the homogeneous controller. We posit that such graceful degradation in group structure, without jeopardizing the entire task, is an important property of controllers for unknown and dynamic environments.

A.5.3 Interference

Figure A.9 shows the characteristic interference pattern for the pack controller, averaged over five trials. The completion criterion was identical to the homogeneous case: delivering 14 of the 27 pucks to Home. As is clear from a comparison to the characteristic interference of the homogeneous controller (Figure A.6), the pack controller has reduced interference near Home, as desired.

Figure A.9: Characteristic interference pattern for the pack implementation of the collection task on four physical robots. The shading corresponds to the height of the peaks.

Not only is the pack controller successful in reducing interference, it is also attractive in its ease of implementation. The pack variation is simply the homogeneous controller with the addition of the dominance hierarchy induced by the message passing behavior. Such ease of implementation supports our requirement that controllers be easy to modify.
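The radio-silence failure detection of Section A.5.2 amounts to a timeout on each peer's last-heard timestamp. A minimal sketch, with the silence threshold and peer record fields chosen here for illustration only:

    CYCLE_SEC = 3.2                      # assumed radio communication cycle
    SILENCE_LIMIT_SEC = 5 * CYCLE_SEC    # assumed threshold for declaring failure

    def live_peers(peers, now):
        """Treat protracted radio silence as failure: peers not heard from
        recently are dropped from consideration, so a waiting robot re-enters
        the communications routine instead of blocking forever.  If every
        peer falls silent, the list is empty and the robot behaves exactly
        as in the homogeneous controller: graceful degradation."""
        return [p for p in peers if now - p.last_heard < SILENCE_LIMIT_SEC]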
A.6 The Caste Controller

In a caste controller, the group of robots differentiates into two or more sub-groups (castes), each of which acts concurrently and independently but occupies a different region of the task space. The goal is to manipulate interference through an appropriate division of the task space and assignment of the castes to the sub-regions. This spatial separation of castes limits physical interactions to the territorial boundaries. The caste scheme introduces spatial heterogeneity and thus corresponds to DPST arbitration of SPST interference.

Unlike the homogeneous and pack strategies, the sub-groups of robots in the caste strategy may have different behavioral repertoires. Thus, in addition to spatial heterogeneity, a caste controller may also exhibit behavioral heterogeneity. Indeed, that is the case with the caste implementation we present in this section. It consists of two sub-groups: the Search Caste, comprising three robots which find pucks and bring them near Home, and the Goal Caste, comprising one robot which brings the pucks the rest of the way to Home (Figure A.10). Each of the two castes has a different controller.

Figure A.10: The caste variation of the collection task.

A.6.1 The Search Caste

In our implementation, three of the four R2e robots, comprising the Search Caste, have behavior sets similar to the homogeneous implementation. Each robot searches the region S for pucks, but delivers them to the line separating the Boundary and Buffer zones, rather than all the way to Home. Figure A.11 presents the controller for the Search Caste. It is identical to the homogeneous controller (Figure A.5), except that it lacks the creeping behavior. This more refined combination of homing and avoiding, designed to bring the robots to the corner of the Corrall, is no longer necessary, since pucks are not brought to the corner. The buffer behavior is also removed from the controller, because it is no longer needed to activate creeping.

Figure A.11: The controller for the Search Caste, the three-robot subgroup that searches for pucks.

A.6.2 The Goal Caste

The Goal Caste consists of one robot that remains in the Home and Buffer regions, with the task of transporting to Home the pucks dropped by the Search Caste at the Boundary/Buffer line. The controller for the Goal Caste is presented in Figure A.12. The sweeping behavior moves the robot away from Home and performs an arc at the Boundary/Buffer line to "scoop up" any pucks left there (Figure A.13). The creeping behavior then carefully moves the robot to Home, where it performs exiting to back up and deliver the pucks. The robot then turns in place 180 degrees to once again begin sweeping. During the execution of the controller, the gripper remains lifted, allowing the concave front region of the robot's base to scoop up multiple pucks.
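The Goal Caste controller is essentially a fixed cycle of behaviors. A schematic sketch of that cycle follows; the method names are illustrative stand-ins for the actual behavior implementations, which are not reproduced here.

    def goal_caste_cycle(robot):
        """Illustrative loop for the one-robot Goal Caste: sweep out to the
        Boundary/Buffer line, scoop pucks, creep back, deliver, turn around."""
        robot.lift_gripper()            # concave base can then scoop several pucks
        while True:
            robot.sweep_to_boundary()   # arc along the Boundary/Buffer line
            robot.creep_to_home()       # careful return to Home with the pucks
            robot.exit_and_deliver()    # back up, dropping the pucks at Home
            robot.turn_in_place(180)    # face outward and begin sweeping again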
Figure A.12: The controller for the Goal Caste, the one-robot subgroup that brings pucks from the Boundary/Buffer line to Home.

Figure A.13: The sweeping behavior of the controller for the Goal Caste.

A.6.3 Robustness and Interference

The controller for the Search Caste shares many of the characteristics of the homogeneous controller. It achieves group-level robustness by maintaining a behaviorally identical group with no reliance on fragile explicit communication. Thus, neither high levels of noise nor the failure of a robot debilitates the entire caste. The Search Caste controller also provides a good example of the ease with which the homogeneous controller can be modified.

One of the keys to robustness in the caste controller is the asynchronicity of interaction between the two castes. The Search Caste must deliver pucks to the Boundary/Buffer line, but the Goal Caste is not dependent upon their arriving at a particular time or in a particular order, which may be difficult to ensure in such a complex, stochastic system.

Though not implemented in our caste controller, additional robustness could be added by using a variation of the pack communication protocol to transmit the number of active members of each caste. If one caste were to lose too many individuals, members of the other castes could replace them. For example, if the one robot of the Goal Caste were to fail, a member of the Search Caste could substitute. This scheme, while improving robustness, would require each robot to possess all of the individual caste controllers and be able to switch between them as necessary. Such caste switching would be most robust if duplication of the exact state of the failed robot were not necessary, as would be the case with our controller.

Figure A.14: Characteristic interference pattern for the caste implementation of the collection task on four physical robots. The shading corresponds to the height of the peaks.

Figure A.14 shows the average characteristic interference over five trials for the caste implementation. The completion criterion was the same as for the homogeneous and pack controllers: 14 of the 27 pucks collected. It is clear from a comparison to the characteristic interference of the homogeneous controller (Figure A.6) that interference near Home is reduced, as was desired. The overall level of physical interference throughout the Corrall, however, is higher with the caste controller. The following section provides a more detailed quantitative evaluation and comparison of the controllers in terms of interference, as well as time-to-completion and the distance traveled by each robot.

A.7 Analysis

In order to better evaluate the desirability of, and tradeoffs between, the three controllers (one homogeneous and two heterogeneous), we performed five experimental trials for each, gathering both spatial and temporal data. Initial conditions for all trials were as nearly identical as possible in order to minimize free variables, and the completion criterion was 14 of the 27 pucks collected.
For each trial, we gathered data on the time-to-completion of the task, and on the location and number of collisions between robots, giving the characteristic interference. We calculated the average total number of collisions for each experiment, providing a relative comparison of the different schemes. Using the positioning system, we also recorded each robot's location at approximately 0.3 Hz in order to examine the distance traveled and the path taken by each. Finally, we monitored the activity of the internal behaviors of the robots. The avoiding behavior was of particular interest, since it is the one directly invoked by physical interference. We hypothesized that the time spent avoiding would be correlated with the total amount of interference in each of the implementations, and would thus serve as an alternate measure of interference. As shown below, this hypothesis was validated (see Table A.2).

Controller      Time (sec)   Avoiding (sec)
Homogeneous     549          143
Caste           1447         442
Pack            1081         229

Table A.1: Average time of task completion and average time spent in the avoiding behavior for each controller.

All of the data presented in this section have been analyzed with one or more statistical tests. We have performed hypothesis tests using Student's t, 1-factor analysis of variance (ANOVA), and 2-factor ANOVA, in order to verify that the differences between the results of the implementations were in fact statistically significant. In all cases, these differences were significant with p-values < 0.05.

Our discussion in this section is based on the assumption that the task environment is fixed. Another effective method for altering the spatio-temporal properties considered below is modification of the environment, if that is possible. We could, for example, move Home to the middle of the workspace, thus manipulating properties like interference and time-to-completion. The majority of this section eschews a quantitative evaluation of heterogeneity, focusing instead on the performance data mentioned above. This is justified in the concluding paragraphs, where we discuss the poorly understood relationship between heterogeneity and performance in multi-robot groups.

A.7.1 Interference, Avoiding, and Time

One factor that impacts the total amount of interference observed for each implementation is the time-to-completion of the collection task. One would expect that, for any given implementation, the longer the trial continues, the more interference or collisions there would be. One would also expect the total amount of time spent in the avoiding behavior to be positively correlated with the time-to-completion. In Table A.1 we see that this is indeed the case. The homogeneous implementation has the shortest time-to-completion and the least amount of time spent avoiding; the pack implementation has the next larger times; and the caste implementation has the largest times overall.

Controller      Interference (collisions)   Avoiding/Time
Homogeneous     16.4                        0.27
Caste           20                          0.32
Pack            12.6                        0.22

Table A.2: Average amount of interference and average fraction of time spent in the avoiding behavior.

Controller      Interference/Time (collisions/sec)
Homogeneous     0.030
Caste           0.014
Pack            0.012

Table A.3: Average amount of interference per unit time for each controller.
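The strong correlation reported in the next paragraph, between average interference (Table A.2, second column) and the avoiding/time ratio (third column), is easy to check from the tabulated values. A quick sketch, using only the three rounded data points above:

    import numpy as np

    interference = np.array([16.4, 20.0, 12.6])   # homogeneous, caste, pack (Table A.2)
    avoid_ratio  = np.array([0.27, 0.32, 0.22])   # fraction of time spent avoiding

    r = np.corrcoef(interference, avoid_ratio)[0, 1]
    print(f"Pearson correlation: {r:.4f}")
    # prints a value very near 1 with these rounded entries, consistent with
    # the correlation of 0.995 reported in the text (computed from unrounded data)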
In their current form, the values for time-to-completion and time-spent-avoiding do not provide much useful information about the amount of interference in each controller. We can observe, however, that the amount of time spent in the avoiding behavior is composed of the time spent avoiding other robots (before, during, and after collisions) and the time spent avoiding everything else. Since the environment (discounting the robots) is identical in every trial, we can assume that the amount of avoidance per unit time attributable to non-robot objects is constant across the implementations. This assumption suggests that any differences in the amount of avoidance per unit time between the implementations would be primarily due to the avoidance of the other robots, possibly during collisions. Thus, we would expect to see a correlation between the average amount of interference observed in each implementation and the ratio of time spent avoiding to total time. In Table A.2 we observe that such a correlation does exist, and it is quite large, at ρ = 0.995. This indicates an important link between avoiding and total time, and suggests that their ratio is a good estimate of relative average interference levels.

Another potentially useful statistic is the amount of interference per unit time. As shown in Table A.3, the pack implementation has the most desirable ratio, while the homogeneous implementation has the least.

A.7.2 Distance Traveled

As mentioned previously, the energy expended by the robots in completing the task may be a concern if recharging is time-consuming or difficult. Time-to-completion provides one approximation of energy expenditure, but it can be inaccurate, especially with a controller such as our pack version, where robots can be idle for long periods of time. A better approximation is the amount of work accomplished by the robots during the task. Work (W), force (F), and displacement (d) are related through the elementary physics equation

    W = F · d · cos θ,   or equivalently   W / (F · cos θ) = d,

where θ is the angle between the force and displacement vectors. Since the robots are mechanically identical, we can consider F · cos θ to be constant among them. This allows us to compare the work done by the robots solely in terms of d, the distance traveled. Because the robots are identical, d also provides a reasonable relative indication of the energy expended in performing the work. Finally, it provides a measure of efficiency: the less work required to accomplish the task, the more efficient the controller.

Table A.4 presents the average distance traveled by each robot, and the total over all robots, for each of the three controllers. The values were calculated from the robot position data gathered during the experiments.

Controller      Robot0   Robot1   Robot2   Robot3   Total (ft)
Homogeneous     123      120      113      119      475
Caste           353      370      385      119      1227
Pack            112      145      188      178      623

Table A.4: Average distance (in feet) traveled by the robots for each controller.

Figure A.15: A typical path taken by one physical robot in the homogeneous controller.
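Given position samples logged at roughly 0.3 Hz, the distance traveled is just the summed length of consecutive displacements. A minimal sketch, assuming a list of (x, y) samples in feet:

    import math

    def path_length(samples):
        """Total distance traveled, from a list of (x_ft, y_ft) position
        samples logged by the overhead positioning system (~0.3 Hz)."""
        return sum(math.dist(p, q) for p, q in zip(samples, samples[1:]))

    # Because F*cos(theta) is taken as constant across identical robots,
    # this distance is proportional to the mechanical work W = F*d*cos(theta).

Note that such a piecewise-linear estimate slightly underestimates the true path length between samples, but the bias is shared by all three controllers, so relative comparisons remain meaningful.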
The results indicate that the homogeneous controller performs the least work in completing the task, and thus is the most efficient, whereas the caste controller performs the most work and is the least efficient.

Although the total distances traveled for the three controllers are statistically different, this is not necessarily true of the distances traveled by the individual robots within a controller. This follows intuitively from the structure of the controllers. In the homogeneous controller, where all four robots are behaviorally identical, there is no statistical difference in the distances traveled. In the caste controller, Robot0, Robot1, and Robot2, which comprise the Search Caste, travel similar distances, whereas Robot3 of the Goal Caste moves significantly less, as might be expected. In the pack controller, one would expect the less dominant robots to travel less, since they spend more time waiting for the dominant robots to deliver pucks. Table A.4, with Robot0 as the least dominant and Robot3 as the most dominant, shows that this is the general trend. Although a one-way analysis of variance indicates that there is a significant difference among these values, there are too few trials to provide further discrimination using a t-test. (The exception is that Robot0 is shown to travel significantly less than Robot2 and Robot3.)

A more qualitative, visual examination of the execution of the controllers is also possible. Figure A.15 shows a typical path of one robot in the homogeneous controller. It is clear that the robot searches for pucks, delivers several to Home, and sometimes enters the Boundary without pucks and promptly leaves. Figure A.16 (Left) shows a similar path taken by a robot in the Search Caste of the caste controller. The path is much longer than that of the homogeneous controller due to the protracted time of the trial. We also note that the Search Caste very clearly delivers pucks to the Boundary/Buffer line. Figure A.16 (Right) shows the complementary path of the Goal Caste, collecting pucks from the Boundary/Buffer line and taking them to Home. Figure A.17 provides a juxtaposition of typical paths taken by the least dominant and most dominant robots of the pack controller. As expected, the most dominant robot has a path (Right) very similar to that of the homogeneous controller. The least dominant robot, however, has a severely stunted path, demonstrating the significance of the time it waits for the more dominant robots to deliver their pucks.

Figure A.16: (Left) A typical path of a physical robot in the Search Caste of the caste controller; (Right) a typical path of the robot in the Goal Caste.

A.7.3 Robustness

During the experimental trials for each controller, we had the opportunity to evaluate group-level robustness. The R2e robots used in the experiments are quite fragile and prone to failure from something as simple as a buildup of static electricity corrupting memory or causing the robot's computer to crash. There was seldom a trial without multiple failures requiring the failed robots to be restarted. With the homogeneous controller, we noted very clearly that the failure of one robot did not affect the activity of the others. In the pack controller, the less dominant robots of the hierarchy were able to compensate for the failure of a dominant robot by using the message passing protocol.
If a dominant robot failed while delivering a puck (which occurred at least once per trial), the less dominant robots would stop waiting and begin delivering their pucks. In the caste controller, the Search Caste exhibited group-level robustness similar to the homogeneous controller: the failure of one robot did not affect the other members of the caste. In addition, due to the asynchronicity of interaction between the two castes, the failure of the robot in the Goal Caste did not debilitate the activity of the Search Caste.

Figure A.17: (Left) A typical path of the least dominant robot of the pack controller; (Right) a typical path of the most dominant robot.

A.7.4 Evaluation

Using the analyses presented above, we can now discuss the relative desirability of the three controllers. All three are desirable in that they exhibit good group-level robustness. The tradeoff between time and interference captures the relative performance. The homogeneous implementation requires the least time but does not result in the least interference, whereas the pack implementation exhibits the least total interference and the least interference per unit time, but takes longer overall. Thus, we must decide which criterion is more important, or what kind of compromise we wish to make in the final controller choice. If we can sacrifice some performance time for decreased robot interference, then the pack implementation appears to be the best choice. This solution applies to conservative systems where collisions and the possibility of equipment damage outweigh the required time. In contrast, if total time or energy expenditure is the critical factor, such as in domains where the items to be collected are toxic or dangerous, or robot power is limited, then the homogeneous implementation is the better choice. From this analysis we also observe that the caste implementation does not appear to be a satisfactory solution under any criterion, and it may be discarded from consideration.

Although our analysis does not identify one controller that is clearly superior in all respects, it does provide the information needed to make an intelligent decision regarding the tradeoffs between the homogeneous and pack controllers. The designer may decide that one of the controllers sufficiently satisfies the requirements for the task, or might wish to investigate other variations for a more suitable controller. The latter decision is facilitated by the ability to build behavior-based controllers that are easy to modify and evaluate in an expeditious manner, as we have demonstrated here.

A.7.5 Heterogeneity and Performance

So far, we have avoided a quantitative evaluation of the heterogeneity demonstrated by our three controllers. The reason for this is twofold:

1. Quantification of the heterogeneity of a multi-robot system can be subjective and ill-defined.

2. Regardless of how well-defined heterogeneity is, the link between it and performance may be unreliable.

We will consider each of these points in more depth. Heterogeneity in multi-robot systems remains ill-defined, partially because, to date, there has been little work exploring its quantification.
One notable exception is work by Balch on simple social entropy and hierarchical social entropy (Balch 1997, Balch 2000). Both are based on information entropy (Shannon & Weaver 1963) and provide metrics for quantifying behavioral differences in a group of robots. For illustrative purposes, we use simple social entropy, which takes the form

    H = - Σ_{i=1}^{M} p_i log2(p_i),

where M is the number of (behavioral) classes of robots, and p_i is the proportion of robots in the i-th class. According to this measure, our homogeneous controller has one class containing all four robots, giving p_1 = 1 and a social entropy of 0, indicating homogeneity. The caste controller has two classes, with p_1 = 1/4 = 0.25 and p_2 = 3/4 = 0.75, giving a social entropy of 0.81 and indicating some heterogeneity. Though seemingly straightforward, calculating the social entropy of the pack controller introduces the dilemma of subjectivity. If we consider that all of the robots have the same controller and behave similarly, it seems clear that the group is homogeneous and has a social entropy of 0. If, however, we consider that each robot has a defined position in the hierarchy and behaves differently with respect to the other robots, then it appears that there are four classes, each containing one robot. This results in a social entropy of 2.0, indicating maximum heterogeneity. Is the pack controller fully homogeneous or heterogeneous? The fact that both views seem justified helps illustrate our first point: even with a well-defined metric, heterogeneity may still be subjective.

The situation is further complicated if the system contains multiple forms of heterogeneity. The caste controller, for instance, exhibits not only behavioral heterogeneity, but also spatial heterogeneity, since the robots occupy different regions. We can quantify spatial heterogeneity using a variation of Balch's social entropy. The Search Caste contains 3 robots occupying 141 square feet (ft²) of space, for a total of 3 × 141 = 423 robot-ft². The Goal Caste contains 1 robot and occupies 13 ft² of space, for a total of 1 × 13 = 13 robot-ft². In our calculation, p_1 = 423/436 = 0.970 and p_2 = 13/436 = 0.030, giving a spatial entropy of 0.19 and indicating a small amount of heterogeneity. The question is how to describe the overall heterogeneity of the caste controller. Should each type of heterogeneity (behavioral and spatial) be defined separately, or should the two numbers be combined into a single value? In the latter case, how should each number be weighted? The influence of each type of heterogeneity could depend on the task the robots are performing and the structure of the environment. Any weighting may thus have to be derived (likely empirically) for the exact scenario. If this is not possible, the overall heterogeneity of the system could remain ambiguous or ill-defined. The addition of other forms of heterogeneity (e.g., involving morphology or sensors) could further complicate the matter.

Even given an adequate measure of all forms of heterogeneity and their combination, the lack of a clear connection between the performance of the system and the degree of heterogeneity it exhibits remains a concern.
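The entropy values quoted above follow directly from the definition, and can be reproduced with a few lines of code:

    import math

    def simple_social_entropy(class_sizes):
        """Balch's simple social entropy: H = -sum(p_i * log2(p_i)),
        where p_i is the proportion of robots in class i."""
        total = sum(class_sizes)
        return 0.0 - sum((n / total) * math.log2(n / total) for n in class_sizes if n)

    print(simple_social_entropy([4]))           # homogeneous: 0.0
    print(simple_social_entropy([1, 3]))        # caste, behavioral: ~0.81
    print(simple_social_entropy([1, 1, 1, 1]))  # pack as four classes: 2.0
    print(simple_social_entropy([423, 13]))     # caste, spatial (robot-ft^2): ~0.19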
In our work in this appendix, we have compared several aspects of performance among our three controllers, including interference, time-to-completion, and energy expenditure. The important caveat is that these results are not completely general. They are partially dependent upon the structure of the environment, the physical characteristics of the robots, and the exact details of the controllers. In other words, the same task on different robots in a different environment might give very different results. Adding heterogeneity to, or removing it from, the system may improve or degrade performance depending on the details of the system and the aspect of heterogeneity being changed. As in the second point above, one may not be able to rely on the results of a heterogeneity/performance comparison to generalize to another situation.

The quantified heterogeneity of a multi-robot system is a potentially important design and diagnosis parameter, but we have seen that it can be difficult to quantify and, once quantified, is of uncertain relation to performance. Based on our experimental results in foraging, it was not clear how this relationship could help the designer improve a multi-robot system. We therefore did not focus our analysis in this appendix upon this open research topic. Our hope, however, is that the capability to expeditiously build, modify, and evaluate multi-robot controllers, as we have demonstrated, will help facilitate the future study of issues in group robotics, such as the uses of a quantitative analysis of heterogeneity.

A.8 Summary

In this appendix, we have demonstrated the successful application of behavior-based control to the task of distributed multi-robot collection. Our focus has been on developing controllers that are robust to noise and robot failures, and easily modified to facilitate development of the variation that sufficiently satisfies the requirements of the task. Three versions of the collection task were presented: an initial homogeneous controller, and two heterogeneous variations (pack and caste) derived from the spatio-temporal manipulation of physical interference. All three versions were evaluated in a spatio-temporal context using interference, time-to-completion, and distance traveled as the main diagnostic parameters. This work demonstrates that, given a good substrate for development (e.g., a useful set of behaviors), it can be relatively easy to implement, evaluate, and compare multi-robot controllers. As demonstrated in the body of this dissertation, such controllers can then be used in conjunction with AMMs to evaluate, on-line, the robot-environment interaction dynamics for use in a variety of performance-improving applications.

Appendix B

Details of AMM Representation and Construction

This appendix details the AMM representation and model construction algorithm that are briefly presented in Chapter 4. The reader should refer to Chapter 4 for a discussion of the relationship between AMMs/MCs/SMPs and how AMMs are used with behavior-based control.
B.1 Representation of AMMs

The representation of a parametric AMM used by our model construction algorithm is a tuple of sixteen elements, (S, A, B, L, T, Λ, Υ, Alast, Llast, sym, oldsym, numsym, inlink, outlink, currnode, oldnode), containing all of the information necessary for incremental model construction. The boldfaced elements are matrices or other compound structures, while the sans-serif-type elements are variables containing single numeric values. The details of the elements are as follows:

1. S, a set of symbols {s_1, s_2, ..., s_M} recognized by the network. The first symbol, s_1, is recognized only by the first state, a_1.

2. A, a set of states (or nodes) {a_1, a_2, ..., a_N}. Each state a_i has four attributes:
   - a_i^s, the symbol that the state recognizes, i.e., an element of S;
   - a_i^μ, the average number of time steps that the system remains in a_i whenever it enters that state;
   - a_i^σ², the variance associated with a_i^μ;
   - and a_i^p, the probability of remaining in a_i in the next time step.
   The state a_1 represents the initial (unknown) state of the system, which is promptly left upon commencement of model construction and never entered subsequently.

3. B, an N × M transition matrix, where b_i(k) contains the index of the state to transition to if the current state is a_i and symbol s_k is observed. If a_i^s = s_k, then b_i(k) = a_i; i.e., if the observed symbol is identical to the last symbol observed, then the system remains in the current state.

4. L, a set of directed links {l_1, l_2, ..., l_P} connecting the states. Each link l_i has the following six attributes:
   - l_i^f, the state from which the link begins, a_{l_i^f} ∈ A;
   - l_i^t, the state to which the link connects, a_{l_i^t} ∈ A. The following constraints apply: a link cannot start and end at the same state, l_i^f ≠ l_i^t; and two links from the same state cannot go to states that accept the same symbol, i.e., for all i, j such that l_i^f = l_j^f, a_{l_i^t}^s ≠ a_{l_j^t}^s;
   - l_i^c, the number of times the link l_i has been traversed;
   - l_i^Σ, the total number of time steps that the system has been in state l_i^t after first having traversed the link l_i;
   - l_i^Σ², the sum of squares of all the durations that comprise l_i^Σ;
   - and l_i^p, the probability of using the link l_i at each time step, given that the system is in state l_i^f.
   Because no two links can have the same value for both their from and to attributes, they cannot represent the same directed transition. Thus N − 1 ≤ P ≤ N(N − 1): at least N − 1 links are needed to connect the non-initial states, and for a fully connected network there are N(N − 1) links between the non-initial states. The single link from a_1, l_1, is traversed exactly one time, giving l_1^c = 1 and l_1^p = 0.

5. T, a set of structures {T^1, ..., T^(nmax−1)}, each with elements {t_1^n, t_2^n, ..., t_{Q^n}^n} storing information on a particular n-link traversal sequence, where 1 ≤ n ≤ nmax − 1 and nmax is a user-specified maximum order for the model. Each element t_j^n has n + 4 attributes:
   - t_j^{n,1}, t_j^{n,2}, ..., t_j^{n,n+1}, the links comprising t_j^n, stored as indices into L;
   - t_j^{n,c}, the number of times the n-link sequence has been traversed;
   - t_j^{n,Σ}, the total number of time steps that the system has been in the state that link t_j^{n,1} connects to, after first having traversed t_j^n;
   - t_j^{n,Σ²}, the sum of the squares of all the durations that comprise t_j^{n,Σ}.
   The bounds of Q^n are given by:

       Q^n = 0 if P < 2;   otherwise   P − n + 1 ≤ Q^n ≤ N(N − 1)(N − 1)^n.

   In order for a two-link transition to exist, there must be at least two links. If more than two links exist, the fewest n-link transitions (P − n + 1 of them) are created when an Euler path exists and is followed through the network. In a fully connected network, each of the P = N(N − 1) links has a transition to N − 1 other links, giving the upper bound for a sequence of n links.

6. Λ, a list of elements {Λ_1, Λ_2, ..., Λ_nmax}, indicating the nmax most recently used links in strict reverse chronological order: Λ_i ∈ L for 1 ≤ i ≤ nmax, and Λ_i was used i − 1 transitions in the past.

7. Υ, a list of elements {Υ_1, Υ_2, ..., Υ_(nmax−1)}, maintaining references to the penultimately used multi-link transition of each order, Υ_i ∈ T^i.

8. Alast, the index of the last element of A, a_Alast.

9. Llast, the index of the last element of L, l_Llast.

10. sym, the current input symbol to the model, sym ∈ S.

11. oldsym, the previous input symbol to the model, oldsym ∈ S.

12. numsym, the length of the uninterrupted sequence of identical input symbols ending with sym; identical to the number of time steps spent in the current state.

13. inlink, a reference to the link used in transitioning into the current state of the system.

14. outlink, a reference to the link to be used in transitioning out of the current state.

15. currnode, the current node (state) of the system.

16. oldnode, the previous state of the system.

Given this AMM representation, the corresponding probabilistic transition matrix of a Markov chain could be generated from a^p, l^p, l^f, and l^t. The addition of a^μ and a^σ² provides the more general state-duration capabilities of an SMP. Aside from a^s, the remaining representational elements are used in incremental model generation and dynamic model reconfiguration using node splitting.

In addition to the sixteen elements above, the nonparametric representation for AMMs contains the following three elements:

1. D, a finite set of elements {D_1, D_2, ...} storing the input data. Each D_i stores information about the consecutive time spent in a particular state, and has two attributes:
   - D_i^a, the state that the input data is associated with;
   - D_i^c, the number of time steps spent in the state for the current entry into the state.

2. D^L, a finite set of elements {D_1^L, ..., D_P^L}, with each D_i^L containing indices into D for the state given by l_i^t.

3. D^T, a finite set of elements {D_1^T, ...}, with each D_j^T containing indices, each an index into D for the state given by t_j^{n,1}.

The focus of this appendix is on parametric AMM construction, but the pseudo-code presented can easily be extended to accommodate nonparametric AMMs. Next we present the details of the AMM construction algorithm.
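For concreteness, the parametric representation above maps naturally onto a pair of record types. The sketch below is one possible rendering, not the dissertation's implementation; the field names simply mirror the notation of this section.

    from dataclasses import dataclass, field

    @dataclass
    class State:                 # one element of A
        sym: int                 # a^s: symbol this state recognizes
        mean: float = 0.0        # a^mu: mean dwell time (time steps)
        var: float = 0.0         # a^sigma2: variance of dwell time
        p_stay: float = 0.0      # a^p: probability of remaining here

    @dataclass
    class Link:                  # one element of L
        frm: int                 # l^f: index of the source state
        to: int                  # l^t: index of the destination state
        count: int = 0           # l^c: number of traversals
        dur_sum: int = 0         # l^Sigma: summed dwell times in the 'to' state
        dur_sum_sq: int = 0      # l^Sigma2: sum of squared dwell times
        p_use: float = 0.0       # l^p: probability of using this link

    @dataclass
    class AMM:
        states: list = field(default_factory=list)   # A
        links: list = field(default_factory=list)    # L
        trans: dict = field(default_factory=dict)    # B: (state, symbol) -> state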
A = {di}; # the AMM starts with one state 2. af = 0 ; # the unique symbol for the first state is 0 3. < = 0; # mean time in the state is 0 4. a f = 0 ; # sum squared durations is 0 5. a ? = 0; # probability of using the state is 0 6 . B = {}; # there are no transitions yet 167 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 7. L = {h}; # the AMM starts with one link 8 . l{ = 0; # from some imaginary state 9. l[ = 1 ; # to ai 10. Z f = 1; # this link is used only once 11. if = 1; # sum of durations in ai is 1 12. if2 = 1; # sum squared durations is 1 13. ly = 0; # probability of using this link again is 0 14. T = {T1, . . . , T }; # initialize T 15. for i = 1 to nmax - 1 16. 17. 18. end T* = {}; Qi = 0: # initialize each T* # to contain no elements # there are zero elements in T ‘ 19. S = (J; # no symbols have been seen yet 20. Alast = I; 21. Llast = 1; # index of last state is 1 # index of last link is 1 2 2 . sym = 0 ; 23. oldsym = 0; 24. numsym = 1; # unique symbol of the first state # initialize to anything # one 'O’ has been observed 25. inlink = 1; # we are using the first link 168 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 26. outlink = 1; # initialize to anything 27. currnode = 1; 28. oldnode = 0; # current state is 1 # old state is 0 (imaginary state) 29. T = { T h ... # initialize T 30. for i = 1 to n max - 1 31. Ti = 0; # to be all zeros 32. end 33. A = { A ,,... ,A„__ }; 34. for i = 1 to nmax - 2 35. A, = 0; 36. end 37. At = outlink; 38. Ao = outlink; # initialize A # to be all zeros # most recently used link # next most recently used link B.2.2 M ain Loop Below we present the pseudo-code for the main loop of the algorithm that is executed for every input symbol. At the start of the code, sym holds the current input symbol that is to be incorporated into the model, and oldsym holds the last input symbol. As mentioned previously, numsym contains the number of symbols observed in the current state, oldnode stores the index of the last state the system was in, currnode stores the index of the current state, inlink holds the index of the link traversed to enter the current state, and outlink holds the index of the link to be traversed in leaving the current state. 169 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 1. if (oldsym = = sym) # 2. numsym + = 1; # 3- U n lin k + ~ I? # # 4. i = l; # 5. while (i < nmax & T; ^ 0) # 6. + = 1; # # 7. i + = 1; # 8 . end 9. traversaLprob(currnode); # 10. else # 11- U n lin k + = numsym-: # # 12. node.prob(currnode); # 13. i = I; 14. while (i < nmax i: ^ 0) # 15. + = numsym2; # # 16. t + = 1; 17. end 18. if (sym i S) # 19. S = S U {sym}; # 20. Alast + = 1; # if the system remains in the same state increment the time spent in the state increment time spent in the state that /in iin k connects to a counter update T increment the time spent in the state that tS |- ends at increment the counter update the transition probabilities the system is transitioning to a new state increment the time spent in the state that U n lin k connects to update mean/variance for the current state update T increment the sum squared time spent in the state that t\-t ends at if the symbol has not been seen before add the symbol to S add a new state Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 2 1. 
    21.         a_Alast^s = sym;                  # the new state recognizes the symbol
    22.         for i = 1 to Alast
    23.             b_i(sym) = Alast;             # create the new transitions for sym
    24.         end
    25.         ∀ i s.t. s_i ∈ S, let b_Alast(i) = b_1(i);   # initialize transitions for
                                                             # the new state
    26.     end
    27.     oldnode = currnode;
    28.     currnode = b_oldnode(sym);            # transition to the new state
    29.     outlink = i s.t. ((l_i^f == oldnode) & (l_i^t == currnode));   # find the link
                                                                           # to transition on
    30.     if (¬∃ outlink)                       # must create a new link
    31.         Llast += 1;                       # create the new link
    32.         l_Llast^f = oldnode;              # the link connects from oldnode
    33.         l_Llast^t = currnode;             # the link connects to currnode
    34.         outlink = Llast;                  # update outlink
    35.     end
    36.     Λ_{2:nmax} = Λ_{1:nmax−1};            # shift the history of used links
    37.     Λ_1 = outlink;                        # add the new link to the front of the history
    38.     numsym = 1;                           # reset the time spent in the current state
    39.     l_outlink^c += 1;                     # increment the number of times the link
                                                  # has been traversed
    40.     l_outlink^Σ += 1;                     # increment the time spent in the state
                                                  # that l_outlink connects to
    41.     traversal_prob(oldnode);              # update the transition probabilities
    42.     i = 1;
    43.     while ((i < nmax) & (Λ_{i+1} ≠ 0))    # update T
    44.         j = k s.t. t_k^{i,1:i+1} == Λ_{1:i+1};   # for every order, find the t^i traversed
    45.         if (¬∃ j)                         # if no such t^i exists
    46.             Q^i += 1;                     # create a new t^i
    47.             t_{Q^i}^{i,1:i+1} = Λ_{1:i+1};   # initialize it
    48.             t_{Q^i}^{i,c} = 0;
    49.             t_{Q^i}^{i,Σ} = 0;
    50.             t_{Q^i}^{i,Σ²} = 0;
    51.             j = Q^i;
    52.         end
    53.         t_j^{i,c} += 1;                   # increment the number of times traversed
    54.         t_j^{i,Σ} += 1;                   # increment the time spent in the state
                                                  # it connects to
    55.         Υ_i = j;                          # update the most recently used traversals
                i += 1;
    56.     end
    57.     if (nmax > 1)                         # if the user-specified order > 1
    58.         do_node_split();                  # test whether node splitting is needed
    59.     end
    60.     inlink = outlink;                     # the last step in transitioning to the new state
    61. end

B.2.3 Calculating Traversal Probabilities

The following pseudo-code function calculates the traversal probabilities associated with a particular node and updates the appropriate statistics in the node and its links.

    function traversal_prob(node)
    1.  x1 = {i | l_i^t == node};                 # find all incoming links to node
    2.  n1 = Σ_{all i ∈ x1} (l_i^Σ − l_i^c);      # total self-transitions of node
    3.  x2 = {i | l_i^f == node};                 # find all outgoing links from node
    4.  n2 = Σ_{all i ∈ x2} l_i^c;                # total times node has been exited
    5.  if (n1 + n2 ≠ 0)
    6.      a_node^p = n1 / (n1 + n2);            # calculate the probability of a self-transition
    7.      ∀ i ∈ x2: l_i^p = l_i^c / (n1 + n2);  # calculate outgoing transition probabilities
    8.  end

B.2.4 Calculating Node Probabilities

The following function updates the mean and variance for the particular node passed as a parameter.

    function node_prob(node)
    1.  x1 = {i | l_i^t == node};                 # find all incoming links to node
    2.  n1 = Σ_{all i ∈ x1} l_i^c;                # total times node has been entered
    3.  n2 = Σ_{all i ∈ x1} l_i^Σ;                # total time spent in node
    4.  if (n1 > 0)                               # update mean time spent in node
    5.      a_node^μ = n2 / n1;
    6.  else
    7.      a_node^μ = 0;
    8.  end
    9.  n3 = Σ_{all i ∈ x1} l_i^Σ²;               # total of squared durations spent in node
    10. if (n1 > 1)                               # update the variance associated with node
    11.     a_node^σ² = (n3 − 2·a_node^μ·n2 + n1·(a_node^μ)²) / (n1 − 1);
    12. else
    13.     a_node^σ² = 0;
    14. end
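Stripped of the higher-order statistics and node splitting, the main loop reduces to a few lines of bookkeeping. The following runnable sketch captures that first-order core (symbol in, stay or transition, counts updated); it is a simplification for illustration, not the full algorithm.

    from collections import defaultdict

    class TinyAMM:
        """First-order skeleton of the incremental construction: one state per
        symbol, with link counts and dwell durations updated online."""
        def __init__(self):
            self.dwell = defaultdict(list)      # state -> list of dwell durations
            self.link_count = defaultdict(int)  # (from_state, to_state) -> traversals
            self.state = None
            self.run = 0

        def observe(self, sym):
            if sym == self.state:
                self.run += 1                   # same state: extend the current dwell
            else:
                if self.state is not None:
                    self.dwell[self.state].append(self.run)
                    self.link_count[(self.state, sym)] += 1
                self.state, self.run = sym, 1   # enter (or create) the state for sym

    amm = TinyAMM()
    for s in "aaabbbaacc":
        amm.observe(s)
    print(dict(amm.link_count))   # {('a','b'): 1, ('b','a'): 1, ('a','c'): 1}

From the dwell lists, the per-state means and variances (a^μ, a^σ²) follow exactly as in node_prob above.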
B.2.5 Node Splitting

This function determines whether node splitting is necessary and, if it is, splits the nodes and creates new links and multi-link transitions as appropriate.

    function do_node_split()
    1.  flag = 0;                         # boolean indicating if splitting is necessary
    2.  splitorder = 1;                   # current Markovian order checked for splitting
    3.  while ((splitorder < nmax) & (flag == 0))   # check all orders
    4.      splitorder += 1;
    5.      x1 = {i | l_i^f == oldnode};  # find out-links
    6.      ∀ i ∈ x1: p_i = l_i^p;        # get link probabilities
    7.      normalize the p_i so they sum to 1;     # normalize probabilities
    8.      flag = 0;
    9.      x2 = {j | t_j^{splitorder−1, 2:splitorder} == Λ_{2:splitorder}};  # find multi-link
                                                                              # traversals
    10.     if (splitorder == 2)          # get total transitions
    11.         transitions = l_inlink^c;
    12.     else
    13.         transitions = Σ_{all j ∈ x2} t_j^{splitorder−1,c};
    14.     end
    15.     for j = 1 to length(x1)
    16.         if (transitions > 0)      # calculate binomial limits as in Section 3.3.1,
                                          # with X = p_j · transitions and n = transitions,
                                          # at a significance level of α = 0.05 or 0.01
    17.             ld = lower binomial confidence limit;
    18.             ud = upper binomial confidence limit;
    19.         else
    20.             ld = 0;
    21.             ud = 1;
    22.         end
    23.         if (the observed proportion of the traversals in x2 that end with link x1_j
                lies outside [ld, ud])
    24.             flag = 1;             # do node-splitting
    25.         end
    26.     end
    27.     if (flag == 0)
    28.         if (test_node_durations() == 1)   # if there are inconsistencies in the time
                                                  # spent in the state being left, then split
    29.             flag = 1;             # do node-splitting
    30.         end
    31.     end
    32.     if ((flag == 1) & (all elements of Λ_{1:splitorder} are unique))   # split states!
    33.         Alast += 1;               # add a new state
    34.         set a_Alast^s to the symbol of the state being split;          # set the symbol
    35.         ∀ i s.t. s_i ∈ S, copy the split state's row of B to b_Alast;  # adjust B
    36.         redirect the in-link l_{Λ_splitorder} to the new state and update the
                corresponding entries of B;                                    # move in-link
    39.         Λ' = Λ;                   # make a temporary list of links
                # make the rest of the new states, with new links between them
    40.         for i = splitorder − 1 downto 2
                    add a state recognizing the same symbol as a_{l_{Λ_i}^t}, and a link
                    l_Llast connecting the previous new state to it; find the multi-link
                    traversal matching Λ_{1:splitorder}, seed l_Llast^c, l_Llast^Σ, and
                    l_Llast^Σ² from the corresponding attributes of that traversal,
                    deduct the same amounts from l_{Λ_i}, replace Λ_i by Llast, and
                    update B for the new states;
                end
                # add the lower-order elements of T induced by the new links
                for k1 = 2 to splitorder − 1
                    for k2 = k1 + 1 to splitorder − 1
                        create the traversal elements along the new chain of links,
                        copying their c, Σ, and Σ² attributes from the traversals
                        they replace;
                    end
                end
                # re-route the traversal sequences that passed through the split state
    78.         for linkloop = splitorder downto 2
                    for each traversal sequence through the old state, create the
                    corresponding sequence through its new copy, apportioning the c, Σ,
                    and Σ² counts between the old and new copies in proportion to the
                    observed transition counts (rounding as necessary);
                end
                # recompute the statistics of every node
    150.        for n2 = 1 to Alast
    151.            node_prob(n2);
    152.            traversal_prob(n2);
    153.        end
    154.    end
            # re-establish the current position of the system in the modified model
    156.    x1 = j s.t. ((l_j^f == Alast) & (l_j^t == b_Alast(sym)));
    157.    Λ_1 = x1;
    158.    Λ = Λ';
    159.    outlink = x1;
    160.    oldnode = Alast;
    161.    currnode = l_{x1}^t;
    162.    i = 1;
    163.    reinitialize Υ:
    164.    while ((i < nmax) & (Λ_{i+1} ≠ 0))
    165.        temp = j s.t. t_j^{i,1:i+1} == Λ_{1:i+1};
    166.        if (¬∃ temp)
    167.            Υ_i = 0;
    168.        else
    169.            Υ_i = temp;
    170.        end
    171.        i += 1;
    172.    end
    173.    Υ_{i:nmax−1} = 0;
    174. end
    175. end
B.3 Summary

This appendix presented many of the details of the AMM representation, and pseudo-code for the model construction algorithm used in this dissertation.

Appendix C

Tables of Critical Points for T

This appendix provides tables of critical points for the nonparametric test of location by Fligner & Rust (1982), which is described in detail in Section 3.4.1 of this dissertation. The T statistic of Fligner & Rust (1982), a modification of Mood's (1954) statistic, enables a nonparametric median test that makes very few distributional assumptions. In particular, it accommodates non-symmetric and non-identically shaped distributions. This dissertation uses the T-test in the construction of nonparametric AMMs (Appendix B) and in testing the significance of the experimental data in Chapter 8.

Unfortunately, there is no convenient source for the critical points of T, making its use impractical. This appendix attempts to ameliorate this situation by presenting tables of critical points for sample sizes 3 ≤ m, n ≤ 25, which were calculated for the work in this dissertation. The values for each pair of sample sizes were derived using a 100,000-iteration Monte Carlo simulation. Because the distribution of T is symmetric, and in order to avoid redundancy, only the upper half of the full 23 × 23 table (i.e., with min(m, n) indexing the row) is presented in this appendix. When m, n > 25, the inverse cumulative normal distribution provides a good approximation to the critical points of T (see Section 3.4.1).

The tables of this appendix are interpreted in the following manner. Each 7 × 3 cell of a table is indexed by a particular combination of sample sizes, {min(m, n), max(m, n)}. The first column of each cell gives the nominal significance level, α, for the critical point, and the second column provides the actual significance level. The third column provides T_α{min(m, n), max(m, n)}, the critical point for the T distribution at a nominal significance level of α.
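The Monte Carlo procedure behind these tables can be sketched generically: simulate the statistic's null distribution many times and read off empirical quantiles. The sketch below is schematic; the statistic is passed in as a parameter, and the demonstration uses a simple stand-in (difference of sample medians), NOT the Fligner & Rust T of Section 3.4.1, which is not reproduced here.

    import numpy as np

    def mc_critical_point(stat, m, n, alpha, iters=100_000, seed=0):
        """Estimate an upper critical point of `stat`'s null distribution by
        Monte Carlo, as was done for the tables in this appendix."""
        rng = np.random.default_rng(seed)
        draws = np.array([stat(rng.standard_normal(m), rng.standard_normal(n))
                          for _ in range(iters)])
        return np.quantile(draws, 1.0 - alpha)

    # Stand-in statistic for demonstration only (not Fligner & Rust's T):
    demo_stat = lambda x, y: np.median(x) - np.median(y)
    print(mc_critical_point(demo_stat, m=5, n=8, alpha=0.05, iters=20_000))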
1000 0.1 0 4 5 1.8000 0.1000 0.1115 1.1990 0.0500 0.0 5 8 3 1.8070 0.0500 0.0 5 3 9 1.7980 0 .0 2 5 0 0 004 2 3.1620 0.0250 0.0134 2.3 9 7 0 S 0.0100 0 004 2 3.1620 o.o too 0.0091 2.3 9 8 0 0 .0050 0 004 2 3.1620 0.0050 0.0022 2.9 9 8 0 0.0025 0 0 0 4 2 3.1620 0.0025 0.0022 2.9980 0.0010 0 .0 0 4 2 3.1620 0.0010 0.0022 2.9 9 8 0 0.1000 0.1 4 7 2 1.1540 0.0500 0.0 3 9 8 2.1900 0 0250 0.0 2 0 4 2.3090 6 0 0100 0.0 0 0 9 3.4640 0.0050 0.0000 3.4640 0.0025 0.0000 3.4640 0.0010 0 .0 0 0 9 3.4 d 4 0 Table C .l: Critical points, r o {3.. .6 ,3 .. . 6 }, for the T distribution. Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. m . (i 7 a 9 io O -IOOO 0.0 0 3 5 1.0640 0.1000 0.1101 1.2890 0.1000 0.0 9 0 4 1-7320 0.1000 0.1102 1.2030 0 .0 3 0 0 0.057S 1.9080 0.0500 0.0431 1.9340 0.0500 0.0 4 5 2 1.9410 0.0500 0 .0 3 5 2 1.8940 0.0 3 5 0 0.0 0 8 3 3.0 7 0 0 0.0250 0.0431 1.9340 0.0250 0 .0 4 5 2 1.9410 0.0250 0 .0 3 5 2 1.8940 3 0 0100 0.0 0 8 3 3.0 7 0 0 0.0100 0 0059 2.0160 0.0100 0.0 0 4 6 1-9900 0.0100 0.0 0 3 8 1.9620 0.0 0 5 0 0.0 0 8 3 2.0 7 0 0 0.0050 0.0059 2.0100 0.0050 0 .0 0 4 6 1.9990 0.0050 0.0 0 3 8 1.9620 0.0 0 3 5 0.0 0 8 3 3.0 7 0 0 0.0025 0 0059 2 0 1 6 0 0.0025 0 .0 0 4 6 1.9990 0.0025 0.0 0 3 8 1.9620 0.0010 0.0 0 8 3 2.0 7 0 0 0.0010 0.0 0 5 9 2.0160 0.0010 0 .0 0 4 6 1.9990 0.0010 0.0 0 3 8 1.9620 0-1000 0.1 0 7 0 1.2420 0.1000 0.0913 1.2240 0.1000 0 .0 9 4 7 1.1900 0.1000 0 .1 3 5 4 1.1590 0.0 5 0 0 0.0 4 6 0 1.7630 0.0500 0.0311 2.1780 0.0500 0.0480 1.6010 0.0500 0 .0 5 0 0 1.1830 0 0350 0 .0 3 3 9 1.0640 0.0250 0.0248 2.3570 0.0250 0.0210 2.1350 0.0 2 5 0 0 .0 3 1 2 2.2170 4 0.0100 0.0 0 6 3 2.4 8 5 0 0.0100 0.0061 2.4490 0.0100 0.0 0 4 3 2.3850 0.0100 0 .0 1 3 5 2.3180 0.0 0 5 0 0.0 0 6 3 2.4 8 5 0 0.0050 0.0061 2.4490 0.0050 0.0 0 4 3 2 .3850 0.0 0 5 0 o . o o t o 2.3600 0 .0 0 3 5 0.0 0 6 3 2.4 8 5 0 0.0025 0.0061 2.4490 0.0025 0.0 0 4 3 2 .3850 0.0 0 2 5 0.0010 2.3660 0.0010 0.0 0 6 3 2.4 8 5 0 0.0010 0.0061 2.4490 0.0010 0.0 0 4 3 2 3850 0.0010 0.0010 2.3660 0.1000 0.0 0 8 3 1.0770 0.1000 0.0873 1 4600 0.1000 0.0 9 3 0 1.0200 0.1000 0 .0 9 0 6 1.3410 0 .0 5 0 0 0.0531 1.7560 0.0500 0.0510 1.6920 0.0500 0.0360 1.6730 0.0500 0 .0 4 5 8 1 6260 0 .0 3 5 0 0.0 0 7 9 2.7 9 5 0 0.0250 0-0102 2.1320 0.0250 0.0 3 6 0 1.0730 0.0250 0 .0 1 8 8 1.9240 5 0.0100 0 0079 2.7 9 5 0 0.0100 0.0114 2.2590 0.0100 0 0 107 2.5710 0.0100 0 0 1 2 0 2.1730 0 .0 0 5 0 0.0 0 4 0 2.9 2 7 0 0.0050 0.0040 2.6650 0.0050 0.0 0 7 7 2 .7 2 1 0 0.0050 0 .0 0 5 8 2.5810 0 .0 0 3 5 0.0 0 4 0 3.9 2 7 0 0.0025 0.0024 2.8240 0.0025 0.0 0 1 3 2 .7 8 8 0 0.0025 0 .0 0 1 7 2.7160 0.0010 0.0 0 4 0 3.9 2 7 0 0.0010 0 0024 2.8240 0.0010 0.0 0 1 3 2 .7 8 8 0 0.0010 0 .0 0 1 7 2.7160 0.1000 0.0 7 7 3 1.3260 O.10O0 0.1307 1.0800 0.1000 0.0 0 3 2 1.3480 0.1000 0 .1 1 8 3 1.0320 0 .0 5 0 0 0.0 6 6 5 1.6530 0.0500 0.0513 1.9010 0.0500 0.0 5 6 2 1.5660 0.0500 0 .0 5 7 0 1.8970 0 .0 3 5 0 0 .0 3 5 0 2.0560 0.0250 0 0215 2.1000 0.0250 0.0211 2.0 8 6 0 0.0250 0 .0 1 7 4 2.0650 0 0.0100 0.0 1 4 4 3.2030 0.0100 0.0020 3.1510 0.0100 0 0 055 2.4 5 7 0 0.0100 0 .0 0 3 5 2.8460 0 .0 0 5 0 0.0041 2.7530 0.0050 0.0020 3.1510 0.0050 0.0 0 5 0 2.6 0 8 0 0.0050 0 .0 0 3 5 2.8460 0 .0 0 3 5 0.0 0 3 4 2.7540 0.0025 0.0020 3.1510 0 0025 0.0 0 1 4 2.9 4 9 0 0.0025 0 .0 0 2 8 3.0300 0.0010 0.0 0 0 6 3.3050 0.0010 0 0011 3.2400 0 0010 0.0 0 0 8 3.1 3 3 0 0.0010 0 .0 0 0 7 3.0980 0.1000 0.1 3 0 3 1.5550 0.1000 0.0997 1.2030 0.1000 0 . 
[Table C.2: Critical points, Tα{3...10, 7...10}, for the T distribution.]
[Table C.3: Critical points, Tα{3...14, 11...14}, for the T distribution.]
[Table C.4: Critical points, Tα{3...14, 15...18}, for the T distribution.]

[Table C.5: Critical points, Tα{15...18, 15...18}, for the T distribution.]
[Table C.6: Critical points, Tα{3...14, 19...22}, for the T distribution.]
[Table C.7: Critical points, Tα{15...22, 19...22}, for the T distribution.]
[Table C.8: Critical points, Tα{3...14, 23...25}, for the T distribution.]
[Table C.9: Critical points, Tα{15...25, 23...25}, for the T distribution.]