NEURAL NETWORKS FOR TEMPORAL ORDER LEARNING AND STIMULUS SPECIFIC HABITUATION

by

DeLiang Wang

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(Computer Science)

July 1991

Copyright 1991 DeLiang Wang

UMI Number: DP22841. All rights reserved.

INFORMATION TO ALL USERS: The quality of this reproduction is dependent upon the quality of the copy submitted. In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will be noted. Also, if material had to be removed, a note will indicate the deletion.

Published by ProQuest LLC (2014). Copyright in the Dissertation held by the Author. Microform Edition © ProQuest LLC. All rights reserved. This work is protected against unauthorized copying under Title 17, United States Code. ProQuest LLC, 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106-1346.

UNIVERSITY OF SOUTHERN CALIFORNIA
THE GRADUATE SCHOOL
UNIVERSITY PARK
LOS ANGELES, CALIFORNIA 90007

This dissertation, written by DELIANG WANG under the direction of his Dissertation Committee, and approved by all its members, has been presented to and accepted by The Graduate School, in partial fulfillment of requirements for the degree of DOCTOR OF PHILOSOPHY.

Date: August 1991

DEDICATED TO my parents, who encouraged my early interest in pursuing knowledge.

ACKNOWLEDGEMENTS

I first met Michael Arbib in 1984 when he visited Beijing University, where I was pursuing my Master's degree.
I was at that time thinking about how to model intelligent behaviors with interconnected neuron models, but I suffered from ignorance of the pertinent literature and a lack of communication with experts in the field (I did not even know that the field of neural networks existed). The meeting with him proved to be very inspiring for me: through it I was guided to the field and to the work accomplished in his group and elsewhere in the world. I clearly remember that I was first exposed to his schema theory as we walked together along Lake Kunming in the Summer Palace. That meeting later turned out to be a turning point in my life, when he brought me to the University of Southern California. The three and a half years in his laboratory at USC have been most productive and pleasant for me. I express my deepest gratitude to him for his intellectual, moral, and financial support over the years, and for the personal friendship that he has generously extended to me. I am very fortunate to have had the chance to learn from Christoph von der Malsburg about his dynamic link theory when he came to USC in the fall of 1988. I owe much to him for his ideas, his encouragement, and his academic guidance. The successful collaboration with him has been very important for my academic growth. My committee members Richard Thompson, Albert Herrera, and Kai Hwang have each contributed significantly to the quality of the work I have presented, through constructive discussions offered in a very encouraging manner. I thank them deeply. In addition to my excellent committee, I am very grateful to Peter Ewert for his insightful discussions of his experimental data, which are at the heart of this dissertation. He and his group extended warm hospitality to me during my stay last summer in Kassel, FRG, playing with toads. The fruitful collaboration with Peter has essentially enhanced this dissertation. My special thanks go to Birgit Kieburg and Michael Glagow, who greatly helped me in conducting the toad experiments.
I have especially benefited from many discussions with Bill Betts and from the simulation systems he has developed. Jeff Teeters was very patient in helping me during the early stages of the research by discussing the implementation of his retina model. Alfredo Weitzenfeld and Irwin King have each spent a great deal of time helping me become familiar with various computer systems. I have learned much from discussions with Joachim Buhmann, who is an expert in neural dynamics. Paulina Baligod-Tagle and Ethel Scott have kindly provided me with valuable help. My fellow friends in the Brain Simulation Laboratory have offered great intellectual support to me. Among those who have helped through both their friendship and their intellectual companionship, crucial to keeping me motivated, are: Reza Shadmehr, Bruce Hoff, Gabor Bartha, Thea Iberall, Mary Jo Preti, Jan Xu, Bin Wang, Martin Lades, Jim Liaw, Peter Dominey, Alberto Cobas, Andy Fagg, Bhavin Sheth, Jean-Marc Fellous, Lucia Simo, and Hyun-Bong Lee. Finally, I am pleased to see that Fernando Corbacho shares my interest and is willing to help me through the final transition stages of my research. My parents and other members of my family gave me invaluable psychological support during my years pursuing a Ph.D. degree in the United States. I am very happy that I got married early this year, and my wife, Ping Bai, joined me several months ago. Her love and caring played a key role in the final push of dissertation writing. Financial support for this research was provided in part by grant no. 1R01 NS 24926 from the National Institutes of Health (M. A. Arbib, Principal Investigator).

TABLE OF CONTENTS

Dedication
Acknowledgements
List of Tables
List of Figures
Abstract

I. Introduction

PART 1. A THEORY OF TEMPORAL ORDER LEARNING

II. Temporal Sequence Recognition and Reproduction: Basic Algorithms
  II.1 Review of Previous Work
  II.2 Temporal Sequence Recognition
    A dual neuron model of STM
    Sequence-detecting neuron model
    Recognition of complex temporal sequences
  II.3 Temporal Sequence Reproduction
  II.4 Conclusion

III. A Theory of Temporal Order Learning: Timing and Chunking
  III.1 Introduction
  III.2 An Interference Model of STM
  III.3 Sequence Recognition with Interval Invariance
    Simple sequence recognition
    Complex sequence recognition
  III.4 Hierarchical Sequence Recognition
  III.5 Sequence Reproduction with Interval Maintenance
    Degree self-organization
    Interval maintenance
  III.6 Discussion
    On complex temporal sequences
    Hierarchies
    Efficiency
    Units
    Studies of delay intervals
    Cognitive aspects
  III.7 Conclusion

PART 2. A NEURAL MODEL FOR STIMULUS SPECIFIC HABITUATION IN TOADS

IV. Visual Pattern Discrimination in Toads: A Computational Model
  IV.1 Biological Background
  IV.2 Distributed vs. Temporal: Basic Hypothesis
  IV.3 Modeling of Retinal Processing
  IV.4 Tectal Relay
  IV.5 Integration in Anterior Thalamus
  IV.6 Predictions
  IV.7 Discussion

V. Configurational Pattern Recognition by Dishabituation: Behavioral Tests of the Predictions
  V.1 Introduction
  V.2 Materials and Methods
  V.3 Results
    Contrast reversal
    Size effect
    Separate process of dishabituation
  V.4 Discussion
    Habituation property
    Background contrast dependence
    Size effects
    Learning capabilities

VI. Modeling Dishabituation Hierarchy: The Role of the Medial Pallium
  VI.1 Biological Bases
    Biological evidence
    Neurophysiological evidence
    Theoretical studies
  VI.2 Two Learning Loop Hypothesis
  VI.3 A Neural Model of the Medial Pallium
    Model structure
    Formal description
    Synaptic plasticity
  VI.4 Computer Simulation
  VI.5 Predictions
    Dishabituation mechanisms
    Underlying neuronal structures
    Long-term memory effects
    Resting habituation
    Influence of habituation on dishabituation
  VI.6 Discussion

VII. Conclusion
  Future Perspectives
  Interplay between Modeling and Experimentation

Appendix A
  Retina Model
  Tectum Model
  Anterior Thalamus Model
  MP Column Model

Bibliography

LIST OF TABLES

4.1 Parameter Values of Retinal Ganglion Cell Models
4.2 Parameter Values of the Anterior Thalamus Model
6.1 Parameter Values of the MP Column Model

LIST OF FIGURES

1.1 Dishabituation Hierarchy
2.1 Diagram and Response of a Dual Neuron Model
2.2 Simplified Diagram for the Neural Model of Sequence Recognition
2.3 The Curve of Input Potential and Training for Sequence Recognition
2.4 Diagram and Simulation for Complex Sequence Recognition
2.5 General Architecture for Temporal Sequence Recognition
2.6 Network Architecture for Temporal Sequence Reproduction
2.7 Reproduction of a 4-degree Higher-order Complex Sequence
3.1 Diagram of the STM Model
3.2 Monotonic Increase of the Input Potential and Training for Recognition of Sequence with Time-warping
3.3 Expanded Unit Model and Simulation for Complex Sequence Recognition
3.4 Architecture of Hierarchical Sequence Recognition
3.5 Computer Simulation of the Hierarchical Sequence Learning Model
3.6 Architecture for Complex Sequence Reproduction
3.7 Reproduction of a Complex Sequence
3.8 Data and Model Outputs for Shift in Mean CR Peak Latency
4.1 Diagram of the Entire Model
4.2 R2 Response to Worm, Antiworm, and Square
4.3 R3 Response to Worm, Antiworm, and Square
4.4 R4 Response to Worm, Antiworm, and Square
4.5 Simulated Retinal Response to the 8 Worm-like Stimuli
4.6 R2 Temporal Firing Rate to the 8 Worm-like Stimuli
4.7 R3 Temporal Firing Rate to the 8 Worm-like Stimuli
4.8 3-D Snapshot of the Membrane Potential of the R2 and R3 Layers
4.9 AT Response to the 8 Worm-like Stimuli
4.10 Dishabituation Hierarchy Predicted by Shrinking Stimulus Size
4.11 Dishabituation Hierarchy Predicted by Reversing Contrast Direction
5.1 Experimental Apparatus
5.2 Experimental Test of the Contrast Reversal Prediction
5.3 Experimental Test of the Model's Size Reduction Prediction
5.4 Test of the Size Effect in Dishabituation
5.5 Test of Size vs. Configuration Effects in Dishabituation
5.6 Experimental Test of the "Separate Process Question"
6.1 Two Neural Loops Underlying Learning Behaviors in Toads
6.2 Diagram of an MP Column Model
6.3 Effect of Cumulative Shrinking
6.4 Ten z Functions with Different Parameter Values
6.5 Simulation of Habituation and Dishabituation: 1
6.6 Simulation of Habituation and Dishabituation: 2
6.7 Simulation of Habituation and Dishabituation: 3
6.8 Simulation of Habituation and Dishabituation: 4
6.9 Simulation of Habituation and Dishabituation: 5
6.10 Simulation of Habituation and Dishabituation: 6
6.11 Simulation of Separate Process of Dishabituation

ABSTRACT

The central theme is a quest for models of learning using the neural network approach. There are two parts to the dissertation: a formal neural network model of temporal order learning and a biological neural model for stimulus-specific habituation in toads. In the first part, learning mechanisms are explored at the abstract level, i.e., we design neural networks to learn, recognize, and reproduce complex temporal sequences. Short-term memory is modeled by a network of neural units with mutual inhibition. Sequences are acquired into long-term memory with a new rule, called the attentional learning rule, which combines the Hebbian rule and a normalization rule with sequential system activation. Acquired sequences can be recognized without being affected by presentation speed or by certain distortions in symbol forms. Sequence reproduction is achieved with two reciprocally connected layers, and reproduction of complex sequences can maintain the temporal course of learned sequences. Different layers of the model can be constructed in a feedforward manner to recognize hierarchically organized temporal structures, in a way similar to human information chunking. In the second part, learning mechanisms are studied at the neurobiological level with a specific animal. A computational model is first presented for visual pattern discrimination in toads.
The anterior thalamus (AT) model integrates visual inputs from the retina and the tectum, and produces orderly average firing activities in response to the stimuli in a dishabituation hierarchy. The output from the AT model is fed to the model of the medial pallium (MP), where neuronal responses to the stimuli are further processed and stored. A model of synaptic plasticity is proposed for MP as an interaction of two dynamic processes, which simulates acquisition and both short- and long-term forgetting. Large-scale computer simulations demonstrate that the model of the interacting brain structures can reproduce experimental data remarkably well. The model of AT and MP structures yields a range of experimental predictions concerning the properties of learning and pattern discrimination. Initial model-testing experiments have validated certain predictions and led to new findings of behavioral phenomena.

CHAPTER I

INTRODUCTION

The central theme of this thesis is a quest for models of learning, which we believe is the key to the understanding of brain functions. This quest is approached by two case studies at two different levels: abstract neural network (Part 1) and neurobiological (Part 2). For Part 1, we chose a very important function of intelligent behavior, temporal order processing, and showed how advanced capabilities may emerge from effective learning algorithms. For Part 2, by contrast, we chose a specific kind of behavior (the dishabituation hierarchy) in a specific group of animals (anurans). The main focus here was to reveal plausible neural mechanisms that may underlie the learning behaviors. The approach throughout is to search for computational principles that yield comparably complex patterns of behavior within the biological constraints on the underlying structure.
For temporal sequence learning, the approach aims at achieving better computational performance by using networks of biologically plausible abstract constituents that meet psychological constraints; for habituation modeling, the methodology is best described as computational neuroscience (Arbib, 1989; Schwartz, 1990), in which computational models are constrained as much as possible by experimental data, and are capable of providing conceptual frameworks for explaining data and predicting new experimental phenomena. From another perspective, Part 1 is a study of sequential aspects of cognition, i.e., how to form temporal links in a sequence of patterns, whereas Part 2 is a study of spatial aspects of cognition, i.e., how to find spatial links among a group of visual objects to be processed, stored, and later recognized. The temporal link between two patterns (A and B) is formed if A concurs with B or A occurs before/after B. The spatial link is formed between A and B if they have common spatial features. We believe that the temporal link and the spatial link are the two elementary principles for organizing cognition and knowledge, and that these two types of links interact with each other to form cognitive processes. Consider how one brain state switches to another. A brain state is often induced by an external stimulus, being a recognition of the stimulus based on its constituent spatial and/or temporal features*. The recognized stimulus may cause a switch to another state, based on spatial links (reasoning by analogy, for example) or temporal links (recall of an episode, for example). The associated state may switch to yet another one, and this process of association can proceed endlessly by itself, but it is constantly interrupted by external stimuli or internal stimuli (like hunger, thirst, or fatigue).
The free process of association is often constrained by the need to solve certain problems, such as approaching prey, avoiding a predator, or identifying a mate for survival value, or by problem solving, planning, and reasoning by rules as higher brain functions. Learning is of fundamental importance here because it is the process by which temporal and spatial links are built into the brain. The two parts of research presented in this dissertation are two case studies in the venture of building the temporal and spatial links underlying cognitive processes. The ability to understand one's environment, essential for intelligence, is not static. The order in which events occur can be even more important than the events themselves, and an intelligent system, whether it be a frog, a robot, or a human, must be able to detect the ordering of events and to produce an ordering from some cue. Yet many attempts to model neural networks, such as associative memory and the Boltzmann machine, dealt only with static equilibria, which have nothing to do with the ordering of patterns. Temporal arrangement is at the heart of thought, language, and action, and contributes greatly to human intelligence. Recognizing a temporal pattern is crucial in hearing and vision, and generating a temporal pattern underlies processes like motor pattern generation, speech, and singing. Neural networks to store, recognize, and reproduce temporal sequences of input stimuli have been previously studied by a number of investigators.

* Malsburg (1981, 1985) has pioneered the idea that temporal correlation forms a basic brain function, although he emphasizes the concurrence relation. For examples of pattern recognition using segmentation see (Malsburg & Schneider, 1986; Wang, 1989; Wang et al., 1990).
One of the major problems in previous models of temporal sequence learning is the difficulty of dealing with complex temporal sequences that contain recurring subsequences, which are apparently indispensable for real applications like speech recognition, music generation, etc. In this dissertation, we present a computational theory of temporal order based on cognitive theories of forgetting in short-term memory (STM). What distinguishes our model from others is two basic hypotheses embodied in the model: (1) there is a common mechanism to process both complex sequences and simple sequences; and (2) reproduction of a component in a sequence is based on recognition of a certain prior subsequence of the component. In Chapter 2, temporal order learning is studied for a specific case: each component of the sequence has the same presentation interval. This model is based on two reciprocally connected units, called a dual neuron, for maintaining an input signal, producing oscillations with autonomous damping. Following Hebbian training (Hebb, 1949) of ordered graded signals and a normalization rule among all synaptic weights of a neuron, a neuronal quantity, the input potential, which is the weighted sum of the ordered inputs, increases monotonically until it reaches its upper limit. This property naturally leads to the concept of a sequence-detecting neuron, which by training recognizes only a specific ordering of events. This learning feature can be extended to a general sequence recognition system which recognizes any complex sequence and is not sensitive to small distortions in input signals and presentation orders. By bidirectionally connecting a sequence-detecting layer and an input layer, a learning algorithm based on recognizing partial subsequences is proposed to reproduce any complex sequence.
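The interplay of Hebbian growth and weight normalization described above can be illustrated with a small numerical sketch. This is not the dissertation's exact formulation; the decay factor, learning rate, and function names are illustrative assumptions.

```python
import numpy as np

def presentation_trace(seq_len, decay=0.6):
    """Graded activity left in STM at the end of a presentation: the
    earliest item has decayed most, the latest not at all, so the
    trace encodes temporal order as graded values (illustrative)."""
    return np.array([decay ** (seq_len - 1 - i) for i in range(seq_len)])

def train_detector(x, presentations=40, eta=0.5):
    """Hebbian increments followed by L1 weight normalization.
    Returns the input potential (the weighted sum of the ordered
    inputs) after each training presentation."""
    w = np.full(len(x), 1.0 / len(x))    # start from uniform weights
    potentials = []
    for _ in range(presentations):
        w = w + eta * x                  # Hebbian growth
        w = w / w.sum()                  # normalization rule
        potentials.append(float(w @ x))  # input potential
    return np.array(potentials)

trained = presentation_trace(3)          # trace left by A -> B -> C
p = train_detector(trained)
# The input potential increases monotonically toward its upper limit,
# so a fixed threshold turns the unit into a sequence-detecting
# neuron for the trained ordering.
assert np.all(np.diff(p) >= -1e-12)
# The same weights give a lower potential for the reversed ordering.
w_star = trained / trained.sum()         # converged normalized weights
assert p[-1] > float(trained[::-1] @ w_star)
```

Under this update the weights converge toward the normalized trace itself, which is why the potential rises monotonically to its limit and why any other ordering of the same items yields a smaller weighted sum.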
In the model, long-term memory (LTM) for temporal links is built into the network with the Hebbian rule of synaptic plasticity, on the basis of network activities kept in short-term memory. The above model is unable to solve another major problem in temporal sequence learning: the time-warp problem. This problem poses different requirements for recognition and reproduction. For sequence recognition, we wish a network to recognize a time-warped sequence, whereas for reproduction we wish a network to reproduce a sequence with the same temporal course as the learned sequence. Solving the time-warp problem is apparently a crucial step toward applying the model in realistic domains. Inspired by cognitive studies of human short-term memory, an STM model is developed using lateral inhibition, conforming with the interference theory of forgetting in STM. It will be shown that this model of STM, combined with the previous learning algorithms for transferring sequence learning to LTM stored in synaptic weights, can provide a solution to the time-warp problem in both sequence recognition and sequence reproduction (Chapter 3). To handle the learning of very long sequences (beyond the magic number 7±2), we also propose in Chapter 3 a mechanism for hierarchical sequence recognition, similar to human information chunking. Here subsequences already stored in LTM may serve as integral subunits in the storing of new sequences. This mechanism seems both natural and necessary for detecting long sequences, like a paragraph of sentences, a piece of music, and so forth. Although in developing learning algorithms for temporal order processing our emphasis was to demonstrate how new learning schemes can help tackle technical problems, another important measure embodied throughout the model development is biological relevance. Many of the model's ingredients are directly drawn from relevant data.
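The interference-based STM idea can be sketched in a few lines. Here each newly arriving item shunts down all currently active units by a fixed factor, so the activity pattern depends only on serial order, not on the time elapsed between items; the inhibition factor and function name are illustrative assumptions, not the dissertation's equations.

```python
def present_sequence(n_items, gamma=0.5):
    """Interference model of STM forgetting: each new item inhibits
    all currently active units by the factor (1 - gamma), while the
    mere passage of time leaves activities untouched.  Relative
    activity therefore encodes serial order regardless of speed."""
    activities = []
    for _ in range(n_items):
        activities = [a * (1.0 - gamma) for a in activities]  # lateral inhibition
        activities.append(1.0)                                # newcomer at full strength
    return activities

acts = present_sequence(4)
# Earlier items end up weaker: [0.125, 0.25, 0.5, 1.0] for four items.
assert acts == sorted(acts) and acts[-1] == 1.0
```

Because nothing in the loop depends on inter-item intervals, a slow and a fast presentation of the same sequence leave identical traces, which is the property a recognition network can exploit for time-warp invariance.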
In particular, the plausibility of the basic model components will be analyzed, and certain experimental studies will be discussed in light of our model. Certain implications of the model will also be drawn and discussed (Chapter 3). Habituation is a decrease in the strength of a behavioral response that occurs when an initially novel stimulus is presented repeatedly (Harris, 1943; Thompson & Spencer, 1966). It is probably the most elementary and ubiquitous form of plasticity, and its underlying mechanisms may well provide a basis for understanding other forms of plasticity and more complex learning behaviors. As a simple and fundamental form of learning, habituation has been extensively studied in different animals. Its neural mechanisms in some invertebrates have been clearly defined (e.g., Aplysia; Kandel, 1976; for models see Wang & Hsu, 1988, 1990; Gluck & Thompson, 1987); but this type of habituation does not seem to be sensitive to stimulus configurations, so its principles cannot be readily extrapolated to explain stimulus-specific habituation, which is most closely related to processes involved in learning and pattern recognition. Amphibians, on the other hand, do show stimulus-specific habituation (Ewert & Kehl, 1978). Since they have been carefully studied from behavioral, anatomical, physiological, and theoretical points of view, and since their brain is not as complex as that of mammals, they form good biological models for the study of the neural mechanisms responsible for learning processes (Ewert, 1980). After repeated presentation of the same prey dummy in their visual field, toads reduce the strength of orienting responses toward the moving stimulus. This visual habituation has the following characteristics (for a review, see Ewert, 1984): (1) Locus specificity.
After habituation of an orienting response to a certain stimulus applied at a given location, the response can be released by the same stimulus applied at a different retinal locus (Eikmanns, 1955; Ewert & Ingle, 1971). (2) Hierarchical stimulus specificity. Another stimulus given at the same locus may restore a response habituated by a previous stimulus, but only certain stimuli can dishabituate a previously habituated response. Experimental results (Ewert & Kehl, 1978) show that this dishabituation forms a hierarchy of stimulus patterns (Figure 1.1), where only patterns higher in the hierarchy can dishabituate the habituated responses to stimuli lower in the hierarchy, and on the same level only the left stimulus can slightly dishabituate the right one. Interestingly, this hierarchical stimulus specificity differs both from the stimulus-insensitive habituation seen in invertebrates and from the full stimulus specificity exhibited in mammals, where habituation to a stimulus can be dishabituated by any different stimulus (Sokolov, 1960; Thompson & Spencer, 1966). This study was mainly triggered by this fascinating behavior demonstrated by dishabituation. The biological relevance of stimulus-specific habituation may be to keep the IRM (innate releasing mechanism) for prey catching alert to "new" stimuli (Schleidt, 1962). The dishabituation hierarchy suggests that it is configurational cues of the stimulus, and not only its "newness", which decide the toad's response (Ewert & Kehl, 1978). It is reasonable to assume that toads have not developed the advanced spatial shape recognition capability of higher animals, but have a well-developed ability to recognize certain stimulus configurations, which, for example, are used in discriminating prey from predator.

Figure 1.1 Dishabituation hierarchy for worm stimuli used in stimulus-specific habituation. One stimulus can dishabituate all the stimuli below it.
On the same level, the left stimulus can slightly dishabituate the right one (redrawn from Ewert & Kehl, 1978). In order to propose a plausible brain model for the dishabituation hierarchy, we must first answer the following question: how does the toad's visual system discriminate the different worm-like stimuli manifested in studies of dishabituation? Although Ewert and Kehl (1978) demonstrated the behavioral responses leading to the hierarchy, they did not investigate the neural mechanisms involved, which must involve visual structures such as the retina and the tectum. For toads to exhibit the dishabituation hierarchy, there have to be differing representations of different stimulus shapes somewhere in their visual system. Unfortunately, physiological studies provide very little data on the response of visual structures to a variety of object shapes (for reviews see Grüsser & Grüsser-Cornehls, 1976; Ewert, 1984). Furthermore, knowledge about the discrimination of different worm-like stimulus shapes is interesting not only for neurobiology but also from the perspective of pattern recognition in machine vision. In Chapter 4, we develop a model for discriminating different worm-like stimuli, which provides a basis for explaining and simulating the dishabituation hierarchy. The anterior thalamus (AT), which receives direct inputs from retina and tectum, is assumed to be the neural structure that integrates various visual afferents and performs the discrimination. A detailed model of the anuran retina was made available recently (Teeters, 1989; Teeters & Arbib, 1991), and is used as the "front end" for the present model. Although the tectum has been previously modeled (Cervantes-Perez et al., 1985; Betts, 1989), we here simply represent it as a relay for retinal signals. The model of the anterior thalamus receives excitatory afferents from tectal small pear cells and inhibitory afferents from retinal R3 cells.
Computer simulation demonstrates that the model of anterior thalamus elicits a higher intensity response when a stimulus higher in the dishabituation hierarchy (Fig. 1.1) is presented. The theory predicts that different dishabituation hierarchies will be produced when the stimulus-background contrast is reversed or when the stimulus size is changed. In Chapter 5, we report the results of testing the behavioral predictions made in the previous chapter. After the predictions were made by the anterior thalamus model, a number of behavioral experiments were designed to test them. The contrast-reversal effect was validated, but the effect of changing stimulus size was not. Our further experiments indicate that visual pattern recognition in toads exhibits size invariance. We also found that dishabituation by a second stimulus involves a process separate from habituation to a first stimulus. As with typical learning processes, the time course of recovery from habituation exhibits two phases: a short-term process that lasts for a few minutes and a long-term process that lasts for at least 6 hours (Ewert, 1984). Given the different outputs from the AT discriminating circuitry to the different stimuli, the next topic is to study the neural mechanisms of habituation, which involve the posteroventral medial pallium in the telencephalon, also called "primordium hippocampi" by Herrick (1933). Bilateral transection of the medial pallium (MP) destroys the aftereffects in previously habituated toads (Finkenstadt & Ewert, 1988a). In addition, both the effects of habituation and the associative learning ability in naive animals are abolished by MP lesion (Finkenstadt & Ewert, 1988b; Finkenstadt, 1989b), favoring the hypothesis that the MP is the primary nucleus for these forms of learning. Chapter 6 presents the model of stimulus-specific habituation based on differing representations of the worm patterns in the anterior thalamus.
Learning processes are assumed to take place in the medial pallium, which receives direct projections from AT, and for which a model of a basic functional unit (column) is proposed. A neural mechanism based on unilateral inhibition, called cumulative shrinking, is proposed for mapping AT temporal responses into a form of population coding referenced by spatial positions. An analytic model is proposed for synaptic plasticity, which incorporates acquisition and two time courses of recovery corresponding to short- and long-term memory. The model exhibits habituation and dishabituation processes that remarkably resemble what was found in the experiments leading to the dishabituation hierarchy and in some further experiments reported in Chapter 5. A range of predictions are produced by the model with respect to mechanisms of habituation and the cellular organization of the medial pallium. Finally, Chapter 7 summarizes the major contributions of this thesis and outlines directions for future research. Some concluding remarks are drawn on the nature of the study exemplified in this thesis.

Part 1 A Theory of Temporal Order Learning

CHAPTER II TEMPORAL SEQUENCE RECOGNITION AND REPRODUCTION: BASIC ALGORITHMS

Summary

As a preparation for the general theory of sequence learning, we develop in this chapter neural network algorithms to learn, recognize and reproduce complex temporal sequences with a fixed interval. Short-term memory (STM) of input signals is modeled by units comprising recurrent excitatory connections between two neurons (a dual neuron model). The output of a neuron has graded values instead of binary ones. Sequences are acquired by a new learning rule, the attentional learning rule, which combines a Hebbian rule and a normalization rule with sequential system activation.
With this training rule, we show that a certain quantity, called the input potential, increases monotonically with sequence presentation, and that the neuron can only be fired when its input signals are arranged in a specific sequence. These sequence-detecting neurons form the basis for the model of complex sequence recognition, which can tolerate distortions of the learned sequences. A recurrent network of two layers is provided for reproducing complex sequences.

II.1 Review of Previous Work

Before reviewing previous work, we introduce the following terminology (Wang & Arbib, 1990). Generally, a temporal sequence S is defined as: p_1-p_2-...-p_m. Each p_i (i = 1, ..., m) is called a component of S (sometimes we call it a spatial pattern, or just a symbol). The length of a sequence is the number of components in the sequence. In general, a sequence may include repetitions of the same subsequence in different contexts. For example, S_1: C-A-B-D-A-B-E contains repetitions of the subsequence A-B, and such a subsequence is called a recurring subsequence. The correct successor can be determined only by knowing symbols prior to the current one. We refer to the prior subsequence required to reproduce the current symbol p_i in S as the context of p_i, and the length of this prior subsequence as the degree of p_i. The symbol D in S_1, for example, has a degree of 3. The degree of a sequence is defined as the maximum degree of its components. A 1-degree sequence is called a simple sequence; otherwise a sequence is a complex sequence. If there exists a recurring subsequence of S that contains in itself another recurring subsequence, e.g. A-B-A in A-B-A-C-A-B-A-D, S is called a high-order complex sequence; otherwise it is a first-order complex sequence. Neural networks to store and recognize a temporal sequence of input stimuli have been studied previously. Grossberg (1969) demonstrated a neural network called the outstar avalanche that can be used to generate temporal patterns.
The outstar avalanche is composed of n sequential outstars. Any outstar M_i can store a spatial pattern and be activated by a signal in the vertex v_i. These vertices are connected as v_1 -> v_2 -> ... -> v_n, and a signal from v_i arrives with some delay at v_{i+1}. So an initial signal at v_1 can sequentially produce the spatial patterns stored in M_1, M_2, ..., M_n respectively. Based on the anatomy of the dentate gyrus region of the mammalian hippocampus, Stanley and Kilmer (1975) designed a network called the wave model, which can learn sequences of inputs separated by certain time intervals and reproduce these sequences when cued by their initial subsequences. Recently, using a bidirectional associative memory built from two fields of fully-connected neurons, Kosko (1988) showed that by feeding the spatial pattern output from one field back to the other field, the network can generate a sequence of patterns over time which alternates between the two fields. Using a synaptic triad made up of three neurons A-B-C as building blocks, Dehaene et al. (1987) proposed a layered neural network, called the selection model, which can recognize temporal sequences. The description of a synaptic triad guarantees that neuron B is activated only when A and C appear in the order C-A, i.e. neuron B is made sequence-sensitive. A network of synaptic triads can be constructed for a sequence of any length, which may include some repetitions of a part of the sequence. In order to make the neural network able to learn an arbitrary sequence, connections among these synaptic triads are made randomly, and the resulting network can be selected by an input sequence. This selection model is based on an ad hoc assumption about the architecture of the network, so learning is severely limited by the immense number of connections that would be required to learn an arbitrary temporal sequence that is not trivially short.
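Returning briefly to the terminology introduced at the start of this section, the degree of a component can be stated operationally: it is the length of the shortest preceding context that determines the component uniquely. A small illustrative function (not from the thesis, and only a sketch of the definition) makes this concrete:

```python
# Illustrative helper (not from the thesis): the degree of component seq[i]
# is the length of the shortest preceding context that determines seq[i]
# uniquely. The first symbol is assigned degree 0 by convention; degenerate
# sequences whose full prefix recurs ambiguously are not handled.

def degree(seq, i):
    for L in range(1, i + 1):
        ctx = seq[i - L:i]
        if all(seq[j] == seq[i]
               for j in range(L, len(seq))
               if seq[j - L:j] == ctx):
            return L
    return 0

S1 = "CABDABE"
degrees = [degree(S1, i) for i in range(len(S1))]
# D (the fourth symbol) has degree 3: the context A-B also precedes E, so
# the three-symbol context C-A-B is needed; hence S1 is a complex sequence
```

On a simple sequence such as A-B-C-D-E every component after the first has degree 1, which is exactly the sense in which a 1-degree sequence is called simple.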
Storage of temporal sequences in the spin-like Hopfield network has been proposed recently by several authors (Kleinfeld, 1986; Sompolinsky & Kanter, 1986; Tank & Hopfield, 1987; Buhmann & Schulten, 1987; Gutfreund & Mezard, 1988; Guyon et al., 1988; Kuhn et al., 1989). In this paradigm, each pattern is stable over some time period, at the end of which a sharp transition leading to the next pattern occurs due to stored transitions between consecutive patterns. One difficulty is the storage and retrieval of complex sequences. In most of these models, a given pattern can occur only once among all the stored sequences, which is a severe restriction. Reproduction of a temporal sequence has also been explored using backpropagation (Jordan, 1986; Doya & Yoshizawa, 1989; Elman, 1990). In the Jordan model and the Doya and Yoshizawa model, the basic idea is that the output layer associated with each component is fed back and blended with the input representing the next component, whereas in the Elman network, the hidden layer is fed back to influence the next pattern. Complex sequence reproduction again causes severe problems for this type of connectionist model. Some remedies have been proposed for dealing with complex sequences in spin-like models and connectionist models, and will be discussed in the next chapter. In Part 1, we propose a different approach to the storage of temporal sequences. This chapter develops basic algorithms for the general theory to be presented in the next chapter, and deals with a specific case of sequence learning, namely that the presentation of every sequence component lasts for the same time interval. The general case with varying intervals will be studied in the next chapter. A dual neuron model is used for storing a signal for a short time span, producing an oscillatory activity with decreasing amplitude. The output of this dual neuron is a graded signal, rather than the binary signal used in many neural network models.
Following Hebbian training (Hebb, 1949) with ordered graded signals, a neuronal quantity, the input potential, which is the weighted sum of the ordered inputs, is shown to increase monotonically until it saturates. After this training, if we set the threshold of the neuron to the saturation point of its input potential, then this neuron can only be activated by this specific sequence of inputs. This property naturally leads to the concept of a sequence-detecting neuron. An important feature is that, after learning, this type of neuron is fired by a previous sequence of patterns, not just a previous pattern, so it overcomes the limitation of networks which can only generate simple sequences. The same idea is used for recognizing any complex sequence. Furthermore, we show that by adding another sequence-detecting layer, any complex sequence can be reproduced. The basic algorithms and the learning rule will be used in the next chapter, where the general theory of sequence learning is presented. An earlier version of this chapter appears in Wang and Arbib (1990).

II.2 Temporal Sequence Recognition

A dual neuron model of STM

In order to link two temporally discontiguous patterns, which is crucial for temporal sequence processing, the previous pattern has to be preserved for a certain period of time. This temporal link can be provided by short-term memory. STM has been extensively studied in psychology (see, for example, Wingfield & Byrnes, 1981), and was classically thought to be physiologically due to recurrent excitatory connections (Kupfermann, 1985; Schmidt, 1985). This physiological explanation is adopted in many efforts of neural network modeling (for example, see Grossberg, 1976; Lara et al., 1982). To simplify the process while preserving the basic idea, we use a dual neuron (shown in Fig. 2.1a) to model STM.
The two neurons N_1 and N_2 have activities or membrane potentials m_1(t) and m_2(t), described using the leaky integrator model (in discrete form):

m_1(t+Δt) = m_1(t) + Δt [-K m_1(t) + T_12 m_2(t+Δt-τ/2) + I(t)]
m_2(t+Δt) = m_2(t) + Δt [-K m_2(t) + T_21 m_1(t+Δt-τ/2)]     (2.1)

where K is a relaxation constant, T_12 and T_21 are synaptic weights, Δt is a discretization interval, and τ is the cycle time for a signal to travel between the two neurons. I(t) represents external input to this dual neuron. In the absence of external input (i.e. I(t) ≡ 0), formula (2.1) can form a simple damped oscillatory system with period τ by choosing appropriate parameters. Although similar in form to other coupled oscillators (for example, see Wang et al., 1990), the oscillation created here damps. Each time the signal appears on N_1, the amplitude of the signal decreases. It is interesting to note that this autonomous decay is consistent with the decay theory (Conrad, 1957) of forgetting in STM, which claims that items kept in STM will decay automatically unless they are rehearsed. This decrease will be designated by a function g(x), which decreases monotonically from 1 toward 0 with g(0) equal to 1. Analysis shows that g(x) is an exponential decay function, as manifested in Fig. 2.1b, which shows the response of a dual neuron model when N_1 is stimulated initially. This represents the model for STM to be used in this chapter.

A sequence-detecting neuron model

A biological neuron updates its state (firing or silent) in a few milliseconds, a typical presentation of a symbol in a sequence lasts several hundred milliseconds, and the length of STM is usually in seconds. Since we will use a single dual neuron to represent a symbol in a sequence for simplicity*, we use two time scales to model the fact that symbol transition in a sequence is much slower than neuronal state transition.
One time scale is for the interaction among symbols (called the symbol scale), here represented by dual neurons; another is for the interaction among neurons within each oscillator (called the neuron scale). Again to simplify the situation, we choose one step on the symbol scale, denoted as Δ, to equal r rhythmic periods of a dual neuron (namely Δ = rτ). From the previous section we see that STM can last many Δ's, hence many symbol presentations.

* One single neuron here should be viewed biologically as a neuron assembly.

Figure 2.1 a. Diagram of a dual neuron. b. Response of a dual neuron model, which maintains a signal for a certain memory span. The model parameters are: K = 8.3, T_12 = 6.5, T_21 = 10.0, τ = 12Δt, Δt = 0.1.

The idea of this sequence-detecting neuron model is that sequence learning polarizes the synaptic weights of the detecting neuron in such a way that these polarized weights form the maximal membrane potential when the learned sequence is presented (the maximization principle). Suppose that we have n dual neurons <N_11, N_21>, <N_12, N_22>, ..., <N_1n, N_2n>, where each N_1i connects to all the neurons N_1j (j ≠ i) in the network. The activities of neurons N_1i and N_2i are m_1i(t) and m_2i(t) respectively, and are defined as (cf. Eq. 2.1):

m_1i(t+Δt) = m_1i(t) + Δt [-K_1i m_1i(t) + T_12 m_2i(t+Δt-τ/2) + M·f(Σ_{j≠i} W_ij m_1j(t+Δt-Δ) + I_i(t) - Γ_i)],   if t mod Δ = 0
m_1i(t+Δt) = m_1i(t) + Δt [-K_1i m_1i(t) + T_12 m_2i(t+Δt-τ/2)],   otherwise     (2.2)

m_1i(t+Δt) = min(m_1i(t+Δt), B)     (2.3)

f(x) = 1 if x > 0; f(x) = 0 otherwise     (2.4)

m_2i(t+Δt) = m_2i(t) + Δt [-K_2i m_2i(t) + T_21 m_1i(t+Δt-τ/2)]     (2.5)

In (2.2), Γ_i is the threshold of N_1i for input other than I_i(t), and I_i(t) represents external input to N_1i. The condition in (2.2) that t mod Δ = 0 describes the time scale for the presentation of symbols. In real situations, symbols are presented continuously, and this is discretized by Δ in our model.
The corresponding m_1j(t) at these time instants (t mod Δ = 0) can be interpreted as the average values over the time interval Δ, or as the values at the end of each presentation of a symbol. W_ij is the synaptic weight from neuron N_1j to N_1i, and the summation in (2.2) represents interactions among dual neurons. M is a gain factor. Formula (2.3) stipulates a maximal activity B for N_1i. From now on, we say neuron N_1i is firing at time t if m_1i(t) = B. The formulation for N_2i is similar to that for N_1i, except that N_2i only receives input from N_1i. T_12 and T_21 are fixed weights between N_1i and N_2i, and they are the same for all n dual neurons. Activity m_1i(t) has graded values with maximum B. Once N_1i fires, this signal will oscillate between the dual neuron pair N_1i and N_2i with damping, until the signal totally vanishes if no further input can activate N_1i. Note that the formulas are such that non-firing neurons can still affect the state of other neurons. Synaptic learning follows a Hebbian learning rule (Hebb, 1949) for modification and a subsequent normalization (Malsburg, 1973). Again, learning takes place on the symbol scale (Δ), since only the weights of connections among oscillators are changed:

W_ij(t) = W_ij(t-Δ) + C_i f[m_1i(t) - B] m_1j(t)
W_ij(t) := W_ij(t) / Σ_{j≠i} W_ij(t)     (2.6)

where C_i is a gain factor of learning. Note that f[m_1i(t) - B] is 0 unless N_1i fires. In general, the larger C_i is, the faster learning is and the more easily the stored value is overwritten by a new stimulus, so the choice of C_i reflects a balance between learning speed and stability. According to (2.6), the effect of learning on any neuron is to change the distribution of all weights to that neuron, so it is reasonable to assume that initially W_ij = 1/(n-1). In contrast to the rhythmic signal damping of STM (Eq. 2.1), the synaptic modification defined in (2.6) is a form of long-term learning.
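The two steps of formula (2.6) can be sketched concretely. The toy example below is not the thesis code; the values of C_i and B and the decay ratio 0.65 are assumptions, and the presynaptic traces are taken as B·g((k-j)Δ) with an exponential g:

```python
# Sketch of the attentional learning rule (2.6): a Hebbian increment,
# applied only when the detecting neuron fires, followed by normalization.
# All numbers here (Ci, B, the decay ratio 0.65) are assumed for the toy.

def attentional_update(weights, inputs, detector_fires, Ci=0.4):
    """One application of (2.6); `inputs` are the presynaptic m_1j(t)."""
    if detector_fires:                      # the gating factor f[m_1i(t) - B]
        weights = [w + Ci * m for w, m in zip(weights, inputs)]
    total = sum(weights)
    return [w / total for w in weights]     # normalization (Malsburg, 1973)

B = 2.0
k = 4                                       # toy sequence of four symbols
w = [1.0 / k] * k                           # uniform initial weights
m = [B * 0.65 ** (k - 1 - j) for j in range(k)]   # traces B*g((k-j)Delta)
for _ in range(9):                          # nine presentations, each followed
    w = attentional_update(w, m, detector_fires=True)   # by detector activation
# the weights now mirror the decay profile: the most recently presented
# symbol (largest trace) ends up with the largest weight
```

Because of the normalization, only the relative sizes of the traces matter: the weight vector converges to a distribution proportional to the decay profile of the inputs.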
Synaptic weights acquired by this learning rule are never changed if the network is not stimulated by further external inputs. In the remainder, our discussion will be based on a "grandmother cell" representation, in which each neuron N_1i represents either one spatial pattern or a sequence detector, to simplify the understanding of sequence recognition, as in Dehaene et al. (1987) and Tank and Hopfield (1987). Suppose that we train neuron N_1i to detect a sequence S_i. This is done in our model by activating N_1i (setting I_i(t) at a high level) immediately after the presentation of S_i. The end of a sequence presentation is detected by an end detector in the system, which uses indications like pauses (implicit) or separators (explicit) between sequences. We call this specific type of training attentional learning. Attentional training is the general learning rule to be used in our model of temporal learning, not specific to this chapter. It is different from unsupervised learning, and is also different from typical supervised learning, where a desired output or a "sign" of the response (as in reinforcement learning) has to be provided externally. Triggering N_1i needs less information than providing an answer. The activation of a sequence detector at the end of presentation of the sequence may be driven by attention, which is indispensable for learning a sequence (Nissen & Bullemer, 1987; Cohen et al., 1990). We define S_i = p_i1-p_i2-...-p_ik (1 ≤ i_j ≤ n, j = 1, ..., k), where pattern p_ij fires (is represented by) neuron N_1ij. In this subsection we only consider S_i as a simple sequence. Since we are now only concerned with recognizing S_i, we can simplify the original fully connected network into one where N_1i is projected upon by N_1ij (j = 1, ..., k) and all other connections among dual neurons are left out, as shown in Fig. 2.2.
This simplification is possible because when N_1i is activated by the attentional rule, only the N_1ij (j = 1, ..., k) are active, and cross interference among the N_1ij's can be eliminated by setting their thresholds to a high value (see Eq. 2.2).

Figure 2.2 Simplified diagram of the neural model for recognition of the sequence S_i: p_i1-p_i2-...-p_ik.

Let us introduce a concept, the input potential of a sequence to its detector: the weighted sum of inputs to the detector immediately after presentation of the sequence. The concept of the input potential plays a key role in the model of sequence learning (see also the next chapter). In this chapter, we only consider the case that each symbol presentation lasts for the time interval Δ. Within this context, the input potential IP_i of S_i to N_1i (since dual neuron i is only for S_i) becomes the weighted sum to N_1i immediately after presentation of S_i (we use t' to indicate these time instants):

IP_i = Σ_{j=1}^{k} W_iij m_1ij(t') = B Σ_{j=1}^{k} W_iij g((k-j)Δ)     (2.7)

where g is as introduced earlier (Δ is a multiple of τ). This formula follows because when pattern p_ij is presented, N_1ij is activated and m_1ij(t) reaches the value B. Formulas (2.2) and (2.3) together guarantee that once N_1ij is activated, due to either an external input or a summed input from other neurons, its activity drops monotonically on the symbol time scale if further input to N_1ij cannot fire it. In this situation, indeed, further input to N_1ij cannot fire it, because (a) since S_i is a simple sequence, N_1ij can only be activated externally once, by pattern p_ij, during the presentation of S_i; and (b) outside the presentation time of p_ij, the summed input from the other neurons N_1il (l ≠ j) can never activate N_1ij, since this summed input cannot overcome the threshold of N_1ij, which can be chosen as a parameter, as pointed out before. At the completion of the presentation of the entire sequence, m_1ij(t) has dropped to B·g((k-j)Δ).
The external input to a dual neuron can only affect it through the binary gate f(x) in (2.2), and the dual neuron oscillates at its own pace if nothing further is gated in. The precise form of g is not important; the only thing that matters for our later analysis is that g is monotonically decreasing, which is satisfied in our model. In fact, g will become a linearly decreasing function in the next chapter, where varying intervals for symbol presentation are allowed.

Theorem 1. Repeated training with only S_i results in all weights to N_1i following the distribution W_iij = C g((k-j)Δ) (j = 1, ..., k), and all W_il = 0 (l ≠ i_1, ..., i_k), where C is a constant.

[Proof] According to (2.6), the synaptic weights to N_1i only change when N_1i is firing, which is immediately after a presentation of S_i due to the attentional learning rule. Let us denote W_iij by W^0_iij before any training, and by W^m_iij after the m-th presentation of S_i. It is easily verified that, after the m-th presentation of S_i, following (2.6) we have:

W^m_iij = [W^{m-1}_iij + C_i B g((k-j)Δ)] / Q     (2.8)

where Q = 1 + C_i B Σ_{j'=1}^{k} g((k-j')Δ). Since Q > 1, as m grows we get W^m_iij -> C g((k-j)Δ), where C = C_i B / (Q - 1). For all W_il, l ≠ i_1, ..., i_k, we have W^m_il = W^0_il / Q^m -> 0. Q.E.D.

Here we clearly see that the larger C_i is, the larger Q is, and the more rapidly the weight distribution converges.

Corollary 1. Repeated training with S_i results in

IP_i = B C Σ_{j=1}^{k} g²((k-j)Δ)     (2.9)

Note that IP_i depends only on the length of the sequence S_i.

Theorem 2. During repeated training with S_i, define

ΔIP^m_i = (IP_i after the m-th presentation of S_i) - (IP_i after the (m-1)-th presentation of S_i)
        = B Σ_{j=1}^{k} (W^m_iij - W^{m-1}_iij) g((k-j)Δ)

Then we have

ΔIP^m_i = ΔIP^1_i / Q^{m-1}     (2.10)

[Proof] According to formula (2.8),

ΔIP^m_i = B Σ_{j=1}^{k} (W^m_iij - W^{m-1}_iij) g((k-j)Δ)
        = B Σ_{j=1}^{k} { [W^{m-1}_iij + C_i B g((k-j)Δ)] / Q - W^{m-1}_iij } g((k-j)Δ)
        = (B/Q) Σ_{j=1}^{k} [ C_i B g((k-j)Δ) - (Q-1) W^{m-1}_iij ] g((k-j)Δ)

Applying (2.8) recursively to W^{m-1}_iij and using Q = 1 + C_i B Σ_{j=1}^{k} g((k-j)Δ), we obtain

ΔIP^m_i = (C_i B² / Q^m) [ Σ_{j=1}^{k} g²((k-j)Δ) - Σ_{j=1}^{k} g((k-j)Δ) · Σ_{j=1}^{k} W^0_iij g((k-j)Δ) ]     (2.11)
        = ΔIP^1_i / Q^{m-1}

Q.E.D.

Theorem 2 tells us that if the first training with S_i increases (or decreases) IP_i, then the input potential keeps increasing (or decreasing) in subsequent training.

Corollary 2. If repeated training with S_i begins from the initial state, i.e. W_ij = 1/(n-1), then after each training ΔIP^m_i > 0, which means IP_i increases monotonically with sequence training.

[Proof] From (2.11), with W^0_iij = 1/(n-1),

ΔIP^m_i = (C_i B² / Q^m) [ Σ_{j=1}^{k} g²((k-j)Δ) - (1/(n-1)) ( Σ_{j=1}^{k} g((k-j)Δ) )² ]
        ≥ (C_i B² / Q^m) [ Σ_{j=1}^{k} g²((k-j)Δ) - (1/k) ( Σ_{j=1}^{k} g((k-j)Δ) )² ]   since k ≤ n-1

Hence ΔIP^m_i > 0, because g is a positive, monotonically decreasing function, and because of the following fact*: if a_1 > a_2 > ... > a_n, then n Σ_{i=1}^{n} a²_i > ( Σ_{i=1}^{n} a_i )². Q.E.D.

Corollary 1 plus Corollary 2 gives us the insight to build a model for temporal sequence learning. If we choose Γ_i in (2.2) as the input potential expressed in (2.9), i.e., Γ_i = B C Σ_{j=1}^{k} g²((k-j)Δ) (within a certain small error ε), then the result of training with S_i is to build up IP_i in order to fire N_1i by the presentation of S_i alone. In other words, after a certain number of training trials, a presentation of S_i alone will fire N_1i, and neuron N_1i will recognize sequence S_i. We say that neuron N_1i has learned the sequence S_i if the presentation of S_i can activate this neuron, which it could not do before training.

* This is easily proven using mathematical induction and the Cauchy-Schwarz inequality introduced later on.
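The convergence described by Theorem 1, Theorem 2 and Corollary 2 can be checked numerically. The sketch below uses illustrative values only (n = 10, k = 5, B = 2.0, C_i = 0.4, and an exponential g with ratio 0.65 per Δ; none of this is the thesis simulation): it iterates the weight update of formula (2.8) and confirms that the input potential of formula (2.7) rises monotonically toward the threshold value of formula (2.9):

```python
# Numerical check of Corollary 2 under assumed values: n = 10 neurons,
# sequence length k = 5, B = 2.0, Ci = 0.4, and an exponential decay
# g((k-j)Delta) = 0.65**(k-j). This is not the thesis code.
n, k, B, Ci = 10, 5, 2.0, 0.4
g = [0.65 ** (k - j) for j in range(1, k + 1)]    # g((k-j)Delta), j = 1..k
Q = 1 + Ci * B * sum(g)                           # normalization factor of (2.8)

w = [1.0 / (n - 1)] * k                           # initial weights 1/(n-1)
ips = []
for _ in range(12):                               # twelve presentations of S_i
    w = [(wj + Ci * B * gj) / Q for wj, gj in zip(w, g)]      # formula (2.8)
    ips.append(B * sum(wj * gj for wj, gj in zip(w, g)))      # formula (2.7)

C = Ci * B / (Q - 1)
ip_limit = B * C * sum(gj * gj for gj in g)       # the threshold of (2.9)
# ips rises monotonically and approaches ip_limit from below
```

The monotone rise toward a saturation value is exactly the behavior plotted in Fig. 2.3a, and ip_limit is the quantity the system would adopt as the detector's threshold Γ_i.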
The value Γ_i can be set up during the first training, since from (2.7) and (2.9) we have Γ_i = B C Σ_{j=1}^{k} g²((k-j)Δ) = (C/B) Σ_{j=1}^{k} m²_1ij(t'); namely, we can avoid using g((k-j)Δ) by looking at the corresponding membrane potentials at t'. Conversely, note the interesting fact that Γ_i can be set purely on the basis of the length of the sequence. Figure 2.3 shows a computer simulation of the sequence-detecting neuron model. In Fig. 2.3a, we show a curve which reflects the increase of IP_i of a sequence-detecting neuron with the number of trainings of sequence S_2: A-B-C-D-E. The value Γ_i is set by the system to 1.3634 (where ε is chosen as 0.001), and after the 9th training IP_i goes above this threshold, so the following presentation of the sequence alone activates N_1i without the training input I_i. Fig. 2.3b shows the corresponding simulation. The following theorem guarantees that after a sequence is learned by N_1i, only the learned sequence can activate N_1i, i.e., N_1i does not make any mistake in recognizing the learned sequence.

Theorem 3. After neuron N_1i has learned sequence S_i, the presentation of S_i induces the maximal postsynaptic potential.

[Proof] According to Theorem 1, after N_1i has learned sequence S_i, we have W_iij = C g((k-j)Δ) (j = 1, ..., k) and W_il = 0 (l ≠ i_1, ..., i_k). When any sequence S_a (which could be a complex sequence) different from S_i is presented, the induced membrane potential of N_1i is

Figure 2.3 a. The curve of input potential with number of training trials. b. Training for sequence recognition. Ten dual neurons have been modeled, and the temporal activities of N_11, N_12, ..., N_19, and N_1,10 are displayed in the figure from top to bottom. Note that in each case, a dual neuron must fire to first achieve a non-zero membrane potential, but thereafter activity decays according to the curve g (as shown in Fig. 2.1b on an expanded time scale). A symbol indicates which pattern the corresponding neuron represents.
The sequence to be detected is S_2: A-B-C-D-E. During each training cycle, the sequence is presented, followed by an activation of the sequence-detecting neuron S. Each training cycle is followed by a test cycle, during which the sequence is presented alone, i.e. without a following activation of S, in order to see whether neuron S can be activated by the sequence. After 9 trainings, the sequence-detecting neuron S can be activated by another presentation of the sequence. The parameters are: M = 20, B = 2.0, C_i = 0.4, T_12 = 6.5, T_21 = 10.0, Γ_10 = 1.3634, K_1i = K_2i = 5.0, τ = 6Δt, Δ = τ, Δt = 0.1.

[Figure 2.3a: the input potential (IP) increases monotonically with the number of trainings with sequence A-B-C-D-E, from about 0.3 to above 1.34. Figure 2.3b: activity traces of the neurons representing A, B, C, D, E and of the detector S over alternating training and test cycles.]

Σ_{j≠i} W_ij m_1j(t) = Σ_{j=1}^{k} W_iij m_1ij(t) = C Σ_{j=1}^{k} g((k-j)Δ) m_1ij(t)

Since only one pattern can be firing at any time (due to the nature of a sequence) and the biggest signal is B (see Eq. 2.3), at most the m_1ij(t) (j = 1, ..., k) are a permutation of B, Bg(Δ), ..., Bg((k-1)Δ) (in the real situation, some of them may be missing). Hence,

Σ_{j≠i} W_ij m_1j(t) ≤ C B Σ_{j=1}^{k} g((k-j)Δ) g(p_j Δ)

where p_1 p_2 ... p_k is a certain permutation of 0 1 2 ... k-1 (not the identity). Based on the Cauchy-Schwarz inequality, that is, if a_1, a_2, ..., a_k > 0 then

Σ_{j=1}^{k} a²_j ≥ Σ_{j=1}^{k} a_j a_{p_j}

where p_1 p_2 ... p_k is a permutation of 1 2 3 ... k, we have

Σ_{j≠i} W_ij m_1j(t) ≤ C B Σ_{j=1}^{k} g((k-j)Δ) g(p_j Δ) < C B Σ_{j=1}^{k} g²((k-j)Δ) = Γ_i

Recall that Γ_i equals the membrane potential induced by the presentation of S_i. Therefore, the theorem is proven. Q.E.D.

This theorem realizes the maximization principle: repeated training with a sequence polarizes the weights of the corresponding detecting neuron so that it can only be triggered by this specific sequence. Unlike many other models, this model genuinely stores sequences rather than transition dyads.

Recognition of complex temporal sequences

In the previous sections, we proposed the dual neuron model to implement STM; the property used from the STM model is essentially an exponentially decaying membrane potential within the STM period. To concentrate on the problem of complex sequence learning, in the following we explicitly incorporate an exponential decay into the model to simulate STM. We thus replace the dual neuron <N_1i, N_2i> by a single neuron N_i. The corresponding neural implementation by dual neurons can be done in the same way as in the previous section. There is a problem with the above sequence-detecting neuron model if we apply it to arbitrary sequence detection. When a complex sequence, like S_3: A-B-A-C-A-B-E-B-D, is presented to the previous model, described in (2.2)-(2.6), the later presentation of a recurring pattern will overwrite the signal of its previous presentation maintained in STM. That is, the sequence-detecting neuron can only detect the last presence of a recurring pattern. To solve the overwriting problem, we introduce multiple synapses between two neurons, each of which corresponds to one occurrence within the temporal summation period. The idea is that we replace neuron N_j by an expanded network, as shown in Fig. 2.4a, whose terminal P_jr remembers the trace of the r-th most recent impulse generated by N_j.
Thus, the P_jr operate like a stack: a new impulse generated by N_j pushes the whole array P_jr down by one place and imprints itself on the first terminal (see the definition below). Every P_jr is decremented each cycle Δ. This mechanism for complex sequence recognition is preserved in the next chapter with varying intervals. In the following model, the membrane potential and the output of neuron N_i at time t are m_i(t) and S_i(t) respectively.

$$P_{jr}(t) = \begin{cases} (1-a)\,P_{jr}(t-\Delta) & \text{if } S_j(t) = 0 \\ 1 & \text{if } S_j(t) = 1 \text{ and } r = 1 \\ (1-a)\,P_{j,r-1}(t-\Delta) & \text{if } S_j(t) = 1 \text{ and } r > 1 \end{cases} \tag{2.12}$$

$$m_i(t+\Delta) = \sum_{j \ne i} \sum_{r=1}^{c} W_{ij}^r\, P_{jr}(t) + I_i(t) \tag{2.13}$$

$$S_i(t) = f(m_i(t) - \Gamma_i) \tag{2.14}$$

$$\tilde{W}_{ij}^r(t) = W_{ij}^r(t-\Delta) + C_l\, S_i(t)\, P_{jr}(t), \qquad W_{ij}^r(t) = \tilde{W}_{ij}^r(t) \Big/ \sum_{j \ne i} \sum_{r=1}^{c} \tilde{W}_{ij}^r(t) \tag{2.15}$$

where a is the decay parameter of P_jr(t), playing a role similar to the g(x) introduced earlier, and c represents the number of terminals of each neuron N_j. W_ij^r is the weight of the synapse that the r-th terminal of N_j makes on N_i. It is sufficient to set c to the maximal number of occurrences of a symbol in a sequence; e.g. c = 3 is sufficient to recognize S_3. The choice of c, the number of terminals each neuron has, limits the number of occurrences of the same symbol within a temporal sequence. Synaptic modification is defined in the same way as before, except that we have c synapses between two neurons instead of one. Due to the normalization in (2.15), weights are set initially to W_ij^r(t) = 1/[c(n-1)].

The maximization principle applies similarly to this model for complex sequence recognition. The conclusions from the previous section, i.e. Theorems 1, 2, and 3 and Corollaries 1 and 2, are also established in this model. In particular, we have

Theorem 4. After neuron N_i has learned any complex sequence S_i, the presentation of S_i induces the maximal postsynaptic potential in N_i.
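The stack-like terminal update of Eq. (2.12) can be sketched in a few lines of Python. This is an illustrative sketch, not the dissertation's code; the function name and the three-terminal example values are ours, with the decay rate a = 0.4 borrowed from the simulation parameters.

```python
def update_terminals(P, fired, a):
    """One step of Eq. (2.12): P[r] holds the trace of the (r+1)-th most
    recent impulse of this unit; 'fired' is S_j(t); 'a' is the decay rate."""
    if fired:
        # push: the new impulse imprints on the first terminal, and each
        # older trace moves down one place, decayed by (1 - a)
        return [1.0] + [(1 - a) * p for p in P[:-1]]
    # silent cycle: every terminal just decays
    return [(1 - a) * p for p in P]

# Illustrative run with c = 3 terminals.
P = [0.0, 0.0, 0.0]
P = update_terminals(P, True, 0.4)   # first impulse: [1.0, 0.0, 0.0]
P = update_terminals(P, False, 0.4)  # decay only:    [0.6, 0.0, 0.0]
P = update_terminals(P, True, 0.4)   # second impulse pushes the first down
assert P[0] == 1.0 and abs(P[1] - 0.36) < 1e-9
```

The second occurrence thus never overwrites the first: both traces coexist on separate terminals, which is exactly what the single-trace model of (2.2)-(2.6) could not do.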
If we set Γ_i in the same way as in the previous section, this theorem guarantees that system (2.12)-(2.15) can learn and recognize without mistake any complex temporal sequence whose components repeat no more than c times in the sequence. Fig. 2.4b shows a simulation of learning and recognizing sequence S_3.

This neural model can be used directly for the recognition of temporal sequences which contain distortions. This can be achieved in the following two steps:

1° Lowering the threshold of each sequence-detecting neuron. The previous threshold setting (cf. Eq. 2.9) is only appropriate in the absence of distortions. In order to tolerate distortions, we need to lower the threshold a little, such that if the current sequence induces a membrane potential close to that induced by the learned sequence, the corresponding neuron will fire. Thus a sequence-detecting neuron can be fired by a set of sequences close to the learned sequence.

2° In case 1°, a currently presented sequence may activate more than one detecting neuron, which is not desirable. To avoid this situation, we can feed the signals of all the firing neurons (if any) to a competitive neural network (winner-take-all network; see among others Amari & Arbib, 1977; Rumelhart & Zipser, 1986). This competitive network ensures that only the neuron which is triggered and has the biggest signal remains activated.

Figure 2.4 a. Diagram of an expanded neuron model for complex sequence recognition. b. Recognition of the complex sequence S_3: A-B-A-C-A-B-E-B-D. Ten neurons have been modeled, and the temporal activities of N_1, N_2, ..., and N_10 are displayed in the figure from top to bottom. During each cycle of training, a peak of activity indicates the activation of the corresponding neuron. Thus, for example, the trace for A has 3 peaks, and the trace for C has 1 peak between the second and third peaks of A.
Each training cycle is followed by a test cycle during which the sequence-detecting neuron is not activated externally. After 6 trainings, the sequence-detecting neuron S can be activated by the presentation of S_3 alone. The last column tests whether the detecting neuron can be activated by another sequence, A-C-A-C-D-B-E-D-B. The parameters are: c = 5, a = 0.4, C_i = 2.0 (i = 1, ..., 10), Γ_10 = 0.629, Δ = 1.

[Figure 2.4b trace plot: alternating training and test cycles, ending with two test cycles.]

With this extension, the model can serve as a general sequence recognizer. Fig. 2.5 shows the whole system architecture. In the input layer each neuron represents a spatial pattern. The connection from the input layer to the recognizer layer is an all-to-one correspondence. For clarity only one neuron is shown in the recognizer layer. The connection from the recognizer layer to the competition layer is a one-to-one correspondence. Note that this discussion also applies to the model of general sequence learning presented in the next chapter.

Figure 2.5 General architecture for temporal sequence recognition: an input layer with STM, a sequence recognizer layer, and a winner-take-all layer. For explanation see text.

II.3 Temporal Sequence Reproduction

Temporal sequence reproduction is a different, and somewhat more difficult, issue than temporal sequence recognition. If the sequence in question is a simple sequence, reproduction becomes much easier because we only need to store the transitions between each two consecutive patterns. This, as mentioned in Section II.1, has been achieved by many authors. However, the real difficulty lies in the reproduction of complex sequences, where a correct transition to a pattern is determined by its context, not simply by one previous pattern.

The model for sequence recognition can be borrowed for sequence reproduction. In the previous models, each neuron in the input layer represents a single spatial pattern.
But a neuron can also be considered as a sequence detector. It appears that if we make a neuron function as a detector of the part of a sequence before the pattern that this neuron represents, we can readily realize sequence reproduction. This idea has the attractive feature that sequence training for reproduction is nothing but a simple sequence presentation, unlike sequence recognition where we need to deliberately activate (teach) the sequence-detecting neuron for each sequence detector. However, it has some problems for general complex sequence reproduction. First, a self-reference problem occurs when the transition to a pattern depends on a context which contains the pattern itself, like pattern A in the 4-degree sequence A-B-A-C-A-B-A-D. Secondly, since we usually choose a memory span equal to or a little larger than the degree of the sequence (we cannot expect a memory span of the same length as the sequence itself), a neuron must be able to be activated by different subsequences. For example, in the 2-degree sequence A-B-C-A-D-E-A-F-G-A-H-I, pattern A can be transitioned to from B-C, D-E, and F-G. This violates the previous hypothesis that one neuron can only be activated by one sequence. We call this the multiple reference problem. These two problems can be solved with a direct extension if the sequences to be reproduced are no more complex than first-order complex sequences.

Figure 2.6 Network architecture for temporal sequence reproduction, with a sequence detector layer (layer d) above an input layer with STM (layer p). At the beginning, the connections between layer p and layer d are an all-to-all correspondence. The appropriate connection pattern for reproduction emerges after repetitive training with a temporal sequence.

The idea proposed here separates the neurons for detection from those standing for spatial patterns.
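The multiple reference problem above can be made concrete with a short script that enumerates the degree-2 contexts of the example sequence (hypothetical helper code; the variable names are ours):

```python
# Degree-2 contexts of the 2-degree sequence A-B-C-A-D-E-A-F-G-A-H-I.
seq = list("ABCADEAFGAHI")
contexts = {}
for i in range(2, len(seq)):
    # map each length-2 context onto the symbol that follows it
    contexts.setdefault(tuple(seq[i - 2:i]), set()).add(seq[i])

# Pattern A is reached from three different length-2 contexts ...
preds_of_A = sorted(ctx for ctx, nxt in contexts.items() if "A" in nxt)
assert preds_of_A == [("B", "C"), ("D", "E"), ("F", "G")]
# ... so a single "one neuron, one sequence" detector cannot cover them
# all, which motivates a dedicated detector layer.
```

Each context here is unambiguous (it predicts exactly one successor), yet three distinct contexts all demand activation of A, which is precisely the conflict with the one-neuron-one-sequence hypothesis.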
An additional layer of neurons, called layer d, comprises sequence detectors which follow the model for complex sequence recognition presented previously. This layer is connected bidirectionally to the original layer of spatial patterns (called layer p) such that a sequence detector can activate the appropriate next spatial pattern. The connection configuration is shown in Figure 2.6. This paradigm is also applicable to the situation without the restriction of a fixed interval, as we shall see in Chapter 3. The number of neurons in layer d must not be less than the length of the sequence minus the degree of the sequence.

Let η be the degree of the sequence. During training, a sequence is presented to layer p. After the first η-1 patterns have been presented in layer p, a neuron in layer d is randomly chosen (but fixed in successive trainings*) to fire synchronously with each presentation of a spatial pattern in layer p. In Chapter 3, a more elaborate training scheme will be provided. The recurrent connections from layer d to p are set up according to the Hebbian rule, i.e. whenever there is a neuron N_i^d firing in d and N_{p_j} firing in p, a synaptic link is established from N_i^d to N_{p_j}.

This training process is repeated several times until each neuron in layer d has learned to recognize a specific subsequence. Later reproduction of the sequence is done by presenting its initial context to layer p. Since our sequence detection model can detect any complex sequence, this model can reproduce any complex sequence. As an example, we have trained this 3-layer network with the 4-degree higher-order complex sequence S_4: A-B-A-C-D-A-B-A-E-F-A-B-A-G-H-A-B-A-I-J. During training, sequence S_4 is presented to layer p. After the first 3 patterns have been presented in layer p, a neuron in layer d is chosen to fire synchronously with each presentation of a spatial pattern in layer p.
After 6 trainings, the network can reproduce the whole sequence when presented with A-B-A-C, the initial context. Fig. 2.7 shows this process, which includes the last 3 trainings and the reproduction.

* The random choice of layer-d neurons could be done with a style of self-organization demonstrated by Malsburg (1973) and Kohonen (1990). For simplicity, the current system simply "remembers" the initial choice.

[Figure 2.7 trace plot: three training cycles followed by a reproduction cycle.]

Figure 2.7 Reproduction of a 4-degree higher-order complex sequence S_4: A-B-A-C-D-A-B-A-E-F-A-B-A-G-H-A-B-A-I-J. The model is trained with this sequence 6 times and the last 3 trainings are shown in the figure. The entire sequence is reproduced by the model upon presentation of S_4's initial context: A-B-A-C.

II.4 Conclusion

Two ideas are central to this neural model of temporal sequence learning: short-term memory and sequence sensitivity. In order to link two temporally discontiguous patterns, the earlier pattern has to be preserved for a certain period of time, which is assumed to be achieved by STM in our model. Our results suggest an important computational function of STM, namely that STM could lay a basis for temporal sequence learning. This idea is carried further in the next chapter for dealing with varying intervals of symbol presentation. In addition, STM puts a direct restriction on the degree of a primitive sequence to be learned. Hierarchical methods may be required for learning sequences with degrees greater than the STM capacity, as suggested by the chunking theory (Miller, 1956). A recurring subsequence may be viewed as a single unit, which could lessen the STM load significantly. A version of hierarchical sequence recognition will be explored in the next chapter.
The second key idea is sequence-sensitive training. We find that, following the attentional learning rule, the input potential to a neuron increases monotonically (cf. Corollary 2). This idea underlies the recognition as well as the reproduction of complex temporal sequences. In Chapter 3, we will discuss some of the biological implications of the attentional learning rule introduced here. In the model, the transition from a previous subsequence to a current pattern is not stored explicitly anywhere in the network, as it is in many other modeling efforts. Instead it is reflected by the distribution of weights onto a sequence-detector neuron, and this distribution maximizes the input potential of the detector neuron upon presentation of the previous subsequence. One important feature of this model is that the length of the previous subsequence (or the degree of a complex sequence in general) does not affect the performance of sequence learning, whereas it could cause severe problems for many other previously proposed models. The training turns a neuron from sequence-insensitive to sequence-sensitive, like order emerging from chaos. A graded input to a synapse can be viewed as a firing rate of impulses or a graded potential, and the Hebbian rule and the normalization rule are both biologically plausible. So this sequence-sensitive training is consistent with basic principles of known neurobiology.

The model aims at dealing with complex temporal sequences directly, not only because they are indispensable for real applications like speech recognition, music generation, etc., but also because they pose critical problems for previous models of temporal sequence learning. This model provides a general solution to this problem, both for sequence recognition and sequence reproduction, without incurring significant extra computational expense (see Section III.6).
At the same time, the problem of ad hoc wiring for temporal coupling, present for example in the outstar avalanche model and the selection model, has been avoided here. With these achievements, we now turn to the next chapter, where the general theory of temporal order processing is presented with a particular treatment of time intervals and hierarchical organization of arbitrary sequences.

CHAPTER III

A THEORY OF TEMPORAL ORDER LEARNING: TIMING AND CHUNKING

Summary

With the ground laid in the previous chapter, we present a computational theory of the learning, recognition and reproduction of temporal sequences. This model is based on an interference theory of forgetting in short-term memory (STM), using a network of neural units with mutual inhibition. The STM model provides information for recognition and reproduction of arbitrary temporal sequences. Sequences are acquired by the attentional learning rule proposed before. Acquired sequences can be recognized without being affected by the speed of presentation or by certain distortions in symbol forms. Different layers of the STM model can naturally be constructed in a feedforward manner to recognize hierarchical sequences, significantly expanding the model's capability in a way similar to human information chunking. Also presented is a model of sequence reproduction consisting of two reciprocally connected networks, one of which behaves as a sequence recognizer. Reproduction of complex sequences can maintain the interval lengths of sequence components. A mechanism of degree self-organization based on a global inhibitor is proposed for the model to optimally learn the context lengths required to disambiguate associations in complex sequence reproduction. Certain implications of the model are discussed at the end of the chapter.

III.1 Introduction

In the last chapter, we proposed a new mechanism for learning temporal sequences.
We modeled short-term memory (STM) by units comprising recurrent excitatory connections between two local neuron populations, in which each population is represented by a single quantity corresponding to a local field potential. The activity induced by an input signal to a unit oscillates with damping, thus decaying over time. Using the attentional learning rule, neural networks with this model of STM are able to learn complex temporal sequences, recognize these sequences with tolerance to certain distortions in form, and reproduce them. What distinguishes our model from others (see Section II.1) are two basic hypotheses embodied in the model: (1) there is a common mechanism to process both complex sequences and simple sequences; and (2) reproduction of a component in a sequence is based on recognition of the context of the component.

However, the previous chapter could only deal with a specific case of temporal order processing, in which each symbol presentation has a fixed interval, because STM was modeled by a decay with a fixed temporal course. In other words, the time-warp problem remained to be solved. The time-warp problem poses different requirements for sequence recognition and reproduction. For sequence recognition, we wish a network to recognize a time-warped sequence, whereas for reproduction we wish a network to reproduce a sequence with the same temporal course as the learned sequence. We attempt to solve the time-warp problem in this chapter.

In the previous case, all context recognizers use the same degree, which must not be less than the degree of the entire sequence to be recalled. This requirement is replaced in this chapter by a dynamic tuning mechanism whereby each recognizer learns during training the degree necessary for unambiguously producing the next symbol. We also propose a mechanism for hierarchical sequence recognition, similar to human information chunking.
This mechanism seems both natural and necessary for processing long sequences, like a paragraph of sentences, a piece of music, and so forth. All these together form a general theory of temporal order processing. Earlier versions of this chapter appear in Wang and Arbib (1991c; 1991d).

In the remaining part of this chapter, Section 2 presents a new model of STM. In Section 3, a solution to the time-warp problem is proposed for general sequence recognition. In Section 4, a proposal for hierarchical sequence recognition is provided. Complex sequence reproduction with various time intervals for individual components is approached in Section 5. In Section 6, the present model is compared with others and certain theoretical implications are discussed. Finally, Section 7 concludes the first part of this thesis.

III.2 An Interference Model of STM

It has been found in the study of memorizing nonsense syllables that each syllable in the series has links not only to adjacent words in the series, but also to remote words (Lashley, 1951). In order to link two temporally discontiguous patterns, the previous one has to be memorized until the latter one occurs. This typical short-term memory phenomenon lays an important basis for temporal order processing. A model of STM adequate for temporal processing must provide the following four basic functions:

(1) Maintaining a symbol for a short time period. How long can an item be retained? Peterson and Peterson (1959) found that the probability of a correct recall declined rapidly over an 18-second period when subjects were asked to perform a distracting task to prevent rehearsal. What causes forgetting? The two dominant views are decay versus interference (Reed, 1982). An interference theory proposes that memory for other material, or the performance of another task, interferes with memory and thus causes forgetting.
A decay theory, on the other hand, proposes that forgetting occurs even if the subject does nothing over the retention interval, as long as the subject does not rehearse the material.

(2) Maintaining a number of symbols. Miller (1956) tells us that the capacity of STM is only about seven symbols, but suggests that recoding information to form chunks can help overcome this limitation.

(3) Coding the order of input symbols. Given that STM can hold several items simultaneously, the order in which these items enter STM must also be coded in some way. It has been observed that subjects engage in linear scanning when judging whether a test symbol is contained in a short memorized sequence (Sternberg, 1966). However, it remains unknown how order is coded in STM.

(4) Coding the length of presentation of each symbol. When one learns a sequence, one can recognize it even though each component of the sequence is presented at a different speed. Yet a professional musician can recall a multiple-page score, reproducing almost exactly the memorized length of each note, although each note may last a different time. Since STM is an interface between input symbols and long-term memory (LTM), STM must be able to code the length of each held symbol. This function of STM provides first-level information for solving the time-warp problem.

The STM model introduced previously conforms with the decay theory of forgetting, since the activity of a unit, when stimulated, oscillates and decays over time. The order of input symbols is coded by the different amplitudes of the unit activities elicited by these symbols, because of decay over the different times since presentation. However, the number of items the STM model can hold varies with the length of the presentation interval of each symbol: the longer each presentation takes, the fewer items can be held in STM.
Furthermore, the model cannot code the length of each symbol presentation, and therefore it cannot provide a solution to the time-warp problem.

Waugh and Norman (1965) tested whether the loss of information from STM is caused primarily by decay or by interference, and reported that interference, rather than decay, is the primary cause of forgetting. The current majority view seems to weight interference more heavily than decay: although some decay may occur, the amount of forgetting caused by decay is substantially less than the amount caused by interference (Reed, 1982; Murdock, 1987). The computational model of STM we will now describe is based on the interference theory.

Let unit i represent the ith local neuronal population (i = 1, 2, ..., n), the building block of this STM model, and x_i its excitation level. Each unit receives an external input E_i, which is 1 so long as the external input is on and 0 otherwise, and is inhibited by all the other units, as shown in Fig. 3.1. Two further quantities are associated with unit i: the internal state, s_i, which signals activation of the unit and provides inhibition to the other units; and the excitation level x_i, which provides a decaying memory trace and is used in the learning rules of Section 3. The internal state is defined as

$$s_i(t) = \begin{cases} 1 & \text{if } E_i(t) = 1 \text{ and } E_i(t-1) = 0 \\ 0 & \text{otherwise} \end{cases} \tag{3.1}$$

From the definition we can see that the internal state is a "newness" signal activated only by the beginning of an external input. Detection of the beginning part of an external input can be neurally implemented with a threshold and adaptation of the external input.

Figure 3.1 Diagram of the STM model. Unit i receives external input E_i as well as inhibition from all other units in the model. The figure shows the outgoing projections from unit i. Minus signs indicate inhibition.

The excitation level of unit i lies in the range {0, 1, ..., T}, and is defined as

$$x_i(t) = \begin{cases} T & \text{if } s_i(t) = 1 \\ x_i(t-1) - 1 & \text{if } x_i(t-1) > 0 \text{ and } y_i(t) = 1 \\ x_i(t-1) & \text{otherwise} \end{cases} \tag{3.2}$$

where y_i represents the overall inhibition that unit i receives from the other units, formulated as

$$y_i(t) = f\Big(\sum_{j \ne i} s_j(t-1)\Big) \tag{3.3}^*$$

with f(x) = x if x > 0 and 0 otherwise, as defined in (2.4). Thus y_i(t) = 1 iff an external input is applied to any unit other than unit i. From the above definitions we see that whenever s_i(t) = 1, x_i(t) is brought to its highest value T and unit i is activated, or triggered. If any of the units is activated, the inhibition that it exerts on the rest of the network drives all other active units, i.e. those whose excitation levels are larger than 0, down to the next lower level.

Now let us see how this model of STM satisfies the above four requirements. Firstly, this model preserves a symbol, or an information item, on a unit whose excitation level codes the item. Let us assume that external inputs arrive at STM serially (it is easy to serialize simultaneous inputs with a competitive network; see among others Didday, 1970; Grossberg, 1976; Amari & Arbib, 1977; Rumelhart & Zipser, 1986). Any new item input to STM decrements the excitation levels of all active units in STM. Therefore STM can code at most T items. T is a system constant equivalent to the capacity of the STM model, and the suggestion is that T is about 7±2 in humans (Miller, 1956). Variability in capacity may be attributed to individual differences and to the different types of materials to be memorized. If we consider, for the time being, only the case where all items in STM are different, then a symbol can be maintained in STM from the time it is input until the Tth subsequent item is entered, conforming with the interference theory. Secondly, T symbols can be preserved simultaneously in the model. Thirdly, the order of input symbols is coded by the excitation levels of the units that represent the symbols.
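The update rules (3.1)-(3.3) can be sketched as follows. This is an illustrative sketch under the text's assumption of serial inputs, with one call per symbol onset; presentation length is irrelevant because only onsets change anything. The function name is ours.

```python
T = 7  # STM capacity (the text suggests about 7 +/- 2 in humans)

def stm_step(x, new_input):
    """One symbol onset in the interference STM model: the unit receiving a
    new input jumps to level T (Eqs. 3.1-3.2), and inhibition (Eq. 3.3)
    pushes every other active unit down one level."""
    x = [xi - 1 if xi > 0 and i != new_input else xi for i, xi in enumerate(x)]
    x[new_input] = T
    return x

# Present A, B, C (units 0, 1, 2): excitation levels code the serial order.
x = [0] * 5
for unit in (0, 1, 2):
    x = stm_step(x, unit)
assert x == [T - 2, T - 1, T, 0, 0]   # older symbols sit at lower levels
```

Note that forgetting here is purely interference-driven: with no further inputs, the levels [T-2, T-1, T] would persist indefinitely, and a symbol is lost only after T subsequent items have been entered.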
The larger the excitation level of a unit, the more recent is the symbol represented by the unit. Since all inputs to the STM model are serial, there is a strict temporal order among all symbols held in the model. Finally, the length of a symbol's presentation is reflected by the time period during which the corresponding external input is on; its coding mechanism will be given in Section 5. In conclusion, the above simple formal model is capable of coding the four necessary functions of STM. Possible neural circuitries for implementing units, i.e. local neuronal populations, will be discussed in Section 6. We will see in the following sections how the information carried in the model is used for the processing of temporal order.

* Since the weights of the inhibitory connections are all the same, the mutual inhibitory connections could be replaced by a global inhibitor. A global inhibitor reduces the number of connections by an order of magnitude, but results in a less reliable system due to the centralization of information in the inhibitor.

III.3 Sequence Recognition with Interval Invariance

The following model for general sequence recognition is based on the above STM model, and the learning algorithm is the same as that used in the previous chapter. The major focus of this section is to solve the time-warp problem: sequence recognition should not be affected by varying presentation intervals of the individual components of a sequence, a property we call interval invariance.

Simple sequence recognition

Before we propose a solution for general sequence learning, it helps to elucidate the basic ideas by presenting a model for simple sequence recognition. Suppose that an extra unit 0, called a detector, is to be trained to recognize a simple sequence S_0. The detector unit receives projections from the n units of the STM model (cf. Fig. 2.2), and its output s_0 is formed as

$$s_0(t) = f\Big(\sum_{i=1}^{n} W_{0i}\, x_i(t-1) + I_0(t) - \Gamma_0\Big) \tag{3.4}$$
Modification of connection weights follows a Hebbian rule and a later normalization as follows where C z is a gain factor of learning. The larger is C j , the faster is learning and the more easily is the memory value overwritten by a new stimulus. The effect of learning on the detector is to change the distribution of all weights to that unit, so it is reasonable to assume that initially Wqi = l/n. Let Sq = Poj-Po2- - -POf-- -PO j(;’ 1 - - n• W ithout losing information, we suppose that pattern pg. fires (is represented by) unit Oj. Since Sq is a simple sequence, Oj * Oj if i * j. Our purpose is to train the detector to recognize Sq with interval invariance. The training is done by presenting Sq to the model and activating unit 0, i.e. setting Eq(t) to a high level, immediately after the presentation of Sq. This is the attentional learning rule defined in Section II.2. In this learning paradigm, we say unit 0 is attended when E0 (t) is set by the system to 1. During training of Sq, each presentation is allowed to vary its speed. That is, any Pq. can have a different presentation interval from that of any other component of Sq of the same presentation trial, and even from that of the same pq. on a different presentation trial. If the detector can be activated by presentation of Sq but not by any other sequence, n (3.5) W 0I(!) = w 0 i<t) ! X W0 i{t) 1=1 52 we say that it has learned to recognize the sequence. Of course, for recognition to be interval invariant, after learning the detector should also be activated by the same Sq with a presentation speed different from any used in training. As defined in Section II.2, the input p oten tial IPq of Sq to the detector is the weighted sum to the unit at time t' immediately after the presentation of Sq. That is since (according to Eq.3.2), X q . ( 1 ) is set to T by input P q . , and decrements only when a new input is received. 
Equation (3.6) is the same as Eq. 2.7, except that the function g(l) there is here instantiated to a linearly decreasing function. The formal analysis of the previous chapter applies as long as the function g(l) is monotonically decreasing. Therefore all the relevant theorems and corollaries are also established in this model; they are summarized below without proof.

1° Repeated training with S_0 leads all the weights to unit 0 to the distribution

$$W_{00_i} = \frac{2(T-K+i)}{K(2T-K+1)}, \qquad W_{0j} = 0 \;\text{ for } j \ne 0_1, ..., 0_K.$$

2° Repeated training with S_0 leads to

$$IP_0 = \sum_{i=1}^{K} (T-K+i)\, W_{00_i} = \frac{2}{K(2T-K+1)} \sum_{i=1}^{K} (T-K+i)^2 \tag{3.7}$$

where IP_0 depends only on the length, K, of the sequence.

3° Define ΔIP_0^m as the IP_0 after the mth presentation of S_0 minus the IP_0 after the (m-1)th presentation of S_0. Then

$$\Delta IP_0^m = \frac{\Delta IP_0^{m-1}}{Q}, \qquad 2 \le m \tag{3.8}$$

where Q = 1 + C_l K(2T-K+1)/2. Furthermore, if repeated training with S_0 begins with the initial condition, i.e. W_0i = 1/n, then after the first training ΔIP_0 > 0. Therefore, based on (3.8), ΔIP_0 > 0 after each training. In other words, IP_0 increases monotonically with sequence training.

The above conclusions imply that if we set Γ_0 in (3.4) to the input potential expressed in (3.7), i.e.

$$\Gamma_0 = \frac{2}{K(2T-K+1)} \sum_{i=1}^{K} (T-K+i)^2 \tag{3.9}$$

then the result of training is to build up IP_0 so as to fire the detector upon presentation of S_0. Since the Γ_0 in Eq. 3.9 is the limit value of IP_0, a small error ε should be subtracted from Γ_0 when it is applied in practice. Because Γ_0 depends only on the length K of the sequence in question, it can easily be set up during the first training of the sequence.

4° After the detector has learned sequence S_0, only the presentation of S_0 induces the maximum activity on the detector unit.
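For concreteness, the closed forms in 1° and (3.9) can be evaluated for the parameters of the simulation described below (K = 5, T = 7); the variable names are ours:

```python
# Threshold of Eq. (3.9) for sequence length K = 5 and STM capacity T = 7.
K, T = 5, 7
gamma_0 = 2 * sum((T - K + i) ** 2 for i in range(1, K + 1)) / (K * (2 * T - K + 1))
assert abs(gamma_0 - 5.4) < 1e-9   # the threshold value used in Fig. 3.2

# The limit weights of 1° are consistent: they sum to one, and plugging
# them into Eq. (3.6) reproduces the same input potential.
W = [2 * (T - K + i) / (K * (2 * T - K + 1)) for i in range(1, K + 1)]
assert abs(sum(W) - 1.0) < 1e-9
ip = sum(w * (T - K + i) for w, i in zip(W, range(1, K + 1)))
assert abs(ip - gamma_0) < 1e-9
```

The computed value 5.4 is exactly the threshold Γ_0 quoted in the legend of Fig. 3.2.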
This result thus embodies the maximization principle, as in the previous chapter: repeated training of a sequence polarizes the weights of the corresponding detection unit so that it can only be activated by this specific sequence. The maximization principle has two parts. The first involves training with (3.5), which drags the weight distribution of the detector in the direction of the training signals from the units activated by the sequence. The second simply uses the fact that the inner product of two normalized parallel vectors reaches the maximum value. The latter fact has been used previously for pattern classification (the nearest-neighbor method; see Duda & Hart, 1973) and in pattern recognition by neural networks (Kurogi, 1987). Our contribution lies in proposing a biologically plausible learning scheme (Eq. 3.5) that naturally prepares the weights for the later application of the maximization process.

A computer simulation of this simple sequence recognition with interval invariance was conducted, and the result is shown in Fig. 3.2. The sequence to be detected was A-B-C-D-E, each component being presented for a different interval created by a random number generator within a certain range. Fig. 3.2a shows the monotonic increase of IP_0 with the number of sequence presentation trials. The increase follows a typical saturating exponential curve, with the rate of increase diminishing over trials. Fig. 3.2b depicts the actual training and recognition process, with training intervals {9, 3, 6, 9, 5} for A, B, C, D, E respectively. After the 6th training trial, IP_0 went above the system-set threshold, so any following presentation of the sequence was able to activate the detector. After that, the same sequence was tested with a different interval series, {9, 7, 3, 6, 4} (also generated by the random number generator), and the model succeeded in recognizing this time-warped sequence. See the figure legend for the parameter values used.

The idea behind interval invariance is that during the presentation of a sequence component, only the beginning of its input symbol triggers activity in a unit (cf. Eq. 3.1), and only the presentation of a new symbol to the network decrements the current activity levels, i.e. the only decay is triggered by interference. Therefore it does not matter how long a presentation lasts. This same idea is used for the recognition of any complex sequence, which is presented next.
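The inner-product fact behind the second part of the maximization principle can be checked numerically: for positive, pairwise-distinct values, pairing each value with itself beats every other permutation. A small illustrative script; B, k, and the decay rate are our choices, not the simulation's parameters:

```python
import itertools
import math

# a_j plays the role of B * g((k-j)*Delta) with an exponentially decaying g,
# as in the proof of the maximization principle in Chapter II.
B, k, delta, rate = 2.0, 5, 1.0, 0.3
a = [B * math.exp(-rate * (k - j) * delta) for j in range(1, k + 1)]

# Over all permutations p, the inner product sum_j a_j * a_{p_j} is
# maximized only by the identity pairing (the rearrangement inequality).
best = max(itertools.permutations(a),
           key=lambda p: sum(x * y for x, y in zip(a, p)))
assert list(best) == a
```

This is why, once the weight vector has been dragged parallel to the activity pattern left by the trained sequence, any reordering of the same symbols necessarily yields a strictly smaller input potential.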
The idea behind interval invariance is that during presentation of a sequence component, only the beginning of the presentation of its input symbol triggers activity in a unit (cf. Eq. 3.1), and only presentation of a new symbol to the network decrements the current activity levels; i.e., the only decay is triggered by interference. Therefore it does not matter how long a presentation lasts. This same idea is used for recognition of any complex sequence, which is presented next.

Figure 3.2 a Monotonic increase of input potential IP_0 with the number of training trials of sequence A-B-C-D-E. After the 6th trial, IP_0 was within a small error ε = 0.001 of the system-set threshold value of unit 0 (Γ_0 = 5.4). b Training for recognition of the sequence with time-warping. Let units 1-5 represent patterns A, B, C, D, E respectively; a symbol in the figure indicates the corresponding unit. S corresponds to the detector unit 0. During each training cycle, the sequence was presented, followed by an activation of the detector unit (the attentional learning rule). An activation is indicated in the figure by a peak value equal to T. Each training trial was followed by a test cycle, during which the sequence was presented alone in order to see if unit 0 could be activated by the sequence. Presentation intervals for the individual components were generated by a ranged random number generator, and they were {9, 3, 6, 9, 5} for A, B, C, D, E respectively. After 6 trials, the detector unit was able to be activated by another presentation of the sequence. After the detector learned the sequence, another test trial was made by presenting the same sequence with a different interval series {9, 7, 3, 6, 4}, also randomly generated. As shown in the last column, the detector recognized the time-warped sequence. The parameters are: n = 10, C_i = 0.04 (i = 1, ..., 10), T = 7.
[Figure 3.2 appears here: panel a plots the input potential IP_0 (from 2.00 to 6.00) against the number of training trials; panel b shows the activity traces of the units over the training and test cycles.]

Complex sequence recognition

The above mechanism for simple sequence recognition cannot be directly applied to complex sequence recognition. A unit corresponds to a symbol in a sequence, and the external activity of the unit is represented by only one quantity: its excitation level. Therefore, according to (3.2), a later occurrence of a symbol in a sequence may overwrite an earlier occurrence stored in the STM model. Consider, for example, the different occurrences of A in sequence S_1: A-B-A-C-A-B-E-B-D. To overcome this problem, we proposed in the previous chapter that a unit be represented by an expanded network, such that it has multiple terminals to hold different occurrences of a symbol, with multiple channels to connect to other units. Fig. 3.3a shows a diagram for a single unit. The following model combines this idea for solving the overwriting problem with the new STM model for interval invariance. Suppose unit i has m terminals, and the excitation level of its rth terminal is represented by x_{ir}. A new input maximally activates x_{i1} and "shifts" all other traces "downwards", so that x_{ir} holds the rth most recent occurrence of the symbol represented by the unit.
The STM model (Eq. 3.1 through Eq. 3.3) and the definitions of s_i and y_i thus remain the same, except that

x_{ir}(t) = T                  if s_i(t) = 1, r = 1
          = x_{i,r-1}(t-1)     if s_i(t) = 1, r > 1, x_{i,r-1}(t-1) > 0
          = x_{ir}(t-1) - 1    if x_{ir}(t-1) > 0, y_i(t) = 1
          = x_{ir}(t-1)        otherwise    (3.10)

Again let unit 0 be trained to detect an arbitrary sequence S_0. As before, during training each component of the sequence is allowed to have a presentation interval different from that of any other component of S_0 in the same trial, and from that of the same component on a different trial. After learning, the model should be able to recognize S_0 presented at a speed different from that of any training trial. The detector receives inputs from the n units of the STM model, and s_0 is defined as

s_0(t) = f( Σ_{i=1}^{n} Σ_{r=1}^{m} w_{0,ir} x_{ir}(t-1) + I_0(t-1) - Γ_0 )    (3.11)

where w_{0,ir} is the weight of the connection that the rth terminal of unit i makes on unit 0 (the detector), and it is updated according to

w'_{0,ir}(t) = w_{0,ir}(t-1) + C_i s_0(t) x_{ir}(t)
w_{0,ir}(t) = w'_{0,ir}(t) / Σ_{i'=1}^{n} Σ_{r'=1}^{m} w'_{0,i'r'}(t)    (3.12)

It suffices to set m to the maximum number of occurrences of any symbol in a sequence. For example, to recognize S_1, m can be set to any number larger than or equal to 3. The choice of m, the number of terminals of each unit, limits the number of occurrences of the same symbol in a complex sequence. The learning rule is the same as Eq. 3.5, except that all m synapses of a unit are modified. Due to the normalization in (3.12), each connection weight w_{0,ir} is initially set to 1/(mn). With this modification of the model, the above conclusions (1° through 4°) for simple sequence recognition are established similarly. In particular, the maximization principle applies:

5° After the detector has learned a complex sequence S_0, only presentation of S_0 induces the maximum activity on the detector unit.

The threshold Γ_0 in (3.11) can also be set similarly during the first training trial of the sequence.
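A minimal sketch of the multi-terminal STM update (our own reconstruction of Eq. 3.10, with illustrative parameter values): each symbol onset loads T into terminal 1 of its unit and shifts that unit's older traces down, while every other positive trace decrements by one. Since nothing happens during the rest of a presentation interval, the final trace pattern is interval-invariant.

```python
def present(components, n_units, m, T):
    """Sketch of the multi-terminal STM update (our reading of Eq. 3.10).
    Each entry of `components` is (unit index, presentation interval)."""
    x = [[0] * m for _ in range(n_units)]
    for symbol, interval in components:
        # decay is triggered only by the arrival of a new symbol (interference)
        for i in range(n_units):
            if i != symbol:
                x[i] = [max(v - 1, 0) for v in x[i]]
        # shift this unit's older traces down one terminal, then load T
        x[symbol] = [T] + x[symbol][:-1]
        # nothing changes during the remaining `interval - 1` time steps,
        # so the interval length never affects the stored traces
    return x

# A-B-A-C as unit indices 0-1-0-2, presented at two different speeds
seq = [0, 1, 0, 2]
fast = present(list(zip(seq, [1, 1, 1, 1])), n_units=3, m=3, T=7)
slow = present(list(zip(seq, [9, 3, 6, 9])), n_units=3, m=3, T=7)
print(fast == slow)  # True: the trace pattern is interval-invariant
print(fast[0])       # unit A holds both of its occurrences: [6, 5, 0]
```

Running the same symbol order at two speeds leaves identical traces, which is exactly the interval invariance exploited for time-warped recognition.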
This conclusion guarantees that the model is able to recognize, with time warp, any complex sequence in which no symbol occurs more than m times. As a demonstration, we simulated the above model for recognizing the sequence S_1. The attentional learning rule is used for complex sequence learning as before. During a training trial, each component had a presentation interval generated by a ranged random number generator. For S_1, the generated interval series was {9, 3, 6, 9, 5, 9, 7, 3, 6}. After the detector had learned the sequence, it was tested with another presentation of S_1 with the interval series {4, 9, 4, 5, 8, 5, 4, 5, 3}, similarly generated. The detector correctly recognized the test sequence. Fig. 3.3b shows the simulation process for learning and recognizing sequence S_1.

The above model can also handle, with a straightforward extension, temporal sequences that contain form distortions, such as erroneous symbols or reversed orders. To accommodate this feature, the threshold value of the detector set by Eq. 3.9 must be lowered a little, so that the detector can also be triggered by a sequence similar to the learned one. According to (3.11), the detector measures the similarity between the learned sequence S_0 and an arbitrary sequence S_a by comparing the system-set threshold value IP_0 in (3.7) with the input potential IP_a induced by presentation of S_a. It is difficult to formulate this difference in general. To shed some light on how different S_a is from S_0, let us assume S_a = p_{a_1}-p_{a_2}-...-p_{a_L}. Repeated training with the complex sequence S_0: p_{0_1}-p_{0_2}-...-p_{0_K} leads to a polarized weight distribution on the detector unit: 2(T-K+1)/[K(2T-K+1)], 2(T-K+2)/[K(2T-K+1)], ..., 2T/[K(2T-K+1)], with all others zero. These linearly increasing weights correspond with

Figure 3.3 a An expanded unit model for complex sequence recognition. A unit has multiple terminals which make contacts with other units.
x_{j1} holds a trace of the most recent external input to unit j, x_{j2} the previous one, and so on, up to a maximum of m such occurrences. b Recognition of the complex sequence S_1: A-B-A-C-A-B-E-B-D with time-warping. See the legend of Fig. 3.2b for understanding the plot. The generated presentation interval series was {9, 3, 6, 9, 5, 9, 7, 3, 6} for the sequence. After 6 training trials, unit 0 learned the sequence, i.e., could be activated by another presentation of the sequence. After that, the same sequence was presented again with the different interval series {4, 9, 4, 5, 8, 5, 4, 5, 3}, and the detector recognized the time-warped sequence. In the figure, only the last two training-test cycles are shown, together with the time-warping test. The parameters are: n = 10, m = 5, C_i = 0.02 (i = 1, ..., 10), T = 10.

[Figure 3.3 appears here: panel a, the expanded unit; panel b, activity traces over the training and test cycles.]

p_{0_1}, p_{0_2}, ..., p_{0_K} respectively (see 1° above), and the pathways that these non-zero weights correspond to are called real projections. Immediately after the presentation of S_a, the excitation levels of the set of units stimulated by S_a are T-L+1, T-L+2, ..., T. The similarity between S_a and S_0 depends on how many of the units stimulated by S_a can contribute to IP_a through real projections, and on how well the order of the units triggered by S_a matches that of the units triggered by S_0. As a particular example, when S_a = S_0, the weight vector of the detector after training with S_0 parallels the activity vector induced by presentation of S_a, and thus, by the Cauchy-Schwarz inequality (see page 30 of chapter 2), IP_a reaches its maximum value, which equals the threshold of the unit. There is always a tradeoff between tolerance and precision.
After the lowering of detector thresholds, a specific sequence may trigger more than one detector, and a detector may be activated by more than one sequence, if we assume there are many detector units in the system for recognizing different sequences. Again, a competitive network can help single out the detector trained on the sequence most similar to S_a.

One question not yet addressed by the recognition model concerns the length of the sequences that can be recognized. There is a hidden hypothesis in the development of the model: the length K of the sequence should not be larger than the capacity T of the STM model. When the length of S_0 in question is larger than T, the above model only attends to, learns, and recognizes the end subsequence p_{0_{K-T+1}}-p_{0_{K-T+2}}-...-p_{0_K} of S_0. As we know (Miller, 1956), human STM has a very limited capacity (7±2), even though T can be set freely in engineering applications. So the above model is not sufficient as a cognitive model, since humans can memorize and recognize sequences much longer than ones directly limited by T. The next section addresses this problem.

III.4 Hierarchical Sequence Recognition

Given the severe capacity limitations of STM, one method of reducing these limitations, and so expanding our capacities, is chunking (Miller, 1956). Extension to sequence learning and sequential integration is an obvious application of the chunking notion. An example of chunking in sequence learning is the hierarchical organization of language. There is a series of hierarchies of sequential organization: the sequence of letters in the word, the sequence of words in the sentence, the sequence of sentences in the paragraph, the sequence of paragraphs in the discourse. Not only language but all skilled actions seem to involve the same kind of hierarchical organization (Lashley, 1951; Arbib, 1990). Our hierarchical sequence recognition model, based on the chunking notion, consists of a cascade of layers of units.
Units in layer l fully project to those in layer l+1, and each layer by itself is an STM model, i.e., a fully connected network as shown in Fig. 3.1. The whole network is feedforward, and the projections from a lower layer to a unit in the next higher layer are exactly like the full projections of the units in the STM model to a sequence detector. This connection architecture is shown in Fig. 3.4 with an example sentence, S_2: "complex temporal sequence learning based on short term memory". The three layers shown in the figure are the letter layer, the word layer, and the sentence layer.*

* Hierarchical structure in language is, of course, more subtle than "crude chunking", since the "chunk" is based on syntax and semantics rather than on a setting of some T. Thus, for example, a sentence is not represented directly as a string of words, but rather as a string of strings of strings ... corresponding to a syntactic/semantic parse tree for the sentence. In particular, the strict separation of levels adopted here must give way to a more flexible format that allows recursive specification of linguistic entities. It is beyond the scope of this dissertation to address such aspects, let alone the crosslinks for cross-reference that turn the tree into a more general graph structure. We want simply to note that our theory of chunking may play a crucial role in later studies of connectionist approaches to language. Language understanding is a very complex issue, and involves many other processes like long-term memory, communication, and so on, but STM undoubtedly plays a critical role in it (Carpenter & Just, 1989).

[Figure 3.4 appears here: letter units at the bottom, word units above them, and the sentence unit for "complex temporal sequence learning based on short term memory" at the top.]

Figure 3.4. Architecture of hierarchical sequence recognition. Different layers are connected in a feedforward manner from bottom to top. The bottom layer is the input layer, and the others are detector layers of different levels. The letters and words symbolize different units in the layers.
Let x^l_{ir} represent the excitation level of terminal r of unit i in layer l, and s^l_i the internal state of the unit. The weight of the connection from terminal r of unit j in layer l to unit i in layer l+1 is represented by w^l_{i,jr}. The dynamics of x^l_{ir} and s^l_i are the same as before (Eqs. 3.10 and 3.11), and the modification of w^l_{i,jr} is also the same (Eq. 3.12). The first layer directly interacts with the external environment; therefore s^1_i is driven by external inputs. All other layers organize information from the basic input perceived by layer 1. All units in higher layers are sequence detectors; thus different layers detect different levels of information in an external input sequence. In Fig. 3.4, for instance, layer 1 detects individual letters, layer 2 detects individual words composed of sequences of letters, layer 3 detects individual sentences composed of sequences of words, and so on. The higher a layer is in the architecture, the higher the level of the input hierarchy that a unit in the layer can detect, and the longer the input sequence that the unit can recognize.

During training, units in higher layers are activated by the attentional learning rule. Previously the attentional rule was applied only to a single detector, but now there are many detector units in a higher layer. Another attribute of the definition of attentional learning is that the internal activation of detectors must be sequential. This requirement is consistent with a basic property of attention. A unit in any higher layer has exactly the same model as a unit in the first layer. Therefore, when attention shifts from one unit to another in a higher layer, the activation thus triggered will drive the excitation levels of all other active units of the same layer down to the next lower level, due to the mutual inhibitory connections. In other words, a higher layer forms its own STM through sequential shifts of attentional activation.
The STM model in a higher layer operates in the same way as the one in the first layer, but on a larger time scale. So different time scales are formed automatically in different layers. This also explains why units in higher layers can recognize longer sequences. How is attention allocated when there are different detectors in different layers? The correct order of attentional shift should be from lower layer to higher layer, because before layer l has been trained for recognition there is nothing to attend to for layer l+1. Taking Fig. 3.4 as an example, the words in the sentence have to be attended to and learned before a unit in the sentence layer is attended to. As a result, a detector in a higher layer needs a longer time to learn. This is reasonable, because the detector usually learns and recognizes a longer and more complicated sequence. To learn the sequence S_2, for example, the fastest possible way would have two stages. The first stage would present the sequence repeatedly until the detectors in the word layer have learned each individual word. Suppose this stage takes X trials, depending on the value of C_i in (3.12). After this stage is finished, that is, when presentation of the sequence alone can activate each word detector in the word layer, the next stage would be to learn the sentence as the ordered sequence of those words. This stage takes another X training trials. So altogether it would take at least 2X trials. The next question is when attention should be paid, i.e., when a detector should be activated by the system. There is no general rule, except that attention is activated at the end of presentation of a subsequence. In written English, for example, attention to words can be prompted by word separators such as the blank, the comma, etc., and attention to sentences can be prompted by sentence separators such as the period, the semicolon, and so on.
In speech, attention prompters could be sharp transitions between vocal movements, pauses between words, etc., though the difficulty of segmenting normal "running" speech poses problems, beyond those for "well-defined" sequences in written text or slow, well-enunciated speech, that are not addressed here. The existence of such separators between sequences is not limited to language. Indeed, it is because of the existence of these separators that one can speak of the hierarchical organization of temporal sequences at all.

A computer simulation of the model is partly shown in Fig. 3.5 for recognizing sequence S_2. The first layer contains units representing the basic symbols, the 26 English letters. The second layer contains word detectors, among which the 10 words in S_2 are represented. The third layer contains sentence detectors, among which the sentence S_2 is represented. During training, the sequence was repeatedly presented, and attention was allocated according to the strategy described above: words are attended to and learned first, and the sentence is attended to after the words have been learned. The model took 12 training trials to learn the sequence. Interval invariance is achieved automatically, because it is an intrinsic property of the recognition model defined in Section 3. As in previous simulations, the interval of each component presentation was generated by a ranged random number generator, and a test was conducted using a different sequence of intervals after the model had learned S_2. Different time scales are clearly exhibited in the figure if the letter units are compared with the word units. The capacity parameter T has been set to 10, yet the length of sequences that the model can recognize is not limited by T: in the above simulation, the length of S_2 is 53. The length of sequences that a hierarchical model can learn and recognize increases exponentially with the number of layers in the model.
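This exponential growth is simple to state (the helper name is ours): if every chunk at every level uses the full STM capacity T, a detector in layer l spans at most T^(l-1) input symbols.

```python
def max_sequence_length(T, layer):
    """Longest input sequence a detector in the given layer can span,
    assuming full-capacity chunks at every level (an idealization)."""
    return T ** (layer - 1)

print([max_sequence_length(10, layer) for layer in (2, 3, 4)])
# [10, 100, 1000]
```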
Say T equals 10; then the maximum length of a sequence learnable by units in the third layer is 100, by units in the fourth layer 1000, and so on. As noted before, from the engineering perspective, if we do not constrain the value of T, long sequences can also be learned and recognized without resort to the hierarchical architecture. But it should also be noted that after the model learns S_2 it is able not only to recognize the whole sentence, but also to recognize the individual words of the sentence independently. So more is learned than with the non-hierarchical scheme presented in the last section. Learning long sequences by increasing T amounts to rote learning, whereas learning them with a hierarchical architecture requires breaking long sequences down into learnable smaller parts, and seems to involve a measure of understanding.

III.5 Sequence Reproduction with Interval Maintenance

As mentioned in Section II.1, various solutions have been proposed for reproducing simple sequences, the main idea being to store the transitions between each pair of consecutive patterns. Chapter 2 offers a further model for complex sequence reproduction, based on the learning mechanism for complex sequence recognition and the separation of detector units from symbol units. With this same scheme for dealing with

Figure 3.5. A computer simulation of the hierarchical sequence learning model for recognizing S_2: "complex temporal sequence learning based on short term memory". Only one training cycle is shown in the figure, for clarity. During training, the presentation interval series for S_2 was randomly generated. After 12 training trials, the model learned to recognize S_2. We show only the external levels of the units, without displaying their multiple terminals. Each letter unit is activated to its maximum when its letter appears, then decrements as each subsequent letter is introduced, resetting to zero when the word separator is encountered.
During training, a unit for a given word was activated at the end of the presentation of that word (the attentional learning rule) and thus learned the sequential letter structure of that word on the basis of the letter units active at that time. Once the constituent words had been learned, the same mechanism was applied one level further up the hierarchy to train the sentence unit to recognize the given sequential order of the words in S_2. In the simulation, all units have 3 terminals, C_i = 0.3 for all units in all layers, and T = 10.

[Figure 3.5 appears here: activity traces of the 26 letter units (a through z), of the word units for "complex temporal sequence learning based on short term memory", and of the sentence unit.]

complex sequences, the present model attempts to propose a solution to the time-warp problem in reproduction. Although interval invariance is desired for sequence recognition, sequence reproduction requires the opposite: interval maintenance. A dynamic tuning mechanism is also presented for degree self-organization of the detector units.

The model for sequence reproduction has two layers, as shown in Figure 3.6. Layer ζ is called the input layer, and basically serves as an STM model as shown in Fig. 3.1. One unit in this layer is dedicated to representing one symbol, even though the symbol may have multiple occurrences as in a complex sequence, so different units represent different symbols in layer ζ. Units in layer ξ function as sequence detectors as described in Section 3, and there is a global inhibitor within this layer (see the footnote on page 49). These units recognize the contexts of the individual components in a sequence, and anticipate the occurrence of those components. Layer ξ connects with layer ζ bidirectionally, and before training the connections between them are complete.
The projections shown in Fig. 3.6 depict what results from training: unit i in layer ξ receives projections only from those units in ζ that represent symbols in the context detected by unit i, and unit j in layer ζ receives input only from those units in ξ that anticipate the occurrence of the symbol represented by unit j. This connection pattern is formed through learning. During the training process, a sequence with various component intervals is presented to layer ζ. At the end of each component presentation, a unit in layer ξ is randomly selected (but fixed in successive trainings; see the footnote on page ***) to fire. That is, training of units in layer ξ follows the attentional learning rule. The recurrent connections from layer ξ to layer ζ are formed according to a Hebbian rule, as follows. If unit i in layer ξ (recorded as <i, ξ>) and unit j in layer ζ (recorded as <j, ζ>) are firing simultaneously, then a connection link from <i, ξ> to <j, ζ> is established; its weight is denoted w^b_{ji}, which will be defined later. All connection weights from units in ξ to those in ζ are initially zero.

[Figure 3.6 appears here: layer ξ (the detector layer) connected bidirectionally with layer ζ (the input layer).]

Figure 3.6 Architecture for complex sequence reproduction. Within layer ζ (the input layer), every unit inhibits every other one, forming an STM model as shown in Fig. 3.1. Within layer ξ (the detector layer), all units project to a global inhibitor, which projects back to them. At the beginning, the connections between layer ξ and layer ζ are all-to-all. The appropriate connection pattern between them for reproduction emerges after repeated training with temporal sequences. Plus signs indicate excitation, and minus signs indicate inhibition.

Degree self-organization

The global inhibitor in layer ξ receives input from all units in the layer and projects back to them.
A degree parameter d_i is introduced for <i, ξ>, and it affects the dynamics of the internal state of <i, ξ> in the following way (cf. Eq. 3.11)

s^ξ_i(t) = f( Σ_{j=1}^{n} Σ_{r=1}^{m} w^ξ_{i,jr} h(x^ζ_{jr}(t-1), d_i) + I_i(t-1) - Γ^ξ_i )    (3.13)

h(x, y) = x if x > T - y; 0 otherwise    (3.14)

where the label ξ in (3.13) indicates layer ξ, x^ζ_{jr} is the excitation level of the rth terminal of unit <j, ζ>, and w^ξ_{i,jr} represents the connection weight from the rth terminal of <j, ζ> to <i, ξ>. Symbols n and m stand for the number of units and the number of terminals per unit, respectively, in layer ζ. The domain of d_i is {1, 2, ..., T}. Through the function h(x, y), the role of d_i is to gate in certain excitation levels of units in layer ζ. For instance, if d_i = 1, then only when x^ζ_{jr} equals T can unit <j, ζ> affect <i, ξ>. That is, if <i, ξ> has degree 1, it can only sense the most recent item occurring in layer ζ. Obviously, the larger d_i is, the more items <i, ξ> can sense from layer ζ. The formulations of w^ξ_{i,jr} and Γ^ξ_i are modified accordingly:

w'^ξ_{i,jr}(t) = w^ξ_{i,jr}(t-1) + C_i s^ξ_i(t) h(x^ζ_{jr}(t), d_i)
w^ξ_{i,jr}(t) = w'^ξ_{i,jr}(t) / Σ_{j'=1}^{n} Σ_{r'=1}^{m} w'^ξ_{i,j'r'}(t)    (3.15)

Γ^ξ_i = [2 / (d_i(2T-d_i+1))] Σ_{r=1}^{d_i} (T-d_i+r)²    (3.16)

Let the activity of the global inhibitor of layer ξ be represented by z, and let q represent the number of units in layer ξ. Variable z is defined as

z(t) = f( Σ_{i=1}^{q} s^ξ_i(t-1) - 2 )    (3.17)

and therefore the inhibitor will be activated if more than one unit fires simultaneously in layer ξ. According to (3.13), the internal state s^ξ_i(t) can be triggered either by system attention, through I_i(t-1), or by input signals from layer ζ. The latter is called anticipation. What the inhibitor actually does is to detect conflicts among the detectors in layer ξ. Since system attention is always sequential, the inhibitor can only be activated by conflicting attention and anticipation, or by conflicting anticipations from the detector layer. Degree d_i (i = 1, ..., q) is initially set to 1.
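The gating of Eq. 3.14 can be illustrated directly (the values and names are ours): with STM capacity T = 10, a degree-1 detector passes only the most recent item (excitation T), while a degree-3 detector passes the three most recent.

```python
T = 10  # STM capacity

def h(x, y):
    """Gating function of Eq. (3.14): pass excitation x only if x > T - y,
    i.e. only if it belongs to the y most recent items in the input layer."""
    return x if x > T - y else 0

recent = [10, 9, 8, 7]            # excitation levels, most recent item first
print([h(x, 1) for x in recent])  # [10, 0, 0, 0]: degree 1 senses one item
print([h(x, 3) for x in recent])  # [10, 9, 8, 0]: degree 3 senses three items
```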
Self-organization of d_i is done according to

d_i(t) = d_i(t-1) + 1 if s^ξ_i(t-1) = 1, z(t) = 1, d_i(t-1) < T; d_i(t-1) otherwise    (3.18)

that is, the degree of <i, ξ> increments if this unit, together with other units, causes activation of the global inhibitor. If the degree of <i, ξ> increments, there will be one more unit in the input layer that can be sensed by <i, ξ>. Thus the previously learned weight distribution to the unit (see Eq. 3.15) will have to change its direction. In this situation, the model re-initializes the weight distribution to <i, ξ>, and the threshold Γ^ξ_i is also modified according to (3.16), based on the new value of d_i. From (3.13), (3.14) and (3.15), it is clear that if d_i(t) grew larger than T, the STM capacity of layer ζ, the result would be equivalent to d_i(t) = T in the dynamics of the internal state and weight distribution of <i, ξ>. That is why d_i(t) has an upper limit of T. The value T consequently limits the degree of a sequence that can be reproduced.

A computer simulation of the model was conducted for reproducing a complex sequence S_3: J-B-A-C-D-A-B-A-E-F-A-B-A-G-H-A-B-A-H-I. Learning a complex sequence is slower than learning a simple sequence, because the complex sequence requires dynamically increasing the degrees of certain detectors, and each time such self-organization occurs the earlier training of those detectors is discarded. Roughly speaking, the time required for training increases linearly with the degree of a sequence. It took 18 training trials before the model learned to reproduce S_3, whereas 6 trials sufficed to reproduce a simple sequence. Due to the training scheme, the number q must not be less than the length of the sequence minus 1. For S_3 of length 20, 19 units were selected in layer ξ and trained to anticipate the second through the last components of S_3 respectively. The degree vector acquired by the self-organization mechanism is {1, 2, 3, 1, 1, 2, 3, 4, 1, 1, 2, 3, 4, 1, 2, 2, 3, 4, 2} for those detectors.
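This degree vector can be checked with a small non-neural sketch (the function and its implementation are our own illustration, not the dissertation's algorithm): for each position, search for the shortest preceding context that occurs nowhere else in the sequence, which is the criterion the self-organization of Eq. 3.18 appears to converge to.

```python
def context_degrees(seq):
    """For each position p (2nd symbol onward), return the length of the
    shortest context ending just before p that occurs nowhere else in the
    sequence, so that it unambiguously signals the component at p."""
    degrees = []
    for p in range(1, len(seq)):
        for d in range(1, p + 1):
            context = seq[p - d:p]
            ends = [q for q in range(d, len(seq) + 1) if seq[q - d:q] == context]
            if ends == [p]:
                degrees.append(d)
                break
        else:
            degrees.append(p)  # no unique context: use all available history
    return degrees

s3 = list("JBACDABAEFABAGHABAHI")
print(context_degrees(s3))
# [1, 2, 3, 1, 1, 2, 3, 4, 1, 1, 2, 3, 4, 1, 2, 2, 3, 4, 2]
```

The output matches the degree vector reported for S_3 above; on A-B-C-A-B-D-A-B-E it likewise assigns degree 2 to the later occurrences of B, as discussed next.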
The ninth component E, for example, must memorize the 4 prior components D-A-B-A in order to be generated; the second component B, however, only needs to memorize the previous component J. In the sequence A-B-C-A-B-D-A-B-E, it might be argued that symbol B does not need to memorize 2 prior components, as produced by the above algorithm, but only one prior component, since B is always preceded by A. However, the result produced by the algorithm is justified if we generally allow each component to be presented for a different interval. In that situation, the different A's preceding symbol B may have different presentation intervals, and are therefore, strictly speaking, different. The above neural algorithm optimally identifies the amount of context required to reproduce any complex temporal sequence unambiguously. The context degree vector reveals many properties of the sequence being reproduced. For example, the degree vector produced for S_3 reflects, among other things, whether a component is preceded by a single component or by a recurring subsequence, and where a recurring subsequence starts and ends in the sequence. We believe that this kind of information is important for self-detection of recurring subsequences and for generalization of a temporal structure from many sequences. These open issues are critically important for further studies of temporal order.

The same problem of finding the minimum amount of context has been studied by Kohonen (1987) for producing unambiguous inference rules in sequence generation. The proposed solution, termed dynamically expanding context, relies on explicit rules for resolving inference conflicts. The right-hand side of an inference rule is a symbol in a sequence, and the left-hand side of the rule consists of the context of that symbol. All left-hand sides are initially set to the predecessors of the right-hand-side symbols, and later repetitive scanning expands the left-hand sides as necessary to resolve conflicts.
All rules are stored in a table, and a significant amount of table searching is required by the system. The method has been applied to speech recognition and music generation (Kohonen, 1987; 1989). A basic difference of our proposal is that we do not resort to any external rules. Units representing symbols and detectors in our model are connected in a neuron-like manner, and communication among units is typically neural. Thus information is distributed over units and connections, and sequence processing is parallel. High-level operations, like table lookup or memory search, are avoided in the system. With little modification, our model can be applied to the application domains explored by Kohonen.

Interval maintenance

In our model, the interval length of a component presentation is the time period during which the external input E_i of the unit corresponding to that component is equal to 1. This is equivalent to the period during which the excitation level of the unit equals T. In the above model of sequence reproduction, a unit in layer ξ detects the onset of the context of a component in order to trigger that component in the reproduction process. In sequence S_3 above, for example, there is a detector in layer ξ that is trained to detect the context D-A-B-A and to anticipate the onset of symbol E. According to the model, after training this detector is activated just one time step after the second A starts to occur (see Eq. 3.13). But E should not be triggered until the whole interval of the A occurrence has elapsed. The idea for interval maintenance is to code intervals by the connection weights from the detector layer to the input layer. Since the backward projections from layer ξ to ζ provide a many-to-one correspondence, an interval can simply be coded by a backward connection weight, such that temporal integration over the entire interval is required to trigger the next component.
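A toy sketch of this coding (variable names are ours): assume the backward weight stores the reciprocal of the training interval, as formalized in Eq. 3.21, and the detector's temporal integration of Eq. 3.20 grows by one per time step; the next component then fires exactly when the full interval has elapsed.

```python
interval = 4             # training interval t2 - t1 for this component
w_back = 1.0 / interval  # reciprocal coding of the interval (cf. Eq. 3.21)

E = 0.0                  # temporal integration of the active detector
elapsed = 0
while w_back * E < 1.0:  # the weighted sum reaches 1 only after the full interval
    E += 1.0             # detector stays on; integration grows each time step
    elapsed += 1
print(elapsed)           # 4: reproduction waits out the learned interval
```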
Due to the introduction of backward projections from the detector layer to the input layer, the previous internal state of unit i in the input layer is now defined as (cf. Eq. 3.1)

    s_i(t) = 1 if E_i(t) = 1 and E_i(t-1) = 0; s_i(t) = 0 otherwise    (3.19)

where E_j(t) denotes the cumulative activity of unit j in the detector layer. Suppose that during training each presentation of a sequence has the same interval series; then E_j is defined as

    E_j(t) = sum over tau from t_1 to t of s_j(tau) if s_j(t) = 1; E_j(t) = 0 otherwise    (3.20)

where t_1 is the start of the period during which unit j in the detector layer has been consecutively activated up to time t. Note that the temporal integration E_j is easy to compute locally and recursively. At the end of this consecutively active period, t_2, E_j(t_2) = t_2 - t_1. Training of the backward projections is defined by the following Hebbian rule:

    W_ij(t) = 1 / (t_2 - t_1) if E_j(t) > 0, s_j(t-1) = 1, s_i(t-1) = 0, and s_i(t) = 1; W_ij(t) = W_ij(t-1) otherwise    (3.21)

The condition that E_j(t) > 0, s_j(t-1) = 1, s_i(t-1) = 0, and s_i(t) = 1 holds iff the detector of unit j precedes the onset of the next symbol, represented by unit i, in the sequence. This time instant is the same as t_2. In conclusion, the time interval of a symbol presentation is coded as the reciprocal of the corresponding connection weight.

In general, one interval series of presentation may differ from another. To cope with this situation, instead of storing one interval directly in a weight, two parameters are stored at the connection terminal: an average mu of the training intervals and a deviation sigma^2 of the training intervals. During reproduction of a sequence, a Gaussian number is generated based on mu and sigma^2, which serves the same function as t_2 - t_1 in (3.21). Each generated interval also modifies mu and sigma^2 in the same way as a presentation interval. Learning is therefore a process of forming mu and sigma^2. Let e_i represent the interval of the ith presentation of a symbol. Two factors are taken into consideration in forming mu and sigma^2.
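The interval-coding idea can be illustrated with a small sketch (hypothetical Python, with the layer machinery stripped away). As the text states, the interval is stored as the reciprocal of the backward weight, so step-by-step temporal integration of the detector's unit-amplitude activity reaches threshold 1 exactly when the full interval has elapsed.

```python
def learn_interval(t1, t2):
    """Backward weight coding the presentation interval [t1, t2):
    the reciprocal of the interval length, as in the text."""
    return 1.0 / (t2 - t1)

def steps_to_trigger(weight):
    """Integrate unit-amplitude detector activity one time step at a
    time until the weighted sum reaches threshold 1, i.e. until the
    coded interval has elapsed."""
    total, steps = 0.0, 0
    while total < 1.0 - 1e-9:   # tolerance for float rounding
        total += weight
        steps += 1
    return steps

w = learn_interval(10, 16)      # a 6-step presentation interval
print(steps_to_trigger(w))      # -> 6: the full interval must elapse
```

This is why the next component is not triggered immediately when the detector fires, but only after the stored interval has been integrated away.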
First, each interval should contribute a certain amount. This is called the averaging factor. Second, a recent interval should have more impact than a remote one. This is called the recency factor. These two factors are embodied in the following learning rules:

    mu_1 = e_1
    mu_{k+1} = (1 - beta) mu_k + beta e_{k+1}    (3.22)

where beta is the recency parameter, ranging between 0 and 1, which ensures that, except for the first interval, the most recent interval has a constant contribution regardless of the presentation history. Expanding the above formula, we have

    mu_k = beta e_k + beta (1-beta) e_{k-1} + beta (1-beta)^2 e_{k-2} + ... + beta (1-beta)^{k-2} e_2 + (1-beta)^{k-1} e_1    (3.23)

where beta + beta (1-beta) + beta (1-beta)^2 + ... + beta (1-beta)^{k-2} + (1-beta)^{k-1} = 1, so that the definition of mu_k is still a type of averaging. From (3.23) it is clear how each interval contributes to the overall average: the more recent an interval is, the greater its effect. If we view the above weighted formulation of mu_k as the average of samples e_1, e_2, ..., e_k taken with frequencies f_1 = (1-beta)^{k-1}, f_2 = beta (1-beta)^{k-2}, ..., f_k = beta, respectively, we can define sigma_k^2 by the formula

    sigma_k^2 = sum over i from 1 to k of f_i (e_i - mu_k)^2
              = [(1-beta)^{k-1} e_1^2 + beta (1-beta)^{k-2} e_2^2 + ... + beta e_k^2] - mu_k [(1-beta)^{k-1} e_1 + beta (1-beta)^{k-2} e_2 + ... + beta e_k]    (3.24)

and

    sigma_{k-1}^2 = [(1-beta)^{k-2} e_1^2 + beta (1-beta)^{k-3} e_2^2 + ... + beta e_{k-1}^2] - mu_{k-1} [(1-beta)^{k-2} e_1 + beta (1-beta)^{k-3} e_2 + ... + beta e_{k-1}]    (3.25)

which yields the following recurrence learning rule for the deviation:

    sigma_1^2 = 0
    sigma_{k+1}^2 = (1 - beta) (sigma_k^2 + beta (e_{k+1} - mu_k)^2)    (3.26)

so that sigma_k^2 = 0 if e_1 = ... = e_k. With the learning rules of (3.22) and (3.26), interval maintenance as defined above is thus achieved. This ability of the model should again be attributed to the separation of context detection in the detector layer and symbol presentation in the input layer. Because of this separation, a unique link can be established from the detection layer to the input layer, and this link can carry interval information without confusion even when complex sequences are allowed.
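The learning rules (3.22) and (3.26), together with the Gaussian generation of reproduction intervals, can be sketched as follows (illustrative Python; the class name and structure are ours, not the model's):

```python
import random

class IntervalStat:
    """Running mean mu and deviation sigma2 of training intervals under
    the recency-weighted rules (3.22) and (3.26); beta is the recency
    parameter in (0, 1)."""
    def __init__(self, beta):
        self.beta = beta
        self.mu = None
        self.sigma2 = 0.0

    def update(self, e):
        if self.mu is None:          # mu_1 = e_1, sigma2_1 = 0
            self.mu = float(e)
            return
        b = self.beta
        # sigma2 must be updated with the OLD mean, as in (3.26):
        # sigma2_{k+1} = (1-b) * (sigma2_k + b * (e_{k+1} - mu_k)^2)
        self.sigma2 = (1 - b) * (self.sigma2 + b * (e - self.mu) ** 2)
        # mu_{k+1} = (1-b) * mu_k + b * e_{k+1}
        self.mu = (1 - b) * self.mu + b * e

    def generate(self):
        """Draw a reproduction interval, standing in for t_2 - t_1."""
        return random.gauss(self.mu, self.sigma2 ** 0.5)

s = IntervalStat(beta=0.3)
for e in [10, 10, 10, 10]:
    s.update(e)
print(s.mu, s.sigma2)   # identical intervals: 10.0 0.0
```

With identical training intervals the deviation stays zero, so `generate()` returns the interval exactly; with varying intervals, reproduction is random within the range circumscribed by the recency and averaging factors, as described in the text.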
A computer simulation of the model was conducted to reproduce the complex sequence S3. During training, the interval of each symbol was initially generated by a ranged random number generator, but fixed in subsequent training trials for simplicity. As previously stated, the model took 18 training trials to learn the sequence. The number of trials is determined by the requirement of degree self-organization and the gain factor C_i of learning (Eq. 3.15). Let us define the initial context of a sequence as the beginning subsequence required to uniquely determine the rest of the sequence. After learning, the entire sequence, with its various interval lengths, could be reproduced by presentation of its initial context, the subsequence J in this case. Figure 3.7 presents the simulation result, which contains the temporal course of the last training trial together with the reproduction process. For parameters see the legend of the figure. Since in this simulation the speed of presentation is the same from one trial to another, the acquired deviation for every link interval is zero. Therefore the time course of the sequence is faithfully preserved in reproduction.

While the time course of a sequence can be reproduced by the model, the overall speed of reproduction can easily be controlled with a global velocity-tuning agent which projects to all synapses from the detector layer to the input layer in the form of presynaptic synapses. The velocity agent can scale all mu's (averages), thus implementing the "scaling effect" of sequence generation. The scaling effect is often seen when musicians are learning a new piece: they practice it at a slow pace to get the relative timing, and then play it faster and faster.

III.6 Discussion

On complex temporal sequences

A basic feature of this model is that it copes directly with complex temporal sequences, treating simple sequences as a special case, whereas many other models do it the other way around.
Complex sequences, in fact, are indispensable for almost every kind of natural temporal behavior: reading, writing, speech production, music generation, skilled motor behaviors, and so on.

Figure 3.7. Reproduction of the complex sequence S3: J-B-A-C-D-A-B-A-E-F-A-B-A-G-H-A-B-A-H-I. The interval series {9, 3, 6, 9, 5, 9, 7, 3, 6, 4, 9, 4, 5, 8, 5, 4, 5, 3, 7, 8} was first randomly generated, and fixed in subsequent training trials. The model took 18 training trials before it was able to reproduce the entire sequence upon presentation of S3's initial context: J. Only the last training cycle and the reproduction cycle are plotted. Note that not only the order but also the time intervals of the sequence were reproduced. All units in the detector layer have 3 terminals, and C_i = 0.3. The other parameters are beta = 0.3 and T = 7. [The plot shows the activity traces of units A through J over time.]

Processing of complex sequences must be achieved before any neural network model can be applied to solving these problems of temporal order. Because of this necessity, some methods have been introduced for handling the issue. In the back-propagation approach (Jordan, 1986; Doya & Yoshizawa, 1989; Elman, 1990), when a state is fed back either from the output layer or the hidden layer, certain information about history is preserved, and it has been suggested that this feedback information can be used for the disambiguation required in the complex sequence situation. In the Jordan model (Jordan, 1986), for example, the state that is used as input to generate the next component is coded as a temporal summation of a number of previous components in the sequence. Since the entire previous subsequence is coded by a single state, it is not warranted that different subsequences can be uniquely recorded. Also, in order to let the same recurring symbol appear in the output layer, it is possible that dissimilar inputs* would have to learn to yield the same output, while similar inputs would have to learn to yield different outputs.
* Similarity here is measured by the Hamming distance, i.e., the number of differing bits in two matrices.

Again, the attempt to code the history as a single vector poses a severe ambiguity problem when reproducing complex sequences. In our model, however, a previous history is distributed among different units, each of which maintains its own activation over a variable amount of time, depending on further inputs to the STM model. Disambiguation thus can be ensured. The use of hierarchies has been suggested as a way of helping a back-propagation architecture reproduce a complex sequence, and will be commented on later.

In the approach based on spin-like neurons (Hopfield, 1982; Amit, 1989), recognition and reproduction of complex sequences have been studied by a number of authors, including Tank and Hopfield (1987), Dehaene et al. (1987), Guyon et al. (1989) and Kuhn et al. (1989). The Tank and Hopfield model relies on a set of patterned delays to program the recognition circuit. However, no mechanisms have been proposed for how these delays are acquired and maintained. The selection model (Dehaene et al., 1987) uses high-order synapses, synaptic triads, for coding basic temporal order. To learn a sequence, high-order connections are randomly made and a desired architecture can be selected by an input sequence, simple or complex, according to the authors. Besides the immense number of connections required to learn and reproduce a reasonably long sequence, it does not appear from the model that reproduction of an arbitrary complex sequence is guaranteed. The use of high-order synapses is also the key to the model by Guyon et al. (1989), where the order of the synapses has to be made at least equal to the degree of the sequence in question. The system overhead due to the number of connections introduced by high-order synapses becomes a serious concern. A different method was taken in the model by Kuhn et al.
(1989) for complex sequence reproduction. They focus on one type of complex sequence, namely a sequence which contains only one recurring subsequence, that subsequence itself being simple. A number of specific measures were taken to solve this kind of sequence generation problem, such as using two time scales for local and remote associations. This type of sequence belongs to the so-called first-order complex sequences, which can be reproduced with a direct extension of the one-layer network for sequence recognition (Chapter 2). To link remote components in a sequence, the present model does not resort to high-order synapses, but instead introduces multiple terminals for each unit. The number of connections thus incurred is a constant multiple (m) of that needed in a conventional network. In the high-order synapse scheme, on the other hand, the number of connections needed is D orders of magnitude higher, where D is the degree of the sequence.

In most of the neural network models for sequence processing, distortion problems have not been addressed (Hopfield & Tank, 1987, is an exception), particularly generation of a sequence with different component intervals. The present theory provides a solution to distortions in both complex sequence recognition and reproduction. In reproduction, a different presentation speed of a sequence is allowed for each training trial, and sequence reproduction does not generate a rigid time course, but rather one that is random within a certain range circumscribed by the recency and averaging factors. The ability to handle the time-warp problem and the erroneous-symbol problem in the domain of complex sequences represents a significant step forward in processing temporal order by the neural network approach.

Hierarchies

Continuing the above discussion, hierarchies have been proposed as a way to cope with complex sequences (Doya & Yoshizawa, 1990; Jennings & Keele, 1990; Jordan, 1990).
A simple subsequence at some level is coded as a single symbol at the next higher level. During reproduction, components in higher levels are generated at slower time scales, allowing time for generating the lower subsequences corresponding to these components. Within each level, only a simple sequence needs to be reproduced by a back-propagation network. One obvious problem is that the parsing of a sequence into different levels has to be provided externally by the designer in these networks, since back-propagation networks have not been shown to be able to self-organize an elementary sequence into various hierarchies. As pointed out before, complex sequences are ubiquitous. If the basic network can handle only simple sequences, a great deal of parsing would be required before the network models could be used.

The idea of employing hierarchies is used differently in our model from their proposals. The motivation behind our proposal is to overcome the capacity limitation of STM. Hierarchies are not required for processing complex sequences, since this is a basic capability of the model. For instance, the English word "efficiency" would require several hierarchies to be formed in the proposed back-propagation models, and several ways exist to organize it into different hierarchies that contain only simple subsequences, with no solid reason to favor one parsing scheme over the others. This word is naturally handled as a single entity in our model. Since the STM capacity limits human temporal order processing, and chunking is the basic means by which humans organize temporal information, we can largely rely on the natural delimiters when we use the present model to hierarchically process long and complicated sequences arising from natural temporal behaviors.

Using performance measures, Nissen and Bullemer (1987) demonstrated that attention is required for human subjects to learn to reproduce a temporal sequence of symbols.
Under distraction with dual-task conditions, acquisition of the sequence was minimal. The sequence used in the investigation was S1: D-B-C-A-C-B-D-C-B-A, a complex one. A more detailed study was done recently by Cohen et al. (1990) using the same experimental technique. They studied three example sequences, which can be symbolized as S2: A-E-B-D-C, S3: A-D-C-A-C-B, and S4: A-C-B-C-A-B, and which are classified as unique, hybrid and ambiguous sequences respectively. Their experimental results show that the unique and hybrid sequences can be learned by subjects under attentional distraction, but the ambiguous sequence is much more difficult to acquire under the same attentional distraction. Interestingly, they suggest that ambiguous sequences involve hierarchical representation and thus require attention. From the present model, we would like to offer a different explanation of their data. More attention is required to learn and reproduce complex sequences (S1, S3 and S4) than simple sequences (S2) because degree self-organization is required in the detector layer when reproducing a complex sequence. The model further predicts that higher-degree complex sequences are more difficult to acquire than lower-degree complex sequences. For example, S1 has degree 3 and would be more difficult to learn than S3, which has degree 2. This is because, according to the present model, higher-degree sequences need more levels of self-organization than lower-degree ones. As for the separation of S3 and S4, the argument by the authors is rather confusing, because both of them would require hierarchical representation and therefore attention. They emphasize that S3 has some unique associations, whereas S4 does not; but it is difficult to link the proposal of hierarchical representation with the local unique associations existing in a sequence. Why S3 should be classified in a category distinct from S4 is thus unclear.
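The degree-based prediction can be checked directly. The sketch below (illustrative Python) computes the degree of each experimental sequence under the minimal-context notion used in this chapter, ignoring interval differences and the cyclic repetition used in the experiments: S1 has degree 3, S2 degree 1, and S3 and S4 both degree 2.

```python
def degree(seq):
    """Largest number of preceding components any position needs for a
    unique continuation (a sketch of the minimal-context notion; a
    position whose context reaches the sequence start is anchored
    there)."""
    n = len(seq)

    def need(i):
        for L in range(1, i + 1):
            ctx = seq[i - L:i]
            if all(seq[j] == seq[i]
                   for j in range(L, n) if seq[j - L:j] == ctx):
                return L
        return i

    return max(need(i) for i in range(1, n))

for name, s in [("S1", "DBCACBDCBA"), ("S2", "AEBDC"),
                ("S3", "ADCACB"), ("S4", "ACBCAB")]:
    print(name, degree(s))  # S1 3, S2 1, S3 2, S4 2
```

Note that S3 and S4 come out with the same degree, which is one way of stating why the model offers no ground for placing them in distinct categories.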
Many more sequences need to be experimented with to justify such a classification.

It is interesting to note the linkage between the attentional learning rule and the attentional requirements of human sequence learning. Attention is perhaps also needed for learning a simple sequence, like S2 above. The difference revealed in acquiring simple sequences and complex sequences may reflect different amounts of attention required. Even under distraction with dual-task conditions, it is hard to say that attention is fully excluded in performing sequential tasks. In fact, the data curves reported by Nissen and Bullemer (1987) and Cohen et al. (1990) show a tendency to acquire complex sequences even under the dual-task distraction.

Although our demonstration of hierarchical representation of temporal sequences uses natural separators to form different levels, people also use other heuristics for chunking. For example, in the U.S., the 10-digit phone number 2137406991 is often parsed into three chunks: the area code (213), then 3 digits (740), then 4 digits (6991). It is difficult to generalize a rule for chunking a continuous sequence without background knowledge and the way it is heard. Another related issue is how attention is allocated in the situation of hierarchical sequence recognition. We call this the scheduling problem. In the simulation in Section III.4, the scheduling simply followed a bottom-up strategy. That is, a sentence is attended to (learned) only after its constituent words have been learned, and a word is attended to only after its constituent letters have been learned. Scheduling becomes more complicated if chunking of a sequence without explicit separators is required. Scheduling is related to selective attention, for which a number of neural models have been developed (see among others Didday & Arbib, 1975; Koch & Ullman, 1985).
An obvious extension to the present model is a competitive network model for self-organization of attention scheduling, which would avoid having an external instructor teach the system.

Efficiency

Depending on the value of the gain factor C_i (Eq. 3.12), training for sequence recognition or reproduction usually takes from several to tens of trials before the model learns the task. By tuning the value of the gain factor, the speed of learning can be controlled externally. Even in hierarchical sequence recognition and complex sequence reproduction, the most time-consuming tasks, the speed deteriorates only linearly with the number of layers or the degree of the sequence. The number of training trials needed by the model is comparable with that needed by humans performing similar tasks (Nissen & Bullemer, 1987; Cohen et al., 1990). Moreover, the present model exhibits remarkable computational advantages over other models. In back-propagation models, training a network usually takes thousands or more trials (Rumelhart et al., 1986). This amount of training cannot, of course, be avoided by the models that use the back-propagation algorithm for sequence reproduction. In spin-like network models, since associations are preprogrammed in the network, no training is involved in general. Even so, it usually takes a significant amount of time for the system to settle down to an equilibrium state of the dynamics.

Units

The building blocks of the present theory are units, which can be thought of as local neuron populations. Many functions of a unit, like spatial summation of inputs (see Eq. 3.11), temporal summation over a single connection (used for interval maintenance), connectional plasticity (see Eq. 3.12), etc., resemble those of a single biological neuron. Yet some additional functions are assumed. The most outstanding one is, perhaps, the introduction of multiple terminals for a single unit.
These terminals could anatomically correspond to multiple neural fibres or to the many synaptic terminals possibly efferent from a neuron assembly. As discussed before, having different symbols represented by different units ("grandmother cells") is consistent with the concept of local neuron populations, which has been utilized in other situations (Buhmann & Schulten, 1987; Wang et al., 1990). Our results suggest an important computational function that could emerge from interactions among neuron assemblies. More distributed representations of the individual functional units should be possible. We have not gone into detailed neural circuitries for implementing units as local neuronal populations. One style of implementation of local populations can be found in Buhmann and Schulten (1987). Neural oscillations might be able to provide an implementation of the excitation levels of units, which take only discrete values (Freeman et al., 1988; Wang et al., 1990; Chapter 2). In this representation scheme, a neural oscillator would correspond to a unit, and the amplitude of an oscillation would correspond to the excitation level of a unit. In Chapter 2, discrete amplitudes are demonstrated by reciprocally connected units that damp autonomously in time, corresponding to the decay theory. The present model, however, would require replacing the autonomous decay with decay driven externally by other units.

Studies of delay intervals

The recency factor of interval acquisition in the learning rules (3.22) and (3.26) is supported by biological data. Kojima and Goldman-Rakic (1982) found that in performing delay tasks, a group of prefrontal neurons in monkeys displayed time-dependent firing patterns. In their experiments, delays of 2, 4 and 8 s were employed to train the monkeys to depress the hold keys until the delay period ended.
By increasing the length of the delay, the latency of firing activity and the position of the firing peak were observed to readjust to the changes in the anticipated time of the delayed response. Less direct evidence comes from studies of interstimulus intervals (ISI) in classical conditioning. After repeated pairing of a conditioned stimulus (CS) and an unconditioned stimulus (US), the animal develops the conditioned response (CR) after presentation of the CS, and the distribution of CR latencies centers near the ISI (Smith, 1965). In the rabbit's nictitating membrane response, it has been observed that when the ISI shifts from one value to another, the CR latency originally conditioned to the first value shifts rapidly to the one corresponding to the second value (Leonard & Theios, 1967; Coleman & Gormezano, 1971; Hoehler & Thompson, 1980). This evidence is typical of the recency factor. In human eyelid conditioning, the CR latency distribution observed by Ebel and Prokasy (1963) conforms well to our learning rule: as training with a fixed ISI progressed, the standard deviation of the CR latency decreased, and both the mean and the standard deviation of the latency varied directly with shifts in ISI.

More specifically, let us map the conditioning paradigm onto training to generate a length-2 sequence, CS-US, in the present model, with the major concern being how long the model takes to reproduce US (namely the CR) after the onset of the first component CS. If the model is trained with ISI Delta-t1 first, and later with ISI Delta-t2, the learning rules (3.22) and (3.26) predict that the interval acquired by the model will shift gradually from Delta-t1 to Delta-t2, due to the recency and averaging factors. This model behavior is strikingly similar to the data of Coleman and Gormezano (1971) from classical conditioning of the rabbit's nictitating membrane response, as summarized in Fig. 3.8a.
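The predicted shift behavior can be illustrated by iterating rule (3.22) over the experimental schedule (a sketch with assumed trial counts: 100 trials per day for 9 days, shift after day 5, and the beta = 0.02 used in our simulation). Under an abrupt 200-to-700 ms shift the stored mean approaches the new interval exponentially, while under the gradual schedule it tracks the ramp almost linearly with a lag.

```python
def track(intervals, beta=0.02):
    """Stored mean mu after each training interval, per rule (3.22)."""
    mu, trace = None, []
    for e in intervals:
        mu = float(e) if mu is None else (1 - beta) * mu + beta * e
        trace.append(mu)
    return trace

# Abrupt condition: 500 trials (days 1-5) at 200 ms, then 400 at 700 ms.
abrupt = [200] * 500 + [700] * 400
# Gradual condition: ISI raised 25 ms every 20 trials during days 6-9.
gradual = [200] * 500 + [min(700, 225 + 25 * (t // 20))
                         for t in range(400)]

mu_a, mu_g = track(abrupt), track(gradual)
print(round(mu_a[499]), round(mu_a[-1]))  # -> 200 700
```

The abrupt trace closes most of the 500-ms gap within the first day after the shift (an exponential time course), while the gradual trace stays a roughly constant lag behind the ramp, matching the qualitative difference between the A and G groups in Fig. 3.8.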
They studied the effect of ISI shifts by employing ISIs of 200 and 700 ms, three ISI shift conditions (fixed, gradual, and abrupt), and two directions of shift (short to long and long to short). In the gradual shift condition, intermediate ISIs between 200 and 700 ms were used during the shift training (the last 4 days of a 9-day period), while in the abrupt case, the animal was first trained in days 1-5 with one interval, and then trained in days 6-9 with the other interval. Our model's response with the learning rule (3.22) is shown in Fig. 3.8b. Compared to Fig. 3.8a, the model yields not only comparable quantitative results but also similar time courses. In particular, both the animal groups and our model show a linear shift under the gradual conditions, and an exponential shift under the abrupt condition. As a comparison, Fig. 3.8c shows the results produced by a pure averaging model, where mu_k = (1/k) sum over i from 1 to k of e_i (cf. Eq. 3.22). It is clear how poor the results are without the recency factor. Furthermore, the learning rule (3.22) predicts that the amount of prior preparatory training does not affect the later shift in mean CR peak latency. More specifically, in the Coleman and Gormezano experiment, we predict that the same shift occurs even with one, two, three, or four days of prior training instead of the five days in the original experiment. A similar gradual shift was observed in the CR topographies (instantaneous CR amplitudes) in the direction of the ISI shift. A later observation by Hoehler and Thompson (1980) confirms the systematic changes in the CR topographies in the direction of the ISI shift.

Cognitive Aspects

The present model is based on the interference theory of forgetting, retroactive interference in particular (Waugh & Norman, 1965). Our work demonstrates that a drastic difference in computational power can be gained by adopting a different view from basic studies of cognitive science.
The computational model of STM offered here represents a simplified view, and has not incorporated other characteristics such as proactive interference and the similarity factor. Nonetheless, our theory presents a first attempt to solve complex problems using basic cognitive models. Another cognitive source of the STM model is Miller's work (Miller, 1956). The magic number seven plus or minus two is explicitly incorporated into the excitation level of a basic unit (parameter T). The technique for overcoming the capacity limit of the STM model, i.e., using hierarchical representation of temporal sequences, is directly inspired by the chunking idea that reveals how humans process information (Miller, 1956; Simon, 1974). Different layers in the model (Fig. 3.4) correspond to different levels of the hierarchical representation, and process different extents of temporal sequences. The hierarchical model of sequence recognition suggests that there should be a distinct STM within each layer, and thus different levels of STM that function rather independently. Different time scales are characteristic of different levels of STM, but each STM obeys the same description, with the same capacity limit, interference, and so on. This is a novel prediction of the theory for human STM behavior at the cognitive level. This prediction could be tested, for example, by allowing a subject to read or to listen to a piece of hierarchically organized material, and later asking what the subject can identify or recall from different levels of the hierarchies.

Figure 3.8. Data and model outputs for shifts in mean CR peak latency in blocks of 10 test trials (days) for the 200- and 700-ms ISIs. In the plots, ISI conditions are indicated by the labels for all six groups (200 F, 700 F, 200-700 A, 200-700 G, 700-200 A, 700-200 G), with F for fixed, A for abrupt, and G for gradual. Each of the 9 days of training consisted of 90 paired CS-US trials and 10 CS-alone test trials; beginning with the fifth trial of each 100-trial session, every tenth trial was a test trial. Each point represents the mean value of CR peak latency over the 10 test trials within a day. Under the gradual condition, the ISI was incremented or decremented in steps of 25 ms after every twentieth trial; under the abrupt condition, the ISI was shifted immediately from one interval to the other throughout days 6-9. a. Experimental data. Each condition group contained 12 individuals, and the result is the average over the group (redrawn from Coleman & Gormezano, 1971). b. Model results. The same group of ISI conditions was used to yield results comparable with the experimental preparation. In the simulation, beta = 0.02. c. Simulation results from the pure averaging method (see text). [Panels a-c plot mean CR peak latency (ms) against days of training for the six groups.]

Of course, our model represents just a much simplified view of the richness of hierarchical knowledge representation and the chunking theory. Different levels of hierarchies may not correspond to different physical levels of neural networks, and a mechanism for establishing multiple levels of hierarchies within a single level of network may be desired.

How can a long sequence, like one composed of hundreds or even thousands of components, be recognized? Two ways may be possible within the present theory. One is to utilize the hierarchical scheme described before. Because the number of elementary components in a sequence that can be recognized increases exponentially with the number of layers in the model, the hierarchical scheme offers a very effective method for recognizing long sequences.
Yet another way is to make use of the process of sequence reproduction. Long sequences could be very simple (in terms of the degree of a sequence), and reproducing them only requires detecting subsequences whose lengths are no larger than the degrees of the sequences; it may not involve recognition of long subsequences at all. In other words, the idea is to transform recognition of long sequences into reproduction, which requires only recognition of possibly much shorter subsequences. The price would be an extra comparison of the sequence reproduced (mentally) by the model with the one being presented externally.

III.7 Conclusion

The goal of this part of the dissertation is to explore mechanisms for temporal information processing. A unified theory is provided for learning, recognition, and reproduction of complex temporal sequences. The entire model is built upon units corresponding to local neuron populations, and thus suggests a new level of modeling. Time intervals of sequence components do not affect recognition, but are preserved in reproduction. The present computational theory is inspired by cognitive studies not only in the formation of the short-term memory model, but also in the modeling of hierarchical information chunking. We demonstrate throughout this part of the thesis that complicated aspects of temporal order can be handled by temporal linkage (local and remote) among different levels of sequence components, going much beyond what can be achieved by the simple associative chaining rejected by Lashley (1951). Meanwhile, we realize that many other problems of temporal order, like goal-directed planning, syntax formation, and hierarchy construction, remain largely untouched. However, we believe that the theoretical framework lays a sound foundation for further study of temporal integration.
Part 2

A Neural Model for Stimulus Specific Habituation in Toads

CHAPTER IV

HIERARCHICAL PATTERN DISCRIMINATION IN TOADS: A COMPUTATIONAL MODEL

Summary

Behavioral experiments show that toads exhibit stimulus- and locus-specific habituation. Different worm-like stimuli that toads can discriminate at a certain visual location form a dishabituation hierarchy. What is the neural mechanism underlying these behaviors? This chapter proposes that the toad discriminates visual objects on the basis of temporal responses, and that discrimination is reflected in different average neuronal firing rates at some higher visual center, hypothetically the anterior thalamus. This theory is developed through a large-scale neural simulation which includes the retina, the tectum and the anterior thalamus. The neural model based on this theory predicts that retinal R2 cells play a primary role in the discrimination via tectal small pear cells (SP), and that R3 cells refine the feature analysis by inhibition. The simulation demonstrates that the retinal response to the trailing edge of a stimulus is as crucial for pattern discrimination as the response to the leading edge. New dishabituation hierarchies are predicted by the model by reversing contrast and shrinking stimulus size.

IV.1 Biological Background

As mentioned in Chapter 1, after repeated presentation of the same prey dummy in their visual field, toads decrease the number of orienting responses toward the moving stimulus. This phenomenon, called habituation, has been extensively investigated in many species, ranging from invertebrates, like Aplysia (Kandel, 1976), where habituation seems to be independent of the specific patterning of the stimuli used, to mammals, where habituation exhibits stimulus specificity, so that habituation to a certain stimulus pattern may be dishabituated by a different stimulus pattern (Thompson & Spencer, 1966; Sokolov, 1960; 1975).
Visual habituation in toads has the following characteristics (for a review, see Ewert, 1984):

1. Locus specificity. After the habituation of an orienting response to a certain stimulus applied in a given location, the response can be released by the same stimulus applied at a different retinal locus (Eikmanns, 1955; Ewert & Ingle, 1971).

2. Hierarchical stimulus specificity. After habituation to one stimulus, the response may be restored by presentation of a different stimulus at the same location. It seems that only certain stimuli can dishabituate a previously habituated response. Experimental results (Ewert & Kehl, 1978) show that this dishabituation forms a hierarchy of stimulus patterns, as shown in Fig. 1.1, where a pattern in the hierarchy can dishabituate the habituated responses of the stimuli lower than or to the right of it in the hierarchy, but not vice versa.

The dishabituation hierarchy suggests that it is configurational cues of the stimulus, and not only its "newness", which decide the toad's response. It is reasonable to assume that toads have not developed the advanced spatial shape recognition capability of higher animals, but have developed the ability to recognize certain stimulus configurations, which, for example, are used in discriminating prey and predator. However, our aim here is not to model such discrimination, but rather to investigate the neural mechanisms that might underlie this dishabituation hierarchy, which has so far been demonstrated only in behavioral experiments. For toads to exhibit the dishabituation hierarchy, there have to be differing representations of differentially habituatable shapes somewhere in their visual system. Physiological studies provide, however, few data on the response of the visual areas, such as the retina and the tectum, to a variety of relevant shapes (for reviews see Grüsser & Grüsser-Cornehls, 1976; Ewert, 1984).
In the Lara-Arbib model of stimulus-specific habituation behavior in the toad (Lara & Arbib, 1985), the discrimination of the stimuli in Fig. 1.1 is made by retinal ganglion cell type R2. In order to achieve this, the authors introduce a measurement of the convexity of a stimulus, and provide a group of ad hoc functions each of which is used to emulate how a specific stimulus traverses the excitatory receptive field (ERF) of R2. However, Lara and Arbib's measurement of convexity does not really reflect the convexity of an object. To avoid such problems, the present model is based on detailed modeling of the anuran retina (Teeters, 1989; Teeters & Arbib, 1991). Nonetheless, we do seek to reproduce one feature of the Lara-Arbib model, namely that height in the hierarchy corresponds to the strength of firing induced in some region of the brain. Our specific aim in this chapter is to develop a model for discriminating different worm-like stimuli which is able to simulate a class of cells whose average firing rate in response to the different stimulus types exhibits the same order as shown in the dishabituation hierarchy. We hypothesize that these cells lie in the anterior thalamus, and thus suggest new physiological experiments to test our theory. The simulation of the habituation and dishabituation processes will be provided in chapter 6. An earlier version of this chapter appears in Wang and Arbib (1991a).

IV.2 Distributed vs. Temporal Coding: Basic Hypothesis

An object can be neurally coded by distributed activity in a group of neurons, or by temporal firing patterns of single cells. Here the former is described as distributed coding and the latter as temporal coding. Distributed coding is strongly favored by theoreticians due to considerations of reliability, although it seems that both are used in the object representation of primates (Gross et al., 1985). We can make the situation clearer by avoiding the suggestion of a strict dichotomy.
In "purely distributed" coding, there is no single cell whose firing correlates strongly with the specific pattern being discriminated; only the firing of a population encodes that discriminand. In "purely temporal" coding, there is a unique cell ("a yellow Volkswagen detector") whose firing encodes the discriminand. However, data on the toad tectum (e.g., Ewert, 1987b) suggest a form of temporal coding which is also distributed in the sense that, for example, the firing of a T5.2 cell signals the presence of a worm-like stimulus in its visual field (temporal coding), yet nearby T5.2 cells, having overlapping receptive fields, can encode the same stimulus if it appears nearby (thus yielding the redundancy and reliability of distributed coding). Our basic hypothesis, then, is that anurans represent objects by temporal coding in this latter sense. More specifically, we assert that the firing rate of specific neurons in some neural center of the toad visual system is higher in response to a stimulus in the upper part of the hierarchy than to one in the lower part, without denying that many cells may exhibit highly similar temporal codes.

If the brain's response to the difference between two patterns were based only on their Hamming distance (i.e., the number of differing bits in two matrices), then dishabituation would be symmetrical: if stimulus A can dishabituate stimulus B, then stimulus B should be able to dishabituate stimulus A as well. This might be the case in higher animals like mammals, where dishabituation could be accounted for by a comparator model (Sokolov, 1960; 1975), but it contradicts the observed hierarchy in toads (Ewert & Kehl, 1978). Moreover, the discrimination capability of toads is rather limited. In their original experiment, Ewert and Kehl did not find other worm configurations of the same length and height as those in Fig. 1.1 that could be discriminated (Ewert, personal communication, 1989).
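The symmetry point can be made concrete: the Hamming distance is a symmetric function of its two arguments, so a comparator that dishabituates purely on the basis of Hamming distance must treat A-after-B exactly like B-after-A, and so can never yield a one-way hierarchy. A minimal sketch (the bit patterns below are hypothetical stand-ins for stimulus matrices, not the actual stimuli of Fig. 1.1):

```python
def hamming(a, b):
    """Number of differing bits between two equal-length binary patterns."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))

# Two hypothetical binary stimulus patterns.
A = [1, 1, 0, 1, 0, 0, 1, 0]
B = [1, 0, 0, 1, 1, 0, 0, 0]

# A comparator that dishabituates whenever d(old, new) exceeds a threshold
# treats the pair (A, B) exactly like (B, A) -- it cannot produce the
# asymmetric dishabituation hierarchy of Fig. 1.1.
assert hamming(A, B) == hamming(B, A)
```

Any asymmetric ordering of stimuli therefore requires a non-symmetric readout, such as the ordered firing rates hypothesized here.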
This limitation can be straightforwardly explained by the hypothesis of temporal coding, because a frequency code can be reliably differentiated into only a limited number of levels, and its capacity is therefore severely limited compared to distributed coding. It could be that amphibians, a phylogenetically older class than mammals, have not yet achieved the advanced distributed coding which has immense potential capacity. Looked at from the other direction, however, amphibians do achieve hierarchical stimulus specificity, which does not seem to be attained in invertebrates (Kandel, 1976). A direct prediction of our basic hypothesis is that the dishabituation hierarchy is underlain by the different firing rates of certain neurons in the toad visual system. This major prediction will be explored in the simulations presented in the following sections.

In the experiment of Ewert and Kehl (1978), all moving objects are 20 mm long and 5 mm high, which corresponds to 16° and 4° of visual angle respectively, from the viewing distance of 70 mm. The dots which are added to the triangular objects (see Fig. 1.1) are 1 mm in diameter, which is about 1°. Because all the moving objects have the same length and height, the critical cues are (1) the leading edge (or the angle subtended by a leading edge); (2) the trailing edge; (3) an isolated dot; (4) a striped pattern. The following analysis will be made in terms of these cues.

Before we go into the detailed information processing of the toad visual system at different levels, the general paradigm of the simulation is provided first. The model developed in the following sections is tested by a large-scale computer simulation which incorporates the retina, the tectum and a novel array of cells which we hypothesize to lie in the anterior thalamus (the basis for this hypothesis will be presented below). The anatomy of the simulation is summarized in Fig.4.1.
In the figure, conical projections represent on-center off-surround convergence, while the cylindrical projection from the R2 layer to the small pear cell (SP) layer represents a 1-to-1 mapping. The connections from the receptor layer to both the depolarizing bipolar cell (BD) layer and the hyperpolarizing bipolar cell (BH) layer also constitute a small many-to-one convergence. The receptor layer contains 140x140 cells, which correspond to a 70°x70° visual field. The bipolar and amacrine cell layers (ATD: on-channel, ATH: off-channel) each consist of 140x140 cells, in correspondence with the receptor layer. Three types of ganglion cells, R2, R3 and R4, have been modeled, each consisting of 25x25 cells which cover the 70°x70° visual field, since the ganglion cells have 20° receptive fields and lie 2° apart. The R2 layer projects to the SP layer in the tectum, and the SP layer and R3 layer together converge on the AT layer in the anterior thalamus, where the worm-like pattern discrimination is finally achieved. The entire simulation contains about 100,000 cells. Bitmap stimuli are used.

Figure 4.1 Diagram of the entire model used in this simulation project. The retina (receptor, BD, BH, ATD, ATH, R2, R3 and R4 layers), the tectum (SP layer) and the anterior thalamus (AT layer) have been incorporated in the model; + marks excitatory and - inhibitory connections. For explanation see text.

IV.3 Model of Retinal Processing

Any biologically significant neural model of visual object recognition must include retinal processing. The anuran retina is among the best known neural structures, and has triggered considerable modeling as well (for examples see Ewert & Seelen, 1974; an der Heiden & Roth, 1987; Teeters, 1989; Teeters & Arbib, 1991). Our analysis is mainly based on Teeters' retina model, since it provides the most detailed account of the toad retina to date.
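The layer dimensions quoted above follow from simple arithmetic, which can be checked in a few lines (a sketch with our own variable names; the 25-per-side ganglion count assumes, as we read the text, that every 20° receptive field must fit inside the 70° field):

```python
# Retinotopic geometry of the simulated layers (numbers from the text).
FIELD = 70.0        # visual field, degrees (70 x 70)
REC_SPACING = 0.5   # one receptor per 0.5 degree
GC_SPACING = 2.0    # ganglion cells lie 2 degrees apart
GC_RF = 20.0        # ganglion-cell receptive field, degrees

receptors = int(FIELD / REC_SPACING)          # 140 -> 140x140 receptor matrix
# Ganglion cells tile the field so that each 20-deg RF stays inside it:
ganglion = int((FIELD - GC_RF) / GC_SPACING)  # 25  -> 25x25 ganglion matrix
rf_receptors = int(GC_RF / REC_SPACING)       # 40  -> each RF covers 40x40 receptors
density_ratio = GC_SPACING / REC_SPACING      # 4-to-1 linear density ratio

assert (receptors, ganglion, rf_receptors, density_ratio) == (140, 25, 40, 4.0)
```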
The receptive fields of retinal ganglion cells are usually thought to be composed of an excitatory center and an inhibitory surround (Kuffler, 1953). Both mechanisms are described by spatially Gaussian-distributed curves around a common midpoint, but the inhibitory one has a lower peak and a wider spread. The whole neuronal response is formed as a difference of Gaussians (DOG), with the excitatory Gaussian minus the inhibitory one. However, this ignores fine details of cellular interactions within the retina. A more detailed model (Teeters & Arbib, 1991) for the anuran retina prior to the ganglion cells (Figure 4.1) follows the generally accepted overview of retinal processing. Receptors and horizontal cells together form the center-surround receptive field for the bipolars. (Horizontal cells are not shown in the figure due to their limited role in visual processing in this retina model. For detailed discussion see Teeters & Arbib, 1991.) The bipolar output provides the input to amacrine cells, where extensive processing is performed, including temporal processing which emphasizes transient responses.

Three different types of ganglion cell were identified in the retinotectal projection of toads (Grüsser & Grüsser-Cornehls, 1970; Ewert & Hock, 1972), which correspond to R2, R3 and R4 in frogs (Grüsser & Grüsser-Cornehls, 1976). The responses of the three retinal ganglion types to three classes of stimuli (worm, antiworm, square) used in the Ewert laboratory are summarized in the top panels of Figs. 4.2, 4.3 and 4.4 respectively (Ewert & Hock, 1972; Ewert, 1976).

Figure 4.2 R2 response to worm, antiworm, and square as a function of edge size (degrees). Top: the experimental data (from Ewert, 1976). Bottom left: response of the Teeters model (from Teeters, 1989). Bottom right: response of our modified retina model.
A worm stimulus is a rectangle with its elongated edge parallel to the direction of movement; an antiworm stimulus is a rectangle with its elongated edge perpendicular to the direction of movement. As in the following figures, each point represents the temporal average firing rate in response to the corresponding stimulus.

Figure 4.3 R3 response to worm, antiworm, and square as a function of edge size (degrees). Top: the experimental data (from Ewert, 1976). Bottom left: response of the Teeters model (from Teeters, 1989). Bottom right: response of our modified retina model.

Figure 4.4 R4 response to worm, antiworm, and square as a function of edge size (degrees). Top: the experimental data (from Ewert, 1976). Bottom left: response of the Teeters model (from Teeters, 1989). Bottom right: response of our modified retina model.

Each data point corresponds to the average firing rate of the given cell during the response to the leading edge of the horizontally traveling object (Ewert, personal communication, 1989). We shall later consider data from the Ewert laboratory that also take the trailing edge into account. In fact, the response to the trailing edge must also be taken into account to explain hierarchical stimulus specificity.

The responses of R2, R3 and R4 ganglion cells are formed by different combinations of ATD and ATH. For implementation details see Teeters and Arbib (1991). During a simulation of the retina, a moving stimulus is directly mapped onto the receptor layer.
The dynamics of the membrane potential m(t) of a neuron in a later layer is formed from the input I(t) from previous layers according to the leaky integrator model (following the style of modeling in Lara et al., 1982):

    τ_m dm(t)/dt = -m(t) + I(t) + h    (4.1)

where τ_m is the time constant, h is a resting level, and I(t) represents the weighted sum of excitatory and inhibitory inputs. Rather than using a detailed model of spike initiation, the firing rate S(t) of the neuron is formed by σ(m(t)), where the choice of the non-linear function σ may vary from cell type to cell type.

In the model, each cell type corresponds to a two-dimensional matrix, with a single cell represented by the membrane potential m(i,j,t) of the neuron at position (i,j) and time t. The input I(i,j,t) to this neuron is created by summing up the contributions of the preceding layers. Each contribution is formed as the convolution of a kernel, which approximates a DOG, with the output from the appropriate cell matrix:

    I(i,j,t) = (k * S)(i,j,t)    (4.2)

where * represents convolution, S indicates the output firing rate of the input layer, and a kernel element k(x,y) is defined as

    k(x,y) = W_e exp[-(x² + y²)/(2σ_e²)] - W_i exp[-(x² + y²)/(2σ_i²)]   if x² + y² ≤ R²
           = 0                                                           otherwise    (4.3)

where R is the radius of the receptive field measured in degrees of visual angle. The activity distribution of the receptive field is uniquely determined by the parameters W_e, W_i, σ_e, and σ_i. The retina model in this paper is a slightly modified version of the Teeters model (Teeters, 1989) and more closely approximates the electrophysiological data.
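Equations (4.1)-(4.3) translate directly into code. The sketch below is ours, not the simulator actually used in this dissertation; the Euler step size and the 1°-per-cell kernel sampling are illustrative choices, and the R2 parameters are those listed in Table 4.1 for our modified model:

```python
import numpy as np

def dog_kernel(w_e, w_i, sigma_e, sigma_i, R, spacing=1.0):
    """Difference-of-Gaussians kernel of Eq. (4.3), zero outside radius R."""
    n = int(R / spacing)
    x, y = np.meshgrid(np.arange(-n, n + 1) * spacing,
                       np.arange(-n, n + 1) * spacing)
    r2 = x**2 + y**2
    k = w_e * np.exp(-r2 / (2 * sigma_e**2)) - w_i * np.exp(-r2 / (2 * sigma_i**2))
    k[r2 > R**2] = 0.0          # receptive field is cut off at radius R
    return k

def leaky_integrator_step(m, I, tau, h=0.0, dt=0.1):
    """One Euler step of Eq. (4.1): tau * dm/dt = -m + I + h."""
    return m + (dt / tau) * (-m + I + h)

# R2 kernel, parameters from Table 4.1 (our modified model).
k_r2 = dog_kernel(w_e=1.0, w_i=0.47, sigma_e=2.4, sigma_i=4.0, R=9.75)

# Excitatory center, inhibitory surround: positive at the midpoint,
# negative at intermediate radii where the wider Gaussian dominates.
center = k_r2[k_r2.shape[0] // 2, k_r2.shape[1] // 2]
assert center > 0 and k_r2.min() < 0

# A cell driven by a constant input relaxes toward the steady state I + h.
m = 0.0
for _ in range(200):
    m = leaky_integrator_step(m, I=1.0, tau=1.0)
```

Convolving such a kernel with a firing-rate matrix, as in Eq. (4.2), then yields the input I(i,j,t) to each cell of the next layer.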
The differences between our model and the Teeters model of the toad retina, besides different implementations, can be summarized as follows: (1) different sets of parameters for the ganglion cells (see Table 4.1); (2) our model simulates both the on-channel and off-channel response of R2 cells (see Eq. 4.4 below), while the Teeters model only simulates the off-channel response; (3) our model accepts bitmap stimuli directly, which is crucial for simulating the retinal response to the various configurations of worm stimulus in Fig. 1.1, while the Teeters model only accepts structured stimulus shapes of worm, antiworm and square (see Fig.4.2). Data from the Teeters model and our modified model are presented in Figs. 4.2, 4.3, and 4.4 for the three types of ganglion cells respectively, together with the electrophysiological data*. Table 4.1 lists the parameter values of the R2, R3 and R4 cells used both in the Teeters model and in our model, and Appendix A gives the full equation set used in our simulation, including the retina model.

Table 4.1 Parameter values of retinal ganglion cell models

                The Teeters model           Our modified model
                R2      R3      R4          R2       R3       R4
    W_e         1.0     1.0     1.0         1.0      1.15*    1.0
    W_i         0.43    0.82    0.0         0.47*    0.91*    0.0
    σ_e         2.4     5.0     3.5         2.4      2.0*     3.5
    σ_i         4.0     10.0    —           4.0      10.0     —
    R           9.75    9.75    9.75        9.75     9.75     9.75

    * Value different from the Teeters model

Figure 4.5 shows the average firing rates of R2, R3 and R4 cells of the model when the 8 worm-like stimuli from the dishabituation hierarchy (Fig. 1.1) move across their receptive fields. Note that a is the highest in the hierarchy, while h is the lowest.
Stimuli d and f give the largest R2 and R3 responses, since their leading edge, the vertical bar, can fit fully within the ERF of the retinal ganglion cells and thus elicits a larger response than the diagonal edge of the other stimuli with the same vertical length. R4, in contrast, gives approximately equal responses to all stimuli due to its large receptive field, which contains both off- and on-channel contributions. In a recent model of the toad tectum for prey-catching behavior, concentrating on the modulation of worm and antiworm responses, Betts (1989) omitted the R4 connections used in an earlier model (Cervantes et al., 1985) which also addresses data on the response to squares. Since the present model is concerned only with responses to worm-like stimuli, we will similarly ignore the response of R4 cells.

* The average firing rate of a neuron is computed by the temporal integration of its instantaneous firing rate divided by the time period during which a non-zero firing rate is consecutively elicited.

Figure 4.5 Simulated retinal response to the 8 worm-like stimuli shown in Figure 1.1. All three ganglion types are tested in the model. In this simulation, only the response to the leading edge is recorded in R2 and R3 cells.

Tsai and Ewert (1987), in one of the first studies to consider the contribution of the trailing edge of moving objects to retinal responses in toads, found that R2 cells show almost no preference in response to either edge of an object, while R3 cells show a much stronger off-channel (from white to black) than on-channel (from black to white) response, which correlates with the behavior. The R3 response to the trailing edge was modeled in the Teeters model with a 1.0/0.2 ratio of off-channel to on-channel response.
In the current simulation, we model the R2 cell response with a 1.0/1.0 ratio of off- to on-channel contribution, i.e., the contribution from the trailing edge is as strong as that from the leading edge of the stimulus. Analytically, the R2 membrane potential is described by

    τ_r2 dm_r2(i,j,t)/dt = -m_r2(i,j,t) + (k_r2 * (S_ath + S_atd))(i,j,t)    (4.4)

All symbols in (4.4) have been described before, except for the assumption that R2 cells receive equal contributions from the off-channel (S_ath) and on-channel (S_atd) of the amacrine cells. Subscripts indicate the neuron types; e.g., k_r2 stands for the kernel of an R2 cell as defined in (4.3) and Table 4.1. The detailed definition of S_ath and S_atd is given in Teeters and Arbib (1991; see also Appendix A).

Fig.4.6 shows the temporal responses of an R2 neuron to the 8 worm-like stimuli in Fig. 1.1, and Fig.4.7 shows the corresponding R3 responses from our modified retina model. Here only the firing rate is displayed, with σ(x) = Max(x, 0) for both R2 and R3 cells. In terms of single cell response, a vertical bar of 4° height elicits the strongest response in both R2 and R3 cells, and the more inclined a stimulus edge is, the less efficiently it triggers a retinal response. This is because the more inclined a stimulus edge of the same vertical height is, the more it encroaches on the inhibitory surround and the longer is the duration of the response. Note the effect of the dots in worm e relative to a and worm g relative to b in the R2 response. Since the dot encroaches on R2's IRF while the edges of stimuli e and g traverse R2's ERF, worms e and g elicit smaller R2 responses than worms a and b respectively.
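The structure of Eq. (4.4) and the average-firing-rate measure defined in the earlier footnote can be sketched as follows. This is our illustration only: the two Gaussian bursts are synthetic stand-ins for the off- and on-channel responses to the leading and trailing edges, the spatial convolution is collapsed to a scalar drive, and the output non-linearity is σ(x) = Max(x, 0) as in the text:

```python
import numpy as np

def average_firing_rate(rate, dt):
    """Temporal integral of the instantaneous rate divided by the span of
    consecutively non-zero firing, per the footnote in the text."""
    nz = np.nonzero(rate > 0)[0]
    if len(nz) == 0:
        return 0.0
    span = (nz[-1] - nz[0] + 1) * dt
    return rate.sum() * dt / span

# Eq. (4.4) for one R2 cell, with the spatial convolution collapsed to a
# scalar drive: equal off-channel (S_ath) and on-channel (S_atd) weights.
dt, tau_r2 = 0.1, 1.0
t = np.arange(0, 20, dt)
S_ath = np.exp(-((t - 5.0) ** 2))    # synthetic leading-edge (off) burst
S_atd = np.exp(-((t - 12.0) ** 2))   # synthetic trailing-edge (on) burst

m = 0.0
rate = np.empty_like(t)
for i, drive in enumerate(S_ath + S_atd):
    m += (dt / tau_r2) * (-m + drive)    # leaky integration, Eq. (4.1)
    rate[i] = max(m, 0.0)                # sigma(x) = Max(x, 0)

avg = average_firing_rate(rate, dt)
```

Because both channels enter with equal weight, the trailing-edge burst contributes as much to the R2 trace as the leading-edge burst, which is exactly the modeling choice Eq. (4.4) expresses.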
Note also that the two response areas of R2 to an object in Fig.4.6 generally correspond to the R2 response to the leading edge and the trailing edge of the object.

Figure 4.6 R2 temporal firing rate in response to the 8 worm-like stimuli (a-h) from the retina model. Time runs from left to right; the firing rate (FR) is plotted in impulses per second (Hz) on a common scale of 75.0 Hz.

Figure 4.7 R3 temporal firing rate in response to the 8 worm-like stimuli (a-h) from the retina model, plotted on a common scale of 50.0 Hz. See the legend for Fig.4.6 for other information.

The leading edge response is to the left of the trailing edge response because time runs from left to right in the plot. Since all objects are moving at the same speed, the distance between the peaks of the two areas corresponds to the distance between the middle points of the leading edge and the trailing edge of the object. For example, the distance between the two peaks elicited by worm a is smaller than that elicited by worm d.

Figure 4.8 presents 3-D snapshots of the membrane potentials of the 25x25 matrices of cells in response to worms a, b, d and h traversing the visual field of the retina model, for R2 and R3 respectively. The responses of R2 and R3 to the other 4 stimuli have also been simulated, but are omitted for space. In the simulation, the density of receptors is 1 cell / 0.5°, while the ganglion cell density is 1 cell / 2°, resulting in a 4-to-1 density ratio. In addition, each ganglion cell has a receptive field of approximately 20°x20°, or 40x40 receptors. This results in a 140x140 receptor matrix (i.e., a 70°x70° visual field) serving as input to the 25x25 ganglion cell matrix.
In contrast to Figures 4.6 and 4.7, where the temporal response is given, Figure 4.8 shows the spatial response of the ganglion cells, which is difficult to observe electrophysiologically. Although a bit obscured in some cases (e.g., worm a in Fig.4.8a) by the effects of sampling and the display technique, there is generally a clear correspondence between the responses in Figures 4.8a and 4.8b and their corresponding geometric shapes in Fig. 1.1. The off-channel preference of R3 cells is clearly shown in the figure. Since a single neuron in later visual centers (like the tectum and the anterior thalamus) integrates a 2-D cell patch of the retinal ganglion layers, the 3-D snapshots in Fig.4.8 are very helpful in envisioning the response characteristics of later visual processing.

Figure 4.8 a. 3-D snapshot of the membrane potential of the 25x25 R2 layer in response to worm patterns a, b, d, and h shown in Fig. 1.1. The stimulus is moving from left to right, as shown in the figure. All response potentials are scaled uniformly. b. 3-D snapshot of the membrane potential of the 25x25 R3 layer in response to worm patterns a, b, d, and h shown in Fig. 1.1. All response potentials are scaled uniformly.

IV.4 Tectal Relay

Based on anatomical data (Neary & Northcutt, 1983; Wilczynski & Northcutt, 1983) and functional lesion data, Ewert (1987a) suggested that the basic pathway for habituation in amphibians is: retina → tectum → AT (anterior thalamus) → MP (medial pallium) → PO/HYP (preoptic region/hypothalamus) → tectum. This pathway is referred to as loop(2) and is generally supposed to be responsible for modulation of the innate releasing behaviors of amphibians (see Section VI.2 for a more detailed discussion).
In this chapter, we are only concerned with the first part of this loop: retina → tectum → AT, where the discrimination of the stimuli is presumably achieved using the circuitry analyzed below by our simulation. More specifically, we shall demonstrate a circuit (called ATL) that can effect the desired discrimination and, as a logically separate claim, suggest that it is located in AT. In our model, we do not address the question of how the optic tectum discriminates prey from predator, but we do hypothesize that it plays little or no role in the finer pattern discrimination that underlies the dishabituation hierarchy, and rather relays the input from the retina to the anterior thalamus, where visual information is further processed and carried up to the telencephalon. The reasons for this hypothesis are the following. The tectum receives inputs from both R2 and R3 retinal cells, and is the neural center mediating prey-catching behavior in amphibians (Ewert 1987a,b). As for stimulus-specific habituation, behavioral data show that the releasing values for all the stimuli in the hierarchy are almost the same, as stressed by Ewert (1984). Also, prey-catching behavior shows an off-channel preference, which correlates very well with the neuronal activities of R3 and T5.2 cells (Tsai & Ewert, 1987). This finding leads Tsai and Ewert (1987) to propose that R3, not R2, carries the primary information to the prey analysis circuitry located in the tectum. However, as argued in the previous section, the response to the trailing edge should have a significant role in worm discrimination. This suggests that R2 may be more involved in the discrimination of worm-like stimuli than R3. This trailing edge consideration leads us to downplay the tectum as a major processor of "sub-worm" discrimination. With HRP and cobalt-filling, Lazar et al.
(1983) found that in frogs the main projection units from the tectum to the anterior diencephalon are the small piriform neurons (SP), which are located in layer 8 of the tectum. This finding leads us to assume that SP cells relay the visual information concerning worm discrimination. According to the tectal column model (Lara et al., 1982; Cervantes et al., 1985), which was abstracted from the anatomy of the anuran tectum (Szekely & Lazar, 1976), each column comprises a pyramidal cell, PY, as the sole output cell, a large pear-shaped cell, LP, a small pear-shaped cell, SP, and a stellate inhibitory interneuron, SN. The tectum is modeled by an array of locally connected columns. In the model of Cervantes et al. (1985), the SP cells are defined as follows:

    τ_sp dm_sp(i,j,t)/dt = -m_sp(i,j,t) + S_r2(i,j,t) + I_gl(i,j,t) - I_sn(i,j,t) - I_th3(i,j,t)    (4.5)

where GL stands for the glomerulus within a tectal column, TH3 is one type of thalamic-pretectal cell, and I_gl(i,j,t), I_sn(i,j,t), and I_th3(i,j,t) represent the weighted inputs from the GL, SN, and TH3 cells respectively. In terms of retinal afferents, SP receives only R2 inputs. In the present model, SP cells also receive R2 inputs and project to the anterior thalamus. Since R2 projects to SP topographically, and the role that SP has in this model is to relay R2 activity, the neuronal response of SP is made equal to the response of R2 to any stimulus, in order to simplify the implementation. Future modeling will pay more attention to the dynamics of the tectal circuitry.

IV.5 Integration in Anterior Thalamus

The anterior thalamus (AT) consists of several nuclei, but due to the lack of more specific data, AT will be discussed as a whole. The anterior thalamus receives ascending R3 and R4 retinal projections (Scalia & Gregory, 1970; Grüsser & Grüsser-Cornehls, 1976) and SP tectal projections (Lazar et al., 1983).
Among other ascending projections to the telencephalon, AT has a direct projection to the medial pallium (Scalia & Colman, 1975; Neary & Northcutt, 1983). Although responses of visually sensitive neurons have been recorded in AT, the data provide only a preliminary picture. Compared to the optic tectum and the caudal thalamus, AT is much less understood in terms of neurophysiology and morphology. For example, no well-observed neuronal types have been reported there. Ingle (1980) found that large ablations of AT usually depressed prey-catching behavior. Also, AT has been proposed as part of the modulatory loop(2) (Ewert, 1987a). However, the kind of visual processing performed by AT remains unknown. We offer in the present model a definite hypothesis: based on the specific position of AT in loop(2) and our previous analysis of visual information processing, we propose that it is the anterior thalamus where the finest pattern discrimination is achieved by neuronal responses. As stated above, it is too early to form a model of the anatomical circuitry of AT. However, since the computational function of AT is one of our major concerns in this project, we will content ourselves for the time being with a simple array of neurons, called ATL, for modeling the anterior thalamus. ATL neurons receive excitatory-center inhibitory-surround inputs from tectal SP cells, and direct inhibitory inputs from R3 cells, as shown in Fig.4.1. Quantitatively,

    τ_atl dm_atl(i,j,t)/dt = -m_atl(i,j,t) + (k_atl1 * S_sp)(i,j,t) - Max[0, (k_atl2 * S_r3)(i,j,t)]    (4.6)

    k_atl1(x,y) = W_e^sp   if |x| ≤ m_1, |y| ≤ m_1
                = W_i^sp   if m_1 < |x| ≤ m_2, m_1 < |y| ≤ m_2
                = 0        otherwise    (4.7)

    k_atl2(x,y) = W_e^r3   if |x| ≤ n_1, |y| ≤ n_1
                = W_i^r3   if n_1 < |x| ≤ n_2, n_1 < |y| ≤ n_2
                = 0        otherwise    (4.8)

    S_atl(i,j,t) = m_atl(i,j,t)   if m_atl(i,j,t) > θ_atl
                 = 0              if not    (4.9)

where the Max operation ensures the inhibitory effect of the R3 input. The meaning of the other symbols has been described previously.
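Equations (4.6)-(4.9) can be sketched as follows, with the parameter values taken from Table 4.2. This is our schematic of the single-cell computation, not the full 25x25 ATL array; evaluating the "convolution" at just the central cell reduces it to a weighted sum over the kernel window:

```python
import numpy as np

def square_kernel(w_center, w_surround, inner, outer):
    """Center-surround kernel of Eqs. (4.7)/(4.8): w_center inside the inner
    square, w_surround in the surrounding square annulus, 0 outside."""
    x, y = np.meshgrid(np.arange(-outer, outer + 1), np.arange(-outer, outer + 1))
    k = np.zeros_like(x, dtype=float)
    k[(np.abs(x) <= inner) & (np.abs(y) <= inner)] = w_center
    surround = (np.abs(x) > inner) & (np.abs(x) <= outer) & \
               (np.abs(y) > inner) & (np.abs(y) <= outer)
    k[surround] = w_surround
    return k

# Parameters from Table 4.2 (m_1 = n_1 = 6, m_2 = n_2 = 12).
k_atl1 = square_kernel(0.0091, -0.003, 6, 12)   # excitatory pathway from SP
k_atl2 = square_kernel(0.0095, -0.003, 6, 12)   # inhibitory pathway from R3

def atl_drive(S_sp, S_r3):
    """Input term of Eq. (4.6) at the central ATL cell. For a symmetric
    kernel, the elementwise product summed over the co-located patch equals
    the convolution evaluated at that cell."""
    exc = (k_atl1 * S_sp).sum()
    inh = max(0.0, (k_atl2 * S_r3).sum())   # Max[0, .] keeps R3 purely inhibitory
    return exc - inh

def atl_output(m, theta=13.0):
    """Eq. (4.9): firing rate is m when above threshold theta_atl, else 0."""
    return m if m > theta else 0.0
```

With a uniform SP patch and silent R3 cells the drive is positive; adding R3 activity subtracts from it, and the rectification prevents the R3 pathway from ever exciting the ATL cell.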
Table 4.2 provides all the parameter values used in the simulation, together with the number of the formula in which each parameter appears. As we see from the table, m_1 = n_1 = 6 and m_2 = n_2 = 12, resulting in a 25° ERF surrounded by a 50° IRF for the ATL cell. See Appendix A for the full set of equations used in the simulation.

Table 4.2 Parameter values of the anterior thalamus model

    Parameter   τ_atl    W_e^sp    W_i^sp    m_1     m_2
    Formula     (4.6)    (4.7)     (4.7)     (4.7)   (4.7)
    Value       0.065    0.0091    -0.003    6       12

    Parameter   W_e^r3   W_i^r3    n_1       n_2     θ_atl
    Formula     (4.8)    (4.8)     (4.8)     (4.8)   (4.9)
    Value       0.0095   -0.003    6         12      13.0

Figure 4.9 shows the average firing rates (see the previous footnote) of a single ATL neuron in response to the 8 worm-like stimuli. The open symbols represent the response of the full model, while for comparison the filled symbols give the response without R3 inhibition. The result clearly matches the ordered dishabituation hierarchy of Fig. 1.1. Not only do stimuli higher in the hierarchy generate larger ATL responses, but the stimulus pairs b-c and d-e, which are on the same level in the hierarchy, generate nearly equal responses. The dishabituation hierarchy created by this model is almost the same as the one observed experimentally in Fig. 1.1; the only discrepancy is that the model does not reproduce the preference for stimulus b over c. Actually, after comparing the original data to be included in Fig.7.6E, we do not see a reason in the dishabituation hierarchy why b should be considered slightly preferable to c.

In summary, we propose the following mechanisms to explain the dishabituation hierarchy in Fig. 1.1.

(1) Both the leading edge and the trailing edge of a worm stimulus have to be taken into consideration.

(2) The receptive field of ATL neurons (25° ERF and 50° IRF in our model) is big enough to "see" both the leading and the trailing edge (cf. Fig.4.6).
Stimulus a elicits the biggest response, in particular bigger than stimulus d, because both of its diagonal edges elicit strong responses in R2 cells (see Fig.4.6), and these responses can best be integrated in ATL cells due to the small distance between the midpoints of its leading and trailing edge responses.

Figure 4.9 ATL response to the 8 worm-like stimuli shown in Figure 1.1. The 8 average firing rates are ordered, which corresponds to the ordered hierarchy in Figure 1.1. The figure also shows the ATL response to the stimuli without the inhibitory projection from R3 cells.

(3) Stimuli b and c are preferred to stimulus f because the inhibition from R3 cells, which have an off-channel preference, is bigger for f than for b and c.

(4) Stimuli with dots appear lower in the hierarchy because they elicit a smaller R2 response due to IRF interaction.

(5) A striped pattern elicits the smallest response in ATL neurons because of R3 inhibition. This is particularly clear when we compare the two curves in Fig.4.9.

IV.6 Predictions

The simulations so far presented lead to a number of specific predictions:

(1) When the animal is presented with different worm-like stimuli, they will elicit different neuronal responses at a certain neural center, and the order exhibited by the average firing rates corresponds to the order exhibited in the dishabituation hierarchy.

(2) Retinal ganglion cell type R2 plays a primary role in the discrimination of the stimuli, since R2 responds best to small moving objects and detects both the leading and trailing edge of a stimulus equally well.

(3) In the discrimination of different "sub-worms", the optic tectum serves only to relay information from the retina to AT via SP cells.

(4) R3 cells have an inhibitory role in worm pattern discrimination.
This is due to their off-channel preference (from white to black). (5) The anterior thalamus is the structure which reflects the final pattern discrimination, due to its special position in the modulatory loop. This structure receives excitatory projections from SP and inhibitory projections from R3. The current model will create different hierarchies for different sizes of worm-like stimuli. After completing the previous simulations, we shrank all the stimuli to 10 mm long and 2.5 mm high, corresponding to 8° by 2°, and tested these stimuli. A worm that is 8° long and 2° high is an optimal stimulus for T5.2 cells in the tectum, which correlate well with prey-catching behavior (Ewert, 1984). Fig. 4.10 presents the dishabituation hierarchy predicted by this model. Remarkable differences are found compared to Fig. 1.1. In particular, stimulus h lies at the top of Fig. 4.10, in contrast to its bottom position in Fig. 1.1, and the stimuli with dots appear higher in the hierarchy, reversing the original relation exhibited in Fig. 1.1. Our explanation is that since the stimulus size is halved compared to Fig. 1.1, the previous IRF interaction in the R2 receptive field is converted into an ERF interaction which strengthens overall responses. This ERF interaction is particularly manifested by stimulus h. Note that the R3 inhibition in ATL neurons is relatively smaller than the excitation from SP cells, and thus cannot prevent stimulus h from inducing a strong ATL response. This leads us to postulate that the effect of dot and striped patterns in pattern discrimination is relative to stimulus size. Furthermore, this prediction suggests that if multiple stimuli lie close to each other they tend to cooperate to form a stronger response than any one of them alone, while if the stimuli lie far from each other they tend to compete and counteract each other's responses.
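The center-surround integration described above (SP excitation over the ERF, R3 inhibition over the wider IRF, and a firing threshold) can be sketched in a few lines. This is only a structural illustration, not the dissertation's model: the actual equations (4.6)-(4.9) are given in Appendix A, and the mapping of the Table 4.2 weight and threshold values onto these roles, as well as the field extents used here, are assumptions made for illustration.

```python
# Hypothetical sketch of an ATL cell combining a 25-deg excitatory center
# (ERF) with a 50-deg inhibitory surround (IRF). Weight and threshold values
# are borrowed from Table 4.2, but their assignment to these roles and the
# field extents are assumptions; the real model equations are in Appendix A.

def center_surround(field, pos, w_erf, w_irf, r_erf, r_irf):
    """Weighted sum over a 1-D receptive field centered at `pos`."""
    total = 0.0
    for k in range(-r_irf, r_irf + 1):
        i = pos + k
        if 0 <= i < len(field):
            total += (w_erf if abs(k) <= r_erf else w_irf) * field[i]
    return total

def atl_rate(sp, r3, pos, theta=13.0):
    """Threshold-linear ATL response: SP excitation minus R3 inhibition."""
    exc = center_surround(sp, pos, w_erf=0.0091, w_irf=-0.003,
                          r_erf=12, r_irf=25)
    inh = center_surround(r3, pos, w_erf=0.0095, w_irf=-0.003,
                          r_erf=12, r_irf=25)
    return max(0.0, exc - inh - theta)
```

With this structure, SP activity confined to the excitatory center drives the cell above threshold, while co-located R3 activity (as a striped pattern would produce) pulls the response back to zero, qualitatively matching the difference between the two curves of Fig. 4.9.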
The above prediction of absolute size sensitivity of the dishabituation hierarchy is complicated by the size constancy which toads and frogs exhibit (Ewert et al., 1983). The mechanism underlying size constancy is unknown and has not been incorporated into this model. If, for example, size constancy is not achieved until after the anterior thalamus, we may not be able to find the general absolute size sensitivity of the hierarchy. However, the predicted dishabituation hierarchy presented in Fig. 4.10 can be precisely tested if the stimuli are moved at the same distance from the toads as in Fig. 1.1, i.e., 70 mm.

Figure 4.10  Dishabituation hierarchy predicted from this model by shrinking stimulus size. For explanation see the legend for Fig. 1.1. All the stimuli are 10 mm long and 2.5 mm high. The same set of stimulus configurations is used as in Figure 1.1.

In this model of pattern discrimination both on-channel and off-channel effects are considered important. We have tested the same stimulus patterns as in Fig. 1.1 but with the contrast direction reversed, i.e., white stimuli moving against a black background (w/b). A new dishabituation hierarchy is found in our simulation, shown in Figure 4.11. The response of R2 cells with w/b is the same as with b/w, but now R3 cells show a trailing edge preference. We continue the previous prediction list by summarizing what is presented above: (6) When the stimulus size is halved, the new dishabituation hierarchy shown in Fig. 4.10 is predicted for behavioral experiments. (7) When the stimulus-background contrast is reversed, the new dishabituation hierarchy shown in Fig. 4.11 is predicted for behavioral experiments.

IV.7 Discussion

In this model we have suggested a pattern recognition paradigm for toads and frogs, which uses temporal coding for representing different worm-like objects.
Hierarchical stimulus specificity manifested in this paradigm represents an intermediate step between the stimulus non-specificity found in several invertebrates (Kandel, 1976) and the full stimulus specificity demonstrated in mammals. The strict locus-specificity implied in temporal coding may hamper the animal's location invariance in recognizing objects, which seems crucial for concept formation, an important feature of pattern recognition in many mammals. However, locus-specificity may have survival value for the anuran by allowing it to track the same stimulus moving at a different location. In fact, the pattern discrimination capability of anurans is rather limited (Ewert, personal communication, 1989). So the "sameness" of stimuli is different for anurans than for mammals, where visual discrimination is much more accurate.

Figure 4.11  Dishabituation hierarchy predicted from this model by reversing contrast direction. For explanation see the legend for Fig. 1.1. In contrast to Fig. 1.1, a white stimulus moves against a black background. The same set of stimulus configurations is used as in Figure 1.1.

One feature of the pattern recognition paradigm suggested in this chapter is that recognition is based on visual cues like the leading edge, trailing edge, dots, or striped patterns, which contrasts with a paradigm based on Hamming distance. This is related to the fact that, in anurans, retinal ganglion cells respond to quite complex features of the stimulus, while by contrast, in mammals, the responses of retinal ganglion cells are rather stereotyped. Much progress has been made in experimental research on the anuran retina, and much modeling effort has been devoted to understanding its function. However, this project makes the first extensive utilization of anuran retinal processing to explore the capability of the anuran visual system in the discrimination of similar objects.
Our understanding of the anuran retina based on this study, as suggested in Figure 4.8, is that R2 forms localized edge detectors and R3 best detects the transition from white to black in the environment. Apart from certain phenomena, such as erasability, which have been modeled by Teeters (1989), these functions of R2 and R3 can basically be achieved by a high-pass filter. In the previous tectal column model (Lara et al., 1982; Cervantes et al., 1985), small pear cells receive R2 inputs through the glomerular dendrites, as well as SN inhibition. The output of SP projects to the large pear cell and the pyramidal cell in the same column. The interactions in which SP is involved in the tectal column model enable it to determine the proper times for vertical recruitment of excitation to facilitate a response in the efferent (PY) neuron. That tectal column model is anatomically based on an earlier view of synaptic interactions within the optic tectum (Szekely & Lazar, 1976). The SP cells were considered local neurons until Lazar et al. (1983) found their projections onto the anterior thalamus, a finding which underlies this model, where the SP cells behave as relays from the retina to the anterior thalamus for pattern discrimination. These two views on the role of small pear cells might suggest that there exist two physiological subtypes of small pear cells, one involved in the facilitation of prey-catching behavior and the other conveying information for sub-worm discrimination. In the model, apart from the relay role of SP cells, the tectum does not play an active role in discriminating different "sub-worms". The tectum is a critical structure for prey/predator discrimination (see Ewert, 1984), as modeled previously (Lara et al., 1982; Cervantes et al., 1985). We propose, through the model, the view that prey/predator discrimination has different neural mechanisms than the finer discrimination among different prey stimuli.
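The high-pass characterization of R2 and R3 given above can be made concrete with a toy temporal filter. This is my own construction for illustration, not Teeters' retina model: an R2-like unit responds to the magnitude of local luminance change, while an R3-like unit responds only to decreases, i.e., white-to-black (off) transitions.

```python
# Toy high-pass (temporal difference) sketch of R2- and R3-like responses
# at a single retinal locus; an illustration only, not Teeters' (1989) model.

def r2_like(luminance):
    """Edge detector: responds to the magnitude of luminance change."""
    return [abs(b - a) for a, b in zip(luminance, luminance[1:])]

def r3_like(luminance):
    """Off-channel detector: responds only when luminance decreases."""
    return [max(0.0, a - b) for a, b in zip(luminance, luminance[1:])]
```

For a black bar crossing a white locus (luminance 1 → 0 → 1), the R2-like unit fires at both the leading and trailing edges, while the R3-like unit fires only at the white-to-black transition, mirroring the off-channel preference assigned to R3 in the model.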
Prey/predator discrimination involves the interaction between the tectum and the thalamic-pretectal region (TP), and our model suggests that worm pattern discrimination involves the integration of the retina and the tectum by the anterior thalamus. The role of TP must come into play when habituation of predator avoidance behavior is modeled. Some discussion of TP's role in learning can be found in Section VI.2, but we do not attempt to model predator-related learning in this dissertation. Based on this neural model, we have predicted two dishabituation hierarchies by using the same configuration of worm stimuli (cf. Figs. 4.10 and 4.11) with different stimulus size and stimulus-background contrast. One natural extension would be to consider the effect of speed on the dishabituation hierarchy. The speed effect on R2, R3 and R4 cells has been modeled in Teeters' (1989) retina model. If the moving speed of the worm stimuli in Fig. 1.1 is changed, we expect the same dishabituation hierarchy, since varying the moving velocity of the stimuli changes the response of retinal ganglion cells to all the stimuli uniformly (Grüsser & Grüsser-Cornehls, 1976; Teeters, 1989), and hence that of cells in higher neural centers. So the speed of the stimuli should not affect relative dishabituation. In this chapter we are only concerned with the mechanism for discriminating different worm-like stimuli. Given the ordered firing responses to the different stimuli, we must next ask how toads can store these responses, and later exhibit habituation and dishabituation. This will be addressed in Chapter 6, where the medial pallium, the structure homologous with the hippocampus of mammals, will be our major concern. In that chapter, we propose a column model of the medial pallium and a formal description of synaptic plasticity. We will reproduce detailed experimental data on the habituation and dishabituation processes on which the dishabituation hierarchy (Fig. 1.1) is based.
CHAPTER V

CONFIGURATIONAL PATTERN RECOGNITION BY DISHABITUATION: BEHAVIORAL TESTS OF THE PREDICTIONS*

Summary

In the previous chapter, we developed a neural model for visual pattern discrimination, based on previous behavioral studies which demonstrated that toads exhibit a dishabituation hierarchy for different worm-like stimuli. The model suggests that visual objects are represented by temporal coding and predicts that the dishabituation hierarchy changes when the stimulus/background contrast is reversed or the stimulus size is varied. The behavioral experiments reported in this chapter were designed to test these predictions. (1) For a pair of stimuli from the contrast reversal prediction, the experimental results validated the theory. (2) For a pair of stimuli from the size reduction prediction, the experimental results failed to validate the theory. Further experiments concerning size effects suggest that visual pattern discrimination in toads exhibits size invariance. (3) Inspired by the Groves-Thompson account of habituation, we found that dishabituation by a second stimulus involves a process separate from habituation to a first stimulus.

* The experiments reported in this chapter were conducted with Peter Ewert.

V.1 Introduction

As originally demonstrated by Ewert and Kehl (1978), habituation in toads is partially stimulus specific, exhibiting hierarchical stimulus specificity. In the dishabituation hierarchy shown in Fig. 1.1, all objects were black and moved at a constant speed against a white background, and they had the same long extension of area parallel to the direction of movement and short extension perpendicular to it, referred to as "worm-like" (though not only worms but also other small invertebrates moving in the direction of their longer body axis, such as caterpillars, carabid beetles, woodlice and millipedes, fit this prey-schema). A stimulus pattern higher in the Fig.
1.1 hierarchy can dishabituate (i.e., release prey-catching despite habituation to) another stimulus pattern lower in the hierarchy, whereas a stimulus lower in the hierarchy cannot dishabituate a stimulus higher in the hierarchy. Before any habituation, however, all the stimuli were about equally strong in releasing the prey-catching response. In the previous chapter, we developed a neural model for simulating the dishabituation hierarchy, which will be called the anterior thalamus (AT) model. The basic hypothesis of the model is that toads represent visual objects by the firing activities of single cells in a certain visual structure, that is, visual objects are represented by temporal coding. Drawing on known toad neurobiology, the model incorporates the neural structures of the retina, the tectum and the anterior thalamus, and the latter (AT) is assumed to be the structure where discrimination of the visual patterns is achieved with reference to the dishabituation hierarchy. The output of the model clearly matches the ordered dishabituation hierarchy of Fig. 1.1. Based on the model, the question was raised of whether the dishabituation hierarchy changes when the stimulus/background contrast is reversed or the stimulus size is varied. The former idea drew on previous behavioral experiments showing that common toads snap predominantly toward the leading edge of a black worm-like stripe moving against a white background, but mainly toward the trailing edge if the stripe is white and the background black (Burghagen & Ewert, 1982), a phenomenon which can be traced back to the property of off-dominating retinal R3 ganglion cells (Tsai & Ewert, 1987). On the basis of the simulated ordering of AT responsiveness, the model predicts a dishabituation hierarchy different from Fig. 1.1, as shown in Fig. 4.11, if the direction of the stimulus/background contrast is reversed.
In this situation, the response of R2 cells is about the same under contrast reversal, but R3 cells show a trailing edge preference, thus leading to the different dishabituation hierarchy. Turning to size effects, the model in general predicts that different hierarchies will be produced for different sizes of worm-like stimuli, because of the assumption that shape effects in worm discrimination rely to a large extent on the interactions of the excitatory receptive field (ERF) and inhibitory surround (IRF) of retinal R2 ganglion cells, which have quite specific sizes. In particular, when the stimulus size is halved compared to Fig. 1.1, the previous IRF interaction is converted into an ERF interaction, resulting in the predicted hierarchy shown in Fig. 4.10. We have long been puzzled by the question of whether dishabituation counteracts previously learned effects of habituation, since different answers would lead to contrasting models for simulating habituation processes. In Aplysia, it has been demonstrated that synapses which were functionally inactivated by profound habituation could be restored by a sensitizing stimulus (Carew et al., 1971). On the other hand, Groves and Thompson (1970) argued, mainly on the basis of data from mammals, that sensitization, while releasing a new response, does not affect the trace of habituation. The aim of the behavioral study reported in this chapter was to test the model's predictions as well as the relevance of the Groves and Thompson dual process theory to toad dishabituation. An earlier version of this chapter appears in Wang and Ewert (1991).

V.2 Materials and Methods

a) Subjects. The behavioral experiments were performed in the laboratory of Peter Ewert in Kassel, FRG, with 200 common toads Bufo bufo (L.), which were kept in 30 "aqua-terraria" (60 x 30 x 30 cm³ each) at a constant room temperature of 20°C and were fed regularly with mealworms (Tenebrio molitor L.).

b) Experimental set-up.
A standard experimental set-up was used for measurements of the prey-catching turning activity of the toad (modified after Ewert, 1968). The animal sat in a cylindrical glass vessel within a homogeneous, white, diffusely illuminated arena, as illustrated in Fig. 5.1. Prey dummies were two-dimensional pieces of black cardboard with the longer extension (20 mm or 10 mm) in the horizontal direction of movement and the shorter extension (5 mm or 2.5 mm, respectively) perpendicular to it. A black (or white) stimulus was moved mechanically by means of an electric motor around the vessel at 20°/s against a 40 cd/m² white (or black) background at a distance of 70 mm from the vessel. All stimuli were moved from left to right from the viewpoint of the toads, that is, clockwise for an observer looking downward (Fig. 5.1). When the dummy fitted the prey category, the toad followed it by successive orienting movements. The orienting response habituated if the same prey dummy was continuously presented in the way described above, that is, the number of prey orienting turns per successive 1-min interval declined progressively. The criterion for habituation was reached when the animal responded less than 3 times to the dummy in a given 1-min interval. A habituation experiment usually lasted for 40 to 60 min. The total length of time for a stimulus series (until habituation occurred) could vary in different animals depending on their different motivational levels. All experiments were performed in the early mornings or late afternoons, during which the animals were most active.

Figure 5.1  Experimental apparatus for the quantitative investigation of stimulus-specific habituation of prey-catching orienting behavior in toads. A pair of stimuli A and B can be switched outside of the animal's visual field. (Redrawn from Ewert & Kehl, 1978.)

c) Exchange of stimulus objects.
After habituation of the prey-catching orienting response to a particular dummy, this stimulus could be automatically exchanged with another one, following the method of Ewert and Kehl (1978). Usually two different dummies were fixed in holders mounted opposite to each other on a disc which rotated around the center of the arrangement below the arena base (see Fig. 5.1). The holders beneath the arena were not visible to the toad. The positions of the dummy holders (within the slit beneath the base) could be shifted independently by means of electric motors, which made one dummy disappear and another appear. More specifically, after habituation to a stimulus A (Fig. 5.1) - i.e., when the number of 1-min orienting turns to A reached the habituation criterion (< 3/min) - dummy A was switched underneath the arena base. At the same time, to test dishabituation, another dummy B was brought into the arena from underneath the arena base. The apparatus was so designed that the exchange of the two dummies was done automatically outside the toad's visual field (Fig. 5.1; for further details see Ewert & Kehl, 1978).

d) Stimulus discrimination tests. Animals were used for the quantitative experiments if they showed 20-40 prey-catching orienting movements during the initial interval of 1 min in response to an optimal rectangular 2.5 x 30 mm² prey dummy. To determine whether toads are able to discriminate between two different prey dummies A and B, the prey-catching orienting activity was first habituated to stimulus A; then the response to B was tested and also habituated. Experiments were repeated with 10 different toads from the "animal pool" mentioned above. Another 10 animals were used in the reverse order: first habituated to stimulus B and then tested with A.

V.3 Results

Contrast reversal

In order to test the prediction of the stimulus/background contrast effect on the dishabituation hierarchy, we selected a pair of stimuli b/f from Fig.
4.11 which showed a strong difference between the "black experimental hierarchy" (Fig. 1.1) and the "predicted white hierarchy" (Fig. 4.11). The experimental results are presented in Figure 5.2.

Figure 5.2  Experimental test of the contrast reversal prediction of the anterior thalamus model. Dishabituation tests between a white right-pointing triangle of 5 mm x 20 mm extension (white b) and its mirror image (white f), as shown in the figure. A. Habituation of the toad's prey-catching orienting response first to white f and immediately afterwards the test of the response (see vertical arrow) to white b. B. Reversed order of presentation. Abscissa: habituation time (min). Ordinate: successive number of orienting turns per minute, the orienting activity R; each curve point represents an average value over 10 individuals, and the vertical bar indicates the standard deviation.

In Fig. 5.2A, white f, which is the left-pointing triangle, was first presented to the animal; immediately after the habituation criterion was met, its mirror image, white b, was tested. All investigated toads failed to respond to b. In Fig. 5.2B, the presentation order was reversed: following habituation to white b, white f was presented. A statistically significant increase (P < 0.01; t-test) of the responses was shown in response to white f. Comparing the experimental results with the black b/f preference (Fig. 1.1), it can be concluded that the toad is able to distinguish between white b and white f of the same length and height, and that white f is preferred to white b in dishabituation, opposite to the effect with the corresponding black stimuli. The experimental result is thus as predicted by the model.
Both were 2.5 mm high and 10 mm long, and are thus called small b and small d hereafter. The model testing results are presented in Fig. 5.3. In Fig. 5.3A, small d was presented first, and immediately after the habituation criterion was reached, small b was tested. Remarkable dishabituation was exhibited in the toads (P < 0.01). However, if the order of presentation was reversed, as in Fig. 5.3B where small b was presented first and small d was tested next, only slight dishabituation was observed. From these results, it can be concluded that the toad is able to distinguish small b and small d, and that small b is preferred to small d in dishabituation. Although the stimulus size was halved, the same preference was established by the toads. At least for this particular pair of stimulus configurations, the model failed to be confirmed. What went wrong with the model? The model predicts in general that the dishabituation hierarchy changes with stimulus size. One reasonable explanation would be that toads exhibit the same dishabituation hierarchy within a certain range of stimulus sizes. To test this conjecture of size invariance, we first compared two stimuli of different sizes but the same configuration. We chose configuration b in both 2.5 mm x 10 mm and 5 mm x 20 mm versions, and the results are presented in Figure 5.4. In Fig. 5.4A, small b was first presented and habituated; immediately afterwards, presentation of big b elicited a remarkable increase (P < 0.01) of prey-catching behavior. However, as presented in Fig. 5.4B, if big b was first presented and habituated, small b elicited almost no prey-catching response. These experiments demonstrate that, as indicated by dishabituation, toads recognize different sizes of configuration b, and that they prefer the bigger size of 5 mm x 20 mm to the smaller size of 2.5 mm x 10 mm. Note that no obvious difference in strength was found between big b and small b in releasing prey-catching within the first 1-min interval.
Figure 5.3  Experimental test of the model's "size reduction prediction". Dishabituation test between the black right-pointing triangle of 2.5 mm x 10 mm size (small b) and the black rectangle of 2.5 mm x 10 mm size (small d). A. Habituation of the toad's prey-catching orienting response first to small d, and immediately afterwards the test of the response to small b. B. Reversed order of presentation. For further explanations see the legend of Figure 5.2.

Figure 5.4  Test of the size effect in dishabituation. Two black stimuli of different size and the same configuration, small b (2.5 mm x 10 mm) and big b (5 mm x 20 mm), were used. A. Habituation of the toad's prey-catching orienting response first to small b, and immediately afterwards the test of the response to big b. B. Reversed order of presentation. For further explanations see the legend of Figure 5.2.

Further experiments tested dishabituation between small b (2.5 mm x 10 mm) and big f (5 mm x 20 mm). Two effects may take place for this particular pair. By straightforward reasoning from Fig. 5.4, one would expect that big f should be preferred to small b. From the perspective of configurational cues, however, shape b is preferred to shape f (Fig. 1.1). What actually happened is shown in Fig. 5.5. In Fig. 5.5A, if small b was presented and habituated first, and immediately afterwards big f was tested, no significant increase occurred in response to big f; however, as shown in Fig. 5.5B, after habituation to big f, small b elicited strong prey-catching behavior (P < 0.01). These results clearly demonstrate that toads prefer small b to big f, as exhibited by the ordering of dishabituation, and that configuration plays the predominant role in this situation. The only explanation we can offer based on the results presented from Fig. 5.3 to Fig.
5.5 is that visual object discrimination in toads is to some extent unaffected by object size. The animal's preference for shape b over f is unchanged even when the size of the stimuli varies considerably. The results in Fig. 6.3 strengthen this suggestion, because they show that the animal is able to tell the different-size b's apart, rather than merely confusing them. Meanwhile, the different-size b's retain the same preference relative to a different shape.

Figure 5.5  Test of size vs. configuration effects in dishabituation. A. Habituation of the toad's prey-catching orienting response first to black small b of 2.5 mm x 10 mm, and immediately afterwards the test of the response to black big f of 5 mm x 20 mm. B. Reversed order of presentation. For further explanations see the legend of Figure 5.2.

Separate process of dishabituation

Does dishabituation in the toad have a process independent of habituation, as suggested by the Groves and Thompson analysis of mammalian data? This question could best be investigated by testing the toad's response to a habituated stimulus shortly (a number of seconds) after presentation of a dishabituating stimulus. Two black stimuli, b and f from Fig. 1.1, and a white background were used for this investigation, and the results are presented in Fig. 5.6. As shown in Fig. 1.1, black b has preference over black f in dishabituation. In the experiments, f was first presented to the toad, and immediately following habituation, stimulus b was presented for 30 seconds. In response to b, the toad exhibited a remarkable increase (P < 0.01) in orienting activity. Note that the isolated circle in Fig. 5.6 represents the average number of orienting turns for a 0.5-min period while all other black dots represent the number for a 1-min period. Immediately afterwards, stimulus b was withdrawn and f was presented again. All tested toads showed a very sharp decrease in the orienting response, behaving as though still habituated to f.
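The stimulus-switching result just described can be mimicked by a toy bookkeeping model in the spirit of the Groves and Thompson (1970) dual-process account: each stimulus keeps its own habituation trace, so a full-strength response to b neither erases nor restores the trace for f. This sketch is illustrative only (the transient sensitization component is omitted, all parameter values are invented, and it is not the medial pallium model of Chapter 6).

```python
# Toy dual-process sketch: stimulus-specific habituation traces are stored
# independently, so dishabituation by one stimulus leaves another stimulus's
# habituation intact. All names and parameter values are illustrative only.

class DualProcessToad:
    def __init__(self, decay=0.7, base_response=30.0):
        self.decay = decay            # trace decrement per presentation
        self.base = base_response     # initial orienting turns per minute
        self.traces = {}              # per-stimulus habituation gain

    def present(self, stimulus):
        gain = self.traces.get(stimulus, 1.0)
        self.traces[stimulus] = gain * self.decay  # habituate this stimulus only
        return self.base * gain

toad = DualProcessToad()
for _ in range(10):
    r_f = toad.present("f")    # habituate to f until below criterion
r_b = toad.present("b")        # novel stimulus b: full-strength response
r_f_again = toad.present("f")  # f remains habituated despite the response to b
```

This separate-store bookkeeping reproduces the qualitative pattern of Fig. 5.6: the dishabituation test reads out an independent, stimulus-specific trace rather than resetting the habituated one.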
To further confirm our observation, we switched back to stimulus b after the toad failed to release a response to the second presentation of stimulus f, and observed in all cases that the toads quickly recovered the response. Once again, when stimulus f was switched back, the toads stopped responding. These results clearly demonstrate that dishabituation does not counteract the effects of previous habituation, but rather involves a separate neuronal process. The same behavior was also observed with other pairs of stimuli, including white stimuli.

Figure 5.6  Experimental test of the "separate process question". Habituation of the toad's prey-catching orienting response to black f, then immediately afterwards the test of the response to black b for 30 sec, and then the test of the response to black f again. In the figure, the isolated empty circle and the vertical bar on it represent the average response and its standard deviation for the period of 30 sec. For further explanations see the legend of Figure 5.2.

V.4 Discussion

Habituation property

Regarding the dishabituation hierarchy previously observed by Ewert and Kehl (1978, see Fig. 1.1), the present dishabituation study reveals that the toad's pattern discrimination - within worm-like shapes - is even more sophisticated than originally expected. It should be noted that dishabituation is again unidirectional with the new group of stimuli used in this report. This is consistent with the basic hypothesis of the anterior thalamus model, namely that toads and frogs use intensity (temporal) coding for representing different visual objects. The investigation so far suggests that pattern recognition in anurans takes advantage of visual cues like the leading edge, trailing edge, dots, or striped patterns, rather than using literal images (eidetic templates).
In this regard, some parallels could be drawn in visual perception between anurans and invertebrates like honeybees (Wehner, 1981; van Hateren et al., 1990) and octopuses (for reviews see Young, 1964; Wells, 1978), since it has been suggested that these invertebrates also use certain pattern parameters (like orientation and contour) in visual pattern recognition. The relevant data on these species are mainly drawn from conditioned training, and very few investigations have been conducted with habituation, making a direct comparison difficult.

Background contrast dependence

The anterior thalamus model predicts that the toad's dishabituation hierarchy depends on stimulus/background contrast. The predicted hierarchy under contrast reversal (Fig. 4.11) specifies the precise preferences among the eight stimuli. This prediction is confirmed experimentally for the critical object pair b and f, the two mirror images of a right triangle. In terms of sensitivity to stimulus/background contrast, these results are consistent with behavioral (Burghagen & Ewert, 1982) and physiological (Tsai & Ewert, 1987) data on edge preference (leading vs. trailing edge) that switches with contrast reversal. Comparable phenomena concerning figure/ground contrast are known from human perception. The classical example of Rubin's (1915) "face/vase" psychophysical phenomenon shows that our visual system, at black/white contrast borders, gets into a conflict regarding the interpretation of object and background. Rubin's well-known picture can be interpreted either as black face silhouettes facing each other against a white background or as a white vase against a black background, a decision influenced by the contrast direction.

Size effects

The size variance prediction of the model failed to be validated, challenging further development of the anterior thalamus model to explain the new phenomenon of size invariance.
Toads are able to recognize stimulus shapes, as shown by the dishabituation method, and their recognition is to some extent unaffected by stimulus size. Of course, toads will not respond with prey-catching if a stimulus is too big or too small. Note that the human ability of size invariance in pattern recognition is also limited to a size range, which makes sense for adaptation. To our knowledge, these findings are the first to indicate that anurans, as lower vertebrates, exhibit size invariance in visual pattern recognition. It was suggested by some authors (Verlaine, 1924; Mazokhin-Porshnyakov, 1969) that wasps and honeybees have the ability to form the concept of "triangularity" - a generalization over all triangles of different size, orientation, and so on. Later, in trying to confirm these results, Anderson (1972) demonstrated that bees could, by training, discriminate any triangle configuration from any square configuration, but when trained to discriminate a specific triangle configuration from a specific square configuration, they never succeeded in generalizing this discrimination to other triangle and square configurations. Therefore, although the previous results of Mazokhin-Porshnyakov were verified, an additional control parameter shows that the bee has not formed a concept of "triangularity". Also, it has been demonstrated that after training to discriminate two shapes, octopuses can transfer the discrimination to other shapes, including the same shape at different sizes, seeming to be able to generalize over size (Sutherland, 1969; Wells, 1978). However, no experiments show that octopuses can discriminate configurations of the same shape but different sizes. When talking about size invariance or generalization, one must distinguish generalization from confusion. When an animal fails to discriminate two objects, it only confuses the objects. Generalization implies that the same response has to be elicited by perceptually distinguishable stimuli.
It appears that Mazokhin-Porshnyakov did not make the distinction between generalization and confusion, thus overgeneralizing his results. From this perspective, we see no evidence suggesting that octopuses show size invariance in visual pattern discrimination. On the other hand, toads do distinguish size, as shown in Fig. 5.4, and even when size effects favor an opposite preference, configuration still decisively triggers the animal's behavior (Fig. 5.5). Therefore, our suggestion of size invariance in toads is on firm ground. Since generalization over size is a crucial aspect of concept formation, the following questions arise naturally. Do toads form concepts in recognizing visual objects? Are anurans the phylogenetically lowest animals that have developed size invariance? Why do toads form the specific shape preference of Figure 1.1 in recognizing visual objects by dishabituation? These interesting questions need to be studied further.

Learning capabilities

As demonstrated in Fig. 5.6, dishabituation of a habituated prey-catching response does not interfere (at least immediately) with previously acquired habituation. Toads must be able to store different visual patterns in distinct neuronal substrates, and later recall them independently. This characteristic essentially distinguishes the learning ability of toads from that of Aplysia, where it does not seem that different patterns could be acquired through training (Carew et al., 1971; Kandel, 1976). The strikingly clear result in Fig. 5.6 conforms with the dual-process theory of habituation (Groves & Thompson, 1970), making amphibians closer to mammals in terms of habituation. The neural mechanisms previously proposed for modeling Aplysia's habituation and sensitization (e.g., Wang & Hsu, 1990) thus have to be modified when modeling Bufo's habituation, the topic to be approached in the next chapter.
The ability to separate different learning traces resulting from different stimulus patterns makes it possible to study directly the capacity of the toad's visual pattern discrimination. It also opens new ways for investigating visual pattern discrimination in individual toads, quantitatively studying the retention of each acquired pattern by the habituation method. The learning capability of toads, or of amphibians generally, has thus to be reconsidered. Our knowledge of learning and memory processes in amphibia is rather scarce. The toad has often been considered a "difficult" or "poor" learner (Thorpe, 1963; Boice et al., 1974; Thompson & Boice, 1975), with few exceptions (for a review see Ewert, 1984). Our current knowledge about the toad's learning in prey catching is sufficient to discard the poor-learner conception (Finkenstadt & Ewert, 1988a; b). Avoidance behaviors can also be trained in these animals (Schmajuk & Segura, 1980; Karplus et al., 1981). Toads and frogs are also capable of discriminative learning and its reversal in a T-maze or a Y-maze (Schmajuk et al., 1980; Jones & Falkenberg, 1980; Harvey et al., 1981). The study by Ewert and Kehl (1978) clearly demonstrates that toads can discriminate similar visual objects (same length and height) by dishabituation, and the data of the present study suggest that the toad's visual pattern discrimination is to an important extent size invariant. Note that size invariance is one of the most important problems in engineering pattern recognition (e.g., see Fukushima, 1988). There are more data on shape discrimination of toads and frogs from conditioning, showing that they can learn to avoid hive bees, and that their acquired preference can be retained for different intervals of time, ranging from hours to several months (Cott, 1936; Eibl-Eibesfeldt, 1952; Brower & Brower, 1962).
In summary, toads are not only able to discriminate different visual objects, but also able to store them in different neuronal substrates to avoid confusion. Toads can recognize different visual objects by dishabituation or conditioning, and recognize stimulus configurations regardless of changes in stimulus size within a certain range. At the same time, toads exhibit the ability to differentiate stimulus size. It can, therefore, be safely concluded that amphibians have developed fairly advanced learning capabilities in evolution. Due to their relatively simple visual system compared to mammals and the relatively large amount of data available for various visual structures (Ewert, 1984; 1987b), toads provide an ideal example for investigating visual perception and pattern recognition.

CHAPTER VI

MODELING THE DISHABITUATION HIERARCHY: THE ROLE OF THE MEDIAL PALLIUM

Summary

We present a neural model for a basic functional unit (column) of the medial pallium, which is consistent with known biological evidence. A neural mechanism, called cumulative shrinking, is proposed for mapping AT temporal responses into a form of population coding referenced by spatial positions. A model of synaptic plasticity is proposed at the electrophysiological level, as an interaction of two dynamic processes which simulates acquisition and both short-term and long-term forgetting. The structure of the MP model plus the plasticity model allows us to provide an account of the neural mechanisms of habituation and dishabituation. Computer simulations have been conducted to reproduce the original experimental data on which the dishabituation hierarchy was based, and some further experiments. There is a remarkable match between model results and experimental data. A set of model predictions is presented, concerning mechanisms of habituation and cellular organization of the medial pallium.
VI.1 Biological Bases

Toads and frogs, as stated previously, exhibit habituation when the same visual prey is presented repeatedly. This visual habituation shows both locus specificity and hierarchical stimulus specificity (in the sense that dishabituation forms a hierarchy; see Ewert & Kehl, 1978). The dishabituation hierarchy was studied in Chapter 4, where we showed that a group of cells in the toad anterior thalamus responds with different temporal firing activities when stimuli of different configurations are presented. In particular, the average firing rates of the activities elicited by presentation of the stimuli exhibit the same order as in the dishabituation hierarchy. We must then ask where and how the response activities are stored, and investigate the learning mechanisms involved in habituation and dishabituation processes. These are the issues we will study in this chapter.

Behavioral evidence

In addition to locus and hierarchical stimulus specificity, prey-catching behavior in toads exhibits the following typical properties of habituation (for a review see Ewert, 1984).

• After the same prey dummy is presented at the same retinal locus repetitively, the response intensity decreases exponentially.

• Spontaneous recovery occurs after the stimulus is withheld. The time course of recovery from habituation exhibits two phases: a short-term process that lasts for a few minutes, and a long-term process that lasts for at least 6 hours.

• Habituation is faster and lasts longer with an increase in the number of training series.

• The response to the dishabituating stimulus decreases with repeated presentation. The decrease of the response to a repeated dishabituating stimulus also follows a typical course of habituation.

• Dishabituation does not counteract the effects of previous habituation, but rather establishes a separate neuronal process (see Chapter 5).
• Data from the previous chapter suggest that toads to a certain extent show size invariance in recognizing visual objects by dishabituation.

Predator-avoidance behavior is also habituated when a predator stimulus is repetitively presented, and predator habituation lasts a shorter period than prey habituation (Ewert & Rehn, 1969; Ewert & Traud, 1979). The data from the dishabituation hierarchy test only short-term memory. Recent observations by Cervantes-Perez et al. (1991) suggest that stimulus-specific habituation may last much longer than hours; in a group of experiments, they reported hundreds of days. More individuals need to be tested in order to obtain more reliable results on long-term habituation.

Neurophysiological evidence

Cells with adaptation properties can be found in different visual neural structures. In the retina, R2 cells show very strong adaptation in response to repeated stimulus traversals (see Grüsser & Grüsser-Cornehls, 1976, for a review). Various tectal cells show neuronal adaptation, including T5 "newness cells" of the frog (Lettvin et al., 1961; Ingle, 1973) and T5(1,2), T2(1,2), and T4 cells of the toad (Ewert, 1984). In the thalamic-pretectal (TP) region, class TH9 exhibits relatively long-lasting discharges after the stimulus has traversed its ERF (Ewert, 1971). However, in retinal, tectal, or TP neurons, neither neuronal adaptation nor facilitation effects last longer than 90 to 120 seconds. Since behavioral studies show that habituation effects may last 24 hours or longer, the habituation substrates must be located elsewhere. Preliminary physiological studies indicate that visual telencephalic neurons exhibit long-term adaptation, in comparison with those obtained in the tectum and the TP region (Ewert, 1984). In a particular telencephalic structure, the medial pallium (MP), a recent recording investigation identified three types of visually sensitive neurons exhibiting spontaneous firing activities.
MP1 neurons strongly increase, MP3 neurons decrease, and MP2 neurons do not alter their discharge rates in response to 30 minutes of repetitive stimulation with a visual moving object traversing their excitatory receptive fields (Finkenstadt, 1989a). The involvement of the telencephalon, MP in particular, receives strong support from lesion experiments. After bilateral transection of the telencephalon just in front of the preoptic region, the prey-catching activity in toads showed a remarkable decrease in response to a repeatedly moving prey object (Ewert, 1965; 1970). After bilateral lesions of the ventral medial pallium (vMP), toads showed accurate prey-orienting responses, though reduced in magnitude. However, they showed no progressive decrease of the orienting activity to repetitive stimulation (Finkenstadt & Ewert, 1988a). Similar retardation of habituation was also demonstrated in salamanders after unilateral MP lesion (Finkenstadt & Ewert, 1983). In addition, both the effects of conditioning and the associative learning ability in naive toads are abolished by MP lesion (Finkenstadt & Ewert, 1988b; Finkenstadt, 1989b). Anatomically, the medial pallium has been thought to be a homolog of the mammalian hippocampus ("primordium hippocampi"; Herrick, 1933). Its location in the dorsomedial wall of the hemisphere is similar to that of the mammalian hippocampal region. Hoffman (1963) describes three cell types in the medial pallium: flask-shaped cells with dendrites projecting medially; stellate cells; and pyramidal cells whose dendrites arborize medially and laterally. The medial pallium receives, via the medial forebrain bundle (MFB), projections from the anterior thalamus and the thalamic-pretectal region; and it projects descendingly via the MFB to the ventral thalamus (including TP), the preoptic area, and the hypothalamus (Kicliter & Ebbesson, 1976; Kicliter, 1979; Northcutt & Kicliter, 1980; Neary & Northcutt, 1983).
The preoptic area sends descending efferents to the tectum and the hypothalamus; the latter also sends efferents to deep layers of the optic tectum (Wilczynski & Northcutt, 1977; Neary & Wilczynski, 1977; Neary & Northcutt, 1983). The regional distribution of glucose utilization in various visual areas has been studied by the 14C-2DG method (Finkenstadt et al., 1985; Finkenstadt et al., 1986). The results confirm or suggest the involvement of the tectum, TP, the anterior thalamus (AT), MP, the preoptic (PO) region, and the dorsal hypothalamic area (dHYP) in the prey-catching response. Of particular interest is the comparison of 2DG uptake across brain structures between naive and trained toads. In habituated toads, compared to naive toads, vMP, a certain portion of the PO region, and dHYP showed a statistically significant increase in 2DG uptake, whereas the tectum showed a significant decrease (Finkenstadt & Ewert, 1988a). They found, however, no habituation-related changes in the TP area. It has been repeatedly shown that, when toads are trained to associate mealworms with predator sign stimuli, the trained toads in comparison with naive toads display a strong increase in glucose utilization in vMP (Finkenstadt & Ewert, 1988b; Finkenstadt, 1989b; Merkel-Harff & Ewert, 1991).

Theoretical Studies

Lara and Arbib (1985), in their computational model, postulate that habituation is coded in telencephalic neurons which receive input from the tectum and modulate prey-catching behavior through their projection to TP neurons. TP inhibits pyramidal neurons in the tectum, thus modulating the toad's orienting response. Their modeling of habituation processes follows the general idea of the comparator model of stimulus-specific habituation (Sokolov, 1960; 1975). The idea is that an STM system creates and maintains a model of a stimulus and compares it with the current stimulus.
If they are different, the animal releases a new response and STM is updated; if they are similar, habituation builds up. The Lara-Arbib model (1985; Lara, 1989) has not gone further than parameter setting when simulating the telencephalon network. Direct projections from the tectum to the telencephalon lack anatomical support. Furthermore, their assumption of telencephalic modulation via TP is not consistent with the 2DG-uptake data, in which TP shows no change while the tectum shows a strong decrease after long-term habituation training (Finkenstadt & Ewert, 1988a). Ewert (1987a) proposes the idea of neural loop interaction to explain amphibian prey-catching behavior and its modulation. Loop(1) involves the tectum, the TP area, and the striatum, and is supposed to mediate prey-catching behavior; the so-called loop(2) starts with the retina and the tectum, which send axons to AT and from there to MP, which then projects descendingly to PO and from there to HYP, which in turn sends efferents to the tectum (retina → tectum → AT → MP → PO → HYP → tectum). Loop(2) is supposed to modulate prey-catching behavior initiated in the tectum. All data obtained to date strongly suggest that vMP is the neural structure where learning occurs. For stimulus-specific habituation in anurans, vMP modulates prey-catching behavior via the PO/HYP pathway, inhibiting tectal activities (Ewert, 1987a; Finkenstadt & Ewert, 1988a). This is consistent with the neurophysiological data surveyed above.

VI.2 Two Learning Loop Hypothesis

Ingle (1976) observed that the releasing value of prey-catching behavior with prey stimuli increases following habituation of a predator stimulus, i.e., habituation with a large object (predator) facilitates the response to a small prey. This suggests that the effect of the habituation of predator-avoidance behavior is generalized along the parameter "area" of predator stimuli, in contrast to stimulus-specific habituation of prey-catching behavior.
In the situation of conditioned training, when toads were fed mealworms out of the experimenter's hand for a certain time period, the animals learned to associate the experimenter's hand with food, and finally responded with catching behavior to the moving hand alone (Brzoska & Schneider, 1978; Burghagen, 1979). This association was generalized to other large objects as well. More recently, Finkenstadt and Ewert (1988b) quantitatively studied conditioning of a large moving square (predator dummy) with mealworms. After a few weeks, the large square became effective in triggering the conditioned response (CR), and they found that, after training, the maximum response activity toward square, antiworm-like, and worm-like objects is all shifted towards larger sizes. Recent studies of association between a prey dummy (US) and olfactory cues (CS) confirmed the same phenomenon (Merkel-Harff & Ewert, 1991). After successful associations, in the presence of the odor, toads respond with prey-catching behavior to virtually any moving object. Toads and frogs, after they have swallowed honeybees and received a pain in their stomach, can learn to avoid these painful insects, and their acquired preference can be retained for different intervals of time, ranging from hours, days, and weeks to several months (Cott, 1936; Eibl-Eibesfeldt, 1952; Brower & Brower, 1962).

Figure 6.1 Two neural loops underlying learning behaviors in toads. Loop(2.1) modulates prey-related behaviors mediated by the optic tectum, and loop(2.2) modulates predator-related behaviors mediated by the thalamic-pretectal area. Abbreviations are: R: retina; OT: optic tectum; TP: thalamic-pretectal area; AT: anterior thalamus; PO: preoptic region; HYP: hypothalamus; and MP: medial pallium.

Based on the above data, we propose that there are two distinct neural loops for modulating innate releasing behaviors, which will be called loop(2.1) and loop(2.2). Anatomically, loop(2.1) is: retina, tectum → AT → MP → PO, HYP → tectum, and loop(2.2) is: retina, tectum → TP → MP → TP, as shown in Fig. 6.1. Loop(2.1) is basically the same as loop(2) of Ewert (1987a). We propose that loop(2.2) in amphibians is also involved in learning. Current neuroethological evidence suggests that predator-avoiding behavior is mediated by the TP area (Ewert, 1984; 1987b). Functionally, we suggest that loop(2.1) modulates prey-related behaviors centered in the tectum, and learning in this loop is stimulus-specific (specialization), whereas loop(2.2) modulates predator-related behaviors centered in TP, and learning in this loop is stimulus non-specific (generalization). This fits well with the behavioral data summarized above. Both habituation and conditioning in toads exhibit stimulus-specific properties with prey-catching behavior (Ewert & Kehl, 1978; Chapter 5; Cott, 1936; Eibl-Eibesfeldt, 1952; Brower & Brower, 1962). In both habituation and conditioning related to predator stimuli, however, the effects of learning generalize along stimulus configurations (Ingle, 1976; Brzoska & Schneider, 1978; Burghagen, 1979; Finkenstadt & Ewert, 1988b; Merkel-Harff & Ewert, 1991). For example, through conditioning toads learn to avoid honeybees but not all prey-like stimuli such as mealworms. Loop(2.2) shows resemblance to the modulatory neural connections assumed by Lara and Arbib (1985). The difference is that we consider loop(2.2) dedicated to predator-related learning, whereas in their model, both prey- and predator-related habituation was modeled in the loop.

VI.3 A Neural Model of the Medial Pallium

With the role of the medial pallium as the memory locus established, let us answer the question of how memory traces are established.
Since, unfortunately, very few data are available about the neuronal circuitries and internal synaptic connections of the medial pallium, the following model is to a certain extent a functional model, but we have still made an effort to accommodate existing data as much as possible. Preliminary anatomical studies of the medial pallium indicate that this structure is organized in an orientation vertical to the telencephalic ventricle, with cells mainly projecting in this direction (Hoffman, 1963; Kicliter & Ebbesson, 1976). This leads us to suggest that the medial pallium processes optic information in a vertical way by means of functional units, which presumably have the form of a vertical column. This bears resemblance to what has been assumed for the organization of the optic tectum (Lara et al., 1982; Cervantes et al., 1985). This way of organizing the medial pallium is consistent with the locus specificity of habituation in the toad's prey-catching behavior. In what follows, we will only model one such functional unit, which is sufficient to demonstrate most of the properties of hierarchical stimulus-specific habituation.

Model structure

For stimulus-specific habituation, the medial pallium receives inputs from the anterior thalamus, one cell type of which (ATL) has been modeled in Chapter 4. In the model, an MP column receives input from an ATL neuron, representing a specific visual location. Different locations in the retina would correspond to different MP columns, as habituation is locus-specific. The MP column model includes five types of neurons: three of them are MP1, MP2, and MP3 as described physiologically by Finkenstadt (1989a), and the other two, P1 and P2, are hypothetical for modeling purposes. Each type forms a neuronal array of the same size in the column. The connectivity among the types is as follows: P1 receives projections from ATL, and projects to MP2. MP2 sends projections to P2 and MP3. MP3 sends inhibition to MP1, which further inhibits P2.

Figure 6.2 Diagram of an MP column model. Each cell type is a layer of cells numbered from left to right as 1, 2, ..., n. Synapses are indicated by little triangles. Empty triangles indicate excitatory synapses, black triangles inhibitory synapses, and filled triangles habituation synapses.

The final outcome of the functional column is reflected by a hypothetical neuron OUT that integrates activities from the P2 layer. The connections from MP2 to P2 and MP3 are the only plastic connections, that is, their efficacies are modifiable by training, which underlies stimulus-specific habituation. The anatomy of the entire column is shown in Figure 6.2. The basic function of each neuronal type is outlined below, and the formal description will be given in the next subsection.

• The output from AT is such that a stimulus higher in the dishabituation hierarchy (see Fig. 1.1) triggers a higher average firing rate (intensity) in AT than does a stimulus lower in the hierarchy. This response is fed into layer P1, in such a way that a higher-intensity stimulus H from AT triggers more neurons than does a lower-intensity stimulus L. More precisely, H activates the neuron group P1_1 ~ P1_h, and L activates the neuron group P1_1 ~ P1_l, with h > l.

• The P1 layer inhibits MP2 directionally, in the sense that MP2_i is excited by P1_i and inhibited by neurons P1_j, j > i. Layer MP2 has two functions: to shrink and to normalize activities in layer P1. Layer MP2 together with P1 converts a firing-rate response into a spatial distribution of neuronal activities. Stimuli of different firing rates are represented by different neuron groups in layer MP2. These cells also originate spontaneous activities.

• Projections from MP2 to MP3 and P2 are in one-to-one correspondence. These excitatory connections are habituatable.

• Due to projections from MP2, MP3 cells also show spontaneous activities.
After repetitive stimulation, the efficacies of projections from MP2 become smaller and smaller, leading to less and less spontaneous activity in MP3.

• One MP1 cell is inhibited by one corresponding MP3 cell. MP1 itself originates spontaneous firing activities. After repetitive stimulation, MP1 increases its level of activity due to reduced inhibition from MP3.

• In layer P2, P2_i receives a one-to-one projection from MP2_i, and unilateral inhibition from MP1_j, j > i. The projections from MP2 underlie habituation to a repetitive visual stimulus, and the unilateral inhibition from MP1 underlies hierarchical dishabituation. It functions like this. If L is presented first, and after its habituation H is presented, the unilateral inhibition from MP1 has no effect on the later presentation of H, since H is represented by a cell group in the array to the right of that representing L (Fig. 6.2). Therefore presentation of H can elicit a new response. If the order of presentation is reversed, i.e., H first, L next, the unilateral inhibition from MP1 will inhibit the response to L which would otherwise be elicited.

• The OUT neuron integrates inputs from the P2 layer, representing the output from the medial pallium. This output will modulate the toad's response to a prey via its indirect projections to the tectum (Figure 6.1). The interaction of the medial pallium with the tectum is beyond the scope of the present study (see the discussion section).

Formal description

When a stimulus is presented to the toad retina, a response from AT is computed, and its average firing rate is presented in Fig. 5.9. We will not use the actual temporal course of AT activity which underlies the computation of the average firing rate for this simulation, for the following reasons: (1) The average activity is more reliable. (2) There are insufficient temporal courses of activities from retinal recordings available to calibrate the model outputs (Ewert & Hock, 1972; Teeters, 1989).
Therefore, we should not rely on detailed temporal courses of the AT response, beyond average firing activities, to compute MP functions. (3) During the experimental process of habituation (Ewert & Kehl, 1978; see the previous chapter), a prey dummy was presented to the toad continuously, without any interruption, for an entire session of the experiment (tens of minutes). The toad constantly followed the stimulus rotation until it became habituated. Thus, it would be unnatural to use the detailed course of temporal activity obtained from an AT cell of a stationary subject. For these reasons and for modeling simplicity, we assume that during stimulus presentation a stable input from AT, with the value of an average firing rate, is continuously fed into the MP column.

a) P1 layer

Grossberg and Kuperstein (1986) propose a network, called the position-threshold-slope map, that converts different input intensities to different positions in an array of neurons which have different sensitivities (weights) and thresholds that covary. We have shown (Wang & King, 1988; Wang & Arbib, 1991b) that, using an array of neurons with different spans for temporal summation, we are also able to convert firing trains of different frequencies into different positions in a next array. The layers P1 and MP2 together will transform different temporal activities into spatial activity distributions centered at different positions. Suppose that each layer in the column has n neurons, and m_p1(i,t) represents the membrane potential of the ith P1 cell at time instant t:

dm_p1(i,t)/dt = -A_p1 m_p1(i,t) + B_p1 I(t) + ρ   (6.1)

where A_p1 is a relaxation parameter, and B_p1 is the connection weight from the AT input I(t), which is assumed to be constant during the period of a stimulus presentation. ρ represents the amplitude of an uncorrelated white noise term introduced to the AT input.
The noise is introduced to compensate, to some extent, for the fact that we use a constant AT input for a stimulus while the animal is "seeing" the stimulus from different visual angles in the experiments. The value of ρ should not be so large as to confuse the original order of average firing rates as shown by the ATL cell in response to the worm stimuli in the dishabituation hierarchy. The output of cell P1_i, N_p1(i,t), is formed by

N_p1(i,t) = m_p1(i,t) if m_p1(i,t) ≥ θ_i; 0 otherwise   (6.2)

where θ_i is the threshold. In implementation, θ_i is chosen as a linear function of position i: θ_i = 46.5 i/n + 17.75, for simplicity (any monotonic function of i would do the job). The only difference between P1 cells is their thresholds, with cells P1_1, P1_2, ..., P1_n having linearly increasing θ_i's. If a stimulus can trigger P1_i, it can also trigger P1_j for smaller j. Thus, a stimulus H with a higher intensity triggers a group of cells P1_1, ..., P1_h which contains the cell group P1_1, ..., P1_l triggered by a less intense stimulus L.

b) MP2 layer

As described before, a stimulus triggers a different number of cells in the P1 layer, depending on the intensity of the input. In order to map temporal activity into different spatial locations, we need to get rid of the subset relation between cell groups triggered by different stimuli. The idea is to enforce unilateral inhibition from layer P1 to layer MP2. Inspired by Grossberg's (1976) shunting inhibition, which fits normalization better than subtractive inhibition does, we propose that the membrane potential of the ith MP2 cell at time instant t obeys

dm_mp2(i,t)/dt = -A_mp2 m_mp2(i,t) + (B_mp2 - m_mp2(i,t)) I_ii(t) - m_mp2(i,t) Σ_{j>i} I_ij(t)   (6.3)

with 0 ≤ m_mp2(i,0) ≤ B_mp2. A_mp2 and B_mp2 are parameters, and I_ij(t) represents the input from neuron P1_j, equal to W_ij N_p1(j,t) for j ≥ i, with W_ii = 1 and W_ij > 0.
At equilibrium (namely, dm_mp2(i,t)/dt = 0),

m_mp2(i) = B_mp2 I_ii / (A_mp2 + I_ii + Σ_{j>i} I_ij)   (6.4)

With the unilateral shunting inhibition of (6.3) and the P1 layer, the network exhibits a phenomenon, called cumulative shrinking, described as follows. Let an arbitrary stimulus activate the neuron group P1_1, P1_2, ..., P1_i; that is, N_p1(1,t) = N_p1(2,t) = ... = N_p1(i,t) > 0. At equilibrium of the MP2 layer,

m_mp2(k) = B_mp2 N_p1(i) / (A_mp2 + N_p1(i)(1 + Σ_{j=k+1}^{i} W_kj))   (6.5)

for k = 1, ..., i. If A_mp2 is small compared to N_p1(i), the output of neuron MP2_i, m_mp2(i) ≈ B_mp2 (note that the summation in the denominator vanishes for k = i), no matter how large N_p1(i) becomes, and the smaller k is, the smaller the value m_mp2(k), due to cumulative inhibition. The cumulative shrinking normalizes and shrinks the activity in the MP2 layer along one direction, whereas the shunting inhibition as demonstrated by Grossberg (1976) only does normalization. This mechanism also performs with non-equal values of input (namely N_p1(1), ..., N_p1(i)).

Because of the cumulative shrinking, the stimulus pattern is not evenly represented by the neuron group P1_1, P1_2, ..., P1_i, but by a distribution of normalized cell activities maximized at MP2_i. In this cell group, the more a cell is to the left (see Fig. 6.2), the less it contributes to the representation, due to shrunk activity. Thus, a different stimulus can be represented by a cell group maximized at a different cell in layer MP2, depending on the intensity of the stimulus. This is the scheme for the MP to represent a temporal pattern. The effect of the cumulative shrinking (the distribution of shrinking activity) can be tuned by choosing an appropriate function for W_ij, and in implementation we choose W_ij = (j - i)/3 for j > i. Figure 6.3 illustrates the effect of the cumulative shrinking with the AT inputs for the eight worm-like stimuli.
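As a concrete check on this analysis, the equilibrium profile of Eq. 6.5 can be evaluated directly. The following Python sketch is not the dissertation's simulation code: apart from the implementation choice W_kj = (j - k)/3, the parameter values and the number of recruited P1 cells are illustrative assumptions.

```python
import numpy as np

def mp2_equilibrium(n_active, n=50, A_mp2=0.1, B_mp2=1.1, N=1.0):
    """Equilibrium MP2 membrane potentials (Eq. 6.5) when P1 cells
    1..n_active all fire at the same rate N; the remaining cells are silent.
    Inhibitory weights follow the implementation choice W_kj = (j - k)/3."""
    m = np.zeros(n)
    for k in range(1, n_active + 1):
        # Cumulative inhibition on cell k from the active P1 cells to its right
        w_sum = sum((j - k) / 3.0 for j in range(k + 1, n_active + 1))
        m[k - 1] = B_mp2 * N / (A_mp2 + N * (1.0 + w_sum))
    return m

m = mp2_equilibrium(n_active=13)
# The profile peaks at the last recruited cell and shrinks toward the left,
# so inputs of different intensities peak at different spatial positions.
assert np.argmax(m) == 12 and m[0] < m[6] < m[12]
```

Because A_mp2 is small relative to the input, the rightmost active cell sits near B_mp2 regardless of how large N becomes (normalization), while every cell to its left is progressively suppressed (shrinking).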
The same behavior can also be achieved with a network that applies a power function to a value normalized by a maximum, where the maximum can be selected with a winner-take-all network (Grossberg & Kuperstein, 1986). However, we found through simulation that, because of the cumulation of inhibition and the avoidance of the power function, the cumulative shrinking network is much more effective and less noise-sensitive. The output of cell MP2_i, N_mp2(i,t), is formed by

N_mp2(i,t) = m_mp2(i,t) + h_1   (6.6)

where parameter h_1 is the base activity, originating spontaneous firing in the MP2 layer.

Figure 6.3 Effect of cumulative shrinking. Each row shows the activity distribution of an array of 50 MP2 cells to a specific input I (Eq. 6.1). The membrane potential of each cell is proportional to the diameter of a circle. Zero activity is illustrated by the smallest circle. In the plot, inputs are the AT responses to the eight worm-like stimuli in the dishabituation hierarchy (Fig. 4.9), which are indicated by a name at the beginning of each row. The parameter values are: A_p1 = 1.0; B_p1 = 1.0; ρ = 0.05; A_mp2 = 0.1; and B_mp2 = 1.1.

c) MP3 layer

The MP3 layer receives a sole, excitatory, input from MP2. The membrane potential of the ith MP3 cell at time instant t is m_mp3(i,t), where

dm_mp3(i,t)/dt = -A_mp3 m_mp3(i,t) + y_i(t) N_mp2(i,t)   (6.7)

where A_mp3 is a relaxation parameter and y_i(t) is the weight of the projection from cell MP2_i. Weight y_i(t) is habituatable with initial value y_i(0) = y_0, and its rule of modification will be described in the next subsection. Due to spontaneous activity in layer MP2, there is also spontaneous activity in layer MP3, following (6.7). The output of cell MP3_i, N_mp3(i,t), is formed by
Without external input, the membrane stabilizes at the level of h_2. Thus, the MP1 layer generates spontaneous activity. The inhibition from the MP3 layer serves to reduce the level of the MP1 spontaneous activity. The output of cell MP1_i is produced by

    N_mp1(i,t) = [m_mp1(i,t)]+        (6.10)

with [x]+ = Max(0, x) to ensure a non-negative response.

e) P2 layer

This layer receives excitatory input from layer MP2 through habituatable projections, and unilateral inhibition from layer MP1. The membrane potential of the ith P2 cell at time instant t obeys

    dm_p2(i,t)/dt = −A_p2 m_p2(i,t) + y_i(t) (N_mp2(i,t) − h_1) − B_p2 Σ_{j>i} [N_mp1(j,t) − C_mp1]        (6.11)

where A_p2 is a relaxation parameter, and y_i(t), which appeared in (6.7), is the modifiable weight of the projection from cell MP2_i; its dynamics will be described in the next subsection. The term (N_mp2(i,t) − h_1) detects activation above the level of spontaneous firing, namely external stimulation, from the MP2 layer. In other words, only external stimulation from layer MP2 can pass through the plastic pathway. B_p2 is the strength of the left inhibition (Fig. 6.2), and C_mp1 is the resting activity of an MP1 neuron before any habituation occurs. From equations (6.6)-(6.9), it is not difficult to find that

    C_mp1 = h_2 − h_1 y_0 B_mp1        (6.12)

The output of this P2 cell is generated by

    N_p2(i,t) = [m_p2(i,t)]+        (6.13)

This layer combines habituation of a stimulus pattern with unilateral inhibition, and it is where learning effects are reflected. Both layer P2 and layer MP3 receive plastic inputs from the MP2 layer. One difference between them is that the latter is affected by the spontaneous activity in layer MP2 (Eq. 6.7) whereas the former is not (Eq. 6.11). In fact, it is crucial for the function of the MP3 layer that it reduces its spontaneous activity following habituation training, as outlined before. Finally, as the sole efferent neuron of the MP column, cell OUT does nothing but integrate activities from the P2 layer.
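The P2 update of Eq. (6.11) can be sketched as one forward-Euler step. This is a minimal illustration of ours, not the dissertation's code; the defaults A_p2 = 1.0, B_p2 = 0.1, h_1 = 0.6 are taken from Table 6.1, while C_mp1 is left as an argument because it depends on B_mp1 through Eq. (6.12):

```python
def p2_step(m_p2, N_mp2, N_mp1, y, C_mp1, dt=0.05,
            A_p2=1.0, B_p2=0.1, h1=0.6):
    """One forward-Euler step of Eq. (6.11) for all P2 cells; cell i is
    inhibited by every MP1 cell j > i (unilateral inhibition)."""
    n = len(m_p2)
    new = []
    for i in range(n):
        inhib = sum(N_mp1[j] - C_mp1 for j in range(i + 1, n))
        dm = (-A_p2 * m_p2[i]
              + y[i] * (N_mp2[i] - h1)   # plastic, above-baseline drive
              - B_p2 * inhib)
        new.append(m_p2[i] + dt * dm)
    return new

# Purely spontaneous input (N_mp2 = h1, MP1 at rest) leaves P2 silent.
rest = p2_step([0.0] * 3, [0.6] * 3, [0.5] * 3, [1.0] * 3, C_mp1=0.5)
```

Note how the step realizes the two claims above: the MP2 baseline h_1 is subtracted out before passing through the plastic weight, and only MP1 activity above its pre-habituation resting level C_mp1 contributes to the inhibition.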
The membrane potential of the cell at time t obeys

    dm_out(t)/dt = −A_out m_out(t) + Σ_{i=1}^{n} N_p2(i,t)        (6.14)

where A_out is a relaxation parameter. The output of the cell,

    N_out(t) = m_out(t)        (6.15)

represents the result of visual processing by the MP functional unit. The cell is not strictly necessary, since the column could directly send the fibres of its P2 cells as efferents; its introduction facilitates comparison of modeling results with experimental data.

Synaptic plasticity

Although much work has been done on designing efficient learning algorithms for solving computational problems, studies modeling biological learning processes are rather scarce, and the simulation of long-term memory has hardly been touched. Before the toad habituation model (Lara & Arbib, 1985), Lara (1983) gave a general account of stimulus-specific habituation of the orienting reflex in vertebrates. Gluck and Thompson (1987) developed a computational model of the neural substrates of associative learning. A model of synaptic modulation was presented by Changeux and Heidmann (1987) at the level of neurotransmitter receptors and ion channels. Grossberg and Schmajuk (1988) proposed a neural network model that controls behavioral timing. Wang and Hsu (1990) modeled short- and long-term habituation and sensitization behaviors in Aplysia.

What are the neural mechanisms of habituation? Current neurobiological studies suggest that short-term habituation operates on presynaptic terminals as a result of reduced neurotransmitter release (Kandel, 1976; Thompson, 1986; Dudai, 1989). Long-term habituation, however, may be accompanied by structural changes. For instance, the frequency of active zones in presynaptic terminals and the average size of each active zone can be modified by long-term habituation training (Bailey & Chen, 1983; Bailey & Kandel, 1985).
Mathematically, the decrease of synaptic efficacy is mostly modeled by a build-up of inhibition V (see, among others, Lara & Arbib, 1985; Gluck & Thompson, 1987):

    τ dV(t)/dt = α (V_0 − V(t)) + S(t)        (6.16)

where V_0 is the normal, initial value of V; S(t) is the activity transmitted through the synapse, symbolizing training; τ, the time constant, governs the rate of habituation; and α regulates the rate of recovery. This simple differential equation models the exponential curve of habituation and spontaneous recovery. When simulating Aplysia's habituation, Wang and Hsu (1988; 1990) made a model incorporating both short- and long-term memory. An S-shaped curve was used to model the build-up of both short-term and long-term habituation, described as

    dV(t)/dt = a V(t) (b − V(t))   if stimulated
    dV(t)/dt = F(t)                otherwise        (6.17)

where a, b are control parameters and F(t) designates forgetting of both short- and long-term memory, defined in Wang and Hsu (1990). Since the S-shaped curve has two varying courses, depending on the sign of the second-order derivative, and a single turning point, the inflection point, it is used to model the two forms of habituation: both short-term and long-term habituation are represented by a single curve, and the transfer of short-term memory to long-term memory corresponds to the switch from below to above the inflection point.

In continuation of the model of the MP column, let us now provide the dynamics for y_i(t) as introduced in (6.7) and (6.11):

    τ dy_i(t)/dt = α z_i(t) (y_0 − y_i(t)) − β y_i(t) (N_mp2(i,t) − h_1)        (6.18)

    dz_i(t)/dt = γ z_i(t) (z_i(t) − 1) (N_mp2(i,t) − h_1)        (6.19)

where τ is the time constant controlling the rate of habituation (the reduction of the weight y_i(t)). The first term in (6.18) regulates recovery, and y_0 is the initial value of y_i (before any habituation training occurs). Thus, as long as y_i(t) evolves below the value of y_0, this term will try to bring it back to the level of y_0. Constant α controls the rate of recovery.
Variable z_i(t), as defined in (6.19), exerts an activity-dependent control on the rate of forgetting. The second term in (6.18) regulates habituation, and parameter β controls the speed of habituation. The term (N_mp2(i,t) − h_1) is the effective input from layer MP2 (the spontaneous activity h_1 in the MP2 layer does not affect habituation), and it multiplies y_i(t) to form an activity-gated input, in contrast to the direct input in (6.16). The intuition behind the activity-gated input is that a habituation stimulus is more effective in an early stage of habituation, when habituation starts to grow, than in a late stage, when habituation has become profound. We can see from this term that habituation is regulated only by presynaptic input, which corresponds to the hypothesis that habituation is presynaptic.

If there is no external input to the column, the input (N_mp2(i,t) − h_1) is equal to 0 (see Eqs. 6.3 and 6.6), and therefore no change is made to the value of z_i(t). In order to study the behavior of formula (6.19), let us assume without loss of generality that there is a constant input 1 to the presynaptic terminal. Then (6.19) becomes

    dz_i(t)/dt = γ z_i(t) (z_i(t) − 1)        (6.20)

which is similar to the one in (6.17). The inverse form of this differential equation has been used to model the growth of the number of individuals of a species under limited resources (Braun, 1978). With the initial condition z_i(t_0) = 0.5, the solution to the equation is

    z_i(t) = 1 / (1 + e^{γ(t − t_0)})        (6.21)

The monotonically decreasing inverse S-shaped curve has two varying stages, depending on the sign of the second-order derivative, and a single inflection point, which is at t = t_0. The transition speed from the first fast-decreasing stage to the second slow-decreasing one is controlled by the slope at the inflection point, which is equal to −γ/4. The larger γ is, the quicker the transition. The overall speed of the decrease of the curve z_i(t) is controlled by the value of t_0: the larger t_0 is, the slower the decrease. Furthermore, a larger t_0 corresponds to a smaller z_i(0), the initial value of z_i.
Therefore, the initial value of z_i in equation (6.19) governs the speed of decrease of the curve. Figure 6.4 shows two groups of z_i curves with different values of γ and t_0.

Figure 6.4 Ten z_i(t) functions with different parameter values. Parameters t_0 = 5, 6, 7, 8, 9, respectively, and γ = 0.5 for the thick curves and 1.0 for the thin curves.

The effect of z_i(t) on the synaptic weight is to modify the speed of forgetting of habituation. As seen from Fig. 6.4, before the inflection point z_i(t) holds a relatively large value, and forgetting, as manifested by the first term of (6.18), is relatively fast; after the inflection point, z_i(t) holds a relatively small value, and forgetting is relatively slow. These two phases are used to model two phases of memory: short-term memory (STM) and long-term memory (LTM). In the extreme case when z_i(t) equals its asymptote, 0, habituation as accumulated by training in (6.18) never undergoes forgetting. Conceivably, with a very small value of z_i(t), forgetting may take longer than the lifespan of the animal, which is equivalent to saying that long-term memory is never lost. The time course of the transition from STM to LTM can be fully controlled by the model parameters γ and t_0.

It should be mentioned that equations (6.18) and (6.19) operate at different time scales, although both are controlled by the presynaptic training signal. Function z_i(t) operates on a much longer time scale than y_i(t) does, because it takes much longer to see long-term habituation than to see the initial effects of habituation training. In other words, the same input (N_mp2(i,t) − h_1) affects the value z_i(t) in (6.19) much more slowly than it affects y_i(t) in (6.18). The choice of the parameters in (6.18) and (6.19) can be decided when modeling a specific set of data.
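The coupled dynamics of (6.18)-(6.19) can be explored with a minimal forward-Euler sketch. This is an illustration of ours, not the dissertation's code; the parameter values (τ = 200, α = 3.2, β = 24, γ = 0.1, y_0 = 1.0, t_0 = 50) are those of Table 6.1, z_i(0) is set from Eq. (6.21), and the constant drive `stim = 0.1` is an arbitrary illustrative value:

```python
import math

# Parameter values from Table 6.1
TAU, ALPHA, BETA, GAMMA, Y0, T0 = 200.0, 3.2, 24.0, 0.1, 1.0, 50.0

def habituate(stim, steps, dt=0.05, y=None, z=None):
    """Euler-integrate Eqs. (6.18)-(6.19) under a constant effective
    input stim = N_mp2(i,t) - h_1; stim = 0 means no stimulation.
    The initial z follows Eq. (6.21) evaluated at t = 0."""
    y = Y0 if y is None else y
    z = 1.0 / (1.0 + math.exp(-GAMMA * T0)) if z is None else z
    for _ in range(steps):
        dy = (ALPHA * z * (Y0 - y) - BETA * y * stim) / TAU
        dz = GAMMA * z * (z - 1.0) * stim
        y, z = y + dt * dy, z + dt * dz
    return y, z

# One simulated session: the weight habituates well below y_0, while z
# stays above the inflection point (no long-term memory yet) ...
y_trained, z_trained = habituate(stim=0.1, steps=3000)
# ... and the weight partially recovers when stimulation is withdrawn.
y_rested, _ = habituate(stim=0.0, steps=3000, y=y_trained, z=z_trained)
```

With this drive, one session leaves z_i well above 0.5, consistent with the constraint stated below that a single session must not push z_i past the inflection point.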
We will see in the next section the choice of these parameters for modeling the toad's habituation. Unfortunately, few quantitative data are available for long-term habituation in toads. Thus, the only constraint for choosing the values of γ and t_0 is that one session of stimulation (running tens of minutes) must not be able to drive z_i(t) over the inflection point, since we know that profound habituation can be reached only with a series of training sessions (Ewert, 1984).

Only y_i(t) is embedded in formulae (6.7) and (6.11) of the MP dynamics, as the synaptic weight of the projection from cell MP2_i. As we see from (6.18) and (6.19), synaptic modification is local, and therefore the above analysis of the synaptic plasticity can readily be coupled with the MP dynamics. It turned out that parameter tuning for the learning rules was quite straightforward when the simulations presented in the next section were made. The S-shaped habituation rule (6.17) has been used to simulate long-term habituation in Aplysia (Wang & Hsu, 1990) for up to 3 weeks of simulation time, and the model results are generally comparable with the quantitative data (Carew et al., 1972). The synaptic plasticity rule presented here can produce LTM traces comparable with those of rule (6.17), owing to their common features. Thus, we see no problem in simulating long-term habituation in Aplysia with the new learning rule proposed above.

VI.4 Computer Simulation

The single-column model of the medial pallium presented above has been simulated. Before the results are presented, the following points need to be clarified:

• The stimuli were originally presented to the model of the toad retina, and all the calculations from the retina to the anterior thalamus were carried out by the model presented in chapter 4. The computation in the medial pallium model is based on the average firing activities of the anterior thalamus.
• The results presented below are from a single MP column, representing the toad's response at a specific visual location.

• To compare with the prey-catching response of the toad (see Ewert & Kehl, 1978; Chapter 5), the output of the medial pallium needs to affect the tectum, where prey-catching behavior is generated (Fig. 6.1). Modeling of this interaction is beyond the scope of this thesis (see the final conclusion). Rather, the response from MP is viewed as modifying the initial orienting activity; that is, the medial pallium in this model plays a scaling role on prey-catching behavior (see more discussion in section 6).

For each array of MP cell types, 50 cells have been simulated (n = 50). The results are shown via the activities of the single output cell of the column, OUT. The result seen from cell OUT is interpreted as a coefficient that modifies the number of initial orienting turns made by the animal in the first minute interval when the stimulus is seen. Time was measured as follows: a basic time step of 0.05 (the discretization step of the differential equations) corresponded to 1 sec.* Each stimulus was continuously presented for 60 min, and then it was switched to another stimulus, which was also presented for 60 min. For visualization purposes we present only 10 data items for each presentation, each of which corresponds to a 6-min step. In all the simulations presented below, all the stimuli were moved from left to right relative to the animal model, as in the experiments, and they had the same length, 20 mm, and height, 5 mm.

We have simulated all the experiments done by Ewert and Kehl (1978) related to the 8 stimuli in the dishabituation hierarchy. Most of the simulations are presented in Figures 6.5-6.10; some simulation results are omitted here for the sake of space.

* A shorter time step (like 1 ms) could have been used, and the result should be the same. The major concern was the computation time.
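As a bookkeeping sketch (ours, not the dissertation's code), the presentation schedule — 60 min per stimulus, one OUT sample every 6 min — can be written as follows, assuming the one-step-per-simulated-second correspondence stated in the text; the function name `sample_steps` is a hypothetical label:

```python
def sample_steps(minutes=60, sample_every_min=6):
    """Time steps at which the OUT response is collected, assuming one
    integration step (dt = 0.05) corresponds to 1 s of simulated time."""
    total_steps = minutes * 60        # one step per simulated second
    every = sample_every_min * 60
    return list(range(0, total_steps, every))

# 10 samples per 60-min presentation, as in the plotted data frames
samples = sample_steps()
```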
Table 6.1 provides the values of all the parameters used in the MP column model, together with the numbers of the formulae in which these parameters appear. Appendix A provides all equations used in the MP column model.

Table 6.1 Parameter Values of the MP Column Model

    Parameter   Formula   Value
    A_p1        (6.1)     1.0
    B_p1        (6.1)     1.0
    p           (6.1)     0.05
    A_mp2       (6.3)     0.1
    B_mp2       (6.3)     1.1
    h_1         (6.6)     0.6
    A_mp3       (6.7)     1.0
    B_mp3       (6.7)     1.0
    h_2         (6.9)     0.9
    A_p2        (6.11)    1.0
    B_p2        (6.11)    0.1
    A_out       (6.14)    0.1
    y_0         (6.18)    1.0
    τ           (6.18)    200
    α           (6.18)    3.2
    β           (6.18)    24
    γ           (6.19)    0.1
    t_0         (6.21)    50

In the first set of simulations, two pairs of stimuli were studied. In Fig. 6.5A, stimulus f, the left-pointing triangle, was presented first. The response of the model habituated after continuous presentation of the stimulus. After 60 min of simulation time, stimulus d, the rectangle, was presented, and it triggered a new response; that is, d can dishabituate the habituated response to f. The reverse order of presentation was studied in Fig. 6.5B, and apparently no dishabituation was demonstrated. In Fig. 6.5C and D, another pair of stimuli was simulated. From the results we can see that stimulus b, the right-pointing triangle, was able to dishabituate the habituated response to the rectangle, but stimulus d was unable to dishabituate the habituated response to b. Figure 6.5E shows the corresponding experimental results. The simulated toad model is just one "average individual", whereas the experimental results summarize a group of individuals. From the comparison between Figs. 6.5A-D and Fig. 6.5E, it can be concluded that our model clearly reproduces the experimental data with this set of stimulus pairs.
The reduction of the initial activity of the second presentation, as particularly seen in Fig. 6.5A, is in the model due to overlapping in the population coding (see Fig. 6.3). The cells participating in coding stimulus d are partly involved in coding f. Thus, when f was habituated, the part of the cells representing d was also habituated, and therefore the initial response to d was not as great as it would have been without previous presentation of f.

Figure 6.5 Computer simulation of habituation and dishabituation of the prey orienting response in toads. The response was taken from the OUT cell of the MP column, and it was measured as the relative value of the initial response of each frame, scaled to the same value measured experimentally. Each stimulus was continuously presented for 60 min of simulation time, corresponding to 3000 steps. The response was collected every 6 min. In a frame, the response to the first stimulus is indicated by little circles and that to the second one by little triangles. A. Stimulus f was first presented and habituated, and then d was tested. B. The reverse order of presentation as in A. C. Stimulus d was first presented and habituated, and then b was tested. D. The reverse order of presentation as in C. E. The experimental results obtained by Ewert and Kehl (1978), shown using the same combination of the stimuli.

In Figure 6.6, we studied dishabituation among triangles which are mirror images of each other. In Fig. 6.6A, the left-pointing triangle (f) was presented first and the right-pointing triangle (b) was then tested. Stimulus b was able to dishabituate the habituated response to f, but no dishabituation was shown when the order of presentation was reversed, as shown in Fig. 6.6B. As shown in Fig. 6.6C and D, no dishabituation was
This is because no distinction has been made by processing of the anterior thalamus (Fig. 4.9). Fig.6 .6 E shows the corresponding experiment results. The model reproduces the data except in the pair of b/c where weak dishabituation in the experiments was shown, but not significant. In another set of simulations, the isosceles triangle (a) was tested against d and b . Fig.6.7A and B show the results with the rectangle. When d was first presented, and a was then tested, a full new response to later presentation of a was exhibited in the simulation (Fig.6.7A). When the order of presentation was reversed, however, no significant dishabituation occurred (Fig.6.7B). If b was first presented and habituated, as shown in Fig. 6.7C, a was able to trigger a new response. But in the inverse situation (Fig. 6.7D) b was not able to dishabituate habituation to a. If inspecting Fig.6.7B carefully, one can see that the model showed a bit higher activity when d was first presented. This is because, in the MP column model, the left inhibition of layer MP1 onto layer P2 was not strong enough to fully depress the activities triggered by presentation of d (seeEq.6.11 for details). Fig.6.7E shows the corresponding experimental results. In the following set of simulations, "complex patterns" of g and h were investigated. In Fig. 6 .8 A, d was first presented and hibituated. Immediately afterwards, the right-pointing triangle with a dot (g) was tested, and it was not able to elicit a new response. If, however, g was first habituated, occurrence of d elicited remarkable dishabituation. In Fig.6 .8 C, the striped pattern (h) was first presented and habituated. This time presentation of g was able to dishabituate previous habituation. In the inverse situation, i.e., if g was first presented (Fig. 6 .8 D), stimulus h failed to trigger a new response. Fig. 6 .8 E shows the corresponding experimental results. 182 F igure 6 . 
Figure 6.6 Computer simulation of habituation and dishabituation of the prey orienting response in toads. A. Stimulus f was first presented and habituated, and then b was tested. B. The reverse order of presentation as in A. C. Stimulus c was first presented and habituated, and then b was tested. D. The reverse order of presentation as in C. E. The corresponding experimental results obtained by Ewert and Kehl (1978). See the legend of Fig. 6.5 for other explanations of this figure.

Figure 6.7 Computer simulation of habituation and dishabituation of the prey orienting response in toads. A. Stimulus d was first presented and habituated, and then a was tested. B. The reverse order of presentation as in A. C. Stimulus b was first presented and habituated, and then a was tested. D. The reverse order of presentation as in C. E. The corresponding experimental results obtained by Ewert and Kehl (1978). See the legend of Fig. 6.5 for other explanations of this figure.

Figure 6.8 Computer simulation of habituation and dishabituation of the prey orienting response in toads. A. Stimulus d was first presented and habituated, and then g was tested. B. The reverse order of presentation as in A. C. Stimulus h was first presented and habituated, and then g was tested. D. The reverse order of presentation as in C. E. The corresponding experimental results obtained by Ewert and Kehl (1978). See the legend of Fig. 6.5 for other explanations of this figure.
In Figure 6.9A and B, we compared stimuli g and b, whose difference lies only in the presence of a dot. If g was habituated first, presentation of stimulus b elicited a full response (Fig. 6.9A); if the order of presentation was reversed, presentation of g showed no dishabituation. The same situation held when g was compared with c. The model responded with strong prey-catching activity to presentation of c after it had habituated to stimulus g (Fig. 6.9C), but stimulus g was not able to dishabituate habituation to stimulus c (Fig. 6.9D). The corresponding experimental results are shown in Figure 6.9E.

In Figure 6.10, we studied the pair formed by the top stimulus, a, and the bottom stimulus, h, in the dishabituation hierarchy. First, stimulus h was presented until habituation occurred, and then stimulus a was presented instead of h. Remarkable dishabituation was seen in the model response, as shown in Fig. 6.10A. However, after habituation to stimulus a, the model did not show any new response to presentation of stimulus h. Figure 6.10C shows the corresponding experimental results.

A notable phenomenon with the a/h pair is that when a was presented after h, it elicited a larger initial activity than h did. This "overshooting" of dishabituation is exactly what occurred in the corresponding experiment (Fig. 6.10C). What yields this model phenomenon? It results from the way the cumulative shrinking mechanism normalizes. According to (6.5), after cumulative shrinking, the membrane potential of the MP2 cell activated most strongly by a pattern (referred to as the representative cell, cf. Fig. 6.3) is equal to B_mp2 N_P1(i,t)/(A_mp2 + N_P1(i,t)), which is a monotonically increasing function of N_P1(i,t). Since N_P1(i,t) is proportional to the input from the ATL layer of the anterior thalamus (Eq.
6.1), the bigger the AT response to a pattern, the larger the activity of the MP2 representative cell. The difference in the activities of representative cells propagates through the other layers of the column model until it reaches the OUT cell, where the initial overshooting caused by stimulus a was seen after habituation to h, because the AT response to a is larger than the response to h.

Figure 6.9 Computer simulation of habituation and dishabituation of the prey orienting response in toads. A. Stimulus g was first presented and habituated, and then b was tested. B. The reverse order of presentation as in A. C. Stimulus g was first presented and habituated, and then c was tested. D. The reverse order of presentation as in C. E. The corresponding experimental results obtained by Ewert and Kehl (1978). See the legend of Fig. 6.5 for other explanations of this figure.

Figure 6.10 Computer simulation of habituation and dishabituation of the prey orienting response in toads. A. Stimulus h was first presented and habituated, and then a was tested. B. The reverse order of presentation as in A. C. The corresponding experimental results obtained by Ewert and Kehl (1978). See the legend of Fig. 6.5 for other explanations of this figure.

In the last simulation we shall present, the separate process of dishabituation was investigated. The same procedure as in the experiment presented in the last chapter (see Fig. 5.6) was adopted in the simulation. Stimulus f was first presented for 60 minutes of simulation time, until the response was habituated. Then stimulus b was presented for 0.5 min, and immediately afterwards f was brought back again. The simulation results are presented in Fig. 6.11.
Each little square in the plot represents the model response in a 4-min step, while the open circle represents half of the model response within the 0.5-min period, corresponding to the number of orienting turns in 0.5 min in the experiments. Strong dishabituation was shown when b was presented, just as in Fig. 6.6A. But habituation to stimulus f was maintained when b was withdrawn and f was presented again. It can be concluded that presentation of b did not counteract the habituation effects caused by the previous presentation of f, exactly as happened in the experiments (see Fig. 5.6 for a comparison).

In all the simulation results presented so far, as demonstrated by Figures 6.5-6.11, the model reproduces the experimental data remarkably well. The same good results were also found in other simulations corresponding to the Ewert and Kehl experiments, omitted here for space. Not only is the dishabituation hierarchy fully preserved, as indicated by the average firing rate of the AT model presented in chapter 4, but habituation properties, such as habituation curves and overshooting of dishabituation, are demonstrated by the model as seen only in quantitative experimental data.

Figure 6.11 Computer simulation of the separate process of dishabituation of the prey orienting response in toads. Stimulus f was first presented for 60 min of simulation time, and then b was tested for 30 sec. Immediately afterwards, f was tested again. See the text and the legend of Fig. 6.5 for other explanations.

VI.5 Predictions

This model of the medial pallium extends visual information processing one step further than the anterior thalamus, whose processing is based on retinal and tectal processing. The theory we have developed in the previous sections yields a number of neurobiological predictions, presented with respect to the following aspects.
Dishabituation mechanisms

Why can presentation of a certain stimulus dishabituate habituation to some stimuli, but not to others? That is, what causes dishabituation? The model allows us to predict that dishabituation is nothing but the release of a normal prey-catching behavior. The reason that presentation of a second stimulus fails to release a new response, however, is inhibition caused by habituation to the first stimulus. Thus, dishabituation is not a result of facilitation, as the name may suggest. The failure of dishabituation is caused by cross-talk between the first and the second stimulus. In light of this prediction, dishabituation naturally has a separate neuronal process, as indicated by the experimental results of Fig. 5.6.

Underlying neuronal structures

The success of the MP column model in simulating the experimental data largely results from the structure of the neural model shown in Fig. 6.2. The most characteristic feature of the model, perhaps, is the unilateral projections both from P1 to MP2 and from MP1 to P2. This intrinsic anatomical asymmetry underlies the behavioral asymmetry exhibited in the dishabituation hierarchy. We accordingly predict that unilateral projections exist in the neuronal organization of the medial pallium, and that these projections are inhibitory. With reference to the known anatomy of the medial pallium, we predict that these unilateral projections extend parallel to the telencephalic ventricle, and that they communicate horizontally over short distances within a functional unit, which is presumably organized in the vertical direction.

Long-term memory effects

In habituation training, one session of experiment usually takes tens of minutes, with large variability (Ewert & Kehl, 1978; Chapter 5; Cervantes-Perez et al., 1991). An interesting phenomenon appears if we observe the data curves carefully (see Fig. 6.5-
6.10 for the experimental data curves): toads did not reduce their orienting activity to zero, but rather saturated at some level. In experiments, training with one stimulus was often stopped when the toad had reacted at some low level of activity stably for a period of time (Ewert, personal communication, 1990). The same phenomenon was also observed when the experiments reported in the previous chapter were conducted. The question is: why does prey-catching orienting activity saturate at a non-zero level?

Our model offers an explanation for the phenomenon. According to the model of synaptic plasticity (Eqs. 6.18 and 6.19), at equilibrium the weight of a plastic synapse will settle down at

    y_i(t) = α y_0 z_i(t) / [α z_i(t) + β (N_mp2(i,t) − h_1)]        (6.22)

which is a non-zero value under a constant input from the MP2 layer. This is the reason why, in all the simulation results shown above, the model also saturated at a non-zero activity level. It is straightforward from (6.22) that if z_i(t) becomes smaller, the saturation point will be lower. According to the analysis in Section VI.3, z_i(t) can be significantly reduced only following long-term habituation training, which can be obtained with a series of training trials. We thus predict that, due to long-term memory traces, toads will saturate at lower and lower levels of activity as a series of training sessions proceeds, and they will eventually reach the zero level. Theoretically speaking, the same long-term effect should occur if sufficiently long training is applied even within one session. It is interesting to note that Cervantes-Perez et al. (1991) once tested an animal for 220 min in order to reach the zero level. But one needs to be careful with such a long behavioral experiment, since other factors, such as fatigue and the animal's motivation, might cause the animal to stop prey-catching. So we suggest that this prediction be tested with a series of training sessions.
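The saturation argument can be checked numerically with a small sketch of Eq. (6.22) (ours, not the dissertation's code), using the Table 6.1 values α = 3.2, β = 24, y_0 = 1.0 and h_1 = 0.6; the MP2 drive and the z values are arbitrary illustrative choices:

```python
ALPHA, BETA, Y0, H1 = 3.2, 24.0, 1.0, 0.6   # Table 6.1 values

def weight_equilibrium(z, N_mp2):
    """Equilibrium synaptic weight of Eq. (6.22)."""
    drive = N_mp2 - H1                      # effective input above baseline
    return ALPHA * Y0 * z / (ALPHA * z + BETA * drive)

# The saturation point is non-zero for any z > 0 ...
y_stm = weight_equilibrium(z=0.9, N_mp2=1.1)
# ... and drops as long-term training drives z down.
y_ltm = weight_equilibrium(z=0.05, N_mp2=1.1)
```

With zero external drive (N_mp2 = h_1), the expression reduces to y_0, i.e., the weight fully recovers; with a constant drive, a smaller z yields a lower saturation level, which is the predicted long-term effect.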
Resting habituation

We know from the model that if a higher-intensity stimulus H is presented first to the MP model and is habituated, a lower-intensity stimulus L cannot elicit a new response. Would habituation still occur under repeated presentation of L when L cannot elicit a response? Our model predicts that a habituation process still takes place in this situation. This is because, in the model, habituation is due to the reduction of the efficacies of the synapses that MP2 cells make on P2 and MP3 cells, and it depends solely on presynaptic inputs from MP2 cells. The reason that L cannot elicit a new response is the unilateral inhibition, resulting from habituation to H, that is exerted on P2 cells. Although unable to release a prey-catching response, stimulation with L elicits normal MP2 activities, thus causing the habituation process to happen. We refer to this kind of habituation as resting habituation.

The resting habituation prediction can best be tested by comparing the prey-catching orienting responses of two groups to presentation of L. The first group receives no stimulation after habituating to H, while the second group receives repeated stimulation with L after habituating to H. The comparison experiments should be performed after a period of forgetting, so that habituation due to presentation of H does not prevent the first group from reacting with a prey-catching response.

Influence of habituation on dishabituation

Our model provides a framework which allows us to investigate phenomena beyond those recognized so far by experimentalists. In particular, we consider that, in a stimulus pair, the relative amplitude of the initial response to the second presentation is not a coincidence, but tells a rich story of how the first presentation affects the second
To summarize what was studied in the last section, we offer the following explanations:
• The overlap of cell participation in representing different patterns leads to a smaller amplitude of the initial activity elicited by a dishabituating stimulus, as exemplified by Fig. 6.5A and its corresponding experimental data.
• A tiny response is elicited by a second presentation of stimulus L because the inhibition caused by prior presentation of stimulus H has not fully blocked the channels activated by L. An example can be found in Fig. 6.7B and its corresponding experiment.
• Overshooting of dishabituation is a result of the higher input to the medial pallium from the second presentation of H in comparison to the first presentation of L. The overshooting phenomenon is particularly clear in Fig. 6.10A and its corresponding experiment.

VI.6 Discussion

In the MP column, we have introduced five types of neurons, each of which plays a different role in visual information processing. The existence of three of them, MP1, MP2, and MP3, is supported by physiological recordings. In the model, all three types show spontaneous activities. MP2 cells are not affected by habituation (Fig. 6.2), whereas after habituation training MP3 cells decrease and MP1 cells increase their spontaneous firing activities. These properties of the model cell types conform with the original classification of these cell types (Finkenstadt, 1989a). The fact that MP shows a significant increase in 2DG uptake after habituation (Finkenstadt & Ewert, 1988a) may be the net outcome of a relatively greater increase in MP1 activities and a decrease in MP3 activities. Introduction of the P1 and P2 types is based on computational considerations, and we predict that cells with these computational properties exist in the medial pallium. Spontaneous activities as shown in MP1, MP2 and MP3 cells have an important computational role in the model.
A cell of this type can either increase or decrease its stable activity level, depending on the situation, whereas a cell without spontaneous firing can only increase its activity level after stimulation. It is the reduction of the spontaneous activity of MP3 cells after habituation that gives rise to the inhibition exerted upon a later presentation of a different stimulus. The aftereffects of habituation manifest themselves through a decrease and an increase in the spontaneous activity of MP3 and MP1 cells respectively, and finally through unilateral inhibition stemming from MP1 cells. From another perspective, our modeling suggests important computational value in having spontaneous activities in the central nervous system. The MP column model shown in Fig. 6.2 does not appear to be a computationally minimal structure, since MP1 could be merged into MP3, which would then send unilateral excitation to P2 cells to the left instead of MP1's unilateral inhibition. However, there is a conceptual difference between the two structures. The current structure conforms with the view that a lower intensity stimulus L cannot elicit a new prey-catching response after the animal habituates to a higher intensity stimulus H because the earlier habituation yields more inhibition on the later response to L. The structure after merging corresponds to the view that the animal fails to respond to L after habituation to H because the excitation necessary for eliciting a response to L is reduced by the earlier habituation. We have proposed the cumulative shrinking network as a key mechanism for mapping temporal activities into population coding referenced by spatial positions. This general mechanism fits gracefully into the context of AT outputs and MP computing tasks, underlying a range of desirable simulation results.
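One ingredient of this mechanism, a layer of cells with graded thresholds, can be pictured with a toy sketch. It borrows the threshold schedule Θ_i = 46.5 i/n + 17.75 given for the P1 layer in Appendix A; everything else (layer size, test amplitudes) is invented for illustration. A stronger input recruits units farther along the layer, so an activity level is re-coded as a spatial position.

```python
# Toy illustration of graded thresholds re-coding an activity level as a
# spatial position (cf. the P1 thresholds Theta_i = 46.5*i/n + 17.75 in
# Appendix A). Layer size and test amplitudes are invented.
n = 10
thresholds = [46.5 * i / n + 17.75 for i in range(1, n + 1)]

def frontier(amplitude):
    """Index of the last unit recruited by the given input amplitude."""
    active = [i for i, th in enumerate(thresholds, start=1) if amplitude > th]
    return max(active) if active else 0

for amp in (25.0, 40.0, 60.0):
    print(f"input {amp:4.1f} -> units 1..{frontier(amp)} active")
```

With these placeholder settings, inputs of 25, 40, and 60 recruit units up to positions 1, 4, and 9 respectively: the amplitude is read off as the position of the recruitment frontier.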
In order for the mechanism to take place, we need to identify in the medial pallium a layer of neurons with different thresholds that are correlated with the direction of the unilateral projections starting from these cells. Furthermore, these unilateral projections exert shunting inhibition on their target cells. These theoretical speculations need to be verified experimentally. In the model, different visual stimuli are represented by activity distributions over different cell groups (see Fig. 6.3). Presentation of a new stimulus triggers a new response because it activates a different group of cells. The hierarchical stimulus specificity exhibited in amphibians can be explained by unilateral inhibition within a functional unit, as depicted in Fig. 6.2. The dishabituation mechanisms we expound in this chapter are different from those revealed in the study of Aplysia (Kandel, 1976), where dishabituation caused by sensitization is due to facilitation from an external channel. On the other hand, our model structure is consistent with the neuronal substrates proposed by the dual-process theory (Groves & Thompson, 1970), which claims that sensitization (dishabituation) develops independently. This medial pallium model provides a computational basis for the comparator model (Sokolov, 1960) of stimulus specific habituation in vertebrates. If we leave out the unilateral inhibition from MP1 cells, the same structure can readily exhibit the bilateral dishabituation implied by the comparator model. In the model, short-term and long-term memory are modeled by a single S-shaped curve. Long-term memory develops as training accumulates. Conciseness is one functional advantage of modeling two phases of learning by a single curve. The same idea of using an S-shaped curve was employed in modeling Aplysia's habituation (Wang & Hsu, 1990), but that model used three equations to describe the entire learning process: acquisition, short-term forgetting and long-term forgetting.
The present model combines the two forms of forgetting into one, and integrates the forgetting term with acquisition (see Eqs. 6.18 and 6.19), so it needs only two curves to describe the entire process. By combining two learning processes into one equation, the model does not imply that STM and LTM share the same neural mechanisms. What it says is that, at the electrophysiological level, the two forms of memory could be represented by a single dynamic process. This combination might yield some insights into the relation between the biological processes of STM and LTM, since it is hard to imagine that the two memory forms have totally independent processes. This view is also consistent with the fact that in Aplysia the two forms of habituation share the same locus, the presynaptic terminals of sensory neurons (Castellucci et al., 1978). In our model, the value y_i(t) (see Eq. 6.18) can be modified by presynaptic stimulation quite rapidly during a short time period while the value z_i(t) (see Eq. 6.19) stays basically the same; in terms of the synaptic mechanisms involved, the change in y_i(t) would correspond to a change in neurotransmitter release. A change to the value z_i(t) needs much more profound training in the model than a change to the value y_i(t), and this change would correspond to structural changes in presynaptic synapses (Bailey & Chen, 1983). The available toad data on long-term memory do not permit fine tuning of the LTM model. Perhaps the only data item is a recent observation by Cervantes-Perez et al. (1991) that stimulus-specific habituation may last hundreds of days. Nonetheless, consideration of long-term memory has provided valuable information for modeling the structure of the medial pallium. A preliminary version of the MP model proposed previously (Wang & Arbib, 1991b) can account for short-term habituation effects, but fails to explain long-term storage.
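The fast/slow division of labor between y_i(t) and z_i(t) can be sketched with a minimal two-variable simulation. The equations follow the general form of Eqs. 6.18 and 6.19 (a fast efficacy y gated by a slow variable z), but all parameter values, the stimulus drive, and the session schedule are invented for illustration; z is started just below its fixed point at 1 so the slow process can engage.

```python
# Minimal two-timescale sketch in the spirit of Eqs. 6.18-6.19: a fast
# synaptic efficacy y habituates within each session and recovers
# between sessions, while a slow variable z decays a little per session,
# lowering the level at which y saturates. All parameters illustrative.
dt = 0.05
alpha, beta, gamma, y0 = 0.05, 1.0, 0.02, 1.0

def run(duration, stim, y, z):
    """Euler-integrate the coupled y/z dynamics under a fixed stimulus level."""
    for _ in range(int(duration / dt)):
        dy = alpha * z * (y0 - y) - beta * y * stim
        dz = gamma * z * (z - 1.0) * stim
        y += dt * dy
        z += dt * dz
    return y, z

y, z = y0, 0.99                     # z just below its fixed point at 1
saturation = []
for session in range(1, 6):
    y, z = run(60.0, 1.0, y, z)     # training: y drops to its saturation level
    saturation.append(y)
    print(f"session {session}: saturation y = {y:.4f}, z = {z:.3f}")
    y, z = run(600.0, 0.0, y, z)    # rest: y recovers toward y0; z is unchanged
# long-term trace: the saturation level falls from session to session
assert all(a > b for a, b in zip(saturation, saturation[1:]))
```

Within each session y quickly reaches its (non-zero) saturation level, while z creeps downward; across sessions the saturation level therefore drops, which is the combined short-term/long-term behavior described above.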
With the present model, habituation can be maintained for a very long time, depending on the parameters chosen for formula (6.19), since presentation of a different stimulus does not disrupt previous memory, as it did in the previous version. We have assumed a scaling role of MP on prey-catching behavior, which is presumably mediated by the tectum. The scaling effect may be applied to the tectum, via the preoptic region and the hypothalamus, in a way similar to the T5 base modulator model for prey-catching behavior (Betts, 1989). In the original hypothesis (Betts, 1989), only TP inhibition (modulation) was extensively discussed. We propose that MP indirect projections (Fig. 6.1) modulate the state of the T5 base, thus scaling the prey-catching behavior generated by T5 neurons. The scaling role hypothesized here is a little more subtle than a multiplier. Without input from MP (for example, in the case of a lesion), the scaling has no impact on tectal activities; otherwise it basically plays the role of a multiplier. This is consistent with the lesion results that after bilateral lesions of MP toads still exhibit accurate prey-catching responses but show no hint of habituation (Finkenstadt & Ewert, 1988a). It has also been reported that after bilateral transection of the telencephalon, the toad fails to show prey-catching behavior (Ewert, 1967). This is perhaps because of the removal of the striatum, which is believed to be involved in mediating prey-catching behavior (see Ewert's loop (1), 1987a). In terms of connections, we speculate that the MP influence applies to large pear cells and/or pyramidal cells through contacts with their basal dendrites residing in deep layers of the tectum, in contrast to the retinal influence, which applies exclusively to the apical dendrites located in superficial layers of the tectum (Szekely & Lazar, 1976). This speculation is supported by the evidence that hypothalamic efferents terminate in deep tectal layers (Neary & Northcutt, 1983).
MP influence applying to the basal dendrites of these cells makes the scaling role more feasible, since they are near the somata. The single column model needs to be extended to a matrix of columns to address directly questions like locus specificity. A straightforward extension would demonstrate locus specificity of habituation if the mapping from AT to MP is topographical. But the real situation is more complex. As shown by Ewert and Ingle (1971), after habituation at a specific visual location, the toad's response to a stimulus is inhibited in the nasal direction, but facilitated in the temporal direction. Thus, there are global cross links among dishabituation at different spatial locations, which would require global interactions among different columns in the MP model. It should be made clear that the MP column model represents a much simplified view of MP functions. MP is certainly involved in conditioning and learning related to predator stimuli (see Section VI.2), and in linking olfactory processing (Merkel-Harff & Ewert, 1991). It is the structure where two learning loops interact with each other. In fact, the anatomy of the model is to us a model of the minimum complexity of neuronal circuitry that can explain the various experimental data. For reasons of economy we suspect that in anurans stimulus-specific conditioning might share the same neural circuitry as stimulus-specific habituation. Our modeling should help to elucidate the underlying neural mechanisms of conditioning.

CHAPTER VII

CONCLUSION

Learning is a process of acquiring knowledge about the world, and memory is the retention or storage of that knowledge. The ability to learn is the essence of animal intelligence, and our greater human ability to learn distinguishes us from other animals. The exploration of learning mechanisms is at the heart of this thesis.
In Part 1, learning is pursued with abstract networks of functional units, and in Part 2, it is pursued with a specific behavior of a specific animal. This dissertation encompasses both associative learning (the attentional learning rule, Part 1) and non-associative learning (habituation, Part 2). In the first part, learning is embodied in the process of acquiring abstract temporal sequences. Learning as acquisition of knowledge would not make much sense without utilization of the acquired knowledge in solving problems encountered in the environment. The two major problems addressed are sequence recognition and sequence reproduction. We have demonstrated that complex temporal order behavior can be accomplished by temporal linkage, local and remote, among different levels of sequence components. The most important contributions of abstract temporal sequence learning are briefly summarized below:
• A sequence storage mechanism is proposed: sequences are stored by the distributions of connection weights converging on individual units. This storage mechanism results from a computational model of short-term memory.
• New sequences are learned by a new rule, called the attentional learning rule, that combines a Hebbian rule and a normalization rule with sequential system activation.
• A maximization principle is proposed for sequence recognition based on the monotonic increase of input potential.
• It is proven that any complex sequence can be recognized, and this recognition shows interval invariance and proper resistance to distortions in symbol arrangements.
• It is proven that any complex sequence can be reproduced, and this reproduction shows interval maintenance.
• The mechanism of degree self-organization is proposed for optimally acquiring context lengths in complex sequence reproduction.
• The hierarchical organization principle is proposed for many layers of the STM model to recognize hierarchical sequences.
In the second part, learning is embodied in identifying the neural mechanisms of habituation. In contrast to Part 1, we adopt the approach of computational neuroscience (Arbib, 1989; Schwartz, 1989). Input stimuli are images of dot matrices created by a computer icon editor. The input information flows through a retina model, a tectum model, and an anterior thalamus model, and finally arrives at the medial pallium model. Each of these models is a layered neural network abstracted from known neuroanatomy and neurophysiology; each does its own processing on the optic input and sends efferents to the next nucleus model. In the medial pallium model, visual stimuli first imprint their traces by modifying the synaptic efficacies of certain neurons, which underlies habituation behaviors in toads. The simulation of pattern discrimination and stimulus specific habituation was done after the retina model was fixed. Since the retinal responses are constrained strongly by the experimental data (Ewert & Hock, 1972; Tsai & Ewert, 1987), the retinal model, including its various parameters, cannot be modified to fit other simulation purposes, since otherwise the original match between the model and the data could not be preserved. This represents a real challenge for later simulations based on the retinal output. On the other hand, this requirement also provides a strict testbed for hypotheses and neural models. This part itself provides an example of consistent modeling of the interacting brain structures of the retina, the tectum, the anterior thalamus, and the medial pallium. Another challenge we have faced is how to manage a large-scale neural simulation, since a large-size network simulation seems unavoidable if one wants to study anuran vision seriously by a theoretical approach. The main contributions of this part are briefly summarized below:
• A modified version of Teeter's retina model is suggested and implemented.
The modified retinal model generates responses of retinal ganglion cells that closely match the experimental data.
• We propose that visual pattern discrimination is achieved with temporal coding by a network of cells in the anterior thalamus. The AT model receives excitatory SP inputs from the tectum and inhibitory R3 inputs from the retina. Simulation demonstrates that the AT model clearly shows the same order of responsiveness to the worm stimuli as in the dishabituation hierarchy.
• While testing the model predictions, we show by behavioral experiments that the dishabituation hierarchy is affected by the stimulus/background contrast, but is not affected by the stimulus size. We thus suggest that toads exhibit size invariance in visual pattern discrimination. We also find that dishabituation by a second stimulus involves a process separate from habituation to a first stimulus.
• A column model for the medial pallium is proposed, comprising five types of cells. The cumulative shrinking network is proposed for shrinking and normalizing activities unilaterally. Synaptic plasticity is modeled by two interacting differential equations, which are capable of simulating acquisition, and short-term and long-term forgetting, at the electrophysiological level. Extensive computer simulation demonstrates that the model reproduces the experimental data remarkably well.

There are two common characteristics behind the two case studies of learning. First, learning involves modification of the weights of connection pathways. In the case of temporal sequence learning, the connection strengths between units (local neuron populations) are modifiable by the attentional learning rule; in the case of the toad's habituation, the efficacies of the synaptic terminals of certain MP neurons are modifiable according to the habituation rule defined. These are the traces of imprinting. Secondly, learning involves adaptive changes in behavior.
In the first case, learning turns a network from sequence-insensitive to sequence-sensitive, so that the network develops the ability to recognize temporal sequences and to reproduce them from certain cues. Sequence recognition and reproduction certainly possess vital value for animals adapting to the world. In the second case, stimulus-specific habituation, while ignoring a repeated stimulus of no consequence, keeps the animal alert to a novel stimulus which may signify danger, or food, or perhaps a mate. It is a memory not to respond, which is as important as a memory to respond (the first case). While the formation of LTM manifests itself in the same way in both parts, namely modification of connection weights, STM involves different mechanisms. In the first part, STM is embodied in the maintenance of the activity of a unit for a short time period before it is coded into LTM by the attentional learning rule; in the second part, STM also involves a change in synaptic weights, and LTM shares the same mechanisms as STM but involves longer training. Actually, the two forms of STM may coexist in the nervous system, with the first kind being due to reverberatory neuronal loops and the second kind being due to short-term synaptic plasticity. Their time scales are also different. STM of the first kind lasts, as the model requires, for hundreds of milliseconds to seconds. The second kind, however, lasts for a few minutes or longer. In the specific domains studied in this dissertation, temporal links (cf. Chapter 1) are reflected in a certain distribution of synaptic weights toward the unit of a target detector (sequence recognition) or an established projection from a detector to a symbol unit (sequence reproduction). After these links are formed, we have demonstrated that presentation of a previously learned sequence can lead to the activation of a certain detector, or presentation of some cue can lead to the recall of the rest of the learned sequence.
Spatial links, on the other hand, are reflected in the common MP2 cells in the medial pallium model that represent different visual objects, as the result of AT responses of differing average firing rates. After the links are created, the toad model shows proper habituation or dishabituation toward a visual stimulus.

Future Perspectives

Obvious future research arising directly from the work in this thesis is the performance of neurobiological and psychological experiments to test the predictions outlined in chapter 3 and in sections IV.6 and VI.5. In chapter 5, we have behaviorally tested the contrast reversal prediction, and the experimental results validated the prediction. But the size reduction prediction was not validated by the experiments, which in fact suggest that pattern discrimination by dishabituation exhibits size invariance to a certain extent in toads. This new finding challenges our modeling to demonstrate size invariance in simulating pattern recognition in anurans. More comprehensive experiments need to be done to verify the full predicted hierarchy of contrast reversal (Fig. 4.11) and to further establish the suggestion that pattern discrimination in anurans shows size invariance. We have also conducted physiological recordings (with Peter Ewert and Michael Glagow) of retinal ganglion cells in response to the stimulus patterns used in the experiments of chapter 4, and the results are yet to be read out and summarized. From the modeling perspective, the theoretical framework presented in Part 1 leaves many open problems for future research. Among them are the following:
• A symbol or a component of a sequence is represented by a single unit in the theory. Future study needs to go downward, using a more detailed representation of symbols, such as a matrix representation. Issues like pattern distortions may be addressed in this context.
It would be interesting to see how the mechanisms proposed here work with more distributed representations of sequence components.
• Scheduling of attention within many levels of hierarchical sequence processing systems. In section III.4, attention scheduling within hierarchically organized networks is done by strict rules employed by the system. A better way of doing it would be a mechanism of selective attention using the self-organization principles of a network.
• How to form chunks from an input sequence? Our present model relies on natural delimiters to form chunks in hierarchical sequence processing. Chunking from a continuous input flow is a very complicated issue, and in general it involves top-down guidance from the grammar and semantics of the input flow.
• What are the mechanisms of generating new sequences? Subjects need to form new sequences to solve more complicated problems, like goal-directed planning. A related issue is discovering temporal structure from input flow, like forming a syntax from many sentences. These high-level cognitive functions are more creative, and even very hard for humans to carry out.

In the modeling of habituation, the following research issues remain to be pursued:
• An obvious extension of the work in chapter 6 is to build a model of the medial pallium with many columns, like the two-dimensional tectal column model (Cervantes-Perez et al., 1985). The extension will not trivially be an array of duplicates, since neighboring MP columns may share network circuitry to a certain extent and there should be cross connections among different columns corresponding to different visual locations (Ewert & Ingle, 1971).
• Integration with the tectum model. As discussed in chapter 6, the MP influence modifies the response of the tectum model. This integration needs to be implemented to complete loop (2.1), and we suggest that the MP scaling role is realized by shunting inhibition on tectal neurons.
By then, Rana computatrix (Arbib, 1987; 1991) will be able to exhibit plastic behaviors.
• The size prediction of Chapter 4 was not validated by the experiments. This failure challenges us to adjust the model of the anterior thalamus to incorporate this important feature of pattern discrimination. The present one-layer AT model has to give way to a more elaborate circuitry for processing visual inputs.
• Toads and frogs demonstrate remarkable capabilities of pattern recognition following conditioned training. Since conditioning also involves plastic changes in the medial pallium, how to integrate both learning forms will be a very interesting topic. Our speculation is that the coding mechanisms of chapter 6 for stimulus specific habituation would also be used somehow for stimulus specific conditioning.

Interplay between Modeling and Experimentation

It is needless to point out the importance of experimental data to neural models of brain functions, because experimental data are the ground of brain modeling. Neural
A good model, whether it is physical, chemical, or biological, has to be predictive. Predictions not only help to realize the unknown, but also make the theory testable, and are important for setting up a dialogue between theoreticians and experimentalists. The study reported in Part 2 was triggered by the experiments of Ewert and Kehl (1978), based on which the model of visual pattern discrimination was developed. This model of the anterior thalamus yields certain predictions, which then ignited the experimental findings reported in chapter 5. New results from the prediction testing experiments, particularly the establishment of separate processes of habituation and dishabituation, provide very valuable data for simulating the anterior thalamus in chapter 6 where we propose the model of neural substrates for habituation processes. Through the continuing dialogue between modeling and experimentation, understanding of habituation mechanisms is furthered. It is our hope that this thesis can serve as an example for establishing such a dialogue. For the dialogue to be possible, theoreticians have to develop neural models constrained by experimental data as much as possible and more importantly make their models predictive, while 211 biologists, on the other hand, should understand models, relate their data to modeling, and face challenges from modelers by testing their predictions. We believe that this kind of dialogue is both crucial and fruitful for understanding brain functions (Arbib, 1989; Ewert, 1991). 212 APPENDIX A FULL EQUATION SET FOR PART 2 SIMULATIONS This appendix provides the full set of mathematical equations used in the simulations of Part 2 (Chapter 4 and Chapter 6 ). There are four neural structures addressed: the retina, the tectum, the anterior thalamus (AT), and the medial pallium (MP), for which equations will be given respectively. 
All parameter values have been collected in Table 4.1, Table 4.2, and Table 6.1, except for the retinal cells prior to the ganglion cells, whose parameter values are given when the relevant equations are introduced. For the retina, tectum, and AT models, two-dimensional definitions are provided, whereas the MP model is defined one-dimensionally with respect to a single column. The discretization step Δt is equal to 0.05.

Retina Model

At time 0, all the variables in the retina model are initialized to 0.

a) Receptor Layer

r(i,j,t) = 1 if illuminated, 0 if not   (A.1, A.2)

b) Bipolar Layer

• Hyperpolarizing: b_h(i,j,t) = r(i,j,t)   (A.3)
• Depolarizing: b_d(i,j,t) = −r(i,j,t)   (A.4)

c) Amacrine Layer

• Hyperpolarizing:
τ_ath dx_h(i,j,t)/dt = b_h(i,j,t) − x_h(i,j,t)   (A.5)
a_th(i,j,t) = Max[C_mp (b_h(i,j,t) − x_h(i,j,t)), a_th(i,j,t−Δt) exp(−Δt/τ_ath)]   (A.6)
S_ath(i,j,t) = a_th(i,j,t)   (A.7)
• Depolarizing:
τ_atd dx_d(i,j,t)/dt = b_d(i,j,t) − x_d(i,j,t)   (A.8)
a_td(i,j,t) = Max[C_mp (b_d(i,j,t) − x_d(i,j,t)), a_td(i,j,t−Δt) exp(−Δt/τ_atd)]   (A.9)
S_atd(i,j,t) = a_td(i,j,t)   (A.10)
with C_mp = 5.0, τ_ath = 0.3, τ_atd = 0.3.

d) Ganglion Cell Layers

The kernel for the convolutions that generate the membrane potentials of the ganglion cells is

k(x,y) = W_e exp[−(x² + y²)/(2σ_e²)] − W_i exp[−(x² + y²)/(2σ_i²)]  if x² + y² ≤ R²;  0 otherwise   (A.11)

R2 layer:
τ_r2 dm_r2(i,j,t)/dt = −m_r2(i,j,t) + (k_r2 * (S_ath + S_atd))(i,j,t)   (A.12)
where * denotes convolution.
S_r2(i,j,t) = Max(m_r2(i,j,t), 0)   (A.13)

R3 layer:
τ_r3 dm_r3(i,j,t)/dt = −m_r3(i,j,t) + (k_r3 * (S_ath + C_d S_atd))(i,j,t)   (A.14)
with C_d = 0.2.

R4 layer:
τ_r4 dm_r4(i,j,t)/dt = −m_r4(i,j,t) + (k_r4 * (S_ath − S_atd))(i,j,t)   (A.16)
S_r4(i,j,t) = Max(m_r4(i,j,t), 0) / [Max(m_r4(i,j,t), 0) + C_r]   (A.17)

Parameters: τ_r2 = τ_r3 = τ_r4 = 0.1, and C_r = 0.2. The parameter values for the kernels k_r2, k_r3, and k_r4 are provided in Table 4.1.

Tectum Model

with τ_sp = 0.1.
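The difference-of-Gaussians kernel of Eq. A.11 and the leaky-integrator update of Eq. A.12 can be sketched as follows. The weights, widths, truncation radius, and grid size below are placeholders (the thesis's values are in Table 4.1), and the convolution is written out directly rather than with a library routine.

```python
import math

# Sketch of the truncated difference-of-Gaussians kernel (Eq. A.11) and
# one Euler-integrated R2 leaky integrator (Eq. A.12) at a single cell.
# W_E, W_I, SIG_E, SIG_I, R, and the grid size are placeholder values,
# not those of Table 4.1.
W_E, W_I, SIG_E, SIG_I, R = 1.0, 0.5, 1.0, 2.0, 3
TAU, DT = 0.1, 0.05

def kernel(x, y):
    """Excitatory-center / inhibitory-surround kernel, zero outside radius R."""
    if x * x + y * y > R * R:
        return 0.0
    return (W_E * math.exp(-(x * x + y * y) / (2 * SIG_E ** 2))
            - W_I * math.exp(-(x * x + y * y) / (2 * SIG_I ** 2)))

def convolve(field, i, j):
    """(k * field)(i, j) over the kernel's support, zero-padded borders."""
    total = 0.0
    for dx in range(-R, R + 1):
        for dy in range(-R, R + 1):
            ii, jj = i - dx, j - dy
            if 0 <= ii < len(field) and 0 <= jj < len(field[0]):
                total += kernel(dx, dy) * field[ii][jj]
    return total

def euler_step(m, drive):
    """One step of tau * dm/dt = -m + drive."""
    return m + (DT / TAU) * (-m + drive)

# single lit point at the center of an 11 x 11 field
field = [[1.0 if (i, j) == (5, 5) else 0.0 for j in range(11)] for i in range(11)]
m = 0.0
for _ in range(100):
    m = euler_step(m, convolve(field, 5, 5))
print(f"center excitation k(0,0) = {kernel(0, 0):.3f}")
print(f"surround weight  k(2,0) = {kernel(2, 0):.3f}")
print(f"m_r2 at the lit point after 100 steps: {m:.3f}")
```

The membrane potential relaxes to the convolution value, so in the steady state the ganglion layer output is simply the rectified kernel response, which is how the retina model's center-surround selectivity arises.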
Anterior Thalamus Model

τ_at dm_at(i,j,t)/dt = −m_at(i,j,t) + (k_at1 * S_sp)(i,j,t) − Max[0, (k_at2 * S_r3)(i,j,t)]   (A.20)

k_at1(x,y) = W_e^sp  if |x| ≤ m_1, |y| ≤ m_1;  −W_i^sp  if m_1 < |x| ≤ m_2, m_1 < |y| ≤ m_2;  0 otherwise   (A.21)

k_at2(x,y) = W_e^r3  if |x| ≤ n_1, |y| ≤ n_1;  −W_i^r3  if n_1 < |x| ≤ n_2, n_1 < |y| ≤ n_2;  0 otherwise   (A.22)

S_at(i,j,t) = m_at(i,j,t)  if m_at(i,j,t) > θ_at;  0  if not   (A.23)

All the parameter values used in the AT model are provided in Table 4.2.

MP Column Model

a) P1 Layer

dm_p1(i,t)/dt = −A_p1 m_p1(i,t) + B_p1 I(t) + ρ   (A.24)

N_p1(i,t) = m_p1(i,t)  if m_p1(i,t) > Θ_i;  0 otherwise   (A.25)
with Θ_i = 46.5 i/n + 17.75.

b) MP2 Layer

dm_mp2(i,t)/dt = −A_mp2 m_mp2(i,t) + (B_mp2 − m_mp2(i,t)) I(t) − m_mp2(i,t) Σ_{j>i} f_j(t)   (A.26)

f_j(t) = W(i,j) N_p1(j,t)   (A.27)

W(i,j) = W/(i − j)³  if j > i;  V  if j = i   (A.28)

N_mp2(i,t) = m_mp2(i,t) + h_1   (A.29)

c) MP3 Layer

dm_mp3(i,t)/dt = −A_mp3 m_mp3(i,t) + y_i(t) N_mp2(i,t)   (A.30)

N_mp3(i,t) = m_mp3(i,t)   (A.31)

dy_i(t)/dt = α z_i(t) (y_0 − y_i(t)) − β y_i(t) (N_mp2(i,t) − h_2)   (A.32)

dz_i(t)/dt = γ z_i(t) (z_i(t) − 1) (N_mp2(i,t) − h_2)   (A.33)

d) MP1 Layer

dm_mp1(i,t)/dt = −m_mp1(i,t) + h_2 − B_mp1 N_mp3(i,t)   (A.34)

N_mp1(i,t) = Max(m_mp1(i,t), 0)   (A.35)

e) P2 Layer

dm_p2(i,t)/dt = −A_p2 m_p2(i,t) + y_i(t) (N_mp2(i,t) − h_1) − B_p2 Σ_{j>i} ψ[N_mp1(j,t) − C_mp1]   (A.36)

C_mp1 = h_2 − h_1 y_0 B_mp1   (A.37)

N_p2(i,t) = Max(m_p2(i,t), 0)   (A.38)

f) OUT Cell

dm_out(t)/dt = ...   (A.39)

N_out(t) = m_out(t)   (A.40)

All the parameter values used in the MP column model are provided in Table 6.1.

BIBLIOGRAPHY

Amari, S., & Arbib, M.A. (1977). Competition and cooperation in neural nets. In J. Metzler (ed.), Systems Neuroscience (pp. 119-165). New York: Academic Press.

Amit, D.J. (1989). Modeling brain functions: The world of attractor neural networks. Cambridge: Cambridge University Press.

an der Heiden, U., & Roth, G. (1987). Mathematical model and simulation of retina and tectum opticum of lower vertebrates. Acta Biotheoretica, 36, 179-212.

Anderson, A. (1972). The ability of honey bees to generalise visual stimuli. In R.
Wehner (ed.), Information processing in the visual systems of Arthropods (pp. 207-212). Berlin: Springer-Verlag.

Arbib, M.A. (1987). Levels of modeling of mechanisms of visually guided behavior. Behavioral and Brain Sciences, 10, 407-465.

Arbib, M.A. (1989). The metaphorical brain 2: Neural networks and beyond. New York: Wiley Interscience.

Arbib, M.A. (1990). Programs, schemas, and neural networks for control of hand movements: Beyond the RS framework. In M. Jeannerod (Ed.), Attention and performance XIII (pp. 111-138). Hillsdale, NJ: Erlbaum.

Arbib, M.A. (1991). Neural mechanisms of visuomotor coordination: The evolution of Rana computatrix. In M.A. Arbib & J.-P. Ewert (Eds.), Visual structures and integrated functions (in press). Research notes in neural computing. Berlin: Springer-Verlag.

Bailey, C.H., & Chen, M.C. (1983). Morphological basis of long-term habituation and sensitization in Aplysia. Science, 220, 91-93.

Bailey, C.H., & Kandel, E.R. (1985). Molecular approaches to the study of short-term and long-term memory. In C.W. Coen (ed.), Functions of the brain (pp. 98-129). Oxford: Clarendon Press.

Betts, B. (1989). The T5 base modulator hypothesis: A dynamic model of T5 neuron function in toads. In J.-P. Ewert & M.A. Arbib (Eds.), Visuomotor coordination: Amphibians, comparisons, models, and robots (pp. 269-307). New York: Plenum.

Boice, R., Quanty, C.B., & Williams, R.C. (1974). Competition and possible dominance in turtles, toads, and frogs. Journal of Comparative and Physiological Psychology, 86, 1116-1131.

Braun, M. (1978). Differential equations and their applications. New York: Springer-Verlag.

Brower, J.V.Z., & Brower, L.P. (1962). Experimental studies of mimicry 6. The reaction of toads (Bufo terrestris) to honeybees (Apis mellifera) and their dronefly mimics (Eristalis vinetorum). American Naturalist, 96, 297-307.

Brzoska, J., & Schneider, H. (1978).
Modification of prey-catching behavior by learning in the common toad (Bufo b. bufo L., Anuran, Amphibia): Changes in response to visual objects and effects of auditory stimuli. Behavioural Processes, 3 , 125-136. Buhmann, J., & Schulten, K. (1987). Noise-driven temporal association in neural networks. Europhysics Letters, 4, 1205-1209. Burghagen, H. (1979). Der Einflub von figuralen, visuellen M ustern auf das Beutefangverhalten verschiedener Anuren. P hD Dissertation, University of Kassal. Burghagen, H., & Ewert, J.-P. (1982). Question of "head preference" in response to worm-like dummies during prey capture of toads Bufo bufo. Behavioural Processes, 7, 295-306. Carew, T.J., Castellucci, V.F., & Kandel, E.R. (1971). An analysis of dishabituation and sensitization of the gill withdrawal reflex in Aplysia. International Journal o f Neuroscience, 2, 79-98. 221 Carew, T.J., Pinsker, H.M., & Kandel, E.R. (1972). Long-term habituation of a defensive withdrawal reflex in Aplysia. Science, 175, 451-454. Carpenter, P.A., & Just, M.A. (1989). The role o f working memory in language comprehension. In D. Klahr, & K. Kotovsky (Eds), Complex inform ation processing: The impact o f Herbert A. Simon. Hillsdale, NJ: Erlbaum. Castellucci, V.F., Carew, T.J., & Kandel, E.R. (1978): Cellular analysis of long-term habituation of the gill-withdrawal reflex of Aplysia California. Science, 202, 1306- 1308. Cervantes-Perez, F., Lara, R., & Arbib, M.A. (1985). A neural model of interactions subserving prey-predator discrimination and size preference in anuran amphibians. Journal o f Theoretical Biology, 1 1 3 ,117-152. Cervantes-Perez, F., Guevara-Pozas, A.D., & Herrera-Becerra, A.A. (1991). Modulation of prey-catching behavior in toads: Data and modeling. In M.A. Arbib, & J.-P. Ewert (Eds.), Visual structures and integrated functions (in press). Research notes in neural computing. Berlin: Springer-Verlag. Changeux, J.-P., & Heidmann, T. (1987). 
Allosteric receptors and molecular models of learning. In G.M. Edelman, W.E. Gall, & W.M. Cowan (Eds.), Synaptic Function (pp. 549-601). New York: Wiley & Sons. Cohen, A., Ivry, R.I., & Keele, S.W. (1990). Attention and structure in sequence learning. Journal o f Experimenal Psychology, 16, 17-30. Colman, S.R., & Gormezano, I. (1971). Classical conditioning of the rabbit's (Oryctolagus cuniculus) nictitating membrane response under symmetrical CS-US interval shifts. Journal o f Comparative and Physiological Psychology, 77, 447-455. Conrad, R. (1957). Decay theory of immediate memory. Nature, 179, 831-832. Cott, H.B. (1936). The effectiveness of protective adaptations in the hive-bee illustrated by experiments on the feeding reactions, habit formation and memory of the common toad (Bufo bufo bufo). Proceedings o f the Zoological Society London, 1, 111-133. 222 Dehaene, T., Changeux, J.P., & Nadal, J.P. (1987). Neural networks that learn temporal sequences by selection. Proceedings o f the National Academy o f Sciences USA, 84 , 2727-2731. Didday, R.L. (1970). The simulation and modeling of distributed information processing in the frog visual system. Ph.D thesis, Stanford University. Didday, R.L., & Arbib, M.A. (1975). Eye movements and visual perception: "Two visual systems" model. International Journal of Man-Machine Studies, 7, 547-569. DiMattia, B.V., Posley, K.A., & Fuster J.M. (1990). Crossmodal short-term memory of haptic and visual information. Neuropsychologia, 2 8 ,17-33. Doya, K., & Yoshizawa, S. (1989). Adaptive neural oscillator using continuous-time back-propagation learning. Neural Networks, 2, 375-385. Doya, K., & Yoshizawa, S. (1990). Memorizing hierarchical temporal patterns in analog neuron networks. In Proceedings o f the Internatinal Joint Conference on Neural Networks (Vol. 3, pp. 299-304). San Diego, CA. Duda, R.O., & Hart, P.E. (1973). Pattern classification and scene analysis. New York: Wiley & Sons. Dudai, Y. (1989). 
The neurobiology o f memory: concepts, findings, trends. New York: Oxford University Press. Ebel, H.C., & Prokasy, W.F. (1963). Classical eyelid conditioning as a function of sustained and shifted interstimulus intervals. Journal o f Experimental Psychology, 65, 52-58. Eibl-Eibesfeldt, I. (1952). Nahrungserwerb und Beuteschema der Erdkrote (Bufo bufo L). Behaviour, 4, 1-35. Eikmanns, K.H. (1955). Verhaltensphysiologische Untersuchungen iiber den Beutefang und das Bew egungssehen der Erdkrote {Bufo bufo L.). Z eitschrift fiir Tierpshchologie, 12, 229-253. 223 Elman, J.L. (1990). Finding structure in time. Cognitive Science, 14, 179-211. Ewert, J.-P. (1965). Der Einflub peripherer Sinnesorgane und des Zentralnervensystems auf die Antwortbereitschaft bei der Richtbewegung der Erdkrote (Bufo bufo L.). P hD Dissertation, University of Gottingen. Ewert, J.-P. (1967). Untersuchungen iiber die Anteile zentralnervoser Aktionen an der taxisspezifischen Ermiidung beim Beutefang der Erdkrote (Bufo bufo L). Zeitschrift fiir vergleichende Physiologie, 57, 263-2948. Ewert, J.-P. (1968). Der Einflub von Zwischenhimdefekten auf die Visuomotorik im Beute- und Fluchtverhalten der Erdkrote (Bufo bufo L). Zeitschrift fiir vergleichende Physiologie, 61, 41-70. Ewert, J.-P. (1970). Neural mechanisms of prey-catching and avoidance behavior in the toad (Bufo bufo L.). Brain, Behavior and Evolution, 3, 36-56. Ewert, J.-P. (1971). Single unit response of the toad (Bufo americanus) caudal thalamus to visual objects. Zeitschrift fiir vergleichende Physiologie, 74, 81-102. Ewert, J.-P. (1974). The neural basis of visually guided behavior. Scientific American, 230, 34-42. Ewert, J.-P. (1976). The visual system of the toad: behavioral and physiological studies on a pattern recognition system. In K. Fite (Ed.), The amphibian visual system: a multidisciplinary approach (pp. 141-202). New York: Academic Press. Ewert, J.-P. (1980). Neuroethology. 
An introduction to the neurophysiological fundamentals o f behavior. Berlin Heidelberg New York: Springer-Verlag. Ewert, J.-P. (1984). Tectal mechanisms that underlie prey-catching and avoidance behaviors in toads. In: H. Vanegas (ed.), Comparative neurology o f the optic tectum (pp. 246-416). New York: Plenum. Ewert, J.-P. (1987a). Neuroethology: toward a functional analysis of stimulus-response mediating and modulating neural circuitries. In: P. Ellen, & C. Thinus-Blonc (eds.), 224 Cognitive processes and spatial orientation in animal and man (Pt. 1, pp. 177-200). Dordrecht: Martinus Nijhoff. Ewert, J.-P. (1987b). Neuroethology of releasing mechanism: Prey-catching in toads. Behavioral and Brain Sciences, 10, 337-405. Ewert, J.-P. (1991). A prospectus for the fruitful interaction between neuroethology and neural engineering. In M.A. Arbib, & J.-P. Ewert (Eds.), Visual structures and integrated functions (in press). Research notes in neural computing. Berlin: Springer- Verlag. Ewert, J.-P., Burghagen, H., & Schiirg-Pfeiffer, E. (1983). Neuroethological analysis of the innate releasing mechanism for prey-catching behavior in toads. In: J.-P. Ewert, R.R. Capranica, & D. Ingle (Eds.), Advances in vertebrate neuroethology (pp. 413-475). New York: Plenum Press. Ewert, J.-P., & Ingle, D. (1971). Excitatory effects following habituation of prey- catching activity in frogs and toads. Journal o f Comparative and Physiological Psychology, 77, 369-374. Ewert, J.-P., & Hock, F.J. (1972). Movement sensitive neurons in the toad’s retina. Experimental Brain Research, 16, 41-59. Ewert, J.-P., & Kehl, W. (1978). Configurational prey-selection by individual experience in the toad Bufo bufo. Journal o f Comparative Physiology A, 126, 105- 114. Ewert, J.-P., & Rehn, B. (1969). Quantitative Analyse der Reiz-Reaktionsbeziehungen bei visuellem Auslosen des Fluchtverhaltens der Wechselkrote (Bufo viridis Laur.). Behaviour, 35, 212-234. Ewert, J.-P., & Seelen, W.V. (1974). 
Neurobiologie und System-Theorie eines visuellen Muster-erkennungsmechanismus bei Kroten. Kybernetik, 14, 167-183. Ewert, J.-P., & Traud, R. (1979): Releasing stimuli for antipredator behaviour in the common toad Bufo bufo (L.). Behaviour, 6 8 , 170-180. 225 Finkenstadt, T. (1989a). Stimulus-specific habituation in toads: 2DG studies and lesion experiments. In J.-P. Ewert, & M.A. Arbib (Eds.), Visuomotor coordination: amphibians, comparisons, models, and robots (pp. 767-797). New York: Plenum. Finkenstadt, T. (1989b). Visual associative learning: searching for behaviorally relevant brain structures in toads. In J.-P. Ewert, & M.A. Arbib (Eds.), V isuom otor coordination: amphibians, comparisons, models, and robots (pp. 799-832). New York: Plenum. Finkenstadt, T., Adler, N.T., Allen, T.O., Ebbesson, S.O.E., & Ewert, J.-P. (1985). Mapping of brain activity in mesencephalic and diencephalic structures of toads during presentation of visual key stimuli: A computer assisted analysis of ( 1 4 C)2DG autoradiographs. Journal o f Comparative Physiology A, 156, 433-445. Finkenstadt, T., Adler, N.T., Allen, T.O., & Ewert, J.-P. (1986). Regional distribution of glucose utilization in the telencephalon of toads in response to configurational visual stimuli: A 1 4 C-2DG study. Journal o f Comparative Physiology A, 158, 457- 467. Finkenstadt, T., & Ewert, J.-P. (1983). Visual pattern discrim ination through interactions of neural networks: A combined electrical brain stimulation, brain lesion, and extracellular recording study in Salamandra salamandra. Journal o f Comparative Physiology A, 153, 99-110. Finkenstadt, T., & Ewert, J.-P. (1988a). Stimulus-specific long-term habituation of visually guided orienting behavior toward prey in toads: a I 4 C-2DG study. Journal o f Comparative Physiology A, 163, 1-11. Finkenstadt, T., & Ewert, J.-P. (1988b). Effects of visual associative conditioning on behavior and cerebral metabolic activity in toads. Naturwissenschaften, 75, 85-97. 
Fukushima, K. (1988). Neocognitron: A hierarchical neural network capable of visual pattern recognition. Neural Networks, 1, 119-130. 226 Freeman, W .J., Yao, Y., & Burke, B. (1988). Central pattern generating and recognizing in olfactory bulb: A correlation learning rule. Neural Networks, 1, 277- 288. Gluck, M.A., & Thompson, R.F. (1987). Modeling the neural substrates of associative learning and memory: A computational approach. Psychological Review, 94, 1-16. Gross, C.G., Desimone, R., Albright, T.D., Schwartz, E.L. (1985). Inferior temporal cortex and pattern recognition. In: C. Chagas, R. Gattass, & C. Gross (Eds.), Pattern recognition mechanisms (pp. 179-201). Berlin: Springer. Grossberg, S. (1969). Some networks that can learn, remember, and reproduce any number of complicated space-time patterns, I. Jounal o f Mathematics and Mechanics, 19, 53-91. Grossberg, S. (1976). Adaptive pattern classification and universal recoding: I. Parallel development and coding of neural feature detectors. Biological Cybernetics, 23, 121- 134. Grossberg, S., & Kuperstein, M. (1986). Neural dynamics o f adaptive sensory-motor control. Amsterdam: North-Holland. Grossberg, S., & Schmajuk, N.A. (1988). Neural dynamics of adaptive timing and temporal discrimination during associative learning. Neural Networks, 2, 79-102. Groves, P.M ., & Thompson, R.F. (1970). Habituation: a dual-process theory. Psychological Review, 77, 419-450. Griisser, O.J., & Griisser-Cornehls, U. (1970). Die Neurophysiologie visuell gesteuerter Verhalternsweisen bei Anuren. Verhandlungen der Deutschen Zoologischen Gesellschaft in Koln, 64, 201-218. Griisser, O.J., & Griisser-Cornehls, U. (1976). Neurophysiology of the anuran visual system. In: R. Llinas, & W. Precht (Eds), Frog neurobiology (pp. 297-385). Berlin Heidelberg New York: Springer. 227 Gutfreund, H., & Mezard, M. (1988). Processing of temporal sequences in neural networks. Physics Review Letters, 61, 235-238. 
Guyon, I., Personnaz, L., Nadal, J.P., & Dreyfus, G. (1988). Storage and retrieval of complex sequences in neural networks. Physics Review A, 38, 6365-6372. Harris, J.D. (1943). Hibituatory response decrem ent in the intact organism. Psychological Bulletin, 40, 385-422. Harvey, C.B., Moore, M., & Lindsay, L. (1981). Passive-avoidance learning in bullfrogs, grass frogs, and toads. Psychological Reports, 49, 1003-1006. Hebb, D.O. (1949). The Organization o f behavior. New York: Wiley & Sons. Herrick, C J. (1933): The amphibian forebrain. VIII. Cerebral hemispheres and pallial primordia. Jounal o f Comparative Neurology, 58, 737-759. Hoehler, F.K., & Thompson, R.F. (1980). Effect of the interstimulus (CS-UCS) interval on hippocampal unit activity during classical conditioning of the nictitating membrane response of the rabbit (Oryctolagus cuniculus). Journal o f Comparative and Physiological Psychology, 94, 201-215. Hoffman, H.H. (1963): The olfactory bulb, accessory olfactory bulb and hemisphere of some anurans. Journal o f Comparative Neurology, 120, 317-368. Hopfield, J.J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings o f the National Academy o f Sciences USA, 79, 2554-2558. Ingle, D. (1973). Disinhibition of tectal neurons by pretectal lesions in the frog. Science, 180, 422-424. Ingle, D. (1976). Spatial vision in anurans. In K.V. Fite (Ed.), The Amphibian visual system: A multidisciplinary approach (pp. 119-141). New York: Academic Press. Ingle, D. (1980). The frog's detection of stationary objects following lesions of the pretectum. Behavioral Brain Research, 1, 139-163. 228 Jennings, P.J., & Keele, S.W. (1990). A computational model of attentional requirements in sequence learning. In Proceedings o f the Twelfth Annual Conference o f the Cognitive Science Society (pp. 876-883). Hillsdale, NJ: Erlbaum. Jones, W.A., & Falkenberg, V.P. (1980). 
Test for effects of visual and position cues on T-maze learning in toads. Perceptual and Motor Skills, 50, 455-460. Jordan, M.I. (1986). Attractor dynamics and parallelism in a connectionist sequential machine. In Proceedings o f the Eighth Annual Conference o f the Cognitive Science Society (pp. 531-546). Hillsdale, NJ: Erlbaum. Jordan, M.I. (1990). Learning to articulate: Sequential networks and distal constraints. In M. Jeannerod (Ed.), Attention and Performance XIII. Hillsdale, NJ: Erlbaum. Karplus, I., Algom, D., Samuel, D. (1981). Acquisition and retention of dark avoidance by the toads, Xenopus laevis (Dardin). Animal Learning & Behavior, 9, 45-59. Kandel, E.R. (1976). Cellular basis o f behavior: An introduction to behavioral neurobiology. New York: Freeman. Kicliter, E. (1979). Some telencephalic connections in the frog Rana pipiens. Journal o f Comparative Neurology, 185, 75-86. Kicliter, E., & Ebbesson, S.O.E. (1976). Organization of the "nonolfactory" telencephalon. In R. Llinas, & W. Precht (Eds.), Frog Neurobiology (pp. 946-972). Berlin Heidelberg New York: Springer-Verlag. Kleinfeld, D. (1986). Sequential state generation by model neural networks. Proceedings o f National Academy o f Sciences USA, 83, 9469-9473. Koch, C., & Ullman, S. (1985). Shifts in selective visual attention: towards the underlying neural circuitry. Human Neurobiology, 4, 219-227. Kohonen, T. (1987). Dynamically expanding context, with application to the correction of symbol strings in the recognition of continuous speech. In Proceedings o f the International Conference on Neural Networks (Vol. 2, pp. 3-9). San Diego, CA. 229 Kohonen, T. (1989). A self-learning musical grammar, or "associative memory of the second kind”. In Proceedings o f the Internatinal Joint Conference on Neural Networks (Vol.l, pp. 1-5). Washington, DC. Kohonen, T. (1990). The self-organizing map. Proceedings o f IEEE, 78, 1464-1480. Kojima, S., & Goldman-Rakic, P.S. (1982). 
Delay-related activity of preffontal neurons in rhesus monkeys performing delayed response. Brain Research, 248,43-49. Kosko, B. (1988). Bidirectional associative memory. IEEE Transactions on System Man and Cybernetics, 18, 49-60. Kuffler, S.W. (1973). Discharge patterns and functional organization of mammalian retina. Journal o f Neurophysiology, 16, 37-68. Kuhn, R., van Hemmen, J.L., & Riedel, U. (1989). Complex temporal association in neural networks. Journal o f Physics A, 22, 3123-3135. Kupfermann, I. (1985). Learning. In E.R. Kandel, & J.H. Schwartz (Eds.), Principles o f neuroscience (2nd ed., pp. 805-815). New York: Elsevier. Kurogi, S. (1987). A model of neural network for spatiotemporal pattern recognition. Biological Cybernetics, 5 7 ,103-114. Lara, R. (1983). A model of the neural mechanisms responsible for stimulus specific habituation of the orienting reflex in vertebrates. Cognition and Brain Theory, 6 , 463- 482. Lara, R. (1989). Learning and memory in the toad's prey/predator recognition system: A neural model. In J.-P. Ewert, & M.A. Arbib (Eds.), Visuomotor coordination: amphibians, comparisons, models, and robots (pp. 833-855). New York: Plenum. Lara, R., & Arbib, M.A. (1985). A model of the neural mechanisms responsible for pattern recognition and stimulus specific habituation in toads. Biological Cybernetics, 51, 223-237. 230 Lara, R., Arbib, M.A., & Cromarty, A.S. (1982). The role of the tectal column in facilitation of amphibian prey-catching behavior: A neural model. Journal o f Neuroscience, 2, 521-530. Lashley, K.S. (1951). The problem of serial order in behavior. In L.A. Jeffress (Ed.), Cerebral mechanisms in behavior (pp. 112-146). New York: Wiley & Sons. Lazar, Gy., Toth, P., Csink, Gy., & Kicliter, E. (1983). Morphology and location of tectal projection neuron in frogs: A study with HRP and Cobalt-filling. Journal o f Comparative Neurology, 215, 108-120. Leonard, D.W., & Theios, J. (1967). 
Effect of CS-US interval shift on classical conditioning of the nictitating membrane in the rabbit. Journal o f Comparative and Physiological Psychology, 63, 355-358. Lettvin, J.Y., Maturana, H.R., Pitts, W.H., & McCulloch, W.S. (1961). Two remarks on the visual system of the frog. In W.A. Rosenblith (Ed.), Sensory communication (pp. 757-776). Cambridge, MA: MIT Press. Malsburg, C.v.d. (1973). Self-organization of orientation sensitive cells in the striate cortex. Kybernetik, 14, 85-100. Malsburg, C.v.d. (1981). The correlation theory of brain function. Internal Report 81-2, Max-Planck-Institute for Biophysical Chemistry, Gottingen. Malsburg, C.v.d. (1985). Nervous structures with dynamical links. Berichte der Bunsengesellschaft fiir Physikalische Chemie, 89, 703-710. M alsburg, C.v.d., & Schneider, W. (1986). A neural cocktail-party processor. Biological Cybernetics, 54, 29-40. Mazokhin-Porshnyakov, G.A. (1969). Die Fahigkeit der Bienen, visuelle Reize zu generalisieren. Zeitschrift fiir vergleichende Physiologie, 65,15-28. Merkel-Harff, C., & Ewert, J.-P. (1991): Learning-related modulation of toad's responses to prey by neural loops involving the forebrain. In M.A. Arbib, & J.-P. 231 Ewert (Eds.), Visual structures and integrated Junctions (in press). Research notes in neural computing. Berlin: Springer-Verlag. Miller, G.A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63, 81-97. Murdock, B.B., Jr. (1987). Serial-order effects in a distributed-memory model. In D.S. Gorfein, & R.R. Hoffman (Eds.), Memory and learning: The Ebbinghaus centennial conference (pp. 227-310). Hillsdale, NJ: Erlbaum. Neary, T.J., & Northcutt, R. (1983). Nuclear organization of the bullfrog diencephalon. Journal o f Comparative Neurology, 213, 262-278. Neary, T.J., & Wilczynski, W. (1977). Autoradiographic demonstration of hypothalamic efferents in the bullfrog, Rana catesbeiana. 
Anatomical Record, 187, 665. Nissen, M.J., & Bullemer, P. (1987). Attentional requirements of learning: Evidence from performance measures. Cognitive Psychology, 19, 1-32. Northcutt, R.G., & Kicliter, E. (1980): Organization of the amphibian telencephalon. In S.O.E. Ebbesson (Ed.), Comparative neurology o f the telencephalon (pp. 203-255). Plenum: New York. Peterson, L.R., & Peterson, J.P. (1959). Short-term retention of individual verbal items. Journal o f Experimental Psychology, 58, 193-198. Reed, S.K. (1982). Cognition: Theory and applications. Monterey, CA: Brooks/Cole. Rubin, E. (1915). Synsoplevedefigurer. Copenhagen: Glyden dalske. Rum elhart, D .E., Hinton, G.E. & W illiam s, R.J. (1986): Learning internal representations by error propagation. In D.E. Rumelhart, & J.L. McClelland (Eds.), Parallel distributed processing (Vol. 1, pp. 318-362). Cambridge, MA: MIT Press. Rumelhart, D.E., & Zipser, D. (1986). Feature discovery by competitive learning. In D.E. Rumelhart, & J.L. McClelland (Eds.), Parallel distributed processing (Vol. 1, pp. 115-193). Cambridge, MA: MIT Press. 232 Scalia, F., & Colman, D.R. (1975). Identification of telencephalic efferent thalamic nuclei associated with the visual system of the frog. Neuroscience Abstract, 1, 65. Scalia, F., & Gregory, K. (1970). Retinofugal projections in the frog: Location of the postsynaptic neurons. Brain, Behavior and Evolution, 3, 16-29. Schleidt, W. (1962). Die historische Entwichlung der Begriffe "angeborenes auslosendes Schema" und "angeborener Auslosemechanismus" in der Ethologie. Zeitschrift fiir Tierpsychologie, 19, 697-722. Schmajuk, N.A., & Segura, E.T. (1980). Behavioral changes along escape learning in toads. Acta Physiological Latino Americana, 30, 211-215. Schmajuk, N.A., Segura, E.T., & Reboreda, J.C. (1980). Appetitive conditioning and discriminatory learning in toads. Behavioral and Neural Biology, 28, 392-397 Schmidt, R.F. (Ed, 1985). Fundamentals o f neurophysiology (3rd ed.). 
New York: Springer. Schwartz, E.L. (Ed., 1989). Computational neuroscience. Cambridge, MA: MIT Press. Sejnowski, T.J., & Rosenberg, C.R. (1987). Parallel networks that learn to pronounce English text. Complex Systems, 1, 145-168. Simon, H. A. (1974). How big is a chunk? Science, 183, 482-488. Smith, M.C. (1965). CS-US interval and US intensity in classical conditioning of the rabbit’s nictitating membrane response. Journal o f Comparative and Physiological Psychology, 65, 679-687. Sokolov, E.N. (1960) Neuronal models and the orienting reflex. In M.A.B. Brazier (Ed.), The central nervous system and behavior'. HI (pp. 187-276). New York: Macy Foundation. Sokolov, E.N. (1975). Neuronal mechanisms of the orienting reflex. In: E.N. Sokolov, & O. Vinogradova (Eds.), Neuronal mechanisms o f the orienting reflex (pp. 217- 235). New York: Erlbaum. 233 Sompolinsky, H., & Kanter, I. (1986). Temporal association in asymmetric neural networks. Physics Review Letters, 57, 2861-2864. Stanley, J.C., & Kilmer, W.L. (1975). A wave model of temporal sequence learning. International Journal o f Man-Machine Studies, 7, 397-412. Sternberg, S. (1966). High-speed scanning in human memory. Science, 153, 652-654. Sutherland, N.S. (1969) Shape discrimination in rat, octopus, and goldfish: a comparative study. Journal o f Comparative and Physiological Psychology, 67, 160- 176. Szekely, G., & Lazar, G. (1976). Cellular and synaptic architecture of the optic tectum. In R. Llinas, & W. Precht. (Eds.), Frog neurobiology (pp. 407-434). Berlin Heidelberg New York: Springer. Tank, D.W., & Hopfield, J.J. (1987). Neural computation by concentrating information in time. Proceedings o f the National Academy o f Sciences USA, 84, 1896-1900. Teeters, J.L. (1989). A simulation system for neural networks and model for the anuran retina. Technical Report 89-01, Center for Neural Engineering, University of Southern California. Teeters, J.L., & Arbib, M.A. (1990). 
A model of anuran retina relating intemeurons to ganglion cell responses. Biological Cybernetics, 6 4 ,197-207. Tsai, H.J., & Ewert, J.P. (1987). Edge preference of retinal and tectal neurons in common toads (Bufo bufo) in response to worm-like moving stripes: the question of behaviorally relevant "position indicators". Journal o f Comparative Physiology A, 161, 295-304. Thompson, P.A., & Boice, R. (1975). Attempts to train frogs: review and experiments. Journal o f Biological Psychology, 17, 3-13. Thompson, R.F. (1986). The neurobiology of learning and memory. Science, 233, 941- 947. 234 Thompson, R.F., & Spencer, W.A. (1966). Habituation: A model phenomenon for the study of neuronal substrates of behavior. Psychological Review, 73, 16-43. Thorpe, W.H. (1963). Learning and instinct in animals. Cambridge, MA: Harvard University Press. Tsai, H.J., & Ewert, J.-P. (1987). Edge preference of retinal and tectal neurons in common toads {Bufo bufo) in response to worm-like moving stripes: the question of behaviorally relevant "position indicators". Journal o f Comparative Physiology A, 161, 295-304. van Hateren, J.H., Srinivasan, M.V., & Wait, P.B. (1990). Pattern recognition in bees: orientation discrimination. Journal o f Comparative Physiology A, 167, 649-654. Verlaine, L. (1924). L'instinct et l'intelligence chez les Hymenopf eres.I. Le probleme du retour au nid et de la reconnaissance du nid. Memo ires, Academie Royale des Sciences, des Lettres et des Arts, Belgique. Classe des Sciences. Collection In- Octavo. 2 Ser, 8, 1-72. Wang, D.L. (1989). An extended model of the Neocognitron for pattern partitioning and pattern composition. In Proceedings o f the International Joint Conference on Neural Networks (Vol. 2, pp. 267-274). Washington, DC. Wang, D.L., & Arbib, M.A. (1990). Complex temporal sequence learning based on short-term memory. Proceedings of the IEEE , 7 8 ,1536-1543. Wang, D.L., & Arbib, M.A. (1991a). 
How does the toad's visual system discriminate different worm-like stimuli? Biological Cybernetics, 6 4 ,251-261. Wang, D.L., & Arbib M.A. (1991b). Hierarchical dishabituation of visual discrimination in toads. In: J.-A. Meyer, & S. Wilson (Eds.), Simulation o f adaptive behavior: From animals to animats (pp. 77-88). Cambridge, MA: MIT Press. Wang, D.L., & Arbib, M.A. (1991c). A neural model of temporal sequence generation with interval maintenance. In Proceedings o f the Thirteenth Annual Conference o f Cognitive Science Society (to appear). Chicago, IL. 235 Wang, D.L., & Arbib, M.A. (1991d). Timing and chunking in processing temporal order. Psychological Review, submitted. Wang, D.L., Buhmann, J., & Malsburg, C.v.d. (1990). Pattern segmentation in associative memory. Neural Computation, 2, 95-107. Wang, D.L., & Ewert, J.-P. (1991). Configurational pattern recognition by dishabituation in common toads Bufo bufo (L.): Behavioral tests of the predictions of a neural model. Journal o f Comparative Physiology A, to appear. Wang, D.L., & Hsu, C.C. (1988). A neuron model for computer simulation of neural networks. Acta Automatica Sinica, 14, 424-430. Wang, D.L., & Hsu, C.C. (1990). SLONN: A simulation language for modeling of neural networks. Simulation, 55, 69-83. Wang, D.L., & King, I. K. (1988). Three neural models which process temporal information. In Proceedings o f the First Annual Conference o f the International Neural Network Society (pp. 227). Boston, MA. Waugh, N.C., & Norman, D.A. (1965). Primary memory. Psychological Review, 72, 89-104. Wehner, R. (1981). Spatial vision in arthropods. In H. Autrum (Ed.), Vision in invertebrates (Handbook o f sensory physiology, Vol VII/6C, pp. 287-616). Berlin Heidelberg New York: Springer. Wells, M.J. (1978). Octopus: Physiology and behavior o f an advanced invertebrate. London: Chapman and Hall. Wingfield, A., & Byrnes, D.L. (1981). The psychology o f human memory. New York: Academic Press. 
Wilczynski, W., & Northcutt, R. (1977). Afferents to the optic tectum in the leopard frog: An HRP study. Journal o f Comparative Neurology, 173, 219-229. 236 Wilczynski, W., & Northcutt, R. (1983). Connections of the bullfrog striatum: afferent organization. Journal o f Comparative Neurology, 214, 321-332. Young, J.Z. (1964). A model o f the brain. London: Oxford University Press.
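The habituation weight \(y_i\) and the gating variable \(z_i\) of the MP column (Eqs. A.32 and A.33) can be integrated numerically with forward Euler. The sketch below is illustrative only: the parameter values are assumptions rather than the ones listed in Table 6.1, and a simple square pulse stands in for the suprathreshold activity \(N_{mp2}\).

```python
# Hypothetical Euler-integration sketch of the MP-column habituation
# dynamics (Eqs. A.32-A.33): y is a habituation weight depressed by
# suprathreshold activity N and recovering toward y0 at a rate gated
# by z; z itself drifts toward 0 under repeated stimulation, slowing
# recovery (long-term habituation). All parameter values below are
# illustrative assumptions, not those of Table 6.1.

def simulate(T=400.0, dt=0.1, alpha=0.02, beta=0.2, gamma=0.001,
             y0=1.0, h1=0.1, h2=0.1):
    y, z = y0, 0.99          # z starts just below 1 so z*(z - 1) != 0
    trace = []
    for step in range(int(T / dt)):
        t = step * dt
        # square-pulse stand-in for N_mp2: on for 30 time units per 100
        N = 1.0 if (t % 100.0) < 30.0 else h2
        dy = alpha * z * (y0 - y) - beta * y * (N - h2)   # Eq. A.32
        dz = gamma * z * (z - 1.0) * (N - h1)             # Eq. A.33
        y += dt * dy
        z += dt * dz
        trace.append(y)
    return trace
```

During each pulse \(y\) collapses toward a low plateau, and between pulses it recovers toward \(y_0\) at the rate \(\alpha z_i\); since \(z_i\) decays a little with every presentation, a larger \(\gamma\) makes recovery progressively slower over trials.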