LEARNING INVARIANT FEATURES IN MODULATORY NEURAL NETWORKS THROUGH CONFLICT AND AMBIGUITY

W. Shane Grant

A dissertation submitted in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

University of Southern California
Department of Computer Science
May, 2018

ABSTRACT

This work develops a new unsupervised learning rule and framework for learning rotation, scale, and translation invariant visual features. The new learning rule, called conflict learning, is designed around the complications of learning modulatory feedback and is composed of three simple concepts grounded in physiologically plausible evidence. Using border ownership as a prototypical example, it is shown that a Hebbian learning rule, which has long been a key component in understanding neural plasticity, fails to properly learn modulatory connections, while the proposed rule correctly learns a stimulus-driven model. This is the first time a border ownership network has been learned. Additionally, the rule can be used as a drop-in replacement for a Hebbian learning rule to learn a biologically consistent model of orientation selectivity, a network which lacks any modulatory connections.

Following the development of conflict learning, this thesis lays the foundation for a general framework of cortical learning based upon the idea of a 'competitive column'. This column describes a prototypical organization for neurons that gives rise to an ability to learn scale, rotation, and translation invariant features. This is empowered by conflict learning and a novel notion of neural ambiguity, which states that 'too many decisions lead to indecision'. The competitive column architecture is used to create a large scale (54,000 neurons and 18 million synapses), invariant model of border ownership that is trained on simple shapes such as squares and rectangles yet generalizes to multiple scales and complexities of input, including a demonstration on contours of objects taken from natural images.

Conflict learning and ambiguity lead to testable predictions related to the learning of modulatory connections in the brain and associated activation dynamics. These contributions of conflict learning, the competitive column, and ambiguity give a better intuitive understanding of how feedback, modulation, and inhibition may interact in the brain to influence activation and learning. Additionally, they provide a promising avenue for advancing object recognition systems.

PUBLICATIONS

Some ideas and figures have appeared previously. The following is a complete list of publications to which I have contributed during the course of this degree:

Grant, W Shane and Laurent Itti (2012). "Saliency mapping enhanced by symmetry from local phase." In: Image Processing (ICIP), 2012 19th IEEE International Conference on. IEEE, pp. 653–656.

— (in submission, 2018). "Learning Invariant Features in Modulatory Neural Networks." In: Neural Computation.

Grant, W Shane, James Tanner, and Laurent Itti (2017). "Biologically plausible learning in neural networks with modulatory feedback." In: Neural Networks 88, pp. 32–48.

Grant, W Shane, Randolph C Voorhies, and Laurent Itti (2013). "Finding planes in LiDAR point clouds for real-time registration." In: Intelligent Robots and Systems (IROS), 2013 IEEE/RSJ International Conference on. IEEE, pp. 4347–4354.

— (2018, in review). "Efficient velodyne SLAM with point and plane features." In: Autonomous Robots.
CONTENTS

List of Figures
List of Tables
Acronyms

1 introduction
1.1 Modeling the Visual Cortex
1.1.1 Hubel and Wiesel
1.1.2 Adding Invariance
1.1.3 Learning From a Teacher
1.1.4 Moving Beyond Feedforward
1.2 Learning Rules
1.3 Encapsulating the Problem
1.3.1 Orientation Selectivity
1.3.2 Border Ownership
2 conflict learning
2.1 Overview
2.2 Introduction
2.3 Modulatory Connections
2.3.1 Hebbian Learning and Modulatory Connections
2.4 Introducing Conflict Learning
2.4.1 Conflict Learning and Modulatory Connections
2.4.2 Stability Analysis of Conflict Learning
2.4.3 Stability Analysis of the Generalized Hebbian Algorithm
2.4.4 Stability Analysis of BCM
2.5 Network Modeling Results
2.5.1 Border Ownership
2.5.2 Orientation Selectivity
2.5.3 Experimental Methods
2.6 Discussion
2.6.1 Analyzing the Rule
2.6.2 Implications for Plasticity
2.6.3 Learning Border Ownership
2.7 Conclusion
3 ambiguity
3.1 Overview
3.2 Introduction
3.3 The Competitive Column Model
3.3.1 The Competitive Column
3.3.2 Ambiguity
3.3.3 Neuron Activation
3.3.4 Thresholds
3.4 Network Construction
3.4.1 Network Construction and Wiring
3.4.2 Training
3.5 Experiments
3.5.1 Ambiguity
3.5.2 Invariance
3.5.3 Natural Images
3.6 Discussion
3.6.1 Towards Proto Objects
3.6.2 Generalizability
3.6.3 Biological Implications
3.6.4 Contemporary Approaches
3.6.5 Future Applications and Enhancements
3.7 Conclusion
4 conclusion
bibliography

LIST OF FIGURES

Figure 1.1 Hubel and Wiesel Model of Simple and Complex Cells
Figure 1.2 Neocognitron Hierarchical Model
Figure 1.3 HMAX Model
Figure 1.4 Markov et al. Visual Hierarchy
Figure 1.5 BCM Learning Function
Figure 1.6 STDP Learning Function
Figure 1.7 Leabra XCAL Learning Function
Figure 1.8 Map of Orientation Selectivity in Macaque
Figure 1.9 GCAL V1 Learning
Figure 1.10 Rubin's Vase Illusion
Figure 1.11 Neural Evidence for Border Ownership
Figure 1.12 Neural Evidence for Border Ownership Scale Invariance
Figure 1.13 Feedforward Model of Border Ownership
Figure 1.14 Lateral Model of Border Ownership
Figure 1.15 Feedback Model of Border Ownership
Figure 1.16 Border Ownership Grouping Cell
Figure 1.17 Teo et al. Border Ownership Network
Figure 1.18 Teo et al. Border Ownership Assignments
Figure 2.1 Simple Network With Modulatory Connections
Figure 2.2 States of the Simple Network
Figure 2.3 States of the Simple Network
Figure 2.4 Border Ownership Model Architecture
Figure 2.5 Learned Feedback Receptive Fields for Border Ownership
Figure 2.6 Learned Feedforward and Lateral Receptive Fields for Border Ownership
Figure 2.7 Border Ownership Polarity Assignments
Figure 2.8 Conflict Learning Components
Figure 2.9 V1 Network Architecture
Figure 2.10 Orientation Selectivity Results
Figure 2.11 Orientation Selectivity With Input Bias
Figure 3.1 The Competitive Column
Figure 3.2 C Shape
Figure 3.3 Border Ownership Network
Figure 3.4 Border Ownership Network Training Procedure
Figure 3.5 C Shape With and Without Ambiguity
Figure 3.6 Border Ownership Results on Object Scales
Figure 3.7 Sample Shape Responses
Figure 3.8 Sample Shape Responses
Figure 3.9 Rotation Invariance
Figure 3.10 Scale Invariance
Figure 3.11 Translation Invariance
Figure 3.12 Airplane Natural Image
Figure 3.13 Bear Natural Image
Figure 3.14 Car Natural Image
Figure 3.15 Horses Natural Image
Figure 3.16 Example Columns

LIST OF TABLES

Table 2.1 Conflict Learning Parameter Listing

ACRONYMS

V1 Primary Visual Cortex
BO Border Ownership

1 INTRODUCTION

The visual system paradoxically provides a constant intuitive experience of the world while simultaneously remaining a tremendously complex and difficult physical system to explain. It seems that the more that is discovered about how the brain works, the more complex the machinery that creates the intuitive illusion becomes. Since the studies of Hubel and Wiesel (1963) it has been known that the development of early visual processes is dependent on experience, but how the brain translates this visual experience into a consistent reality remains an intriguing question.

Hubel and Wiesel showed that neurons selective to orientation in the primary visual cortex (V1) could be organized in a hierarchical fashion and further that those neurons were tuned to the statistics of the visual input. In the intervening decades, models of visual development have largely followed this design and have primarily focused on feedforward pathways, i.e., a one way flow of information from the retina through successive layers of cortical processing.
These feedforward models, combined with recent advances in computational power, have been successful in driving a resurgence in neural networks (Schmidhuber, 2015). However, despite current successes, these modern feedforward models have had little reciprocity in furthering our understanding of how the brain learns (Wolchover, 2017).

This thesis focuses on an underserved and less well understood mechanism that is prevalent in the cortex: modulatory connections and feedback. The cortex is permeated by predominantly feedback connections, i.e., those that originate from the brain itself and are passed down the feedforward hierarchy, of which the majority are modulatory connections (Markov et al., 2014). Unlike driving connections, which contribute directly to the firing of a neuron, modulatory connections are those that must coincide with a driving connection to have any effect on activation (Larkum, 2013).

A primary contribution of this thesis is the development of a new learning rule, called conflict learning, that enables stable learning in the presence of modulatory connections. In developing the new learning rule, intuitive and testable predictions of how the brain operates are made. The new learning rule is applied to a model of an early but challenging visual process called border ownership (a process which involves the assignment of object boundaries to objects). The developed model of border ownership is the first to be learned in an unsupervised fashion through experience. This learning rule and various modeling experiments performed with it are detailed in chapter 2.

Following the development of conflict learning, this thesis lays the foundation for a general framework of cortical learning based upon the idea of a 'competitive column'. This column describes a prototypical organization for neurons that gives rise to an ability to learn scale, rotation, and translation invariant features. A novel notion of ambiguity, which dampens the activation of neurons under conflicting sources of information, is key to the working of these columns. Modulatory connections allow the learned features to be conjunctions of several simple components. For example, in border ownership, a neuron may learn an oriented edge segment that also signals the presence of an owning object. This is presented in chapter 3, where the model of border ownership from the previous chapter is extended to be invariant and capable of handling locally ambiguous input.

The remainder of this chapter provides broad background to put these contributions in context, with later chapters supplying more specific related work.

1.1 modeling the visual cortex

In this section a few of the many models of the visual cortex are detailed. Core features of these models serve as a basis of inspiration and provide the foundation of the competitive column model introduced in chapter 2 and detailed in chapter 3.

1.1.1 Hubel and Wiesel

The model put forth by Hubel and Wiesel (Hubel and Wiesel, 1962, 1965) established a framework that has formed the basis for hierarchical models of the visual system. Combined with the earlier ideas put forth by Hebb on how neurons could be organized to produce complex behavior (Hebb, 1949), this spurred great interest in understanding visual perception and the mechanisms that underlie it (Wurtz, 2009).
The Hubel and Wiesel model of orientation selectivity consists of layers of cortical processing in which information cascades from layer to layer in a feedforward, sequential fashion. Mimicking biology, model cells in higher stages are generally responsive to more complex features, have larger receptive fields, and show greater invariance. Their hierarchy of 'simple cells,' which respond to specific oriented stimuli, and 'complex cells,' which show a spatial invariance over the same stimuli, is illustrated in figure 1.1. This concept of feature extraction followed by a spatial invariance step is found in almost every subsequent hierarchical model.

Figure 1.1: A diagram showing the hierarchical nature of both simple and complex cells, which are modeled on the experimental data Hubel and Wiesel recorded. Simple cells pool over center-surround cells from the lateral geniculate nucleus (LGN), while complex cells pool over simple cells of similar orientation selectivity to grant spatial invariance. Each successive layer in the hierarchy features cells with larger receptive field sizes that respond to more complex features. Reproduced from Hubel and Wiesel, 1962.

1.1.2 Adding Invariance

Fukushima developed (1975) and refined (1980) a hierarchical model of object recognition that expanded upon the basic design of Hubel and Wiesel. This model, called the neocognitron, formalizes the function and structure of a cascade of simple and complex cells into a deep hierarchy of so-called S-cells and C-cells that ends with unique C-cell activations corresponding to single object identities. An illustration and description of the model can be seen in figure 1.2.

There are several features of the neocognitron that are especially relevant to the current work. The first is that the model S-cells are plastic and their weights are adjusted by a Hebbian learning rule through experience. Although the C-cells as well as some other lateral connections in the network are not learned, the demonstration that a deep hierarchy could be learned without a teacher through experience alone was a significant contribution. The model also organized S- and C-cells into 'cell planes' that receive the same distribution of inputs, shifted by their position. This design can be seen echoed in modern day convolutional networks, where the kernels can be thought of as S-cells, and various pooling operations as C-cells.

Additionally, the S-cells are organized such that cells with similar receptive fields are wired together into a 'column' structure. This organization of cells with similar receptive fields was inspired by the hypercolumns proposed by Hubel and Wiesel (1977), which itself was influenced by the work of Mountcastle (1957). The key importance of this column organization is that, when combined with an associative learning rule, it provides a way for neurons to differentiate their learned features. Only the most active S-cell in a column is allowed to update its weights to the current stimulus in a winner-take-all learning scheme. Winner-take-all mechanisms provide a way for a network to make a coherent decision from multiple choices (Feldman and Ballard, 1982). A final noteworthy facet of the model is its use of inhibitory connections within each S- or C-cell layer to perform shunting-type inhibition, which provides a divisive normalization to activation.
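To make these two ingredients concrete, the sketch below combines a winner-take-all weight update with a divisive normalization of activation. It is only an illustrative toy in Python, not code from the neocognitron or from this thesis; the array shapes, normalization constant, and learning rate are assumptions.

```python
import numpy as np

def column_response(weights, stimulus, k=1.0):
    """Linear responses followed by shunting-style divisive normalization,
    so strong cells suppress the relative activation of weak ones."""
    raw = weights @ stimulus          # one response per cell in the column
    return raw / (k + raw.sum())      # divisive normalization

def wta_hebbian_update(weights, stimulus, responses, lr=0.1):
    """Only the most active cell in the column adapts, pulling its weights
    toward the current stimulus (a simple winner-take-all Hebbian step)."""
    winner = int(np.argmax(responses))
    weights[winner] += lr * responses[winner] * stimulus
    weights[winner] /= np.linalg.norm(weights[winner]) + 1e-12  # keep bounded
    return weights

# Toy usage: three competing cells, five-dimensional input.
rng = np.random.default_rng(0)
w = rng.random((3, 5))
x = rng.random(5)
r = column_response(w, x)
w = wta_hebbian_update(w, x, r)
```

Repeated over many stimuli, this kind of competition is what lets the cells of a column drift toward distinct preferred features.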
This inhibition, which is also learned for the S-cells, causes their activation to drop when a non-preferred stimulus is presented. In this fashion, the activation of the S-cells also behaves as a winner-take-all network, much like the learning does.

In a variant with feedback (Fukushima and Miyake, 1978), the cognitron was made to act as an associative memory, much like a Hopfield network (Hopfield, 1982). The role of feedback in these types of networks is to restore degraded input: an initial weak decision made by the network will self-reinforce until it is confident in that decision. The feedback in these networks was essentially purely topographical, with no functional distinction from feedforward connections. This behavior is different from when feedback has a modulatory role, as will be investigated in the next chapter.

Figure 1.2: A diagram of the neocognitron hierarchy. The model consists of alternating layers of simple (S-cells) and complex cells (C-cells), much like that of Hubel and Wiesel. S-cells are organized into local columns that, combined with a local learning rule and winner-take-all behavior, cause them to learn disparate features. C-cells pool over S-cells that can span many columns and fire if any of their S-cell inputs are active. In this way C-cells display spatial invariance. There are a small total number of S- and C-cells, which are organized into cell planes in a convolutional fashion. The model is constructed such that the final layer of C-cells corresponds to object identities and is invariant to the spatial position of their preferred object. Reproduced from Fukushima, 1980.

HMAX (Serre, Wolf, and Poggio, 2005) is another hierarchical model that again operates in a fashion similar to the organization put forth by Hubel and Wiesel. It consists of alternating layers of simple and complex cells, where the simple cells compute some local feature and the complex cells perform a max (hence the MAX in HMAX) operation. This max operation greatly enhances the invariance of the model, though it comes with a loss of locality information as successive layers of max pooling occur (Sabour, Frosst, and Hinton, 2017). A full description of the model is provided in figure 1.3.

Figure 1.3: A system overview of the HMAX model. The S1 layer consists of multiple scales and orientations of Gabor filter responses. The C1 layer performs, for each orientation, a pooling over position and scale, though the pooling over scale is restricted to adjacent sized scale bands. The S2 layer operates much like the S1 layer, though it uses a bank of previously encoded features instead of Gabor filters. The final C2 layer pools each filter in the bank of S1 filters. The output of the C2 layer can be input to a classifier to perform recognition. Reproduced from Serre, Wolf, and Poggio, 2005.

A key difference, aside from the fact that HMAX performs no learning, is that the HMAX model is scale invariant. This is done utilizing an approach taken from image processing: a scale pyramid. In HMAX, a large number of simple cell responses are computed not just for differing orientations, but for differing spatial scales. In contrast to a hierarchical approach to scale invariance, the pyramid computes all of the scale responses in parallel and merges them through pooling operations. The MAX operation thus provides not only spatial invariance but scale invariance.
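To illustrate how a max operation buys tolerance to position and scale, here is a small toy sketch. It is not the actual HMAX implementation; the pooling size and the restriction to adjacent scale pairs are simplifying assumptions made for illustration only.

```python
import numpy as np

def c1_max_pool(s1, pool=2):
    """Toy C1-style pooling: max over local spatial neighborhoods and over
    pairs of adjacent scales.

    s1   : (n_scales, height, width) simple-cell responses for one orientation
    pool : spatial pooling size
    Returns an array of shape (n_scales - 1, height // pool, width // pool).
    """
    n_scales, h, w = s1.shape
    out = np.empty((n_scales - 1, h // pool, w // pool))
    for s in range(n_scales - 1):
        band = np.maximum(s1[s], s1[s + 1])              # pool adjacent scales
        for i in range(h // pool):
            for j in range(w // pool):
                patch = band[i * pool:(i + 1) * pool, j * pool:(j + 1) * pool]
                out[s, i, j] = patch.max()               # pool over position
    return out

# A response that tends to survive small shifts or scale changes of the input.
responses = np.random.default_rng(1).random((4, 8, 8))
pooled = c1_max_pool(responses)
```

Because only the strongest response in each pooled region survives, small shifts or rescalings of the input tend to leave the pooled output unchanged.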
1.1.3 Learning From a Teacher

LeCun et al. (1989) created a similar model to the neocognitron and applied it to the same task of digit recognition. Although both are hierarchical and convolutional, one of the major differences in ideology between the two is that the model of LeCun et al. is trained using supervised learning via the backpropagation of error algorithm (backprop). Backprop is a numerical technique that assigns errors in the predictions of a trained network proportionally to neuron weights in a backward pass through the network as a form of gradient descent (Rumelhart, Hinton, and Williams, 1985). Backprop requires a reciprocal mapping to convey errors generated at the deepest levels of the hierarchy back to the earliest stages. This scheme is not believed to be biologically plausible and has been termed the 'weight assignment problem' (Grossberg, 1987).

However, the important lesson to learn from backprop trained networks, which the majority of modern deep learning architectures utilize (LeCun, Bengio, and Hinton, 2015), is that there is considerable power in employing an error signal as a teacher for adjusting synaptic weights. Lillicrap et al. (2016) demonstrated that random weights can replace the actual weights of downstream neurons when sending an error signal upstream. Although this work is very intriguing, it still leaves unanswered certain questions such as what the ultimate source of the error signal is and how feedback mechanics should interplay with feedforward circuitry.

1.1.4 Moving Beyond Feedforward

The aforementioned models are dominated by feedforward processing and do not utilize feedback or modulatory connections. When looking at the connectivity of different regions of the visual system (figure 1.4) it is clear that the wiring in the brain is far more interconnected than these models would suggest. Explaining the interaction of feedforward and feedback processing and how they impact learning is a prime motivation of this thesis.

Figure 1.4: A diagram showing the hierarchy of brain regions involved in visual processing and their connectivity. This diagram can be considered a more detailed and up-to-date version of the connectivity diagram first published by Felleman and Van Essen (1991), showing that the different brain regions involved in visual processing are extremely interconnected by feedforward and feedback, more so than previously established. Green shaded regions correspond to the dorsal ('where') pathway, while blue shaded regions correspond to the ventral ('what') pathway. Reproduced from Markov et al., 2014.

1.2 learning rules

A core contribution of this thesis is the development of a new unsupervised learning rule that operates in regimes where a traditional associative Hebbian learning rule cannot. The next chapter details the conditions under which this occurs. This section provides some general background on Hebbian learning and the learning rules it inspired.

Hebbian learning is an associative learning rule, famously summarized by the principle of "fire together, wire together," that increases the connection strength between two neurons when their firing coincides. The influence of Hebb's work can be seen in the development of nearly every learning rule devised for self-organizing networks. All of the rules described here feature increasing synaptic strength as a function of correlated firing as a core component.
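As a point of reference for the rules discussed in this section, here is a minimal sketch of the basic Hebbian update. It is illustrative Python only; the variable names and learning rate are assumptions, and no particular published model is implied. Each weight grows in proportion to the product of pre- and post-synaptic activity, which is why, without an additional bounding mechanism, the raw rule lets weights grow without limit.

```python
import numpy as np

def hebbian_step(w, x_pre, x_post, lr=0.01):
    """Basic Hebbian update: dw_i = lr * x_pre_i * x_post.

    w      : (n_pre,) weights onto a single post-synaptic neuron
    x_pre  : (n_pre,) pre-synaptic activations
    x_post : scalar post-synaptic activation
    Note: repeated application grows w without bound unless a
    normalization or threshold mechanism (as in the rules below) is added.
    """
    return w + lr * x_pre * x_post

w = np.zeros(4)
x = np.array([1.0, 0.5, 0.0, 0.2])
for _ in range(100):                  # correlated firing keeps strengthening w
    w = hebbian_step(w, x, x_post=x.sum())
```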
Though these learning rules operate at a somewhat high level of abstraction, they play an important role in allowing us to develop better models of various faculties of the brain. The ideas present in these learning rules and associated models can help direct experimentalists or be adapted to more immediately practical applications in disciplines such as computer vision or machine learning.

As mentioned in the previous section, Fukushima's cognitron (1975) used a learning rule that strengthened connections if there was a correlation between pre- and post-synaptic firing, with the additional constraint that the post-synaptic neuron had to be a local winner amongst its neighbors. This concept would be extended by Fukushima in future development of the neocognitron (Fukushima, 1980).

Other early approaches to learning include Kohonen's Self-Organizing Map (SOM) (1990), which was originally concerned with explaining the spatial organization of various brain functions, though later applications learned V1-like orientation selectivity (Swindale and Bauer, 1998). The learning rule for an SOM is again Hebbian-like in nature: a global winner is allowed to increase its weights, along with spatially neighboring neurons in a so-called learning eligibility region (LER). It, however, has issues with biological plausibility, especially in regard to its reliance on global processes and connectivity (Miikkulainen et al., 2006). The LISSOM model (Sirosh and Miikkulainen, 1994) eliminated global connectivity requirements by incorporating local Hebbian-driven learning, which allowed previously hard-coded lateral connections to be learned. Sirosh and Miikkulainen's work was further extended in GCAL (Bednar, 2012), which introduced adaptive firing thresholds and gain control to robustly model the development of V1 (Stevens et al., 2013). Jain et al. (2015) showed that by modifying the updates applied to neighboring neurons in the LER to follow a hybrid spatial and activation based rule, an SOM could learn maps with a wide variety of topologies, including multimaps, which are composed of many independent interdigitated feature maps. The introduction of Hebbian learning to models of self-organization allowed more phenomena to be explained, though these models have largely remained focused on feedforward networks.

Other attempts to model neural learning have introduced additional mechanisms on top of Hebbian learning, often revolving around explicit synaptic weakening. The Bienenstock-Cooper-Munro (BCM) rule (Bienenstock, Cooper, and Munro, 1982) uses a floating threshold based on activation to modulate the magnitude and sign of a Hebbian update (figure 1.5). Spike-timing-dependent plasticity (STDP) (Song, Miller, and Abbott, 2000) uses the relative timing of neuron spikes to similarly affect its Hebbian-like update (figure 1.6). The Leabra framework (O'Reilly et al., 2012), which combines Hebbian learning with error-driven learning, takes ideas from both of these concepts, using activation over multiple time scales to control a BCM-like threshold (figure 1.7). The learning rule developed in the next chapter, conflict learning, is also built on the foundation of Hebbian learning and shares some concepts found in these rules.
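For concreteness, the following is a minimal sketch of one common textbook form of the BCM update, in which a sliding threshold tracks the recent average of squared post-synaptic activity and flips the sign of the weight change. The variable names, time constant, and initialization are illustrative assumptions, not parameters used elsewhere in this thesis.

```python
import numpy as np

def bcm_step(w, x_pre, x_post, theta, lr=0.01, tau=100.0):
    """One BCM-style update with a sliding modification threshold.

    dw_i     = lr * x_pre_i * x_post * (x_post - theta)  (sign flips at theta)
    d(theta) = (x_post**2 - theta) / tau                  (threshold tracks activity)
    """
    w = w + lr * x_pre * x_post * (x_post - theta)
    theta = theta + (x_post ** 2 - theta) / tau
    return w, theta

w, theta = np.zeros(4), 0.0
x = np.array([1.0, 0.2, 0.0, 0.8])
for _ in range(200):
    post = float(w @ x) + 0.1          # small constant drive so learning can start
    w, theta = bcm_step(w, x, post, theta)
```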
However, conflict learning was explicitly designed around the challenges of modulatory feedback, making it capable of learning under configurations these rules were not designed to consider. The analysis and discussion presented in the next chapter will revisit some of these learning rules.

Figure 1.5: The BCM learning function is a modified form of Hebbian learning with a floating threshold θ_M that models the long term average of a neuron's activity and controls the point at which the rule switches from negative to positive learning. Reproduced from O'Reilly et al., 2012.

Figure 1.6: The STDP learning function adjusts weights based on relative spike timing; if a pre-synaptic neuron spikes before a post-synaptic one, the weight is increased. If it spikes after, the weight is decreased. Reproduced from Song, Miller, and Abbott, 2000.

Figure 1.7: The XCAL function used by Leabra is similar to the BCM function with an additional threshold θ_d controlling when the function reverses direction, which is dependent on the floating threshold θ_p. Reproduced from O'Reilly et al., 2012.

1.3 encapsulating the problem

This work is primarily concerned with performing unsupervised learning in networks with feedback and modulatory connections. While these types of connections and flow of information are believed to be prevalent throughout the visual system, it is useful to distill the problem to a prototypical example that can be supported by physiological data.

Traditionally, learning rules are demonstrated on one of the earliest visual processes the visual cortex performs: orientation selectivity. This is the same problem that Hubel and Wiesel focused on and is often seen as a sanity check for a learning rule and associated network. Even in networks that have the ultimate goal of object recognition, it is often the case that their early stages end up learning orientation selective features.

However, orientation selectivity can be solved to a large extent without the need to model feedback or modulatory connections. This prompts the need for a more complicated visual process that can still be simply described. Border ownership is ideally suited to be the litmus test for learning a network with feedback and modulation. Border ownership predominantly occurs in visual area V2, and though it is a less well studied phenomenon, there is an increasing body of literature and data which support it as a process that is reliant on feedback.

This section reviews both of these processes and previous attempts to model and explain them. The learning rule developed in the next chapter will be demonstrated against both to show that it not only resolves the dilemma of modulatory feedback, but maintains expectations of feedforward learning when used in a context without feedback.

1.3.1 Orientation Selectivity

With few exceptions, the traditional focus of many learning rules for self-organization has been on feedforward processing. In particular, a common test is their ability to learn orientation selective units from visual input, a key characteristic of neurons in the primary visual cortex. This is largely due to the fact that V1 is one of the most studied areas of the cortex, and much of its behavior can be captured without introducing feedback. In many mammals, orientation selective neurons form systematic topographic maps organized by preferred orientation and retinal location (Blasdel, 1992, see figure 1.8).
These maps develop in a stable fashion from weakly to highly selective while maintaining spatial similarity in layout (Chapman, Stryker, and Bonhoeffer, 1996). These maps are further highly influenced by the statistics of natural input, with animals reared in abnormal visual conditions seeing their environment reflected in the tuning of their neurons (Wiesel, Hubel, et al., 1963).

Figure 1.8: Neurons in the primary visual cortex of a macaque monkey colorized by their preferred orientation. Nearby neurons tend to cluster to similar orientations, with points at which orientation preference changes continuously forming pinwheels. Neurons at pinwheel centers and other discontinuities tend to be less selective. Legend in top right maps colors to orientation. Reproduced from Blasdel, 1992.

A common test for learning rules is thus their ability to learn orientation selective units from visual input in a stable fashion (Bednar and Miikkulainen, 2006; Masquelier, 2012; Stevens et al., 2013; Swindale and Bauer, 1998; Wenisch, Noll, and Hemmen, 2005). It is well-known that orientation maps develop to maturity based on visual experience in the primary visual cortex of many mammals (Ferster, 1987; Li, Fitzpatrick, and White, 2006; Pettigrew and Konishi, 1976) and thus it is reassuring if a learning rule replicates this behavior. See figure 1.9 for an example of results using the GCAL learning rule, which is used as a baseline for Hebbian learning in the next chapter.

Figure 1.9: A comparison between orientation maps in the ferret (A) and those learned using the GCAL learning rule (B). Maps are colored in a similar fashion to that of figure 1.8. (C) Example receptive fields which become increasingly selective over time. (D, E) Stability (similarity to final map) and selectivity over time, showing that for both the ferret and the model, stability smoothly increases with selectivity. Reproduced from Stevens et al., 2013.

1.3.2 Border Ownership

Perhaps the easiest way to understand border ownership is by looking at a famous optical illusion: Rubin's vase (Rubin, 1915, figure 1.10). In this image, it is possible to arrive at two different interpretations of the shapes in the scene, but it is not possible to do so simultaneously. The contour that divides the vase from the faces belongs alternately to either the faces or the vase, depending on which is attended to. Border ownership is the determination of which side of a contour belongs to the figure, and which to the background. As this example demonstrates, this assignment can often be ambiguous, and one of the key challenges in solving border ownership is dealing with ambiguity. This work will directly address these challenges of ambiguity in chapter 3.

Figure 1.10: A classical optical illusion showcasing a bi-stable representation of two faces, or alternately a vase. Which object is figure and which is background can be made to change by attending to either object, but it is not possible to perceive both simultaneously as figure.

Neural evidence for border ownership was first demonstrated by Zhou, Friedman, and Von Der Heydt (2000), who found neurons in the macaque visual cortex selective for border ownership in areas V1, V2, and V4. In their experiments, they first mapped the receptive field size and orientation tuning of neurons before presenting images where the boundary of a figure overlaid a neuron's preferred orientation.
By alternating which side of this orientation the figure lay on, they were able to demonstrate that some neurons showed a preference for which side of their boundary the figure occurred on (figure 1.11). Based on their experiments, they estimated that 18% of V1, 59% of V2, and 53% of V4 neurons selective to edges are sensitive to border ownership, suggesting that there is a hierarchical component to its computation (Kogo and Ee, 2014).

Figure 1.11: Border ownership selective neurons in the macaque visual cortex. The images were presented such that the boundary between the surfaces matches the orientation and position of a previously mapped neuron's classical receptive field. By alternating the contrast or position of the figure, the perceived ownership of the neuron is reversed when going from the top row of images (a1-6) to the bottom row (b1-6). The firing rates (c) corroborate the preference for a specific polarity of ownership. Reproduced from Zhou, Friedman, and Von Der Heydt, 2000.

Border ownership responses are believed to be a critical component of figure-ground segmentation, a process that is itself likely critical for higher level tasks such as object recognition (Kogo and Ee, 2014). Border ownership takes cues from multiple sources of information, such as edges, color, texture, and stereoscopic depth (Nakayama, Shimojo, and Silverman, 1989), though edge information alone is usually, but not always, sufficient (Kogo and Ee, 2014). The majority of BO models therefore use edges to drive the computation, though some do integrate other features as well (e.g., depth; see Hsu and Parker, 2014).

Though the existence of border ownership neurons is firmly established, the precise mechanism that allows them to perform this computation is still under some debate. Computing border ownership requires information from outside of a neuron's classical receptive field, suggesting that it requires the cooperation of BO neurons spread across an entire figure to ensure a correct polarity assignment. Experiments by Sugihara, Qiu, and Heydt (2011) found that the border ownership response is largely invariant to the scale of the figure and that the time course of the response is not delayed when the size of the figure increases (figure 1.12). In fact, the latency of the BO response in general is extremely short, placing constraints on the mechanisms that could be supplying non-local information to BO neurons.

Figure 1.12: The time course of the BO response for two figures. The upper graph is for a smaller figure covering 3° of the visual field, while the bottom is 8°. Both figures show similar timing for a BO neuron responding to its preferred or non-preferred polarity. The dip in initial response occurs after competition for a winning BO response has occurred. The larger figure shows a slightly decreased response. Reproduced from Sugihara, Qiu, and Heydt, 2011.

1.3.2.1 Models of Border Ownership

Since its discovery, there have been multiple computational models developed to explain how the brain might compute border ownership. These models fall into three categories: feedforward (e.g., Supèr, Romeo, and Keil, 2010), lateral (e.g., Kogo et al., 2010; Sakai and Nishimura, 2006; Zhaoping, 2005), and feedback driven (e.g., Craft et al., 2007; Jehee, Lamme, and Roelfsema, 2007).
It is likely that the computation of border ownership combines processes found in all of these types of models, with an initial guess driven by feedforward and lateral computation followed by a fast refinement through recurrent modulatory feedback. The models that do not focus on feedback do so not to disprove that feedback is part of the process, but rather to highlight the contributions of feedforward or lateral information.

The goal of this thesis is not to develop the perfect model of border ownership, but rather to show how a model utilizing recurrent modulatory connections, which is emerging as the most likely putative model, could actually be learned by the brain. Feedback models are also promising as they can be easily adapted to fit within more complex hierarchies (Craft et al., 2007). However, none of the previously developed works have demonstrated or attempted to explain how visual experience could drive the development of border ownership, and this work is the first to show how a model of border ownership could be learned.

feedforward models

To study the role of feedforward processing in figure-ground segmentation, Supèr et al. (2010) developed a model driven purely by feedforward computation. Their model consists of three layers and splits the input into two channels, one representing foreground and another background (figure 1.13). The first layer transforms figure-ground texture to a spike map, which is sent to the second layer. The second and third layers consist of concentric center-surround mechanisms that first calculate figure-ground modulation, and then border ownership. Border ownership emerges in the third layer by virtue of specific excitatory and inhibitory arrangements of inputs from the second layer. If a layer three neuron receives excitation from a layer two figure neuron and inhibition from a layer two surround neuron, it activates. These connections and the arrangement of the excitation/inhibition pairs are fixed. Although this model does show that it is possible to compute border ownership in a purely feedforward fashion, its simple design makes it incapable of explaining the influence of input outside of the classical receptive field.

Figure 1.13: A purely feedforward driven model of border ownership. (A, C) Input is split into two complementary channels representing on-off and off-on responses, preferring figure and background, respectively. (B) The network is organized into three equal sized layers, with the third layer performing border ownership assignment. Second layer neurons receive direct excitation from their same retinotopic location and inhibition from all active figure neurons in the first layer. The third layer receives a specific pairing of excitation and inhibition to detect a directed boundary between figure and ground. Reproduced from Supèr, Romeo, and Keil, 2010.

lateral models

Figure 1.14: An example of the excitatory and inhibitory lateral connections used in a lateral model of border ownership. Excitatory connections are created between BO neurons that support a convex figure. Inhibitory connections serve the opposite purpose, preventing the assignment of polarities which would disagree with a closed figure. Reproduced from Zhaoping, 2005.

Zhaoping (2005) developed a model of border ownership that requires no recurrent feedback processing, instead relying on lateral connections between BO neurons to propagate polarity decisions.
As mentioned earlier, border ownership is likely a combination of several types of computation, and the model by Zhaoping aims at explaining it in the absence of feedback, arguing that feedback likely has a role to play but is not essential for the basic computation. The model uses carefully constructed lateral connections between edge selective BO neurons to either excite or inhibit neighboring neurons depending on whether they agree (support convexity) or disagree (concavity) that a figure is present, as seen in figure 1.14. The model is able to reproduce many of the actual responses of BO neurons, even showing a degree of scale invariance in regards to the timing of the spread of the BO response for large figures. Although the model does not directly model end-stopping features such as T junctions, it indirectly captures them through the connectivity patterns of excitation and inhibition, which can be seen by observing the T junction in figure 1.14. In additional experiments, feedback simulating attention was applied to the model to demonstrate how the lateral connectivity could be useful in a more complicated model of border ownership taking a wider range of connection patterns into account. Despite the virtues of the model, it has been argued that physiological constraints caused by the topology of visual area V2 as well as the latency of lateral transmission make it unlikely that lateral connections are the dominant factor in the computation of border ownership (Craft et al., 2007).

feedback based

Feedback based models of border ownership rely on higher level areas with large receptive fields to modulate the activity of lower level areas. Craft et al. (2007) developed a model where pairs of BO neurons compete over a polarity assignment while receiving feedback from higher level grouping neurons, which pool over a wide range of BO responses (figure 1.15). The grouping neurons have annular receptive fields and are essentially tuned to fire when a mostly contiguous arrangement of BO neurons supports the interior of the figure occurring at the retinotopic position of the grouping neuron (figure 1.16). Feedback based models better fit physiological requirements of integrating information over wide areas quickly, due to the fast conduction speeds of feedback connections compared to lateral ones (Girard, Hupe, and Bullier, 2001). Additionally, feedback models have been shown to integrate well with models of attention (Mihalas et al., 2011; Qiu, Sugihara, and Heydt, 2007).

Figure 1.15: The grouping cell based feedback model of Craft et al. (A) Border ownership is computed through the interaction of two layers, one with competing pairs of BO neurons, and another with grouping neurons that have annular receptive fields pooling over BO responses. Each pair of BO neurons receives identical input, which comes from edge selective features or end-stopping (e.g., L, T junction) features. (B) The BO-grouping neuron circuit. A competing pair of BO neurons receives identical edge information as well as end-stopping information which can be used to directly inhibit one of the pair.
BO neurons send excitatory feedforward input to grouping cells consistent with their preferred polarity, and receive inhibitory feedback from the grouping cell their competing BO neuron sends feedforward to. Reproduced from Craft et al., 2007.

It is important to note the connectivity requirements of this model, which demand very specific connection patterns dependent not only on learned feature selectivity (e.g., edge responses) but also on the retinotopic and physical position of neurons. Individual pairs of BO neurons must have precise inhibitory connections as well as properly routed feedforward connections to appropriately located grouping cells, such that a grouping cell receives input from an annulus of BO neurons all selective to polarities pointing to a common interior. Additionally, those same grouping cells must send inhibitory feedback to the competing BO neurons in exactly the same pattern. The model also requires end-stopping information (i.e., the presence of L or T junctions of edges) to help disambiguate polarity assignments.

Figure 1.16: The ideal connectivity pattern between a grouping cell and a ring of border ownership neurons. The grouping neuron pools over BO neurons selective to polarities corresponding to the same interior direction. In the Craft et al. model, the grouping cell then sends inhibitory feedback to BO neurons with the opposite polarity of its preferred direction. Reproduced from Kogo and Ee, 2014.

computer vision approaches

As it is likely a key part of figure-ground separation, there has been increasing interest in the computer vision community to model border ownership, with the hope that it will play an increasing role in segmentation and recognition. Though these models are not under constraints of being biologically plausible, they are still heavily influenced by existing computational models of border ownership. Ren et al. (2006) use features selective to specific conjunctions of edges (so-called shapemes) to train a logistic classifier to locally predict figure vs. ground before globally refining their estimates with a conditional random field. More recently, Teo et al. (2015) developed a system with a semi-global approach based upon training a classifier on a conjunction of features including histogram of gradient (HoG), gestalt inspired grouping patterns, and extremal edges (figure 1.17). Though their approach used several features, they all essentially reduce to relationships between edges, with little benefit being seen beyond using HoG alone. Despite the ability of these models to use classifiers to do the heavy lifting of integrating features and their ability to use global computations, there is still room for much progress in this area, with accuracies topping out at around 74% on currently available datasets (figure 1.18), suggesting that the problem is likely more challenging than a simple feedforward process can resolve.

Figure 1.17: (A) The various features used in the computer vision approach to border ownership assignment by Teo et al. The extracted features are used along with ground truth annotations to learn (B) a decision tree that encodes the polarity assignment of a local image patch. (C) The final decision for a given patch is made by averaging the decision of trees trained at several spatial scales. The tree enforces consistency amongst neighboring pixels. Reproduced from Teo, Fermuller, and Aloimonos, 2015.

Figure 1.18: Example assignments and overall accuracies on two border ownership datasets. The Berkeley Segmentation Dataset and Benchmark (BSDS, above) consists of human annotated natural scenes.
The NYU Depth V2 (below) dataset features images taken from indoor scenes using a structured light camera to capture depth ordering. Overall prediction accuracies are shown inlaid in the image, with a competing computer vision based approach (Ren, Fowlkes, and Malik, 2006) achieving 68.9% accuracy on the BSDS dataset. Reproduced from Teo, Fermuller, and Aloimonos, 2015.

2 CONFLICT LEARNING

2.1 overview

Although Hebbian learning has long been a key component in understanding neural plasticity, it has not yet been successful in modeling modulatory feedback connections, which make up a significant portion of connections in the brain. We develop a new learning rule designed around the complications of learning modulatory feedback and composed of three simple concepts grounded in physiologically plausible evidence. Using border ownership as a prototypical example, we show that a Hebbian learning rule fails to properly learn modulatory connections, while our proposed rule correctly learns a stimulus-driven model. To the authors' knowledge, this is the first time a border ownership network has been learned. Additionally, we show that the rule can be used as a drop-in replacement for a Hebbian learning rule to learn a biologically consistent model of orientation selectivity, a network which lacks any modulatory connections. Our results predict that the mechanisms we use are integral for learning modulatory connections in the brain and furthermore that modulatory connections have a strong dependence on inhibition.

2.2 introduction

The brain has a remarkable ability to learn to process complicated input through self-organization, and since the studies of Hubel and Wiesel (1963) it has been known that the development of early visual processes is dependent on experience. In the decades since, models of visual development have focused on feedforward pathways, with little attention given to the learning of modulatory connections. Modulatory connections, which adjust existing neuron activations instead of directly driving them, dominate feedback pathways, which themselves constitute a majority of the connections in the brain (Markov et al., 2014). Hebbian-based models have come a long way in explaining potential mechanisms of learning (Clopath et al., 2010; Hebb, 1949; Widloski and Fiete, 2014), especially in feedforward models of V1 (Stevens et al., 2013), but an increasing amount of literature suggests that more comprehensively explaining plasticity requires novel approaches (Lim et al., 2015; Zenke, Agnes, and Gerstner, 2015). We will argue that the principles of Hebbian learning, known colloquially as "fire together, wire together," cannot be used alone to learn correctly or maintain stability in the context of modulatory connections.

The primary contributions of this effort are twofold: the development of a new learning rule that handles modulatory connections, and showing that a stimulus driven feedback model of border ownership can be learned in a biologically plausible way as a result of the new learning rule. The new learning rule, which we call conflict learning, is composed of three conceptually simple, physiologically plausible mechanisms: adjusting plasticity based on the activation of strongly learned connections, using inhibition as an error signal to explicitly unlearn connections, and exploiting several timescales.
With border ownership as our prototypical example, we show that a Hebbian learning rule fails to properly learn modulatory connections, while the components of our proposed rule enable it to learn the required connections. Border ownership, which involves the assignment of edges to owning objects, is perhaps one of the earliest and simplest visual processes dependent upon modulatory feedback (Kogo and Ee, 2014), appearing in V1, V2, and V4 (Zhou, Friedman, and Von Der Heydt, 2000). Although many models of its function exist (e.g., lateral models: Sakai and Nishimura, 2006; Zhaoping, 2005; feedforward: Supèr, Romeo, and Keil, 2010; and feedback: Craft et al., 2007), those incorporating feedback are especially promising, integrating well with models of attention (Mihalas et al., 2011; Qiu, Sugihara, and Heydt, 2007) and concepts of grouping (Martin and Heydt, 2015). However, until now, all of these models have used fixed, hand-crafted weights, with no demonstration of how the connection patterns for border ownership might be learned.

With our new learning rule, we demonstrate that inhibitory modulation of plasticity, in conjunction with competition, are likely crucial mechanisms for learning modulatory connections. Additionally, we show that the rule can be used as a drop-in replacement for a Hebbian learning rule even in networks lacking any modulatory connections, such as an orientation selective model of primary visual cortex. Conflict learning is compared against a recent Hebbian learning based rule (GCAL; Stevens et al., 2013), which is a good baseline rule for comparison because its weight updates are governed purely by Hebbian logic and it operates at a level of abstraction that captures important physiological behaviors while still being usable in large scale neural network models (e.g., orientation selectivity) and being adaptable for use in new network architectures (e.g., border ownership). We demonstrate that conflict learning, like a Hebbian rule such as GCAL, can be used to learn a biologically consistent model of orientation selectivity. Our results further suggest that networks learned with conflict learning have improved noise and stability responses.

Conflict learning works in a fundamentally different way from previous learning rules by leveraging inhibition as an error signal to dynamically adjust plasticity. Though many existing techniques built upon Hebbian learning, such as those derived from STDP (spike-timing-dependent plasticity; Song, Miller, and Abbott, 2000) or BCM learning (Bienenstock, Cooper, and Munro, 1982), have some method to explicitly control synaptic weakening (e.g., based on spike timing for STDP or comparisons to long term activation averages for BCM), inhibition only indirectly affects learning by lowering activation. Our successful application of the rule to learning models of orientation selectivity as well as border ownership serves as a prediction that modulatory connections in the brain require inhibition and competition to play a bigger role in the dynamics of neural plasticity and activation.

2.3 modulatory connections

Modulatory connections are the primary motivation for the development of conflict learning.
They are found extensively in feedback projections related to visual processing, for example from visual cortex to the thalamus (Cudeiro and Sillito, 2006; Jones et al., 2012, 2015), from higher visual areas to primary visual cortex (Callaway, 2004; Hupe et al., 2001), as well as from posterior parietal cortex to V5/MT (Friston and Büchel, 2000). Top-down modulatory influences also play a role in phenomena such as attention (Baluch and Itti, 2011; Beuth and Hamker, 2015; Yantis, 2008), object segmentation (Roelfsema et al., 2002), and object recognition (Bar et al., 2006). Attention is a modulatory effect and has the greatest impact on already active representations (Buschman and Kastner, 2015). Modulatory feedback, used in much the same way as in our border ownership experiment, has been used to construct a model of attention that replicates numerous observed attentional effects on both firing rates and receptive field structure (Miconi and VanRullen, 2016).

Modulatory connections can alter the existing activation of a neuron, but cannot cause activity in isolation; they must work in conjunction with driving inputs (Brosch and Neumann, 2014b). We can observe this distinction mathematically by first looking at the activation function for an artificial neuron, which is typically modeled by some function of its weighted inputs:

x_j = f\left( \sum_{i \in \mathrm{input}} x_i w_{ij} \right) \quad (2.1)

where $w_{ij}$ is the weight between neurons $i$ and $j$ and $x_i$ is the activation of neuron $i$. However, as modulatory connections are defined as those that do not directly drive the activation of a neuron, their effect must be distinguished from driving connections, which, in similar fashion to Brosch and Neumann (2014b), we formalize as:

x_j = f\left( D_j + g(D_j, M_j) \right) \quad (2.2)

where $D_j = \sum_{i \in \mathrm{driving}} x_i w_{ij}$ and $M_j = \sum_{i \in \mathrm{modulatory}} x_i w_{ij}$. $g$ is a monotonically increasing function with respect to $D_j$, and $D_j = 0$ implies that $g(D_j, M_j) = 0$. Typically, $g$ is a simple product between $D_j$ and $M_j$ (e.g., Bayerl and Neumann, 2004; Brosch and Neumann, 2014a; Roelfsema et al., 2002), hypothesized to be implemented biologically by backpropagation-activated coupling (Larkum, 2013).

When feedforward inputs are taken to be driving and feedback to be modulatory, it can be said that feedback is gated by feedforward, an effect noted by Larkum (2013). Roelfsema et al. (2002) discuss the idea of gating in detail and use it to support a model of figure-ground segregation. This gating allows networks to integrate feedback without struggling to balance it against feedforward input or incurring spurious top-down-driven activation. The physiological mechanics of modulation have been best studied in relation to the thalamus, with a recent review by Varela (2014) showing that modulatory input is extensive and heterogeneous in regards to origin, neurotransmitter, and function. Brosch and Neumann (2014b) discuss the evidence for the potential physiological implementation of modulatory feedback while developing a network-level circuit model for feedforward and feedback interaction.
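As a quick numerical illustration of this gating property, the following toy sketch shows that modulatory input scales an existing driving response but produces nothing when the driving input is absent. It is not code from this thesis; it simply assumes the common multiplicative form of $g$, and the function and variable names are ours.

```python
import numpy as np

def drive_modulate(x_driving, w_driving, x_mod, w_mod):
    """Eq. (2.2) with a multiplicative g: modulation scales the driving
    input but cannot create activity on its own (feedback is 'gated')."""
    d = float(np.dot(x_driving, w_driving))   # D_j
    m = float(np.dot(x_mod, w_mod))           # M_j
    return d + d * m                          # g(D, M) = D * M

w_d = np.array([0.5, 0.5])
w_m = np.array([1.0])

print(drive_modulate(np.array([1.0, 1.0]), w_d, np.array([1.0]), w_m))  # 2.0: driven and boosted
print(drive_modulate(np.array([0.0, 0.0]), w_d, np.array([1.0]), w_m))  # 0.0: modulation alone does nothing
```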
For our purposes we use a formulation of Hebbian learning that includes a normalization component for stability, adapted from Stevens et al., 2013:

\Delta w_{ij} = \frac{w_{ij} + \eta x_i x_j}{\sum_k (w_{kj} + \eta x_k x_j)} - w_{ij}   (2.4)

where η is the learning rate. This weight update, and its normalization, are applied independently to driving and modulatory connections (i.e., all w_{ij} in the sum are of the same connection type).
To better understand why such a Hebbian rule is not suitable for learning modulatory connections, let us look at the dynamics of a minimal network with two competitive neurons, illustrated in Fig. 2.1. In this context, competitive means that the neurons are connected such that more active neurons inhibit the activation of those less active through lateral connections. The desired state of this network is to have each competing neuron develop a strong connection to a unique source of modulatory input. It should be noted that this end state is considered desired due to its computational usefulness as a source of top-down information rather than a direct extrapolation from biology.
We can imagine this network as, for example, a simple attention network concerned with detecting apples or oranges in its input. The modulatory connections act as attentional biases towards either apples (M_1) or oranges (M_2). Though one fruit may be desired over the other (e.g., searching for a specific fruit; M_1 active vs. M_2), the network has no control over what is present in its input. Features related more to apples (N_1) or to oranges (N_2) may be active regardless of the bias signal, even occurring simultaneously. This presents a problem to learning if a pure correlation based rule, like Hebbian learning, is to be used, as the top-down bias is equally correlated with each bottom-up driving input. Learning a unique source of modulatory input is desirable because it allows the attentional biases to affect only the features with which they are semantically associated.
With this in mind, let us analyze how this Hebbian learning rule behaves in this network. The activity of a neuron in the network can be expressed using (2.2) with a product for g(), along with adding divisive inhibition (Carandini and Heeger, 2012) for competition (following Brosch and Neumann, 2014b; Stevens et al., 2013) as well as a noise term ε:

x_j = \frac{D_j + D_j M_j + \epsilon}{1 + Inhib_j}   (2.5)

Figure 2.1: A simple network with modulatory connections. Neurons N_1 and N_2 receive identical driving input and compete over input from two modulatory neurons, M_1 and M_2. The colored connections show the desired state of the network, where each competing neuron has learned a unique source of modulatory input. The dashed connection represents lateral inhibition.

We are interested in the dynamics of the network once it has reached the desired state. Let us assume that the neurons have each already learned associations to a unique modulatory input, such that w_{M_1 N_1} = w_{M_2 N_2} = w_{max} and w_{M_2 N_1} = w_{M_1 N_2} = w_{min}. Because the weights are normalized (see (2.4)), this configuration implies that w_{min} + w_{max} = 1. Without loss of generality, assume that M_1 is highly active while M_2 is inactive, resulting in M_1 sending strong feedback to N_1. Because of that feedback, N_1 will become more active than N_2 regardless of whether or not it was more active prior. N_2 is then inhibited by N_1, but because it receives the same driving input, it remains at a lower but non-zero activation.
Formally, x_{M_1} > 0 and x_{N_2} > 0. Substituting this into (2.4) gives:

\Delta w_{M_1 N_2} = \frac{w_{min} + \eta x_{M_1} x_{N_2}}{w_{min} + w_{max} + \eta x_{M_1} x_{N_2} + 0} - w_{min} = \frac{w_{min} + \eta x_{M_1} x_{N_2}}{1 + \eta x_{M_1} x_{N_2}} - w_{min}   (2.6)

Letting α = η x_{M_1} x_{N_2},

\Delta w_{M_1 N_2} = \frac{w_{min} + \alpha}{1 + \alpha} - w_{min} = \frac{w_{min} + \alpha - w_{min}(1 + \alpha)}{1 + \alpha} = \frac{(1 - w_{min})\,\alpha}{1 + \alpha}   (2.7)

Because α > 0 and 1 > w_{min} ≥ 0, Δw_{M_1 N_2} > 0. Thus w_{M_1 N_2} is increasing and the system is not in a steady state. This implies that even if this Hebbian learning rule managed to reach the desired state, it would not be in equilibrium and would be disrupted by any input.
Compared to this simple example, the modulatory inputs in more general networks will be populations of correlated neurons, and the competing neurons may not all receive identical driving input. Distinct populations are assumed to be weakly correlated with each other (otherwise they would be the same population). The core challenge of learning modulatory connections, however, can be captured by this example using two neurons driven by identical input competing over two independent modulatory inputs.
Implementations of Hebbian learning that bound weight growth through means other than weight re-normalization, such as the Generalized Hebbian Algorithm (Sanger, 1989), which is closely related to Oja's rule (Oja, 1982), or the BCM rule, which uses an adaptive threshold based on expected average activation to adjust the sign of the weight update, can also be shown to be either unstable or not guaranteed to reach the desired state of this network. We will revisit and analyze these two variations of Hebbian learning in 2.4.3 and 2.4.4 after introducing conflict learning in the next section.

2.4 introducing conflict learning

Conflict learning was developed to address the demonstrated instability of Hebbian learning rules in the context of modulatory connections, and can be intuitively described as a rule that assigns a unique population of correlated modulatory inputs to each neuron competing over those inputs. It is a general learning rule composed of three conceptually simple, physiologically plausible mechanisms: adjusting plasticity based on the activation of strongly learned connections, using inhibition as an error signal to explicitly unlearn connections, and exploiting several timescales. These concepts are formalized by the following equations:

1. Spreading: Neurons are restricted to increasing weight on only those connections that overlap with their existing preferred stimulus, thus causing a smooth spreading through feature space. This is accomplished using a coefficient applied to the weight update, equal to the maximum activation amongst a neuron's strongly learned connections:

\kappa_i = \max_{j \,\mid\, w_{ij}(t) > \frac{1}{2}\max_k w_{ik}(t)} x_j   (2.8)

where strongly learned connections are those whose weight exceeds half the strength of the largest weight amongst that individual neuron's connections.

2. Unlearning: Conflict learning treats inhibition as an error signal indicating that the inhibited neuron has mistakenly strengthened any currently active connections. A neuron competing with its neighbors via inhibition exerts pressure on those neurons to unlearn the connections driving its activation, while receiving reciprocal pressure to unlearn the connections that drive its neighbors.
The amount of inhibition a neuron receives is used to interpolate between a positive and a negative associative weight update:

\delta_{ij} = (1 - Inhib)\,\eta x_i x_j \kappa_i - Inhib \cdot \beta \eta x_i x_j   (2.9)

where β (set to 1 in all experiments) can be used to control the rate of learning versus unlearning. The interpolation between learning and unlearning is irrespective of activation strength and depends only upon the amount of inhibition received.

3. Short and Long-Term (SLT): Connection weights are adjusted on a short-term and a long-term timescale, striking a balance between initial exploratory learning and long-term exploitation of a learned pattern. The short-term weight w_{ij} adjusts rapidly to the current stimulus, but decays towards and fluctuates around the more stable, slowly adapting long-term weight w^{ltm}_{ij}. The only visible weight for a neuron is its short-term weight; long-term weights are internal and only observed via their effect on short-term weights. The entire neuron weight update process has four steps:

a) Compute short-term weight updates δ_{ij}

b) Move long-term weights towards short-term weights:

w^{ltm}_{ij}(t+1) = (1 - s_{ltm})\big(w_{ij}(t) + \delta_{ij}\big) + s_{ltm}\, w^{ltm}_{ij}(t)   (2.10)

c) Move short-term weights towards long-term weights:

w_{ij}(t+1) = (1 - s_{stm})\big(w_{ij}(t) + \delta_{ij}\big) + s_{stm}\, w^{ltm}_{ij}(t+1)   (2.11)

d) Normalize short- and long-term weights independently

where s_{ltm} and s_{stm} are smoothing factors, described below. An accumulator of lifetime short-term weight updates is used for computing the smoothing factor s_{ltm} for the long-term weight update:

acc_{ij}(t+1) = acc_{ij}(t) + \delta_{ij}   (2.12)

The smoothing factor for the long-term update, s_{ltm}, is computed by comparing a neuron's proportion of long-term weight against its proportion of lifetime accumulator value (normalized w^{ltm}_{ij}(t) vs. acc_{ij}(t+1)). When the w^{ltm}_{ij}(t) update would move the long-term weight proportion towards that of the accumulator, s_{ltm} is decreased, proportional to the remaining distance between them. In cases where the w^{ltm}_{ij} update would move the proportion away from the accumulator, s_{ltm} is increased.
The smoothing factor for the short-term update, s_{stm}, is constant, with smaller values preferring the short-term weight. Short-term and long-term weights are divisively normalized independently. Weights initially start lower than their allowed totals and are not normalized until they have grown to exceed them.
The full conflict learning rule is not used for learning inhibitory connections, as they serve as control signals for the rule itself. Instead, these connections have a single weight based upon a normalized lifetime accumulation of weight updates:

acc^{lat-}_{ij}(t+1) = acc^{lat-}_{ij}(t) + x_i x_j w^{lat-}_{ij} (1 - Inhib)   (2.13)

w^{lat-}_{ij}(t+1) = \frac{acc^{lat-}_{ij}(t+1)}{\sum_k acc^{lat-}_{kj}(t+1)}   (2.14)

Conflict learning uses the same neuron activation principles as GCAL (Stevens et al., 2013), described in 2.5.3.1. It should be noted that the above equations, although conceptually grounded, are not directly fit to experimental data. The intent of this formulation is to demonstrate that these concepts, when used together, provide a stable and plausible way to learn in networks with modulatory connections that could exist in some fashion in actual neurons.
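The sketch below gathers these pieces into a single update step for one neuron's incoming excitatory weights. It is a paraphrase of (2.8) through (2.12), not the thesis implementation; in particular, it treats the smoothing factor s_ltm as a constant rather than using the adaptive, accumulator-based value described above, and the function and parameter names are illustrative only.

```python
import numpy as np

def conflict_update(w, w_ltm, acc, x_pre, x_post, inhib,
                    eta=0.01, beta=1.0, s_ltm=0.99, s_stm=0.1):
    """One conflict-learning step for a single neuron's incoming weights (a sketch).
    w, w_ltm, acc : short-term weights, long-term weights, lifetime accumulator
    x_pre, x_post : presynaptic activations and postsynaptic activation
    inhib         : amount of inhibition received by the postsynaptic neuron
    Simplification: s_ltm is a constant here, not the adaptive value of the thesis."""
    # Spreading (2.8): max activation among strongly learned connections
    strong = w > 0.5 * w.max()
    kappa = x_pre[strong].max() if strong.any() else 0.0

    # Unlearning (2.9): inhibition interpolates between learning and unlearning
    delta = (1.0 - inhib) * eta * x_pre * x_post * kappa \
            - inhib * beta * eta * x_pre * x_post

    # SLT (2.10)-(2.12): accumulate, then move long- and short-term weights
    acc += delta
    w_ltm = (1.0 - s_ltm) * (w + delta) + s_ltm * w_ltm
    w = (1.0 - s_stm) * (w + delta) + s_stm * w_ltm

    # Divisive normalization, applied independently and only once totals are exceeded
    w = np.clip(w, 0.0, None)
    w_ltm = np.clip(w_ltm, 0.0, None)
    if w.sum() > 1.0:
        w /= w.sum()
    if w_ltm.sum() > 1.0:
        w_ltm /= w_ltm.sum()
    return w, w_ltm, acc
```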
Although weight re-normalization (3d) is not strictly biologically plausible (see Turrigiano and Nelson, 2004 for more viable alternatives), it ensures that weights are bounded in a computationally amenable fashion, and furthermore is used in the weight update equation for GCAL.
An in-depth discussion of each component of conflict learning is provided after the experiments, in Section 2.6, using the results to address the components' contributions towards learning and their biological plausibility.

2.4.1 Conflict Learning and Modulatory Connections

We can now revisit the simple network of Fig. 2.1 and see how conflict learning resolves the observed stability problems of the Hebbian learning rules. Recall the earlier argument (Section 2.3.1), which showed that the analyzed Hebbian learning rules are not stable in the desired state of the network. Specifically, we noted that w_{M_1 N_2} had a non-zero update. This is not the case when the conflict learning rule is used instead.
Assuming the same weight configuration as for the Hebbian rule, if M_1 and N_1 are active, x_{N_1} > x_{N_2}, and thus Inhib_{N_1} = 0 and Inhib_{N_2} = 1. Additionally, because N_1 has an active strongly learned connection, κ_{N_1} = 1, while N_2 has no strongly learned active connections, so κ_{N_2} = 0. For simplicity we use 1 and 0 for the values of κ and Inhib, though the sign of the update remains the same so long as (1 - Inhib_{N_2})\,κ_{N_2} < Inhib_{N_2}\,β holds. Substituting all of this into the short-term weight update (2.9) gives:

\delta_{M_1 N_2} = (1 - (1))\,\eta x_{M_1} x_{N_2} (0) - (1) \cdot \beta \eta x_{M_1} x_{N_2} = -\beta \eta x_{M_1} x_{N_2} < 0   (2.15)

Since w_{M_1 N_2} already has a value of w_{min}, the effective negative weight update applied will be 0, much like the effective positive weight update for w_{M_1 N_1} will be 0 because it is already at w_{max}. Although N_2 is still partially active, it is being inhibited by N_1, so it performs explicit unlearning towards M_1 instead of positive learning as in the Hebbian case. This same procedure can be applied to the other three feedback connections in this example, and in each case the weight update will be 0 or restricted to 0 by the weight value range. Since all of the connection weights maintain their values, the system is at equilibrium and can maintain this steady state.
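As a quick numerical illustration of this contrast, the following toy check plugs arbitrary example values (not taken from the thesis experiments) into the normalized Hebbian update (2.4)/(2.6) and into the conflict learning update (2.9) for the unwanted connection from M_1 to N_2 in the desired state:

```python
# Toy check of the equilibrium argument with arbitrary example values.
eta, beta = 0.01, 1.0
w_min, w_max = 0.1, 0.9        # normalized pair: w_min + w_max = 1
x_M1, x_N2 = 1.0, 0.4          # M1 active; N2 is the inhibited loser, still partially active

# Normalized Hebbian update (2.4)/(2.6) for the unwanted connection M1 -> N2:
dw_hebb = (w_min + eta * x_M1 * x_N2) / (1.0 + eta * x_M1 * x_N2) - w_min
print(dw_hebb > 0)             # True: the desired state is not an equilibrium

# Conflict learning update (2.9) for the same connection
# (N2 is inhibited, Inhib = 1, and has no active strongly learned input, kappa = 0):
delta_conflict = (1 - 1) * eta * x_M1 * x_N2 * 0 - 1 * beta * eta * x_M1 * x_N2
print(delta_conflict <= 0)     # True: clipped to zero at w_min, so the state is maintained
```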
Knowing that conflict learning is stable in the desired state, we can consider its behavior in the other possible states of the network and how the system transitions from an initial unlearned state to the desired stable state. The network has five functionally distinct states of interest, as seen in Fig. 2.2: 1) the initial state, where no connections have become strongly learned (0SL); 2) a strongly learned connection between one competitive neuron and one modulatory neuron (1SL); or two strongly learned connections, either 3) one competitive neuron with a strongly learned connection to both modulatory inputs (2SL-Split), 4) one modulatory neuron with strongly learned connections to both competitive neurons (2SL-Shared), or 5) unique strongly learned connections between modulatory and competitive neurons (2SL-Desired).
We performed 30 repeated simulations of this simple network to illustrate the trajectory taken by both the considered Hebbian learning rule and conflict learning through the state space (see 2.5.3.3 for experimental procedures). Fig. 2.2 shows the outgoing transition probabilities as well as the percentage of time spent in each state for both learning rules. This demonstrates that the Hebbian learning rule, which cannot prevent both competitive neurons from performing learning, immediately transitions into the 2SL-Shared state before entering an oscillation between 2SL-Shared, 2SL-Desired, and 1SL. The Hebbian rule cannot enter the 2SL-Split state because this state requires one neuron to perform learning while the other does nothing. Conflict learning, as shown in (2.15), is capable of performing positive learning on a competitive neuron in isolation, due to its spreading and unlearning components. The spreading component is chiefly responsible for preventing the system from entering the 2SL-Split state. The unlearning and SLT components are similarly responsible for transitioning the network out of the 2SL-Shared state, were it ever to be in that state. A case by case analysis of the transitions made or avoided by conflict learning can be found in 2.4.2. Using the nomenclature for the states introduced here, additional analysis for simulating two additional variations of Hebbian learning, the Generalized Hebbian and BCM learning rules, is provided in 2.4.3 and 2.4.4, respectively.

Figure 2.2: State diagram for the simple network of Fig. 2.1. This diagram shows the progression of the network from an initial unlearned state (0SL) to the desired state of each competing neuron learning a unique modulatory input (2SL-Desired). Outgoing transition probabilities as well as the percentage of time spent in each state are shown for both (a) Hebbian learning and (b) conflict learning, based on simulation. Conflict learning enters and remains in the 2SL-Desired state, having no outgoing transitions from 2SL-Desired. By contrast, Hebbian learning oscillates between 2SL-Desired, 1SL, and 2SL-Shared. The components of conflict learning essential for specific transitions are labeled. The spreading component prevents the network from transitioning from the 1SL to the 2SL-Split state. Although the simple network under conflict learning cannot make the transition from 1SL to the 2SL-Shared state (dashed arrow), this transition is possible in general, and made unstable by the unlearning and SLT components.

2.4.2 Stability Analysis of Conflict Learning

Traditionally this type of stability analysis is performed by analyzing the properties of the Jacobian. The discontinuous nature of the spreading component (2.8) of conflict learning, which is caused by the categorization of neurons as strongly learned or not, precludes writing a single equation for the individual components of the Jacobian. Given the analyzed network, this would mean creating a distinct Jacobian for each state and categorization of neurons, which would only serve to complicate the presented analysis. We instead continue in the same fashion as in Sections 2.3.1 and 2.4.1.

transitions out of 0sl

When the network is in its initial unlearned state 0SL, there is no association between modulatory input and competitive neurons, so regardless of which neuron wins or which modulatory input is active, the update occurs in the same fashion. Without loss of generality, let N_1 and M_1 be the active neurons. The updates from M_1 are then:

\delta_{M_1 N_1} = \eta x_{M_1} x_{N_1} > 0   (2.16)

\delta_{M_1 N_2} = -\beta \eta x_{M_1} x_{N_2} < 0   (2.17)

which transitions N_1 into being strongly learned towards M_1.
Connection weights from M_2 are unchanged because M_2 is inactive.

transitions out of 1sl

Once in a state where one of the competing neurons has a strongly learned connection, there are four possible scenarios of activation. We will again assume, without loss of generality, that N_1 has strongly learned connections from M_1, and that N_2 has no strongly learned connections.

• N_1 and M_1 active: δ_{M_1 N_1} is the only positive update, so the weight changes proceed as they did under the same conditions in the initial state, keeping the network in the 1SL state.

• N_1 and M_2 active: Because N_1 is strongly learned towards M_1, κ_{N_1} will be 0 as M_1 is inactive. Thus none of N_1's weights can change and the network remains in the same state, the spreading component preventing the network from transitioning into the 2SL-Split state. N_2 receives inhibition from N_1, causing it to unlearn towards the active modulatory neuron M_2, which results in no effective change as w_{M_2 N_2} is already w_{min}.

• N_2 and M_1 active: In this simple example, the existing feedback from the strongly learned connection between N_1 and M_1 overrides the driving input to N_2, so N_1 becomes active and N_2 inactive, which we have already seen results in no change to the state.

• N_2 and M_2 active: N_2 has no strongly learned connections, thus κ_{N_2} = 1 and N_2 can learn towards the active modulatory input M_2. N_2 becomes strongly learned towards M_2 and the network enters the 2SL-Desired state.

In more complex networks, it is possible to transition from the 1SL state to the 2SL-Shared state. In these networks, in place of a single neuron, modulatory input comes from correlated populations of neurons. Depending on the activation of the population, a particular competitive neuron may only be able to learn a subset of connections to a population while one of its competitors learns a different subset. Alternatively, there may be overlap between populations of modulatory inputs, meaning that some of the neurons that are learned belong to both populations, resulting in a sharing of strongly learned connections.

transitions out of 2sl-shared

When a network is in this state, more than one competitive neuron has a strongly learned connection to the same modulatory population. In this case, the unlearning component in conjunction with the SLT component work to make this an unstable state and move the network back to 1SL: the less active competitive neuron will actively unlearn its connection to the active population, while the more active one strengthens its connection. Over time this will result in one of the neurons losing its strongly learned status to that population, allowing it to return to an initial unlearned state. The SLT component allows initial changes to happen quickly and creates momentum via long-term statistics once one neuron begins to consistently win versus the other. Consider the behavior of the simple network of Fig. 2.1 if placed into the 2SL-Shared state: because N_1 and N_2 both have strongly learned connections to M_1, the following applies identically to either N_1 or N_2:

• If M_1 becomes active, either N_1 or N_2 will be more active, depending on noise. The winner will update its weights further towards M_1 while the loser will unlearn its weights towards M_1.
If N_1 were the winner, this differential in weight value will cause N_1 to win versus N_2 in future cases of M_1 being active, maintaining these weight updates until the system returns to the 1SL state:

\delta_{M_1 N_1} = (1 - (0))\,\eta x_{M_1} x_{N_1} (1) - (0) \cdot \beta \eta x_{M_1} x_{N_1} = \eta x_{M_1} x_{N_1} > 0   (2.18)

\delta_{M_1 N_2} = (1 - (1))\,\eta x_{M_1} x_{N_2} (1) - (1) \cdot \beta \eta x_{M_1} x_{N_2} = -\beta \eta x_{M_1} x_{N_2} < 0   (2.19)

• If M_2 becomes active, neither N_1 nor N_2 will perform positive learning because they are strongly learned towards M_1 and κ_{N_1} = κ_{N_2} = 0.

2.4.3 Stability Analysis of the Generalized Hebbian Algorithm

The Generalized Hebbian Algorithm (GHA) can be shown to be unstable for the network of Fig. 2.1 using the same procedure as was used for the normalization based Hebbian learning rule (2.4). GHA adjusts weights as follows:

\Delta w_{ij} = \eta \left( x_j x_i - x_j \sum_{k=1}^{j} w_{ik} x_k \right)   (2.20)

Here we make most of the same network assumptions as in 2.3.1. This means the network is already in the desired state and w_{M_1 N_1} = w_{M_2 N_2} = w_{max} and w_{M_2 N_1} = w_{M_1 N_2} = w_{min}. However, we assume M_2 and N_2 will be highly active instead of M_1 and N_1. Now, replacing (2.4) with (2.20) yields:

\Delta w_{M_2 N_1} = \eta \big( x_{N_1} x_{M_2} - x_{N_1} (w_{min} x_{N_1}) \big)   (2.21)

As we are only interested in the sign of Δw_{M_2 N_1}, and because η > 0 and x_{N_1} > 0, we have:

\mathrm{sgn}(\Delta w_{M_2 N_1}) = \mathrm{sgn}\big( \eta ( x_{N_1} x_{M_2} - x_{N_1} (w_{min} x_{N_1}) ) \big) = \mathrm{sgn}\big( x_{M_2} - w_{min} x_{N_1} \big)   (2.22)

Because M_2 is highly active while N_1 is being inhibited, x_{M_2} > x_{N_1}. Considering this along with the fact that 1 > w_{min} ≥ 0, it must be true that Δw_{M_2 N_1} > 0, indicating that the system is not in a steady state.
Results for simulating this learning rule for the network of Fig. 2.1 can be seen in Fig. 2.3A. The simulation confirms that 2SL-Desired is not a stable state for this learning rule and shows that the network enters an oscillation between multiple states.

2.4.4 Stability Analysis of BCM

We can also demonstrate that the BCM rule, another variant of Hebbian learning, is not guaranteed to converge to the desired state of the network of Fig. 2.1. BCM uses a Hebbian update modulated by a dynamic threshold to control explicit synaptic weakening:

\Delta w_{ij} = \eta x_i x_j (x_j - \theta_j)   (2.23)

where θ_j is the expected value (long-term average) of x_j^2. The value of θ_j directly controls whether this rule is stable in the desired state of the simple network.

Figure 2.3: State diagram for the simple network of Fig. 2.1 for the Generalized Hebbian Algorithm (Sanger's rule) and BCM. This diagram shows the progression of the network from an initial unlearned state (0SL) to the desired state of each competing neuron learning a unique modulatory input (2SL-Desired), much like Fig. 2.2 did for a normalized Hebbian learning rule and conflict learning. Outgoing transition probabilities as well as the percentage of time spent in each state are shown for both (a) the Generalized Hebbian Algorithm (GHA) and (b) BCM, based on simulation. States which were not reached by a learning rule have been omitted for clarity. The 3SL state corresponds to exactly three strongly learned connections between modulatory (M_1 and M_2 in Fig. 2.1) and competitive neurons (N_1 and N_2), regardless of which set of three connections is strongly learned. The 3SL state was not reachable by either rule shown in Fig. 2.2, either due to weight normalization or various components of conflict learning. Although GHA can spend time in the 2SL-Desired state, it is not stable in that configuration and oscillates between four different states. BCM is stable in the 2SL-Desired state, but is also stable in the 2SL-Split state. Using BCM, the transition out of 1SL is essentially random, and the system cannot reliably end up in the desired state.
Let us assume that the network is in the desired state 2SL-Desired, and M_1 is the currently active modulatory neuron, implying x_{N_1} = x_{active}, x_{N_2} = x_{inhibited}, and x_{active} > x_{inhibited}. The sign of each weight update is then dependent solely on the (x_j - θ_j) term of (2.23). For the system to remain in the stable state, Δw_{M_1 N_1} ≥ 0 and Δw_{M_1 N_2} ≤ 0 must hold, as these updates maintain the same assignment of strongly learned connections. The system must simultaneously satisfy the case when M_2 is the active modulatory neuron, which sets up a similar set of requirements: Δw_{M_2 N_1} ≤ 0 and Δw_{M_2 N_2} ≥ 0. Arranging all requirements and substituting x_{active} and x_{inhibited} where appropriate, we get:

x_{N_1} = x_{active} \geq \theta_{N_1}
x_{N_2} = x_{inhibited} \leq \theta_{N_2}
x_{N_1} = x_{inhibited} \leq \theta_{N_1}
x_{N_2} = x_{active} \geq \theta_{N_2}   (2.24)

which is satisfied if and only if x_{active} > θ > x_{inhibited}.
However, the BCM rule has another stable state which it can reach, 2SL-Split, which is the state where both modulatory neurons are associated with a single competitive neuron. Once in the 2SL-Split state, the competitive neuron with two strongly learned connections will always activate more strongly than and inhibit the other, because it is receiving additional feedback input regardless of which modulatory neuron is active.
Let us investigate the dynamics of the network in the 1SL state, before it reaches either 2SL-Split or 2SL-Desired. Without loss of generality, assume N_1 has strongly learned connections from M_1, and that N_2 has no strongly learned connections. Consider what happens when the threshold, θ, falls within the required bounds for stability in 2SL-Desired, such that the winning neuron with activation x_{active} will do positive learning, and the inhibited neuron with activation x_{inhibited} will do negative learning. The interesting case is what happens when M_2 is the active modulatory neuron, which has no existing strongly learned connections (i.e., w_{M_2 N_1} = w_{M_2 N_2} = w_{min}). N_1 and N_2 thus receive identical input, so the winner is decided by noise. Due to the value of the threshold, the winner will increase its weight towards M_2, and the loser will decrease its weight towards M_2. If the winner happens to be N_1, the system will transition into 2SL-Split. If N_2 wins, the system will transition to 2SL-Desired.
This result can be seen in the simulation results presented in Fig. 2.3B, where the network under the BCM rule has two terminal states: 2SL-Desired and 2SL-Split. To achieve this, we specifically initialized the adaptive threshold to a value between the active and inhibited activation levels of the network.

2.5 network modeling results

In contrast to the simple network with two competitive neurons, we now focus on large scale (several thousand neurons) neural networks. We test conflict learning by learning a model of border ownership as well as a model of orientation selectivity.
The border ownership network relies on modulatory feedback for proper operation, whereas the orientation selective network demonstrates that conflict learning is a general learning rule also applicable in contexts lacking modulatory connections.
Conflict learning is compared against an implementation of GCAL (Bednar, 2012; Stevens et al., 2013; threshold adjustment is implemented differently, see 2.5.3.2 for full implementation details), a learning rule that uses purely Hebbian logic to adjust its weights, increasing them when pre- and postsynaptic neurons are simultaneously active. Throughout the rest of this work, we will often refer to GCAL as the "Hebbian learning rule" to emphasize the associative nature of its weight update. GCAL is able to achieve biologically plausible results in applications such as learning V1-like orientation selective maps by way of adjusting neuron activation through contrast normalization and adaptive thresholds (Stevens et al., 2013). For all experiments, both rules use identical activation functions, activation thresholds, and connection patterns, only differing in how their weights are adjusted.
This section focuses on reporting the results of the experiments; full technical details on the experimental procedures are provided in 2.5.3. Intuition and further analysis of how each component of conflict learning gives rise to the results shown are provided after the results, in Section 2.6.

2.5.1 Border Ownership

The primary benefit of conflict learning is its ability to learn in networks with modulatory feedback, a feature that allows it to be used to learn a model of border ownership. As border ownership (BO) is a less familiar and more complicated process than orientation selectivity, it is worth briefly revisiting its putative architecture (illustrated in Fig. 2.4; also see the experimental methods in 2.5.3.4) to fully appreciate the results.

Figure 2.4: Border ownership model architecture. (a) Diagram of full architecture. A V1-like layer consisting of Gabor filters processes the input at four orientations (0, 45, 90, and 135°). Each orientation neuron provides input to two border ownership cells, which are connected laterally to six others (for the three remaining orientations) at the same retinotopic location within a column in the Border Ownership layer. The grouping layer pools BO column activation, receiving input from all BO cells within all columns in a local receptive field. The grouping layer additionally sends feedback to those same cells. (b) Diagram of a single BO column. Columns contain eight competing neurons, two for each orientation, and internally have lateral inhibitory connections between each neuron. They also receive feedback from a local receptive field in the grouping layer. (c and d) The effects of an example stimulus (dotted square; the actual experiment uses solid input) on BO columns (cylinders) and grouping cells (circles labeled G). (c) Feedforward connections from the perspective of a BO column. The column sends feedforward input to all grouping cells in its receptive field, but only the grouping cell receiving input from multiple columns is highly active (indicated by increased size). (d) Feedback connections from the perspective of a grouping cell. Feedback is sent to all BO columns within its receptive field, but only those along the boundary of the object will be highly active. (e) Detailed relationship of competition between two BO neurons with the same orientation. Each BO neuron eventually learns to project to and receive feedback from a grouping cell on only one side of its orientation.
The model we develop is a derivative of the feedback model of Craft et al., 2007, which, as mentioned in the introduction, is one of multiple models capturing the observed behavior of actual border ownership neurons.
BO neurons are identified not just by an orientation, but also by a polarity, which indicates to which side of their orientation the figure (or background) lies (Zhou, Friedman, and Von Der Heydt, 2000). The key challenge is to develop receptive fields such that each BO neuron responds to a single orientation with a single polarity, with full coverage over all orientations and polarities. In our model, this relies on learning feedforward and modulatory feedback connections between columns of BO neurons and a layer of so-called grouping neurons, which pool over multiple BO columns, integrating non-local information. Learning these connections is especially challenging because the multiple BO neurons that exist for each orientation, destined to develop a specific polarity, must learn consistent and opposite connection patterns. The network accomplishes this task purely through experience, with no a priori spatial information: not only are feedforward and feedback weights initially uniform, but BO neurons within a column must also learn to specialize their inhibitory lateral connections, a necessary requirement for competition. While many other models of border ownership require explicit features for junction (e.g., L, T) detection, our learned model requires only edge information.
Note that not all components of this model have been directly observed in the brain. Although BO neurons and their responses to various stimuli have been recorded (Zhou, Friedman, and Von Der Heydt, 2000), grouping neurons have yet to be explicitly discovered (Craft et al., 2007). Grouping neurons can thus be seen as a computational generalization of a more complicated grouping process, for which there is mounting evidence (e.g., Martin and Heydt, 2015; Wagatsuma, Heydt, and Niebur, 2016). This model is nonetheless a good approximation of the current understanding of border ownership circuits. Additionally, the structure of the border ownership network fits within a standard model of computation in visual cortex: it consists of competition followed by grouping, with increasing receptive field size. This is reminiscent of alternating simple and complex cells (Wiesel, Hubel, et al., 1963), which have formed the basis of many models of visual cortex (e.g., Fukushima, 1980; Serre, Wolf, and Poggio, 2005). The connection from edge responsive neurons (input in the model) to border ownership neurons is a simplification for the model; we imagine a more realistic circuit would have edge or contour responsive neurons directly compete with each other over border ownership polarity.

Figure 2.5: Learned Feedback Receptive Fields for BO Neurons. Receptive fields are shown for all eight neurons of a single representative BO column for both Hebbian and conflict learning rules. Each row of the figure represents a different orientation. Each BO neuron is marked by a blue pixel, and green pixels show feedback connections from grouping neurons, with brightness corresponding to weight strength. Polarity represents the average degree to which every pair of neurons in a network learns feedback from grouping neurons on opposite sides, a necessary requirement for consistent border ownership assignment. The conflict learning network successfully learns pairs of competing polarity BO neurons without any a priori information regarding BO or grouping cell spatial position.
2.5.1.1 Results

The learned feedback receptive fields for a representative BO column taken from fully trained networks are shown in Fig. 2.5, and the feedforward and lateral receptive fields are shown in Fig. 2.6 (the full details of training and other experimental procedures can be found in 2.5.3.4). Under conflict learning, each neuron within a BO column learns to associate with grouping feedback occurring on only one side of its orientation, with all orientations and polarities represented. Additionally, the two BO neurons associated with each orientation learn to become competitive with each other and learn opposite sides of feedback. This occurs because the opposite sides of grouping feedback come from distinct populations of grouping neurons, and conflict learning, as was shown in Section 2.4.1, strives to associate one competing neuron to each population of modulatory input.

Figure 2.6: Learned feedforward and lateral receptive fields for BO and grouping neurons. (a) Feedforward receptive fields for a grouping neuron, shown for both learning rules. Successful learning entails a ring-like pattern of strong connectivity. (b) As in Fig. 2.5, the results are organized by the orientation of the BO neuron. For each orientation, the learned outgoing feedforward projections are displayed first, followed by a radial graph of the corresponding learned lateral inhibition strength for the same neurons. Lateral connections project to other neurons within the BO column, colored by the preferred polarity of the inhibited neuron. For example, a red polarity corresponds to inhibition towards a horizontal selective BO neuron with a preference for objects in the lower half of its receptive field. Under conflict learning, BO neurons learn to primarily inhibit the other neuron sharing their orientation, as well as applying a small amount of inhibition to immediately adjacent orientations with overlapping polarities. This pattern of inhibition not only ensures the creation of competing pairs of BO neurons, but also a winner-take-all like behavior amongst all orientations in a column.

The Hebbian learning based rule, however, is unable to develop this partitioning of modulatory feedback amongst competing neurons. The two BO neurons for each orientation learn the same receptive fields as each other, causing them to be unable to reliably associate with objects occurring on a particular side of their orientation.

Figure 2.7: Border Ownership Assignments by a Network Trained with Conflict Learning. Black lines represent the stimulus and colored arrows represent BO assignments at those locations. Each BO neuron is assigned a direction vector based on its learned polarity. Assignments are made by summing these direction vectors, weighted by activation. All results are taken from a fully learned network naive to these example inputs. The network has complete position and orientation invariance. (a) The progression of BO assignment over time. Feedback begins to arrive in iteration 3. (b-e) Settled (iteration 9) assignments for various stimuli. (c and d) These shapes have locally ambiguous border ownership assignments that are resolved through modulatory feedback from the grouping neurons. (e) The network is not fully scale invariant because the BO to grouping neuron connections exist only at a single radius, resulting in the corners being weakly activated.
When a stimulus is presented to these neurons, the winner will be chosen randomly instead of being chosen based on any border ownership information.
Along with the sampled receptive fields, the average polarity score for BO neurons of each orientation is shown. This score represents the degree to which a competing pair of BO neurons learns feedback on opposite sides (see 2.5.3.4). These averaged scores, computed from all pairs of BO neurons, demonstrate that the pictured examples are representative of the whole network.
Fig. 2.7 shows the results of running the trained conflict learning network on common stimuli from the border ownership literature. As the network was trained on single presentations of squares (see 2.5.3.4), every shape presented here is one to which the network has never been exposed. The network in its current implementation has limited scale invariance, demonstrated by the weak response at the vertices of the triangle input (Fig. 2.7E). The responses to the tiled squares (Fig. 2.7A), the C pattern (Fig. 2.7D), and the rounded squares (Fig. 2.7C) are especially interesting because local information may favor a globally incorrect polarity assignment. The network, in all cases, is able to use feedback to correct ambiguous feedforward input in order to reach the correct assignment of border ownership. To our knowledge, this is the first time a border ownership network has been learned, enabled by the new conflict learning rule.
Finally, we investigate the contribution of each component of conflict learning as it applies to learning the modulatory feedback connections in the border ownership network. Fig. 2.8 shows receptive fields taken from a vertically oriented BO pair for all variations of rules tested. The receptive fields were chosen to be exemplars of common failures (if they existed) for the various configurations. Histograms of polarity scores over all vertical BO neurons show typical network-level results. In Fig. 2.8I we compare the median score across all configurations, showing that conflict learning receives benefit from the amalgamation of all of its components. The results demonstrate that there is a non-linear relationship between the introduction of a rule component and its effect on the polarity score. However, we can still extract some general conclusions with respect to the polarity score: while unlearning on its own is very influential (C), the unlearning and spreading components complement each other and together (G) account for most of the improvement over Hebbian (A). The SLT component, by slowly transitioning the network to reflect long-term statistics, appears to have the effect of eliminating outliers and reducing the variance of the distributions (e.g., histograms B vs. F, C vs. E, and G vs. H). Additional discussion on the contribution of each component follows in the discussion (Section 2.6).

2.5.2 Orientation Selectivity

We next apply conflict learning to a problem that can be seen as a baseline for self-organizing networks of the brain: orientation selectivity. The network, seen in Fig. 2.9, consists of an input layer, a center-surround layer, and an output layer, like that used to demonstrate the properties of GCAL (Stevens et al., 2013).
Figure 2.8: Contribution of rule components. (a-h) Representative receptive fields from a vertical BO neuron pair taken from various configurations of conflict learning as well as Hebbian learning. Histograms depict the polarity scores of all vertical BO neurons for a given configuration, with the median denoted by a red line. Configurations are: (a) Hebbian, (b) spreading component only, (c) unlearning component only, (d) SLT component only, (e) conflict learning without spreading, (f) conflict learning without unlearning, (g) conflict learning without SLT, (h) full conflict learning. (i) Median scores for (a-h) with error bars indicating 95th percentile cutoffs. Conflict learning (h) is significantly higher with respect to all other configurations (a-g).

The connections between the input layer and the center-surround layer are fixed; all learning in this network takes place between the center-surround neurons and the output neurons. The network has no modulatory connections, such that the activation equation for neurons reduces to (2.1). The desired goal of learning in this network is to develop output neurons which are orientation selective over all possible input orientations. Detailed information on the network architecture, training, and experimental procedures is provided in 2.5.3.5.

Figure 2.9: V1-like feedforward network. Center-surround layers perform a difference-of-Gaussians like computation either preferring the center (On-Off) or the surround (Off-On). The orientation selective layer receives input from both On-Off and Off-On neurons and forms lateral connections within some radius. The connections between center-surround layers and the orientation selective layer are learned. Figure depicts actual model responses from a learned network.

2.5.2.1 Results

The primary goal of this experiment is to demonstrate that the conflict learning rule, even when applied to networks lacking modulatory feedback and compared against a learning rule tailored for such an environment, produces similar biologically consistent output.
Fig. 2.10A shows the output neurons, for both learning rules, colorized by orientation selectivity after training on oriented bar input. The learned maps show an arrangement that mimics physiological maps of orientation selectivity in mammalian cortex (e.g., pinwheels, which are singularities where orientation preference increases clockwise or counterclockwise; see Chapman, Stryker, and Bonhoeffer, 1996). To quantify this subjective similarity, the pinwheel density metric of Stevens et al., 2013 is computed for the maps. A pinwheel density of π pinwheels per unit hypercolumn area (see 2.5.3.5) has been found to be consistent across a number of mammalian species (e.g., tree shrew, galago, cat, and ferret; see Kaschube et al., 2010; Keil et al., 2012), and may be a fundamental constant of map organization (Stevens et al., 2013).

Figure 2.10: Orientation selectivity results. (a) Orientation maps for both learning rules, colored according to the preferred orientation of each neuron. Pinwheel locations are determined algorithmically and denoted by white circles. Both learning rules result in a biologically plausible pinwheel density within 3% of π. (b) Average selectivity for both learning rules while training with input data corrupted with increasing amounts of Gaussian noise. Selectivity is based on how well a neuron's receptive field can be modeled by any Gabor function. (c) Stability for both learning rules as a function of learning iteration for a range of input noise values. Stability is based on the correlation between the current and final (iteration 20,000) maps.
Both learning rules result in pinwheel densities within 3% of π.
In testing conflict learning, we also observed noteworthy behavior when varying amounts of noise were injected into the input of the system. Fig. 2.10 also shows the results of simulating the orientation selective network for both learning rules under varying amounts of Gaussian noise applied to the input neurons (by adjusting their activation noise term ε; see 2.5.3.5 for details). Fig. 2.10B shows an increased resistance to the effects of noise in the conflict learning results. Hebbian learning more quickly succumbs to a significant drop in the quality of learned receptive fields compared to conflict learning, which only begins to be affected by noise at very high standard deviations. The scoring metric for selectivity is based on how well a receptive field can be represented by any possible Gabor function for all neurons in the network (Olshausen and Field, 1997). Real neurons are subject to many more sources of noise and variability than is present in our modeling, and handling that noise is a fundamental requirement for the nervous system (Faisal, Selen, and Wolpert, 2008). We discuss reasons why conflict learning is less affected by noise in the discussion section.
Using the same stability metric as Stevens et al., 2013, we compare how similar learned receptive fields are at any given time to the final state of the network (Fig. 2.10C). Conflict learning reaches a higher plateau of stability at earlier iterations compared to Hebbian learning. As stability may be important for the development of downstream brain regions (Stevens et al., 2013), earlier stability could decrease the delay between a reliable orientation selective representation and further visual processing. Additional experiments comparing stability across a greater number of iterations did not show any appreciable difference in the time it took to reach stability or the final values. When looking at stability over increasing levels of noise, we again see a resistance to noise in conflict learning that only gives way at high standard deviations.

2.5.2.2 Additional Experiments

Neural networks often have difficulty preserving features that are rarely seen or that have not been seen by the network for a long period of time. This phenomenon has been dubbed 'catastrophic interference' (McCloskey and Cohen, 1989). We performed an additional experiment using the orientation selective network to test the behavior of conflict learning as the input became increasingly biased. In the prior experiment, oriented edges are drawn from a uniform distribution. In this new experiment, the distribution was increasingly biased towards a single orientation. This bias both reduces the amount of training data for many orientations and makes their appearance increasingly infrequent. The dynamics of conflict learning tend to preserve rarely seen input, primarily through the interaction of the spreading and unlearning components of the rule.
The spreading component makes it difficult for a neuron responding to a common stimulus to strengthen connections utilized by a neuron responding to a less common stimulus, because it is unlikely that they will co-occur. Additionally, both neurons will actively force the unlearning of connections to their preferred stimulus type, making it unlikely that the other neuron will learn the same features.

Figure 2.11: Orientation selectivity results under bias. (a) Orientation and selectivity maps for both learning rules under differing amounts of input distribution bias. As bias increases, the input consists more and more of vertical lines. Conflict learning maintains a more even representation of orientation and preserves less frequent orientations even as bias increases. (b) Effects of differing levels of bias on learned maps. The scoring metric is based on the distance between the distribution of learned orientation selectivities and a uniform one, with a lower score being closer to uniform.

2.5.3 Experimental Methods

2.5.3.1 Neuron Activation Details

For all experiments, all model neurons use the same activation function regardless of learning rule. A neuron j has a continuous firing rate x based on integrating weighted inputs:

x_j = f\left( \frac{FF + Lat + (FB \cdot FF^2) + \epsilon}{1 + Inhib},\; \theta_j \right)   (2.25)

where FF, Lat, and FB represent the sum of weighted inputs of all excitatory (weight w ≥ 0) feedforward, lateral, and feedback inputs, respectively. Each sum is calculated as \sum_{i \in type} w_{ij} x_i, where w_{ij} is the weight between neurons i and j. Note that feedback is gated by feedforward input; it cannot activate a neuron in the absence of feedforward driving input. Inhib is calculated by taking the weighted sum of all inhibitory inputs from more strongly active neurons. ε is a noise term sampled from a normal distribution: N(0, σ^2_{noise}). f(x, θ) sets the output to zero if it is less than a threshold value. Thresholds are updated whenever a neuron is active and not inhibited:

\theta = s \cdot FF + (1 - s)\,\theta \quad \text{if } FF \geq \theta_{ff} \text{ and } Inhib < \theta_{Inhib}   (2.26)

where s is a smoothing parameter, θ_{ff} a threshold for considering a neuron active, and θ_{Inhib} a threshold for considering a neuron inhibited. Thresholds are further bounded between a minimum (θ_{min}) and maximum (θ_{max}) value. The minimum is set such that the noise term is unlikely to spuriously activate the neuron.

2.5.3.2 Learning

In the experiments, each model neuron, under either learning rule, learns each type of connection (i.e., feedback, feedforward, and lateral) independently.
Our experiments use a slightly modified version of GCAL (Bednar, 2012) where the threshold works as described in 2.5.3.1, instead of a global target activation based threshold such as that described in Stevens et al., 2013. This change to the threshold resulted in better performance and easier system tuning for both of our experiments. The rule is otherwise the same, using purely Hebbian logic, i.e., (2.4), to determine weight updates, and the activation function described above (2.5.3.1).
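To make the shared activation and threshold behavior of 2.5.3.1 concrete, the following is a minimal sketch paraphrasing (2.25) and (2.26); the thesis implementation is in C++ and differs in detail, and the default parameter values here are simply those listed in Table 2.1:

```python
import numpy as np

def activate(ff, lat, fb, inhib, theta, sigma_noise=0.01,
             s=0.1, theta_ff=0.04, theta_inhib=0.2,
             theta_min=0.04, theta_max=0.7):
    """Shared activation (2.25) and threshold update (2.26), paraphrased.
    ff, lat, fb, inhib are the already-summed weighted inputs of each type."""
    eps = np.random.normal(0.0, sigma_noise)           # noise term
    x = (ff + lat + fb * ff**2 + eps) / (1.0 + inhib)   # feedback gated by feedforward
    x = x if x >= theta else 0.0                        # f(x, theta): threshold to zero

    # Threshold adapts only when the neuron is active and not inhibited
    if ff >= theta_ff and inhib < theta_inhib:
        theta = s * ff + (1.0 - s) * theta
    theta = min(max(theta, theta_min), theta_max)       # bound between theta_min and theta_max
    return x, theta
```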
Additional simulations using the Generalized Hebbian Algorithm and the BCM rule are presented in Fig. 2.3. All connection weights are fixed to 1 with the exception of in- coming modulatory input to the competitive neurons, which adjust their weights using the learning rule being tested. Competition is implemented through lateral inhibition between neurons N 1 and N 2 . All tested learning rules use the same network parameters. The results are averaged over 30 simulations. Each simulation contains 100 presentations of input, which each consist of a uni- formly random modulatory input being active (activation set to 1) while the driving input to both competitive neurons is simultane- ously active (set to 1). The non-active modulatory input is set to 0. The network is exposed to each presentation for 100 iterations, followed by 10 iterations of all zero-valued input before the next presentation. State transitions are computed based on the state the network is in before and after the presentation of an input. A connection is considered strongly learned if it meets the con- ditions of the spreading component of conflict learning (2.8). 2.5.3.4 Border Ownership Network Architecture To analyze our learning rule in feedback contexts, we focus on a model of border ownership similar to that developed by Craft et al., 2007. The network is organized into four layers of cells arranged retinotopically: the input, orientation selective, BO, and grouping layer (Fig. 2.4). For the main BO experiments, we used a 40x50 grids of cells. As training time scales with network size, the smallest network that would still allow interesting stimuli to be presented was used. The network is given grayscale input. The orientation layer receives input from the input layer, and uses fixed log Gabor filters (Field, 1987), parameterized by θ gabor =π and f = √ π, to compute four orientation maps (0, 45, 90, and 135°) representing 2.5 network modeling results 54 a simplified V1-like layer. There are four orientation selective neurons per grid space, giving a total of 8,000 neurons. The output of each of the four angles in the orientation layer below provide input to two BO neurons at the same retinotopic location. These BO neurons are grouped into a column at each location with eight neurons, for a total of 16,000 BO neurons. The neurons within a column have inhibitory lateral connections, initially with equally distributed weight. BO neurons in a column have no concept of their physical position relative to any other neu- ron, nor their border ownership polarity (i.e. left/right, up/down, etc). Initially, without any learning for the lateral connections, neurons within a column are unaware of the neuron with which they will most directly compete to form a BO pairing. Each BO neuron provides feedforward input to all grouping cells within a radius r of its retinotopic position, and receives feedback from the same set of grouping cells. This radius r determines the scale of objects that can be handled by the network. Both the feedforward and feedback connections between these two layers are learned. The grouping layer is much more sparsely populated than both the input and border ownership layers, with roughly 1,000 neurons placed randomly using a Poisson-disc algorithm (Bridson, 2007). Finally, there are lateral connections between grouping neu- rons in a center-surround fashion, extending to 0.6r for excitation, 3r for inhibition. 
Training involves the repeated presentation of a moving square with side length 10, chosen to be slightly smaller than the grouping neuron receptive field diameter 2r (see Figs. 2.4C and D). Squares are given a random initial position and orientation, and are scaled up or down by up to 10% in size. Once placed, squares move in a random linear path across the FOV until no longer visible. Each positioning of a square is presented for 10 time steps to allow the network to settle. The network is given a blank input for 10 time steps after the square is no longer visible. Training is terminated after 40,000 squares are presented, a sufficient amount to show a plateau in the polarity scoring metric, described next.
For evaluation, we compute a polarity vector for each BO neuron, which represents the strength and preferred polarity direction of a neuron. The polarity vector is calculated as the sum of retinotopic vectors, each from the BO neuron to one of the grouping neurons providing it feedback (scaled by weight strength), multiplied by 1 or -1 depending on which side of the BO neuron's orientation they fall. The median absolute difference between the magnitudes of polarity vectors for BO neurons of opposite polarity is then aggregated across all neurons of each orientation preference to give the overall polarity score shown in Fig. 2.5. Significance is established with a Wilcoxon signed rank test. The polarity vectors are also used in Fig. 2.7, where the polarity vectors from neurons in the same column are weighted by activation and summed together to provide the resulting response vectors.
Finally, to compare the Hebbian learning rule versus all possible variants of conflict learning, a smaller 30x30 network is trained for each configuration (Fig. 2.8), using the same methodology as the larger network. For each configuration, we use the vertical orientation as an exemplar, computing a histogram of polarity scores across all vertical BO neurons. The medians for each score are then compared and tested for significance with a Wilcoxon signed rank test.

2.5.3.5 Orientation Selective Network Architecture

The orientation selective network (Fig. 2.9) has three layers: an input layer, a center-surround layer, and an orientation selective layer, like that used to demonstrate the properties of GCAL (Stevens et al., 2013). The center-surround layer consists of both on-off and off-on preferential cells. In order to avoid anti-aliasing issues and a bias towards diagonals caused by square pixels, the resolution of the input is scaled by some amount, s, for the center-surround convolution. On-off cells have a difference-of-Gaussians receptive field, with a sigma of 0.33s for the larger Gaussian and 0.4s for the smaller one. The receptive field for an off-on cell is the negation of an on-off cell.
Orientation selective neurons receive feedforward input from a disc of center-surround neurons (on-off and off-on) within some radius r, initially with equally distributed weight. Learning these connections creates the orientation selective behavior of the neurons. Orientation neurons further have lateral connections in a center-surround fashion to promote grouping and competition. Excitation extends to 0.27r, inhibition to 0.73r. Center-surround and orientation selective neurons are placed randomly using the same Poisson-disc algorithm as used for grouping neurons in the border ownership experiment.
The center-surround and orientation selective layers have approximately 1,600 (800 + 800) and 3,200 neurons, respectively, depending on the randomness of the Poisson-disc algorithm.

Training involves the repeated presentation of an oriented line segment spanning the width of the input layer. Lines are given a random initial position and orientation, and are translated across the field of view (FOV) in a random direction until no pixel of the line can be seen. Each position is held for 10 time steps, which is sufficient for the network to settle. The network is given a blank input for 10 time steps after the line is no longer visible in the FOV. Training is terminated after 20,000 lines are presented, a sufficient amount of time to maximize the selectivity score for a non-noisy network.

Orientations are assigned to neurons by finding the best fitting Gabor function and taking its orientation and coefficient of determination (r²) values, using the MATLAB library knkutils by Kendrick Kay. The orientation is used for the hue in generating the color maps, whereas the coefficient of determination is used for determining selectivity. Pinwheel density is computed on orientation maps using code adapted from Topographica (Bednar, 2015) using the methods described in Stevens et al., 2013.

For the noise and stability measurements, noise is introduced by adjusting the standard deviation of the neuron activation noise term, ε, to σ_noise in the input layer (see results Fig. 2.10 for noise values used). The noise score is the average of the r² coefficients across all orientation selective neurons. For stability, the scoring metric is identical to the metric used by Stevens et al., 2013. We perform a paired-sample t-test to test for significance.

For the biased input experiment (Fig. 2.11), the rotation of the line stimuli is drawn from a normal distribution of mean π/2 and standard deviation σ_bias. The network is evaluated by taking a histogram of learned orientation selectivities over 18 orientations, and comparing the bin sizes to the bin sizes of a flat histogram. The score is then computed as the sum of absolute differences between the ideal (flat) and actual (learned) histograms, with significance computed using a paired-sample t-test.

2.5.3.6 Parameter Listing and Source Code

Key parameters for the learning and activation functions for both Hebbian learning and conflict learning are displayed in Table 2.1. All parameters were tuned for each experiment to maximize performance with both rules in mind.

Table 2.1: Conflict Learning Parameter Listing

  Parameter   Value           Description
  σ_noise     0.01            Standard deviation of noise distribution.
  θ_ff        4σ_noise        Threshold of driving input for neuron to be considered active.
  θ_Inhib     0.2             Threshold of inhibition for neuron to be considered inhibited.
  s           0.1             Smoothing factor for threshold update.
  θ_min       4σ_noise        Minimum threshold value.
  θ_max       0.5 or 0.7      Maximum threshold value. Larger value used for orientation selective network.
  η           0.01 or 0.001   Learning rate. Lower value used by Hebbian learning.
  β           1.0             Balances positive versus negative learning for conflict learning.

The experiments were performed using a custom framework written in C++ explicitly for conflict learning, with some analysis of results performed using MATLAB or Python scripts. All learning rules tested were implemented in this same framework. Source code is available on the website for conflict learning (Grant, Tanner, and Itti, 2016).
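For reference, a minimal Python sketch of the biased-input histogram score described above follows. Whether the histogram is taken over raw neuron counts or normalized proportions is an assumption here, as is the helper name.

    import numpy as np

    def orientation_bias_score(orientations, n_bins=18):
        """Sum of absolute differences between the histogram of learned
        orientation selectivities and a flat histogram; lower scores indicate a
        more even coverage of orientation. `orientations` is in [0, pi)."""
        counts, _ = np.histogram(orientations, bins=n_bins, range=(0.0, np.pi))
        flat = np.full(n_bins, counts.sum() / n_bins)   # ideal, perfectly even bins
        return np.abs(flat - counts).sum()

    # Example: selectivities drawn from a distribution biased towards pi/2.
    rng = np.random.default_rng(1)
    biased = np.clip(rng.normal(np.pi / 2, 0.3, size=3200), 0.0, np.pi - 1e-6)
    print(orientation_bias_score(biased))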
2.6 discussion

Typically a learning rule is devised with a specific activation function in mind, so it may not seem surprising that the Hebbian learning rule we compare against was unable to learn a model of border ownership dependent on modulatory connections. However, the orientation selective network, in which there is no source of modulatory input, served as a comparison of the two learning rules in a setting where the activation function was as expected by the Hebbian rule, yet was still compatible with conflict learning, which was designed around the presence of modulatory input. In Sections 2.3.1, 2.4.3, and 2.4.4, we demonstrated that unlike conflict learning, for even a minimal network with modulatory connections, neither the normalized Hebbian rule, the Generalized Hebbian Algorithm, nor BCM are capable of stably learning modulatory weights.

We suggest that this is because all of these variants of Hebbian learning are based on a core principle of associative learning, which alone seems incompatible with modulatory input. Our computational experiments suggest that a synapse does not have enough information as to how a weight should be adjusted using only incoming activation compared with the output activation of the cell. Even learning rules like BCM, which control plasticity via an adaptive threshold based on expected activation, do not solve the problem, because they do not draw on any additional sources of information. We hypothesize that additional control signals are required to support modulatory connections, where the incoming activation may be coincident with the firing of the cell, but not relevant. Conflict learning uses two additional sources of information for these signals: the activation of strongly learned synapses within the cell, and inhibitory input driven by competing neurons. Strongly learned connections identify relevant firing, while inhibition partitions firing by indicating that a neuron is losing a local competition amongst connected neurons.

We demonstrated through computational models that using inhibition as a control signal results in a partitioning of correlated firing in modulatory input amongst competing neurons. Our results (e.g., Fig. 2.8) suggest that lowering activation through inhibition is insufficient to prevent unwanted learning from taking place; inhibition must actively drive the partitioning of modulatory input through unlearning. Additionally, we demonstrated that restricting learning based on the activation of strongly learned connections results in a successful clustering of correlated firing amongst modulatory input to an individual neuron. This behavior is complementary to the partitioning performed by the inhibitory control signal, resulting in neurons which compete over correlated firing of incoming connections, regardless of whether they are sourced from driving or modulatory input. These two components of conflict learning, together with short- and long-term learning, will be discussed and related to experiment in detail in the following section of the discussion.

Although it may be the case that a different learning rule could govern driving versus modulatory connections, we think there is some elegance in a single set of principles being compatible with both types of excitatory connections. Conflict learning does not directly address the plasticity of inhibitory connections, which likely do operate with a different set of mechanisms.
In fact, conflict learning cannot be used for learning inhibitory connections because of its reliance on inhibition as a control signal (see 2.4).

2.6.1 Analyzing the Rule

The results demonstrate that for certain patterns of connections and firing, traditional Hebbian learning mechanisms are ill-suited for adapting synaptic weights. This was seen directly in learning a model of border ownership, where only conflict learning was able to properly learn the required modulatory feedback connections to perform the computation correctly. Additionally, conflict learning operates in a biologically consistent manner even in situations lacking these types of connections, with the pinwheel density of the learned orientation selective network matching biology as well as other learning rules. The orientation selective network experiments also show interesting properties with regards to increased stability and robustness to noise. All of these results are a product of the three complementary components that make up conflict learning, introduced in Section 2.4, which we now discuss in detail.

2.6.1.1 First Component - Spreading

The first component of conflict learning states that neurons cannot strengthen their connection weights unless an already strongly learned connection is currently active. In the border ownership experiments, the spreading component helps prevent neurons of a border ownership pair from associating with grouping neurons on both sides of their oriented edge. While populations of grouping neurons on both sides are individually co-active with a BO neuron, there is little to no correlation between the firing of the distant populations themselves. A Hebbian neuron cannot detect this distinction, whereas a conflict learning neuron can. This is illustrated most clearly in the learned receptive fields of the border ownership experiment, seen in Fig. 2.5, as well as by the simple network of Section 2.4.1.

Spreading is similar to the concept of associative LTP (long-term potentiation), where the strong firing of a learned synapse supports the strengthening of a weaker one (Linden and Connor, 1995; Shouval, Samuel, and Wittenberg, 2010). The notion of strong synapses could potentially be encoded via synaptic tagging (Redondo and Morris, 2011). There has been discussion on the spatial requirements (Engert and Bonhoeffer, 1997) as well as temporal constraints (Levy and Steward, 1983) of synapses involved in associative LTP, suggesting that it is both a spatially and a temporally local process. Since we do not model the physics of our synapses, we use only a temporal constraint. This means that once a neuron has begun to associate with certain connections, any further connections it strengthens must co-occur with the existing ones, which forces connection weights to smoothly spread outward through feature space from an initially learned pattern.

In situations where initial conditions allow competing neurons to learn the same set of connections (analogous to the 2SL-Shared state described in Section 2.4.1), the spreading component, if used without the unlearning component, would make it impossible for the neurons to disentangle their learned features. In Fig. 2.8B, the two BO neurons are correctly learning on only one side of the boundary, but have no mechanism to prevent them from learning and spreading to the same features. This effect is exacerbated when combined solely with the long-term statistics used by the SLT component
(Fig. 2.8F), which compounds the mistaken initialization over time.

Our method of labelling connections within a single neuron as strongly learned (2.8) is a simple abstraction intended to capture the behavior, but not the exact biological implementation, of such a mechanism. It has been demonstrated that the soma can back-propagate signals to its dendrites for the purpose of manipulating thresholds (e.g., Larkum, 2013) and that individual dendrites display a wide array of active properties such that one synapse can affect the behavior of many others (Major, Larkum, and Schiller, 2013). Such mechanisms could also be responsible for the manipulation of a learning threshold affecting synaptic plasticity. Therefore the spreading component, in a real neuron, would likely be implemented through a variety of adaptable thresholds as opposed to the simple activation strength based product that we employ.

In the context of modulatory connections, the spreading component is essential to enabling a neuron to identify a population hidden within the many correlated activations of its inputs. In networks without modulatory feedback, the spreading component gives increased resistance to the effects of noise (Fig. 2.10) by lessening the impact of spurious activation as it is unlikely to consistently coincide with the strongly learned connections.

2.6.1.2 Second Component - Unlearning

In conflict learning, inhibition, in addition to reducing the activation of a neuron, causes the neuron to directly unlearn its active connections. This is in contrast to a typical Hebbian learning rule which still allows positive learning to take place, dependent on the activation. It is also distinct from examples of explicit synaptic weakening in BCM-like rules or STDP, which use activation or timing to control the unlearning. In conflict learning, a neuron can be strongly active but still undergo unlearning if its inhibitory input is high enough. In the border ownership experiments, inhibition primarily occurs between pairs of border ownership cells competing over feedback from grouping cells on either side of their local oriented boundary. Unlearning helps correct mistaken assignments within a BO pair, ultimately resulting in a near even split along the polarity boundary (Fig. 2.5). Mistaken activation close to the boundary will be frequently contested and thus unlearned by both cells in the pair.

There is significant evidence of complex interactions between inhibition and excitation in the brain. Wang and Maffei, 2014 found that inhibition controlled the sign of excitatory plasticity in rat visual cortex, which is remarkably similar to our unlearning component, via crosstalk between inhibitory and excitatory signaling. Fino et al., 2010 found that the presence or lack of inhibition could reverse the classic STDP window, causing either LTP or LTD (long-term depression) to occur. Additionally, in a recent review on inhibitory plasticity, Vogels et al., 2013 emphasize the increasing evidence that excitation and inhibition are deeply intertwined, with inhibition potentially providing a mechanism that allows selective learning to occur.

Unlearning through inhibition allows one neuron to force another to unlearn common connections between the two, causing the inhibited neuron to return to an initial unlearned state, at which point it is possible to learn a different population of input.
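To make the two control signals discussed so far concrete, the sketch below shows how inhibition-driven unlearning and the spreading gate could be combined in a single weight update. It is a schematic illustration only: the short- and long-term dynamics, weight normalization, and the exact strongly-learned criterion of equation 2.8 are omitted, and the parameter values merely echo Table 2.1.

    import numpy as np

    def conflict_update(w, x_pre, x_post, inhib, strong, eta=0.01,
                        theta_inhib=0.2, beta=1.0):
        """Schematic sketch of the spreading and unlearning control signals.

        w      : (N,) incoming weights of a single neuron.
        x_pre  : (N,) presynaptic activations.
        x_post : postsynaptic activation.
        inhib  : total inhibitory input to the neuron.
        strong : (N,) boolean mask of connections currently labelled strongly learned.
        """
        if inhib > theta_inhib:
            # Unlearning: sufficient inhibition actively weakens the active
            # connections, even if the neuron itself is firing strongly.
            return np.maximum(0.0, w - beta * eta * x_pre * x_post)
        if np.any(strong & (x_pre > 0)):
            # Spreading: positive learning is permitted only while at least one
            # strongly learned connection is co-active with the input.
            return w + eta * x_pre * x_post
        return w  # otherwise the weights are left unchanged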
This unlearning behavior was first seen in analysis of the simple network (2.4.2), where the unlearning component is the primary mechanism by which a 2SL-Shared state is made unstable. A consequence of this component is that when a neuron competes for features it actively weakens competitors, leading to a greater separation in feature space (weight values) between the neurons (Fig. 2.8C). When combined with the spreading component in competitive groups of neurons (such as the mutually inhibitory groups of neurons in a column), neurons learn in a smooth yet competitive fashion (Figs. 2.8G and 2.8H). The neurons identify populations in the feature space and slowly expand their receptive fields until they have no more correlated connections to learn or they are faced with competition from another neuron. In the orientation selectivity experiment, unlearning enforces a greater difference in connection weight strength between the features learned by each neuron, meaning responses are more stable and higher levels of noise can be introduced without confusing the input pattern.

2.6.1.3 Third Component - Short and Long Term

In conflict learning, all neurons have an externally visible short-term weight as well as an internal long-term weight. The two weights constantly pull on each other until they settle to the same value, with the rates at which they move towards each other controlling how quickly a neuron adapts its weights and how steadfast it becomes in its decisions. This short- and long-term learning, or SLT, allows neurons to quickly associate with populations in their input while remaining sensitive to long-term trends. In the border ownership network, this ability to be initially flexible but stable in the long run leads to more neurons learning significantly better separation along BO neuron boundaries (Fig. 2.8I). We found SLT to be especially beneficial for feedforward connections, where capturing long-term statistics is useful (e.g., BO feedforward connections). SLT, used alone, works essentially like Hebbian learning (Fig. 2.8D), but when combined with the other two portions of the rule, leads to a significant improvement and consistency of learned receptive fields (Fig. 2.8H). This increased consistency can also be seen in Fig. 2.8E compared to Fig. 2.8C, which differ only by the inclusion of SLT.

The physiological underpinnings of multi-timescale learning are notably discussed by Zucker and Regehr, 2002, who review the dynamics of short-term learning, Abbott and Nelson, 2000, who review synaptic redistribution and the interplay between short- and long-term potentiation, and Grossberg, 2013, throughout his extensive development of adaptive resonance theory.

2.6.2 Implications for Plasticity

Our results, accompanied by physiological evidence for the mechanisms we have described, suggest that similar mechanisms are likely used in the brain for the learning of modulatory connections. By acting as an error signal to instigate unlearning, inhibition can dynamically alter plasticity and encourage diversification amongst competing neurons, and by requiring strongly learned connections to be active for learning, spreading allows for the detection of correlated clusters of activation within non-driving inputs. Our model of primary visual cortex shows that such mechanisms do not interfere with learning in more traditional contexts lacking modulation.
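Before turning to predictions, the short- and long-term component of section 2.6.1.3 can be summarized with a minimal sketch of the mutual pull between the two weights; the rates and the update order here are illustrative assumptions, not the values used in the experiments.

    def slt_step(w_short, w_long, rate_short=0.1, rate_long=0.01):
        """One relaxation step of the short-/long-term weight pair: the externally
        visible short-term weight is pulled towards the hidden long-term weight
        and vice versa, so the pair eventually settles to a common value."""
        w_short += rate_short * (w_long - w_short)
        w_long += rate_long * (w_short - w_long)
        return w_short, w_long

    # Starting from a fast, possibly mistaken short-term adjustment, the two
    # weights gradually converge; the asymmetric rates make the neuron quick to
    # adapt but slow to abandon its long-term decision.
    w_short, w_long = 0.9, 0.2
    for _ in range(200):
        w_short, w_long = slt_step(w_short, w_long)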
Based on these results, we predict that neurons likely have the key mechanisms of conflict learning: the ability to adjust their plasticity based on a concept of synaptic strength, and the usage of inhibition as a control signal for unlearning. These concepts could be tested in actual neurons with a series of simple experiments on single neurons. For all of these proposed experiments, we assume that a single neuron has learned a preferred stimulus such that it has increased synaptic strength towards the inputs associated with that stimulus. Inhibition is assumed to originate through interaction with other neurons (e.g., inhibitory interneurons; Markram et al., 2004). Conflict learning predicts that inhibition has additional effects on plasticity if its presence lowers the activation without completely suppressing its firing.

If inhibition serves as a signal that unlearning should occur, the strength of the synapses associated with the learned input should decrease when inhibition is applied simultaneously to the driving input. As noted in Section 2.6.1.2, there is existing evidence that this is indeed a potential role for inhibition. A classical Hebbian theory, such as any of the rules discussed in Section 2.3.1, or STDP, would predict no decrease in synaptic strength in such a situation.

To establish the existence of a behavior similar to the spreading component, a new, independent source of input could be applied while artificially activating the neuron. Conflict learning predicts that the lack of activation of the already learned input will prevent or significantly impair the learning of the novel input. Existing Hebbian rules predict that synaptic strength towards the new input should increase unimpeded.

Finally, the interaction of these two components could be tested in a combination of the two experiments. While driving the neuron via its preferred stimulus and supplying a sufficient source of inhibition, additionally apply a new, independent source of input. In this situation, conflict learning predicts that the neuron will not increase synaptic strength towards the novel input, even though it is presented simultaneously with the learned input. This prediction arises from the proposed role of inhibition, which in this situation would cause all active inputs to the neuron to have their synaptic strength decreased. A classical Hebbian rule here would predict that the synaptic strength to the novel input would increase.

Furthermore, if inhibition is indeed a necessary component for learning modulatory connections, it follows that modulatory connections (and thus a majority of feedback) must develop to maturity alongside inhibition. The balance between excitation and inhibition is a drawn-out process controlled by experience (Froemke, 2015), and a potential additional reason for this delayed maturation could be explained by a dependence between inhibition and feedback.

2.6.3 Learning Border Ownership

As mentioned in 2.5.1, the border ownership network architecture we present is not fully drawn from physiological observations. Our results do not definitively rule out that a Hebbian based rule, under some alternative network configuration, could reproduce the behavior of border ownership.
However, given the prevailing theory that the computation of border ownership is dependent upon feedback (Kogo and Ee, 2014), along with the argument presented in Section 2.3 demonstrating Hebbian learning's incompatibility with modulatory connections, it seems unlikely that a Hebbian learning rule could learn a feedback-based border ownership network. Additionally, through our experiences developing conflict learning, we believe that any network configuration compatible with purely Hebbian learning would be overwhelmingly complex and likely not support stimulus driven learning.

As briefly discussed in 2.5.1, the network architecture used here, although only applied to border ownership, is not specifically tied to computing this one feature. The network has no a priori information about borders or specific relationships between neurons. Border ownership is instead an emergent property of the network given competition over orientation responses coupled with higher level grouping. A deepened hierarchy composed of the same type of competition and grouping may potentially lead to the computation of higher level features more akin to proto-objects (for a discussion of proto-objects, see Heydt, 2015), and is a target for future work.

2.7 conclusion

In developing conflict learning, we have shown how existing mechanisms already found in the brain can interact together to provide substantial benefits in learning and allow the learning of modulatory connections. We have demonstrated the effectiveness of conflict learning by showing, for the first time, how a model of border ownership might be learned through experience. This new rule could additionally be beneficial for modeling many brain functions, including figure-ground segmentation, top-down attention, and object recognition, which may all benefit from top-down modulation. As we uncover more details of the mechanisms governing neural plasticity, models capable of incorporating this new information, such as conflict learning, become increasingly necessary.

3 AMBIGUITY

3.1 overview

The previous chapter of the thesis developed conflict learning, a new unsupervised learning rule that enabled, for the first time, a feedback driven model of border ownership to be learned through experience. This chapter takes the fundamental ideas used in the construction of that model and formalizes them into a new neural network architecture called the competitive column. This column structure, when combined with conflict learning, enables the learning of features invariant to scale, rotation, and translation. The competitive column is used to create an invariant model of border ownership that is trained on simple shapes yet generalizes to multiple scales and complexities of input, including a demonstration on natural images. The trained network has on the order of 54,000 model neurons and 18 million synapses that participate in learning. A new notion of ambiguity is introduced that is core to the ability of the network to handle challenging input.

3.2 introduction

Although some models for border ownership contain some degree of scale invariance, the model described and learned in the previous chapter had no built-in capability for explicit scale invariance. This can most easily be seen by looking at polarity assignments of the triangle input in figure 2.7E, where the corners of the triangle have dramatically lower responses. There has been very little effort to learn or explain scale invariance in neural models.
Typically scale is addressed by using a feature pyramid, where the input itself is scaled and fed through the network at multiple scales before the resulting output is integrated into a single response (e.g., Dollár et al., 2014; Itti, Koch, and Niebur, 1998; Lowe, 1999). This technique is often useful but does not correspond to a plausible explanation for how the brain is scale invariant, and does not allow for much interaction between different scales. In object recognition systems, if scale is even considered, it is often handled by duplicating features at varying resolutions and then performing some kind of pooling or max operation to select a single winner (e.g., Serre, Wolf, and Poggio, 2005), in what essentially becomes a scale pyramid over features. Another approach is to utilize a purely hierarchical formulation, where larger objects are composed of pieces of smaller objects (e.g., Sabour, Frosst, and Hinton, 2017).

The first contribution of this chapter is a solution for scale invariance that leverages both approaches and takes influence from the highly interconnected wiring of the visual system (Markov et al., 2014). The resultant model is a hierarchical cascade with the addition of connections between every layer of the network, much like the concept of skip connections which has become popular in deep learning (He et al., 2016), an idea previously referred to as shortcut connections (Bishop, 1995). This is formulated into a model called the ‘competitive column’.

To address the lack of scale invariance in the border ownership model, it will be modified to support scale by adding new grouping cells with larger receptive fields, called proto-object cells. Since BO neurons and grouping neurons have no a priori relation to each other, the receptive field size of BO neurons must also be increased to allow for reciprocal connections to develop to and from the proto-object cells. The challenge is to ensure that neurons with larger receptive fields indeed learn scaled up features and do not use their increased pool of connections to learn features better suited for a neuron with a smaller receptive field size. An additional challenge is getting a useful interaction between the different scales.

The second contribution of this chapter is the development of a new notion of ambiguity that dampens the activity of neurons that cannot reach a reliable consensus. In the figure-ground system of Layton, Mingolla, and Yazdanbakhsh, 2015, which is one of the few systems to use scale in a non-trivial fashion, larger scale responses inhibit smaller scale responses. The updated model here will utilize inhibitory feedback in the computation of ambiguity, which can ultimately inhibit the activation of a neuron in a similar fashion.

These contributions will be demonstrated on an improved border ownership network which is highly invariant to scale, rotation, and translation. The network is trained on a small subset of shapes yet shows generalization to larger shapes with ambiguous local features. Additionally, the network is demonstrated on a small set of natural images, showing promising generalizability and applications for future work.

3.3 the competitive column model

This section details both the network structure as well as the activation dynamics for neurons. The learning rule used by the neurons, conflict learning, is detailed in the previous chapter.
The rule is unchanged other than some minor implementation details to allow it to work with inhibitory connections, which are represented by negative weights in the model.

Supporting scale invariance requires two fundamental improvements over the model in the previous chapter: the first is the formalization of the competitive column and associated wiring, while the second is a new activation dynamic called ambiguity which is used to resolve locally ambiguous input. These changes are further supported by improvements to the neuron activation model.

3.3.1 The Competitive Column

The organization of neurons into columns to support competition has long been a component of more biologically plausible models of the visual system (Fukushima, 1980). In the primary visual cortex neurons are arranged in perpendicular slabs such that neurons with similar receptive fields but different orientation preferences are adjacent to each other (Blasdel and Salama, 1986; Hubel and Wiesel, 1974). In this sense the column is more of a conceptual grounding to describe organization. The column structure developed here, called the competitive column, is a way to organize neurons such that a diversity of features can be learned while maintaining invariance to rotation, translation, and scale. An illustration of the model is shown in figure 3.1. Much like the columns used by Fukushima, 1980, neurons here will be organized into a column if they are within some radius of each other. Though the mental model of a column often (and indeed here as well) has neurons stacked in a cylinder, the neurons need not actually be laid out like this.

Competitive columns have winner-take-all dynamics such that only one neuron can be dominant in activation at a time. Neurons that lose out to more active neurons have their activation diminished but not extinguished. This property, which is seen in opposing polarities of actual border ownership responsive cells (Zhou, Friedman, and Von Der Heydt, 2000), is essential for the correct operation of conflict learning. Conflict learning utilizes the competition in the column to drive differentiation in the learning of modulatory features.

[Figure 3.1 illustration: panels A–C showing a column of N mutually inhibitory (winner-take-all) neurons, the column and lateral radii, and direct and skip excitatory feedforward, excitatory feedback, inhibitory feedback, and lateral connections across layers L−2 through L+2.]

Figure 3.1: The competitive column and associated network topology. A competitive column is an organization for neurons into local units of competition that receive driving feedforward, modulatory lateral, and modulatory feedback input. A: Columns are created by wiring together neurons in a local neighborhood with mutual inhibition such that a winner-take-all like network is created. The winner-take-all dynamics are such that losing neurons are not fully deactivated as a result of competition. Columns also have lateral connections to other neurons residing in other columns within a larger local neighborhood. B: A diagram showing the types of input a column receives. Feedforward and feedback input comes both from immediately adjacent layers in the hierarchy (direct) as well as distant layers (skip). Feedforward and feedback that trickles through layers in a cascade also ultimately affects the column by influencing the direct input it receives. C: A diagram showing how the column receives input from other layers in a hierarchy.
Input is received from directly adjacent layers as well as from every distant layer. Input thus follows a pattern of coming from increasingly large receptive fields with increasingly weak weights. Input also arrives from within the same layer in the form of lateral connections.

The model developed here bears some resemblance to the selective tuning attention model of Tsotsos et al., 1995 in that both utilize a hierarchical framework with winner-take-all dynamics. Their model differs in that it uses a backward pass of these dynamics to select features relevant or salient to the uppermost winning feature, pruning other activations away, whereas the model developed here has constant winner-take-all dynamics in every column.

Neurons within columns also have lateral connections that extend outside the column to other nearby neurons, which are organized into other competitive columns. The weight distribution of these lateral connections determines the overall topology of a layer. Conflict learning works by using inhibition as an error signal for a set of weights. The inhibition inside the column is used to drive differentiation over modulatory input. Inhibition received through lateral connections is used to drive differentiation over driving input, which is feedforward. The inhibition affects the neuron activation in the same fashion as that received from within the column; neurons can only be inhibited by those that are more active than them. In the previous chapter, an orientation selective model used center-surround lateral connectivity to cause neurons to develop a smooth pinwheel-like configuration analogous to that often seen in mammalian primary visual cortex. Jain, Millin, and Mel, 2015 demonstrated that the structure of lateral connectivity is key to differentiating between these V1-like maps and more differentiated ‘multi-maps’ where a multitude of different features can be learned. Thus the role of lateral connections in the competitive column model is to drive the overall configuration of the learned feedforward features of a layer.

Columns receive feedforward and feedback input from every other layer in the network. As the distance increases, so does the receptive field size. This is accompanied by a similar reduction in weight, much like what is seen in real neurons (Markov et al., 2014). This is key to the learning of scale invariance. The network thus combines a typical hierarchical cascade with the ideas of a scale pyramid. Neurons at deeper levels have large receptive fields over earlier levels, which lets them learn a mixture of ‘scaled up’ features along with more ‘parts like’ features composed of input from intermediary layers.

Since feedback is modulatory and weights decrease with distance, neurons can exert a great deal of control over their received feedback by inducing neurons in adjacent layers to deactivate. A neuron that receives feedback from the next layer likely also supplies driving input to that layer, and this allows for interesting dynamics to occur between different scales in the network. This will be key to the operation of ambiguity, developed in section 3.3.2.

Figure 3.2: A classical locally ambiguous shape for the assignment of border ownership. The top and bottom corners of the c shape have identical local features to the concavity missing from the middle. Each region has three edges making a nearly closed convexity and has the same number of corner features supporting the interior of an object.
Without explicitly modeling the entire object, the assignment of edge polarities is one that requires a way of quantifying the local ambiguity of decisions and moving the network towards an unambiguous assignment.

3.3.2 Ambiguity

Border ownership is an example of a visual process that requires the resolution of locally ambiguous information to make a correct decision regarding edge polarities. Sometimes this ambiguity is due to a bi-stable representation of figure and ground, such as the famous Rubin's vase (figure 1.10). More often, this ambiguity can arise as a consequence of local features of an object, since the neurons that respond to border ownership have limited receptive field sizes yet must make a decision dependent on global context. However, the problem of resolving ambiguity is not one that can simply be pushed to a deeper level in a hierarchy or a larger scale; eventually there will be a decision between multiple choices where there appear to be several good decisions given the current state of the network. Thus a useful method of breaking ambiguity should operate at every level of computation, and indeed at every neuron. Further, this issue is not isolated to the problem of border ownership but is rather a general problem faced when making decisions: there are often many competing choices that cannot be decided on without the influence of some additional factor (consider situations like deciding on a restaurant to eat at or dealing with conflicting navigation suggestions while driving a car).

A prototypical example of this problem for border ownership is the ‘c shape’, which has a concavity with the same local features as the top and bottom portions of the shape (illustrated in figure 3.2). Given the current model for border ownership, which uses grouping cells to collect local evidence of objectness, strong responses will be elicited both within the concavity as well as within the actual c shape. These will reinforce decisions made at the border ownership layer, and ultimately it will not be possible to make a reliable polarity choice along the concavity. As mentioned earlier, even deepening the hierarchy, as will be done here by the addition of proto-object neurons with larger receptive field sizes, will not break this ambiguity; at a slightly larger scale this object is still highly symmetric and ambiguous.

The proposed way to solve this problem is to come up with a measure of how ambiguous the activation of a neuron is and then dampen its activity. In essence, the ambiguous neurons will cease to contribute to the state of the network, allowing unambiguous neurons to reach consensus. The border ownership neurons are ambiguous because the neurons for each polarity are receiving strong feedback from grouping cells on opposing sides of the object. However, this does not give an individual neuron the ability to decide it is ambiguous; consider that before these neurons even receive feedback, their common feedforward driving input will put them into a similar state of high activation which is resolved through competition and the contribution of noise to drive an initial winner. It is not until these neurons both become reinforced in their decisions through modulatory feedback that they can be said to be ambiguous.
To give neurons a way to measure this individually, inhibitory feedback was introduced to the model. Neurons are expected to learn a preferred feedback stimulus (from a single polarity) and a non-preferred stimulus (from the opposite polarity). Although the model of Craft et al., 2007 already proposed using feedback inhibition, inhibitory feedback alone is both insufficient to resolve this problem and also unnecessary to compute border ownership in largely unambiguous inputs, as the previous chapter showed.

It is the balance of excitatory and inhibitory modulatory feedback that provides the best measure of whether a neuron is in an ambiguous state. A neuron receiving a high amount of both types of input is receiving information that neurons deeper in the hierarchy think that it should both be highly active and highly inhibited. This is precisely what causes an ambiguous assignment of border ownership: high feedback from grouping neurons on competing polarities.

Simply using the net contribution of the excitation and inhibition is insufficient to address the problem. Regardless of whether the inhibition contributes in a subtractive or divisive way to the activation of the neuron, it still remains that both competing neurons within the column are receiving essentially the same excitation and inhibition. To resolve this, ambiguity is defined to be the minimum of the modulatory excitation and inhibition. This way, a neuron that receives high amounts of both is considered to be ambiguous, while a neuron that receives a high amount of either excitation or inhibition is unambiguous. Additionally, a neuron receiving no modulatory input is also unambiguous. Ambiguity can be summed up by: too many decisions lead to indecision. Ambiguity is detailed mathematically in section 3.3.3, and a demonstration of border ownership assignment on the c shape with and without a notion of ambiguity can be found in section 3.5.1.

3.3.3 Neuron Activation

A model neuron j has a continuous firing rate x based on integrating weighted inputs:

  x_j = f( g( (FF + (Lat · FF/2) + (FB · FF/2) + ε) / (1 + Inhib + Ambiguity) ), θ_j^fast )    (3.1)

where FF, Lat, and FB represent the sum of weighted inputs of all feedforward, lateral, and feedback inputs, respectively. Each sum is calculated as Σ_{i∈type} w_ij · x_i, where w_ij is the weight between neurons i and j. Note that feedback and lateral connections are gated by feedforward input; they cannot activate a neuron in the absence of feedforward driving input. ε is a noise term sampled from a normal distribution: N(0, σ²_noise). g(x) is a gain control function that dampens any activation that exceeds 1.0 and also applies a gain normalization term γ:

  g(x) = ( min(x, 1.0) + max(0.0, log₁₀ x) ) / γ    (3.2)

The desired activation range for neurons is [0, 1.0], though neurons are allowed to exceed this upper level of activation temporarily. The dampening function g(x) does nothing to activations within the nominal range but pushes over-activation back towards 1.0. Over-active output is thus quickly quelled over several network iterations. The goal of the gain normalization term, γ, is to enable balanced winner-take-all dynamics to occur within a competitive column. It is essential that the activations of neurons within the column can be compared fairly, and this can only happen if the neurons are operating in the same range of activation.
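A compact sketch of equations 3.1 and 3.2, including the ambiguity term, is given below. The halving of the feedforward gating on the modulatory terms follows the reconstruction of equation 3.1 above and should be read as an assumption, as should the argument names and the treatment of inhib as the total inhibitory input.

    import numpy as np

    def ambiguity(fb_excite, fb_inhib):
        """Ambiguity is the minimum of the modulatory excitation and inhibition a
        neuron receives: a lot of both signals conflicting feedback, while a lot
        of either alone (or none at all) is unambiguous."""
        return min(fb_excite, fb_inhib)

    def activation(ff, lat, fb_excite, fb_inhib, inhib, gamma, theta_fast,
                   sigma_noise=0.01, rng=np.random.default_rng()):
        """Sketch of equations 3.1 and 3.2: feedforward drive, feedforward-gated
        modulatory input, divisive inhibition and ambiguity, gain control, and a
        fast threshold on the output."""
        eps = rng.normal(0.0, sigma_noise)
        drive = ff + (lat * ff / 2) + (fb_excite * ff / 2) + eps
        x = max(drive / (1.0 + inhib + ambiguity(fb_excite, fb_inhib)), 0.0)
        damped = min(x, 1.0) + (np.log10(x) if x > 1.0 else 0.0)   # eq. 3.2 numerator
        out = damped / gamma
        return out if out >= theta_fast else 0.0

    # An ambiguous neuron (high excitatory and inhibitory feedback) is dampened
    # relative to one whose feedback is mostly excitatory.
    print(activation(ff=0.8, lat=0.1, fb_excite=0.6, fb_inhib=0.6, inhib=0.6,
                     gamma=1.0, theta_fast=0.04))
    print(activation(ff=0.8, lat=0.1, fb_excite=0.6, fb_inhib=0.1, inhib=0.1,
                     gamma=1.0, theta_fast=0.04))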
This gain normalization is much the same effect as homeostatic synaptic scaling within real neurons (Turrigiano, 2008), and works similarly to the normalization model proposed by Heeger, 1992. To achieve this in the model, γ is set proportionately to the highest activation of any neuron in the column: γ = γ_{t−1} · max_{j∈column} x_j. Another effect of the gain normalization term is to effectively change the balance of driving versus modulatory weight in controlling the activation of a neuron. Since over-activation through the modulatory input increases γ, after normalization occurs the modulatory input is potentially responsible for a larger proportion of the output. Note that gain control mechanisms like γ must be averaged over time to avoid rapid oscillations caused by changing activation values with each model iteration (Heeger, 1992). A simple exponential average (see equation 3.3) suffices for this.

f(x, θ) sets the output to zero if it is less than a threshold value. Thresholds are described in section 3.3.4. Thresholds are further bound between a minimum (θ_min) and maximum (θ_max) value. The minimum is set such that the noise term is unlikely to spuriously activate the neuron.

3.3.4 Thresholds

Neurons have adaptive thresholds that control whether their input is high enough to trigger activation (Nicholls et al., 2001). Often a model will set neuron thresholds to be a rolling average of activation (e.g., Stevens et al., 2013), but there are disadvantages to such a mechanism. If a long period passes without a neuron receiving input, it can begin to lower its threshold and fire for non-preferred stimuli. This contributes to a network forgetting its learned weights when examples of certain features are sparse. Additionally, a typical threshold does not work well when the input to a neuron changes dynamically over a short period of time. As the model neurons in this work receive feedback and even feedforward inputs at different timesteps, it is essential that the threshold can capture this change in activation and still ensure the input is within some expected range. Thus a threshold needs to behave much like the weights in conflict learning: it needs a long-term stable value but also needs to be adaptable on a short timescale.

To address both of these issues, a new threshold scheme was created that utilizes the short- and long-term weights in conflict learning. Neurons have three thresholds based on long-term activations x_j^ltm (i.e., activations calculated using the long-term as opposed to short-term weights) of their driving input:

• θ_j^max: a rolling average of the maximum long-term activation
• θ_j^active: a rolling average of turn-on long-term activation
• θ_j^decay: a rolling average of sub-threshold long-term activation

These three thresholds effectively classify the activation of the neuron into three states: an active regime, where the long-term activation exceeds θ_j^active, a sub-threshold regime, where the long-term activation is between θ_j^active and θ_j^decay, and a decay regime, where the long-term activation is less than θ_j^decay. These three thresholds, much like the long-term weights, are ‘hidden’ state, in that they do not directly control whether the neuron activates. A fourth, faster moving threshold, θ_j^fast, controls whether the neuron actually activates based on its short-term activation x_j.
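A minimal sketch of this regime classification, together with the exponential (rolling) average on which the update rules below are built, follows; the function names are illustrative.

    def exp_avg(theta_old, theta_new, s):
        """Exponential average used for all threshold updates (equation 3.3 below)."""
        return (1.0 - s) * theta_old + s * theta_new

    def classify_regime(x_ltm, theta_active, theta_decay):
        """Classify a neuron's long-term driving activation into one of the three
        regimes described above (theta_max is tracked separately as a rolling
        maximum and does not gate the classification in this sketch)."""
        if x_ltm >= theta_active:
            return "active"
        if x_ltm >= theta_decay:
            return "sub-threshold"
        return "decay"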
Thresholds are updated using an exponential average function:

  avg(θ_old, θ_new, s) = (1 − s)·θ_old + s·θ_new    (3.3)

which moves the threshold towards the new target using a smoothing factor s. The thresholds are updated as follows.

When the neuron is in the active regime, θ_j^max is adjusted to avg(θ_j^max, x_j^ltm, s). θ_j^fast is adjusted to avg(θ_j^fast, θ_j^max, s). The smoothing factors are chosen such that these thresholds rise rapidly and fall slowly.

For the sub-threshold regime, if x_j^ltm < θ_j^max, θ_j^active is adjusted to avg(θ_j^active, x_j^ltm, s). If x_j^ltm < θ_j^active, θ_j^decay is adjusted to avg(θ_j^decay, θ_j^active, s). The smoothing factors are chosen such that these adjustments occur slowly.

Finally, in the decay regime, θ_j^decay is adjusted to avg(θ_j^decay, θ_min, s) and θ_j^fast is adjusted to avg(θ_j^fast, θ_j^active, s). The smoothing factors are chosen such that the passive decay rate is very slow, while the reset of θ_j^fast occurs quickly.

The overall effect of these thresholds is a neuron that maintains a stable activation point through rapidly changing input. This is especially important given the short-term weights of conflict learning, which adapt as the neuron is firing. Due to the decay threshold, the neurons will not drop their thresholds quickly in the absence of input, and will only lower their threshold if exposed to long periods of sub-threshold activation. Since the thresholds are based on driving input only, neurons will not deactivate due to competition, which is critical for the proper function of conflict learning. This behavior of competing neurons receiving a non-preferred stimulus to still activate can be seen experimentally in border ownership neurons (Zhou, Friedman, and Von Der Heydt, 2000). This behavior allows two significant effects to take place with conflict learning. The first is that neurons within a column can learn the same driving input but differentiate on modulatory input. The second is that lateral interaction between columns can drive differentiation over entire columns, leading to different driving input preferences amongst different columns.

3.4 network construction

This section details how the competitive column is applied to a model of border ownership. The construction and wiring of the network are first detailed, followed by the training regimen that is used to learn features in an unsupervised fashion.

3.4.1 Network Construction and Wiring

The network is constructed of up to 5 layers: an input layer, an edge response layer, a border ownership layer, a grouping layer, and a proto-object layer. The border ownership, grouping, and proto layers are parameterized by the number of neurons in their competitive columns. An overall schematic of the network can be seen in figure 3.3. Three topologies of networks are used in the experiments: 1G0P, which has one grouping neuron per column and has no proto-object layer, 1G1P, which has one grouping neuron per column and one proto-object neuron per column, and 4G1P, which has up to four grouping neurons per column and one proto-object neuron per column.

[Figure 3.3 illustration: the five layers (Input, Edge Responses, Border Ownership, Grouping, Proto) with fixed versus learned weights and the number of neurons per competitive column in each layer.]

Figure 3.3: A schematic of the network configuration used for the experiments. Each layer of the network has the same dimensions, though the density and receptive field size tends to increase with layer depth. The input and edge response layers have fixed weights, as does the feedforward input to the border ownership layer.
The border ownership, grouping, and proto-object layers all perform learning. These layers are interconnected to each other through feedforward and feedback projections. Each layer is parameterized by the number of neurons in its competitive columns, with different network topologies having different configurations. Layers show actual responses for depicted input, with the polarity colored according to preferred orientation for the border ownership layer.

The border ownership layer is always fixed to have exactly eight neurons per column. The input and edge response layers have fixed weights. Input is provided as a grayscale image that is held constant by the input layer until the next stimulus is presented. The edge responses are computed at four orientations (0, 45, 90, and 135°) by log Gabor filters (Field, 1987) parameterized by θ_gabor = π and f = √π. To reduce artifacts from edge filtering that can occur at small resolutions (e.g., shifts of edges), the network input is up-sampled by a factor of ten to be 800x800 pixels. The results of the filtering are then down-sampled back to the 80x80 network size. This gives a total of 25,600 edge responsive neurons, with 4 for each location in the network.

The border ownership layer consists of 80x80 columns, each containing 8 border ownership neurons, for a total of 51,200 neurons. Each column receives driving feedforward input from every orientation at the same location in the edge response layer. The input from the 4 orientation selective neurons is duplicated within the column such that there are two border ownership neurons for each edge response.

The dimensions of the network are held constant across each layer, though beginning with the grouping layer, the density of neurons decreases. Additionally, beginning with the grouping layer, neurons are no longer arranged on a grid, but are instead placed randomly using a Poisson-disc algorithm (Bridson, 2007) which aims to maintain a minimum distance between any two neurons. This random placement means that the grouping and proto-object layers are populated by a target number of neurons, though the actual number may vary slightly. The border ownership layer is a slight exception to the others because its feedforward is fixed and its neurons are arranged on a grid, as opposed to random placement. This was done to ensure that the network could be trivially probed to determine polarity assignments; each border ownership column is known to have two neurons for each orientation, so comparing their activations and weight distributions directly provides the column's polarity.

For the 1G configurations, the grouping layer has a target of 25% of the density of the border ownership layer, with 1,750 neurons. For the 4G configuration, four times as many neurons are generated as in the 1G configuration. For the 1P configuration, the proto layer is populated at a density target of 10% of the border ownership layer, with 750 neurons.

The neurons in the border ownership, grouping, and proto layers are wired in a similar fashion:

• Neurons receive feedback projections from every downstream layer. Feedback arrives from every neuron at the same retinotopic position within some radius in each layer. The radius used for adjacent layers, r_s, determines the base preferred scale of the network. Every successive layer doubles the size of the radius.
• Neurons receive feedforward projections from every upstream layer in exactly the same fashion as feedback, with the radius doubling for successive layers.
• Lateral projections arrive from every neuron not in the same competitive column on the same layer within the base scale radius.

The incoming weights to each neuron are organized by their source (e.g., direct feedback, skip feedback, direct feedforward, etc.). Each group of weights is given some amount of uniform initial weight as well as a maximum pool of learnable weight. The weight assigned to each group is set such that the further the source of the input, the less impact it has on the activation of the neuron. If the total weight within a group exceeds the maximum weight for that group, it is normalized back down to the maximum total weight.

Columns are wired as follows:

• Neurons are considered to be in the same column if they are within some radius of each other. This radius is set for each layer such that the number of neurons within a column hits a specific target.
• Every neuron in a column receives an inhibitory projection from every other neuron within the column.

The networks used in the experiments learn all feedback and column weights beginning with the border ownership layer. Feedforward weights are learned starting with the grouping layer. For the 1G1P network, the above configuration leads to a network with around 54,000 neurons and 18 million synapses that participate in learning.

3.4.2 Training

The network is trained by repeatedly exposing it to moving closed shapes. The scale of shapes shown to the network is described in terms of the preferred stimulus size of the grouping neurons; a shape with a scale of 1 is one that ideally matches a grouping neuron, a scale of 2 is twice its preferred size, etc. Shapes are always sized such that their height and width are whole multiples of the preferred scale. Shapes are drawn from a 2x2 shape generator which can create all possible shapes from 1x1 to 2x2 such that the shapes are a single connected component and contain no holes. For a 2x2 generator, this means that there are four total possible shapes: a 1x1 square, a 2x1 rectangle, a 2x2 corner, and a 2x2 square.

After a shape has been randomly selected, it is given both a random orientation and random position. The position is chosen such that the centroid will be within the network input. A random direction is then picked, and the shape is repeatedly translated in that direction until no portion of it can be viewed by the network. Each position of the shape is presented for 13 iterations of the network, which provides ample time for the influences of feedback to circulate through the network. A blank stimulus is applied for 13 iterations in between selecting a new random shape. All results presented in the next section are on networks for which 15,000 random shapes were shown. An illustration of this training process is provided in figure 3.4.

Figure 3.4: A schematic of the training procedure for the border ownership network. Shapes are generated from a 2x2 shape generator to produce all possible single component shapes with no holes. The generator yields shapes that are sized proportional to the base preferred scale of the network. The Valid Shapes inset depicts all valid shapes that a 2x2 generator can yield. These shapes are then repeatedly sampled from, given a random orientation, and placed randomly on the network. The shape is then translated in a random direction until it leaves the field of view. This process is repeated, with a blank stimulus in between selecting a new shape, until the network is converged.
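The training procedure can be outlined with the sketch below, which produces the sequence of placements for a single shape presentation; the shape encoding, the step size, and the margin used to decide when the shape has left the field of view are simplifying assumptions of this illustration.

    import numpy as np

    def training_positions(fov=80, scale=10, step=1.0, rng=np.random.default_rng(2)):
        """Yield (shape, angle, position) tuples for one presentation: a random
        shape from the 2x2 generator at a random orientation, translated along a
        random direction until it is well outside the field of view. Shapes are
        described by their width/height in multiples of the base preferred scale."""
        shapes = [(1, 1), (2, 1), (2, 2, "corner"), (2, 2)]   # valid 2x2-generator shapes
        shape = shapes[rng.integers(len(shapes))]
        angle = rng.uniform(0.0, 360.0)
        position = rng.uniform(0.0, fov, size=2)              # centroid inside the input
        direction = rng.uniform(0.0, 2 * np.pi)
        velocity = step * np.array([np.cos(direction), np.sin(direction)])
        margin = 2 * scale                                    # assumed off-screen distance
        while (-margin <= position[0] <= fov + margin
               and -margin <= position[1] <= fov + margin):
            yield shape, angle, position.copy()
            position += velocity

    # Each yielded placement would be rendered and held for 13 network iterations,
    # followed by 13 iterations of blank input before the next shape is drawn.
    for shape, angle, position in training_positions():
        pass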
3.5 experiments

The competitive column and ambiguity are tested on various border ownership tasks using three network variants, with topologies as described in section 3.4.1. Networks were each trained as described in section 3.4.2. For all experiments, the accuracy of a given border ownership assignment is computed by comparing a ground truth assignment against the polarity assigned by the network. If the polarity points to the same side (plus or minus 90°) of the ground truth, it is considered correct.

3.5.1 Ambiguity

Figure 3.5 demonstrates the impact of ambiguity on deciding the border ownership of a 3x2 c shape. The network used for this example is the 1G1P network, which has both grouping and proto-object neurons. Ambiguity was disabled for the ‘no ambiguity’ network by setting the ambiguity term in the activation equation 3.1 to 0.

Although both networks initially have a large grouping response within the concavity of the c shape, the network with ambiguity dampens the activation of neurons that give rise to this activation. BO neurons along the concavity lower their activation, which in turn decreases the driving input to the grouping layer. This causes the grouping layer neurons that were receiving input from solely ambiguous BO neurons to fail to meet their thresholds and turn off. With these grouping neurons deactivated, competition within the previously ambiguous BO columns causes the polarity to shift to the now unambiguous interior of the shape, which has uncontested grouping activation. This process takes a few iterations to complete, but eventually an interior-only activation of the c shape drives an unambiguous response from the BO neurons.

The network without ambiguity has no way of resolving the polarity assignment of BO neurons along the concavity. The basic winner-take-all dynamics of the column will indeed be choosing winners each iteration, but without a way to prevent feedback from applying equal amounts of excitation and inhibition to competing polarities, no stable winner can emerge.

Figure 3.5: The progression of border ownership assignment for a 3x2 c shape with and without ambiguity. The top of the figure shows the ground truth data, with edges colorized by the correct polarity assignment. The bottom portion of the image is split into two columns: with and without ambiguity. Each row shows the network state at a different timestep, beginning with the onset of the stimulus and ending with the first iteration of correct polarity assignment. At t = 2, the polarity assignment is driven by random noise and purely feedforward input. At t = 3, feedback begins to arrive from the grouping and proto-object layers. Note that the grouping neurons are initially responsive for the concavity of the c shape for both networks. The network with ambiguity shows dampening of ambiguous BO neurons starting at t = 3, when feedback input is received from both sides of each neuron. This dampened activity propagates to the grouping neurons at t = 4, which lowers the ambiguity of the BO neurons. At t = 6 the neurons are fully unambiguous and the correct assignment emerges.
3.5.1 Ambiguity

Figure 3.5 demonstrates the impact of ambiguity on deciding the border ownership of a 3x2 c shape. The network used for this example is the 1G1P network, which has both grouping and proto-object neurons. Ambiguity was disabled for the ‘no ambiguity’ network by setting the ambiguity term in the activation equation 3.1 to 0.

Although both networks initially have a large grouping response within the concavity of the c shape, the network with ambiguity dampens the activation of neurons that give rise to this activation. BO neurons along the concavity lower their activation, which in turn decreases the driving input to the grouping layer. This causes the grouping layer neurons that were receiving input from solely ambiguous BO neurons to fail to meet their thresholds and turn off. With these grouping neurons deactivated, competition within the previously ambiguous BO columns causes the polarity to shift to the now unambiguous interior of the shape, which has uncontested grouping activation. This process takes a few iterations to complete, but eventually an interior-only activation of the c shape drives an unambiguous response from the BO neurons.

The network without ambiguity has no way of resolving the polarity assignment of BO neurons along the concavity. The basic winner-take-all dynamics of the column will indeed be choosing winners each iteration, but without a way to prevent feedback from applying equal amounts of excitation and inhibition to competing polarities, no stable winner can emerge.

Figure 3.5: The progression of border ownership assignment for a 3x2 c shape with and without ambiguity. The top of the figure shows the ground truth data, with edges colorized by the correct polarity assignment. The bottom portion of the image is split into two columns: with and without ambiguity. Each row shows the network state at a different timestep, beginning with the onset of the stimulus and ending with the first iteration of correct polarity assignment. At t = 2, the polarity assignment is driven by random noise and purely feedforward input. At t = 3, feedback begins to arrive from the grouping and proto-object layers. Note that the grouping neurons are initially responsive for the concavity of the c shape for both networks. The network with ambiguity shows dampening of ambiguous BO neurons starting at t = 3, when feedback input is received from both sides of each neuron. This dampened activity propagates to the grouping neurons at t = 4, which lowers the ambiguity of the BO neurons. At t = 6 the neurons are fully unambiguous and the correct assignment emerges. Without ambiguity, the assignment is both random and muted due to constant competition.

Although not shown in the figure, proto-object neurons are also responsive to the input and supply feedback to both the BO neurons as well as the grouping layer. Since the strength of feedback decreases with distance, the majority of the feedback influence the proto-object neurons exert on the BO neurons is channeled through the grouping neurons. Since feedback is modulatory and the lowered activation of the BO neurons causes inactivation of some grouping neurons, any feedback the proto-object neurons applied to those deactivated grouping neurons no longer affects the previously ambiguous BO neurons. Thus not only does the grouping response converge with the unambiguous activation of BO neurons, but the proto-object response does as well. All layers of the network settle in an unambiguous state.

There is some residual activation amongst the grouping neurons even after a non-ambiguous result is reached because neurons in a competitive column are not fully deactivated even when losing. This causes a small amount of input to reach the grouping layer from the incorrect-polarity BO neurons around the concavity, which may be enough to turn on some grouping neurons. The long-term thresholds of the neurons need to be such that the neurons will turn on from the initial driving feedforward stage, in which it can be expected that half of the neurons are randomly incorrect. Thus the thresholds are often low enough to see some activation at this stage. This residual activation causes a slight oscillation in activation amongst the ambiguous neurons (due to increased ambiguity), which eventually is removed from the network by two factors: the dampening function (equation 3.2) and the fast-moving component of the threshold (see section 3.3.4).

3.5.2 Invariance

To study the scale invariance and ability to generalize to challenging shapes, the networks were tested on shapes sampled from a 4x4 shape generator. The trained networks were only ever exposed to shapes from a 2x2 generator during training, so an overwhelming majority of the testing input was novel to the network.

Shapes generated from a 4x4 generator can be categorized by the strictest subset generator that could have created them. This is done by looking at the maximum of the shape’s width or height. For example, a 3x2 rectangle would be classified as scale 3, whereas a 2x2 corner would be scale 2. Using this scheme, scale 1 has 1 shape, scale 2 has 3 shapes, scale 3 has 40 shapes, and scale 4 has 1855 shapes. It should be noted that as the scale of the shape increases, so does the potential complexity of the shape. Scale 4 shapes can range from a simple 4x4 square to a snaking path with many locally ambiguous regions.
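A rough sketch of such a shape generator and the scale classification is shown below. Shapes are treated as sets of filled grid cells that must form a single 4-connected component with no enclosed holes; the connectivity convention for the empty cells is an assumption. The shape counts quoted above suggest that rotated versions of a shape are counted as the same shape, whereas this sketch canonicalizes only over translation, so it will enumerate more shapes per scale; collapsing rotational duplicates is left out for brevity.

```python
from itertools import product

def connected(cells):
    # True if the set of (row, col) cells forms a single 4-connected component.
    if not cells:
        return False
    seen, stack = set(), [next(iter(cells))]
    while stack:
        r, c = stack.pop()
        if (r, c) in seen:
            continue
        seen.add((r, c))
        stack.extend(n for n in [(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)] if n in cells)
    return seen == cells

def hole_free(cells, n):
    # True if every empty cell of a padded grid can reach the outside border,
    # i.e., the shape encloses no holes.
    empty = {(r, c) for r, c in product(range(-1, n + 1), repeat=2)} - cells
    return connected(empty)

def scale(cells):
    # Strictest subset generator that could have produced the shape.
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return max(max(rows) - min(rows), max(cols) - min(cols)) + 1

def enumerate_shapes(n):
    # Brute force over all cell masks of the n x n grid, keeping valid shapes
    # canonicalized by translation only.
    all_cells = list(product(range(n), repeat=2))
    shapes = set()
    for bits in product([0, 1], repeat=n * n):
        cells = {cell for cell, b in zip(all_cells, bits) if b}
        if cells and connected(cells) and hole_free(cells, n):
            r0 = min(r for r, _ in cells)
            c0 = min(c for _, c in cells)
            shapes.add(frozenset((r - r0, c - c0) for r, c in cells))
    return shapes

by_scale = {}
for s in enumerate_shapes(3):        # 3x3 keeps the brute force cheap (2^9 masks)
    by_scale[scale(s)] = by_scale.get(scale(s), 0) + 1
print(by_scale)
```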
For evaluating the performance of the networks on these shapes, the shape was presented in the center of the network with no rotation. The polarity assignments were recorded after 13 iterations of the network, which is the same number of iterations a shape is presented for at a particular location during training. Figure 3.6 shows the average polarity accuracies for each scale of shapes on all network configurations. All networks had essentially ideal performance on objects from the same scale at which they were trained. However, as the scale increased, performance decreased for the network that lacked the proto-object layer (1G0P).

The 1G0P network degrades in performance because its grouping neurons, which have a preferred stimulus scale of 1, are unlikely to activate as objects grow beyond this size. From the perspective of a grouping neuron, the stimulus goes from a fully closed contour to some portion of the presented object. While, depending on the learned thresholds, a grouping neuron may still activate for a corner or three-sided end of a shape, as the scale increases it becomes increasingly likely that the grouping neuron is only exposed to a linear contour, which will fail to excite it above its threshold. Since the network additionally lacks proto-object neurons, which have larger receptive field sizes, it ultimately cannot generalize as well to larger input.

In the networks that contain a proto-object layer, the proto-object neurons have preferred stimulus sizes that are at least of scale 2, which gives the network a greater degree of scale invariance. Proto-object neurons receive feedforward projections from both the border ownership layer as well as the grouping layer, so their behavior is not simply to respond like a larger grouping neuron. The interaction of feedforward and feedback between the grouping and proto-object layers also enhances scale invariance by allowing mutual reinforcement of activations.

The 4G1P network differs from the 1G1P network only in that it has considerably more grouping neurons, which are arranged into competitive columns. This causes the grouping neurons to compete over modulatory feedback from the proto-object layer much the same way border ownership neurons do from the grouping layer. The increase in grouping neuron density also causes some extra lateral competition, which causes differentiation in learned feedforward features: some grouping neurons will learn features more receptive to corners or other subsets of a larger object. Since the overall topology of the network is the same as in the 1G1P network, not much difference should be expected between the two networks, which is confirmed by the average polarity accuracies. There is, however, a small but significant increase in score for the largest scale, which is likely due to the ability of the 4G1P network to have its grouping neurons excited by smaller portions of objects.

Figures 3.7 and 3.8 show polarity assignments taken from a variety of shapes using the 1G1P network. These figures show all of the shapes possible from a 2x2 generator, but only a small subset of scale 3 shapes and a fraction of scale 4 shapes. As seen in figure 3.6, performance degrades slightly as scale increases, which is largely because the increased scale makes it possible for the generator to create shapes that have multiple competing ambiguities. Since the network was never exposed to such examples during training, it is limited in its ability to resolve such ambiguities. However, in many cases it is still able to resolve a large number of difficult assignments.

3.5.2.1 Detailed Rotation, Scale, and Translation Invariance

Although the scores in figure 3.6 capture the overall trend of the network to be invariant, it is useful to look at more detailed metrics of invariance. The network is evaluated on two example shapes of differing base scale and complexity to give insight into the network’s invariance to scale, translation, and rotation.
All results in this section are from the 1G1P network that was trained on shapes from a 2x2 generator. A 1x1 square shape and a 3x2 C shape were presented to the 1G1P network with various transformations applied.

To test rotation, the shape was placed in the center of the network and then rotated in continuous steps up to 360°. As with previous tests, results were taken after letting the network settle for 13 iterations. A blank stimulus was applied in between each rotation to prevent any memory of the previous rotation from influencing the result. The results can be seen in figure 3.9. Due to the way the network was trained with random rotations of generated shapes, the network is fully rotation invariant. This is largely due to the nature of grouping and proto layer neurons, which optimally respond to a closed contour in their receptive field.

To test scale invariance, the shapes were presented to the center of the network and scaled by factors ranging from 0.5 up to 2.0. Unlike the shapes yielded from the shape generators, these shapes were allowed to have fractional sizes. A blank stimulus was applied between each successive scale. The results are presented in figure 3.10.

Figure 3.6: A histogram of network performance against different scales of generated shapes. All shapes from a 4x4 generator were tested and placed into scales by the strictest subset generator that could have created them. The networks with the proto-object layer (1G1P, 4G1P) show significant improvements in average polarity accuracy compared to the network without (1G0P) for all scales 2 and larger. Two asterisks indicate extreme significance (p value < 1e−5) when compared to the 1G0P model. The 4G1P model is slightly but significantly (p value < .05) better than the 1G1P model on scale 4 objects. Tests detailed in text.

Figure 3.7: Sample shape responses from the 1G1P network on shapes up to scale 3. The input to the network is shown as a black outlined shape with arrows representing the network-assigned polarity overlaid. Shapes A through D are the only shapes that the network was exposed to during training and are created by a 2x2 generator. All shapes starting at E are scale 3. Each shape is accompanied by the percentage of border ownership neurons that had correct polarity assignments when the result was probed.

Figure 3.8: Sample shape responses from the 1G1P network on shapes of scale 4. Scale 4 shapes are not only considerably larger than shapes the network was trained on, but can contain very complex and locally ambiguous arrangements of features. The ability of the network to resolve ambiguity is diminished as the scale increases or if multiple portions of the input are ambiguous. Each shape is accompanied by the percentage of border ownership neurons that had correct polarity assignments when the result was probed.

Figure 3.9: Orientation invariance for a 1x1 square shape and a 3x2 c shape on the 1G1P network. The average polarity accuracy of the network is plotted as a function of the rotation of the shape.
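Before turning to the results, a compact sketch of this evaluation protocol is given below. The NetStub and render placeholders stand in for the trained network and its stimulus rendering, and the rotation step size is an assumption; the 13-iteration settling time, the interleaved blank stimulus, and the 0.5x to 2.0x scale multipliers follow the description above.

```python
import numpy as np

SETTLE_ITERS = 13

class NetStub:
    # Placeholder for the trained 1G1P network.
    def step(self, stimulus): pass            # one activation iteration
    def polarity_accuracy(self): return 1.0   # placeholder readout against ground truth

def render(shape, angle, scale):
    # Placeholder for rendering a shape at a given orientation and scale.
    return (shape, angle, scale)

def probe(net, stimulus):
    # Present a stimulus, let the network settle, read out accuracy, then show a
    # blank so no memory of the previous presentation carries over.
    for _ in range(SETTLE_ITERS):
        net.step(stimulus)
    accuracy = net.polarity_accuracy()
    for _ in range(SETTLE_ITERS):
        net.step(None)
    return accuracy

def rotation_sweep(net, shape, step_deg=22.5):
    return {a: probe(net, render(shape, a, 1.0)) for a in np.arange(0.0, 360.0, step_deg)}

def scale_sweep(net, shape, multipliers=(0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0)):
    return {m: probe(net, render(shape, 0.0, m)) for m in multipliers}

rotations = rotation_sweep(NetStub(), "3x2 C shape")
scales = scale_sweep(NetStub(), "1x1 square shape")
```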
For simpler shapes, the network is scale invariant over a wide range of object scales. As the complexity of objects increases, such as with the C shape, explicit activation dynamics driven by ambiguity are required for an accurate polarity assignment. This is more sensitive to changes in scale since there is a limited range in which the grouping, proto, and border ownership neurons can interact with each other. However, the network still displays an impressive band of high accuracy and does not catastrophically fall off.

Finally, to test translation invariance, the shapes were positioned at 121 locations and had the polarity recorded. Shapes were initially presented with their centroids in the upper left corner of the network before systematically traversing the network horizontally and vertically for each sampled location, as seen in figure 3.11. The training regime translates shapes all over the network, so it is unsurprising that the final network shows translation invariance. So long as the shape can be fully displayed within the network, the resulting polarity assignment is accurate. There is some fall-off at the edges of the network that is due to specific implementation details. The trained networks used for all tests were sized such that no artifact of this could affect the results.

Figure 3.10: Scale invariance for a 1x1 square shape and a 3x2 c shape on the 1G1P network. Presented shapes were scaled from half to twice their original size. Presented sizes thus included many fractional amounts of the preferred scale of the network, to which it was never exposed in training. The average polarity accuracy of the network is plotted as a function of the multiplier applied to the shape scale. A vertical gray line indicates the base scale for the shapes.

Figure 3.11: Translation invariance for a 1x1 square shape and a 3x2 c shape on the 1G1P network. Presented shapes were translated such that their centroid was placed in 121 sampled locations spanning the entire network input. The average polarity accuracy is plotted as a function of the offset of the centroid in terms of network size. The upper graphic is for the 1x1 square shape and the lower graphic is for the 3x2 c shape.
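The translation sweep can be sketched in the same style as the rotation and scale sweeps above. The assumption that the 121 locations form an 11 x 11 grid of centroid offsets spanning roughly -50% to +50% of the input size is read off the axes of figure 3.11; probe is the same kind of hypothetical present/settle/readout helper as in the previous sketch, and render here is a hypothetical helper that places the shape at a given centroid offset.

```python
import numpy as np

def translation_sweep(net, shape, probe, render, grid_points=11):
    # Place the shape's centroid on a regular grid of offsets over the network
    # input and record the average polarity accuracy at each position.
    offsets = np.linspace(-0.5, 0.5, grid_points)        # fraction of network size
    accuracy = np.zeros((grid_points, grid_points))
    for i, dy in enumerate(offsets):                     # traverse vertically ...
        for j, dx in enumerate(offsets):                 # ... and horizontally
            accuracy[i, j] = probe(net, render(shape, offset=(dx, dy)))
    return accuracy
```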
3.5.3 Natural Images

Natural images represent a substantial step up in complexity compared to the procedurally generated shapes used in the previous experiments. In this experiment, the network is given a sample of line drawings of natural images from the Berkeley Segmentation Data Set (BSDS500, Arbelaez et al., 2011). These line drawings were annotated by human subjects.

The 1G1P network is used for all natural image examples, with a minor adjustment made to neuron thresholds. Since this network was trained on artificially generated shapes, its thresholds have adapted to the expected activation driven by the Gabor responses on these shapes. The natural images have entirely different statistics and edges that may not correspond to the thickness of the artificial shapes. To address this, the activation thresholds of all neurons are decreased towards their decay thresholds, a process which would naturally occur given enough exposure to new input (see section 3.3.4 for details on how the thresholds work).

Results are shown in figures 3.12, 3.13, 3.14, and 3.15. These images showcase examples with many features the network has never seen in its training: curves, scales that are not whole multiples of the preferred stimulus size, holes, and occlusions. As such, the network does make some mistakes and is unable to give an unambiguous response for many of the contours, but the overall outlook of the responses is highly favorable.

Figure 3.12: Border ownership assignment on an image of an airplane. The upper portion of the figure shows the full color natural image while the bottom shows the ground truth line drawing overlaid with the border ownership assignment generated by the network. The network has never previously been exposed to curved contours or even shapes that did not match a whole multiple of its preferred stimulus size.

Figure 3.13: Border ownership assignment on an image of a bear. The upper portion of the figure shows the full color natural image while the bottom shows the ground truth line drawing overlaid with the border ownership assignment generated by the network.

Figure 3.14: Border ownership assignment on an image of a car. The upper portion of the figure shows the full color natural image while the bottom shows the ground truth line drawing overlaid with the border ownership assignment generated by the network. Note that this object contains several ‘holes’, or enclosed regions within the overall silhouette, which are an entirely novel input.

Figure 3.15: Border ownership assignment on an image of horses. The upper portion of the figure shows the full color natural image while the bottom shows the ground truth line drawing overlaid with the border ownership assignment generated by the network. Note that this image contains two objects at vastly different scales, with the smaller horse occluding the larger one. The network has never been exposed to occlusion.

3.6 discussion

This model, combined with the previously developed conflict learning rule, provides a glimpse into how feedback, modulation, and inhibition may shape learning. Although recent efforts have made some progress towards understanding some aspects of feedback processing (e.g., Lillicrap et al., 2016), a fundamental question remains of how feedback influences activation and learning. The competitive column and ambiguity as applied to a model of border ownership give some intuition that may be useful in probing the understanding of the brain and in developing more powerful models of object recognition. Although it is unsupervised, conflict learning allows the model to have a built-in teaching signal, which, as backpropagation-based approaches have shown, is an excellent way to assign blame and modulate learning.

3.6.1 Towards Proto Objects

The competitive column architecture encourages competition and a partitioning of learned features at every level of the hierarchy. Figure 3.16A shows a representative set of feedback receptive fields for a single border ownership column.
The learning rule, combined with the column architecture, causes neurons to learn conjunctions of high- and low-level features: in this case, orientation selectivity combined with polarity preference. While this effect is most noticeable at the border ownership layer, especially due to their structured feedforward input, it is also evident at deeper layers of the network.

The new layer of neurons added to the network to support scale invariance is called the proto-object layer. The goal of these neurons is to provide a grouping mechanism over a larger receptive field size. If these neurons only received input from the border ownership layer, they may have acted just like grouping neurons and learned large annular receptive fields. However, proto-object neurons also receive direct input from the grouping neurons, which gives them interesting visual features, seen in figure 3.16B-E. While it is difficult to give a precise description of their preferred stimulus, proto-object neurons learn conjunctions of mid-level (grouping) and low-level (border ownership) features. This can be seen especially in figure 3.16C-E, which appear to learn a general large-scale surround (from direct border ownership input) with a preference for more localized input (from grouping input). The features learned here are dependent on the visual experience of each neuron as well as the competition it receives from between-column lateral connections. Some neurons, such as in figure 3.16A, happen to learn their grouping input in alignment with their border ownership input, giving the perception of a large annular receptive field.

This use of the term proto-object is similar to that used by von der Heydt and others, though it differs somewhat in that von der Heydt treats the grouping neuron response as a proto-object (Heydt, 2015). In the 1G1P model, the grouping neurons respond to annular, convex configurations of border ownership neurons, which are very general responses that have little unique relation with any particular object. The 4G1P model, however, learns grouping features that are more in line with parts or subcomponents of objects, and are a better match to what the name ‘proto-object’ implies.

Figure 3.16: Example receptive fields of neurons in competitive columns of the 4G1P model. A: The learned feedback of a single border ownership column. Eight neurons are depicted, with two for each orientation. The neurons learn to compete over polarities for each orientation. Green indicates an excitatory connection and red an inhibitory connection. Feedback is learned from both the grouping and the proto-object layer. B, C, D, E: The learned feedforward of four proto-object neurons. The proto-object layer has columns with single neurons. Feedforward input is colored based on the preferred polarity of the border ownership neuron that supplied the input. Input comes from both border ownership and grouping neurons: any input from a grouping neuron is colored by tracing its feedforward inputs to the border ownership layer. F, G, H, I: Learned feedforward and feedback connections for four grouping neuron columns. Column size is dependent on the random distribution of neurons in the grouping layer, so some columns have fewer than four neurons. The top row in each inset depicts learned feedforward, colored as in the proto-object columns. The bottom row depicts the learned feedback from the proto-object layer, colored as in the border ownership column. Best viewed digitally, zoomed in.
This specialization into subcomponents can be seen in figure 3.16F-I, which depict grouping columns of varying sizes and the learned feedforward and feedback receptive fields. In cases where a column contains a single neuron, there is little competition over feedforward and none over feedback, so the neuron learns an annular receptive field and associated feedback. This is the most general grouping feature of a border ownership network and a prototypical grouping neuron response. In the cases where there are multiple neurons, competition drives a differentiation over both feedforward and feedback features. In figures 3.16G-I, the learned feedforward receptive fields initially learn very similar distributions before some of the neurons specialize. This specialization is driven by both a partitioning of feedback, much like in the border ownership neurons, as well as between-column lateral competition over driving feedforward input. Figures 3.16G and H illustrate columns where some neurons have begun to specialize both their feedforward and feedback receptive fields, while other neurons still retain features similar to a neuron without competition, such as that seen in 3.16H. In figure 3.16I the neurons are beginning to show specialization over feedback while maintaining similar feedforward preferences, much like the way the border ownership neurons operate. Unlike the border ownership neurons, the feedforward here is learned. Thus the competitive column architecture shows the ability to replicate the same behavior that was forced in the border ownership neurons by fixing the feedforward input: a replication of driving input with diversification amongst modulatory input.

This increased differentiation of feedforward and feedback input occurring with increased competition is a hopeful sign for future expansion of the model. The experiments also demonstrated that there is likely a benefit to having increased representation within the network for generalization to larger scales and more complex features.

3.6.2 Generalizability

Remarkably, the network shows a capability to generalize from the simple 2x2 shapes it was trained on to the complex contours of these natural images, as seen in section 3.5.3. In modern deep convolutional networks, the ability of a network to generalize to unseen classes of input is a fundamental problem, which requires retraining and risks ‘catastrophic forgetting’ (e.g., Kirkpatrick et al., 2017). The network used here, with three active layers, is still quite shallow by the standards of deep learning, where networks can be hundreds of layers deep (He et al., 2016), or by the standard of biology, which suggests up to ten layers involved in the visual hierarchy (Felleman and Van Essen, 1991; Markov et al., 2014). Perhaps some ability of the network to generalize to larger shapes and images is due to this shallow nature and lack of features tied to overly specific input (as is blamed in some deep networks, see e.g., Long et al., 2015; Sun, Feng, and Saenko, 2016), but our experiments actually suggest that the invariance and generalizability of the network should increase as more layers and competition are introduced. It remains to be seen if the competitive column approach will generalize to fundamentally different tasks outside of what it has been demonstrated on, i.e., tasks other than orientation selectivity or border ownership.
This type of adaptability is one of the great strengths of deep learning (Neyshabur et al., 2017). Though the competitive column model and conflict learning were primarily designed around solving orientation selectivity and border ownership, the methodology underlying them is general; it is purely the statistics of input that drive the learning and emergent behavior.

3.6.3 Biological Implications

The proposed mechanism of ambiguity is entirely hypothetical. Yet this method is not outside the realm of what is plausible for a neuron to compute: it is based on local computation and operates in a similar fashion to divisive inhibition. Given that the general model fits in with current ideas of border ownership, it is reasonable to hypothesize that a mechanism similar to ambiguity exists in real neurons. The challenge of finding neurons that compete over similar driving features with different modulatory feedback influences is already solved by directly studying the behavior of border ownership selective neurons (Williford and Heydt, 2016). Looking at the responses of these neurons to ambiguous input such as the c shape could yield activation dynamics similar to those demonstrated in the model, though it may potentially require disabling feedback from areas that are too distant. As the hierarchy deepens, the likelihood of having a more complete representation that can shortcut the ambiguity increases. If such a mechanism exists, there may be a slight oscillation of activity between the border ownership neuron and the downstream source of feedback. In practice this would likely be a very difficult experiment to carry out, and it may not be possible to isolate the source of modulation directly.

3.6.4 Contemporary Approaches

The competitive column model presented here has many similarities with two recent models of cortical processing: the capsule model of Sabour, Frosst, and Hinton (2017), and the column-based model of Hawkins, Ahmad, and Cui (2017). Common to all three models is a notion of creating complex responses to input that bind to multiple features. In this work, edge-responsive features are bound to polarity information; in Sabour, Frosst, and Hinton (2017), features associated with numeric characters are bound to location information; and in Hawkins, Ahmad, and Cui (2017), sensory features are bound to allocentric location signals. The capsule model learns using supervised backpropagation of error, following a long-established path of artificial neural networks. The capsule model, however, does not truly feature recurrent or modulatory processing, though it does have a notion of prediction and alignment between successive layers of capsules.

The competitive column model and the model of Hawkins, Ahmad, and Cui (2017) are considerably more similar. Both feature an architecture centered around the abstract notion of a cortical column, and both use modulatory input sourced from lateral and feedback connections. While the model here places an emphasis on the influence of modulatory feedback, Hawkins’ model is more focused on lateral connectivity. A primary difference, aside from the learning rules and activation dynamics, is that the competitive column model learns conjunctions of features on its own. Hawkins’ model requires that the location signal, which is ultimately bound to learned features, be presented along with sensory input.
Though they present an argument for this being a fundamental computation of a column, it still remains an a priori signal for the model as it is currently implemented. Additionally, their model does not explain how features within a column can come to have similar feedforward receptive fields while differentiating over modulatory input; the competitive column model is capable of doing this, as seen and discussed in section 3.6.1. Ultimately all three models are exploring a more fundamental role for column-like organization and contain dynamics that differ greatly from established feedforward artificial neural networks. Hopefully these dynamics, particularly those of modulation, will enable better models of cortical processing and give greater intuition into how the brain works.

3.6.5 Future Applications and Enhancements

The current implementation was written in C++ for multi-core CPU simulation. To make the approach more amenable to real-world problems, an obvious avenue of future effort is to formulate a convolutional approach supported by GPGPU processing. This presents challenges, as it remains an open question whether the detailed relationships encoded in the modulatory feedback to border ownership neurons can still be represented with a weight-sharing approach. Another future effort would be to extend the hierarchy and work towards object recognition. The current system is promising for this line of research, already displaying features that are in line with the idea of proto-objects and showing a high amount of generalizability.

3.7 conclusion

This chapter took the foundations set in the development and testing of conflict learning and created a framework built around using the learning rule within the structure of competitive columns. Combined with a novel way of measuring the ambiguity of a neuron, this enabled the construction of a border ownership network that is highly invariant to scale, rotation, and translation. The competitive column, taken together with conflict learning, shows promise in its ability to generalize beyond the initial input a network is exposed to. The results suggest that increased competition combined with an increase in hierarchy depth could lead to the learning of exciting features and applications beyond border ownership.

4 CONCLUSION

This thesis was framed around an early but challenging computation of the visual system: border ownership. Border ownership served as the perfect playground to investigate the effects of feedback processing, modulation, and inhibition on learning. Much like orientation selectivity is for feedforward processing, border ownership could be the prototypical task to demonstrate the effectiveness of a model of modulatory processing.

Solving the problem of how something like border ownership could be learned in an unsupervised fashion led to the development of conflict learning. Conflict learning demonstrated that purely associative Hebbian learning rules are unlikely to work under the activation dynamics seen in networks with modulatory feedback. The new learning rule enabled a model of border ownership to be learned for the first time, with minimal assumptions, through direct visual experience.

Conflict learning served as the backbone for the development of the competitive column and ambiguity. The competitive column organization was used to develop an enhanced model of border ownership that showed a high degree of scale, rotation, and translation invariance.
Perhaps more importantly, the developed model showed a high amount of generalizability within the same task to novel shapes and even a sampling of contours from natural images. Its generalizability to tasks other than border ownership and orientation selectivity remains to be seen, but the minimal assumptions underlying its design suggest that it may have wider applicability. Ambiguity, which can be encapsulated by the simple concept of ‘too many decisions lead to indecision’, provided a novel way for the network to detect locally ambiguous input and reach a consensus that was unambiguous.

These contributions of conflict learning, the competitive column, and ambiguity give a better intuitive understanding of how feedback, modulation, and inhibition may interact in the brain to influence activation and learning. Additionally, they provide a promising avenue of future research for advancing object recognition and other visual tasks.

BIBLIOGRAPHY

Abbott, Larry F and Sacha B Nelson (2000). “Synaptic plasticity: taming the beast.” In: Nature Neuroscience 3, pp. 1178–1183.

Arbelaez, Pablo, Michael Maire, Charless Fowlkes, and Jitendra Malik (2011). “Contour Detection and Hierarchical Image Segmentation.” In: IEEE Trans. Pattern Anal. Mach. Intell. 33.5, pp. 898–916. issn: 0162-8828. doi: 10.1109/TPAMI.2010.161. url: http://dx.doi.org/10.1109/TPAMI.2010.161.

Baluch, Farhan and Laurent Itti (2011). “Mechanisms of top-down attention.” In: Trends in Neurosciences 34.4, pp. 210–224.

Bar, Moshe, Karim S Kassam, Avniel Singh Ghuman, Jasmine Boshyan, Annette M Schmid, Anders M Dale, Matti S Hämäläinen, Ksenija Marinkovic, Daniel L Schacter, Bruce R Rosen, et al. (2006). “Top-down facilitation of visual recognition.” In: Proceedings of the National Academy of Sciences of the United States of America 103.2, pp. 449–454.

Bayerl, Pierre and Heiko Neumann (2004). “Disambiguating visual motion through contextual feedback modulation.” In: Neural Computation 16.10, pp. 2041–2066.

Bednar, James A (2012). “Building a mechanistic model of the development and function of the primary visual cortex.” In: Journal of Physiology-Paris 106.5, pp. 194–211.

— (2015). “Topographica: building and analyzing map-level simulations from Python, C/C++, MATLAB, NEST, or NEURON components.” In: Python in Neuroscience, p. 104.

Bednar, James A and Risto Miikkulainen (2006). “Joint maps for orientation, eye, and direction preference in a self-organizing model of V1.” In: Neurocomputing 69.10, pp. 1272–1276.

Beuth, Frederik and Fred H Hamker (2015). “A mechanistic cortical microcircuit of attention for amplification, normalization and suppression.” In: Vision Research 116, pp. 241–257.

Bienenstock, Elie L, Leon N Cooper, and Paul W Munro (1982). “Theory for the development of neuron selectivity: orientation specificity and binocular interaction in visual cortex.” In: The Journal of Neuroscience 2.1, pp. 32–48.

Bishop, Christopher M (1995). Neural networks for pattern recognition. Oxford University Press.

Blasdel, Gary G (1992). “Orientation selectivity, preference, and continuity in monkey striate cortex.” In: The Journal of Neuroscience 12.8, pp. 3139–3161.

Blasdel, Gary G and Guy Salama (1986). “Voltage-sensitive dyes reveal a modular organization in monkey striate cortex.” In: Nature 321.6070, pp. 579–585.

Bridson, Robert (2007). “Fast Poisson disk sampling in arbitrary dimensions.” In: SIGGRAPH Sketches, p. 22.

Brosch, Tobias and Heiko Neumann (2014a).
“Computing with a canonical neural circuits model with pool normalization and modulating feedback.” In: Neural Computation. — (2014b). “Interaction of feedforward and feedback streams in visual cortex in a firing-rate model of columnar computations.” In: Neural Networks 54, pp. 11–16. Buschman, Timothy J and Sabine Kastner (2015). “From Behavior to Neural Dynamics: An Integrated Theory of Attention.” In: Neuron 88.1, pp. 127–144. Callaway, Edward M (2004). “Feedforward, feedback and inhibitory connections in primate visual cortex.” In: Neural Networks 17.5, pp. 625–632. Carandini, Matteo and David J Heeger (2012). “Normalization as a canonical neural computation.” In: Nature Reviews Neuro- science 13.1, pp. 51–62. Chapman, Barbara, Michael P Stryker, and Tobias Bonhoeffer (1996). “Development of orientation preference maps in ferret primary visual cortex.” In: The Journal of Neuroscience 16.20, pp. 6443–6453. Clopath, Claudia, Lars Büsing, Eleni Vasilaki, and Wulfram Gerst- ner (2010). “Connectivity reflects coding: a model of voltage- based STDP with homeostasis.” In: Nature Neuroscience 13.3, pp. 344–352. Craft, Edward, Hartmut Schütze, Ernst Niebur, and Rüdiger Von Der Heydt (2007). “A neural model of figure–ground organiza- tion.” In: Journal of Neurophysiology 97.6, pp. 4310–4326. Cudeiro, Javier and Adam M Sillito (2006). “Looking back: corti- cothalamic feedback and early visual processing.” In: Trends in Neurosciences 29.6, pp. 298–306. Dollár, Piotr, Ron Appel, Serge Belongie, and Pietro Perona (2014). “Fast feature pyramids for object detection.” In: Pattern Anal- ysis and Machine Intelligence, IEEE Transactions on 36.8, pp. 1532–1545. Bibliography 106 Engert, Florian and Tobias Bonhoeffer (1997). “Synapse specificity of long-term potentiation breaks down at short distances.” In: Nature 388.6639, pp. 279–284. Faisal, A Aldo, Luc PJ Selen, and Daniel M Wolpert (2008). “Noise in the nervous system.” In: Nature Reviews Neuroscience 9.4, pp. 292–303. Feldman, Jerome A and Dana H Ballard (1982). “Connectionist models and their properties.” In: Cognitive science 6.3, pp. 205– 254. Felleman, Daniel J and David C Van Essen (1991). “Distributed hierarchical processing in the primate cerebral cortex.” In: Cere- bral Cortex 1.1, pp. 1–47. Ferster, D (1987). “Origin of orientation-selective EPSPs in simple cells of cat visual cortex.” In: The Journal of Neuroscience 7.6, pp. 1780–1791. Field, David J (1987). “Relations between the statistics of natural images and the response properties of cortical cells.” In: JOSA A 4.12, pp. 2379–2394. Fino, Elodie, Vincent Paille, Yihui Cui, Teresa Morera-Herreras, Jean-Michel Deniau, and Laurent Venance (2010). “Distinct coincidence detectors govern the corticostriatal spike timing- dependent plasticity.” In: The Journal of Physiology 588.16, pp. 3045–3062. Friston, KJ and C Büchel (2000). “Attentional modulation of effective connectivity from V2 to V5/MT in humans.” In: Pro- ceedings of the National Academy of Sciences 97.13, pp. 7591– 7596. Froemke,RobertC(2015).“Plasticityofcorticalexcitatory-inhibitory balance.” In: Annual Review of Neuroscience 38, p. 195. Fukushima, K and S Miyake (1978). “A self-organizing neural network with a function of associative memory: feedback-type cognitron.” In: Biological Cybernetics 28.4, pp. 201–208. Fukushima, Kunihiko (1975). “Cognitron: A self-organizing mul- tilayered neural network.” In: Biological Cybernetics 20.3-4, pp. 121–136. — (1980). 
“Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position.” In: Biological Cybernetics 36.4, pp. 193–202. Girard, P, JM Hupe, and J Bullier (2001). “Feedforward and feed- back connections between areas V1 and V2 of the monkey have similar rapid conduction velocities.” In: Journal of Neurophysi- ology 85.3, pp. 1328–1331. Bibliography 107 Grant, W Shane, James Tanner, and Laurent Itti (2016). Conflict Learning Source Code. ilab.usc.edu/conflictlearning/. Accessed: 2016-07-03. Grossberg,Stephen(1987).“Competitivelearning:Frominteractive activation to adaptive resonance.” In: Cognitive science 11.1, pp. 23–63. — (2013). “Adaptive Resonance Theory: How a brain learns to consciously attend, learn, and recognize a changing world.” In: Neural Networks 37, pp. 1–47. Hawkins, Jeff, Subutai Ahmad, and Yuwei Cui (2017). “Why Does the Neocortex Have Columns, A theory of Learning the 3D Structure of the World.” In: He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun (2016). “Deep residual learning for image recognition.” In: Proceed- ings of the IEEE conference on computer vision and pattern recognition, pp. 770–778. Hebb, Donald Olding (1949). The organization of behavior: A neuropsychological approach. John Wiley & Sons. Heeger, David J (1992). “Normalization of cell responses in cat striate cortex.” In: Visual neuroscience 9.2, pp. 181–197. Heydt, Rüdiger von der (2015). “Figure–ground organization and the emergence of proto-objects in the visual cortex.” In: Fron- tiers in Psychology 6. Hopfield, John J (1982). “Neural networks and physical systems with emergent collective computational abilities.” In: Proceed- ings of the National Academy of Sciences 79.8, pp. 2554–2558. Hsu, Chih-Chieh and Alice C Parker (2014). “Border ownership in a nano-neuromorphic circuit using nonlinear dendritic com- putations.” In: Neural Networks (IJCNN), 2014 International Joint Conference on. IEEE, pp. 3442–3449. Hubel, David H and Torsten N Wiesel (1962). “Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex.” In: The Journal of physiology 160.1, pp. 106–154. — (1965). “Receptive fields and functional architecture in two nonstriate visual areas (18 and 19) of the cat.” In: Journal of neurophysiology 28.2, pp. 229–289. — (1974).“Sequenceregularityandgeometryoforientationcolumns in the monkey striate cortex.” In: Journal of Comparative Neu- rology 158.3, pp. 267–293. — (1977). “Functional architecture of macaque monkey visual cortex.” In: Proc. R. Soc. Lond. B. Vol. 198, pp. 1–59. Hupe, Jean-Michel, Andrew C James, Pascal Girard, Stephen G Lomber, Bertram R Payne, and Jean Bullier (2001). “Feedback Bibliography 108 connections act on the early part of the responses in monkey visual cortex.” In: Journal of Neurophysiology 85.1, pp. 134– 145. Itti, Laurent, Christof Koch, and Ernst Niebur (1998). “A model of saliency-based visual attention for rapid scene analysis.” In: IEEE Transactions on Pattern Analysis & Machine Intelligence 11, pp. 1254–1259. Jain, Rishabh, Rachel Millin, and Bartlett W Mel (2015). “Mul- timap formation in visual cortex.” In: Journal of Vision 15.16, pp. 3–3. Jehee, Janneke FM, Victor AF Lamme, and Pieter R Roelfsema (2007). “Boundary assignment in a recurrent network architec- ture.” In: Vision Research 47.9, pp. 1153–1165. Jones, Helen E, Ian M Andolina, Bashir Ahmed, Stewart D Shipp, Jake TC Clements, Kenneth L Grieve, Javier Cudeiro, Thomas E Salt, and Adam M Sillito (2012). 
“Differential feedback modulation of center and surround mechanisms in parvocellular cells in the visual thalamus.” In: The Journal of Neuroscience 32.45, pp. 15946–15951. Jones, Helen E, Ian M Andolina, Stewart D Shipp, Daniel L Adams, Javier Cudeiro, Thomas E Salt, and Adam M Sillito (2015). “Figure-ground modulation in awake primate thala- mus.” In: Proceedings of the National Academy of Sciences 112.22, pp. 7085–7090. Kaschube, Matthias, Michael Schnabel, Siegrid Löwel, David M Coppola,LeonardEWhite,andFredWolf(2010).“Universality in the evolution of orientation columns in the visual cortex.” In: Science 330.6007, pp. 1113–1116. Keil, Wolfgang, Matthias Kaschube, Michael Schnabel, Zoltan F Kisvarday, Siegrid Löwel, David M Coppola, Leonard E White, and Fred Wolf (2012). “Response to Comment on “Universality in the Evolution of Orientation Columns in the Visual Cortex “.” In: Science 336.6080, pp. 413–413. Kirkpatrick, James, Razvan Pascanu, Neil Rabinowitz, Joel Ve- ness, Guillaume Desjardins, Andrei A Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, et al. (2017). “Overcoming catastrophic forgetting in neural networks.” In: Proceedings of the National Academy of Sciences, p. 201611835. Kogo, Naoki and Raymond van Ee (2014). “Neural mechanisms of figure-ground organization: Border-ownership, competition and perceptual switching.” In: Handbook of Perceptual Organization. Bibliography 109 Kogo, Naoki, Christoph Strecha, Luc Van Gool, and Johan Wage- mans (2010). “Surface construction by a 2-D differentiation– integration process: A neurocomputational model for perceived border ownership, depth, and lightness in Kanizsa figures.” In: Psychological Review 117.2, p. 406. Kohonen, Teuvo (1990). “The self-organizing map.” In: Proceedings of the IEEE 78.9, pp. 1464–1480. Larkum, Matthew (2013). “A cellular mechanism for cortical as- sociations: an organizing principle for the cerebral cortex.” In: Trends in Neurosciences 36.3, pp. 141–151. Layton,OliverW,EnnioMingolla,andArashYazdanbakhsh(2015). “Neural dynamics of feedforward and feedback processing in figure-ground segregation.” In: Feedforward and Feedback Pro- cesses in Vision, p. 39. LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton (2015). “Deep learning.” In: Nature 521.7553, pp. 436–444. LeCun, Yann, Bernhard Boser, John S Denker, Donnie Henderson, Richard E Howard, Wayne Hubbard, and Lawrence D Jackel (1989). “Backpropagation applied to handwritten zip code recognition.” In: Neural computation 1.4, pp. 541–551. Levy, WB and O Steward (1983). “Temporal contiguity require- ments for long-term associative potentiation/depression in the hippocampus.” In: Neuroscience 8.4, pp. 791–797. Li, Ye, David Fitzpatrick, and Leonard E White (2006). “The development of direction selectivity in ferret visual cortex re- quires early visual experience.” In: Nature Neuroscience 9.5, pp. 676–681. Lillicrap, Timothy P, Daniel Cownden, Douglas B Tweed, and Colin J Akerman (2016). “Random synaptic feedback weights support error backpropagation for deep learning.” In: Nature communications 7. Lim, Sukbin, Jillian L McKee, Luke Woloszyn, Yali Amit, David J Freedman, David L Sheinberg, and Nicolas Brunel (2015). “Inferring learning rules from distributions of firing rates in cortical neurons.” In: Nature Neuroscience. Linden, David J and John A Connor (1995). “Long-term synaptic depression.” In: Annual Review of Neuroscience 18.1, pp. 319– 357. Long, Mingsheng, Yue Cao, Jianmin Wang, and Michael Jordan (2015). 
“Learning transferable features with deep adaptation networks.” In: International Conference on Machine Learning, pp. 97–105. Bibliography 110 Lowe,DavidG(1999).“Objectrecognitionfromlocalscale-invariant features.” In: Computer vision, 1999. The proceedings of the sev- enth IEEE international conference on. Vol. 2. Ieee, pp. 1150– 1157. Major, Guy, Matthew E Larkum, and Jackie Schiller (2013). “Ac- tive properties of neocortical pyramidal neuron dendrites.” In: Annual Review of Neuroscience 36, pp. 1–24. Markov, Nikola T, Julien Vezoli, Pascal Chameau, Arnaud Falchier, René Quilodran, Cyril Huissoud, Camille Lamy, Pierre Misery, Pascale Giroud, Shimon Ullman, et al. (2014). “Anatomy of hierarchy: feedforward and feedback pathways in macaque visual cortex.” In: Journal of Comparative Neurology 522.1, pp. 225–259. Markram, Henry, Maria Toledo-Rodriguez, Yun Wang, Anirudh Gupta, Gilad Silberberg, and Caizhi Wu (2004). “Interneu- rons of the neocortical inhibitory system.” In: Nature Reviews Neuroscience 5.10, pp. 793–807. Martin, Anne B and Rüdiger von der Heydt (2015). “Spike syn- chrony reveals emergence of proto-objects in visual cortex.” In: The Journal of Neuroscience 35.17, pp. 6860–6870. Masquelier, Timothée (2012). “Relative spike time coding and STDP-based orientation selectivity in the early visual system in naturalcontinuousandsaccadicvision:acomputationalmodel.” In: Journal of Computational Neuroscience 32.3, pp. 425–441. McCloskey, Michael and Neal J Cohen (1989). “Catastrophic in- terference in connectionist networks: The sequential learning problem.”In:Psychology of learning and motivation 24,pp.109– 165. Miconi, Thomas and Rufin VanRullen (2016). “A Feedback Model of Attention Explains the Diverse Effects of Attention on Neu- ral Firing Rates and Receptive Field Structure.” In: PLoS Computational Biology 12.2, e1004770. Mihalas, Stefan, Yi Dong, Rüdiger von der Heydt, and Ernst Niebur (2011). “Mechanisms of perceptual organization pro- vide auto-zoom and auto-localization for attention to objects.” In: Proceedings of the National Academy of Sciences 108.18, pp. 7583–7588. Miikkulainen, Risto, James A Bednar, Yoonsuck Choe, and Joseph Sirosh(2006).Computational maps in the visual cortex.Springer Science & Business Media. Mountcastle, Vernon B (1957). “Modality and topographic prop- erties of single neurons of cat9s somatic sensory cortex.” In: Journal of neurophysiology 20.4, pp. 408–434. Bibliography 111 Nakayama, Ken, Shinsuke Shimojo, and Gerald H Silverman (1989). “Stereoscopic depth: its relation to image segmentation, group- ing, and the recognition of occluded objects.” In: Perception 18.1, pp. 55–68. Neyshabur, Behnam, Srinadh Bhojanapalli, David McAllester, and Nati Srebro (2017). “Exploring generalization in deep learn- ing.” In: Advances in Neural Information Processing Systems, pp. 5949–5958. Nicholls, John G, A Robert Martin, Bruce G Wallace, and Paul A Fuchs (2001). From neuron to brain. Vol. 271. Sinauer Asso- ciates Sunderland, MA. O’Reilly, Randall C, Y Munakata, MJ Frank, TE Hazy, et al. (2012). Computational cognitive neuroscience. PediaPress. Oja, Erkki (1982). “Simplified neuron model as a principal com- ponent analyzer.” In: Journal of Mathematical Biology 15.3, pp. 267–273. Olshausen, Bruno A and David J Field (1997). “Sparse coding with an overcomplete basis set: A strategy employed by V1?” In: Vision Research 37.23, pp. 3311–3325. Pettigrew, John D and Masakazu Konishi (1976). 
“Neurons selec- tive for orientation and binocular disparity in the visual Wulst of the barn owl (Tyto alba).” In: Science 193.4254, pp. 675–678. Qiu, Fangtu T, Tadashi Sugihara, and Rüdiger von der Heydt (2007). “Figure-ground mechanisms provide structure for se- lective attention.” In: Nature Neuroscience 10.11, pp. 1492– 1499. Redondo, Roger L and Richard GM Morris (2011). “Making mem- ories last: the synaptic tagging and capture hypothesis.” In: Nature Reviews Neuroscience 12.1, pp. 17–30. Ren, Xiaofeng, Charless C Fowlkes, and Jitendra Malik (2006). “Figure/ground assignment in natural images.” In: Computer Vision–ECCV 2006. Springer, pp. 614–627. Roelfsema, Pieter R, Victor AF Lamme, Henk Spekreijse, and Hol- ger Bosch (2002). “Figure—ground segregation in a recurrent network architecture.” In: Journal of Cognitive Neuroscience 14.4, pp. 525–537. Rubin, Edgar (1915). “Synsoplevede figurer.” In: Rumelhart, David E, Geoffrey E Hinton, and Ronald J Williams (1985). Learning internal representations by error propagation. Tech. rep. California Univ San Diego La Jolla Inst for Cognitive Science. Bibliography 112 Sabour, Sara, Nicholas Frosst, and Geoffrey E Hinton (2017). “Dy- namic routing between capsules.” In: Advances in Neural In- formation Processing Systems, pp. 3859–3869. Sakai, Ko and Haruka Nishimura (2006). “Surrounding suppression and facilitation in the determination of border ownership.” In: Journal of Cognitive Neuroscience 18.4, pp. 562–579. Sanger, Terence D (1989). “Optimal unsupervised learning in a single-layer linear feedforward neural network.” In: Neural Net- works 2.6, pp. 459–473. Schmidhuber, Jürgen (2015). “Deep learning in neural networks: An overview.” In: Neural Networks 61, pp. 85–117. Serre, Thomas, Lior Wolf, and Tomaso Poggio (2005). “Object recognition with features inspired by visual cortex.” In: Com- puter Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on. Vol. 2. IEEE, pp. 994–1000. Shouval, Harel Z, SH Samuel, and Gayle M Wittenberg (2010). “Spike timing dependent plasticity: a consequence of more fun- damental learning rules.” In: Spike-timing Dependent Plasticity, p. 60. Sirosh, Joseph and Risto Miikkulainen (1994). “Cooperative self- organization of afferent and lateral connections in cortical maps.” In: Biological Cybernetics 71.1, pp. 65–78. Song,Sen,KennethDMiller,andLarryFAbbott(2000).“Competi- tive Hebbian learning through spike-timing-dependent synaptic plasticity.” In: Nature Neuroscience 3.9, pp. 919–926. Stevens, Jean-Luc R, Judith S Law, Ján Antolík, and James A Bednar (2013). “Mechanisms for stable, robust, and adaptive development of orientation maps in the primary visual cortex.” In: The Journal of Neuroscience 33.40, pp. 15747–15766. Sugihara, Tadashi, Fangtu T Qiu, and Rüdiger von der Heydt (2011). “The speed of context integration in the visual cortex.” In: Journal of Neurophysiology 106.1, pp. 374–385. Sun, Baochen, Jiashi Feng, and Kate Saenko (2016). “Return of Frustratingly Easy Domain Adaptation.” In: AAAI. Vol. 6. 7, p. 8. Supèr, Hans, August Romeo, and Matthias Keil (2010). “Feed- forwardsegmentationoffigure-groundandassignmentofborder- ownership.” In: PLoS One 5.5, e10705. Swindale, Nicholas V and H Bauer (1998). “Application of Koho- nen’s self–organizing feature map algorithm to cortical maps of orientation and direction preference.” In: Proceedings of the Royal Society of London B: Biological Sciences 265.1398, pp. 827–838. 
Bibliography 113 Teo, Ching, Cornelia Fermuller, and Yiannis Aloimonos (2015). “Fast 2D border ownership assignment.” In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5117–5125. Tsotsos, John K, Scan M Culhane, Winky Yan Kei Wai, Yuzhong Lai, Neal Davis, and Fernando Nuflo (1995). “Modeling visual attention via selective tuning.” In: Artificial intelligence 78.1-2, pp. 507–545. Turrigiano, Gina G (2008). “The self-tuning neuron: synaptic scal- ing of excitatory synapses.” In: Cell 135.3, pp. 422–435. Turrigiano, Gina G and Sacha B Nelson (2004). “Homeostatic plasticity in the developing nervous system.” In: Nature Reviews Neuroscience 5.2, pp. 97–107. Varela, Carmen (2014). “Thalamic neuromodulation and its impli- cations for executive networks.” In: Frontiers in Neural Circuits 8, p. 69. Vogels, Tim P, Robert C Froemke, Nicolas Doyon, Matthieu Gilson, Julie S Haas, Robert Liu, Arianna Maffei, Paul Miller, CJ Wierenga, Melanie A Woodin, et al. (2013). “Inhibitory synap- tic plasticity: spike timing-dependence and putative network function.” In: Frontiers in Neural Circuits 7.EPFL-REVIEW- 189448. Wagatsuma, Nobuhiko, Rüdiger von der Heydt, and Ernst Niebur (2016). “Spike Synchrony Generated by Modulatory Common Input through NMDA-type Synapses.” In: Journal of Neuro- physiology, jn–01142. Wang, Lang and Arianna Maffei (2014). “Inhibitory plasticity dictates the sign of plasticity at excitatory synapses.” In: The Journal of Neuroscience 34.4, pp. 1083–1093. Wenisch, Oliver G, Joachim Noll, and J Leo van Hemmen (2005). “Spontaneously emerging direction selectivity maps in visual cortex through STDP.” In: Biological Cybernetics 93.4, pp. 239– 247. Widloski, John and Ila R Fiete (2014). “A model of grid cell devel- opment through spatial exploration and spike time-dependent plasticity.” In: Neuron 83.2, pp. 481–495. Wiesel, Torsten N, David H Hubel, et al. (1963). “Single-cell re- sponses in striate cortex of kittens deprived of vision in one eye.” In: Journal of Neurophysiology 26.6, pp. 1003–1017. Williford, Jonathan R and Rüdiger von der Heydt (2016). “Figure- ground organization in visual cortex for natural scenes.” In: eNeuro 3.6, ENEURO–0127. Bibliography 114 Wolchover, Natalie (2017). “New Theory Cracks Open the Black Box of Deep Learning.” In: Quanta Magazine. url: https: //www.quantamagazine.org/new-theory-cracks-open- the-black-box-of-deep-learning-20170921/. Wurtz, Robert H (2009). “Recounting the impact of Hubel and Wiesel.” In: The Journal of Physiology 587.12, pp. 2817–2823. Yantis, Steven (2008). “The neural basis of selective attention cortical sources and targets of attentional modulation.” In: Current Directions in Psychological Science 17.2, pp. 86–90. Zenke,Friedemann,EvertonJAgnes,andWulframGerstner(2015). “Diverse synaptic plasticity mechanisms orchestrated to form and retrieve memories in spiking neural networks.” In: Nature Communications 6. Zhaoping, Li (2005). “Border ownership from intracortical interac- tions in visual area V2.” In: Neuron 47.1, pp. 143–153. Zhou, Hong, Howard S Friedman, and Rüdiger Von Der Heydt (2000). “Coding of border ownership in monkey visual cortex.” In: The Journal of Neuroscience 20.17, pp. 6594–6611. Zucker, Robert S and Wade G Regehr (2002). “Short-term synaptic plasticity.” In: Annual Review of Physiology 64.1, pp. 355–405.