A DYNAMIC METHOD TO REDUCE THE SEARCH SPACE FOR VISUAL
CORRESPONDENCE PROBLEMS
by
Junmei Zhu
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)
May 2004
Copyright 2004 Junmei Zhu
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
UMI Number: 3140584
Copyright 2004 by
Zhu, Junmei
All rights reserved.
INFORMATION TO USERS
The quality of this reproduction is dependent upon the quality of the copy
submitted. Broken or indistinct print, colored or poor quality illustrations and
photographs, print bleed-through, substandard margins, and improper
alignment can adversely affect reproduction.
In the unlikely event that the author did not send a complete manuscript
and there are missing pages, these will be noted. Also, if unauthorized
copyright material had to be removed, a note will indicate the deletion.
UMI
UMI Microform 3140584
Copyright 2004 by ProQuest Information and Learning Company.
All rights reserved. This microform edition is protected against
unauthorized copying under Title 17, United States Code.
ProQuest Information and Learning Company
300 North Zeeb Road
P.O. Box 1346
Ann Arbor, MI 48106-1346
Dedication
To my parents.
Acknowledgements
This dissertation is a reflection of ideas that emerged from many invaluable discussions with my
advisor, Christoph von der Malsburg. His early work on the Dynamic Link Architecture forms
the backbone of this thesis; his insight into the organization of the brain has been a constant source
of inspiration as my work progressed; his encouragement and guidance were the driving force behind
finishing this manuscript; and he proofread every draft. This work was supported by his various
grants. I feel privileged to have received the tremendous influence of his brilliant mind, and I am grateful
for his understanding, tolerance, and compassion, which were best revealed at the most difficult times.
I also have the privilege of learning visual psychophysics from Irving Biederman, who kindly
adopted me, generously offered his insightful comments, and always made available his broad
knowledge, time, and resources. I benefited from every activity with his laboratory, including the
Helmholtz club meetings. I thank Michael Arbib for his comments and suggestions to improve this
work; his knowledge of every aspect of brain theory has been something I can always resort to. I thank
Norberto Grzywacz for his suggestions on my thesis proposal, invaluable from both biological and
theoretical perspectives, and for his continued interest in finding biological substrates for my system.
I thank Gerard Medioni for serving on my qualifying examination committee.
Most experiments in this work are simulated in FLAVOR, a software environment developed
at the Institut für Neuroinformatik, Ruhr-Universität Bochum, Germany, and at our Laboratory
for Computational and Biological Vision at USC. I would like to thank all of its developers. Sincere
thanks also go to my colleagues: Kazunori Okada, for always having a solution to my questions and
providing me with technical details and input images; Shuang Wu, for offering lots of help, including
the submission of this dissertation; Rolf Würtz and Jan Wieghardt, for stimulating discussions
during my short visit to Bochum; and Laurenz Wiskott, Larry Kite, Doug Garrett, Xiangyu Tang,
Ladan Shams, Michael Pötzsch, Thomas Maurer, Ingo Wundrich, Hartmut Loos, Michael Neef,
Mike Mangini, and Ed Vessel.
Faculty and staff in Computer Science and the Neuroscience program at USC provide a friendly
interdisciplinary environment from which I have benefited greatly. I would like to thank, among
many others, George Bekey, Len Adleman, Ari Requicha, Nina Bradley, Judith Hirsch, Bosco
Tjan, Laurent Itti, Bill Wagner, Armand Tanguay, Lou Byerly, and Sarah Bottjer for their
generous advice.
I thank my friends for their tremendous help in making my Ph.D. study at USC more enjoyable:
Xiaoying Lu for her help when I needed it most, Erhan Oztop for showing me a deeper meaning of
friendship, Tao Zhao for sharing similar experiences, Walid Soussou, Michael Crammer and Kim
Christian for being there to listen and to reinforce my confidence, and Yunsong Huang, Beth Fisher,
Fang Fang, Qi Cheng, and many others.
Certain events that happened during my Ph.D. study expanded my interests to other areas in the life
sciences. Many people have helped me in this process, and I thank every one of them. In particular,
I owe all my basic knowledge to Lori Walker, who, with her associates, patiently compiled my
three-volume study notes. I am grateful to Henry Han for also being a friend, David Calverley
for his compassion, and Glenn Ehresmann and Akmal for their expertise.
Finally, I am especially indebted to my parents for their sacrifices, love, and support, without
which it would have been impossible for me to concentrate on my research and study far away from
home. I am indebted to my brothers: Junming, for taking over my family responsibilities, and Junhong,
for steering my career path and helping with various topics including the philosophy of science.
Contents

Dedication
Acknowledgements
List of Tables
List of Figures
Abstract

1 Introduction
1.1 The brain
1.2 The problem
1.3 Objectives
1.4 The approach
1.5 Organization of this dissertation

2 Background: representation and organization
2.1 Correspondence in invariant object recognition and beyond
2.2 The correspondence problem
2.3 Representation and implementation of mappings
2.3.1 Representation of mappings
2.3.2 Brain realization of mappings
2.3.3 Search space
2.4 Map organization: top-down constrained optimization
2.4.1 Methods overview
2.4.1.1 Energy function
2.4.1.2 Dynamics
2.4.2 Reduction of the search space
2.4.3 Learnability
2.5 Map organization: bottom-up network self-organization
2.5.1 Self-organization in physics and biology
2.5.2 Methods overview
2.5.2.1 Constraints
2.5.2.2 Stable state
2.5.3 Reduction of the search space
2.5.4 Learnability
2.6 Dynamic Link Matching (DLM)
2.6.1 Labeled graphs
2.6.2 Graph matching
2.6.3 Classical neural implementation of DLM

3 Fast Dynamic Link Matching by cooperating maplets
3.1 Link dynamics
3.2 Linear analysis
3.3 Maplets
3.3.1 Formulation of maplet function
3.3.2 Interactions between maplets
3.3.3 Link growth via maplets
3.3.4 Maplets can be learned
3.4 How maplets speed up convergence

4 The model
4.1 The system
4.1.1 Input and Output
4.1.2 Feature similarity
4.1.3 Initial link weights
4.1.4 Dynamics
4.1.5 Low-dimensional maps from high-dimensional ones
4.1.6 Image similarity from a mapping
4.1.7 Recognition
4.2 Experiments
4.2.1 Map formation: one-dimensional patterns
4.2.2 Map formation: two-dimensional patterns
4.2.3 2D face recognition
4.2.3.1 Galleries
4.2.3.2 Results

5 Learning maplet connections
5.1 The learning system
5.1.1 Learning examples
5.1.2 System structure
5.1.3 Learning rules
5.1.3.1 Maplet functions
5.1.3.2 Connection between maplets
5.1.3.3 The system
5.2 Experiments
5.2.1 Learning from synthesized mappings
5.2.2 Learning from classical DLM equilibrium

6 Discussion
6.1 The speed of organization
6.2 Neurobiological substrates of maplets
6.3 Shifter circuits
6.4 Self-organizing systems
6.5 Algorithms and complexity
6.6 Learning maplets
6.7 Disclaimers

7 Conclusion
7.1 Summary
7.2 Contributions
7.3 Future work

Reference List
List of Tables

4.1 Face recognition rates
4.2 Parameters in map formation and face recognition
List of Figures

1.1 Schematic illustration of elastic graph matching
2.1 Representation of mappings
2.2 Initial and final connectivity for DLM
3.1 Isotropic C function and eigenvalues
3.2 Non-isotropic C function and eigenvalues
3.3 Examples of maplet for 1D patterns
3.4 Examples of maplet for 2D patterns
3.5 Maplet interactions
4.1 1D input patterns: image and model
4.2 Map formation for 1D patterns
4.3 Map formation for 2D patterns
4.4 Images for face recognition
5.1 Structure of the learning system
5.2 Learning maplets for translation invariance: synthesized map, scale 1
5.3 Learning maplets for translation invariance: synthesized map, scale 2
5.4 Learning examples from classical DLM
5.5 Learning maplets for translation invariance: map from classical DLM
Abstract
This dissertation presents a fast self-organizing system for solving the visual correspondence
problem, the creation of a mapping between corresponding points in two images with variations
in position, scale, orientation, and deformation. The starting point for our system is Dynamic
Link Matching (DLM). Dynamic links are implemented as synapses with rapid plasticity that can
switch under the control of signal correlations on a time scale of 100 milliseconds. The links serve
to connect corresponding points in the two images, and they self-organize to form continuous
one-to-one mappings. DLM can deal with relative shift and is robust against deformation. It
presupposes no learning or specific circuit structures, which makes it a plausible basis for
early post-natal vision. But DLM requires thousands of iterations, so it is not a viable model of
fast adult object recognition. Moreover, DLM seems unable to deal with transformations
too large to be treated as mere deformations.
Our new implementation relies on direct interactions between links. These interactions are
modeled with the help of maplets, which stand for local groups of links that are consistent with
each other in terms of transformation parameters (position, orientation and scale). Links and
maplets form a cooperative dynamic system that has the ability to converge very quickly on
globally consistent mappings. In this process, feature similarities between the two images favor
the correct correspondences. Experiments show that a mapping can be formed in just a few
iterations, irrespective of differences in scale and in-plane rotation. This speed is due to the long
range and specificity of link-to-link connections.
We also show that maplets and their connections can be learned by generalized Hebbian plasticity
from consistent mappings, presumably formed by classical DLM. In simple pilot experiments
the maplets formed by learning closely resemble those previously designed manually. Associative
learning of link-to-link connections promises to extend the current system to more general
variations (involving, e.g., rotation in depth).
Chapter 1
Introduction
1.1 The brain
The brain remains one of the greatest mysteries. Vision, for example, enables us to perceive the
external world and to control object-directed actions effortlessly, while over four decades of
computer vision research have shown that building similar visual functions into machines is a difficult
task. On the one hand, computer vision has achieved tremendous success in the direction of image
processing and pattern recognition. For example, barcode scanners are used in every supermarket,
and address recognition in post offices. Our vision has been greatly expanded by computer
techniques, with the most amazing examples being found in the medical image processing domain
where imaging techniques such as MRI are made possible. On the other hand, in general visual
tasks involving, for example, object recognition and scene interpretation, only limited progress
has been achieved. State-of-the-art computer vision systems, freely using the rapidly increasing
computational power of computers, work well only in very restricted conditions, in contrast to the
robustness and adaptability that characterize biological visual systems. Behind each camera in
the supermarket self-checkout line there still has to be a human to recognize the objects that do
not bear a barcode. It is therefore natural to study how biological visual systems work, both for
understanding the brain and for applying the same design principles to computer vision systems.
Biological visual systems, or the brain in general, have been studied using different approaches
such as anatomy, electrophysiology, psychophysics, and psychology. These approaches, being
reductionist in nature, have enjoyed great success in neurobiology. These studies produced
almost all the concrete knowledge of the visual system that we have today, such as the various
vision-related cortical areas, their connections and pathways, and the receptive field properties
of their neurons. However, this knowledge does not suffice to build artificial systems, and even
experimental neuroscientists have started to realize the limitations of reductionist approaches, as is
apparent in the special issue of Nature Neuroscience on computational approaches to brain function.
For example, Gilles Laurent writes that although he could get good intuitive knowledge of the
behavior of a neural system by experimenting on it under varied regimes, he could not convey his
understanding of the system to someone else except by conveying all his past experience [38]. It
is precisely in this sense that the knowledge he obtained by experimenting is raw material but not
yet science. Charles Stevens also calls for theories which identify general organizing principles,
in distinction to models that only describe a particular process [61]. The brain is a complex
system whose collective behavior is difficult to derive from knowledge of its components. It is
important to understand its principles of organization, as theorists have pointed out for a
long time (e.g., von der Malsburg [69]).
The data structure and dynamic laws that govern the organization of the brain are collectively
called a neural architecture, which provides answers to the following four fundamental
questions [70]:
• What is the data structure of brain states?
• How are brain states organized?
• What is the data structure of memory?
• What is the mechanism of learning?
The classical way to model the data structure of brain states is based on a vector of neural
activity values (a_1, ..., a_N). It has been argued previously that this data structure is deficient
in leaving open the binding problem [39, 68, 30, 2, 70]. In a seminal paper [68], von der Malsburg
proposed that a better interpretation of dynamic brain states must contain the equivalent of
dynamic links between neural units. These could be binary, involving pairs of elements,
i, j = 1, ..., N, or links could be of higher order, e.g., involving triplets i, j, k = 1, ..., N,
and would have weights changing on the psychological time scale of 100 msec or less. This
neural architecture with dynamic links is called the Dynamic Link Architecture (DLA). In this
dissertation, we present a concrete application of the general ideas of the DLA, using the visual
correspondence problem as an example.
1.2 The problem
The visual correspondence problem concerns the creation of a mapping between two patterns, such
that features, or nodes, projected from the same scene point are connected. “Correspondence” and
“mapping” are used interchangeably in this dissertation. Correspondence underlies various visual
tasks. In stereo vision, correspondence is between left and right images for depth information [33,
5, 42]. In motion estimation, we need correspondence between consecutive images in time [29, 65].
In object recognition, the two patterns are an input image and a stored model. The correspondence
problem is an important and intensively studied issue in itself. We here view it as an important
paradigm for studying the mechanisms by which the link variables are organized on the psychological
time scale, because correspondence necessarily deals with the organization of links.
In the context of invariant object recognition, links are to represent neighborhood relationships
within the image domain and within the model domain, as well as correspondence relationship
between the image and a model. Correspondence serves to compensate for the variations between
the image and the model, so that they can be directly compared. This abstract description has
been implemented as a concrete algorithm in the elastic graph matching (EGM) system [37],
which is a correspondence-based object recognition system shown schematically in figure 1.1. In
the system, models and images are represented by graphs, whose vertices are labeled by features
defined as a column (vector) of multiscale and multiorientation Gabor filter responses (called a
Gabor jet), and whose edges are labeled by the geometrical distances. The elastic graph matching
algorithm builds a correspondence between image and model, by providing a systematic way of
deforming the image graph to optimize the matching of the vertex labels and edge labels with
the model graph. Once a correspondence is established, a similarity score can be obtained by
summing over jet similarity between corresponding jets, which is the normalized inner product of
the two vectors.
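The jet similarity just described can be sketched in a few lines of code (a minimal illustration only; the function names and toy vectors are ours, not taken from the EGM implementation):

```python
import numpy as np

def jet_similarity(j1, j2):
    """Similarity of two Gabor jets: the normalized inner product
    (cosine) of the two response vectors."""
    return float(np.dot(j1, j2) / (np.linalg.norm(j1) * np.linalg.norm(j2)))

def graph_similarity(image_jets, model_jets):
    """Similarity score of two graphs once a correspondence has been
    established: the sum of jet similarities over corresponding vertices."""
    return sum(jet_similarity(a, b) for a, b in zip(image_jets, model_jets))
```

Because the inner product is normalized, scaling a jet leaves the similarity unchanged, so the measure is insensitive to overall contrast differences between corresponding patches.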
The matching algorithm was implemented very efficiently in the computer using pointers,
and this system as well as its extensions have achieved great success as a technology for face
recognition. In the worldwide FERET test by the US Army Research Laboratory, it outperformed
all other techniques, such as eigenfaces and classification neural nets [53]. More recently, in the
Face Recognition Vendor Test (FRVT), a large-scale evaluation of face recognition technologies
by the National Institute of Standards and Technology, two of the three best systems (Eyematic
and Cognitec) were based on this correspondence of Gabor jet graphs (www.frvt.org).
The most interesting aspect of the EGM system, however, is its potential as a theory
of biological perceptual recognition. It contains no free parameters that need to be adjusted for
different applications. The representation of graph nodes by Gabor jets is motivated by the early
spatial representation of biological vision, namely the hypercolumns in the primary visual cortex
V1. Most importantly, psychophysical experiments [17, 34, 7] in Biederman’s laboratory showed
that the similarity value generated by the EGM model provides an excellent measure of human
performance on discrimination tasks between objects that require quantitative distinctions (such
as faces or Shepard blobs, case 3 subordinate object recognition according to Biederman [8]). This
[Figure 1.1: object (memory) layer with the stored object representation, matching algorithm, multidimensional feature detector, and input (feature) layer; an arrow marks the direction of diffusion.]
Figure 1.1: Schematic representation of the two-layer spatial filter elastic graph matching model
[37]. The input (image) and object (model) are represented as regular lattices of Gabor jets. The
matching algorithm establishes a correspondence between the image and the model. (From [17],
with permission from the authors.)
exciting result leads to the urgent question of how the matching algorithm can be implemented
in a brain-like structure.
We consider a brain-like structure to be a self-organizing system with neurons or neural-like
elements. Here we refer to the dynamic link architecture. The matching process for invariant object
recognition based on dynamic links is called Dynamic Link Matching (DLM). The EGM system
is an algorithmic counterpart of DLM, its data structure and operations being adapted to current
computer architecture. Wiskott and von der Malsburg subsequently proposed a fully neural
realization of DLM [80], which we here call the classical neural implementation. In this implementation,
link dynamics are achieved by temporal correlation: a positive feedback loop between a
to work, a sequence of activity patterns in the image and the model are required. If we attribute
a few milliseconds to establish a single pattern, and if as many as several hundreds of patterns
are needed, the recognition time will be too long to account for the a tenth of a second recog
nition time in adults [62, 63]. Therefore, although using only biologically plausible organization
rules, and being a system that can work even in a very simple initial system state, the classical
implementation is limited as a biological model due to its relative slowness. In addition, it can
deal with only small transformations in rotation and scale, by treating them as deformations.
Direct extension to deal with large scale and orientation differences is possible in principle, but
the increased search space and level of ambiguity would exacerbate the time problem further if
not create convergence problems. To answer the question how the matching algorithm can be
implemented in the dynamic link architecture, these problems of the classical neural DLM have
to be addressed.
1.3 Objectives
In this dissertation, we explore the organization of links in the visual correspondence problem for
invariant object recognition on an unstructured model database in the dynamic link architecture.
To solve the problems of the classical neural DLM, we place particular emphasis on the efficiency
of the matching process, efficiency measured in terms of the number of iterations necessary in a
fully parallel implementation, and enlargement of the useful search space, permitting invariance
to scale and orientation in addition to translation. Specifically, we aim to provide a self-organizing
system to rapidly create a mapping between two patterns, with variations in translation, scale,
in-plane rotation, and deformation.
Although we pay attention to neural constraints (especially the locality of information), we stop short of formulating a neural implementation in any detail. We see our in silico experiments as an efficient approach to test ideas that may work, and especially to eliminate ideas that do not: if an idea fails even with stronger computational power, it cannot work under more restricted operations.
1.4 The approach
Intuition comes from the extraordinary ability of the brain to learn and adapt. The organization
of a complex system depends on the interactions between its constituent elements. The brain has
the inherent tendency to establish organized structural states by mechanisms of self-organization,
and it must be part of its function to stabilize such arduously found states by modifying the
underlying system of interactions, i.e., learning, so that these states can be recovered efficiently
and reliably in the future. We therefore assume that mappings formed in early life leave memory
traces in the form of altered link interaction patterns, on the basis of which the process can be
accelerated.
As in the classical neural DLM [80], we formulate our system as a set of coupled differential
equations. However, as system variables we restrict ourselves to links, leaving out the neural
activities of the classical system. To compensate, direct synapto-synaptic interactions are posited. This has two decisive advantages. The inherently time-consuming requirement to compute temporal correlations is obviated, and interactions between links can now be sculpted more freely
for the benefit of system efficiency. Through appropriate link interactions, the system can give
selective advantage to mapping patterns similar to those formed previously.
We model the link interaction patterns as maplets and their interactions. A maplet is a local
group of links consistent with each other in terms of transformation parameters such as relative
position, scale, and orientation of image and model. Different maplets are connected according to
their degree of consistency in terms of these parameters. Maplets previously were called control
units [82, 83]. They correspond closely to the control neurons of shifter circuits [50], but differ in
essential ways. In this dissertation, we will show through in silico experiments that correspondence map formation can be made very efficient on the basis of interacting maplets, and that the formation of maplets by learning is possible through generalized Hebbian plasticity.
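The grouping of links into maplets by parameter consistency can be sketched as follows (a hypothetical toy in Python, not the system of Chapter 4; only translation is used as the transformation parameter, and the tolerances `pos_tol` and `shift_tol` are our own illustrative choices):

```python
import numpy as np

def group_into_maplets(links, pos_tol=2.0, shift_tol=1.0):
    """Greedily group candidate links into maplets.

    Each link is a pair (t, r) of 2D positions in the model and image
    domains.  Links join the same maplet when they are close in model
    position and imply nearly the same translation r - t (scale and
    orientation are omitted for brevity).
    """
    maplets = []
    for t, r in links:
        shift = np.subtract(r, t).astype(float)
        for m in maplets:
            if (np.linalg.norm(np.subtract(t, m["center"])) < pos_tol
                    and np.linalg.norm(shift - m["shift"]) < shift_tol):
                m["links"].append((t, r))
                break
        else:
            maplets.append({"center": np.asarray(t, float),
                            "shift": shift,
                            "links": [(t, r)]})
    return maplets

links = [((0, 0), (5, 5)), ((1, 0), (6, 5)),  # both imply shift (5, 5)
         ((0, 1), (9, 2))]                    # inconsistent outlier
maplets = group_into_maplets(links)
print(len(maplets))  # 2: the two consistent links share one maplet
```

The consistent pair forms one maplet; the outlier, whose implied shift disagrees, is isolated in its own.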
1.5 Organization of this dissertation
This dissertation is organized as follows. Chapter 2 is background on the representation and
organization of correspondence, as well as a short review of DLM. We introduce the concept of
maplet and describe how it is used and how it can speed up DLM in Chapter 3. Chapter 4 gives a
complete description of our current system, and some experimental results. Chapter 5 addresses
the problem of learning maplets, with preliminary results on learning shift invariance. Finally, Chapter 6 is the discussion, and Chapter 7 presents conclusions and future directions.
Chapter 2
Background: representation and organization
In this chapter, we provide background on correspondence and invariant object recognition. We
will define the search space of the visual correspondence problems, and review methods to explore
the search space. Dynamic Link Matching (DLM), a matching method for object recognition
based on dynamic links, is reviewed.
2.1 Correspondence in invariant object recognition and beyond
The visual perception of objects, even the mere recognition of objects, is made difficult by the
tremendous variance in their retinal images: variance in terms of image transformation (translation, scale and orientation), depth-rotation, deformation, occlusion, change in illumination, and noise. Many mechanisms have been proposed to explain how biological systems solve this invariant object recognition problem. Direct comparison of the image with stored models, as in associative memory, generalizes only over Hamming distance (pixel-by-pixel difference), and
therefore can only account for variance in noise, or other variations that are small enough to be
considered as noise. Other types of variance, especially transformations, can be accounted for by
the establishment of an organized point-to-point correspondence between the image and a model,
which, in theory, links elements mapped from the same scene points. Correspondence serves to
undo the variances. This is the correspondence-based method for object recognition. Recognition
is based on finding mappings, or sets of correspondences, between part of the image domain and
part of the model domain, maps being represented by sets of links that run between units with
similar features and that are consistent with each other in terms of relative position, scale and
orientation.
Evidence that correspondence might be the mechanism that humans use in the type of subordinate object recognition that requires quantitative distinctions comes from psychophysical experiments by Biederman’s laboratory [17, 34, 7]. The results show that, as pointed out in the
introduction, human performance is highly correlated with a correspondence-based recognition
model, EGM [37]. In one experiment [34], subjects judge whether a pair of brief (100 msec)
sequential presentations of face images with an intervening mask are of the same or different
individuals. For the effect of rotation in depth, the reaction time and error rate are strongly and
linearly correlated with the similarity score calculated by the EGM system. The correlation is 0.9
for reaction time and 0.82 for error rates. In another experiment [7], subjects perform physical
same-different judgments on pairs of sequentially presented free-form, symmetrical, blobby novel
shapes called Shepard blobs. The correlation between the EGM similarity value and reaction
times and error rates is 0.95 and 0.96, respectively.
Correspondence may have an indirect, but more fundamental role in general basic-level invariant object recognition. Of all computational models for invariant recognition, apart from a
few correspondence-based systems [50, 80, 77, 30, 66], the majority are feature-based, examples
of which include the Neocognitron [20], VisNet [14] and SEEMORE [45]. Feature-based systems
work by extracting a sufficiently rich variety of features (with the help of units with appropriately
structured receptive fields), achieve feature-wise invariance with the help of fan-in connections
from parameter-specific feature detector units to parameter-independent master units for feature
types, and base the recognition on the set of master units activated by the image of an object.
The feature hierarchy implicit in these models mimics well the organization of the ventral pathway in the visual system. Although in principle feature-based systems leave the door open for
the confusion of objects that agree summarily in feature types but differ in the features’ relative
position, scale or orientation, proponents of feature-based methods argue that this loophole can be closed with the help of combination-coding units of modest complexity (see, for example,
Rolls and colleagues [14], and Mel [45]).
We proceed on the assumption that although feature-based object recognition may play an important role in our visual system, the ability to establish precise correspondences is nevertheless a necessary ingredient for visual perception. An unstructured list of features is not an appropriate representation for object perception beyond mere recognition, failing to support important operations such as purposeful scanning of the image. A further problem for the feature-based method,
which requires appropriate sets of feature types tuned to the particular properties of objects to
be distinguished, is the unsolved learning problem. Correspondence-based object recognition, on
the other hand, can proceed immediately on the basis of elementary features such as Gabor-based
wavelets, which can be developed by plausible mechanisms [51, 6]. Correspondence-based object
recognition opens the avenue to one-shot learning of new objects [40], and once an object type
can be recognized, it is possible to select complex feature recognizers as are required for the
feature-based method [74]. We therefore believe that the correspondence-based method is more
fundamental.
Beyond object recognition, the correspondence problem must be solved explicitly in stereo
fusion [33, 13, 42] and motion estimation [29, 65]. Those applications are less general in scope,
though, as the patterns to be matched have small parameter differences and are rather similar to
each other. In distinction, the search space for object recognition is large, as important differences in scale, orientation and position are involved, as the correct model must be selected from a possibly very large number of competitors, and as images of the same object may differ in illumination,
pose and deformation.
In general, vision is a basis not only for perception, but also for action [23]. In primates, two
separate, but interactive, visual systems have evolved for these two functions: the ventral pathway
for the perception of objects, and the dorsal pathway for the control of action on the objects.
Correspondence may be more important for the control of object-oriented action in the dorsal
pathway than pure perception. When planning an action based upon the recognition of an object,
for example, such information as position, pose, or the composition of the object is important.
This information is lost in the pooling operation of feature-based object recognition methods. The
dorsal pathway may also have evolved earlier than the ventral pathway. As pointed out by Goodale
et al., “the origins of vision may be related more to its contribution to the control of action than
to its role in conscious perception, a function which appears to be a relative newcomer on the
evolutionary scene.” [23]. This again indicates that correspondence is a fundamental operation in vision.
2.2 The correspondence problem
If images and models are structured as two-dimensional arrays of feature-specific units (or, in the
case of the model domain, sets of such arrays, one for each object), a convenient starting point
to formulate the correspondence problem is subgraph isomorphism. Two graphs G = (V, E) and G′ = (V′, E′), where V is a set of vertices and E = {(u, v) | u, v ∈ V} is a set of edges, or links, are isomorphic if there exists a one-to-one mapping f : V → V′, called an isomorphic mapping, such that (u, v) ∈ E iff (f(u), f(v)) ∈ E′. A graph Gs = (Vs, Es) is a subgraph of G if Vs ⊆ V and Es ⊆ E. The general problem of subgraph isomorphism is to decide if there is a subgraph of one graph that is isomorphic to (a subgraph of) another. Subgraph isomorphism in its general form is NP-complete [21].
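The definition can be made concrete with a small sketch (our own illustration; the brute-force search over all bijections mirrors the exponential worst case implied by NP-completeness):

```python
from itertools import permutations

def is_isomorphic_mapping(V, E, V2, E2, f):
    """Check the definition: f is one-to-one and
    (u, v) in E  iff  (f(u), f(v)) in E2."""
    if len(set(f.values())) != len(f):
        return False
    return all(((u, v) in E) == ((f[u], f[v]) in E2)
               for u in V for v in V if u != v)

def find_isomorphism(V, E, V2, E2):
    """Brute force over all injective mappings -- exponential cost."""
    for perm in permutations(V2, len(V)):
        f = dict(zip(V, perm))
        if is_isomorphic_mapping(V, E, V2, E2, f):
            return f
    return None

# A directed 3-cycle matched against a relabelled 3-cycle.
V, E = [0, 1, 2], {(0, 1), (1, 2), (2, 0)}
V2, E2 = ['a', 'b', 'c'], {('a', 'b'), ('b', 'c'), ('c', 'a')}
f = find_isomorphism(V, E, V2, E2)
print(f is not None)  # True
```

Even for these tiny graphs, the search enumerates bijections; for graphs the size of image representations such enumeration is hopeless, which motivates the relaxed, similarity-guided formulations discussed below.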
The visual correspondence problem is related to subgraph isomorphism. In case of noise and
deformations, the existence of an isomorphism cannot be guaranteed, in which case a relaxed (approximate) definition of isomorphism should be applied. Moreover, the graphs as representations of images are labeled graphs, in that labels (or features) are attached to nodes and edges in G and
G′, and a similarity function S : (u, u′) → ℝ is defined. The problem becomes labeled graph matching (also called attributed graph matching). Not only do we need to map the graph structure, we also have
to map the labels. In practical cases, similarity relationships are ambiguous and subject to noise
and loss of information (an element in one structure potentially having many strong similarities
to elements of the other, the correct one in the above sense not necessarily the strongest), so that
on the basis of between-structure similarities alone a unique mapping cannot be defined. Both
aspects, no solution and no unique solution, are typical for ill-posed problems [54]. A unique
mapping is possible by regularization techniques, such as utilizing the continuity constraint to
reinforce linked elements connecting to linked elements. A practical formulation of the correspondence problem could be to find a function f that is as close to an isomorphic mapping as possible, and to optimize the compound similarity under f. Elastic graph matching, as introduced in Chapter 1, is not exactly subgraph isomorphism, because the graph in the input is allowed to
deform (hence the name “elastic”).
2.3 Representation and implementation of mappings
2.3.1 Representation of mappings
The representation of a mapping between two graphs can be high-dimensional or low-dimensional, defined as follows. Let t and r be node positions (indices) of the two patterns, respectively. A low-dimensional mapping is represented by a function f from nodes in one graph (t) to nodes in the other (r): r = f(t). The EGM system in figure 1.1 is an example of this form, as each node
in the object layer maps to only one position in the input layer. This representation admits only
mappings that are one-to-one, or many-to-one, implementing the uniqueness constraint (only one
connection per element) explicitly. Other examples of this representation include optical flow [29],
diffeomorphism groups for deformable templates [46], and local linear mappings [4].
13
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
A high-dimensional representation is a connection matrix W = (w_tr), specifying the effective link strength w_tr between all (t, r) pairs. The mapping is all-to-all in that it assumes a physical connection between any r and any t, which is enabled or disabled by positive or zero values of w_tr, respectively. High-dimensional maps form a superset of low-dimensional maps. Although
in theory a model node does not map to more than one position in the input image, because
of the uniqueness constraint, the ability to represent this possibility gives us a way to handle
uncertainty in the mapping. A high-dimensional map can be readily reduced to a low-dimensional
one by imposing the uniqueness constraint. For example, for each t, set f(t) to be the r that is most strongly connected to t (f(t) = argmax_r′ w_tr′). f(t) can also be set as some other function of the distribution w_tr′, such as the mean, in which case t might not be mapped to exact node positions r. Models using this high-dimensional representation include shifter circuits [50], cooperative algorithms for stereo vision [33, 5, 42], Dynamic Link Matching [80], the minimal mapping theory of motion [65], and tensor voting for motion estimation [48].
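The argmax reduction can be sketched in a few lines (synthetic weights of our own choosing, with the diagonal biased so that the ideal map is recoverable):

```python
import numpy as np

# Illustrative sketch: reduce a high-dimensional map W = (w_tr) to a
# low-dimensional one by imposing the uniqueness constraint.
rng = np.random.default_rng(0)
n = 8
W = rng.random((n, n))                  # all-to-all link strengths
W[np.arange(n), np.arange(n)] += 2.0    # bias toward the ideal map W = I

f = W.argmax(axis=1)                    # f(t) = argmax_r' w_tr'

# Alternative reduction: the mean of the distribution w_tr', which in
# general is not an exact node position.
mean_r = (W * np.arange(n)).sum(axis=1) / W.sum(axis=1)

print(f.tolist())    # [0, 1, 2, 3, 4, 5, 6, 7]
```

Here the argmax recovers the identity map because the diagonal was biased; with ambiguous weights, the two reductions can disagree, which is exactly the uncertainty the high-dimensional form is able to carry.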
An example of the two types of mapping representations is shown in figure 2.1 for a 1D image and model. Both patterns are of length 128 pixels. We place one node at each pixel, making r, t = 1, ..., 128. (a) is a high-dimensional representation with W a 128 × 128 matrix. In W, white indicates high weight values, black small values. The values shown in the figure are the feature similarities alone. Ambiguities can be observed as the many high values within a row or a column. (b) is a low-dimensional representation with f(t) a one-dimensional function. The ideal map for identical image and model is W = I, where I is the identity matrix, for the high-dimensional map, and f(t) = t (shown in Fig. 2.1(b)) for the low-dimensional map.
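The ambiguity visible in (a) can be reproduced with a toy construction (our own; a Gaussian feature similarity on a synthetic periodic pattern stands in for the actual feature similarities of the figure):

```python
import numpy as np

# Two identical 1D patterns of length 128 carrying a periodic feature,
# so several positions look alike -- the ambiguity seen in Fig. 2.1(a).
x = np.sin(np.arange(128) * 2 * np.pi / 16)
image = model = x

# W holds pairwise feature similarities alone, with no consistency
# constraints (a Gaussian similarity, chosen for illustration).
W = np.exp(-(model[:, None] - image[None, :]) ** 2)

row = W[0]
print(int(row.argmax()))        # 0: the correct correspondence...
print(int((row > 0.99).sum()))  # 16: ...but many columns are equally high
```

Feature similarity alone leaves one high value per period; only consistency constraints between links can single out the correct diagonal.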
2.3.2 Brain realization of mappings
The realization of the connections f(t) in low-dimensional mappings requires a data structure equivalent to pointers able to change their target freely. They are more frequently used in technical
applications and are not supported by known structures in the brain. In contrast, the links of
[Figure 2.1: Representations of a mapping between a 1D image and a 1D model. (a) High-dimensional map: W = w(t, r), a 2D function shown as a 128 × 128 matrix. (b) Low-dimensional map: r = f(t), a 1D function. For identical image and model, W = I and f(t) = t.]
high-dimensional mappings can be realized in the brain more naturally, and several possibilities
have been proposed.
A conventional way to implement links is by way of neurons that are specialized to carry the intended synaptic connections, as proposed in stereo correspondence by Dev [13], Marr and Poggio [42], and the skeleton filters of Sejnowski [59]. As pointed out by Sejnowski [59], link-representing neurons would transmit fluctuating signals only if not in high or low saturation,
thus being able to represent dynamic links if appropriately controlled. Seeming difficulties with
neurons-as-links lie with economy and anatomy. It seems very wasteful to employ whole neurons as links (which must be much more numerous than neural units), and the anatomy of cortical neurons, carrying many thousands of synaptic connections on dendrites and axon, does not look as if made just for connecting a few other units.
Another possibility is the gate control of synaptic switching, by way of synaptic terminals that are closely apposed to the controlled synapses, either pre-synaptically or post-synaptically, as in the control neurons of Olshausen et al. [50] and Hinton’s mapping units [28]. A difficulty with the control neuron implementation is the open question of how information on feature similarities or on momentary synaptic strength could be transported to the control neurons, unless the connections of control neurons supported bi-directional signal transport. Also, it is difficult to imagine ontogenetic pathways to the generation of such control neurons.
A third possibility is dynamic links: synapses with rapid plasticity that can switch at the fast time scale of the “psychological moment”, about 100 milliseconds, as proposed by von der Malsburg [70], initially as a theoretical solution to the binding problem [68]. The change of dynamic link weight is called synaptic modulation. Currently this proposal still has little experimental support. The shrinking of receptive fields modulated by attention (Moran and Desimone [47]) may indicate a mechanism of fast-switching synapses. It is difficult to measure changes in synaptic efficacy on the time scale of 100 ms experimentally, although no contradiction to the dynamic link hypothesis has been found.
2.3.3 Search space
In the correspondence problem, all possible mappings form a state space. The task is to find the
best mapping rapidly, where “best” remains to be defined. This is a difficult problem because the
search space can be very large. A good search procedure is a smart way to go through the search
space and find the solution. In a sense, intelligence is search.
Constraints can shape the search space and the dynamics of the search. Constraints can make
the actual search space a low-dimensional submanifold of the full mapping space, although this does not necessarily lead to a better search, because the submanifold can be very nonlinear
and hard to handle. For instance, the state space of low-dimensional mappings is a submanifold
of that of the high-dimensional mappings, and the search might be more easily trapped in local
minima in low-dimensional map space.
A search is a trajectory in the state space, defined by the dynamics of the state variables,
which, in the case of the correspondence problem, are links composing a mapping. Link dynamics
can be obtained by top-down or bottom-up methods. Top-down methods derive link dynamics
from a global energy (“objective”) function, which is a good tool to analyze system behavior.
Bottom-up methods formulate link dynamics directly, which is a natural way to represent self-organizing systems. Wiskott et al. showed that, in the context of neural map formation, top-down and bottom-up methods are equivalent [78]. Both methods depend on the definition and
implementation of constraints, and they work at different levels. We will devote the next two sections to these methods, with special attention to how readily and automatically constraints can be introduced to shape and explore the search space efficiently.
2.4 Map organization: top-down constrained optimization
2.4.1 M ethods overview
In top-down constrained optimization methods, an energy function is defined as a function of
link weight, and link dynamics can be obtained by standard optimization methods [9, 4, 65,
29, 35]. Sometimes the energy function and dynamics are defined over some control variables,
such as in shifter circuits [50], deformable templates [81], or the correlation methods in image
registration [11]. The real correspondence can be calculated directly from these control variables.
2.4.1.1 Energy function
An energy function is composed of two parts: a measure for mapping quality and some constraints.
Mapping quality measures how closely a mapping fits the measured data. It is usually an integration of the similarity of corresponding features, whose definition depends on the feature type. When
using points as features, the most common mapping quality (or “cost” or “energy”) function is
of quadratic form, measuring the Euclidean distance between features in one image and their
respective corresponding features in the other image. This puts the correspondence problem into
the large class of “ill-posed problems” in early vision where regularization techniques can be
applied [54]. When more than one model is to be compared simultaneously, an associative memory-like objective function is usually used, as with shifter circuits [50].
The constraints define the search space. We will see in the next section their role in reducing
the search space. The task of optimization methods is to find a mapping within the constraints
that maximizes the quality measure.
2.4.1.2 Dynamics
Dynamics are obtained by optimization methods. Some optimization methods guarantee a global
optimum, as the result of some kind of exhaustive search. Examples include dynamic programming [49, 22], subgraph isomorphism [64], or an explicit strategy to search through the whole space, as in elastic graph matching [37, 77]. In general, speedup of the search is obtained by reducing the number of variables and restricting the domain of these variables, or by making variables independent, so that optimization amounts to searching each variable individually.
More commonly used optimization methods formulate link dynamics as gradient descent, such
as in Kass, Witkin and Terzopoulos [35] or Bienenstock and von der Malsburg [9], or implicit in the control variables of the deformable templates of Yuille et al. [81] or the shifter circuits of Olshausen et al. [50]. Euler-Lagrange variation has also been used, e.g., in the local linear map version of DLM by Aonishi and Kurata [4]. A global optimum is not guaranteed, and optimization techniques such as simulated annealing can be used to reduce the risk of being trapped in a local minimum.
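A minimal sketch of this gradient-descent route, assuming a toy quadratic energy of our own (a data term plus a smoothness constraint on a 1D low-dimensional map, not any of the cited systems), runs as follows:

```python
import numpy as np

# Quadratic energy for a 1D low-dimensional map f:
#   E(f) = sum_t (f(t) - d(t))^2  +  lam * sum_t (f(t+1) - f(t))^2
# d(t) is a noisy data-driven target; the second term is a smoothness
# constraint.  Link dynamics follow as gradient descent on E.
rng = np.random.default_rng(1)
n, lam, step = 32, 4.0, 0.05
d = np.arange(n) + rng.normal(0, 2.0, n)   # noisy version of f(t) = t
f = d.copy()

for _ in range(500):
    grad = 2 * (f - d)
    grad[:-1] += 2 * lam * (f[:-1] - f[1:])   # smoothness, lower index
    grad[1:] += 2 * lam * (f[1:] - f[:-1])    # smoothness, upper index
    f -= step * grad

# Descent trades data fit for smoothness: the converged map has
# strictly less total variation than the raw data.
print(np.square(np.diff(f)).sum() < np.square(np.diff(d)).sum())  # True
```

The step size is chosen below the stability bound of this quadratic energy, so the descent decreases E monotonically; for non-quadratic energies this guarantee disappears, which is where techniques such as simulated annealing come in.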
2.4.2 Reduction of the search space
In constrained optimization methods, reducing the search space means building a better energy function, such that the global minimum corresponds to the desired solution and there are no local minima. The steeper the landscape, the faster the system converges. Constraints bias the energy of certain mappings, in the following ways:
Uniqueness. Sometimes the mapping quality function does not have a unique optimum. This is
best demonstrated in the example of optical flow computation (Horn and Schunck [29]). Because
of the aperture problem, only the component of velocity in the gradient direction (normal to
level intensity contour) can be measured. Therefore, any flow that has the correct component in that direction fits the observed data perfectly well. A smoothness constraint is used to break the
symmetry between these infinitely many equally good data-fitting flows, so that a unique solution can be determined.
Constraints on high-dimensional mappings. Especially in high-dimensional mappings, depending on the definition of the quality function, a solution realizing the optimum might be trivial
and meaningless. For example, if quality is defined as the sum of the similarities of corresponding
features, the mapping with the best score will always be the one that maps every feature in one
image to every feature in the other (w_tr = max, for all t, r). The minimal mapping theory of motion correspondence (Ullman [65]) requires the minimization of the distance between corresponding features, in which case the solution is the one with no feature matched (w_tr = 0, for all t, r), reaching the absolute minimum of the cost function. To solve the problem, Ullman introduced the cover principle, which is a constraint to minimize the total number of matches while ensuring that each feature is matched.
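The trivial-solution problem and the cover-principle remedy can be caricatured as follows (a greedy sketch of our own, not Ullman's actual algorithm):

```python
import numpy as np

rng = np.random.default_rng(2)
S = rng.random((4, 4))           # similarity w_tr between features

# Unconstrained: maximizing the sum of similarities turns every link on.
trivial = S > 0                  # all True -- a meaningless mapping
print(bool(trivial.all()))       # True

# Cover-principle flavour: every t and every r must be matched at
# least once, with as few matches as possible.  Greedy sketch: best r
# for each t, then patch any r left uncovered.
match = {t: int(S[t].argmax()) for t in range(4)}
extra = [(int(S[:, r].argmax()), r) for r in range(4)
         if r not in match.values()]
n_matches = len(match) + len(extra)
print(4 <= n_matches <= 8)       # every element covered with few links
```

The first half shows why an unconstrained similarity sum is degenerate; the second enforces coverage while keeping the match count near the minimum, which is the spirit, if not the letter, of the cover principle.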
Restriction to permissible transformations. Sometimes the correspondence is known to be of a specific type, such as being a rigid transformation. The corresponding constraints can be
introduced to reduce the search space to be within that transformation. An example of restricting
a mapping locally to a rotation is given in the local linear mapping version of DLM [4].
2.4.3 Learnability
The difficulty in defining energy function and especially the constraints part can be appreciated
by a simple look at the equations obtained in the referenced examples, such as the deformable
template [81], and the local linear map of DLM [4]. The final energy function is usually very
bulky, and the dynamics of the links requires very complicated computations. Every equation has
to be derived manually. Learning the energy function or the constraints is out of the question.
2.5 Map organization: bottom-up network self-organization
2.5.1 Self-organization in physics and biology
Self-organization, the process by which local interactions between large numbers of elements lead
to complex structures of global order, is ubiquitous in our world, both physical and biological.
Classical examples include crystallization, Bénard convection, and biological pattern formation (for a short review, see von der Malsburg [72]).
In the process of self-organization, the system makes a transition from a simple, unstructured
initial state to a final state that has a global order. It starts with random fluctuations, which
are small deviations from the initial homogeneous state, and is governed by laws of three types:
self-amplification of fluctuations, cooperation and competition.
A fundamental difference between organization in biology and physics is that in biology the
underlying dynamics can change with experience. This makes it possible to explain the brain by
applying only basic principles of organization on different time scales, each setting the boundary
condition to the next level.
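These three laws can be caricatured numerically (a toy of our own construction, not a model of any specific biological system): quadratic self-amplification plus neighbor cooperation, with a global competition term that conserves the total.

```python
import numpy as np

# Replicator-style dynamics on a ring of 16 elements.  Fitness of an
# element is its own size plus neighbour support; competition against
# the population-average fitness keeps sum(w) constant.
rng = np.random.default_rng(3)
w0 = 1.0 + 0.01 * rng.normal(size=16)   # homogeneous state + fluctuations
w = w0.copy()

for _ in range(2000):
    coop = 0.5 * (np.roll(w, 1) + np.roll(w, -1))   # cooperation
    growth = w * (w + coop)                          # self-amplification
    w += 0.01 * (growth - w * growth.mean() / w.mean())  # competition

# The tiny initial fluctuations have self-amplified into a structured,
# symmetry-broken state, while the total stays fixed.
print(bool(w.std() > 10 * w0.std()))
```

Starting from a near-homogeneous state, the unstable modes grow exponentially until the nonlinearity saturates them, exactly the transition from unstructured initial state to globally ordered final state described above.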
2.5.2 M ethods overview
2.5.2.1 Constraints
In bottom-up methods, the usual steps are: study the problem, find the physical constraints,
and encode them into cooperation and competition terms. Such methods almost always use a
high-dimensional representation. Each link is a state variable, and is governed by the dynamics
defined by cooperation and competition from other variables. These methods can be collectively
called cooperative algorithms, or relaxation. If in relaxation one admits symbols that may not
have an order, relaxation is extended to relaxation labeling [31].
Bottom-up models include a large collection of stereo matching algorithms [33, 5, 42, 55, 60],
retinotopy formation [26], DLM [80], and tensor voting [48]. Models differ in their state variables
and their way of defining cooperation and competition. For example, in the Dev model of depth
from stereo [5], state variables are the neurons’ firing rates, representing the confidence level that a
point has a given position in 3D space. Neurons coding for neighboring position and similar depth
excite each other, and those for neighboring position but different depth inhibit each other. Arbib
et al. showed that this system does indeed segment the visual input into regions with different
depths. In the theoretical formulation of retinotopy by Häussler and von der Malsburg [26], state variables are synapses. Cooperation for a synapse comes from its neighboring synapses, neighboring in terms of positions on retina and tectum. Competition is convergent and divergent, to keep the synapses from growing indefinitely and to enforce a decision. Tensor voting is a
method for perceptual grouping in computer vision for the inference of structure from sparse,
noisy data in 2D and 3D [43]. When applied to motion correspondence, it uses the smoothness
constraint [48]. It differs from other cooperative methods in that its cooperation is based on a tensor instead of a scalar. This lets the voting field contain more information, and thus no iteration is needed, as claimed by Medioni et al. [48].
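The flavor of such cooperative algorithms can be sketched for a 1D stereo-like toy (our own simplification, not any of the cited models): excitation along same-disparity neighbors, inhibition across disparities at the same position.

```python
import numpy as np

# c[x, d] is the confidence that pixel x has disparity d.  Support
# comes from neighbouring x at the same d; inhibition from other
# disparities at the same x (a uniqueness constraint).
rng = np.random.default_rng(4)
nx, nd, true_d = 32, 5, 2
c = 0.1 * rng.random((nx, nd))
c[:, true_d] += 0.2            # weak, noisy evidence for disparity 2

for _ in range(30):
    support = np.zeros_like(c)
    support[1:] += c[:-1]       # neighbour x-1, same disparity
    support[:-1] += c[1:]       # neighbour x+1, same disparity
    inhibition = c.sum(axis=1, keepdims=True) - c   # other disparities
    c = np.clip(c + 0.1 * (support - inhibition), 0.0, 1.0)

print(bool((c.argmax(axis=1) == true_d).all()))  # the true disparity wins everywhere
```

Weak but spatially consistent evidence is amplified by cooperation, while the uniqueness-style inhibition suppresses the competing disparities, yielding a clean decision after a few tens of iterations.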
2.5.2.2 Stable state
A very important point in self-organization is to show that the attractor of the dynamical system is the desired solution. In rare cases, this can be done by analytical methods, as with one-dimensional retinotopic projections [26]. But in most cases, we need to do computer simulations
to show that the stable state is indeed what we want. In general, one studies the nature of the
problem, comes up with constraints, and then hopes to approach a solution. A justification of this process is given in the theory of relaxation labeling processes [31]. By defining consistency of
a labeling, Hummel and Zucker showed that the dynamical system does converge to a consistent
labeling.
It is interesting to note that, although the final state is what counts, people tend to use the
formulation of cooperation and competition constraints, and not the state, to compare different
models. A notable example is the debate on different stereo algorithms [41, 19, 5], which is about
different forms of the cooperation and competition constraints. On the technological side, the debate does not address the consequences of these differences in form. It is possible that the final states from different constraints might not differ, or that the difference might not affect the constructed depth. On the biological side, no effort was made to test which form is actually expressed in biological visual systems.
2.5.3 Reduction of the search space
Self-organization is inherently slow if the interconnections between elements are of only short
range. Intuitively, to form a coherent pattern, two extremities of the pattern have to communicate
to reach a consensus. With short-range communication only, it takes a long time for the two ends
to even have a chance to talk to each other. Therefore one way to improve the speed is to introduce
longer range interconnections to make distant elements communicate directly.
The speed of convergence has not been given much analytical attention. The general practice is to show simulation results after a certain number of iterations. Tensor voting claims to be non-iterative, but it actually stops after the first iteration, because the voting field usually is strong enough for getting the final result [43]. The analytical tools to study the dynamical behavior of dynamical systems have to some extent been developed in the context of synergetics [25], pattern formation [10], and retinotopy [26]. Using the language and methodology of stability analysis (see, e.g., Haken [25]), a mapping can be seen as a superposition of "modes", connectivity patterns that are solutions to linearized dynamic equations and that one by one preserve their form while growing or decaying exponentially as dictated by a positive or negative growth coefficient (modes and growth coefficients being obtained as eigenpatterns and eigenvalues of the linearized dynamic system). The speed of convergence to the desired organized state is controlled mainly by two factors: 1. the number of modes with positive eigenvalues, and 2. the real part of the positive eigenvalues. We will discuss this further in the next chapter.
2.5.4 Learnability
There has been some work on learning constraints from examples, but only in very preliminary form. O'Toole presents an associative learning of mappings from disparity data to surface depth data [52]. Although a good start, this work suffers from the monolithicity problem of associative memory. For random dot stereo correspondence, Qian and Sejnowski constructed a network [56] that learned the uniqueness and continuity constraints proposed by Marr and Poggio [42], using recurrent backpropagation learning. The network was constructed so that each unit is connected only to those units that lie in the excitatory and inhibitory regions of Marr and Poggio, and it learned that the connections between units in the excitatory region are indeed positive, and those in the inhibitory region are indeed negative. But this initial network structure makes the learning easy. A more important question is how the system knows of the two regions to begin with, and Qian and Sejnowski considered this unlikely to be learned.
2.6 Dynamic Link Matching (DLM)
As a concrete example of the concepts reviewed in this chapter so far, we in this section review
Dynamic Link Matching (DLM), which is an elementary process to match and compare labeled
graphs in the dynamic link architecture. This will also serve as the basis for the extension of DLM in the rest of this dissertation.
2.6.1 Labeled graphs
In DLM, both image and model are represented by labeled graphs. Node positions are sampled
over the image or model, where the sample array could be as simple as a regular lattice. Each
graph node is labeled with a Gabor jet as feature, and graph edges represent spatial neighborhood
relationship [37, 77].
A Gabor jet is a vector of Gabor wavelet responses. A Gabor function is a sinusoidal plane wave
modulated by a Gaussian envelope. In addition to their convenient mathematical properties [12],
Gabor functions are very good approximations to simple cell receptive fields in the primary visual
cortex [32].
The family of Gabor kernels in the spatial domain is:

\psi_{\vec{k}}(\vec{x}) = \frac{k^2}{\sigma^2} \exp\left(-\frac{k^2 x^2}{2\sigma^2}\right) \left(\exp(i\vec{k}\cdot\vec{x}) - \exp\left(-\frac{\sigma^2}{2}\right)\right) \quad (2.1)

where \sigma = 2\pi, and \|\vec{v}\|^2 = \vec{v}\cdot\vec{v} is the squared norm for any vector \vec{v}. The wave vector \vec{k} determines the wavelength and orientation of the kernel \psi_{\vec{k}}. If we use 5 scales, indexed by \nu \in \{0, \cdots, 4\}, and 8 orientations, indexed by \mu \in \{0, \cdots, 7\}, then

\vec{k} = k_\nu (\cos\varphi_\mu, \sin\varphi_\mu)^T \quad (2.2)

where k_\nu = 2^{-(\nu+2)/2}\,\pi and \varphi_\mu = \mu\pi/8.
To get the feature vectors for a pattern (image or model), we convolve the pattern with the set
of Gabor kernels, and compute the magnitude of each complex Gabor response. The real vector
at each point, of dimension 40 if we have 5 scales and 8 orientations, forms the feature vector at that point, the jet.
The feature similarity between two jets J = (a_1, \cdots, a_n) and J' = (a'_1, \cdots, a'_n) is the normalized inner product of the two vectors:

S(J, J') = \frac{\sum_i a_i a'_i}{\sqrt{\sum_i a_i^2 \sum_i a'^2_i}} \quad (2.3)
The graph edges represent spatial neighborhood within the pattern. Nodes that are spatial
neighbors are connected, and there is no edge between distant nodes.
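The jet similarity of equation (2.3) is just a normalized inner product of the two magnitude vectors. A minimal sketch in Python (the function name and the random test jets are our own, for illustration only):

```python
import numpy as np

def jet_similarity(j1, j2):
    """Normalized inner product of two jets of Gabor magnitudes (eq. 2.3)."""
    return np.dot(j1, j2) / (np.linalg.norm(j1) * np.linalg.norm(j2))

# two hypothetical 40-dimensional jets (5 scales x 8 orientations)
rng = np.random.default_rng(0)
j = rng.random(40)
assert np.isclose(jet_similarity(j, j), 1.0)      # identical jets
assert np.isclose(jet_similarity(j, 2 * j), 1.0)  # invariant to overall contrast
```

Because the measure is normalized, it is invariant to a global rescaling of either jet, which is why it tolerates overall contrast changes.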
2.6.2 Graph matching
The task of DLM is to establish a connectivity between the two graphs that connects only corresponding nodes, that is, nodes that represent the same scene point. However, this definition is circular, since finding the corresponding nodes is exactly what DLM is for. The loop can be broken if we define two nodes to correspond to each other if they have similar features and if they have common neighbors. Having common neighbors means that neighboring nodes are mapped to neighboring nodes, which imposes the neighborhood preservation constraint. This is a definition of graph similarity. In the language of self-organizing systems, neighboring links cooperate.
In order to deal with all possible variations of the image, each node in the model has to be potentially connected to each node in the image. Because preferably nodes with similar features should be connected, the synaptic weight of a link is initialized with the similarity between the features. As there might be many accidental links from locally similar image and model patches, DLM has to rule out most of them to get an approximate one-to-one mapping by utilizing neighborhood information. An example of the representation and organization of DLM is shown in figure 2.2.
2.6.3 Classical neural implementation of DLM
As introduced in chapter 1, known neural implementations of DLM are based on neural activities, in which synaptic modulation is controlled through the positive feedback loop between a link's weight and the synchronized activity of the neurons at its two ends [80, 36]. This can be considered as Hebbian plasticity at a fast time scale, which puts the system in complete analogy to activity-dependent retinotopic map formation. There are four basic principles of neural-activity-based DLM: 1. correlation encodes neighborhood; 2. the two layers are synchronized dynamically; 3. synchrony is robust against noise; 4. synchrony structures connectivity.
Figure 2.2: Initial and final connectivity for DLM. Image and model are represented by layers of 16×17 and 10×10 nodes respectively. Each node is labelled with a local feature indicated by small texture patterns. Initially, the image layer and the model layer are connected all-to-all with synaptic weights depending on the feature similarities of the connected nodes, indicated schematically for one model node by arrows of different line widths. The task of DLM is to select the correct links and establish a regular one-to-one mapping. We see here the initial connectivity at t=0 and the final one at t=10000. Since the connectivity between a model and the image is a four-dimensional matrix, it is difficult to visualize it in an intuitive way. If the rows of each layer are concatenated to a vector, top row first, the connectivity matrix becomes two-dimensional. The image index increases from left to right, the model index from top to bottom. High similarity values are indicated by black squares. A second way to illustrate the connectivity is the net display shown at the right. The image layer serves as a canvas on which the model layer is drawn as a net. Each node corresponds to a model neuron; neighboring neurons are connected by an edge. The location of the nodes indicates the center of gravity of the projective fields of the model neurons, considering synaptic weights as physical mass. In order to favor strong links, the masses are taken to the power of three. (From Wiskott and von der Malsburg [80].)
The most important ingredient of DLM (or self-organizing systems in general) is the cooperation between neighboring elements. The most generic neighboring links are those that connect neighboring nodes in the image with neighboring nodes in the model (neighborhood being defined by Euclidean distance). In neural-activity-based DLM, the interaction of links is mediated by the neurons. A way to encode neighborhood in neurons is to let neighboring neurons fire in a correlated way, while non-neighboring neurons are not correlated. For example, in one extreme form, activity would take the form of a sequence of blob-like activity patterns. In Wiskott and von der Malsburg [80], running blobs are used, formulated as an autonomous dynamical system.
Intuitively, this is like having a spotlight in the connection matrix within which all links, being
neighbors, are strengthened. The blobs in Konen et al. [36] are static. At each iteration, a blob
is formed in one pattern, and the corresponding blob in the other pattern is calculated in an
algorithmic way. This is claimed to be fast because no dynamic process of blob formation is
needed.
The neural implementation of DLM has many desirable properties. It is intrinsically invariant
to translation and is robust against many other variations, such as deformation, as in all DLM
systems. Face recognition over a gallery of more than 100 faces showed good performance [80].
The recognition rate for 111 faces depth-rotated by 15° is 91.9%. More importantly, the system
requires very little genetic or learned structure, relying essentially on the rules of rapid synaptic
plasticity and the preservation of topography. This makes it a perfect model for infants in that it
can work from day one.
However, although based on biologically plausible operations, two limitations prevent the classical neural implementation of DLM from being a model of object recognition. First, considering the number of necessary iterations, it is too slow in comparison to the split-second recognition times in adults. In the simulation of the running blob system [80], each iteration step is one update of the dynamics of blob activities. Each of these iterations, corresponding to an exchange of action potentials between neurons, cannot take place in less than a few msec. Link dynamics
is simulated every 200 iteration steps, during which a running blob moves roughly once across its whole layer, for a model grid of size 10 × 10 and an image grid of 16 × 17. The final result can be obtained
after about 2000 units of simulation time. In figure 2.2, the final connection was obtained at
t=10000. In contrast, human object recognition is very fast. Event-related potentials (ERPs) by
Thorpe et al. show that complex natural images can be categorized in under 150ms [63].
Second, although classical DLM treats small transformations as deformations, it cannot deal with large transformations, e.g., in-plane rotation or changes in scale. This is because jet similarities, which serve as initial link weights, are not rotation or scale invariant. In addition, neural activity can encode only the generic neighborhood of a link, which is not specific to large changes in scale, and has a bias in favor of a mapping with no scale change.
Chapter 3
Fast Dynamic Link Matching by cooperating maplets
This chapter describes the most important elements to speed up the convergence of Dynamic Link
Matching.
3.1 Link dynamics
The starting point of our dynamic formulation is a set of coupled differential equations developed
in Haussler and von der Malsburg [26] to describe the ontogenetic development of a retinotopic
map between a one-dimensional model retina of N elements and a similar model tectum, connected by an N × N matrix W = (w_{tr}), each non-negative real number w_{tr} being the connection weight between position r in the retina and position t in the tectum. We adapt the model by
• identifying retina with image and tectum with model
• making the transition to two-dimensional domains,
• introducing higher-order links and their interactions, and
• introducing feature similarities.
The dynamics of the connections are described by the set of N × N differential equations

\dot{w}_{tr} = f_{tr}(W) - \frac{w_{tr}}{2N}\left(\sum_{t'} f_{t'r}(W) + \sum_{r'} f_{tr'}(W)\right). \quad (3.1)

The growth term f_{tr}(W) of link w_{tr} contains the cooperation from all its neighbors, with positive rate \beta, plus a non-negative synaptic formation rate \alpha:

f_{tr}(W) = \alpha + \beta\, w_{tr} \sum_{t', r'} c(t, t', r, r')\, w_{t'r'}, \quad (3.2)
where c(t, t', r, r') is the cooperation coefficient, referred to here as the C function. In the original model, signal correlations caused by local connections within domains and the connections w_{tr} between domains control Hebbian synaptic plasticity and thus induce the cooperative interactions between links. Correspondingly, the C function has the properties of being separable, symmetric, and shift invariant [26].

The negative term in equation (3.1) expresses competition, between links that diverge from one point in the source domain (retina or image) and between links that converge to one point in the target domain (tectum or model).
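The dynamics of equations (3.1) and (3.2) can be sketched numerically for two small one-dimensional ring domains. The fragment below is an illustration of the equations, not the simulation code of this dissertation; the function names, step size, and parameter values are our own choices:

```python
import numpy as np

def growth_terms(W, C, alpha=0.01, beta=1.0):
    """f_tr(W) of eq. (3.2): alpha + beta * w_tr * sum_{t',r'} c(t-t', r-r') w_{t'r'}.
    C is a shift-invariant cooperation function, indexed by wrapped displacements
    on an N-ring (small N, so the direct O(N^4) sum is acceptable)."""
    N = W.shape[0]
    F = np.empty_like(W)
    for t in range(N):
        for r in range(N):
            coop = sum(C[(t - t2) % N, (r - r2) % N] * W[t2, r2]
                       for t2 in range(N) for r2 in range(N))
            F[t, r] = alpha + beta * W[t, r] * coop
    return F

def euler_step(W, C, dt=0.05):
    """One Euler step of eq. (3.1): growth minus divergent and convergent
    competition (sums of growth terms sharing one retinal or tectal end)."""
    F = growth_terms(W, C)
    N = W.shape[0]
    comp = (F.sum(axis=0, keepdims=True) + F.sum(axis=1, keepdims=True)) / (2 * N)
    return W + dt * (F - W * comp)
```

Iterating `euler_step` from a near-homogeneous W lets the diagonal modes described in the next section grow while the competition term keeps the total weight bounded.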
3.2 Linear analysis
The speed of pattern formation can be investigated by linear analysis around the initial state.
What counts is the number of modes with positive eigenvalues and the size relation of the real parts of the positive eigenvalues.
In Haussler and von der Malsburg [26] it was assumed that the two one-dimensional domains
had circular boundary conditions, forming rings of units, and a stability analysis was performed.
The system (3.1) has the stationary homogeneous solution W = 1 (w_{tr} = 1 for all t, r). Linear expansion around this point gives a system whose eigenvectors are the complex exponentials:

e_{tr} = \exp\left(i \frac{2\pi}{N}(kt + lr)\right) \quad (3.3)

for k, l \in \mathbb{Z}_N. These eigenvectors are products of harmonic functions in t and in r, where k and l are frequencies in the two domains, respectively. The eigenvalues of the linearized version of system (3.1) are:

\lambda^{k,l} = \begin{cases} -\alpha - 1, & k = l = 0 \\ -\alpha + (\gamma^{k,l} - 1)/2, & k = 0, l \neq 0, \text{ or } k \neq 0, l = 0 \\ -\alpha + \gamma^{k,l}, & \text{otherwise,} \end{cases} \quad (3.4)

where \gamma^{k,l} are the eigenvalues of the C function on the same set of eigenfunctions. They are obtained as the Fourier transform of the C function and are real numbers. The eigenfunctions (3.3), also called linear modes of the linearized version of (3.1), have amplitudes that grow or decay exponentially in time, with the eigenvalues as growth coefficients.
The decisive quantities determining the speed of growth and convergence of the system are the margins by which the eigenvalues of the desirable modes outdistance those of the unwanted modes. The quantity \alpha is a control parameter, which can be chosen so that the desired modes have positive eigenvalues, leading to growth, and the unwanted modes have negative eigenvalues, leading to decay. The bigger the margin, the larger the possible growth or decay rate, and the faster the convergence.

With natural link interactions induced by correlation-controlled synaptic plasticity there is only one choice in shaping the C function: controlling its width (and correspondingly that of the distribution of eigenvalues, whose width is related inversely). Due to the symmetry of the system, there are four modes that have the same maximal eigenvalue. They correspond to the
lowest positive frequency, k, l = ±1, the matrix w_{tr} taking the form of broad diagonals of either orientation, each with one of two positions (phases cos or sin). In order to develop a mapping, the system must break the symmetry between those modes, which, as shown in [26], happens on the basis of the non-linearities inherent in (3.1), but is a slow process.
In figure 3.1 we plot an isotropic C function, as given by natural link interactions, and the corresponding eigenvalues for \alpha = 0, sorted in descending order. The C function c(t, t', r, r') = c(t - t', r - r') is a 2D Gaussian with standard deviation \sigma_1 = \sigma_2 = 3 in the two axes, for the 1D t and r. The eigenvalues are sorted in descending order in a one-dimensional index (only the 40 largest ones are shown). The eigenvalues for the maximal phase pairs of either diagonal orientation are equal (the difference, or margin, labeled diff in the figure, is 0).
The symmetry between the two map orientations cannot be broken by an isotropic C function.
This problem will get much worse in the realistic case of two-dimensional domains with open
boundary conditions, where many mappings of a continuous variety of relative orientations, scales
and positions are competing with each other. It would therefore be of great advantage if the
link interactions were selectively restricted to such sets of links that are consistent with each
other in terms of map parameters. To show the effect we plot in figure 3.2 a C function that by
its shape already favors a diagonal of one of the two orientations. It is an elongated Gaussian with \sigma_1 = 9, \sigma_2 = 1. Its long axis has slope 1, meaning it is not separable. The figure also shows the largest eigenvalues. This time the positive difference between the largest phase pair, belonging to the favored diagonal orientation, and the second-largest pair of eigenvalues, belonging to the disfavored orientation, leads to immediate strong map development.
The advantage of a specific C function for the speed of convergence is consistent with the intuition from bottom-up network self-organization in section 2.5, saying that the system converges faster if there are longer-range interactions between links. We will see in the next sections how to achieve specific C functions.
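The contrast between figures 3.1 and 3.2 can be checked numerically: the \gamma^{k,l} are the 2D Fourier coefficients of the C function, so the margin between the two diagonal orientations can be read off a 2D FFT. The sketch below is our own construction (wrapped coordinates on a 32-ring, our function names); it reproduces diff = 0 for the isotropic C and a positive diff for the elongated one:

```python
import numpy as np

def gamma_spectrum(C):
    """Eigenvalues gamma^{k,l} of the cooperation operator: the (real)
    2D DFT of c(dt, dr) on the torus."""
    return np.real(np.fft.fft2(C))

def c_gaussian(N, s_long, s_short, isotropic=False):
    """Gaussian C on an N x N torus. If not isotropic, the long axis
    (std s_long) lies along dt = dr, the short axis (std s_short)
    orthogonal to it, as in figure 3.2."""
    x = np.arange(N)
    x = np.where(x <= N // 2, x, x - N)   # signed displacement on the ring
    dt, dr = np.meshgrid(x, x, indexing="ij")
    if isotropic:
        C = np.exp(-(dt**2 + dr**2) / (2 * s_long**2))
    else:
        u = (dt + dr) / np.sqrt(2)        # along the diagonal dt = dr
        v = (dt - dr) / np.sqrt(2)
        C = np.exp(-u**2 / (2 * s_long**2) - v**2 / (2 * s_short**2))
    return C / C.sum()

N = 32
g_iso = gamma_spectrum(c_gaussian(N, 3, 3, isotropic=True))
g_ani = gamma_spectrum(c_gaussian(N, 9, 1))
# the two diagonal orientations correspond to modes (k,l)=(1,1) and (1,-1)
assert np.isclose(g_iso[1, 1], g_iso[1, N - 1])   # isotropic: diff = 0
assert g_ani[1, N - 1] > g_ani[1, 1]              # elongated: positive diff
```

The elongated C concentrates its spectrum on modes with k + l ≈ 0, i.e., on one diagonal orientation, which is exactly the margin that drives fast map development.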
[Figure: (a) C as function of Δt and Δr of interacting links; (b) eigenvalues, diff=0.000000]
Figure 3.1: Isotropic C function and eigenvalues. (a) C function c(t - t', r - r') is a Gaussian with \sigma_1 = \sigma_2 = 3. The total size of C is 32 × 32. (b) Eigenvalues \lambda^{k,l} for \alpha = 0, sorted in descending order in a 1D index. Only the 40 largest ones are shown. The eigenvalues for the maximal phase pairs of either diagonal orientation are equal (diff=0).
[Figure: (a) c(t-t', r-r') as function of Δt and Δr of interacting links; (b) eigenvalues, diff=0.104730]
Figure 3.2: Non-isotropic C function and eigenvalues. Labels are the same as in fig. 3.1, except that the C function c(t - t', r - r') is a Gaussian with \sigma_1 = 9 and \sigma_2 = 1. Note that in distinction to the isotropic C of fig. 3.1 there now is a positive diff.
3.3 Maplets
Guided by the heuristics of the last section we now introduce a formalism to describe the interaction of groups of links that are spatial neighbors and are consistent with each other in terms of map parameters. We call those groups maplets (previously [82, 83] we called them control units, but find the term maplet more suggestive). A maplet corresponds to a mapping between a small disk in the image domain and a small disk in the model domain, see fig. 3.4. It controls the growth of a number of individual links and in turn is driven by the links in that control region, like a template. From a self-organizing-system point of view, the maplets specify the interaction between links and determine the system dynamics and the global patterns that can emerge.
3.3.1 Formulation of maplet function
A maplet is represented by a function K^p_{TR}(t, r), where p is a scale and orientation index and T and R refer to the positions of the centers of the two disks mapped to each other. The domain of K^p_{TR}(t, r) defines the pool of individual links (that is, (t, r) pairs) that the maplet controls. This domain is compact and its size controls the amount of map deformation that will be tolerated.

K^p_{TR}(t, r) is formulated explicitly for one-dimensional image and model as follows. For 1D t and r, the disks become line segments, and, if the compactness is imposed by letting the function decrease from its center (T, R), the overall shape of a maplet is a 2D Gaussian with mean \mu_c = (T, R)^T and covariance matrix \Sigma:

K^p_{TR}(t, r) = \mathcal{N}(\mu_c, \Sigma) = \frac{1}{Z} \exp\left(-\frac{1}{2}(\vec{x} - \mu_c)^T \Sigma^{-1} (\vec{x} - \mu_c)\right), \quad \vec{x} = (t, r)^T. \quad (3.5)

Here, Z is a normalization factor such that \sum_{t,r} K^p_{TR}(t, r) = 1. The matrix \Sigma has the two eigenvalues \sigma_L > \sigma_C. Its eigenvector corresponding to \sigma_L has the direction defined by the scale
parameter p (which for one-dimensional domains is the only mapping parameter besides the
relative translation, which is set by (T,R)). If p corresponds to a model-image scale of 1, the
major axis has slope 1; for a model-image scale ratio of 0.5 octave, it has slope 2^{-1/2}. Examples of maplets with these two different p values are shown in figure 3.3. They all have \sigma_C = 1 and \sigma_L = 3, the slope being the only difference. \sigma_C is a parameter that controls the specificity of a maplet to its parameters. The function is broadly tuned if \sigma_C is large.
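For concreteness, a 1D maplet of equation (3.5) can be built as a rotated 2D Gaussian over the (t, r) plane. The helper below is our own sketch; the function name is ours, and the slope argument stands in for the scale parameter p (slope 1 for equal scales, and, under one sign convention, slope 2^{-1/2} for a 0.5-octave scale difference):

```python
import numpy as np

def maplet_1d(N, T, R, slope=1.0, s_long=3.0, s_short=1.0):
    """K^p_{TR}(t, r) for 1D domains (eq. 3.5): a 2D Gaussian over (t, r),
    centered at (T, R), with the long axis (std s_long) along the given
    slope and the short axis (std s_short) orthogonal to it."""
    theta = np.arctan(slope)
    t, r = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    # coordinates along / orthogonal to the long axis
    u = (t - T) * np.cos(theta) + (r - R) * np.sin(theta)
    v = -(t - T) * np.sin(theta) + (r - R) * np.cos(theta)
    K = np.exp(-u**2 / (2 * s_long**2) - v**2 / (2 * s_short**2))
    return K / K.sum()   # Z normalizes the maplet to sum to 1

K_same = maplet_1d(32, 16, 16, slope=1.0)             # same scale, fig. 3.3(a)
K_octave = maplet_1d(32, 16, 16, slope=1 / np.sqrt(2))  # 0.5-octave difference
assert np.isclose(K_same.sum(), 1.0)
```

The peak of the Gaussian sits at the maplet center (T, R), and s_short plays the role of \sigma_C: enlarging it broadens the maplet's tuning to its parameters.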
Figure 3.3: Maplets for 1D patterns, with different scale parameter p. \sigma_L = 3, \sigma_C = 1. (a) Model and image are of the same scale; (b) image and model have a scale difference of 0.5 octave.
The generalization of the maplet formulation to two dimensions is straightforward. The rule is, for parameters p, T and R, to find the image point to which a given t should map, and punish the deviation of r from this correct target point, by a Gaussian with standard deviation \sigma_C. Four examples of 2D maplets are shown in figure 3.4, corresponding to different parameters: (a) identical image and model; (b) different shift; (c) different rotation; (d) different size. In each case, the two planes correspond to model and image. The tiny circles indicate node positions, and the large circles the disks connected by the maplets.
Figure 3.4: Examples of maplets between two-dimensional domains. Each small figure shows a maplet, with different shift, scale, and orientation parameters: (a) identical image and model; (b) different shift; (c) different rotation; (d) different size. In each small figure, the upper layer is the model domain, and the lower layer is the image domain. Each tiny circle indicates a node position, and large circles are the disks connected by the maplets. For simplicity, in (a)-(c), the maplets have \sigma_C = 0, namely only the exact target point, and not its Gaussian neighbors, is linked to t. In (d), however, we show the mapping of the node in the middle to several neighboring nodes in the image, as it cannot map to exact node positions because of the size change.
3.3.2 Interactions between maplets
Maplets cooperate if they are neighbors in terms of spatial location and transformation parameters. The cooperation coefficient s(p, p', T, T', R, R') between maplets (p, T, R) and (p', T', R') falls off with decreasing likelihood of their being compatible in terms of the mapping. We have modelled it as a Gaussian, falling off with variance \sigma_p^2 as a function of the distance between p and p' and with variance \sigma_{TR}^2 as a function of the Euclidean distance between (T, R) and (T', R'):

s(p, p', T, T', R, R') = \exp\left(-\frac{d(p, p')^2}{2\sigma_p^2}\right) \exp\left(-\frac{\|(T, R) - (T', R')\|^2}{2\sigma_{TR}^2}\right), \quad (3.6)

where d(p, p') is the distance between the two transformation parameters p and p'; d(p, p') = 0 iff p = p'.
Figure 3.5 is a schematic illustration of the interactions between maplets. Each maplet is represented by an ellipse (one contour in the plot of fig. 3.3), and the figure shows the cooperation strength to the filled maplet. The solid-line ellipses have the same p parameter as the filled maplet, and the dashed-line ellipses are of different p. The number of + signs is an indicator of interaction strength. This interaction pattern looks very much like the control neuron interactions in the shifter circuits of Olshausen et al. [50] and, if we ignore the detailed interpretation, the association field in contour integration (Field et al. [16]). A difference is that we have only cooperation and no competition. (Leaving out maplet competition simplifies formulation and learning. Competition is taken care of as part of the link dynamics.)
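Equation (3.6) can be sketched directly. In the fragment below the scalar parameter distance d(p, p') = |p - p'| and the \sigma values are illustrative assumptions of ours, not values fixed in this chapter:

```python
import numpy as np

def maplet_cooperation(p1, p2, TR1, TR2, sigma_p=0.5, sigma_TR=4.0):
    """s(p, p', T, T', R, R') of eq. (3.6): a product of Gaussians in the
    parameter distance d(p, p') and in the Euclidean distance between the
    maplet centers (T, R) and (T', R')."""
    d_param = abs(p1 - p2)                          # d(p, p'); 0 iff p == p'
    d_center = np.linalg.norm(np.subtract(TR1, TR2))
    return np.exp(-d_param**2 / (2 * sigma_p**2)) * \
           np.exp(-d_center**2 / (2 * sigma_TR**2))

# identical parameters and centers cooperate maximally
assert np.isclose(maplet_cooperation(0.0, 0.0, (5, 5), (5, 5)), 1.0)
# cooperation falls off with parameter and spatial distance
assert maplet_cooperation(0.0, 1.0, (5, 5), (9, 9)) < 1.0
```

Since s is a product of the two Gaussians, a maplet pair must be close in both transformation parameters and spatial location to cooperate strongly, which is what restricts long-range support to mutually consistent mappings.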
3.3.3 Link growth via maplets
Maplets provide a specific way of calculating the growth term f_{tr}(W) for links. The system performs the following three steps to compute f_{tr}(W), replacing the original form of equation (3.2):
Figure 3.5: Maplet interactions, schematic for the one-dimensional case. Each ellipse represents
a maplet, and the number of + signs indicates its strength of connections to the filled unit.
1. Compute maplet input from momentary link strengths. The direct excitation o^p_{TR} that the maplet with parameters p, T, R receives from the elementary links w_{tr} is:

o^p_{TR} = \sum_{t,r} K^p_{TR}(t, r)\, w_{tr}. \quad (3.7)

2. Gather cooperation between maplets. Analogously to equation (3.2), the effective strength a^p_{TR} of maplet (p, T, R) is influenced cooperatively by other maplets and itself according to

a^p_{TR} = \alpha + \beta \sum_{p', T', R'} s(p, p', T, T', R, R')\, o^{p'}_{T'R'}, \quad (3.8)

where \beta is a positive rate and the non-negative term \alpha gives maplets a chance to be active even without any link support. The cooperation coefficient s(p, p', T, T', R, R') reflects the mutual consistency between maplets (p, T, R) and (p', T', R'), as defined in equation (3.6).
3. Feedback from maplets to links. The growth coefficient f_{tr}(W) for a link is influenced by the maplet with maximal effective strength from among those whose domains cover the link:

f_{tr}(W) = k\, a^{p^*}_{T^*R^*} K^{p^*}_{T^*R^*}(t, r)\, S^{p^*}_{tr}, \quad (3.9)

where k is a coefficient, a^{p^*}_{T^*R^*} = \max_{p,T,R}\{a^p_{TR} \mid \text{domain of } K^p_{TR} \text{ includes } (t, r)\} with (p^*, T^*, R^*) being the parameter set for which the maximum is attained, and S^{p^*}_{tr} the feature similarity of the jets at positions t and r with respect to transformation p^*, as defined in the next section. (In a fully dynamic formulation the maximum function would have to be replaced by some winner-take-all mechanism.)
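The three steps can be sketched as one function. This is a simplified illustration under our own conventions, not the dissertation's implementation: each maplet is given as a precomputed kernel K over (t, r), its domain is approximated by a small threshold on K, and S holds one similarity matrix per maplet:

```python
import numpy as np

def link_growth(W, maplets, S, s_coop, alpha=0.01, beta=1.0, k=1.0):
    """Maplet-mediated growth term f_tr(W) (Section 3.3.3), a sketch.
    W: N x N link matrix; maplets: list of (params, K) with K an N x N
    kernel K^p_{TR}; S: one N x N similarity matrix S^p_tr per maplet;
    s_coop: M x M cooperation coefficients between the M maplets."""
    M = len(maplets)
    # step 1: maplet input from momentary link strengths (eq. 3.7)
    o = np.array([np.sum(K * W) for _, K in maplets])
    # step 2: cooperative effective strengths (eq. 3.8)
    a = alpha + beta * (s_coop @ o)
    # step 3: each link is driven by the strongest covering maplet (eq. 3.9),
    # with "covering" approximated by a small threshold on the kernel
    F = np.zeros_like(W)
    for t in range(W.shape[0]):
        for r in range(W.shape[1]):
            covering = [m for m in range(M) if maplets[m][1][t, r] > 1e-6]
            if covering:
                m_star = max(covering, key=lambda m: a[m])
                F[t, r] = k * a[m_star] * maplets[m_star][1][t, r] * S[m_star][t, r]
    return F
```

The max over covering maplets in step 3 is exactly what would become a winner-take-all circuit in a fully dynamic formulation.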
3.3.4 Maplets can be learned
Maplets represent local synapto-synaptic connectivity patterns. For some pattern variations, such as in-plane rotation and size, there is a closed-form function and we can formulate K^p_{TR}(t, r) analytically. But for other variations, like rotation in depth (for which mapping deformations depend on object structure), there can be no closed formulation. Therefore, in order to have an extendable system that can deal with more general variations it is important that local connection patterns can be learned from examples.
3.4 How maplets speed up convergence
Maplets can recognize local significant connectivity patterns conforming to a specific transformation, and spread this local order out quickly like a brushfire, so that the global order can be set up fast.

Compared with the classical DLM implementation, maplets extend the neighborhood concept to the transformation space, making specific long-range synapto-synaptic interactions possible. Single synapses are ambiguous as to transformation parameters, and therefore they can only have
unspecific cooperative interactions. But groups of links, integrated with each other through
maplets, can avoid this ambiguity, and therefore can have fairly specific long-range cooperative
interactions.
A stability analysis is difficult to perform because of the nonlinearity, especially in the max operator in equation (3.9). We can nonetheless make an approximation. Assuming undisturbed feature similarities between model and image, the max operator will pick the correct maplets, leading to the equivalent of a specific C function. Assuming the specific C is an elongated and oriented Gaussian, the system can evolve much faster, as can be seen by comparing figure 3.2 with figure 3.1. Around the final state, the eigenvalues drop off more steeply from the maximum, as a result of the specificity of the C function; see figure 3.2.
Chapter 4
The model
4.1 The system
As our system is focused on map formation, its most important elements are the initial value and the dynamics. Recognition is a simple step once maps have been formed to all models.
4.1.1 Input and Output
For map formation, the inputs are two grey-level patterns, in 1D or 2D. The goal is to create a mapping by which the corresponding points in these two patterns are connected. The output is the mapping, represented as a connection matrix, which is 4D for 2D patterns and 2D for 1D patterns. High matrix elements indicate the presence of a connection, and 0 means there is no connection.

For object recognition, the inputs are a model gallery and probe images taken from a probe gallery. The output is the recognized model for each probe image.
4.1.2 Feature similarity
Features are Gabor jets as in all previous DLM systems [37, 80], described in section 2.6.1. As is standard, 5 scales and 8 orientations are used for 2D patterns, and 5 scales are used for 1D patterns. Jets are obtained at the node positions of regular sample grids on the image and model. In both cases, only the magnitudes (and not the phases) of the Gabor responses are used.
The jet similarity measure used in previous systems [37, 80] will run into difEculties when we
deal with large variations in scale and orientation because the similarity depends on the relative
scale and orientation between the two jets. This problem is solved by expanding the similarity
definition. Let fp{i) be the function that maps corresponding elements in two jets with scale
and orientation difference p. The feature similarity J, J ') between jet J = (ui, a2, • • ■ , «n) at
position t in the model domain and J' — {a[, a'2 , ■ ■ ■ , a'^) at position r in the image domain, with
respect to scale and orientation difference p, is defined as
S^{\rho}_{tr}(J, J') = \frac{\sum_i a_i \, a'_{f_\rho(i)}}{\sqrt{\sum_i a_i^2 \; \sum_i {a'}^2_{f_\rho(i)}}}   (4.1)
In the case of a scale difference, the summations run only over those elements of the jets that find a
corresponding element in the other jet.
The single-number similarity S_tr between these two jets is obtained by taking the maximum
over a range of interest in scale and orientation:

S_{tr} = \max_\rho S^{\rho}_{tr}.   (4.2)
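The parameter-dependent similarity and the maximum of equation (4.2) can be sketched in code. This is a minimal sketch, not the dissertation's implementation: it assumes the standard normalized dot product form of jet similarity and represents f_ρ as an index-remapping array, with entries of -1 marking jet elements that have no corresponding element (which are then excluded from the sums, as described above).

```python
import numpy as np

def jet_similarity(J, Jp, remap):
    """Normalized dot product between jet J and remapped jet J' (sketch of eq. 4.1).
    `remap` plays the role of f_rho: remap[i] is the index in Jp corresponding
    to element i of J; -1 marks elements without a correspondence."""
    valid = remap >= 0
    a, b = J[valid], Jp[remap[valid]]
    denom = np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0

def max_similarity(J, Jp, remaps):
    """S_tr: maximum over candidate scale/orientation differences rho (eq. 4.2)."""
    return max(jet_similarity(J, Jp, r) for r in remaps)
```

For example, a jet compared against a cyclically shifted copy of itself scores 1.0 once the candidate set of remappings contains the matching shift.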
The similarity matrices computed for one-dimensional patterns are very ambiguous, making it
difficult or impossible to extract correct correspondences. We reduced that problem by using three
gray-level patterns instead of one for both image and model, computing similarities for feature
vectors of length 3 × 5 = 15.
4.1.3 Initial link weights
The initial link weights are set equal to the feature similarities of the pairs of corresponding jets,
since without further knowledge nodes with similar features should be connected:

W_{tr}(0) = S_{tr}   (4.3)

where S_tr is the similarity defined in equation (4.2).
4.1.4 Dynamics
The continuous dynamics is simulated by the Euler method, which solves differential
equations numerically by discretizing time and iteratively solving the resulting difference
equations. At each iteration, two steps are performed:
Growth term f_tr(W): Compute the growth term for each link as mediated by maplet activity,
using the three steps given in section 3.3.3. On the first iteration, however, we use the
parameter-dependent similarity S^ρ_tr instead of W_tr(0) to compute the activity of
maplets in equation (3.7).
Dynamics: Update link weights according to

\dot{W}_{tr} = f_{tr}(W) - \frac{1}{N}\, W_{tr} \sum_{t'} f_{t'r}(W),   (4.4)

where N is the number of model units.
In deviation from equation ( 3 . 1 ) we here use only a divergent competition term, involving
the average of the growth terms of the links that diverge from one image unit. Leaving out
convergent competition on model units allows them to link with multiple image units, which
is necessary because the number of units in the image domain is usually larger than that in
the model domain.
Each iteration step results in an updated mapping.
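One iteration can be sketched as a single Euler step. This is a minimal sketch under stated assumptions: the growth term is passed in as a black-box function (in the system it is computed from maplet activity, section 3.3.3), and the divergent competition is taken as the average of growth terms over model units for each image unit, per the description above.

```python
import numpy as np

def euler_step(W, growth, dt=0.5):
    """One Euler update of the link-weight dynamics (sketch of eq. 4.4).
    W[t, r]: link weights, axis 0 = model index t, axis 1 = image index r.
    growth: callable returning the growth term f(W), same shape as W.
    Competition is divergent only: each link is suppressed by the average
    growth of all links diverging from the same image unit r."""
    f = growth(W)
    divergent_mean = f.mean(axis=0, keepdims=True)  # average over t, per image unit r
    return W + dt * (f - W * divergent_mean)
```

The time step 0.5 is the value listed in table 4.2.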
4.1.5 Low-dimensional maps from high-dimensional ones
After a given number of iteration steps, the mapping W_tr may still have a number of links going
into each of the image units. For purposes of map display in two dimensions, as in fig. 4.3, and
for the calculation of a model-to-image similarity, it is desirable to determine a low-dimensional map,
giving a unique r(t) for a given t. We compute it as a weighted average of the projected position:

r(t) = \frac{\sum_{r'} r' \, W_{tr'}^{\,a}}{\sum_{r'} W_{tr'}^{\,a}},   (4.5)

where a is a positive number. For a = ∞, r(t) = argmax_{r'}(W_{tr'}).
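For a 1D image axis, the weighted-average projection of equation (4.5) can be sketched as follows; a = 32 is the value from table 4.2, and the 1D position array is an illustrative simplification (for 2D patterns the average runs over 2D positions).

```python
import numpy as np

def project_map(W, a=32.0):
    """Low-dimensional map r(t) from W (sketch of eq. 4.5): for each model
    unit t, the average of image positions r weighted by W_tr ** a.
    Large a approaches the argmax over r."""
    positions = np.arange(W.shape[1], dtype=float)
    w = W ** a
    return (w * positions).sum(axis=1) / w.sum(axis=1)
```

With a dominant weight at one image position the result is essentially that position, while equal weights give the midpoint.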
4.1.6 Image similarity from a mapping
The similarity between an image and a model as mapped by W_tr can be computed as follows.
First, a one-to-one mapping is computed using equation (4.5), so that for each t we have an
r(t). Then the optimal orientation and scale ρ* for each link is estimated from the maplets, as the
average of the orientation and scale of all maplets containing this link in their domain, weighted by the
maplets' activities. The similarity value is then the sum of the jet similarities of all corresponding
point pairs:

S = \sum_t S^{\rho^*}_{t\,r(t)} = \sum_t S^{\rho^*}\big(J(r(t)), J'(t)\big),   (4.6)

where S^{ρ*} is the jet similarity between the jets J(r(t)) = (a_1(r(t)), ..., a_n(r(t))), at position r(t)
in the image, and J'(t) = (a'_1(t), ..., a'_n(t)), at position t in the model, with relative scale and
orientation ρ*.
4.1.7 Recognition
For object recognition, the following steps are performed for each image in the probe gallery:
Map creation For the image and each model in the gallery individually, the following steps are
performed:
• Initialize a mapping by equation (4.3).
• Update the mapping by equation (4.4), for a predefined number of iterations.
• Compute the image similarity under the mapping using equation (4.6).
Recognition The recognized model is picked as the one with the maximum image similarity.
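The recognition procedure above can be sketched as a loop over the model gallery. The three callables are hypothetical placeholders standing in for equations (4.3), (4.4), and (4.6); their names and signatures are illustrative, not the dissertation's code.

```python
def recognize(probe, models, init_map, update_map, image_similarity, n_iter=3):
    """Pick the model with maximum image similarity after n_iter map updates.
    init_map: builds the initial mapping from feature similarities (eq. 4.3).
    update_map: one iteration of the link dynamics (eq. 4.4).
    image_similarity: similarity under the final mapping (eq. 4.6)."""
    best_model, best_score = None, float("-inf")
    for name, model in models.items():
        W = init_map(model, probe)
        for _ in range(n_iter):
            W = update_map(W)
        score = image_similarity(model, probe, W)
        if score > best_score:
            best_model, best_score = name, score
    return best_model, best_score
```

The default of three iterations mirrors the face recognition experiments below, where a decision is forced after three iterations of map formation.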
4.2 Experiments
Our simulations of the system show that correct continuous mappings are formed quickly, independently
(within bounds) of variations in relative scale, orientation and position. Some experimental
results are shown in this section.
4.2.1 Map formation: one-dimensional patterns
Figure 4.1 shows an example of one-dimensional input patterns. For convenience of viewing, the
patterns are extended in the vertical direction. Their length is 128 pixels. The relative local scale
is non-uniform, representing deformation. Features are computed as 1D Gabor wavelet responses
of three independent patterns of equal properties at a time, as explained in section 4.1.2. There
are n = 5 different scales, differing by 0.5 octaves from each other.
Figure 4.2 shows the evolution of the system with input patterns as in figure 4.1. The first row
shows synaptic weights, the second maplet activities. The image coordinate runs horizontally and
that of the model vertically. There are three groups of maplets corresponding to three relative
scales. The columns indicate different iteration steps (initial, showing the feature similarities,
after 10 iterations and after 20 iterations). The transformation parameters of the map can be
easily recognized by visual inspection from the activity of the maplets, by just paying attention
to coherent chains of them.
Figure 4.1: ID Input patterns: image and model. For convenience of viewing the patterns are
extended in the vertical direction.
Figure 4.2: Map formation for 1D patterns in fig. 4.1: the evolution of synaptic weights and
maplet activities. Top row: synaptic weights W_tr. Bottom row: maplet activities. High values are
shown in white, low values in black.
For the maplet function in equation (3.5), the Gaussians have variance 9 and σ_c = 1.
Their domain size in the t dimension is taken to be 13, and in the r dimension it varies depending
on the scale.
The performance is not sensitive to the exact values of the parameters in the system dynamics.
In the experiments, in both one and two dimensions, we simply set them to reasonable
values, and the system worked fine without any need to tune them. The values used are shown in
table 4.2.
4.2.2 Map formation: two-dimensional patterns
The creation of a mapping between an image and a rotated copy is shown in figure 4.3. The first
row shows model and image. The second row shows a regular grid on the model and its mapping
to the image domain at different time steps. The graph shows where the model grid is mapped to
in the image, and is calculated by first computing the corresponding point for each node in the
grid, as in equation 4.5, then connecting points whose corresponding grid nodes are connected. It
can be seen that the grid is rotated along with the image, as it should. The sampling of maplets
in the transformation domain is 1 sample in scale (same size) and 3 samples in orientation (0,
π/6, and π/3).
Figure 4.3: An example of map formation for 2D patterns. The first row shows model and image.
The second row shows a regular grid on the model, and its mapping to the image domain at
different iteration steps.
4.2.3 2D face recognition
4.2.3.1 Galleries
Our experiments are based on the FERET database [53, 77]. There are two groups of images for
the same set of persons: frontal view (fa), and frontal view with different facial expression (fb).
The image size is 128 x 128 pixels. To test the robustness of our system to deformations, we also
tested the Bochum database [80], where the two groups of faces are frontal (faO) and rotated in
depth by 30° (hr4).
In order to test large variations in scale and rotation, we transform fb images into “Tfb” images.
The transformation is a change in scale and an in-plane rotation around the center of the image.
The transformed images are then reformatted as 128 × 128 images, cropping the portion outside
of this square region, and filling in the empty margins by extending grey levels from the boundaries
of the transformed image outward. We get two sets of Tfb images with different transformation
ranges. In Tfb-large, the new size is in the range [77, 128] pixels on a side and the rotation angle in
the range [−30°, 30°]. In Tfb-small, the new size range is [110, 128] pixels and the rotation angle
range [−9°, 9°]. Both size and rotation are chosen randomly and uniformly from their respective
range. Several images from the fa and Tfb-large galleries are shown in figure 4.4. The first row is
from the fa gallery, and the second row is from the Tfb-large gallery.
Figure 4.4: Images for face recognition. First row: fa gallery; Second row: Tfb-large gallery.
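The Tfb construction can be sketched as follows, assuming SciPy's ndimage routines for the geometric operations. The interpolation order, the order of rotation and scaling, and the exact cropping convention are assumptions; `mode='nearest'` reproduces the filling of empty margins by extending boundary grey levels.

```python
import numpy as np
from scipy import ndimage

def make_tfb(img, rng, size_range=(77, 128), angle_range=(-30.0, 30.0)):
    """Sketch of the Tfb transformation: a random in-plane rotation about the
    image center and a random rescaling, refit to 128 x 128 with margins
    filled by extending boundary grey levels."""
    angle = rng.uniform(*angle_range)
    new_size = rng.integers(size_range[0], size_range[1] + 1)
    rot = ndimage.rotate(img, angle, reshape=False, mode='nearest', order=1)
    scaled = ndimage.zoom(rot, new_size / img.shape[0], order=1)
    out = np.empty((128, 128), dtype=img.dtype)
    off = (128 - scaled.shape[0]) // 2
    if off >= 0:
        # smaller than 128: center it and extend edge grey levels outward
        out[:] = np.pad(scaled, ((off, 128 - scaled.shape[0] - off),
                                 (off, 128 - scaled.shape[1] - off)), mode='edge')
    else:
        # larger than 128: crop the central square region
        c = (scaled.shape[0] - 128) // 2
        out[:] = scaled[c:c + 128, c:c + 128]
    return out
```

The default ranges correspond to Tfb-large; passing `size_range=(110, 128)` and `angle_range=(-9.0, 9.0)` would correspond to Tfb-small.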
In all experiments, the recognition is between a model gallery and a probe gallery. The model
gallery is composed of the original fa or faO images. There are 124 images in the galleries fa and
Tfb, 110 in galleries faO and hr4.
4.2.3.2 Results
All results were obtained after three iterations of map formation. A summary of the recognition
rates is shown in table 4.1, and the parameters used in table 4.2.
Test 1 This is a basic test of the system. The model gallery is fa and the probe gallery Tfb-large.
Maplets sample the transformation domain at relative orientation values (0, 0.1π, −0.1π) and at
relative scale (image-to-model) values (√2, 1, 1/√2). The recognition rate is 85% (=106/124).
No trend was found in the recognition errors, but even in recognition failures, the estimated
transformation parameters are usually correct.
Test 2 With test 2 we compare our system's performance with that of other face recognition
systems. These systems did not explicitly permit changes in scale and orientation. We accordingly
switched off our search in scale and orientation by using only maplets with relative orientation
0 and scale 1. As model gallery we still used fa, and as probe gallery Tfb-small. We obtained a
recognition rate of 96.8% (=120/124). This is higher than in Test 1, as is to be expected given
the much smaller permitted range of variation. The result is not significantly different from the
recognition rate with elastic bunch graph matching [77], where in one experiment with 250 fa
against 250 fb images (also from the FERET database) the recognition rate was 98%.
Test 3 To explore the robustness of our system against deformations, we tested it with faces
rotated in depth. The model gallery this time is faO, with probe gallery hr4. We used only one
set of parameters for the maplets, as in Test 2. The recognition rate is 93.6% (=103/110). This
compares very favorably with the DLM system [80], where a recognition rate of 66.4% (=73/110)
was obtained with the same galleries.
test   model   probe       size   recognition rate   other systems
1      fa      Tfb-large   124    85%
2      fa      Tfb-small   124    96.8%              98% ([77])
3      faO     hr4         110    93.6%              66.4% ([80])

Table 4.1: Face recognition rates
                          parameter           value                  notes
System dynamics           α                   0.01                   Eq. (3.8)
                          β                   1                      Eq. (3.8)
                          k                   50                     Eq. (3.9)
                          time step           0.5                    Euler method
                          σ_K                 1                      Eq. (3.6)
                          σ_W                 10                     Eq. (3.6)
Maplets                   σ_c                 1                      Eq. (3.5)
                          ctl size, t dim.    5 × 5
                          ctl size, r dim.    varies
                          a                   32                     Eq. (4.5)
Feature sample in space   model               8 × 8 regular grid     center 108 × 108
                          image               14 × 14 regular grid   whole image

Table 4.2: Parameters in map formation and face recognition
Chapter 5
Learning maplet connections
The ability to learn maplets is very important for coping with certain variations such as rotation in
depth, and also as an ontogenetic basis for maplet formation. In this chapter, we formulate learning
rules and present simple pilot experiments on learning 1D shift invariance as a first demonstration
of the learning principles.
5.1 The learning system
5.1.1 Learning examples
The idea of this learning system is as follows. A baby can afford to stare at an object for several
seconds before recognizing it. But once a connectivity pattern is formed, it should leave a memory
trace in the form of altered link interaction patterns, like in maplets, so that the convergence of
future map formation is accelerated. The examples by which this learning process is guided are
these connectivity patterns, each of which is represented by a synaptic matrix w(t, r).
Ideally, the examples for learning shift invariance, for instance, are mappings formed between
image-model pairs with variable relative shift, while other transformation parameters such as scale
are constant. In theory, these w(t, r)s are to be obtained by slow classical DLM. However, we also
know that classical DLM cannot deal with large transformations. A solution to this apparent
logical flaw is a mechanism to compose smaller transformations, which classical DLM is capable
of handling, into big transformations. This can be a continuous creation of mappings between the model
and a sequence of continuously moving images. The mappings for two consecutive images differ
only slightly. After one mapping is formed, the next one can be obtained by a small modification.
After a sequence, a mapping with a large parameter change can be formed.
5.1.2 System structure
Although it would only be realistic to assume that the distribution of the sizes or the center
locations of developing maplets will turn out to be random as a result of competitive learning, we
here consider a simplified and regular structure.
The domain of mappings is divided into small regions called control regions, indexed by s,
with membership function A_s(t, r) (A_s(t, r) = 1 inside region s, = 0 outside). Different regions can
overlap with one another. The membership function makes sure that each maplet corresponds
to only a local group of links. For each s there is a set of n maplets, with control functions
K_si(t, r), i = 1, ..., n. We need multiple maplets over a control region for different transformation
parameters. Comparing with the notation in chapter 3, K_si(t, r) corresponds to the maplet function
there, and s_c(s, i, s', i') corresponds to s_c(T, T', R, R', ρ, ρ'), where s is the index of the control
center TR, and i corresponds to the index ρ.
The response y_si of unit K_si(t, r) to input w(t, r) is

y_{si} = \sum_{t,r} K_{si}(t, r)\, w(t, r) + \sum_{s', i'} s_c(s, i, s', i')\, y_{s'i'},   (5.1)

where s_c(s, i, s', i') is the connection strength between maplets K_si and K_s'i'.
Figure 5.1: Structure of the learning system. Two control regions, s and s', are shown on top of
the weight matrix w(t, r) for 1D patterns. For each control region, there are n maplets to learn
different control parameters.
5.1.3 Learning rules
5.1.3.1 Maplet functions
The initial values of the control functions K_si(t, r) are random numbers within their control
regions, and 0 outside.
For each input mapping w(t, r), the control functions are updated according to

\Delta K_{si}(t, r) = \alpha_c\, A_s(t, r)\, w(t, r)\, y^*_{si},   (5.2)

where α_c is the learning rate, and

y^*_{si} = \begin{cases} y_{si} & \text{if } \operatorname{argmax}_j (y_{sj}) = i \\ 0 & \text{otherwise} \end{cases}

is winner-take-all among maplets sharing the same control region.
In order to avoid uncontrolled growth, this update is followed by a normalization:

\sum_{t,r} K_{si}(t, r) = \text{const.}   (5.3)
The learning rule (5.2) for the control functions is like Hebbian learning of the receptive fields of
orientation-sensitive cells [67]. The control function K_si(t, r) can be viewed as the receptive field of
maplet si, w(t, r) is its input, and y*_si is its output. The factor A_s(t, r) serves just to keep the
control function within its control region.
5.1.3.2 Connection between maplets
The initial value of the connection s_c(s, i, s', i') between maplets K_si and K_s'i' is set to 0. It is
also learned by a Hebbian rule:

\Delta s_c(s, i, s', i') = \beta_c\, y^*_{si}\, y^*_{s'i'},   (5.4)

where β_c is the learning rate. This is also followed by a normalization:
\sum_{s', i'} s_c(s, i, s', i') = \text{const.}   (5.5)
5.1.3.3 The system
In summary, learning takes place in the following steps:
• Initialization
• For each training example w(t, r):
— compute maplet responses y_si (equation (5.1))
— update control functions K_si(t, r) (equations (5.2) and (5.3))
— update connections s_c(s, i, s', i') (equations (5.4) and (5.5))
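The steps for one training example can be sketched, for a single control region, as follows. This is a simplified sketch: responses are reduced to plain inner products of the maplet functions with the input (maplet-to-maplet connections are ignored here), and only the winner-take-all Hebbian update (5.2) with the normalization (5.3) is shown.

```python
import numpy as np

def learn_step(w, K, A, alpha=0.1, const=1.0):
    """One learning step for the n maplets of a single control region (sketch).
    w: input mapping of shape (T, R); K: maplet functions of shape (n, T, R);
    A: 0/1 membership mask of the control region, shape (T, R).
    The winning maplet is updated by the Hebbian rule and renormalized so its
    function sums to `const`."""
    y = np.tensordot(K, w, axes=([1, 2], [0, 1]))   # response of each maplet
    winner = int(np.argmax(y))                      # winner-take-all gating
    K[winner] += alpha * A * w * y[winner]          # Hebbian update (cf. eq. 5.2)
    K[winner] *= const / K[winner].sum()            # normalization (cf. eq. 5.3)
    return winner, K
```

A full run would loop this over all control regions and training examples, followed by the connection updates (5.4) and (5.5).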
5.2 Experiments
The development of maplets and their connections is simulated in an example in which the goal
is to learn translation invariance for 1D patterns. In the simulation, the weight matrix domain is
of size 20 × 20. It is divided into 4 × 4 (s = 1, ..., 16) small regions, each of which has size 5 × 5.
In each small region there are 9 maplets (n = 9), enough to cover all possible shifts. Different
scales are learned separately, meaning all examples have the same and fixed scale in each run.
Learning examples are presented sequentially, and the system learns according to the procedure
in section 5.1.3.3. We use the normalization constant 1 for control functions and 0.2 for maplet
connections; α_c = 0.1, β_c = 0.05.
The learning examples are artificially generated mappings, as well as mappings formed in the
classical DLM system.
5.2.1 Learning from synthesized mappings
In the first set of experiments, each learning example is a synthesized synaptic matrix w(t, r)
corresponding to a mapping between an image and a model with variable relative shift and a
constant relative scale. Matrix elements have value 1 along a straight line, whose slope depends on
the scale difference, and 0 elsewhere. Some examples are shown in figure 5.2(a) for identical image-model
size and in figure 5.3(a) for a size difference of 0.5 octave. Variable r runs horizontally and
t vertically. The position of the line corresponds to the translation parameter, which is generated
randomly and uniformly over all possible translations. Noise in the mapping is simulated by
randomly setting elements next to the ideal line to value 1, with a probability of 10% in these
simulations.
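A generator for such synthesized examples can be sketched as follows; the wrap-around of the line and the exact neighbor convention for the noise are assumptions made for a self-contained illustration.

```python
import numpy as np

def synth_mapping(size=20, shift=0, slope=1.0, noise_p=0.1, rng=None):
    """Synthesized synaptic matrix w(t, r) (sketch): value 1 along the line
    t = slope * r + shift (wrapped around the domain), 0 elsewhere, plus
    noise that sets an element next to the line to 1 with probability noise_p."""
    if rng is None:
        rng = np.random.default_rng()
    w = np.zeros((size, size))
    for r in range(size):
        t = int(round(slope * r + shift)) % size
        w[t, r] = 1.0
        if rng.random() < noise_p:  # noisy element adjacent to the ideal line
            w[(t + rng.choice([-1, 1])) % size, r] = 1.0
    return w
```

A slope of 1 corresponds to identical image and model size; a different slope would represent the 0.5-octave case.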
The learned maplets are shown in figures 5.2 and 5.3. In both figures, (b) shows the initial
random maplet functions, identical in the two figures, and (c) the learned maplet functions after
1000 learning examples as in the respective panel (a): 5.2(c) involves image and model of the
same scale, and 5.3(c) corresponds to a scale difference of 0.5 octave. What is plotted are only the
maplet functions within their respective control regions. Each small block represents one maplet.
In the horizontal direction runs s, the index of the control region in the synaptic matrix, and in
the vertical direction runs i, the index of maplets sharing the same control region. We can see
that the learned functions have reasonable shapes, resembling those designed by hand previously
(figure 3.3, with larger and smaller σ_c). In addition, it seems that maplets which share the
same control center learned different translation parameters, which often span the whole range of
possible translations. But it can also be noticed that even after 1000 training examples it is still
possible to have a few uncommitted slots, i.e., maplets with unstructured functions. This reflects
the well-known "dead unit" problem in competitive learning [27].
The learned interconnections between maplets are evaluated in the following way. For any
maplet si, compute the sum of all maplets, including itself, weighted by their connection
strength to unit si: Σ_{s',i'} s_c(s, i, s', i') K_{s'i'}(t, r). This can be viewed as the compound receptive field
of the maplet. By visual inspection, in most instances, the reconstructed result resembles one of the
training examples, i.e., a straight line with the appropriate slope, meaning that all maplets with the
same translation are connected, while those with different translations are not.
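The compound receptive field described above can be sketched directly; the array shapes are illustrative assumptions (all maplet functions stored over the full domain, zero outside their control regions).

```python
import numpy as np

def compound_rf(s_c, K, s, i):
    """Compound receptive field of maplet (s, i) (sketch): the sum of all
    maplet functions, including its own, weighted by their connection
    strength to (s, i). s_c has shape (S, n, S, n); K has shape (S, n, T, R)."""
    return np.tensordot(s_c[s, i], K, axes=([0, 1], [0, 1]))
```

If the learned connections are correct, plotting this sum for any maplet should show a straight line with the appropriate slope, as observed in the experiments.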
5.2.2 Learning from classical DLM equilibrium
Eventually, the examples by which this learning process is guided should be the established
mappings formed in systems with naturally given simple interactions, like the classical DLM. Specifically,
the input is two patterns, an image and a model, with the image a transformed (translated,
scaled, rotated) version of the model. Through the dynamics (3.1) and (3.2), with an isotropic C function,
a mapping between them is established and is taken as input to the learning system.
In the simulation, models and images are 1D patterns of length 20. They are random patterns
with m = 10 feature types, with labels 1 to m. Images are shifted versions of their corresponding
models. We use wrap-around shifts to avoid the complications of boundary effects. A certain
Figure 5.2: Learning maplets for translation invariance: identical image and model. (a) Examples
of synthesized mappings w(t, r) as learning input, with shift parameters 0, -10, -6, and 5, from
left to right. (b) Random initial values of the maplet functions. (c) Learned maplet functions after
1000 iterations. Each small block represents one maplet. All maplets in the same column share
the same control region (same s, different i). Columns are of different s.
Figure 5.3: Learning maplets for translation invariance: image and model have a scale difference
of 0.5 octave. Legend is the same as in fig. 5.2. (a) Samples of synthesized mappings w(t, r)
as learning input, with shift parameters -8, 0, 5, and 9. (b) Initial values of the control functions,
identical to those in fig. 5.2(b). (c) Learned maplet functions after 1000 iterations.
amount of noise is also added to the feature values in the image. The noise is 10%, meaning it is
randomly chosen in [−0.1m, 0.1m].
The similarity between features i and j is defined as in equation (5.6).
The feature similarity of a model-image pair is used as the initial value of the weight dynamics.
As in section 4.1.2, an overlay of three patterns is used at a time for image and model. In the
dynamics (3.1) and (3.2), the C function is a 2D Gaussian c(t − t', r − r') with size 7 × 7, isotropic
with σ_1 = σ_2 = 1. These equations are simulated by the Euler method, and the time step is set
to 0.25.
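The generation of these model-image pairs can be sketched as follows. The exact feature similarity of equation (5.6) is not specified here, so a simple decreasing function of the label difference stands in for it; that function, and treating the labels directly as feature values, are illustrative assumptions.

```python
import numpy as np

def make_pair(length=20, m=10, shift=5, noise=0.1, rng=None):
    """Model/image pair (sketch): a random 1D pattern with feature labels
    1..m, a wrap-around shifted copy as the image, plus uniform noise in
    [-noise*m, noise*m] on the image features."""
    if rng is None:
        rng = np.random.default_rng()
    model = rng.integers(1, m + 1, size=length).astype(float)
    image = np.roll(model, shift) + rng.uniform(-noise * m, noise * m, size=length)
    return model, image

def feature_similarity(model, image, m=10):
    """Similarity matrix over all model/image feature pairs; a stand-in for
    eq. (5.6), decreasing linearly with the label difference."""
    diff = np.abs(model[:, None] - image[None, :])
    return np.clip(1.0 - diff / m, 0.0, 1.0)
```

The resulting similarity matrix serves as the initial value of the weight dynamics, as in column (b) of figure 5.4.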
Some input examples are shown in figure 5.4. Image and model are of the same scale. Different
shifts are shown in the rows. Column (a) shows model and image pairs; (b) is their feature similarity,
serving as the initial value of the weight dynamics. The rightmost column, (c), is the equilibrium
of the dynamical system, which we simply take to be the weight matrix after 100 iterations. These
final weight matrices are used as inputs to learn maplets and their connections.
Learning follows the same procedure as in the synthetic simulation. The learned results are shown
in figure 5.5: 5.5(a) shows the initial random maplet functions, and 5.5(b) the learned functions after
100 learning examples. This time the learned maplets are much cleaner, presumably because of the
cleaner weight matrices used as input.
Figure 5.4: Learning examples. Column (a): input model and image pairs; (b): feature similarity
as the initial value of the weight matrix; (c): the equilibrium of the weight dynamics (after 100
iterations). Image and model are of the same scale, with different shifts. Rows 1: shift=0; 2:
shift=5; 3: shift=-3; 4: shift=8.
Figure 5.5: Learning maplets for translation invariance, using learning examples from the classical
DLM equilibrium (fig. 5.4). (a) Random initial values of the maplet functions; (b) learned maplet
functions after 100 iterations. Legend is the same as in fig. 5.2.
Chapter 6
Discussion
If, as claimed by von der Malsburg [68], the data format of the brain is a dynamic graph containing
link variables in addition to node variables, it is an important scientific task to work out the
mechanisms by which these additional variables are organized on the psychological time scale of
fractions of a second. This dissertation aspires to contribute to that task. We take as paradigm the
correspondence problem in its application to object recognition. The general ability to establish
correspondence between structures may well be central to the function of the brain, a conclusion
well supported by the important role search procedures play in the field of artificial intelligence.
In this chapter, we will discuss several issues regarding our approach and related work.
6.1 The speed of organization
A detailed neural model for dynamic link matching on the basis of binary links is described in
Wiskott and von der Malsburg [80]: links are identified with individual synaptic connections, which
are subject to rapid reversible synaptic plasticity under the control of temporal correlations [68].
It may be surmised that this is the original neural implementation in the infant brain. That
implementation, however, is not acceptable as a model for object recognition in the adult, being
too slow by one or two orders of magnitude. The reason for this slowness is twofold: the temporal
correlations for synaptic control inherently need time to be expressed, and the synapto-synaptic
interactions (corresponding to the C function of (3.2) in section 3.3) as induced by temporal
correlations are too unspecific in terms of map parameters. With unspecific interactions, the
globally correct mapping pattern has to compete with a very large number of nearly optimal
connectivity patterns.
In this dissertation, we assume that organized mappings formed in early life leave memory
traces in the form of altered link interaction patterns, modelled here as maplets and their
interactions, on the basis of which the process is accelerated by one or two orders of magnitude. If
each iteration in our dynamic formulation is estimated to take a few milliseconds, then the ten
or twenty iterations needed to form a fairly precise correspondence mapping could be accomplished well
within a tenth of a second. Moreover, as shown in Wiskott et al. [75], object recognition can, at
least in simple cases, be reliably performed long before the correspondence map has reached full
precision, as illustrated also in our face recognition tests, in which a decision was forced already
after three iterations of map development. Reaction-time arguments can therefore no longer be
raised against correspondence-based object recognition.
6.2 Neurobiological substrates of maplets
The basic issue of how dynamic links are implemented in the brain has been discussed extensively
elsewhere by von der Malsburg [70, 71], and we will here concentrate mainly on higher-order links
and their interactions, as postulated in this dissertation in the form of interacting maplets. The
issue is complex, and much detailed work will be needed before a satisfactory solution is found.
There is the logical possibility of direct associative interaction routes within sets of synapses,
and candidate pathways for such interactions could be afforded by astrocytes, as proposed by
Antanitus [3]. Astrocytes are a subtype of glial cells, and are very numerous relative
to neurons. They have many processes with an obvious affinity to synapses and especially to
synaptic clefts, and carry ionic signals that could possibly switch synapses on and off in pools.
If maplets are realized in the brain by astrocytes, the domain of A_s(t, r) would be all the synapses a
whole astrocyte or one of its dendrites envelops. The efficacy of astrocytes controlling individual
synapses could change as a result of experience. So far, however, all ionic signals observed in
astrocytes are much too slow (several hundred msec) to serve the intended purpose (see Rose et
al. [58] concerning Ca++ signals).
An alternative possibility arises by realizing that only large numbers of activated synapses can
fire a postsynaptic neuron (Abeles [1]). Assume that the units that are to be linked are each composed
of many neurons, forming multi-cellular units (MCUs). Then a link between MCUs A and
B can be established by activating subsets a and b of A and B, respectively, if there are many
synaptic connections between those subsets, positively interfering so as to effectively link a
and b. Each neuron in, e.g., a has many more connections in addition to those to b, creating an
ineffectual spray of single signals over many other MCUs, connections that become functional only
when interfering positively in the context of other subsets a' of A. Thus, each neuron in an MCU
would be part of a combinatorial code contributing to a large number of links to other MCUs.
Implementation of links would thus stay within the realm of conventional neural networks, but
would require rather specific and massive connectivity patterns together with mechanisms for their
development and learning, of which the details will have to be worked out. On the basis of this
implementation, direct link-to-link interactions are simply realized by further neural connections,
which could be learned by Hebbian plasticity.
6.3 Shifter circuits
Our general approach is similar to the shifter circuits of Olshausen et al. [50]. Both approaches
share the philosophy of handling mappings directly, aiming to provide a good routing model that
preserves spatial relationships. Their model relies on a set of control neurons which are very
much like our maplets. There are fundamental differences, however. Their dynamic variables
are the coefficients (activities) of the control neurons, and are subject to gradient descent of an
energy function. In contrast, our dynamic variables are the links themselves, giving different roles
to our maplets and their control neurons. The goal of maplets is only to guide the dynamics of
links, in contrast to the all-or-none gating behavior of control neurons. Whereas our system is
self-organized, shifter circuits require a third party, the pulvinar, to provide the control signals,
and a difficulty with this mode of implementation is the open question how information on feature
similarities or on momentary synaptic strength could be transported to the control neurons (see
eq.(3.7)), unless the connections of control neurons supported bi-directional signal transport. The
architecture is also different in these two systems. Our system has only two domains (image and
model), while their model has several intermediate levels between the two domains. Multiple
levels have the advantage of needing fewer connections, so that each level has smaller fan-in and
simpler control. They are also biologically more realistic. But when using patterns stored in the
associative memory to guide the control variables, i.e., top-down control, there is the problem of
how to back-propagate this information down to the maplets at lower levels.
6.4 Self-organizing systems
In terms of self-organization, our system is very much like other self-organizing systems, such as the
cooperative algorithms in stereo vision (Julesz [33], Dev [13], Marr and Poggio [42], Sperling [60]),
and the development of retinotopic maps (von der Malsburg and Willshaw [73], Haussler and von
der Malsburg [26]). Our maplets and their connections act very much like the association field
for contour integration by Field et al. [16], and the extension field in edge detection by Medioni
et al. [43].
6.5 Algorithms and complexity
The speed, or complexity, of our system is measured in the number of iterations. This
measure is appropriate for the parallel distributed processing in neural architectures, as it does
not consider the complexity of each iteration, as required in sequential computers. In contrast,
complexity analysis in computer science counts the number of basic operations in a von
Neumann computer architecture. The two measures emphasize different aspects of architecture,
although at the moment our experiments have to be implemented on sequential computers. We
consider sequential computing cost per iteration as irrelevant for the brain, and our aim is not to
improve the technology by reducing the number of sequential computer operations. In fact, many
algorithms in technology can already perform the correspondence task efficiently,
such as the elastic graph matching system [37]. Our question is not how many floating point
operations we need, but how many operations can be done at a time non-sequentially. (Single
neurons actually have tremendous computational power due to distributed mechanisms in their
dendritic trees; see, for instance, Mel [44].)
Subgraph isomorphism It is tempting to use subgraph isomorphism algorithms developed in
graph theory, such as Ullman's [64], for the visual correspondence problem. However, on the
one hand, the correspondence problem as we have formulated it is not an exact subgraph isomorphism
problem. On the other hand, subgraph isomorphism is NP-complete [21], and approximation
algorithms have to be used for large input sizes. As a word of caution, the time complexity
of the subgraph isomorphism problem depends on whether the pattern graph is viewed as fixed.
For example, Eppstein gave a linear-time solution to subgraph isomorphism for planar
graphs with a fixed pattern graph, while still acknowledging that “planar subgraph isomorphism is NP-
complete” [15].
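For concreteness, the exhaustive search that such algorithms organize can be sketched as a small backtracking routine in the spirit of Ullman's algorithm (a didactic sketch, not the refinement-based original; the adjacency-dict graph encoding and the function name are our own assumptions):

```python
# A minimal backtracking sketch of exact (non-induced) subgraph isomorphism.
# Graphs are adjacency dicts {node: set(neighbors)}; undirected edges assumed.
def find_embedding(pattern_adj, target_adj):
    """Map each pattern node to a distinct target node so that every
    pattern edge is preserved; return the mapping dict, or None."""
    p_nodes = list(pattern_adj)

    def extend(mapping):
        if len(mapping) == len(p_nodes):
            return dict(mapping)
        u = p_nodes[len(mapping)]           # next pattern node to place
        for v in target_adj:
            if v in mapping.values():
                continue                    # target node already used
            # every pattern edge from u into the mapped part must exist
            if all(v in target_adj[mapping[w]]
                   for w in mapping if u in pattern_adj[w]):
                mapping[u] = v
                result = extend(mapping)
                if result is not None:
                    return result
                del mapping[u]              # backtrack
        return None

    return extend({})
```

The exponential worst case of this search is exactly what makes exact methods unattractive for large graphs.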
Dynamic programming Dynamic programming has also been used in stereo correspondence,
for example by Ohta and Kanade [49] and Geiger et al. [22]. In general, dynamic programming is
an algorithmic design technique for optimization problems that can be formulated
as a sequence of choices, so that the optimal solution can be obtained from optimal solutions
to subproblems. By storing solutions to subproblems instead of re-calculating them each
time they are needed, the algorithm can reduce the time complexity from exponential to
polynomial, provided the subproblems overlap. Its complexity for matching 1D patterns
of length n is O(n log n).
However, dynamic programming is essentially sequential. All subproblems have to be solved
in topological order, so that each result is available whenever it is needed. In stereo
correspondence, the order could be from left to right along each epipolar line [49]. To
process a feature, all features to its left must already have been processed. This is obviously
not an optimal strategy in parallel processing architectures.
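The scanline idea can be sketched as an order-preserving alignment; the similarity function and the fixed occlusion penalty below are illustrative assumptions, not the cost model of [49]:

```python
# A sketch of order-preserving 1D feature matching by dynamic programming,
# in the spirit of scanline stereo.
def match_scanlines(left, right, sim, occlusion=1.0):
    """Maximize total similarity of an order-preserving matching between
    two feature sequences; unmatched features pay the occlusion penalty."""
    n, m = len(left), len(right)
    # best[i][j]: best score using the first i left and j right features;
    # subproblems are filled strictly in topological (left-to-right) order.
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            cands = []
            if i > 0 and j > 0:   # match left[i-1] with right[j-1]
                cands.append(best[i - 1][j - 1] + sim(left[i - 1], right[j - 1]))
            if i > 0:             # left[i-1] is occluded
                cands.append(best[i - 1][j] - occlusion)
            if j > 0:             # right[j-1] is occluded
                cands.append(best[i][j - 1] - occlusion)
            best[i][j] = max(cands)
    return best[n][m]
```

Note how the double loop embodies the sequentiality criticized above: cell (i, j) cannot be computed before its three predecessors.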
Functional approximation If, as in our current system, the variations are only translation,
scale, and in-plane rotation, functional approximation could be very efficient, as used in
image registration under these restricted variations [11]. It is easy to write down an energy
function and the dynamics on these four parameters. But, as is true for all top-down
constrained optimization methods, this method is in general difficult to extend to other
variations, and learning the energy function from examples has never been attempted.
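A minimal sketch of such an energy-based approach, assuming point correspondences are given and using a numerical gradient on the four parameters (shift tx, ty, scale s, in-plane rotation theta); all names and the descent schedule are illustrative:

```python
import numpy as np

def transform(pts, tx, ty, s, theta):
    """Apply a similarity transform (scale s, rotation theta, shift (tx, ty))."""
    c, si = np.cos(theta), np.sin(theta)
    R = np.array([[c, -si], [si, c]])
    return s * pts @ R.T + np.array([tx, ty])

def fit_similarity(model, image, steps=1500, lr=0.02, eps=1e-5):
    """Gradient descent on E(p) = sum ||T_p(model) - image||^2,
    with the gradient estimated by central finite differences."""
    p = np.array([0.0, 0.0, 1.0, 0.0])  # tx, ty, s, theta

    def energy(q):
        return np.sum((transform(model, *q) - image) ** 2)

    for _ in range(steps):
        grad = np.array([(energy(p + eps * e) - energy(p - eps * e)) / (2 * eps)
                         for e in np.eye(4)])
        p = p - lr * grad
    return p
```

The efficiency comes from the tiny parameter space; the difficulty noted above is that each new variation adds a dimension and a hand-crafted term to the energy.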
6.6 Learning maplets
We proposed generalized Hebbian plasticity for learning maplets and their connections, and
showed some pilot experiments on learning shift invariance for 1D patterns. Our learning system
is still very primitive, but it shows some features that distinguish it from other learning systems.
Our learning system learns from established mappings, and this makes the system analogous
to learning patterns, where regular pattern learning techniques can be applied. In our system,
we use a simple Hebbian learning rule. An advantage of learning mappings, instead of learning
patterns as in most learning systems, however, is that the weight of a connection in a mapping
has a physical meaning — the existence or absence of a link — while the intensity level of a pixel
in a pattern does not indicate the existence of a feature.
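As a toy illustration of Hebbian accumulation of maplet weights from established mappings (here one maplet unit per sampled 1D circular shift, with a leaky Hebbian update; both are our illustrative assumptions, not the dissertation's exact rule):

```python
import numpy as np

def learn_maplets(n, shifts, eta=0.1, epochs=50):
    """One maplet per sampled shift d; its weight to link (i, j) grows
    whenever that link is active in a presented mapping for shift d."""
    maplets = np.zeros((len(shifts), n, n))
    for _ in range(epochs):
        for m, d in enumerate(shifts):
            links = np.zeros((n, n))
            links[np.arange(n), (np.arange(n) + d) % n] = 1.0  # circular shift by d
            maplets[m] += eta * (links - maplets[m])  # Hebbian growth with decay
    return maplets
```

Because the sample mappings are binary link matrices, the learned weights converge toward exactly the "existence or absence of a link" interpretation discussed above.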
If transformation invariance is learned from transforming images directly, something special is
needed to separate patterns from mappings. Learning control neurons in the shifter circuits would
be the closest to our learning setting. It was proposed that this could be done by performing fast
feature analysis for mappings in addition to slow feature analysis for object features (Wiskott
and Sejnowski [79]), and learning a sparse coding of both the feature and the mapping (Bruno
Olshausen, personal communication). The results have yet to be seen. A similar idea appears in
Grimes and Rao [24], where they use a bilinear model to factor an image into object features
and their transformations, and learn the basis through sparse coding on the respective coefficients
of the two factors. Their learned basis, however, contains every feature combined with every
transform, i.e., these two factors are not separated, which makes it difficult to generalize learned
transformations to new features.
Some may feel that our learning system is similar to Földiák's system of learning shift invari-
ance for complex cells [18]. There, a modified Hebbian rule, the trace rule, is used to learn the
connections between complex and simple cells from lines sweeping across a model retina. The
learned complex cells can respond to a line anywhere on the retina. Our system learns a totally
different concept. If we compare our links to their pixels, and maplets to neurons, the maplets
we learn are “simple cells”, i.e., they are sensitive to the position of their inputs. Therefore our
learning is a much simpler task.
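The mechanics of the trace rule can be sketched in a few lines; the one-hot sweep and the parameter values are illustrative assumptions:

```python
import numpy as np

def train_trace(n_inputs, alpha=0.1, delta=0.5):
    """One sweep of a one-hot stimulus across a model retina.  The cell is
    directly driven only at position 0; because its activity trace decays
    slowly, Hebbian growth on the trace still links the later positions."""
    w = np.zeros(n_inputs)
    y_trace = 0.0
    for pos in range(n_inputs):
        x = np.zeros(n_inputs)
        x[pos] = 1.0
        y = 1.0 if pos == 0 else 0.0                 # direct drive only at the start
        y_trace = (1 - delta) * y_trace + delta * y  # slowly decaying trace
        w += alpha * y_trace * x                     # Hebb on the trace, not on y
    return w
```

The temporal low-pass on the postsynaptic activity is what binds successive stimulus positions to one cell, which is precisely the invariance mechanism our maplet learning does not need.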
Different representations of mappings require different learning rules. Our maplets have trans-
formation parameters that are discrete samples of the transformation space. The number of
maplets increases exponentially with the dimensionality of the variations, and they all have to be
learned. A more efficient learning strategy is to learn the generators of the transformations,
which form a continuous group. Rao and Ruderman showed that an unsupervised network can
learn translation and rotation by learning generators for the corresponding Lie groups [57]. This
is a direction worth trying, but when it comes to generators of multiple transformations, formal
derivation becomes difficult. Tony Bell tried to do it using tools from geometric algebra, but
reported difficulties with the approach (Tony Bell, personal communication).
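As a toy illustration of the generator idea (not Rao and Ruderman's learning network): for 1D translation the generator is the derivative operator, so a small shift dx acts as exp(dx·G) ≈ I + dx·G. The circular finite-difference discretization below is an assumption:

```python
import numpy as np

n = 100
idx = np.arange(n)
G = np.zeros((n, n))
G[idx, (idx + 1) % n] = 0.5    # central difference: (f[i+1] - f[i-1]) / 2
G[idx, (idx - 1) % n] = -0.5
signal = np.exp(-((idx - 50.0) ** 2) / 50.0)   # a smooth bump at x = 50
dx = 0.3
shifted_approx = signal - dx * (G @ signal)    # f(x - dx) ~ f(x) - dx f'(x)
shifted_true = np.exp(-((idx - 50.0 - dx) ** 2) / 50.0)
```

One matrix G thus encodes a whole continuous family of shifts, which is why learning generators scales better than learning one maplet per sampled transformation.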
6.7 Disclaimers
In this dissertation, we concentrated on the organization of links in the visual correspondence
problem, especially for object recognition. We necessarily had to ignore some issues that are
important in understanding visual functions such as object recognition.
Multiple views for 3D objects This dissertation merely addresses comparing individual views,
with each object stored as a model graph from a single view. Although rotation in depth can
be compensated to some extent through the correspondence, we need multiple views to deal with gross
rotation in depth.
Organization and indexing of the model database As far as object recognition (and not
just correspondence) is concerned, our formulation is still unrealistic in treating object models
as completely disjoint and letting each of them develop its own correspondence. A more
mature formulation will have to take into account the structural overlap between different
object models (as is exploited in the bunch graph method [76]) and will have to let them
collaborate in the establishment of a single mapping, as in the shifter circuits [50]. Partial
structural overlap between structures in the brain can be represented either by common
subsets of neurons or with the help of high-order links analogous to maplets.
This problem of organizing the model database will require much more future work. The
database also has to be indexed by simple features, and hierarchical indexing is needed if
the database becomes large, say beyond 100 objects. Lacking the organization and indexing,
our system does not perform categorization tasks, such as those in RSVP experiments [62]
to identify pictures with, say, animals in them.
Chapter 7
Conclusion
7.1 Summary
In this dissertation, we proposed a new implementation of DLM to speed up the convergence of its
classical neural implementation, and to deal with large variations in scale and in-plane rotation.
To this end we introduced synapto-synaptic interactions between links, as an alternative to link
dynamics through temporal signal correlation. To represent recurring local connectivity patterns,
we introduced maplets for groups of links that are consistent with one another in terms of trans-
formation parameters. Maplets provide a basis for very specific and longer-range interconnections
between links, leading to a dynamical system with better convergence properties.
In realistic face recognition tasks, the system showed significantly faster convergence than the
classical DLM, requiring just a few or even only one iteration. The recognition rate is also better
in some of the tests, and performance remains good under large variations in scale and in-plane
rotation.
We also showed that maplets and their connections can be learned by a generalized type of
Hebbian plasticity from consistent mappings. In simple pilot experiments the maplets formed by
learning closely resemble those previously designed manually.
7.2 Contributions
As we have discussed, the aim of this dissertation is to take steps towards explaining how the visual
correspondence problem can be solved autonomously and efficiently in a brain-like architecture.
Its contributions are therefore best summarized in the framework of the cognitive architecture
issues, which were first listed in the introduction of this dissertation. The contribution of DLA
in this framework is the introduction of dynamic links as a more flexible data structure of brain
states. This dissertation extends the DLA architecture further along the three other dimensions
mentioned in the introduction:
Organization of brain states: The main contribution of this work is a new mechanism by
which dynamic links are organized on a fast time scale. Our system requires direct synapto-
synaptic connections. The convergence of brain state organization is greatly sped up by
altered link interaction patterns, modelled as maplets and their interactions.
Data structure of memory: We introduced maplets as a form of long-term memory of transfor-
mations. They are meant to be a simplified description of direct synapto-synaptic connection
patterns, the neural implementation remaining an open issue.
Mechanism of learning: We proposed a generalized type of Hebbian plasticity to learn maplets
and their connections, and showed some simple experiments to demonstrate the learning
principle.
7.3 Future work
Learning multiple variations The learning system in this dissertation is still rather primitive.
Only results for 1D shift invariance were shown. It is very important for the system to learn
other variations, especially rotation in depth. Although in theory the same learning rules
apply to any variations, there are difficult issues with increased complexity in learning. One
problem, as pointed out earlier, is the difficulty of creating sample mappings for large image
transformations. Another challenge is to learn multiple variations simultaneously. This will
have to deal with all the problems machine learning is also facing, for instance the curse of
dimensionality. The goal is to have a system that can learn from few, unsorted examples.
Multiple models As pointed out in the last chapter, realistic object recognition has to include
the organization of the model database. Starting from our system, a first step in this direction would
be to store the models in an associative-memory-like fashion and analyze the dynamic behavior
of the combined correspondence and memory systems, especially how the equilibrium of the
combined system affects that of the correspondence subsystem.
Moving mappings We need a data structure to support continuously moving mappings, which
is necessary for tracking and temporal prediction. This may be related to the generators of
transformation groups.
Biological computation of initial similarity In this dissertation, as well as in other DLM
systems, features are Gabor jets, and the similarity between two jets is their normalized
inner product. Exactly how this is computed in the brain is not clear, but it is definitely
a process involving binding between individual Gabor filter responses in the two jets. This
problem is worsened in our system with large variations because different transformations
give rise to different bindings. We need to find out how to compute this in a biologically
plausible way.
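The normalized inner product itself is straightforward to state; a minimal sketch, with jets represented as plain vectors of Gabor filter magnitudes (an assumption about the representation):

```python
import numpy as np

def jet_similarity(j1, j2):
    """Normalized inner product of two jets (vectors of Gabor magnitudes):
    1 for parallel jets, 0 for orthogonal ones."""
    j1 = np.asarray(j1, dtype=float)
    j2 = np.asarray(j2, dtype=float)
    return float(j1 @ j2 / (np.linalg.norm(j1) * np.linalg.norm(j2)))
```

The open question above is not this arithmetic but how the pairwise products across two jets could be formed and normalized by neural circuitry.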
Reference List
[1] M. Abeles. Studies of Brain Function. Vol. 6: Local Cortical Circuits: An Electrophysiological
Study. Springer-Verlag, Berlin, 1982.
[2] V. Ajjanagadde and L. Shastri. Rules and variables in neural nets. Neural Computation,
3:121-134, 1991.
[3] D.S. Antanitus. A theory of cortical neuron-astrocyte interaction. Neuroscientist, 4(3):154-
159, 1998.
[4] T. Aonishi and K. Kurata. Extension of dynamic link matching by introducing local linear
maps. IEEE Transactions on Neural Networks, 11(3):817-822, 2000.
[5] M.A. Arbib, C.C. Boylls, and P. Dev. Neural models of spatial perception and the control of
movement. In W.D. Keidel, W. Händler, and M. Spreng, editors, Cybernetics and Bionics,
pages 216-231. Oldenbourg, 1974.
[6] A.J. Bell and T.J. Sejnowski. The independent components of natural scenes are edge filters.
Vision Research, 37(23):3327-3338, 1997.
[7] I. Biederman and S. Subramaniam. Predicting the shape similarity of objects without distin-
guishing viewpoint invariant properties (VIPs) or parts. In Investigative Ophthalmology and
Visual Science, volume 38, page 998, 1997.
[8] I. Biederman, S. Subramaniam, M. Bar, P. Kalocsai, and J. Fiser. Subordinate-level object
classification reexamined. Psychological Research, 62:131-153, 1999.
[9] E. Bienenstock and C. von der Malsburg. A neural network for invariant pattern recognition.
Europhysics Letters, 4:121-126, 1987.
[10] P. C. Bressloff and J. D. Cowan. Spontaneous pattern formation in primary visual cortex.
In S. J. Hogan, A. Champneys, and B. Krauskopf, editors, Nonlinear dynamics: where do we
go from here?, chapter 1, pages 1-53. Institute of Physics: Bristol, 2002.
[11] L.G. Brown. A survey of image registration techniques. ACM Computing Surveys, 24(4):325-376,
December 1992.
[12] J.G. Daugman. Uncertainty relation for resolution in space, spatial frequency, and orientation
optimized by two-dimensional visual cortical filters. JOSA, 2:1160-1169, 1985.
[13] P. Dev. Perception of depth surfaces in random dot stereograms: A neural model. Int. J.
Man-Machine Studies, 7:511-528, 1975.
[14] M.C.M. Elliffe, E.T. Rolls, and S.M. Stringer. Invariant recognition of feature combinations
in the visual system. Biological Cybernetics, 86:59-71, 2002.
[15] D. Eppstein. Subgraph isomorphism in planar graphs and related problems. Journal of
Graph Algorithms and Applications, 3:1-27, 1999.
[16] D.J. Field, A. Hayes, and R.F. Hess. Contour integration by the human visual system: Evidence
for a local “association field”. Vision Research, 33(2):173-193, January 1993.
[17] J. Fiser, I. Biederman, and E.E. Cooper. To what extent can matching algorithms based on
direct outputs of spatial filters account for human object recognition? Spatial Vision, 10(3):237-
271, 1997.
[18] P. Földiák. Learning invariance from transformation sequences. Neural Computation,
3(2):194-200, 1991.
[19] John Frisby. Stereo correspondence. In M. Arbib, editor, The Handbook of Brain Theory
and Neural Networks, pages 1104-1108. MIT Press, 2nd edition, 2002.
[20] K. Fukushima, S. Miyake, and T. Ito. Neocognitron: A neural network model for a mech-
anism of visual pattern recognition. IEEE Transactions on Systems, Man and Cybernetics,
13(5):826-834, 1983.
[21] M.R. Garey and D.S. Johnson. Computers and Intractability. W.H. Freeman and Co., New
York, 1979.
[22] D. Geiger, A. Gupta, L.A. Costa, and J. Vlontzos. Dynamic programming for detecting,
tracking and matching elastic contours. PAMI, 17(3):294-302, 1995.
[23] M.A. Goodale and G.K. Humphrey. The objects of action and perception. Cognition, 67:181-
207, 1998.
[24] David Grimes and Rajesh P. N. Rao. A bilinear model for sparse coding. In Advances in
Neural Information Processing Systems, volume 15. Cambridge, MA: MIT Press, 2003.
[25] Hermann Haken. Synergetics: An Introduction: Nonequilibrium Phase Transitions and Self-
Organization in Physics, Chemistry, and Biology. Berlin; New York: Springer-Verlag, 1977.
[26] A. F. Haussler and C. von der Malsburg. Development of retinotopic projections - an ana-
lytical treatment. Journal of Theoretical Neurobiology, 2:47-73, 1983.
[27] John Hertz, Anders Krogh, and Richard G. Palmer. Introduction to the Theory of Neural
Computation. Addison-Wesley, Redwood City CA, 1991.
[28] G.E. Hinton. A parallel computation that assigns canonical object-based frames of refer-
ence. In Proceedings of the Seventh International Joint Conference on Artificial Intelligence,
volume 2, pages 683-685, Vancouver BC, Canada, 1981.
[29] B.K.P. Horn and B.C. Schunck. Determining optical flow. Artificial Intelligence, 17:185-203,
1981.
[30] J.E. Hummel and I. Biederman. Dynamic binding in a neural network for shape recognition.
Psychological Review, 99(3):480-517, 1992.
[31] R.A. Hummel and S.W. Zucker. On the foundations of relaxation labeling processes. PAMI,
5(3):267-287, 1983.
[32] J.P. Jones and L.A. Palmer. An evaluation of the two-dimensional Gabor filter model of
simple receptive fields in cat striate cortex. Journal of Neurophysiology, 58(6):1233-1258,
1987.
[33] B. Julesz. Foundations of Cyclopean Perception. Chicago: Chicago Univ. Press, 1971.
[34] P. Kalocsai, I. Biederman, and E.E. Cooper. To what extent can the recognition of unfamiliar
faces be accounted for by a representation of the direct output of simple cells. In Investigative
Ophthalmology and Visual Science, volume 35, page 1626, 1994.
[35] M. Kass, A.P. Witkin, and D. Terzopoulos. Snakes: Active contour models. IJCV, 1(4):321-
331, January 1988.
[36] W. Konen, T. Maurer, and C. von der Malsburg. A fast dynamic link matching algorithm
for invariant pattern recognition. Neural Networks, 7(6/7):1019-1030, 1994.
[37] M. Lades, J.C. Vorbrüggen, J. Buhmann, J. Lange, C. von der Malsburg, R.P. Würtz, and
W. Konen. Distortion invariant object recognition in the dynamic link architecture. IEEE
Transactions on Computers, 42(3):300-311, 1993.
[38] Gilles Laurent. What does 'understanding' mean? Nature Neuroscience, 3:1211, 2000.
[39] C. Legendy. The brain and its information trapping device. In Progress in Cybernetics,
volume 1, pages 309-338. Gordon and Breach, New York, 1970.
[40] H.S. Loos and C. von der Malsburg. 1-Click Learning of Object Models for Recognition.
In H.H. Biilthoff, S.-W. Lee, T.A. Poggio, and C. Wallraven, editors, Biologically Motivated
Computer Vision 2002 (BM CV 2002), volume 2525 of Lecture Notes in Computer Science,
pages 377-386, Tubingen, Germany, November 22-24 2002. Springer Verlag.
[41] D. Marr. Vision. W.H. Freeman and Company, 1982.
[42] D. Marr and T. Poggio. Cooperative computation of stereo disparity. Science, 194:283-287,
1976.
[43] Gérard Medioni, Mi-Suen Lee, and Chi-Keung Tang. A Computational Framework for Seg-
mentation and Grouping. Elsevier, 2000.
[44] B. Mel. Information processing in dendritic trees. Neural Computation, 6:1031-1085, 1994.
[45] B. Mel. SEEMORE: Combining color, shape, and texture histogramming in a neurally-
inspired approach to visual object recognition. Neural Computation, 9:777-804, 1997.
[46] M.I. Miller and L. Younes. Group actions, homeomorphisms, and matching: A general
framework. International Journal of Computer Vision, 41(l/2):61-84, 2001.
[47] J. Moran and R. Desimone. Selective attention gates visual processing in the extrastriate
cortex. Science, 229(4715):782-784, 1985.
[48] M. Nicolescu and G. Medioni. 4-d voting for matching, densification and segmentation into
motion layers. In International Conference on Pattern Recognition, Quebec City, Canada,
August 2002.
[49] Y. Ohta and T. Kanade. Stereo by intra- and inter-scanline search using dynamic program-
ming. PAMI, 7(2):139-154, 1985.
[50] B.A. Olshausen, C.H. Anderson, and D.C. Van Essen. A neurobiological model of visual
attention and invariant pattern recognition based on dynamic routing of information. The
Journal of Neuroscience, 13(11):4700-4719, 1993.
[51] B.A. Olshausen and D.J. Field. Emergence of simple-cell receptive field properties by learn-
ing a sparse code for natural images. Nature, 381:607-609, 1996.
[52] A.J. O’Toole. Structure from stereo by associative learning of the constraints. Perception,
18:767-782, 1989.
[53] P.J. Phillips, H. Moon, S.A. Rizvi, and P.J. Rauss. The FERET evaluation methodology for
face-recognition algorithms. IEEE Trans. PAMI, 22(10):1090-1104, 2000.
[54] T. Poggio and C. Koch. Ill-posed problems in early vision: From computational theory to
analog networks. Proceedings of the Royal Society London B, 226:303-323, 1985.
[55] S.B. Pollard, J.E.W. Mayhew, and J.P. Frisby. PMF: A stereo correspondence algorithm
using a disparity gradient limit. Perception, 14:449-470, 1985.
[56] Ning Qian and Terrence J. Sejnowski. Learning to solve random-dot stereograms of dense
and transparent surfaces with recurrent backpropagation. In Proceedings of the 1988 Con-
nectionist Models Summer School, pages 435-443. Morgan Kaufmann, 1989.
[57] Rajesh P. N. Rao and Daniel L. Ruderman. Learning Lie groups for invariant visual percep-
tion. In M. S. Kearns, S. A. Solla, and D. Cohn, editors, Advances in Neural Information
Processing Systems, volume 11, pages 810-816. Cambridge, MA: MIT Press, 1999.
[58] C.R. Rose, R. Blum, B. Pichler, A. Lepier, K.W. Kafitz, and A. Konnerth. Truncated TrkB-
T1 mediates neurotrophin-evoked calcium signalling in glia cells. Nature, 426:74-78, 2003.
[59] T.J. Sejnowski. Skeleton filters in the brain. In Parallel Models of Associative Memory, pages
189-212. Hillsdale, N.J.: Lawrence Erlbaum, 1981.
[60] G. Sperling. Binocular vision: a physical and a neural theory. Amer. J. of Psychology,
83:461-534, 1970.
[61] Charles F. Stevens. Models are common; good theories are scarce. Nature Neuroscience,
3:1177, 2000.
[62] S. Subramaniam, I. Biederman, and S. A. Madigan. Accurate identification but no priming
and chance recognition memory for pictures in RSVP sequences. Visual Cognition, 7:511-535,
2000.
[63] S. Thorpe, D. Fize, and C. Marlot. Speed of processing in the human visual system. Nature,
381:520-522, June 1996.
[64] J.R. Ullman. An algorithm for subgraph isomorphism. Journal of the ACM, 23(1):31-42,
1976.
[65] S. Ullman. The Interpretation of Visual Motion. MIT Press, Cambridge MA, 1979.
[66] S. Ullman. Aligning pictorial descriptions: an approach to object recognition. Cognition,
32:193-254, 1989.
[67] C. von der Malsburg. Self-organization of orientation sensitive cells in the striate cortex.
Kybernetik, 14:85-100, 1973.
[68] C. von der Malsburg. The correlation theory of brain function. Internal Report 81-2, MPI
Biophysical Chemistry, 1981. Reprinted in E. Domany, J.L. van Hemmen, and K. Schulten,
editors, Models of Neural Networks II, chapter 2, pages 95-119. Springer, Berlin, 1994.
[69] C. von der Malsburg. A phase transition for biology! Memorandum, contribution to the
Festschrift for the 20th anniversary of Manfred Eigen's Winterseminar, 1985.
[70] C. von der Malsburg. The what and why of binding: The modeler's perspective. Neuron,
24(1):95-104, 1999.
[71] C. von der Malsburg. Dynamic link architecture. In M. Arbib, editor, The Handbook of
Brain Theory and Neural Networks, pages 365-368. MIT Press, 2nd edition, 2002.
[72] C. von der Malsburg. Self-organization in the brain. In M. Arbib, editor, The Handbook of
Brain Theory and Neural Networks, pages 1002-1005. MIT Press, 2nd edition, 2002.
[73] C. von der Malsburg and D. J. Willshaw. How to label nerve cells so that they can in-
terconnect in an ordered fashion. Proceedings of the National Academy of Sciences (USA),
74:5176-5178, 1977.
[74] H. Wersing and E. Körner. Learning optimized features for hierarchical models of invariant
object recognition. Neural Computation, 15(7):1559-1588, 2003.
[75] L. Wiskott. The role of topographical constraints in face recognition. Pattern Recognition
Letters, 20(1):89-96, 1999.
[76] L. Wiskott, J.-M. Fellous, N. Krüger, and C. von der Malsburg. Face recognition and gender
determination. In International Workshop on Automatic Face- and Gesture-Recognition,
Zurich, June 26-28, 1995, pages 92-97, 1995.
[77] L. Wiskott, J.-M. Fellous, N. Krüger, and C. von der Malsburg. Face recognition by elas-
tic bunch graph matching. IEEE Trans. on Pattern Analysis and Machine Intelligence,
19(7):775-779, 1997.
[78] L. Wiskott and T. Sejnowski. Constrained optimization for neural map formation: A unifying
framework for weight growth and normalization. Neural Computation, 10(3):671-716, 1998.
[79] L. Wiskott and T. Sejnowski. Slow feature analysis: Unsupervised learning of invariances.
Neural Computation, 14(4):715-770, 2002.
[80] L. Wiskott and C. von der Malsburg. Face recognition by dynamic link matching. In
J. Sirosh, R. Miikkulainen, and Y. Choe, editors, Lateral Interactions in the Cortex: Struc-
ture and Function, chapter 11. The UTCS Neural Networks Research Group, Austin, TX,
http://www.cs.utexas.edu/users/nn/web-pubs/htmlbook96/, 1996. Electronic book, ISBN
0-9647060-0-8.
[81] A. Yuille, D. Cohen, and P. Hallinan. Feature extraction from faces using deformable tem-
plates. In Proceedings of Computer Vision and Pattern Recognition, pages 104-109, San
Diego, 1989. IEEE Computer Society Press.
[82] J. Zhu and C. von der Malsburg. Synapto-synaptic interactions speed up dynamic link
matching. Neurocomputing, 44:721-728, 2002.
[83] J. Zhu and C. von der Malsburg. Learning control units for invariant recognition. Neuro-
computing, 52:447-453, 2003.