Biologically Inspired Approaches to Computer Vision
by
Rorry Brenner
Submitted to the faculty of the USC Graduate School
in partial fulfillment of the requirements for the degree of
Doctor of Philosophy in Neuroscience
at the
UNIVERSITY OF SOUTHERN CALIFORNIA
May 2019
© University of Southern California 2019. All rights reserved.
Author: faculty of the USC Graduate School, November 8, 2018
Certified by: Laurent Itti, Professor, Thesis Supervisor
Accepted by: Bartlett Mel, Chairman
Biologically Inspired Approaches to Computer Vision
by
Rorry Brenner
Submitted to the faculty of the USC Graduate School on November 8, 2018, in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Neuroscience
Abstract

In this thesis two biologically inspired projects are designed and implemented in the field of computer vision. The first relies on human biology's superior decision making and mobility to work in a human-in-the-loop system. This system was able to achieve perfect accuracy on a task where the only external perception was with a digital camera. The second project is a modification to modern artificial neural networks. It is founded on the idea that biological dendrites perform complex nonlinear computations of presynaptic input prior to that input reaching the cell body. The backpropagation of traditional neural network learning is modified in a paradigm we call "Perforated Backpropagation" to only flow through a subset of nodes representing neurons, while skipping nodes representing dendrite branches which can learn through a different mechanism.

Thesis Supervisor: Laurent Itti
Title: Professor
Acknowledgments

First and foremost I would like to thank my advisor Dr. Laurent Itti for making this PhD experience exactly what I was hoping for by running a lab with the perfect ratio of guided support and freedom of research paths. I would like to thank both my committee members as well: Dr. Irving Biederman, for everything I learned rotating in his lab, for advice and support when originally deciding on going to graduate school and applying, and for doing the research that originally inspired me to do research of my own; and Dr. Bartlett Mel, for teaching the section in NGP's core class my first year, and doing the research, that motivated my final thesis topic.

I would like to thank all of my labmates in iLab for always being available to answer specific questions or just discuss cutting-edge research in our field; my projects would not have been the same without all of your input. I would also like to thank all of the members of NGP for making everything related to the PhD and USC outside of research a great experience.

I would like to thank all of my teachers throughout my life who instilled a love of learning in me, especially those in undergrad who taught me the computer science I needed to perform the research required of this PhD.

Lastly I would like to thank my brothers and my parents for their continued love and support throughout my life, as well as all of my friends and other family members who have been there for me through all the ups and downs as I went through this process.
Contents

1 Introduction
1.1 The Perceptron
1.2 Other Classifiers
1.3 Perceptrons in Computer Vision
1.4 Computer Vision Challenges
1.5 Assistive Vision Technologies
1.6 This Dissertation

2 Perfect Accuracy With Human-on-the-Loop Computer Vision
2.1 Introduction and Background
2.2 Challenges of Picking a Decision Threshold
2.3 Proposed System
2.3.1 Human-in-the-Loop
2.3.2 Setting Thresholds and the Homography Matrix
2.3.3 Instructions to the User
2.3.4 The Components
2.4 Experiment Setup
2.4.1 Environment and Instructions
2.4.2 Training the Participants
2.4.3 Control Experiment
2.5 Experimental Results
2.6 Discussion
2.7 Acknowledgment

3 Inspiration for Continued Work
3.1 Introduction
3.2 Inspiration from Cognitive Psychology: Object Representation in Humans
3.2.1 Overview
3.2.2 Analysis of Biological Relevance in Modern Neural Networks
3.3 Inspiration from Neurobiology: Nonlinear Dendritic Processing
3.3.1 Dendrites in CNNs
3.3.2 Summary of Dendrite Function
3.3.3 Nonlinear Dendritic Spikes
3.3.4 Computational Models of Dendrites
3.3.5 Dendrite Parallels in Machine Learning
3.4 Modern Neural Network Research
3.4.1 Overview
3.4.2 MNIST
3.4.3 ImageNet

4 Perforated Backpropagation: A Biologically Inspired Modification to Deep Neural Networks
4.1 Introduction
4.2 Background
4.2.1 Biological Background - Active Dendrites
4.3 Computational Background - Cascade Correlation
4.4 Explanation of System
4.4.1 Proof of Concept Test
4.4.2 Results
4.4.3 Increase of Parameters
4.4.4 Possible Biological Implications
4.4.5 Neurogenesis and Artificial Intelligence
4.4.6 Future Work

5 Summary of Contributions and References
Bibliography
List of Figures

1-1 Structure of a basic perceptron. From [152].

1-2 A convolutional neural network. The larger squares represent the input image at the beginning and then the planes of activation values represented as feature maps. The smaller squares being projected forward represent the kernel; a single value on a postsynaptic layer is calculated by the weights of its kernel applied at a corresponding location in the input map. From [3].

1-3 Figure showing a Deep Belief Network and a Deep Boltzmann Machine. From [188].

2-1 Template images for objects in the database by row, left to right. 1, Cereal: PEB, CP, HNC, LUC, MGC. 2, Snacks: SR, HBN, OCP, PS, NB. 3, Pasta: HH, KRA, PR, MAC, VEL. 4, Tea: SM, LIP, FTS, FR, STA. 5, Candy: NRD, HT, GNP, MD, JM.

2-2 (A) ROC curves of confidence values over all images and all objects collected. PEB and HNC are the only ones where all confidences for images of other objects are lower than all confidences for images of themselves. (B) ROC curve for correctness with a single fixed threshold over all objects being tested on.

2-3 Flow diagram of choices the algorithm makes.

2-4 Visual representation of the homography calculation [162]. In this case the points will be proportionally closer in the y-axis, while in the x-axis they will be closer at the top and further apart at the bottom. The calculated homography would describe a position where the camera is looking up at the object from below.

2-5 Confidence threshold display for three items. Bars represent confidence values over the last 20 frames. No units are shown because displayed confidences for each are relative percentages between the max and min ever seen by the system when searching for each item. The middle threshold is the highest false positive. The bottom threshold is the lowest true positive. The top threshold is the extra buffer, 15% of the range, above the highest false positive. The left confidence is in the range of uncertainty; if this was the item being searched for, directions would be given. The middle confidence is above the max false positive value, meaning this item is actually in the camera frame. The right confidence is below the lowest true positive, so this item is certain to not have enough keypoint matches in the image to recover a homography.

2-6 Experiment setup. Shown is a user confirming a selected item in the simulated grocery store.

2-7 Instructions correspond to the camera's position based on the homography calculation. The user is guided to make the camera point directly at the center of the object. The center image shows a strafe command, where the user would be instructed to rotate in addition to moving.

2-8 Boxplots for total time taken for runs of the system and the barcode scanner, "time to complete" times for the system, and first "Reach Out" times for the system. Wilcoxon Rank Sum Tests were run on each pair to test if they could have come from continuous distributions with equal medians. All but Barcode Time vs First Reach Out Time had significant p-values: System Time vs Barcode Time: 1.5e-24, System Time vs Time to Complete: 6.2e-118, Barcode Time vs Time to Complete: 2.4e-46, System Time vs First Reach Out Time: 1.2e-34, First Reach Out Time vs Time to Complete: 7.4e-46, and Barcode Time vs First Reach Out Time: 0.18.

3-1 A set of examples of non-accidental properties. From [19].

3-2 Two examples showing how edges and the non-accidental properties between them form the building blocks of geons. From [19].

3-3 Various MNIST images which maximally activate units in a trained network. From [178].

3-4 Various MNIST images which maximally activate units in a randomly initialized network. From [178].

3-5 Various ImageNet images which maximally activate units in a trained network. From [178].

3-6 Various ImageNet images which maximally activate units in a randomly initialized network. From [178].

3-7 Six examples of adversarial images. Images in the left columns are the original images which the network correctly identifies. Middle columns show the noise injected into the system at 10x true intensity to create the adversarial images in the right columns. Right column images are all classified to be an ostrich. From [178].

3-8 (a) and (b) show generated adversarial examples for two networks trained on MNIST. Odd rows are original images which the network correctly classifies, while even rows are adversarial examples which the network does not correctly classify. (c) shows randomly distorted images, which to the human eye are much more significantly changed than the adversarial examples, but which the network can still correctly classify half the time. From [178].

3-9 N-shaped curve of NMDAR conductance. The green curve shows the unstable point where a spike is possible when membrane potential moves across it. Figure from [124].

3-10 Proximal vs distal driver synapse I-O curves. Data gathered from real neurons, NEURON models, and the simple model. Figure from [86].

3-11 A representation of Upstart's splitting of the parity problem. Shown are the original perceptron classification line as well as the lines from the false positive and false negative nodes. Figure from [54].

3-12 Number of inputs networks with a particular number of nodes can correctly categorize on the parity problem. While adding more neurons increases capability across all networks, deeper networks can outperform shallower networks even with fewer neurons. From [82].

3-13 Total accuracy over many runs, comparing a single architecture in each graph against the total number of neurons in the architecture. From [82].

3-14 The left network shows the fully connected network. The right network shows a temporary network during a single pass of dropout. Random nodes whose outputs are ignored are represented by being crossed out and removed from the network. From [172].

3-15 Representation of a goal surface mapped by 2, 4, and 5 neurons. With 2 neurons the top-right modeled surface can be seen to underfit its representation of the goal surface. However, with 5 neurons the bottom-right surface overfits, with distortions that are not present in the goal surface. In the bottom-left, the surface represented by 4 neurons is visually the most similar to the goal. From [82].

4-1 Stages of Cascade Correlation learning are shown. In [a] a perceptron is shown with 2 inputs and 1 output. In the first phase this perceptron is trained to maximize classification of the input data. Next, in [b], the old weights in the network are fixed and the new node learns weights which maximize its correlation with the perceptron's error. [c] shows the network in the next phase, with the new node added after having its weights fixed and the perceptron learning to modify its input weights as normal. In [d] an additional node has been added with input connections to the input and the other created node.

4-2 These graphs show the corresponding capabilities of a CC network. In [a] we only have a single perceptron, so only a single classification line can be drawn. For simplicity this line has been drawn to perfectly classify the blue data on the left and all of the red data, while getting its classification wrong on all of the blue data on the right. [b] shows how the CC node learns. Rather than classifying the data, it learns to classify where the perceptron is wrong. If there were more than one output node, it would learn to classify where all of the outputs make the most mistakes. In [c] the red line represents the classification line of the new node. The output perceptron now has not only the x and y coordinates as input but also an input representing whether those points are above or below the red line. It can now modify its weights to correctly classify all of the data.

4-3 Top three images are to give the reader a sense of the color scheme. Black lines show connections between presynaptic Neurons and final postsynaptic Neurons. Yellow lines show connections between presynaptic Neurons and hidden layer 2 Neurons. Red lines show connections between presynaptic Neurons and hidden layer 1 Neurons. Blue lines show connections between hidden layer 1 Neurons and hidden layer 2 Neurons. Finally, green lines show connections between hidden layer 2 Neurons and final postsynaptic Neurons. To be clear, the black lines simply represent a fully connected presynaptic and postsynaptic layer; the curves are to allow clearer visualization in the bottom images. The top three images simply represent traditional multi-layer perceptron networks with various numbers of fully connected layers. (d) is a traditional Resnet block. It looks identical to the three layer network block in terms of full connectedness, with the addition of skip connections between the first presynaptic layer and the final output layer. (e) shows a PBDNN block where the postsynaptic Neurons have created two PBNodes each rather than having two hidden Neuron layers. Each PBNode receives input from the presynaptic Neurons, showing full connectedness of red and yellow lines. However, blue lines now connect only to a single PBNode in the following layer, and green lines connect only from the PBNodes to the single final Node itself. In the forward propagation step, activation passes through all colored connections. In the backward propagation step, error is eliminated rather than going through yellow and red connections.

4-4 The top image shows a traditional convolutional neural network setup. A presynaptic layer forms connections to all Neurons in the postsynaptic layer. These input connections are represented by weight values of kernels in the postsynaptic layer which convolve on the presynaptic layer to form a set of planes of activation values. These planes all form connections to all of the Neurons in the next layer in the same manner. The bottom image shows the planes of a PBDNN. PBNode kernels receive input and calculate activation planes in the same way, with each PBNode forming a plane of activation values based on its convolution around the input planes. However, the planes they form do not form connections to every Neuron kernel in the postsynaptic layer. The single activation value at each location in a PBNode plane is only passed as a single additional input to the corresponding Neuron in the same location postsynaptically. This means every Neuron kernel only adds one single input connection per PBNode kernel it adds, i.e., with an NxN kernel the postsynaptic Neuron will have N*N+1 input connections with a single PBNode and N*N+2 with two.

4-5 Graph of training sum squared error on the two layer xor test. The first three blue bars correspond to beginnings of epochs which did not add PBNodes; green bars correspond to the beginnings of epochs where PBNodes were added. X-axis is batches run. Sum squared error chance probability is 0.125 (where the flat portion rests).

4-6 Graph of training error on the MNIST dataset with neurogenesis steps at 90, 197, 262, 359, 411, and 448. These epochs are marked with vertical green lines. Before each time point error can be seen to flatline, before being reduced once again after passing the neurogenesis epochs. Total error for the 29x29-40-10 network thresholds at 5.3 percent without any PBNodes and continues being reduced to 3.9 percent after 500 epochs and the addition of six PBNodes.

4-7 This test was also conducted using the method of early stopping on the EMNIST Merged dataset. After 200 epochs the network with the best accuracy is loaded before adding additional PBNodes. PBNodes are added after epochs 184, 202, 205, 251, 251, 260, 266, and 266, with duplicates meaning that PBNode addition did not improve accuracy. The thick blue line indicates training error for the final 8 PBNode network while the thick red line indicates test error. All other dotted lines show how the accuracies progressed before loading the early stopped time point of maximum validation accuracy, meaning this is how training would progress if more PBNodes were not added. As can be seen, each time a new PBNode is added accuracy continues to improve, while accuracies stagnate or become worse without the addition. Vertical black lines show timepoints where PBNodes were added, with double lines at 251 and 266 signifying early stopping determined the first node added at that time step did not improve accuracy.

4-8 In this test only a single set of PBNodes is added to a network running on the EMNIST Merged dataset. Training was performed on a traditional network with no PBNodes through 285 epochs, until accuracy no longer improved. At this time a set of PBNodes was created for all Neurons in the network. Tests were then continued by adding the PBNodes to all Neurons, to only the Neurons in the top layer, or only excluding neurons in the top layer. Results show that adding PBNodes only to the top does not actually improve the network, while adding to just the bottom does improve results, but adding to the full network improves results the most significantly. Vertical green line shows the timepoint where PBNodes were added.

4-9 This test is conducted on a residual neural network on the EMNIST Merge and Balance datasets. Once again early stopping was conducted, with the best network after 200 epochs being loaded before PBNodes are added. For the Balance dataset this was at epoch 118 while the Merge dataset added PBNodes at 128, shown by the vertical blue and red lines. Dotted lines show how learning progressed without the addition of the PBNodes before loading the time point of maximum accuracy. These results show our PBDNN system is compatible with modern resnets.

4-10 Box plots comparing training and test error for eight tests of normal (no PBNodes), continuous training, maximum accuracy reached by adding between three and eight PBNodes, and maximum accuracy reached after only one PBNode. Exact PBNodes created for the maximum condition are as follows: 2 Nodes: 2 Tests, 3 Nodes: 1 Test, 5 Nodes: 3 Tests, 7 Nodes: 1 Test, 8 Nodes: 1 Test. As can be seen, continuous training outperforms all others on the training data. However, on test data it is comparable to the accuracies reached with a single PBNode and worse than adding multiple PBNodes. More testing is still required on whether continuous training is possible with multiple PBNodes as well.

4-11 Box plots comparing training error of eight tests between a network with no nodes and the same network after adding between one and eight nodes. Exact PBNodes created are as follows: 0 Nodes: 8 Tests, 1 Node: 8 Tests, 2 Nodes: 8 Tests, 3 Nodes: 6 Tests, 4 Nodes: 5 Tests, 5 Nodes: 5 Tests, 6 Nodes: 2 Tests, 7 Nodes: 2 Tests, 8 Nodes: 1 Test.

4-12 Box plots comparing training error of eight tests between a network with no nodes and the same network after adding between one and eight nodes. Exact PBNodes created are as follows: 0 Nodes: 8 Tests, 1 Node: 8 Tests, 2 Nodes: 8 Tests, 3 Nodes: 6 Tests, 4 Nodes: 5 Tests, 5 Nodes: 5 Tests, 6 Nodes: 2 Tests, 7 Nodes: 2 Tests, 8 Nodes: 1 Test.

4-13 This graph shows training accuracy on the EMNIST Balance dataset with convolutional networks of depth three. The Y axis is accuracy, while the X axis shows the total number of learnable parameters in the network. The three categories represent the width of the network. Width terminology is taken from [200]: a width of one is the starting architecture for a deep convolutional network, a width of three has three times as many kernels in each layer, and a width of twelve has twelve times as many kernels. Square points represent the number of parameters in the original convolutional network and circular points represent networks after an additional set of PBNodes has been added. Each accuracy shown is the maximum accuracy achieved by the network, at which point more epochs do not continue improvement. Vertical green lines are shown to give a clear visual of the number of parameters in the baseline networks of each size. With these vertical lines it can be seen that after adding four PBNodes a network with initial width one has more parameters than the initial width of a network of width three. A network of width three must add five PBNodes before it contains more parameters than the initial size of a network of width twelve.

4-14 Training sum squared error rate while learning a vertices dataset. Blue bars represent the ends of epochs without adding PBNodes; green bars represent the ends of epochs where PBNodes were added. X-axis is batches run.

4-15 Templates of nodes created by ART on alphanumeric characters. Figure from [30].
Chapter 1
Introduction
1.1 The Perceptron
Artificial Neural Networks began with the invention of the perceptron in 1957 [151]. The perceptron is a type of linear classifier. This means it is able to linearly separate data points through lines, planes, or hyperplanes into two classes. As a machine learning algorithm, a perceptron is able to learn the parameters of its linear classification dynamically as it trains on input data. Perceptrons can be trained with supervised, unsupervised, or reinforcement learning paradigms [123, 176, 159, 161]. Since their first instantiation, perceptrons have been conjoined into networks comprised of many neurons, many layers, and even more connections. They have been the baseline for significant research topics continuing today.

The concept leading to the perceptron was the biological neuron. A biological neuron receives input from presynaptic nodes while the perceptron receives input from data. Biological neurons have a non-linear spiking mechanism based on presynaptic activation, and the perceptron computes an output value based on a non-linear combination of its presynaptic inputs. Both have connection strengths associated with inputs determining how influential each input will be based on each input's actual value. Both are capable of sending output values elsewhere postsynaptically. Beyond this list, however, the parallels between an artificial neuron and a biological neuron abruptly end [93, 151]. A perceptron can be seen in Fig 1-1.

Figure 1-1: Structure of a basic perceptron. From [152].
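To make the learning mechanism concrete, below is a minimal sketch of a single perceptron trained with the classic perceptron learning rule. The toy dataset, learning loop, and parameter choices are illustrative and not taken from the thesis.

```python
import numpy as np

# Toy linearly separable data: label depends on which side of a line a point falls.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = np.where(X[:, 0] + 0.5 * X[:, 1] > 0, 1, -1)

w = np.zeros(2)  # connection strengths, one per input
b = 0.0          # bias

for epoch in range(20):
    errors = 0
    for xi, target in zip(X, y):
        pred = 1 if xi @ w + b > 0 else -1  # threshold non-linearity on the weighted sum
        if pred != target:
            w += target * xi  # perceptron rule: nudge weights toward misclassified points
            b += target
            errors += 1
    if errors == 0:  # converged: the data is now linearly separated
        break

print(f"separating hyperplane: w={w}, b={b}, epochs used: {epoch + 1}")
```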
1.2 Other Classifiers
A few other classifiers used regularly are Naiv e Ba y es [149], Supp ort V ector Mac hines
[70], and Random F orests [27]. Naiv e Ba y es is a probabilistic classifier based on Ba y es
Theorem that assumes indep endence of input v ariables. This classifier is can pro vide
strong accuracy with less data, but also if significan t data is a v ailable it will not
impro v e as m uc h as others. It is also more lik ely to underfit data than to o v erfit. A
supp ort v ector mac hine w orks v ery similarly to a p erceptron. The biggest difference
is that rather than just calculating error as if it’s decision class is correct or incorrect
a supp ort v ector mac hine also tries to maximize its linear classification line to split
the data with the largest margin p ossible. Supp ort v ector mac hines are also less
lik ely to o v erfit and pro vide faster training than p erceptrons. Random forests are an
ensem ble metho d [45] for classification. They pro cess input data b y creating a series
of decision trees and then at test time output the class that is the mean or mo de of
all of the created trees. They often are v ery simple to train and generate relativ ely
robust decisions.
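As a quick illustration of how these three classifiers compare in practice, the sketch below fits scikit-learn's implementations on a synthetic dataset; the dataset and hyperparameters are arbitrary placeholders rather than anything used in this work.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Synthetic two-class problem standing in for any tabular dataset.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "Naive Bayes": GaussianNB(),                    # probabilistic, assumes independent inputs
    "SVM": SVC(kernel="linear"),                    # maximum-margin linear separator
    "Random Forest": RandomForestClassifier(n_estimators=100),  # ensemble of decision trees
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(f"{name}: test accuracy = {model.score(X_te, y_te):.3f}")
```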
1.3 Perceptrons in Computer Vision
Computer vision p oses extremely c hallenging tasks ranging from ob ject classification
and detection, to mo dern 3D p oin t cloud pro cessing and abstract scene understanding.
Since its inception in the 1960’s computer vision has tak en man y turns. Algorithms
exist are based on man y paradigms suc h as template matc hing [12, 153, 95, 35],
principal comp onen t analysis [95, 43, 197, 183], and feature selection [55, 28, 92]. But
recen tly one place p erceptrons, and p erceptron based algorithms, ha v e b een used for
the most c hallenging problems is computer vision. After y ears of not b eing on the
forefron t of computer vision artificial in telligence has circled bac k to artificial neural
net w orks as mo dern hardw are has allo w ed for larger net w orks.
Since the groundbreaking researc h of Alex Krizhevsky in 2012 [101] Con v olutional
neural net w orks are in wide use for computer vision tasks. Mo dern neural net w orks
are man y la y ers deep utilizing the bac kpropagation algorithm [71] to train p ercep-
trons in the middle la y ers of the net w ork. A con v olutional neural net w orks utilizes
represen tativ e no des rather than no des fully connected to presynaptic la y ers. These
no des, often in the form of input k ernels, are con v olv ed around presynaptic la y ers b y
applying the k ernel w eigh ts to eac h lo cation and output a plane of activ ation v alues
rather than a single v alue [116, 115]. Other la y ers are p o oling la y ers whic h reduce
the dimensions of the output planes b y metho ds suc h as a v eraging across lo cations or
taking the max v alue [68, 194, 160, 24]. Finally the la y ers closest to the output t yp-
ically are comprised of fully connected la y ers lik e in a simple m ultila y er p erceptron.
This can b e seen in 1-2
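A minimal PyTorch sketch of the layer pattern just described (convolution, pooling, then fully connected layers near the output); the layer sizes are illustrative and do not correspond to any network from this thesis.

```python
import torch
import torch.nn as nn

class TinyConvNet(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=5),   # kernels convolved over the input plane
            nn.ReLU(),
            nn.MaxPool2d(2),                  # pooling: keep the max over each 2x2 region
            nn.Conv2d(8, 16, kernel_size=5),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(      # fully connected layers closest to the output
            nn.Flatten(),
            nn.Linear(16 * 4 * 4, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

net = TinyConvNet()
out = net(torch.randn(1, 1, 28, 28))  # one 28x28 grayscale image
print(out.shape)  # torch.Size([1, 10])
```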
Figure 1-2: A convolutional neural network. The larger squares represent the input image at the beginning and then the planes of activation values represented as feature maps. The smaller squares being projected forward represent the kernel; a single value on a postsynaptic layer is calculated by the weights of its kernel applied at a corresponding location in the input map. From [3].

Moving only slightly away from perceptrons, Deep Belief Networks [75, 73, 119] and Deep Boltzmann Machines [199, 173] are also used in computer vision problems, often to pretrain deep convolutional networks in an unsupervised manner [117, 75]. Both are also composed of nodes and connections; however, Deep Boltzmann Machines are fully undirected while Deep Belief Networks are undirected only at the top layers. The undirected layers of both are composed of Restricted Boltzmann Machines [74, 168, 76]. Deep Belief Networks are trained in a stacking manner by training each pair of layers one at a time in the normal manner of a Restricted Boltzmann Machine, first training between the input and the first layer and then between pairs of deeper layers. Deep Belief Networks can also be used in a convolutional manner [117, 118]. Deep Boltzmann Machines learn in a similar manner; however, a Deep Boltzmann Machine trains all layers at once using stochastic maximum likelihood. A visual of a Deep Belief Network and a Deep Boltzmann Machine can be seen in Fig 1-3. Another method can also be used where the network is first built as a Deep Belief Network and then converted to a Deep Boltzmann Machine for fine tuning [158]. Similarly, autoencoders [15] can be stacked to learn in a similar way. Autoencoders differ in that they directly learn a representation rather than a statistical distribution, and learn deterministically rather than probabilistically [187, 16, 111].
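Below is a sketch of the deterministic alternative just mentioned: a single autoencoder that learns a representation by reconstructing its input, whose trained encoder could be stacked for the layer-wise pretraining described above. The sizes and random batch are placeholders.

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    """One encode/decode pair; trained encoders can be stacked for layer-wise pretraining."""

    def __init__(self, n_in: int = 784, n_hidden: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_in, n_hidden), nn.Sigmoid())
        self.decoder = nn.Sequential(nn.Linear(n_hidden, n_in), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = AutoEncoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(64, 784)  # stand-in for a batch of flattened images

opt.zero_grad()
loss = nn.functional.mse_loss(model(x), x)  # deterministic reconstruction objective
loss.backward()
opt.step()
print(f"reconstruction loss: {loss.item():.4f}")
```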
Figure 1-3: Figure showing a Deep Belief Network and a Deep Boltzmann Machine. From [188].

1.4 Computer Vision Challenges

Computer vision is a broad field even when only considering still images, one subset of the world of computer vision. Classification could be considered one of the simplest problems. This involves only identifying what a picture is of, such as in datasets like the digits of MNIST [116], the characters of EMNIST [65, 39], or the pictures of CIFAR10 and CIFAR100 [100]. A more challenging problem is detection, where not only must the object be identified, but it must also be located with a bounding box where multiple objects might occur in the same image. A heavily competitive dataset is that of ImageNet [44]. Continuing with still images, facial recognition is a highly researched topic. Facial recognition poses an extra difficulty because while faces can be identified by higher level features, determining who a specific face belongs to requires low level differentiation between the subtleties of the activated features. Many techniques have been proposed, some of which use CNNs [195, 140, 113], while others use handcrafted features [33, 29, 17, 32]. In all of these areas artificial intelligence has come a long way. But on the most challenging problems, such as the ImageNet Detection Challenge, the best systems in the world still cannot compete with the human brain [44].
1.5 Assistive Vision T echnologies
Another topic of in terest where computer vision can pro vide a solution is in assistiv e
tec hnologies for the blind. Muc h progress has b een ac hiev ed in dev eloping electronic
aids to assist visually impaired p eople as tec hnology has adv anced. One metho d is to
con v ert images to soundscap es whic h some sub jects can learn to in terpret w ell enough
to differen tiate places, and to iden tify and lo cate some ob jects [174]. Others include
lo calization in an en vironmen t using stereo cameras, accelerometers, and ev en wifi
access p oin ts [80, 25]. A dv ances ha v e also b een made to traditional aids suc h as canes,
b y dev eloping electronic replacemen ts using, e.g., sonar to increase their w arning range
or gran t the same feedbac k but without a ph ysical cane [185, 134], and replacing guide
dogs with rob ots [105]. Among these devices man y utilize computer vision to help
with na vigation, text reading, and ob ject recognition. [134, 127, 179, 132, 2].
1.6 This Dissertation

This work aims to approach computer vision and neural networks from a different direction than most researchers. Rather than use traditional computer science and statistics, the approaches taken here rely on biology. The first project relies on it directly, and the second uses it as inspiration.

The first project is a system of human-in-the-loop computer vision. This project accepts that humans are simply better at many tasks than the best computer systems in the world. Rather than try to solve these tasks, we get around the challenge by putting a human in our decision making loop. We devise a system that only has a digital camera as sensory input. After the camera frame is processed, the decision of what to do next is left up to the human, who is able to use their own critical thinking and mobility to continue through the experiment. The participants use audio instructions given by our system to navigate and find items in a simulated grocery store setting. Using this paradigm we are able to achieve 100% accuracy on this real world task, something stand-alone computer vision systems would have significant trouble with.
The second project is a novel modification to modern state of the art neural networks. When the perceptron was invented in 1957, our understanding of neuroscience was significantly less than it is today. The project is based on research in neurobiology showing that not only does the cell body of a neuron spike as a decision making component, but the branching dendritic trees of all neurons in the brain also perform additional complex non-linear processing as presynaptic input travels to the cell body. We modify the core perceptron building block to also have additional nonlinear nodes, inspired by dendrites, to allow each building block perceptron to do additional data processing and code for more complex features before computing its final activation value. We split up the building blocks by using our modification to the backpropagation algorithm, which we call "Perforated Backpropagation." Building upon the research of both neurobiologists and computer scientists, our algorithm can be added to any system of artificial neural networks and in many cases improve the final accuracy.
Chapter 2
Perfect Accuracy With
Human-on-the-Loop Computer Vision
2.1 Introduction and Background

People who are blind have more difficulty navigating the world than those with sight, even in places they have been before [141, 48]. This is a condition that affects 39 million people worldwide [190]. Many advances have been made in computer vision, yet even state of the art algorithms have been unable to achieve perfect accuracy on standard datasets [36, 177, 69]. Our algorithm's success is founded in the areas of dynamic thresholding and active vision [4]. Active vision is the process of changing views to better identify what is being looked at. This can be through changing the pose of the camera or choosing a region of interest within a larger field of view and then attempting identification within that region using a zoomed-in image [10, 21, 184, 63]. Dynamic thresholding is any recognition system which has a decision threshold more complicated than a single number. For example, some methods include different thresholds for parts vs. whole object detection [52], adaptive local thresholding [203, 90], and connectivity based thresholding [138].

Despite the advances in assistance for the blind discussed in the introduction, shopping can still be a nearly impossible task. Many boxed and canned items have identical shapes, which means that without one of these aids, or normal vision, help from a person with vision is required for selecting the correct item [106]. Even successful devices such as OrCam [132] require the user to point at the desired object to be identified. This is great for people with poor vision, but not helpful for the fully blind. To address this, systems have been proposed that read barcodes [136] or identify items on the shelves using computer vision algorithms [127, 179]. On the one hand, barcode scanners never make mistakes, although they can be tedious to use when looking for a specific item in a large grocery store (as shown in our own results, see below). On the other hand, a serious problem with using a computer vision system for this application is that if it makes too many mistakes, users will likely stop using it [53, 142]. An acceptable system cannot ever tell the user to select an item they do not want.
2.2 Challenges of Picking a Decision Threshold

In a typical object detection computer vision system, each input image requires the system to determine a confidence for how likely it is that any items trained for are currently in that image. If the confidence is high enough it will tell the user it has found the item. However, no matter where the confidence threshold is set, for most objects and algorithms there will be some range of values where the system will make mistakes [127], either false alarms or misses. If the threshold is set too high, the system can decide it has not found the item when it was present (miss), and if the threshold is set too low, the system can decide it has found the item when the item was not present (false alarm). This problem happens with almost every system with a confidence threshold for detection, because there often are some images without a particular item where the confidence may be higher than for some images with the item.

To show this point, using the set of 25 objects used in our experiments (Fig. 2-1), a dataset of pictures was collected in our simulated grocery store setting. A camera was placed in a fixed position and objects were arranged in front of it with their centers two feet away from the camera. Objects were then rotated vertically and horizontally at 15 degree intervals for each picture, from negative 45 degrees to positive 45 degrees offset, giving a total of 1225 images. Images in which the object pose (homography, discussed in the following section) could not be recovered by our algorithm were not included, because the system would not be able to guide the user from those images. Removing these images left a total of 1112 usable images, or 44.5 images on average per item. Fig. 2-2-A shows receiver operating characteristic (ROC) curves for recognizing each of the objects in the dataset individually and Fig. 2-2-B shows one ROC curve over all objects. Confidences were calculated using the SURF [12] algorithm. Some objects had less of a problem than others, with a smaller portion of overlap between the highest confidence without the object and the lowest confidence with the object. Only 2 objects had no overlap at all. This means just these 2 objects of 25, with the images collected, would yield no mistakes with a fixed threshold. The ROC curves for some of the other items are quite good as well; however, even an error rate of only 1% might cause an error every 25 seconds in our system, which runs in real time at approximately 4 frames per second. In the discussion section we will detail why every mistake is a large issue for the user. The only way to avoid this problem is to not have a yes/no threshold, and instead allow the system to output that it is unsure within the range of values where there will be uncertainty.

Figure 2-1: Template images for objects in the database by row, left to right. 1, Cereal: PEB, CP, HNC, LUC, MGC. 2, Snacks: SR, HBN, OCP, PS, NB. 3, Pasta: HH, KRA, PR, MAC, VEL. 4, Tea: SM, LIP, FTS, FR, STA. 5, Candy: NRD, HT, GNP, MD, JM.
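To make the overlap problem concrete, the sketch below computes, from hypothetical confidence scores, the two boundaries the next section trains for: the lowest confidence ever seen on an image containing the item and the highest ever seen on an image without it. Any score between the two falls in the uncertainty range where a fixed yes/no threshold must err. The score distributions here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
# Invented confidences: positives tend to score higher, but the tails overlap.
pos_scores = rng.normal(0.70, 0.10, size=500)  # images containing the item
neg_scores = rng.normal(0.45, 0.10, size=500)  # images without the item

lowest_true_positive = pos_scores.min()
highest_false_positive = neg_scores.max()

print(f"lowest true positive:   {lowest_true_positive:.3f}")
print(f"highest false positive: {highest_false_positive:.3f}")

if highest_false_positive > lowest_true_positive:
    # Any single yes/no threshold inside this interval produces misses
    # (if set high) or false alarms (if set low); answering "unsure" here avoids both.
    overlap = np.mean((pos_scores >= lowest_true_positive) &
                      (pos_scores <= highest_false_positive))
    print(f"uncertainty range covers {overlap:.0%} of positive scores")
```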
2.3 Proposed System

2.3.1 Human-in-the-Loop

The proposed system consists of a camera mounted on a pair of glasses, which captures images in real time. Users can provide instructions as to which object they want to reach for next (in experiments, that was controlled by the experimenter). Camera images are then analyzed as the user moves through the environment until at least some weak evidence for the presence of the object is detected by the vision algorithm. If there is evidence that the object may be present, but the system is uncertain (as further detailed below), the user is not yet told that the object has been found. Instead, the user is instructed to turn, move, strafe, or crouch in a way that will decrease the difference in object pose between the current camera view and the system's template image for that object. Template images for the objects are front and centered. As the viewpoint changes and provides increasingly more front and centered views of the object, the confidence of the vision algorithm is expected to increase. When confidence exceeds the threshold necessary to ensure the decision will not be a false positive, the object is declared found. The user may still be further guided so that the object becomes centered in the camera's field of view. At this point the user is informed that the object is straight in front of their face and that they should reach out to grasp it. A simple flow diagram of this process is shown in Fig 2-3.

Figure 2-2: (A) ROC curves of confidence values over all images and all objects collected. PEB and HNC are the only ones where all confidences for images of other objects are lower than all confidences for images of themselves. (B) ROC curve for correctness with a single fixed threshold over all objects being tested on.

Figure 2-3: Flow diagram of choices the algorithm makes.

As described in the introduction, this allows our system to fulfill our goal. The only external sensory input to the system of human and camera is the visual stimuli detected by the camera. However, the computer vision algorithm relies on the human. We do not have machine learning applied for decision making, planning, or movement of the system. Our approach uses the cognitive abilities of the users (to understand instructions) and their mobility (to execute the suggested moves) to improve the quality of the view of an object as captured by the head mounted camera. Because our main application is for blind users, the system will still be able to solve a problem in the real world where human visual ability cannot be relied on.
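A sketch of the per-frame logic in Fig 2-3, assuming some upstream detector supplies a confidence score and, when recoverable, a homography. The function names and structure are hypothetical; the guidance step is elaborated in Section 2.3.3 (see the sketch there).

```python
def process_frame(confidence, homography, lowest_tp, highest_fp_plus_buffer, centered):
    """Return the feedback for one camera frame (hypothetical helper, mirrors Fig 2-3).

    lowest_tp: lowest true positive confidence seen in training.
    highest_fp_plus_buffer: highest false positive plus the 15% safety buffer.
    centered: whether the object sits at the center of the camera frame.
    """
    if homography is None or confidence < lowest_tp:
        return "no instruction"           # object absent, too far away, or pose unrecoverable
    if confidence < highest_fp_plus_buffer:
        return guidance_from(homography)  # uncertain: guide the user toward a better view
    if not centered:
        return guidance_from(homography)  # certainly present: keep centering before grasping
    return "Reach Out"                    # certain and centered: tell the user to grasp


def guidance_from(homography):
    # Placeholder for the movement instruction ("Left", "Strafe Right", ...)
    # derived from the camera pose; see Section 2.3.3 and the sketch there.
    return "Left"
```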
2.3.2 Setting Thresholds and the Homography Matrix

Training is performed to find confidence thresholds for the top and bottom of the uncertainty range for each item. This range is defined as the values between the lowest true positive threshold and the highest false positive threshold. The lowest true positive is the smallest confidence score ever given to an image, signaling that an item is present, where the item being trained for was actually in the image. The highest false positive is the largest confidence ever given that an item was in a training image when it was not present. Training and testing incorporate the use of homography matrices. A homography matrix is a representation of where the camera is relative to a set of points in space that all lie on the same plane. Homographies are calculated based on the relative positions of a set of points in relation to each other in a template image compared to their relative positions in a camera image. For example, if the points are all proportionally closer, the homography would show the camera is further away from the object than where it was when the template image was taken. Another example can be seen in Fig 2-4.

Figure 2-4: Visual representation of the homography calculation [162]. In this case the points will be proportionally closer in the y-axis, while in the x-axis they will be closer at the top and further apart at the bottom. The calculated homography would describe a position where the camera is looking up at the object from below.

In our case, the template points are from the goal object for which the system is currently training. Because the points must be on the same plane, in the current instantiation of the system objects being found must be in boxes, as opposed to cans or other objects without a flat front surface. To calculate a homography, a keypoint matching algorithm is required. These algorithms calculate feature descriptors in images and find matches between similar descriptors in other images. Matches include a match confidence as well as the pixel positions in both images, as needed for the homography calculation. We chose SURF [12], as opposed to others such as SIFT [95], because of its speed.

In the end system, homography matrices are the method used to give instructions. To train the lowest true positive threshold, objects are displayed to the camera, rotated in all directions, and moved closer and farther away. The lowest true positive value is determined to be the lowest confidence value seen where a homography is still able to be calculated. If a homography cannot be calculated, those confidences are not used, because we would not be able to direct the user from those images. The highest false positive value is trained at the same time. While training the lowest true positive for one object, confidences are recorded for every other object in the database. The strongest confidence ever seen for each object, while actually looking at other objects, sets the threshold for the highest false positive. To be safe, we additionally add 15% of the range between thresholds to this value as a buffer. An example of scores relative to these thresholds is shown in Fig 2-5.
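The sketch below shows the keypoint-and-homography pipeline using OpenCV, which provides a SURF implementation in its contrib builds (with nonfree modules enabled). The file names and parameter values are placeholders, not the thesis's settings.

```python
import cv2
import numpy as np

template = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)  # front-and-centered object image
frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)        # current camera frame

# SURF lives in the contrib modules; cv2.ORB_create() is a free drop-in for the same pipeline.
surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
kp_t, des_t = surf.detectAndCompute(template, None)
kp_f, des_f = surf.detectAndCompute(frame, None)

# Match descriptors between the two images and keep the closest matches.
matcher = cv2.BFMatcher(cv2.NORM_L2)
matches = sorted(matcher.match(des_t, des_f), key=lambda m: m.distance)[:50]

if len(matches) >= 4:  # findHomography needs at least four point correspondences
    src = np.float32([kp_t[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_f[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    if H is not None:
        print("recovered camera pose relative to the planar object:")
        print(H)
```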
2.3.3 Instructions to the User

During a run, if the confidence is within the uncertainty range, the system outputs that it does not know the answer. However, it uses the information it has to arrive at a better decision later. If the confidence is between these thresholds, and a homography can be calculated, the system will know where the camera is relative to the points on the object used in the homography calculation. It can then pass on this information to the component that moves the camera. In our application that component is the human user. Using audio feedback, our system tells the user how to move in order to guide the camera to a better viewing angle. If homographies continue to be calculated, eventually an ideal, front and centered, viewpoint can be achieved. Images from this camera angle generate the keypoints most similar to the template's keypoints, giving the highest confidences. If the confidence of an image surpasses the highest false positive value for the goal object, correctness is certain. If, with an ideal viewpoint, the item is still not above this threshold, the user knows to move on. This will happen when items have enough keypoints in common that a homography for the goal item is still able to be calculated from keypoints found on the alternate item. Most frequently this is seen between objects which share brand logos or other portions of similar visuals.

Figure 2-5: Confidence threshold display for three items. Bars represent confidence values over the last 20 frames. No units are shown because displayed confidences for each are relative percentages between the max and min ever seen by the system when searching for each item. The middle threshold is the highest false positive. The bottom threshold is the lowest true positive. The top threshold is the extra buffer, 15% of the range, above the highest false positive. The left confidence is in the range of uncertainty; if this was the item being searched for, directions would be given. The middle confidence is above the max false positive value, meaning this item is actually in the camera frame. The right confidence is below the lowest true positive, so this item is certain to not have enough keypoint matches in the image to recover a homography.
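One simple way to turn a homography into an audio instruction is to project the template's center into the camera frame and compare it against the image center; the sketch below does this. The pixel tolerance, labels, and overall rule are illustrative simplifications, not the thesis's exact guidance logic.

```python
import numpy as np

def instruction(H, template_size, frame_size, tol=30):
    """Map a homography to a coarse movement command (illustrative rule only)."""
    tw, th = template_size
    fw, fh = frame_size
    # Project the template's center into camera-frame pixel coordinates.
    cx, cy, cw = H @ np.array([tw / 2.0, th / 2.0, 1.0])
    dx, dy = cx / cw - fw / 2.0, cy / cw - fh / 2.0

    horiz = "Left" if dx < -tol else "Right" if dx > tol else ""
    vert = "Up" if dy < -tol else "Down" if dy > tol else ""
    if not (horiz or vert):
        # Centered; the full system would additionally require the confidence to
        # exceed the highest false positive plus buffer before saying "Reach Out".
        return "Centered"
    return f"{horiz} {vert}".strip()

H = np.eye(3)  # identity homography: object already centered for same-size images
print(instruction(H, (640, 480), (640, 480)))  # -> "Centered"
```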
2.3.4 The Components

The physical system consists of three components. The first is the headset, created by attaching a webcam to a pair of glasses. The camera is attached directly in the middle to best capture images replicating where a person would be looking. The next is a pair of headphones to allow the user to hear the audio feedback. The last is the computer, which performs SURF template matching, checks confidences, and gives instructions. We have used a GigaByte Brix, which can be placed in a backpack and powered with a battery while the user is performing the task. These components are all controlled via SSH by a Samsung tablet.
2.4 Experiment Setup

For real world applications a computer vision system must be flawless, or close to flawless, in identifying whatever it is being used for. A system saying it has found what it is looking for when it has not could range from catastrophic to just inconvenient, but in any case it would not be widely used with more than a tiny allowance for mistakes. The goal of our system is to show that this method of thresholding and relying on superior human decision making can achieve perfect accuracy. Blind grocery shopping tests this goal. The visually impaired user is able to make their own choices and utilize their own mobility for all parts of the task other than the actual vision. Grocery shopping is also a task where a system telling the user to purchase the wrong item even once would be considered a serious mistake.

Figure 2-6: Experiment setup. Shown is a user confirming a selected item in the simulated grocery store.
2.4.1 Environment and Instructions
Our experiment took place in a simulated grocery store aisle using blindfolded participants, as shown in Fig 2-6. Subjects were 42 students. We arranged two bookshelves next to each other where our grocery store items could be placed. Each bookshelf has three shelves and we allowed items to be placed in three locations per shelf, making 18 total locations where items could be placed. During any given run five items would be out on the shelves at a time. These items came from one of five categories: cereal, snacks, pasta, tea, and candy. The five items arranged together during each trial would be from the same category. Users all performed 1 trial from each category with the same locations, and 2 trials from each category with randomized positions which were unique, for a total of 15 trials per subject. To begin each trial, the user would stand against the back wall of the room facing the items. At this point the system would be turned on with the goal item selected. The user was instructed to move slowly around the room, while facing the shelves, until an initial movement command was given by the system.

Figure 2-7: Instructions correspond to the camera's position based on the homography calculation. The user is guided to make the camera point directly at the center of the object. The center image shows a strafe command, where the user would be instructed to rotate in addition to moving.

The first command would occur when SURF matches were made in arrangements where homographies could be calculated and the confidence was above the lower, worst true positive threshold. Once an instruction was received the user was to follow the instructions, which would guide them to be centered in front of the item. Instructions included "Left," "Right," "Up," "Down," "Left Up," "Left Down," "Right Up," "Right Down," "Strafe Left," "Strafe Right," "Strafe Up," "Strafe Down," "Step Forward," "Step Back," and "Reach Out." Examples of images which would elicit direction commands can be seen in Fig 2-7. Direction commands were to move in those directions, strafe commands were to move in that direction but rotate in the opposite direction, and "Reach Out" was the command which was only given when the object was directly centered and the confidence was above the worst false positive confidence plus buffer threshold.
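The direction and "Reach Out" commands above amount to a two-threshold decision on each frame. Below is a minimal sketch of that logic; the threshold values, pixel tolerance, and function name are hypothetical illustrations rather than the exact implementation used in the experiment, and the strafe and step commands are omitted for brevity.

    # Hedged sketch of the per-frame command logic described above.
    # Thresholds, tolerances, and helper names are illustrative assumptions.

    def choose_command(confidence, obj_center, frame_center,
                       worst_tp_thresh=0.35,   # lower threshold: worst true positive
                       worst_fp_thresh=0.60,   # upper threshold: worst false positive
                       buffer=0.05,            # safety buffer added to the upper threshold
                       center_tol=20):         # pixel tolerance for "centered"
        """Return an audio instruction for one frame, or None for no instruction."""
        if confidence < worst_tp_thresh:
            return None  # too uncertain: stay silent rather than risk a bad command

        dx = obj_center[0] - frame_center[0]
        dy = obj_center[1] - frame_center[1]
        centered = abs(dx) <= center_tol and abs(dy) <= center_tol

        if centered:
            # Only commit when confidence clears the worst false positive plus buffer.
            if confidence >= worst_fp_thresh + buffer:
                return "Reach Out"
            return None  # centered but uncertain: let the human decide to move on

        horiz = "Right" if dx > center_tol else "Left" if dx < -center_tol else ""
        vert = "Down" if dy > center_tol else "Up" if dy < -center_tol else ""
        return (horiz + " " + vert).strip()

The key design choice is the silent middle band between the two thresholds: rather than forcing an answer from an uncertain frame, the system says nothing and leaves the decision to the user.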
When the "Reach Out" command was given, users were to reach out from the camera and pick up the item in front of them. Once this item was grasped they turned 90 degrees, to be sure no other items from the shelves were in the background, and confirmed the item by receiving a second "Reach Out" command. This would be done by holding the item up to the camera and moving the item based on the audio feedback, rather than moving themselves as was done with the item on the shelf. Sometimes users would be guided towards incorrect items when two items had similar enough features that confidence would be high when looking at the wrong item and points matched in such a way that homographies could still be calculated. However, when centered on the incorrect item the worst false positive threshold would not be surpassed, and hence no "Reach Out" command would be issued. It would be up to the participant to decide to move on to other locations on the occasions when they had centered on an object but were not being instructed to reach out.
2.4.2 Training the Participants
Each participant was first briefly trained on how to use the system. Training started with all 25 items out on the shelves. This would be more difficult than during non-training trials, where the items would be less crowded. Participants ran the experiment three times without a blindfold, and then three times with a blindfold to get a feel for the system. At that time the participant continued training until they successfully performed three trials in a row without making a mistake. Mistakes were defined in two ways. One was if they picked up the wrong item. Users knew not to pick anything up until they received the "Reach Out" command, but actually reaching out towards the location directly in front of the camera's center of field proved to not be an inherently easy function to perform. Some users reached slightly to the left or right, or even too high or too low to a different shelf. The second predetermined mistake to avoid was "losing" the item once tracking had begun. When the system was initially turned on, instructions typically were not received, as the item to be searched for was either not in the camera frame, or far away and therefore too small in the image to get enough keypoint matches to calculate a homography. This was the "no instruction" condition. In this case the user was to scan the shelves without instructions until a first instruction was given. At this point the user followed instructions which would guide the object to the center of the camera frame. If the user moved in such a way that the object was lost from the camera frame and they returned to the "no instruction" condition, that would be considered a failure during training. During the actual experiment, trials would not be aborted whenever the subject "lost" an item, and users had to recover from it on their own. Likewise, if the user picked up an item but could not confirm it and decided they had reached incorrectly, the item would be returned to the shelf and the trial would continue. Failures during the experiment could hence occur if users both picked up and confirmed the wrong item, or if they gave up on a given trial (which never happened).
2.4.3 Control Experiment
Our control experiment was performed in a similar manner using a barcode scanner. This was chosen, rather than another computer vision system, because we were confident we could achieve perfect accuracy and wanted to test against a second option which would have perfect accuracy [136]. The barcode scanner used was an Amazon Dash. The experimental setup for these trials was kept as parallel as possible to a grocery store setup. The same two bookshelves were used, again with 18 possible locations. As in modern grocery stores, the barcodes were placed directly on the front face of the shelf. In these trials one barcode was selected at random as the goal. Users scanned every barcode in any order they chose until the correct one was scanned. No mistake conditions were defined for these trials. Training simply consisted of giving the user the scanner and a blindfold, and they were allowed to practice indefinitely until they felt confident.
2.5 Experimental Results
Experiments were run as described on 42 participants. For all trials with all participants the correct item was always correctly obtained by the participant. Barcode scanning trials were also always successful. In each trial three time points were collected. The first was the time at which the first instruction was received. This cutoff was included because in some trials participants would take the majority of their total time moving blindly before receiving any instructions. With the barcode scanner, subjects would start trials with the scanner already held to the first barcode. The first scan would regularly take less than a second, so we wanted to have a cutoff for the first piece of feedback for our system. The second time recorded was the time of the first "Reach Out" command. At this point the system was 100% sure the user had found the item they were looking for and it was directly in front of them. The final time recorded was the additional time needed for the user to actually pick up and confirm the item, a final step not taken during the barcode scanning trials.

Mean total time for our system was 73.1 seconds per trial. Mean barcode scanner time was 49.4 seconds. Using total time, this would mean the barcode scanner was distinctly faster. However, the mean first instruction time with our system was 23.7 seconds and the mean first "Reach Out" command time was 46.5 seconds. This gives a mean "time to complete" with our system, the time between the first instruction and the first "Reach Out" command, of only 18.1 seconds. We believe this is the time that should be compared, as further discussed in the next section. These results are shown in Fig 2-8.

Figure 2-8: Boxplots for total time taken for runs of the system and the barcode scanner, "time to complete" times for the system, and first "Reach Out" times for the system. Wilcoxon Rank Sum Tests were run on each pair to test if they could have come from continuous distributions with equal medians. All but Barcode Time vs First Reach Out Time had significant p-values: System Time vs Barcode Time: 1.5e-24, System Time vs Time to Complete: 6.2e-118, Barcode Time vs Time to Complete: 2.4e-46, System Time vs First Reach Out Time: 1.2e-34, First Reach Out Time vs Time to Complete: 7.4e-46, and Barcode Time vs First Reach Out Time: 0.18.
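For reference, the pairwise tests reported in Fig 2-8 can be reproduced with SciPy's rank-sum implementation. The arrays below are hypothetical placeholders standing in for the recorded per-trial times, not the actual experimental data.

    # Minimal sketch of the pairwise Wilcoxon rank-sum tests reported in Fig 2-8.
    # The time arrays are hypothetical placeholders, not the experimental data.
    from itertools import combinations
    from scipy.stats import ranksums

    times = {
        "System Time": [73.5, 70.2, 81.0],        # placeholder per-trial totals
        "Barcode Time": [49.0, 50.1, 48.8],
        "Time to Complete": [18.3, 17.9, 18.0],
        "First Reach Out Time": [46.2, 47.0, 46.1],
    }

    for (name_a, a), (name_b, b) in combinations(times.items(), 2):
        stat, p = ranksums(a, b)  # tests for equal medians of continuous distributions
        print(f"{name_a} vs {name_b}: p = {p:.3g}")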
Surveyed participants were asked to assign a value of 1-10 to their preference between the systems, with 10 meaning they completely preferred our system and 1 meaning they completely preferred the barcode scanner. The mean score was 7.8, with only 2 participants reporting that they preferred the barcode scanner. Many reported their preference came from our system being able to provide more continuous feedback than a barcode scanner as guidance to the goal object. Of course, there could be some response bias from subjects wanting to be "friendly" participants.
2.6 Discussion
The strongest algorithm from the most recent ImageNet Challenge [154] was developed by MSRA [69]. They achieved an accuracy rate of 62.07% (as reported by [154]) over all object categories in the dataset, with a range of 95.93% for the most accurate category and only 19.41% for the weakest. This is still an outstanding result given the complexity of the ImageNet dataset, and the deep residual neural networks used to achieve it are impressive work. However, this rate of accuracy would be far too low for any real world applications where mistakes are costly. In situations such as assistance to blind grocery shoppers it is essential to not make mistakes. In earlier instantiations of our algorithm "Wrong Item" was also an instruction. It was given when an object was centered but the worst false positive threshold was not surpassed. The intention was to inform users they had centered on an item with similar enough keypoints to calculate homographies for the goal item, even though it was not the goal item itself. However, in the cases when this happened while they were actually looking at the goal item, simply because one frame did not yield good keypoints, users would typically move away immediately. This choice sometimes added minutes more to their time before coming back to the correct item. This is why we decided to instead give no instruction when the item was centered but the threshold was not surpassed, and rely on the participant to decide on their own when they had centered on an incorrect item. As seen in the ROC curves earlier, with a fixed threshold a SURF based algorithm could perform with reasonable error rates on all of the items in our dataset. However, when even a single bad instruction from a single frame can increase your time significantly, and the algorithm is running at many frames per second, perfect accuracy is necessary for an algorithm to be optimal. Our experiment has shown that using this human-in-the-loop system 100% accuracy is, in fact, possible with a computer vision based system in a real life application. Using a human's mobility and decision making allows the algorithm to not have a fixed threshold and instead postpone decisions when uncertain. Without forcing answers from uncertain conditions, the algorithm is able to never make mistakes.
Time for the barcode scanner was stopped when the user scanned the correct barcode. These trials did not require the user to pick up an actual item or confirm it. Removing the time to pick up and confirm with our system makes the two more equivalent. The time before the first command is also not parallel for the barcode scanner. In barcode scanning trials the subjects were allowed to start with the scanner already held up to the first barcode of their choice. This often meant the first piece of feedback would be immediate. In trials for our system the time taken before the first command was received was regularly a large majority of the total time taken. A major cause of this was the choice of webcam for our original system. With a low definition webcam, the smaller items would sometimes require users to get within a couple feet of the item before it took up a large enough portion of the image to detect any keypoints. This meant the subject might have to blindly scan all 18 positions before getting any feedback whatsoever. Sometimes they would even have to do this more than once if they did not scan correctly the first time. With an HD webcam the user should be able to scan all 18 positions on both bookshelves at once from the starting position at the back wall. This would eliminate all time taken before the first command. As evidence for this, for the larger items in the cereal category this was already the case. With such large items initial instructions were often heard immediately. Considering only this category, mean total time was 57.1 seconds. However, for cereals the first instruction time had a mean of 9.1 seconds compared to 27.5 for the other categories. With a mean time of 34.6 seconds to pick up and confirm an item after receiving the first "Reach Out" command, this gave cereals a mean "time to complete" of only 13.5 seconds and a mean time from start to the first "Reach Out" command of 22.5 seconds. Either of these times is more comparable to the barcode scanner times, since barcode trials did not require confirmation and started feedback immediately, and both are faster.
Compared to a barcode scanner the total times for our system were slower. However, when only considering "time to complete," the time needed for the subject to center the correct item in the camera frame after receiving their first instruction, our system was faster. When considering only the time to the first "Reach Out" command, ignoring the time taken to grasp and confirm the item that was not necessary in barcode scanner trials, times did not show a significant difference. Importantly, surveyed participants reported they preferred the constant guided feedback of our system over the yes/no feedback the barcode scanner could provide, even in our reduced "store" with only two shelves. We hence conclude that this study has successfully demonstrated a user-in-the-loop machine vision algorithm that made no mistakes and could be an interesting basis for a new generation of visual aids.
2.7 Acknowledgment
This work was supported by the National Science Foundation (grant numbers CCF-1317433 and CNS-1545089), and the Office of Naval Research (N00014-13-1-0563). The authors affirm that the views expressed herein are solely their own, and do not represent the views of the United States government or any agency thereof.
Chapter 3
Inspiration for Continued Work
3.1 Introduction
The next project changes direction. The Human-in-the-Loop system accepted that the human brain is better at many machine learning and decision making tasks and thus relied on it for those aspects of the algorithm. The next project instead tries to bring aspects of the human brain into a machine learning algorithm, artificial neural networks, which are already loosely based on the brain. The human brain is an incredibly complex machine, which forces significant choices when deciding what complexities to add. Before getting into the project I will discuss two significant components present in the brain that are not represented effectively in neural networks. These two components are the ones that inspired my project most significantly. One is from Cognitive Psychology, and the other is from Neurobiology. This section will then close with a discussion of other research in the field that does not rely on biological inspiration.
3.2 Inspiration from Cognitive Psychology: Object Representation in Humans
3.2.1 Overview
Object representation in the brain is a topic that is becoming better and better understood with modern neurocognitive and fMRI experiments. This includes recognition of individual objects and also relations between the descriptive qualities of multiple objects [103, 67, 19, 20, 148, 94]. The research that originally inspired me to move from an undergrad in computer science to getting a PhD in neuroscience was that of Dr. Irving Biederman at USC [19]. The Recognition-by-Components theory of human visual processing is based on the idea that the preferred method of object detection in the human visual stream is to build objects up from their components. After edge detection, edges are first processed by their non-accidental properties. A non-accidental property (NAP) is a visual feature of an object that is invariant to most changes in viewpoint. For example, a curved edge will be curved in every viewpoint except when the curve is viewed head-on, at which point it will appear straight. Another example, requiring multiple edges, is that at the center corner of a cube the three edges form a "Y" vertex, and it will look like a "Y" until rotated so significantly that the center vertex becomes one of the outside vertices. These non-accidental properties are then combined upstream to form geons, the basic building blocks of object detection. Geons are simple geometric primitives such as cubes and cones. These primitives are then combined in the next layer into the object that the brain can then recognize. Examples of non-accidental properties and how they are combined into geons are shown in Figs 3-1 and 3-2 from [19].
Scientists have been trying to invent computational models which function the way the brain works for decades [47, 139, 46, 81]. However, many object recognition models which achieve strong results do not do so with the same methods as the brain. This section will discuss recent convolutional neural networks (CNNs) and object identification. I will discuss a paper that does achieve biologically relevant results when specifically trained to [139]. However, in most CNNs, especially the CNNs that the computer science community would describe as state of the art, neurons do not encode these essential non-accidental properties [101]. I will discuss a paper arguing the opposite and the flaws in its reasoning [104], and another paper supporting my view [178]. I will conclude by detailing how CNNs rely on an overabundance of neurons, layers, data, and processing speed in order to make up for the deficits of their learning algorithms compared to the human brain. As a result, all fail substantially compared to the visual abilities of a human.

Figure 3-1: A set of examples of non-accidental properties. From [19].

Figure 3-2: Two examples showing how edges and the non-accidental properties between them form the building blocks of geons. From [19].
3.2.2 Analysis of Biological Relevance in Modern Neural Networks
There is disagreement in the field about whether neural networks do in fact represent objects in a neurologically relevant way. This section begins by discussing supporting evidence from a network which accomplishes the goal entirely, but trains the network in an unusual way. It then evaluates experiments using typical training with familiar networks. It concludes with contradictory research showing that deep convolutional nets do not adequately achieve neurologically relevant object representation.
Network Learning Biologically Relevant Features
HMAX [147] is one of the first convolutional neural networks designed to try to replicate the actual visual stream in the human brain. Traditional HMAX models, and the simplified HMAX tested by Serre and Parker [139], are trained using natural images of the type to be tested on [163]. Previous experiments have shown that, following this training, HMAX systems show similar differences in activation when changing metric properties as when changing non-accidental properties, even when trained with geon images which should accentuate this importance [5]. Serre and Parker [139] modify these traditional still-image stimuli by replacing them with video sequences of simple objects slowly rotating. Units which would normally have learned a feature in their receptive field present in one frame are now forced to continue learning all features in their receptive field during a 300 ms window. This type of temporal learning will force the system to learn a type of object constancy for a rotating object. This training procedure was successful in encouraging the model to become less sensitive to the metric changes which will be seen during rotation, and more sensitive to non-accidental changes which will rarely be seen within a single training stimulus. Testing on the original HMAX model shows 20% NAP modulation, 22% MP modulation, and only 49% of units having more NAP modulation than MP. With the modified training the system has 35% NAP modulation, 24% MP modulation, and a whopping 71% of the units modulated more strongly by NAP changes than MP changes. This paper shows evidence that convolutional neural nets are capable of learning the neurologically relevant importance of non-accidental properties. The modified HMAX model shows results similar to data from IT, a brain area sensitive to shape cues [94]. Serre and Parker [139] provide the suggestion that for a network to learn the important sensitivity humans have to non-accidental properties, learning from video sequences may be necessary. This type of training could teach the network the types of metric changes to be invariant to.
Arguments for CNNs being Biologically Relevant
There is evidence that traditionally trained deep nets can account for neural recording and fMRI data in humans and primates [96]. However, correlated neural activity does not necessarily mean CNNs use similar representations. This research compared representational patterns within categories and between categories and showed that clustering in supervised deep nets was similar to that in IT. This could just mean that other, not neurologically relevant, representational features of objects are also similar and dissimilar for these categories. Further research would be needed to provide evidence that deep nets' ability to account for neurological data actually comes from using neurologically relevant representations. A series of experiments suggests this, but some of the methods used are questionable [104].
One experiment is recognition from shape. A stimulus set of line drawings is used with three conditions: full color, grayscale, and silhouette [169]. Three CNNs, trained on natural images, were tested for performance. The results show top-5 accuracy starts at 60-70% with color images, with grayscaling dropping this by 10% and silhouettes giving an additional reduction of 15-20%. Humans see a similar drop in performance between the color and silhouette conditions [189]. They do not provide information on whether human performance loses accuracy at all when switching to grayscale [104]. The issue I have with this experiment is their claim that forming silhouettes is a way to remove non-shape cues. To identify the physical 3D shape of an object, the internal lines are important shape cues. Without these it would be impossible to differentiate between shapes as dissimilar as a cube and a flat hexagon, or somewhat similar shapes like a cylinder and a pill. Silhouetting would remove all "fork" vertices and convert all "arrow" vertices into "L" vertices. Additionally, this removes lines pertaining to orientation and depth discontinuities which help define a shape [19].
Three additional experiments [104] were fairly similar, so I will talk about them together. The first was to test physical vs perceived shape. In this experiment human subjects and models were presented with 9 objects with mixed qualities of being vertical, square, or horizontal, and having spiky, "smoothie", or "cubie" features [42]. The results showed that humans and deep nets categorized objects as closer based on their features, while shallow models rated the shape envelopes as more similar than any individual features. The next experiment was the same except testing for non-accidental properties using geons [19]. A stimulus set with various geons was tested, comparing them to other geons which had metric differences or changes in the non-accidental properties. Changes were made to ensure metric differences were as distinct as or more distinct from the original geon [94]. As before, non-accidental properties cause more dissimilarity in deep models and humans, while shallower models weigh the metric differences as more important. The final experiment was category "dissociated from shape". This experiment was performed with a stimulus set Kubilius et al. designed which they claim separated stimuli into different categories "in a way such that any one exemplar from a particular category would have a very similar shape to an exemplar from another category." They show examples of the silhouettes of these stimuli to show they have the same shape, but do not include any raw stimuli whatsoever, so I cannot comment on the validity of this claim. Their results show that humans and deep nets, especially the very deep nets, still retain some correlation between categories even with similar shape. Though this data shows deep nets are more neurologically relevant than shallow models, there are two complaints I have about these experiments. The first is simply that they do not give exact neuron activation numbers to show the scale of the differences seen. The second is more significant.
The models they used for comparison fall into two categories: state of the art deep convolutional neural networks [177, 89, 196, 166] and shallow models [102, 108, 41, 23, 114]. My grievance is that Kubilius et al. [104] do not talk about any other higher order models, or really anything with intermediate complexity. An important cause for the findings is that these shallow models do not have any depth to allow for generalization and invariance [61, 82, 192, 182, 202]. These methods would also show significant sensitivity to moving the stimuli some number of pixels off center. Convolutional neural networks perform nonlinear integration, combining features from lower levels in important ways. One of the effects of this combination is that features from all locations in the image are considered as factors in the final decision on the output layer. I would like to see a test using another shallow, one layer model, such as SURF [12], in the same experiments. Like the other shallow models, SURF calculates low level features and compares them to other low level features on any second image. The difference is that SURF matches the strongest keypoints invariant to scale, rotation, and location. I hypothesize SURF would perform more similarly to the deep nets than to the shallow models. SURF utilizes a bag-of-words feature matching system which does not integrate features into an overall object decision in any capacity. A model which does not merge low-level features cannot truly capture overall shape or the importance of combinations of non-accidental properties. If a model such as this would also pass their tests for neurological relevance, the tests themselves are flawed.
Arguments against CNNs being Biologically Relevant
I will now discuss evidence against neurologically relevant representations in CNNs. This evidence exists both at the level of individual neurons and of networks as a whole. Research has shown that in biology individual neurons are sensitive to NAP changes and therefore are coding for some shape aspect which is more invariant to metric changes than to NAP changes [94]. In artificial CNNs, "neurons" instead learn "uninterpretable solutions," and the individual top layer units do not code for anything more specific than random linear combinations of top layer units do [61, 202]. Other neural network papers often show various input images or pixel clusters that most strongly activate particular neurons, to show that a neuron is sensitive to a specific quality and claim it learned something important [101]. One group tested neurons at equal depths, from networks which had been trained and networks which had instead been given random weights. Two tests were performed, one on MNIST handwritten digits [116] and a second on a deeper network against ImageNet images [44]. The results show that nodes with random weights show the same type of results that researchers have claimed prove the neurons learned a useful semantic property contained in the training images. Figs 3-3, 3-4, 3-5, and 3-6 provide examples. This experiment suggests that, when looking at the calculation of a single individual neuron in a network, a random set of weights could be interpreted to be just as useful as a trained node [178].

Figure 3-3: Various MNIST images which maximally activate units in a trained network. From [178].

Figure 3-4: Various MNIST images which maximally activate units in a randomly initialized network. From [178].

Figure 3-5: Various ImageNet images which maximally activate units in a trained network. From [178].

Figure 3-6: Various ImageNet images which maximally activate units in a randomly initialized network. From [178].
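A minimal sketch of the kind of maximal-activation probe used in those figures is below. The unit index, the toy "network," and the random data are hypothetical stand-ins for illustration, not the setup from [178]; the point is simply that the same procedure runs unchanged on trained or random weights.

    # Sketch of a maximal-activation probe: for one unit, find the dataset images
    # that drive it hardest. Model, unit index, and data are illustrative assumptions.
    import numpy as np

    def top_activating_images(images, activation_fn, unit, k=9):
        """Return indices of the k images that maximally activate one unit.

        images: array of shape (n, ...) of input images
        activation_fn: maps a batch of images to unit activations (n, n_units)
        unit: index of the unit to probe
        """
        acts = activation_fn(images)[:, unit]
        return np.argsort(acts)[::-1][:k]  # indices sorted by descending activation

    # Hypothetical usage with random weights standing in for a real network;
    # even this randomly weighted "layer" yields seemingly coherent image panels.
    rng = np.random.default_rng(0)
    fake_images = rng.random((1000, 28, 28))
    W = rng.standard_normal((28 * 28, 64))
    fake_fn = lambda x: x.reshape(len(x), -1) @ W
    print(top_activating_images(fake_images, fake_fn, unit=3))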
More evidence that CNNs are not utilizing a neurologically relevant representation comes from viewing networks as a whole, testing what kinds of test images can trip up a fully trained deep CNN. A network that has been trained on millions of images, and performs well on test images, would be expected to be fairly robust to small changes made to images given as a test. Instead, research has found that making specific changes to correctly classified test images can force the network to make mistakes. The modified images required to do so are individually calculated adversarial images, but the changes are minuscule. This can be seen in Figs 3-7 and 3-8. If neurons coded for 3-dimensional structure through a hierarchy of NAPs to geons to objects, there would be little difference in the top layer activations from these minor changes to the input. Results also showed that across networks and training sets the same adversarial examples would cause similar mistakes. This suggests networks do learn a consistent representation of the input space [178]. Other research has shown CNNs even learn certain invariances to changes in input [202]. However, failing on these types of changes provides evidence that the representation and invariances learned are not the same as in the brain. For many of these examples a human viewer would find the modified images literally indistinguishable from their originals. One cannot argue that these deep networks are not doing impressive calculations, but this is strong evidence that they are not doing them in neurologically relevant ways.

Figure 3-7: 6 examples of adversarial images. Images in the left columns are the original images which the network correctly identifies. Middle columns show the noise injected into the system, at 10x true intensity, to create the adversarial images in the right columns. Right column images are all classified to be an ostrich. From [178].

Figure 3-8: (a) and (b) show generated adversarial examples for two networks trained on MNIST. Odd rows are original images which the network correctly classifies, while even rows are adversarial examples which the network does not correctly classify. (c) shows randomly distorted images, which to the human eye are much more significantly changed than the adversarial examples, but which the network can still correctly classify half the time. From [178].
Serre and Parker showed a conclusive example of a convolutional neural network learning neurologically relevant shape representations [139]. This is evidence that neural networks are capable of learning these qualities. However, with modern approaches there is significant evidence that they do not. The human brain, visual system included, is an incredible computational device. I read once that the human brain is the only evidence we have that generalized object recognition is even possible. I believe finding a way to train a network to identify objects in a neurologically relevant way is going to eventually be the way we solve this problem. As the name implies, artificial neural networks are a great approach to this task, but something needs to be changed. Many research paths are possible, including modified training stimuli, network architecture, learning rules, or the functions of the artificial neurons themselves. The following section details why I chose modifying the artificial neurons for my research.
3.3 Inspiration from Neurobiology: Nonlinear Dendritic Processing
3.3.1 Dendrites in CNNs
In modern state-of-the-art convolutional neural networks, dendrites are nonexistent [36, 177, 101]. Presynaptic neurons form synapses directly onto the cell bodies of postsynaptic neurons, with synapse strength as their only parameter [191]. Computer vision is one of many fields where neural networks are heavily in use. This is a field where many challenges have not been solved to thresholds where algorithms are competent enough for real world applications. In the ImageNet Challenge, algorithms must detect objects from 1000 categories over a series of test images [44]. The best system in 2017 had an average precision of just 73%, which means it was unable to detect roughly one in four objects. This is a task with only 1000 choices, a minuscule number in relation to the world. Clearly these algorithms are not at a point where they can compete with the human visual system. Artificial networks differ in many ways from those of biology, but one certain advantage the human visual system has is the increased computational power within a single neuron by way of its dendrites.
3.3.2 Summary of Dendrite Function
The simplest function dendrites perform is physically being the structure on which most synapses form, collecting presynaptic activation and transporting it to the soma [93]. Even this most basic function still behaves in a way that must be considered. Dendrites are not perfect cables; they are leaky. Voltage attenuates significantly if it travels down the dendrite with no further external influence [175]. The axial resistance is lower towards the soma, so voltage will flow towards the cell body before leaking out through the membrane. Collecting voltage for the cell is only a fraction of their complete function. Passive summation over dendritic voltages plays an important role. Coinciding peak voltages cause a greater surge at the soma, more likely to cause a spike. This allows the neuron a type of temporal calculation of synapse firing order, which might only peak if inputs fire in a proximal-to-distal order [145, 98]. Another important calculation dendrites provide is their active ability to generate spikes. The exact details of this ability are still disputed by some scientists. A paper by Major et al. shows a framework in which the conflicting results can all fit [124].
3.3.3 Nonlinear Dendritic Spikes
The first topic discussed is the variety of electrogenic effects dendrites are capable of. There are three major types of spikes, which can be driven by Na+, Ca2+, or NMDA [124]. Spikes caused by each type can have large differences in rise time and duration, but each has the essential qualities to be considered a spike. The authors specifically mention that an all-or-none spike does not need to be stereotyped or short. It only needs to have a threshold where the response differs significantly if passed. Classical Na+ spikes tend to be sharper, while Ca2+ and NMDA spikes remain at their peak voltage for longer periods of time [110, 6]. Of these three, NMDA spikes are the most influential [109]. NMDA spikes have been found in vitro in all tested types of neocortical dendrites in all layers of the cortex [26, 112]. The authors finish their introduction with three important points about NMDA. The first is that NMDA spikes can be triggered by very few synapses firing; about a tenth or less of the total connections to a spine being active is capable of generating an effect which can pass the necessary threshold [34]. The second point is that this clustering is not necessary; synapses can also be more spread out and concurrent firing can still cause a spike [144]. Lastly, recent prior spikes can lower the threshold for future spikes, meaning once inputs are being received that cause spikes, spiking is more likely to continue [144].

As NMDA is the dominant spike type in dendrites, I will discuss this function in more detail. NMDA allows both thresholded and graded electrogenesis. Depending on glutamate concentration from synaptic firing, NMDAR conductance can be up-stable, down-stable, or unstable. Unstable is the condition which allows for a thresholded response. NMDAR conductance has a unique N-shaped I-V curve, as seen in Fig 3-9.
This N shape can have one early intercept, three intercepts, or one later intercept. At low glutamate concentrations the trough of the curve representing current flow is completely above the axis, meaning current flows outward at typical biological membrane potentials. As the concentration increases, the trough gets deeper until it hits a point where it dips below zero. This causes the curve to cross the axis at three points. The middle crossing is an unstable state for voltage. If the instantaneous voltage is below this intercept, voltage will flow outward, lowering the membrane potential until it reaches the first intercept, which is a stable state. If voltage is allowed to surpass this point, the current is now inward, causing the membrane potential to further increase up to the third and final intercept, another stable state. Passing this middle intercept will cause a spike. This condition, with the existence of the unstable intercept, is the only one of the three where a spike can happen. At extremely high glutamate concentrations the entire "N" is below the axis, causing current to flow inward until reaching the last possible intercept of the curve. As glutamate concentration decreases this effect happens in reverse. With the decrease, the high-voltage stable state ceases to exist. This causes the voltage to quickly fall to the low-voltage stable state, ending the spike [124].
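To make the intercept logic concrete, here is a toy numerical sketch. The cubic used for the I-V curve and the glutamate scaling are invented illustrations of an N shape, not a biophysical NMDAR model; the point is only that shifting the curve downward moves it between the one-early-intercept, three-intercept, and one-late-intercept regimes described above.

    # Toy illustration of the N-shaped I-V intercept logic described above.
    # The cubic curve and glutamate offset are invented, not biophysical.
    import numpy as np

    def iv_curve(v, glutamate):
        # An N-shaped current: cubic in "voltage", shifted downward as glutamate rises.
        return (v - 0.2) * (v - 0.5) * (v - 0.8) + 0.02 - 0.06 * glutamate

    v = np.linspace(0.0, 1.2, 10001)
    for g in (0.0, 0.4, 2.0):  # low, intermediate, high "glutamate"
        i = iv_curve(v, g)
        crossings = v[np.where(np.diff(np.sign(i)) != 0)]
        # One crossing: a single stable state, so no spike is possible.
        # Three crossings: stable / unstable threshold / stable, so a spike can occur.
        print(f"glutamate={g}: {len(crossings)} intercept(s) at {np.round(crossings, 3)}")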
An NMDAR I-V curve being based on glutamate concentration causes an interesting effect. Unlike the fixed voltage-dependent spiking of an action potential, NMDA spikes are still voltage dependent, but the exact threshold voltage can change. The spike is still all-or-none, in the sense that there is a voltage threshold above which the voltage will spike to a higher value. This threshold, however, does not exist at low glutamate levels, slowly lowers as concentration increases, and then ceases to exist again at high concentrations [124]. Threshold and amplitude of the spike both increase from distal to proximal dendritic locations, in correlation with the respective input conductance. Synaptic firing, on the other hand, has little effect on the amplitude of the spike, and only serves to increase the duration of the spike as activation from continued glutamate release goes on [125]. Clearly NMDA receptors are already dynamic on their own; this nonlinear function only gets more complex when considering a system with multiple connections, which will be discussed in the next section.
Figure 3-9: N-shaped curve of NMDAR conductance. The green curve shows the unstable point where a spike is possible when the membrane potential moves across it. Figure from [124].
The results I have described so far were discovered first in vitro. Since their discovery, scientists have been able to find evidence for the same effects in vivo. Ca2+ imaging is a process, possible in vivo, for tracking calcium within a cell at a resolution fine enough to see within a dendrite. NMDA receptors are a large source of Ca2+ within a dendrite, so this method has been used to measure the activity and conductance of NMDA receptors. In vivo results have also shown that dendrite function can change not only between cell types, but also within a single cell type depending on the individual circuit the neuron is a part of [64]. Research has even shown that within a single neuron different dendrites can perform linear or nonlinear functions as needed. This includes, specifically in the visual system, an orientation-sensitive neuron showing nearly equal synaptic signaling for the preferred and non-preferred orientations. However, the cell would only fire dendritic spikes when the input was in the preferred direction, showing linear combination for non-preferred stimuli and nonlinear for preferred [167].
3.3.4 Computational Models of Dendrites
Our understanding of dendritic function has increased greatly, especially in recent years. In addition to studying the mechanics in cells, computational scientists have been working on simplified models to replicate the mathematical capabilities these dendrites add to the function of the cell. One such model is an augmented two-layer model [86]. It has strong accuracy in explaining the firing patterns of biological cells. However, the model does not fully capture the complicated function of the dendritic tree. It only aims to model a subset of the dendritic tree consisting of a central node with uniform terminal branches. Also, because current flow within a branch is strong and flow between branches is weak [97], the model assumes instantaneous uniform voltage transfer within a branch and no transfer between branches. With these assumptions in mind, the two-layer model behaves in the same way as an artificial neural network where each dendritic unit functions as a layer one node and the soma functions as the output node. Synapses can form on any layer one node as input with various weights. Each layer one node sums its synaptic input and passes it through a nonlinear function. These outputs are the inputs to the soma, each with its own individual weight modifier. Finally, the output node sums its input and passes it through another function to calculate the hypothetical firing rate.
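A minimal numerical sketch of that two-layer computation follows. The sigmoid shape, the weights, and the output firing-rate function are arbitrary placeholder choices, not the fitted parameters from [86].

    # Hedged sketch of the augmented two-layer dendrite model described above.
    # All weights and nonlinearity parameters are arbitrary placeholders.
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def two_layer_neuron(synaptic_input, branch_weights, soma_weights):
        """synaptic_input: (n_branches, n_synapses) presynaptic drive per branch.

        Each branch (layer one node) sums its own synapses and applies a
        sigmoidal nonlinearity; the soma then combines branch outputs linearly
        and applies a final output function for the hypothetical firing rate.
        """
        branch_drive = (synaptic_input * branch_weights).sum(axis=1)  # within-branch sum
        branch_out = sigmoid(branch_drive - 1.0)   # per-branch dendritic nonlinearity
        soma_drive = branch_out @ soma_weights     # linear combination at the soma
        return np.maximum(0.0, soma_drive)         # output function -> firing rate

    rng = np.random.default_rng(1)
    inputs = rng.random((4, 10))                   # 4 branches, 10 synapses each
    print(two_layer_neuron(inputs, rng.random((4, 10)), rng.random(4)))

Note how this mirrors the assumption above: inputs within a branch interact through a shared nonlinearity, while branches only combine linearly at the soma.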
Experiments have been performed to compare the two-layer model's calculated output firing rate against a significantly more detailed computational model, and against the actual output of a neuron in vitro. The computational model used was NEURON [72]. NEURON is a highly realistic model which uses differential equations to calculate the exact concentration of various molecules within minuscule volumes of an artificial biological cell. In a test against complex input patterns the two-layer model was able to explain 67% of the firing rate compared to the more complex model. A point neuron system, as is used in modern convolutional neural networks, could explain only 11% [143]. The model successfully showed, as predicted, that inputs within a branch are better combined in a nonlinear manner, while inputs from separate branches can be combined linearly upon reaching the soma. 67% is a good result, but clearly there is more work to be done. Part of this lack of accuracy has to do with the assumptions of the model which are known to not be true. Among these is the assumption of instantaneous uniform transfer within a branch. Real dendrites are not locationless, and the exact location of inputs has important effects on the output [125].
Even a single input synapse will have different effects based on its location within a dendrite. At any location the I-O curve follows a sigmoid, with a spike at the local voltage threshold. The sigmoid is not uniform; it varies within a single dendritic compartment based on the location of the input stimulus. For any given location, a more distally located synapse's sigmoid will have a smaller threshold, but also a smaller total output amplitude [125]. The parameters of the sigmoid can be changed by a single synapse based on its location, so when there are multiple synapses at multiple locations the math gets more complicated. To figure out how this interaction would affect the output, I-O curves were generated from real neurons in brain slices with "driver" inputs and "modulator" inputs at different locations. The same curves were then generated with both the complicated NEURON model and a simple model. The results can be seen in Fig 3-10.
Figure 3-10: Proximal vs distal driver synapse I-O curves. Data gathered from real neurons, NEURON models, and the simple model. Figure from [86].
The simple model, with only two compartments for each location, was able to calculate a similar output as compared with the NEURON model. The dendritic sigmoid function is modified in a consistent manner, with excitatory and inhibitory inputs having opposite effects. Proximal modulators stretch the sigmoid vertically; in the case of an excitatory input, the threshold for spiking is lowered and the maximum amplitude is increased. Distal modulators move the sigmoid horizontally, with excitatory inputs lowering the spike threshold but having no effect on the amplitude. Over many experiments the relationship was found to follow a 2-D sigmoid [13, 85]. Further experiments provide evidence that more inputs lead to higher order multidimensional sigmoids, but there has not been extensive research on this type of modulation [91].
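To illustrate the two modulation modes, the sketch below applies those rules to a basic sigmoid. The specific gain and shift factors are invented for illustration; they are not the fitted 2-D sigmoid from [13, 85].

    # Sketch of proximal vs distal modulation of a dendritic I-O sigmoid.
    # Gain/shift factors are invented illustrations, not fitted values.
    import numpy as np

    def dendritic_io(driver, proximal_mod=0.0, distal_mod=0.0):
        """Sigmoidal I-O curve for a driver input, reshaped by two modulators.

        proximal_mod: stretches the sigmoid vertically (amplitude up, threshold down)
        distal_mod:   shifts the sigmoid horizontally (threshold down, amplitude fixed)
        """
        amplitude = 1.0 + 0.5 * proximal_mod                       # proximal: bigger peak
        threshold = 5.0 - 1.0 * proximal_mod - 1.0 * distal_mod    # both lower threshold
        return amplitude / (1.0 + np.exp(-(driver - threshold)))

    drive = np.linspace(0, 10, 5)
    print(dendritic_io(drive))                      # unmodulated curve
    print(dendritic_io(drive, proximal_mod=1.0))    # vertical stretch
    print(dendritic_io(drive, distal_mod=1.0))      # horizontal shift only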
It has been suggested that dendritic arbors can act like boolean logic networks and implement logical operations [164]. This ability, in theory, should translate well to artificial intelligence and neural networks. However, in my research few algorithms seemed to have any parallels to artificial dendrites. I will discuss two here, and touch upon others later in this text.
3.3.5 Dendrite Parallels in Machine Learning
Upstart [54] is an appealing algorithm, as it is guaranteed to eventually achieve 100% classification on any set of patterns of binary variables. The algorithm itself is pleasant in its simplicity. It starts with a simple perceptron [191], which can be trained by any method, to be as correct as possible on some input data. Once enough training has completed, the weights from the input vector are locked and two daughter nodes are added. One corrects for false positives with a large negative weight to its parent, and the other corrects for false negatives with a large positive weight. Each of these new nodes receives input from all input nodes and outputs its binary output value only to its parent. These two nodes are then trained as their own simple perceptrons, where the correct output values are instead defined by the errors of their parent. Once perceptron training is complete on these nodes, their input weights are locked and they undergo the same process with two children nodes of their own. New children are only generated if the parent makes mistakes. Since the total output error decreases at each branching, this process will eventually end when no child node has any errors. At that point the entire tree, and therefore the output, will also not make any errors on the training data. Fig 3-11 shows the separation generated for the parity problem of knowing whether an even or odd number of inputs is active. Z represents the separation of input space by the root node, Y is the false negative node, and X is the false positive node.

Figure 3-11: A representation of Upstart's splitting of the parity problem. Shown are the original perceptron classification line as well as the lines from the false positive and false negative nodes. Figure from [54].

The obvious problem with this algorithm, and really any method that can be 100% accurate with any set of input vectors, is overfitting [9, 66]. One of the important aspects of artificial neural networks is their ability to generalize, and at that, Upstart fails.
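A compact sketch of the Upstart recursion, following the description above, is below. The perceptron trainer, the fixed corrective weight of 10, and the depth cap are simplified assumptions rather than the exact procedure from [54].

    # Hedged sketch of the Upstart recursion: a perceptron node spawns daughters
    # that learn to cancel its false positives / false negatives. The perceptron
    # trainer and corrective weights are simplified assumptions, not the original recipe.
    import numpy as np

    def train_perceptron(X, y, epochs=100, lr=0.1):
        """Basic perceptron on {0,1} targets; returns weights and bias to be locked."""
        w, b = np.zeros(X.shape[1]), 0.0
        for _ in range(epochs):
            for xi, yi in zip(X, y):
                pred = float(w @ xi + b > 0)
                w += lr * (yi - pred) * xi
                b += lr * (yi - pred)
        return w, b

    class UpstartNode:
        def __init__(self, X, y, depth=0, max_depth=8):
            self.w, self.b = train_perceptron(X, y)
            out = (X @ self.w + self.b > 0).astype(float)
            fp, fn = (out > y), (out < y)  # parent's mistakes become daughters' targets
            self.fp_child = (UpstartNode(X, fp.astype(float), depth + 1, max_depth)
                             if fp.any() and depth < max_depth else None)
            self.fn_child = (UpstartNode(X, fn.astype(float), depth + 1, max_depth)
                             if fn.any() and depth < max_depth else None)

        def predict(self, X):
            drive = X @ self.w + self.b
            if self.fp_child is not None:   # large negative weight cancels false positives
                drive -= 10.0 * self.fp_child.predict(X)
            if self.fn_child is not None:   # large positive weight corrects false negatives
                drive += 10.0 * self.fn_child.predict(X)
            return (drive > 0).astype(float)

    # Parity (XOR) example: not linearly separable, so the root must branch.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([0, 1, 1, 0], dtype=float)
    print(UpstartNode(X, y).predict(X))  # expected: [0. 1. 1. 0.]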
The second algorithm, Network in Network [122], is much more interesting. First of all, it is from 2013, so it is more modern. Next, state of the art models for the ImageNet challenge [44], GoogLeNet [177] and Residual Networks [69], take inspiration from this system. Lastly, just as [164] discusses, the Network in Network algorithm treats certain neurons as if they are the top node in multilayer networks of their own. These networks, however, are multilayer perceptrons, and not any form of tree or boolean logic network. Rather than convolving each neuron as in a traditional network [116, 58], the multilayer perceptrons are each convolved the same way, with only their final output value being considered input to the next level. Training is done with backpropagation in the same manner as a prior state of the art network [101]. One of the large goals, as is a goal with many networks now, is to prevent overfitting. Fully connected deep networks are highly prone to overfitting, and methods such as dropout have been used to combat this [77]. Network in Network argues that a linear neuron has a low invariance to variations of a concept, and that using a multilayer perceptron to encode the same concept would be better. General linear models can only work well if input types are linearly separable, whereas multilayer perceptrons can fit more complex datasets. Network in Network achieved state of the art error rates on CIFAR [100] and SVHN [135], and very close to state of the art on MNIST [116]. These results are surely why modern algorithms use similar ideas.
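Convolving a small MLP at every spatial position is equivalent to following a standard convolution with 1x1 convolutions, which is how such "mlpconv" blocks are usually written. Below is a minimal PyTorch sketch of one block; the channel counts and kernel sizes are arbitrary choices for illustration, not the published architecture.

    # Minimal mlpconv block in the spirit of Network in Network: a conventional
    # convolution followed by 1x1 convolutions, i.e. a tiny MLP shared across
    # every spatial location. Channel sizes here are arbitrary illustrations.
    import torch
    import torch.nn as nn

    mlpconv = nn.Sequential(
        nn.Conv2d(3, 64, kernel_size=5, padding=2),  # local patch -> feature vector
        nn.ReLU(),
        nn.Conv2d(64, 64, kernel_size=1),  # 1x1 conv = per-location MLP hidden layer
        nn.ReLU(),
        nn.Conv2d(64, 32, kernel_size=1),  # second MLP layer; output feeds next stage
        nn.ReLU(),
    )

    x = torch.randn(1, 3, 32, 32)          # one CIFAR-sized RGB image
    print(mlpconv(x).shape)                # torch.Size([1, 32, 32, 32])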
The deep learning networks of today are leaps and bounds better than the best machine learning algorithms of the past. Besides the algorithmic improvements, a large reason for this is simply that they are running on faster, stronger computers and have more numerous and more rigorously labeled data sets to work with. Modern deep networks train on several high-end GPUs and, after augmenting the ImageNet dataset, train on over 2.4 billion images [177]. At its core, the backpropagation learning algorithm has not had significant changes since its origin. The point neuron model has not had significant changes since its origin. Both of these may have been inspired by biology, but as this section has shown, they do not encapsulate the true computational power of the neuron. Perhaps with the right architecture and learning paradigms, deep networks with point neurons could be as accurate as human viewers. However, learning all of this directed my research towards finding a way to instantiate a system of nonlinear dendrites in a CNN, specifically without just adding more nodes to the architecture. Before discussing my project, however, it is important to review other recent research in the same field.
3.4 Modern Neural Network Research
3.4.1 Overview
Machine Learning is an incredibly powerful tool and the subject of significant research in the modern era. Much of this is in the field of artificial neural networks. This section will go over modern results that achieved state of the art performance on various datasets. Two well known computer vision challenges are MNIST [116] and ImageNet [44]. They will be the focus of this discussion.
3.4.2 MNIST
MNIST [116] is a popular dataset for computer vision research. It is composed of 60,000 training images and 10,000 test images. The images are 28x28 grayscale images of handwritten digits. Many would describe this dataset as the one to look at to first test if your algorithm is working on a small scale. As of 2014 the best algorithm in the world for MNIST is called the Multi-Column Deep Neural Network [36]. It boasts an impressive 0.23% error rate over the entire test set. At its core, this algorithm follows the convolutional neural network architecture first introduced in 1980 [58]. Different instantiations of the model use 6-10 layers. These layers alternate between typical layers and pooling layers. The final layers are a fully connected multilayer perceptron.

Training is simple online backpropagation after unsupervised pre-training, which has been shown to improve performance [49]. The authors also point out that receptive fields in lower layers are minimal, only 2x2 or 3x3, so the network is close to the maximum depth it can possibly be. The innovation of this algorithm is its idea of having multiple columns. This means having multiple instances of the same type of network, taught in various ways, which vote on a final answer. In their test on MNIST the full system had 35 columns. This was composed of 5 columns which would learn normal MNIST inputs, and 5 columns for each of 6 different normalized digit widths. When input images are passed in for training, each column first performs its designated preprocessing to get to the correct width for its input. Each network is instantiated with a different set of random weights in the range [-0.05, 0.05]. Once training is complete, all testing is performed by passing images into the network, and the output confidences are democratically averaged to determine the output confidences of the network for each answer choice. On MNIST this multi-column DNN improved the state of the art by a relative 34% [87], giving its error rate of only 0.23%. Humans on this task are only better by 0.03%.
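The voting step itself is just an average of per-column confidences followed by an argmax. A minimal sketch follows, where the column outputs are hypothetical normalized confidence vectors rather than the trained columns from [36].

    # Sketch of multi-column voting: average the per-column output confidences
    # and take the argmax. The column outputs below are hypothetical placeholders.
    import numpy as np

    def vote(column_confidences):
        """column_confidences: (n_columns, n_classes) outputs, one row per column."""
        averaged = np.mean(column_confidences, axis=0)  # democratic average over columns
        return int(np.argmax(averaged)), averaged

    rng = np.random.default_rng(2)
    raw = rng.random((35, 10))                          # 35 columns, 10 digit classes
    cols = raw / raw.sum(axis=1, keepdims=True)         # normalize rows to confidences
    digit, conf = vote(cols)
    print(digit, conf.round(3))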
An important contribution of this paper is the idea that multiple networks voting together are better than a single network. This paper was a large inspiration for dropout [172], discussed in the following section. Dropout is one of the most popular algorithms used to reduce overfitting in a large network. It relies on the idea that when random nodes are dropped out of a network, it creates a temporary network that only exists during the time that one set of data is presented. This means at test time your actual network is the result of the average combination of all of the temporary networks that existed during training.
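A minimal sketch of that mechanism is below: during training a random mask creates a different temporary subnetwork per forward pass, and at test time the full network stands in for the average of those subnetworks. The layer sizes and drop probability are arbitrary; note that PyTorch uses inverted dropout, scaling the kept units at training time, so the evaluation pass needs no extra scaling.

    # Sketch of dropout as implicit model averaging: each training pass samples a
    # temporary subnetwork; at test time the full network approximates their average.
    import torch
    import torch.nn as nn

    net = nn.Sequential(
        nn.Linear(784, 256),
        nn.ReLU(),
        nn.Dropout(p=0.5),   # randomly zeroes half the hidden units per training pass
        nn.Linear(256, 10),
    )

    x = torch.randn(4, 784)
    net.train()              # dropout active: a different subnetwork per forward pass
    print(net(x)[0, :3])
    net.eval()               # dropout off: full network stands in for the average
    print(net(x)[0, :3])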
3.4.3 ImageNet
A network called AlexNet [101] first blew away the ImageNet challenge in 2012. Since then it has been at the core of much recent work in neural network and computer vision research. An important contribution of AlexNet is that, with 8 layers, 650,000 neurons, and 60 million parameters, it was, at the time, one of the largest convolutional networks ever created. Most of their other algorithmic choices were made to battle the side effects of a network of this size: computational complexity and overfitting. To speed up calculations all training was performed on multiple GPUs. In addition, neurons were rectified linear units rather than having a sigmoidal output function, which makes them faster to train [133]. Two methods were also selected to reduce overfitting. The first was augmenting the training set with modifications to the original images, and the second was the recently created dropout method [77, 172]. On the 2012 ImageNet challenge AlexNet achieved a top-5 error rate of 15.3%, with the runner-up only achieving 26.2%. This landslide victory is certainly what contributed to its influence on recent algorithms.
The winner from 2014 was GoogLeNet [177]. At its core, GoogLeNet uses many of the same learning rules as other deep neural networks. Like AlexNet, one of GoogLeNet's biggest changes from past work was its increase in the depth and width of the network. GoogLeNet attempts to solve the incidental problems of a larger network with many dimension reductions between layers and by forming sparse local connections, rather than the fully connected systems used in many deep nets, following the work of Arora et al. [7]. With these modifications, even though GoogLeNet has a whopping 22 convolutional layers in addition to 5 more max pooling layers, it still uses 12 times fewer parameters than AlexNet. In addition to these changes, GoogLeNet takes from the Network-in-Network [122] approach to form layers of "Inception modules." Each Inception module receives input from the previous layer. The modules have four parallel branches: 1x1, 3x3, and 5x5 convolutional filters, and a 3x3 max pooling filter. The outputs of these filters are combined into a single output vector, which is the input for the next stage. After training, the system achieves a winning top-5 error of 6.67% for classification on the ImageNet challenge. In the detection challenge, where bounding boxes must be drawn, GoogLeNet also was the winner in 2014 with 43.9% mean average precision.
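The structure of an Inception module can be summarized in a few lines. The following is a minimal PyTorch sketch of the description above; the branch channel counts are illustrative, and GoogLeNet's 1x1 dimension reductions before the larger filters and after the pooling branch are omitted for brevity:

import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 64, kernel_size=1)
        self.b3 = nn.Conv2d(in_ch, 128, kernel_size=3, padding=1)
        self.b5 = nn.Conv2d(in_ch, 32, kernel_size=5, padding=2)
        self.pool = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)

    def forward(self, x):
        # All four branches see the same input; their outputs are
        # concatenated along the channel dimension into a single vector.
        return torch.cat(
            [self.b1(x), self.b3(x), self.b5(x), self.pool(x)], dim=1)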
Recent winners starting in 2015 have used Residual Neural Networks (resnets), originally developed by MSRA [69]. Resnets are another type of network that does seem to have a parallel in dendritic processing. The main computational power of the dendrite is that multiple nonlinearities can exist within a single neuron. This means the decision making power of the cell is not determined solely by the cell body spiking or not; smaller decisions are calculated in the dendrites first. In other words, presynaptic data is processed more heavily before the neuron output makes the final decision to pass on postsynaptically. The change a resnet introduces is skip connections between layers, creating blocks. Layers form postsynaptic connections to all neurons in the following layer, but some layers additionally send their output to another layer further up the network. Layers between this first connection and the final connection point of the same layer are considered a block. Backpropagated gradient descent is still used for training. However, rather than every layer being a fixed decision point where layers do not have access to any preceding data, fixed decision points now occur only at the final layer of blocks. This means data can be processed by more nodes, and by more layers, before a final decision of what to pass forward must be made. In their first year the MSRA team using resnets was able to achieve an average precision of 62.1% on object detection, with the runner-up only achieving 53.6%.
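The block structure is easy to see in code. Below is a minimal PyTorch sketch of a two-layer residual block, assuming the channel count is unchanged across the block (real resnets also use batch normalization and occasional projection shortcuts, omitted here):

import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, kernel_size=3, padding=1)

    def forward(self, x):
        # Layers inside the block process the data further...
        out = self.conv2(F.relu(self.conv1(x)))
        # ...and the skip connection joins back in only at the end of the
        # block, the next fixed decision point.
        return F.relu(out + x)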
Analysis
In my view the most significant reason for the improved performance turnaround in 2012 was the increased size. For all the discussed networks, at least part of their innovation was simply making a bigger network. Research has been done on the exact effects of different network architectures [82, 192, 182]. The results provide evidence that for generalization and avoiding overfitting, deeper networks with minimal nodes are ideal. This was exactly the GoogLeNet team's goal with their algorithm and why they achieved improved performance. The papers compare various architectures and algorithms on the parity problem and the two-spiral problem. The attached figures show results comparing three models. Two of the networks considered are variations of the multi-layer perceptron: the standard MLP and the Bridged MLP (BMLP), where all nodes are connected to all nodes of all previous layers. In addition to these they also test a fully connected cascade (FCC) architecture, which would be equivalent to the BMLP if each layer had only one node. The three architectures are tested with different numbers of hidden layers and different numbers of neurons.
Fig 3-12 shows the maximum number of inputs each architecture could correctly categorize on the parity problem with any arrangement of the total neurons. As the graph shows, the standard MLP can only categorize up to the total number of neurons in the hidden layer. The BMLP is able to solve for a larger number of inputs with more layers when using the same number of neurons. Lastly, the FCC is almost an entire order of magnitude more capable than a three layer BMLP. This result shows how deeper networks, even with fewer neurons, can perform more complex functions.
Figure 3-12: Number of inputs that networks with a particular number of nodes can correctly categorize on the parity problem. While adding more neurons increases capability across all networks, deeper networks can outperform shallower networks even with fewer neurons. From [82].
Fig 3-13 shows the data collected over a series of tests with all architectures over various numbers of neurons, various numbers of layers, and different total numbers of inputs. The results show the total success rate for each architecture. As you can see, there is an obvious trend within figures towards success rate decreasing as the required input parity number increases and success rate increasing as the number of neurons increases. A more important trend between figures is that as the architectures use more layers with the same number of nodes their success rate also increases, and in all cases the fully connected BMLP always outperforms the standard MLP. Discrepancies from the prior figure occur because the following figures were generated with a maximum number of training iterations.
Another reason for improved performance in modern networks is the set of measures for counteracting the overfitting caused by these larger networks. The first, most obvious tactic is to counterbalance the cause: having too few training patterns relative to the number of parameters of the model. More training patterns will improve generalization and reduce overfitting [82]. This can be done with data augmentation. Models such as AlexNet [101] have upwards of 60 million parameters, while ImageNet only provides 1.2 million training images.

Figure 3-13: Total accuracy over many runs, comparing a single architecture in each graph against the total number of neurons in the architecture. From [82].

This means that to come even close to the same number of parameters, a large amount of data augmentation is required. Fortunately, for images a large number of data augmentation techniques are available [36, 37, 165, 14]. Some do not actually change the proportions of the image; these include reflections and sampling patches of the image. For example, AlexNet's first layer is 224x224, so with 256x256 ImageNet images sampling and reflection increase the amount of training data by a factor of 2048. This alone puts the training set well over the number of parameters. Then of course there are modifications that do change the proportions: scaling the image uniformly, adjusting the height or width, or shearing. Elastic deformations can also be used, such as applying gaussian displacements around random points in the image [165]. Another augmentation method is modification of the color. This can be as simple as adjusting the brightness, contrast, or single color channels, or more complex, such as calculating the principal components of the pixel values and adding multiples of them to each image.
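The crop-and-reflect arithmetic above can be checked in a few lines (a sketch of the counting argument, not AlexNet's actual pipeline):

# A 224x224 window can sit at (256-224) = 32 horizontal offsets and
# 32 vertical offsets inside a 256x256 image; mirroring doubles that.
crop, full = 224, 256
positions = (full - crop) ** 2      # 32 * 32 = 1024 distinct crops
factor = positions * 2              # reflections double it
assert factor == 2048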
Other strategies modify activity within the network itself. One taken by many current neural networks [177, 101] is dropout [77]. Dropout is based on the idea that combining the predictions of multiple separately trained networks provides a better answer than any of those networks alone [36, 14, 27]. To reduce computation time, this same methodology can be applied within a single network. To accomplish this, the outputs of random neurons are ignored with a preset probability; 0.5 is often picked. When neurons are forced to learn to still be successful with any random half of their inputs, they learn to be less reliant on any individual input. After training, all weights are multiplied by the dropout probability. This essentially averages the learned weights of all the temporary models which existed during each training iteration using dropout. A visual depicting dropout can be seen in Fig 3-14.
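As a minimal sketch of the mechanism just described (an illustration, not the reference implementation):

import torch

def dropout_forward(x, p=0.5, training=True):
    if training:
        # Zero each activation with probability p, creating the temporary
        # thinned network that exists for this one presentation.
        mask = (torch.rand_like(x) >= p).float()
        return x * mask
    # After training, scaling by the keep probability averages the
    # temporary models (with p = 0.5 this matches halving the weights).
    return x * (1.0 - p)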
Another network modification choice is to change the pooling technique [157]. One method is stochastic pooling, where a semi-random weight is taken instead of the max weight [201]. Another method which has been shown to help is simply making the pooling neighborhoods overlap [101]. Even with these methods, current networks are still far from perfect accuracy on complicated object detection tasks [177, 101, 44].
Figure 3-14: The left network shows the fully connected network. The right network shows a temporary network during a single pass of dropout. Random nodes whose outputs are ignored are depicted as crossed out and removed from the network. From [172].
My hypothesis for this is that even with these countermeasures, overfitting is still happening.
My earlier descriptions show evidence that more connections and more layers are always more successful on the training set with the same number of neurons. Many models show that as more neurons are added, a network is always able to fit more exactly to the training data as well [40, 54, 83, 129]. These results are fairly obvious. The question at hand is exactly how this increase in size applies to generalizing what is learned during training to novel test data. In curve fitting using polynomial interpolation, higher order polynomials can always make a better fit to the training data. However, using too high a polynomial order can lead to overfitting, which will perform poorly when generalizing during testing. This translates to neural networks benefiting from having the fewest number of neurons and the simplest architecture that can still achieve a reasonable success rate on the training data. To show an example in neural networks, Hunter et al. [82] use the FCC network to map to three dimensional points on a wavy surface. Fig 3-15 shows the control surface and the surfaces mapped by FCCs with two, four, and five neurons. As you can see, four neurons learn a stronger match than two, but once five neurons are being used the artifacts of overfitting start to show. With five neurons the mapped surface is closer to exact matches for the actual 30 points, but the surface itself no longer looks similar to the control surface.
Figure 3-15: Representation of a goal surface mapped by 2, 4, and 5 neurons. With 2 neurons the top-right modeled surface can be seen to underfit its representation of the goal surface. However, with 5 neurons the bottom-right surface overfits, with distortions that are not present in the goal surface. In the bottom-left, the surface represented by 4 neurons is visually the most similar to the goal. From [82].
This means it would not generalize as well to new test points taken from the same control surface. As I have discussed, the best countermeasure is to pick the minimal architecture. This is the exact opposite of what deep networks do.
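The polynomial analogy can be demonstrated in a few lines of numpy on synthetic data (an illustrative sketch, not from [82]):

import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.1, x_train.size)
x_test = np.linspace(0, 1, 100)
y_test = np.sin(2 * np.pi * x_test)

for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    # Training error always falls as degree grows; test error
    # eventually rises again once the fit follows the noise.
    print(degree, train_err, test_err)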
A significant reason overfitting causes problems is that it destroys the smoothness of the model. For strong generalization a model's representation must be smooth rather than fitting to noise, which is inherently not smooth [60]. One of the effects of a less smooth representation is that, on average, points in the input space are mapped to points which are further away in the representation. With 3d points on a surface as the input type, it is easy to visually see the places that are overfit by their lack of smoothness. By overlaying the surfaces one could also visually select a testing X,Y coordinate where the Z coordinate in the neural network's solution surface is far from where it should be on the control surface. On the other hand, with natural color images as the input, selecting such points is less obvious. This is exactly the task that was successfully undertaken by a group of vision scientists in 2013 [178], which I have already discussed.
With the countermeasures to overfitting and, after augmentation, literally billions of images to train on, these deep convolutional neural networks generalize well to test images that are not included in the training set because they are able to form a strong representation of the input space. However, based on the work by Szegedy et al. [178], I hypothesize that some overfitting is, in fact, still happening. The fitting they have done matches many possible images correctly, so the overfit input types are not as obvious. Szegedy et al. show that with the correct techniques, overfit areas of the input space can be found and exploited to create adversarial examples. These images are practically identical to images that are classified correctly but completely trip up the network. I hypothesize this is overfitting because it clearly shows the representation is not smooth if two images which look nearly identical, and are therefore close neighbors in the input space, can have such different output activations. This paper has made an impact in the deep network community, and since its publication different labs have already started trying to use adversarial examples as part of the training procedure [130, 62].
A cause of overfitting is too many parameters in relation to the number of training examples. This allows the parameters to fit to input noise. The original ImageNet data set is smaller than the number of parameters. Augmenting the data with modifications to those images provides more data points. However, these modified training images teach the network to become invariant to the changes that created them, such as reflection and translation. Modifications such as these will also be applied to the noise in each image. It is possible that networks will continue to overfit to this noise; they will just be invariant to the noise's translations and reflections as well.
The human brain, on the other hand, is already perfectly invariant to this sort of noise. The human brain, using recognition-by-components, has a deeper understanding of what it means to be a particular object. It codes for features that are directly related to 3d structure. I believe one reason the human brain performs so well is the same reason resnets have seen so much success. Both use a system with fewer layers where a fixed decision must be made and passed to subsequent layers, while the processing going into each decision is more complex and allows coding for more complex features. My work takes resnets a step further and tries to directly create a dendritic parallel in modern CNNs.
Chapter 4
Perforated Backpropagation: A
Biologically Inspired Modification to
Deep Neural Networks
Abstract
The neurons of artificial neural networks were originally invented when much less was known about biological neurons than is known today. Our work explores a neuromorphic modification to the core neuron unit that could demonstrate a learning paradigm more closely related to biological neural networks. The modification is made with the knowledge that biological dendrites are not simply passive activation funnels, but also compute complex non-linear functions as they transmit activation to the cell body. It has also been shown that, in addition to simple Hebbian learning, synaptic plasticity is also modified and stabilized by mechanisms within individual neurons, a significant portion of which are on the postsynaptic side. This paper explores a novel system of "Perforated" backpropagation, allowing neurons to compute a more complex filter than that of a simple perceptron. The current instantiation also shows a new way to grow the size of a neural network after its initial creation. The approach provides an increase in accuracy when added to baseline networks whose initial training plateaued without it. However, the change also comes with an increase in learnable parameters, and experiments show that a greater improvement in accuracy can be achieved by increasing the number of parameters in the network by the same amount through more traditional mechanisms.
4.1 Introduction
Advances in computer vision are continuously being made. After the large resurgence of neural nets for computer vision with the success of AlexNet [101], many tweaks have been made to the system of connections between layers. From Google's Inception Modules [177] to Microsoft's ResNets [69], the results on the ImageNet object detection challenge [154] improve every year. Each of these algorithms brought significant improvements, but with most of them came an increase in the size of the architectures, along with modifications to avoid the problem of vanishing gradients [79]. This work instead aims to modify the core perceptron building block while maintaining the same number of decision points as used in other models.
4.2 Background
4.2.1 Biological Background - Active Dendrites
The points discussed in the previous section put neural network research in a position where more nodes and larger networks are creating more accurate but not smoother representations of the input space. To make the representation more smooth, the network size must be smaller, but not at the cost of increasing error. To combat this, we propose a modification to the nodes themselves, inspired by biology, which gives a network the same number of decision points between layers but is able to make more complex calculations within a single layer.
The human brain is a truly amazing system. Its visual system has been said to be the only evidence that generalized object recognition is actually possible. This is why we are looking to follow its design in an artificial system. Comparing an artificial neural network node to a biological neuron presents as many differences as comparing the plastic players on a foosball table to the MVPs of the English Premier League. But to capture all of the functional capabilities of a biological neuron, an artificial neuron does not need to model every single atom. The question becomes not just what functions are missing in artificial neural networks that biological networks perform, but also which functions are essential for their aptitude. The one to inspire this work is that of active dendrites.
In modern state-of-the-art convolutional neural networks, dendrites are nonexistent [177, 101, 154, 36]. Presynaptic neurons form synapses directly onto the cell bodies of postsynaptic neurons, with synapse strength as their only parameter [191]. These algorithms are not at a place where they can compete with the human visual system. Artificial networks differ in many ways from those of biology, but one certain advantage the human visual system has is the increased computational power within a single neuron by way of its dendrites.
Biological dendrites have spiking mechanisms similar, although not identical, to the spiking of the cell body itself. This has been discussed in detail earlier. With this in mind, we begin with the decision to make the artificial dendrites manifest as just more nodes in the network. However, we cannot simply add more nodes to the network or we would end up with the same problems discussed earlier. We picked a goal to accomplish two things with these new nodes. First, they must in some way be outside the general network and yet each be able to affect units in the network. Second, something about them must make them different from the other nodes. To do this, we look back to an older algorithm mentioned earlier, Cascade Correlation [51].
4.3 Computational Background - Cascade Correlation
Cascade Correlation [51] is a system developed to improve on single layer networks. It utilizes artificial neurogenesis to continuously improve error by adding new hidden nodes between the input and output layers. The actual math is fairly complex and can be read in the cited paper, but the concept is simple.
Learning takes place by alternating between two phases: typical perceptron learning and cascade node learning. During perceptron learning, each output node learns as a standard perceptron to minimize its error based on all of its inputs. During cascade node learning, the weights of the network are first completely fixed. Next, a set of candidate nodes is created which are essentially not a part of the network. They are given inputs from the same input nodes as the nodes of the output layer, but they form no output connections. To determine their input weights, they use the total error of the output nodes. When given a test point, if the total error of the system is high the new nodes strengthen their connections, and if the total error is low they weaken them. In other words, these new nodes learn to maximize their correlation with the error of the network. Once training is complete, the candidate node which achieved this best is added to the network. This is done by first fixing its input weights, which are never changed again, forming a permanent snapshot of the network's error at the point it was added. Next, it forms output connections to each of the output nodes. Then the phase is over, switching back to perceptron learning, where each output node treats its new input connections identically to its connections directly from the input. The only additional complexity to know is that during the next phase, when a new cascade node is added it also forms input connections from all previously created cascade nodes. Figs 4-1 and 4-2 show visuals that should help explain the architecture itself and what the architecture can do. For simplicity they both use two inputs representing x and y coordinates, making the perceptron a type of linear classifier.
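To make the candidate phase concrete, here is a minimal numpy sketch of training one candidate node. It uses a simplified single-output version of the published covariance objective, so it is an illustration rather than a faithful reimplementation of [51]:

import numpy as np

def train_candidate(inputs, residual_error, epochs=200, lr=0.1):
    # inputs: (n_samples, n_features); residual_error: (n_samples,)
    n, d = inputs.shape
    w = np.random.uniform(-0.05, 0.05, d)
    for _ in range(epochs):
        v = np.tanh(inputs @ w)                  # candidate activations
        e = residual_error - residual_error.mean()
        cov = np.mean((v - v.mean()) * e)        # correlation objective
        # Gradient ascent on |covariance|; sign(cov) keeps us climbing.
        grad = np.sign(cov) * ((e * (1 - v ** 2)) @ inputs) / n
        w += lr * grad
    return w  # frozen once the winning candidate joins the network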
4.4 Explanation of System
In this section we will refer to all original network nodes as Neurons and all cascade correlation nodes as PBNodes. The changes required to add Cascade Correlation to a traditional convolutional neural net are not overly complicated. Just like in Cascade Correlation, the network is fully trained until no more improvements to training error are made with additional epochs through the training data. At this point every single convolutional Neuron in the network is given a set of candidate PBNodes, which behave exactly as the hidden nodes of a Cascade Correlation system. All of the math from Cascade Correlation learning transfers over without any modification.

Figure 4-1: Stages of Cascade Correlation learning. In (a) a perceptron is shown with 2 inputs and 1 output; in the first phase this perceptron is trained to maximize classification of the input data. Next, in (b), the old weights in the network are fixed and the new node learns weights which maximize its correlation with the perceptron's error. (c) shows the network in the next phase, with the new node added after having its weights fixed and the perceptron learning to modify its input weights as normal. In (d) an additional node has been added, with input connections to the input and to the other created node.

The only difference to consider is that some of these neurons are not output neurons, but are instead deeper neurons in the network itself. This problem of how they learn is solved by relying on the error method already in use: backpropagation. However, we use a modified "Perforated" backpropagation paradigm. During Perforated backpropagation, error from the output is propagated through the network, passing only through Neurons and not through any PBNodes already included. However, the error gradient calculated at each Neuron is not only used to update its weights; during PBNode learning, the PBNodes learn to maximally correlate their output activation with the backpropagated error of the single Neuron they are connected to. This training continues until no Neuron has a PBNode which is still improving its correlation. At this point, rather than choosing a single node to add to the network as in Cascade Correlation, every Neuron adds its candidate PBNode with the highest correlation to its error. After adding the candidate PBNodes, all PBNode input weights are locked. At this point every Neuron in the network has the same inputs as before, but with the addition of one single connection to a new PBNode.

Figure 4-2: These graphs show the corresponding capabilities of a CC network. In (a) we only have a single perceptron, so only a single classification line can be drawn. For simplicity this line has been drawn to perfectly classify the blue data on the left and all of the red data, while getting its classification wrong on all of the blue data on the right. (b) shows how the CC node learns: rather than classifying the data, it learns to classify where the perceptron is wrong. If there were more than one output node, it would learn to classify where all of the outputs make the most mistakes. In (c) the red line represents the classification line of the new node. The output perceptron now has not only the x and y coordinates as input but also an input representing whether those points are above or below the red line. It can now modify its weights to correctly classify all of the data.

With PBNodes added, the network once again performs backprop learning. However, as stated earlier, the error backprop is still performed in exactly the same way, which means only through the Neurons of the network. The connection weights to the PBNodes are modified in the same way as connections to the other presynaptic Neurons in the network, but no error is propagated through them. Once again, with backprop learning the network will reach a learning asymptote. When this occurs, another set of PBNodes is added in the same manner. The one difference is that new PBNodes for a Neuron also receive input from all previous PBNodes created for that individual Neuron, as is done in Cascade Correlation.
This achieves the two goals we set for ourselves. First, the PBNodes are outside the network: during backpropagation learning, error is not passed through them, as error calculation and backpropagation are only done through the Neurons in the network, and while the full network's layers are fully connected, each PBNode only connects to a single Neuron postsynaptically. They are still able to affect individual Neurons because in the forward pass they connect their output to their single parent Neuron and to any other PBNodes connected to the same Neuron that were created after them. Second, they are different from the Neurons in the way they learn. Rather than learning through backprop, they use the same Cascade Correlation learning system, but they use the backpropagated error received by their parent Neuron rather than the perceptron output error a traditional Cascade Correlation node would have used.
To help visualize this we have created Figs 4-3 and 4-4. Fig 4-3 shows the differences between a traditional multilayer network, a Resnet block, and a PBDNN block. Fig 4-4 shows the same, but with kernels and planes rather than individual nodes.
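The perforation itself reduces to one idea: a PBNode's output enters its parent Neuron's forward pass, but the backward pass never enters the PBNode. In PyTorch terms this can be sketched with detach(); the sketch below is a minimal single-Neuron illustration with hypothetical names, not our full implementation:

import torch
import torch.nn as nn

class PBNode(nn.Module):
    def __init__(self, in_features):
        super().__init__()
        self.fc = nn.Linear(in_features, 1)

    def forward(self, x):
        # detach() blocks backpropagation through the PBNode; its own
        # weights are instead trained by cascade-correlation against the
        # parent Neuron's backpropagated error, then locked.
        return torch.tanh(self.fc(x)).detach()

class NeuronWithPB(nn.Module):
    def __init__(self, in_features):
        super().__init__()
        self.fc = nn.Linear(in_features, 1)            # ordinary backprop
        self.pb = PBNode(in_features)
        self.pb_weight = nn.Parameter(torch.zeros(1))  # trained by backprop

    def forward(self, x):
        # The connection weight to the PBNode still receives gradients;
        # the PBNode's interior does not.
        return torch.tanh(self.fc(x) + self.pb_weight * self.pb(x))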
4.4.1 Proof of Concept Test
Before testing on large complex datasets, we first tested the algorithm on smaller datasets we knew should work. Beginning with a trivial xor example with two input neurons and one Neuron in the output layer, we were able to replicate the results from the original Cascade Correlation paper. Once this was accomplished, we looked to create a dataset that would test whether Cascade Correlation could be extended to a multilayer network with backprop of errors. To this end we decided to test on a double xor problem. This problem takes 4 input nodes and aims to have its single output node output 1 if and only if (node1 xor node2) xor (node3 xor node4) is a true statement. The architecture consists simply of 4 input nodes, 2 hidden Neurons, and one output Neuron. After selecting correct parameters, this network is consistently able to learn to classify this double xor problem with perfect accuracy, typically after two PBNode adding epochs. Training set error rate is shown in Fig. 4-5.
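The dataset itself is trivial to generate, for instance (a sketch; our actual experiment code differed in details):

import itertools
import numpy as np

# All 16 binary input vectors with the (n1 xor n2) xor (n3 xor n4) target.
X = np.array(list(itertools.product([0, 1], repeat=4)))
y = (X[:, 0] ^ X[:, 1]) ^ (X[:, 2] ^ X[:, 3])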
4.4.2 Results
Next we began tests on "real" computer vision datasets. For our first test we aimed to test a multilayer perceptron network. For such a simple network a simple task was required, so we picked the MNIST [116] dataset. A network was created with a simple 29x29 input layer for the pixels, connected to a hidden layer of 40 Neurons, connected to an output layer of 10 Neurons. Training was performed over the MNIST images until no improvement was seen for 10 epochs in a row. At this point PBNodes would be added and training would continue. With this paradigm the first PBNodes were added after epoch 90 with an error rate of 5.3 percent. After six PBNodes were added, the error rate was reduced to 3.9 percent. Detailed results can be seen in Fig. 4-6.
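For reference, the baseline architecture before any PBNodes are added amounts to the following PyTorch sketch (the activation function is an illustrative assumption; only the layer sizes come from the text above):

import torch.nn as nn

baseline = nn.Sequential(
    nn.Flatten(),            # 29x29 pixel input
    nn.Linear(29 * 29, 40),  # hidden layer of 40 Neurons
    nn.Tanh(),
    nn.Linear(40, 10),       # output layer of 10 Neurons
)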
We chose not to work with MNIST for a larger convolutional network test because Cascade Correlation only works if there are consistent types of errors that the network makes; otherwise it will just overfit the training data. Even with a simple architecture, MNIST can be trained to sub-1% error. In these cases, errors are often significant outliers without many consistencies between them. We instead chose to train on NIST special database 19, sometimes referred to as EMNIST [65, 39]. This is nearly identical to MNIST, as it is where MNIST actually comes from. Like MNIST it contains handwritten classes of the same size, but it has letters in addition to numbers and significantly more examples. We compare our results to those of [36], the state of the art on MNIST with an average error of only 0.23%. Currently it is still the lowest error that exists for NIST19. NIST19 can be split in eight different ways: numbers, letters with uppercase and lowercase as different classes, letters with uppercase and lowercase as the same class, uppercase letters, lowercase letters, all letters and numbers, and the merged datasets. Merged is a splitting NIST decided on which contains all classes, but characters such as uppercase "O" and lowercase "o" are considered the same class. This splitting contains 37 classes for letters or 47 classes for letters and numbers. We decided to test on the merged dataset with 47 classes because we wanted the largest dataset, but did not like the concept of having to differentiate between classes with identical features which would be indistinguishable even by a human without global context. This dataset also has multiple splits: the normal split (Merge) contains all data, while the balanced split (Balance) reduces the number of examples so every class has an equal number of training images.
At this time we have conducted four types of tests of our system on this dataset. The first is to simply run our system and, after the addition of each PBNode, to collect results with more PBNodes alongside what the results would have been without more PBNodes. The second is to test whether our Perforated Backpropagation system is actually working, to ensure this is not simply working because of cascade correlation on the final layer. For this test we compare adding PBNodes to all layers (Full), adding PBNodes only to the output layer (Top), and adding PBNodes to all layers excluding the output layer (Bottom). The third test was to incorporate our system into a modern state-of-the-art residual neural network. To do this, the PBNodes also receive input from the skip connection, and PBNodes are only added to the resnet nodes at the final layer of blocks that have skip connections and to the output layer. Results of these tests can be seen in Fig. 4-7, Fig. 4-8, and Fig. 4-9. For all tests we utilized the data augmentation techniques discussed earlier, as well as dropout and batch normalization [84], to counteract overfitting.
The fourth test we ran examined whether switching back and forth between PBNode training and Neuron training was required. This is how Cascade Correlation learning was done, but it had not been tested whether the same error correlation calculation and weight updating could be done in parallel with normal backpropagation and weight training. For this training method we simply removed the restriction of locking weights while the other half of the network trained. Our test was run by initializing the network with random weights while giving every Neuron a single PBNode that would update its weights at the same time as the Neurons. Results can be seen in Fig. 4-10.
Figs 4-7, 4-8, and 4-9 all show evidence that our system can work. Figures are only of a single run. Fig 4-7 shows that as more PBNodes are added, error continues to be reduced on both the training and test datasets. Fig 4-8 shows that it is certainly the perforated backpropagation system that is working, and not just traditional Cascade Correlation being applied at the top output layer of the network. Finally, Fig 4-9 shows that our algorithm is also compatible with a state of the art resnet model. Eight additional convolutional tests were performed to collect data for the continuous learning model. This model also works, as over eight tests the average error rates for both training and testing are lower than those of a network without PBNodes. The same tests also had data collected for maximum test accuracy reached. These results can be seen for training in Fig 4-11 and for testing in Fig 4-12. The boxplots show that over eight separate tests there is a clear trend of both training and test error being reduced as more PBNodes are added and trained.
4.4.3 Increase of Parameters
Adding PBNodes to the network increases the total number of nodes and connections. That means the number of learnable parameters in the network goes up. As discussed earlier, increasing parameters is associated with an increase in accuracy across various types of networks. A final test we performed compared increasing parameters with our method to a more traditional method of increasing parameters: increasing the number of kernels per layer, as in [200]. The results can be seen in Fig 4-13. Testing with three widths, PBNodes were added to smaller initial widths until the number of parameters surpassed that of larger widths. With width one, our baseline convolutional network has 81,053 parameters. As we add PBNodes this number goes to 161,691; 242,424; 323,252; 404,175; 485,193; and 566,306. Width three begins with 270,205 parameters and continues to 539,899; 809,784; 1,079,860; 1,350,127; 1,620,585; and 1,891,234. Width twelve starts with 1,577,581 and grows to 3,154,219; 4,731,480; 6,309,364; and finally 7,887,871. These numbers are the X coordinates of the points on the graph. This means width one surpasses the parameter count of width three after adding three PBNodes, and width three surpasses width twelve after adding five. As can be seen in the chart, increasing the parameters simply by increasing the width results in a larger improvement in accuracy than the improvements seen by increasing the number of parameters by adding PBNodes.
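The parameter counts above, the x-coordinates in Fig 4-13, can be reproduced for any model by summing learnable tensor sizes; a one-line sketch for PyTorch modules:

def count_parameters(model):
    # Total number of learnable parameters in a PyTorch module.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)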
4.4.4 Possible Biological Implications
We are not trying to argue that these PBNodes actually mirror the way dendrites work. They do not perform the same mathematical function, they do not form the same type of arborization, they do not learn in the same manner, and they are not created in the same way. We are simply arguing two points which mean that having PBNodes such as these is more realistic than not having them. First, it is more realistic to have presynaptic input go through additional nonlinear functions rather than the direct presynaptic-postsynaptic connections of traditional Artificial Neural Networks. Second, PBNodes are closer in architectural function to biological dendrites than to biological neurons, because they postsynaptically connect to only a single Neuron and to other PBNodes connected to that neuron.
Biology also supports our perforated learning paradigm. Hebbian learning is an accepted system in neuroscience: the idea that when neurons fire together temporally, their connection is strengthened. Connection strengths are also modified by other mechanisms, many of which transpire on the postsynaptic side of the synapse. As some examples: postsynaptic cells exhibiting a rise in calcium can gain additional neurotransmitter receptors at synapses [59]; to stabilize Hebbian learning, the average level of postsynaptic depolarization can decrease synapse strength [120, 1]; postsynaptic synapse size has been shown to be directly related to synapse strength [128]; and protein kinase A, found more concentrated in dendrites than in the rest of the neuron, regulates long term potentiation [204]. The fact that so much of synapse strength is modified by the postsynaptic cell is evidence that our perforated backprop system, where the error at postsynaptic nodes is used to modify PBNodes, is not counter to biology.
We hope the results of this algorithm could also be shown to be more computationally biologically realistic than previous models have been. Much work has been done trying to prove that current instantiations of convolutional neural nets already provide a parallel to the biological brain [139, 104, 96]. But from our research we have found significant evidence against CNNs actually learning neurologically relevant representations of the human visual system.
This evidence exists both at the level of individual neurons and of networks as a whole. Research has shown that in biology, individual neurons are sensitive to non-accidental property (NAP) changes and therefore code for some shape aspect which is more invariant to metric changes than to NAP changes [94]. In artificial CNNs, "neurons" instead learn "uninterpretable solutions," and the individual top layer units do not code for anything more specific than random linear combinations of top layer units [61, 202]. Other neural network papers often show various input images or pixel clusters that most strongly activate neurons, to show that each particular neuron is sensitive to a specific quality, and claim it learned something important [101]. One group tested neurons at equal depths, from networks which had been trained and networks which had instead been given random weights. Two tests were performed, one on MNIST handwritten digits [116] and a second on a deeper network against ImageNet images [44]. The results show that nodes with random weights produce the same type of results that researchers have claimed prove the neurons learned a useful semantic property contained in the training images. This experiment suggests that, when looking at the calculation of a single individual neuron in such a network, a random set of weights could be interpreted to be just as useful as a trained node [178].
More evidence that CNNs are not utilizing a neurologically relevant representation comes from viewing networks as a whole, testing what kind of input images can trip up a fully trained deep CNN. As discussed earlier, adversarial images can confuse artificial neural networks. If neurons coded for 3-dimensional structure through a hierarchy of NAPs to geons to objects, there would be little difference in the top layer activations from these minor changes to the input. Results also showed that across networks and training sets the same adversarial examples would cause similar mistakes. This suggests networks do learn a consistent representation of the input space [178]. Other research has shown CNNs even learn certain invariances to changes in input [202]. However, failing on these types of changes provides evidence that the representation and invariances learned are not the same as in the brain. For many of these examples a human viewer would find the modified images literally indistinguishable from their originals. One cannot argue that these deep networks are not doing impressive calculations, but this is strong evidence that they are not doing them in neurologically relevant ways.
A final point we want to make about the biological relevance of our new neuron model is that neurons with active dendrites are actually capable of performing a function proven to exist by [94], while traditional artificial neurons are not. On any given binary training set, given enough time and the correct parameters, a Cascade Correlation perceptron will improve the error rate. Consider a dataset consisting of non-accidental image features. Even in the simple case of learning arrow-vertices vs Y-vertices, a traditional artificial neuron could not learn this without multiple layers. If a line segment were tuned to provide positive activation to the neuron, that activation could not be made contingent on other line segments in the receptive field. We created a dataset of this exact format. Images consisted of three line segments being drawn, originating at the center, rotated in 30 degree intervals, with segments allowed to overlap, i.e., making one segment disappear. The network architecture consisted of an 11x11 input image and a single output layer with six neurons to categorize the inputs. Correct categories were single line segments (all three overlapping), straight lines (two line segments in opposite directions), two segment corners, T-vertices, Y-vertices, and arrow-vertices.
Results are shown in Fig 4-14. As is shown, before PBNodes are allowed to be added, sum squared error stops being reduced at about 0.23, with chance being 0.35. With PBNodes added, that error continues to be reduced until achieving a sum squared error of only 0.05. We do not know whether our whole networks will be more biologically realistic, or even whether our Neuron blocks will, in fact, learn more biologically relevant features such as this. However, this simple example shows that, given the correct error signal, our Neurons possess the capability to learn features that until now would have taken multiple layers of neuronal connections to compute.
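For concreteness, vertex images of this kind can be rasterized along these lines (an illustrative sketch; the drawing details of our actual dataset may differ):

import numpy as np

def draw_vertex(angles_deg, size=11, length=5):
    # Rasterize line segments radiating from the center of an image;
    # angles are multiples of 30 degrees, and segments may coincide.
    img = np.zeros((size, size))
    c = size // 2
    for a in angles_deg:
        rad = np.deg2rad(a)
        for r in range(length + 1):
            col = int(round(c + r * np.cos(rad)))
            row = int(round(c - r * np.sin(rad)))
            img[row, col] = 1.0
    return img

# e.g. a Y-vertex with segments at 90, 210, and 330 degrees:
y_vertex = draw_vertex([90, 210, 330])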
4.4.5 Neurogenesis and Artificial Intelligence
Overview
An additional point to make about our system is that it is a modern example of a network which implements artificial neurogenesis. This section will discuss neurogenesis both in the biological brain and as applied to artificial neural networks. Most neurogenesis in the brain occurs during early development, but it has been shown to also occur in the adult brain. Most modern artificial neural network algorithms do not utilize this, but it was a larger topic historically, and some research continues to be performed in recent years.
Biological Neurogenesis
“One of the most unexpected findings in neuroscience has been the realization that neurogenesis is not limited to early stages of development. New neurons continue to be born throughout adulthood and become incorporated into neural circuits. However, adult neurogenesis is limited to two types of neurons in two brain regions: inhibitory granule cells in the olfactory bulb and the excitatory granule neurons of the dentate gyrus.” - [93]
Once it was discovered that neurogenesis does, in fact, occur in the adult brain, various experiments were performed to see the effects of this neurogenesis. These experiments use methods which have been shown to harm or enhance adult neurogenesis, meaning each effect shown also provides evidence that adult neurogenesis exists and causes functional differences in the adult brain. The strongest effect, as expected, was on learning and memory. Results have been similar between the olfactory bulb and the dentate gyrus [156, 38, 155, 88]. This overview will focus on the dentate gyrus.
The memory function hurt the most by loss of neurogenesis in the adult brain is called pattern separation: being able to tell the difference between images or spatial configurations which are similar to each other [93]. Both lesion and genetic studies in rodents have shown that the dentate gyrus is a brain area largely involved in pattern separation [156, 38, 155, 88]. Evidence for the same function in humans has been shown in fMRI studies [11, 107]. However, all experiments on modifying adult neurogenesis, of course, were conducted with animal models. One set of spatial pattern separation experiments was conducted by Claire Clelland at UCSF. Clelland used x-irradiation in experimental mice to shunt neurogenesis [131]. Two tasks were tested, one with two choices to select on a touch screen and another in a radial arm maze. Each of these tasks relies on spatial pattern separation. Results showed that the control group was better at making fine spatial discriminations, but large discriminations were still able to be made by the mice who received the radiation [38]. In addition to these experiments showing the results of inhibiting neurogenesis, Sahay et al. performed an experiment, using genetic modification, to see the results of increasing neurogenesis. Following the latter experiment, Sahay used a fear discrimination task and showed that increased adult neurogenesis does in fact increase spatial pattern separation [155]. Another experiment tested the same effect on object recognition. This was not just visual; it was instead based on a physical 3d object which a rat was able to see, but also explore by other means. The experiment first placed a rat in a chamber with two copies of an object, and then tested it in a chamber with a novel object and a third copy of the now familiar object. Times were compared between exploring the novel object and exploring the familiar object, relying on a rat's preference to explore novel objects [171]. In this experiment neurogenesis was inhibited by inhibiting WNT signaling [121], with both high and low knockdown amounts. After a 3 hour delay, rats in the high knockdown group explored the familiar object for statistically significantly increased times compared to the control and low knockdown groups [88]. In addition to this neurogenesis shown in healthy adult rats, after artificially induced strokes in rats, neurons normally produced in these regions are able to migrate to the stroke site, where they mature. The exact function of these neurons after migration is not well understood [8]. In the healthy brain, however, neurogenesis appears to primarily be used to store new memories of relationships and differences between concepts encoded by other neurons. In computational modeling, most neurogenesis work has been done on successfully encoding the concepts themselves.
Historical Computational Neurogenesis
There are four umbrella categories of algorithms. I will touch upon the individual advantages of neurogenesis in each case. In all cases, however, one of the problems with neural network design is that it requires a deep understanding of the task to even start with a good guess at a strong architecture. Failure to fit a training set will always be caused by insufficient capacity: not having enough neurons or layers [137]. On the other hand, too many can cause overfitting, which will fail on the test set if the network is unable to generalize. Whatever the starting point, after one is picked, getting to an ideal architecture requires much trial and error [198, 78]. For these reasons, and certainly more, computer scientists have approached this problem of neurogenesis in artificial neural networks.
The first category is Cascade-Correlation [51], which has already been discussed in detail. As far as biological plausibility, Cascade-Correlation leaves a lot to be desired. This methodology would imply neurons fire one at a time, since each one's output is needed as input for every higher order neuron, which is completely unreasonable given the speed of the visual system [180, 181]. However, in our implementation, thinking of the new nodes as dendrites, it is much more reasonable. Simply in terms of its computational capabilities, it is quite impressive. The main challenge the original authors completed was the two-spirals problem. The important advance Cascade-Correlation achieved was the workload it required to solve this problem compared to competing algorithms. The strongest opposition came from a network called Quickprop [50], which had three layers with five hidden units in each. Quickprop was able to complete the challenge in 8000 training epochs. Over 100 runs with cascade correlation, the algorithm averaged only 1700 training epochs to achieve success. The number of hidden units varied from 12 to 19, but the average was 15.2 and the median was 15, the same as in Quickprop. In addition, while the epoch reduction is only a factor of about 5, the amount of computation reduction, by not actually doing full back-propagation, is immense. The authors use the term "connection crossings" to mean "the number of multiply-accumulate steps necessary to propagate activation values forward through the network and error values backward." In Quickprop's 8000 epochs there are 438 million crossings. Cascade correlation requires only 19 million, meaning 23 times the speed on a serial machine [51].
The next category contains algorithms which form layers like a traditional neural network, where each layer's units only connect to the previous layer. Within this category were three subcategories: forming new layers, adding new nodes to current layers, and one method which creates nodes and connections in a tree structure. The simplest of these models were those with a fixed number of hidden layers [137, 78, 193]. In each, a current structure would be trained, and it would then be determined whether the output error was satisfactory. If it was not, a new node would be added. One method was to split a parent cell into two daughter cells which were constrained not to change the initial output of the network [137]. Another method simply added a new node with random weights [78]. A third method had an interesting correlate to biology. This innovation to the standard caused older neurons in the hidden layers to fix their weights and allowed only new hidden neurons and the output neurons to learn. This would help with catastrophic interference [126], with old neurons retaining memories while new neurons and the output would still learn new information [193]. Once new nodes were added, the network would be trained again. Some also had a final step of trying to prune away any extra nodes they created, retaining the pruned network if its error was not much worse after pruning [78]. All were tested on whether they could generate an XOR network, and all succeeded. All algorithms of this type are better than just back-propagation, specifically in their ability to escape local minima, a deficit in any gradient descent algorithm such as back-propagation. They also have the tidiness of ensuring eventual convergence, which backprop might never achieve with too few nodes [137, 78, 193]. However, for problems more advanced than XOR, these algorithms could be at risk of overfitting if too many nodes are allowed to be added [82, 182].
The third set of algorithms I will describe in depth is those that add layers. One of the first to do this is called the Tiling Algorithm [129]. The algorithm starts by training a perceptron to the best of its ability on a binary input vector. Once trained, a complicated loop begins.
There is now a node in its own layer, fully connected to all nodes in the previous layer, which has undergone perceptron training. This node is called the "Master Node." Once training is complete, the connections to the "Master Node" have their weights locked in place. If this node, acting as a perceptron, fully classifies the input data, the loop is complete. Otherwise a set of nodes is created to be its neighbors in a new top layer. These nodes are created based on all possible sets of values in the current top layer.
Each possible pattern of values, i.e., each possible binary assignment for the current nodes in the top layer, is determined to be faithful or unfaithful. It is faithful if for that pattern the output is always consistent. A pattern in the current layer is unfaithful if the same binary output values for the nodes in that layer can map to either a 0 or a 1 as a correct output for the total network. One unfaithful pattern in the top layer is selected. A new provisional node is created which forms connections to the previous layer. Output values for this node will be appended to each other output pattern, doubling the total number of possible output patterns. This provisional node now performs perceptron learning on all input vectors which activate the selected unfaithful pattern of the other nodes in the top layer, and its weights are then locked. If this node can correctly classify these input vectors, that original pattern is now faithful. All patterns which contain this pattern will continue to be faithful as patterns grow larger with newly added nodes. If the new node is not able to classify all input vectors which activate the unfaithful pattern, the new node is still retained. In this case the set of input values which produced the original unfaithful pattern is now split into two smaller sets. These sets correspond to the original pattern with 0 or 1 appended for the provisional node's value. Because these sets are made smaller with the addition of every new node, the top layer is eventually able to produce only faithful patterns. For the current layer, if all patterns are faithful the algorithm moves to the next layer; otherwise this loop repeats. Once all patterns are faithful, a new "Master Node" can be created and trained to classify all inputs with perceptron learning again. Each constructed layer will have fewer nodes than the one before it, so the process will eventually complete [129].
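The faithfulness test at the heart of this loop is simple to state in code. The sketch below assumes binary top-layer activations and binary targets (a hypothetical helper, not from [129]):

from collections import defaultdict

def unfaithful_patterns(top_layer_patterns, targets):
    # A pattern is faithful if every training example producing it
    # shares the same target output.
    seen = defaultdict(set)
    for pattern, target in zip(top_layer_patterns, targets):
        seen[tuple(pattern)].add(target)
    return [p for p, outs in seen.items() if len(outs) > 1]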
Another algorithm discussed earlier, Upstart [54], built upon this. Upstart tested on the same problems, and in all cases Tiling required more training, required more final nodes, and generalized worse than Upstart. However, with nodes that are nonlinear, Tiling uses neurogenesis to actually create a traditional multilayer artificial neural network, while Upstart uses linear nodes which could be condensed to a single hidden layer.
Another task which was approached by Tiling and Upstart was mapping sets of random boolean patterns where each pattern was assigned a 0 or 1 with equal probability. Both Tiling and Upstart were able to solve this problem [54, 129]. With this example in mind, it is obvious to see the overfitting these algorithms would cause. One of the important aspects of artificial neural networks is their ability to generalize. With any input data where the correct output values are not perfectly separable, an "optimal" network with a binary output would have to be allowed to make some mistakes on training data in order to minimize mistakes on test data. Any tree or multilayer perceptron that trains until 100% accuracy on training data could cause significant overfitting, which would fail to generalize on new test data [78, 198, 82].
The final category comprised those that did not follow the layered feedforward perceptron structure or learning algorithm. A well known technique among these is Adaptive Resonance Theory. This model was also designed with catastrophic interference in mind. Rather than just reducing or stopping learning in older neurons, these models will instead only add new neurons if an input is seen for which the required modification would be too significant. The system has only an input layer and a fully connected output layer. It utilizes a winner-take-all method where only the neuron most strongly activated by the input can learn. This neuron then sends a top-down signal based on what it currently codes for, testing the overlap between its current strengths and the input seen. Prior to training a vigilance parameter is chosen. If this overlap is strong enough compared to the vigilance selected, the node learns a new set of weights which is the fusion of its old weights and the new input. If the overlap is not strong enough, a new node is created. An example learning alphanumeric characters is shown in Figure 4-15, with vigilance parameters requiring the overlap between the node's representation and the input to be at least 0.5 or 0.8 [30]. Over the years this method has been extended, for example to allow analog inputs and supervised learning [30, 31].
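As a rough sketch, the resonance-or-recruit step described above might look like the following. This is a simplified stand-in for ART 1 (real ART searches through several candidate categories before recruiting), and all names here are hypothetical.

```python
import numpy as np

def art_step(templates, x, vigilance):
    """One ART-style update: templates is a list of binary weight
    vectors, x is a binary input vector, vigilance is in [0, 1]."""
    if templates:
        # Winner-take-all: the template most activated by the input.
        j = int(np.argmax([w @ x for w in templates]))
        # Top-down test: overlap of the winner's code with the input.
        overlap = (templates[j] * x).sum() / max(x.sum(), 1)
        if overlap >= vigilance:
            # Resonance: fuse old weights with the input (logical AND).
            templates[j] = templates[j] * x
            return j
    # Mismatch, or an empty network: recruit a new node for this input.
    templates.append(x.copy())
    return len(templates) - 1
```

A higher vigilance makes the test harder to pass, so more nodes are created and each codes a narrower category, matching the 0.5 versus 0.8 comparison in Figure 4-15.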
A second theme in this final neurogenesis category was the creation of self-organizing maps. These are a method of organizing high-dimensional data points onto a low-dimensional grid. Maps learn an organization such that similar inputs are mapped to closer nodes in the grid than dissimilar ones. The most familiar of these is the Kohonen network [99]. Adding neurogenesis to these networks can be done by various methods. A simple method increases the grid by adding rows or columns in between current rows and columns until the input space is mapped at a satisfactory resolution [150, 57]. More complex options also exist, such as inserting nodes one at a time as neighbors of the node which has been the most strongly activated unit for the largest number of recent inputs [56, 57], as sketched below.
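The node-at-a-time growth rule might be sketched as follows, loosely in the spirit of growing cell structures and growing neural gas; the data structures and names are my own simplification, not the cited algorithms verbatim.

```python
import numpy as np

def grow(units, edges, win_counts):
    """Insert one unit next to the most frequently winning unit.

    units: dict mapping unit id -> weight vector (np.ndarray).
    edges: set of frozenset({i, j}) pairs giving grid neighbors.
    win_counts: dict mapping unit id -> recent win count.
    """
    q = max(win_counts, key=win_counts.get)            # busiest unit
    nbrs = [j for e in edges if q in e for j in e if j != q]
    f = max(nbrs, key=lambda j: win_counts.get(j, 0))  # busiest neighbor
    new_id = max(units) + 1
    # Place the new unit halfway between the two in weight space and
    # rewire the grid so it sits between them topologically as well.
    units[new_id] = 0.5 * (units[q] + units[f])
    edges.discard(frozenset({q, f}))
    edges.update({frozenset({q, new_id}), frozenset({new_id, f})})
    win_counts[new_id] = 0
    return new_id
```

Repeatedly applying this rule spends new resolution exactly where the input distribution is densest, which is the appeal of growth over a fixed Kohonen grid.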
Relationship to this Project
Much work has been done in the past on neurogenesis in artificial neural networks. One unfortunate theme from my research is that there are almost no papers on this topic with a large number of citations in recent years. I would theorize this is because current scientists are focusing on pushing the boundaries of the accuracy of neural networks. Most work in computational neurogenesis I have discussed tests its models on problems that are already solved, to show that they can be solved with different, smaller, and more efficient networks. In terms of the biological plausibility of this approach, it may seem discouraging that in the visual system neurogenesis does not occur in adults, or even after gestation at all. Evidence has shown that in the neocortex we are generally born with all the neurons we get, and we actually end up with fewer surviving through adulthood [93, 18].

Our system is a modern example of artificial neurogenesis. It builds upon the old system of Cascade Correlation and brings it into an implementation compatible with modern state-of-the-art paradigms. However, our neurogenesis is not made with the goal of relating to biological neurogenesis. Instead it aims to replicate neurons simply creating new synapses and branching new dendrites. This type of node addition actually does occur in biological brains after birth. Experiments have shown dendritic arborization continues after birth. It has even been specifically shown to occur during the critical period of development in the visual system [22, 170, 146, 186].
4.4.6 Future work
The next tests to run will determine whether our system is able to actually increase maximum accuracy. Our tests have shown that increasing the kernels per layer results in better accuracy than adding an equivalent number of parameters with PBNodes. Given the significant increase in training time and a similar increase in computational complexity, increasing the size in a traditional manner could be better than with our system. However, with traditional means test accuracy eventually cannot be improved as network size gets progressively larger. Eventually more nodes per layer and more layers begin to cause overfitting. Future tests planned on our system will show if it is possible to increase the accuracy once a maximum architecture has been found. We also aim to test our perforated backprop system using other learning paradigms for the PBNodes, such as using normal backpropagation rather than Cascade Correlation.
Figure 4-3: The top three images are to give the reader a sense of the color scheme. Black lines show connections between presynaptic Neurons and final postsynaptic Neurons. Yellow lines show connections between presynaptic Neurons and hidden layer 2 Neurons. Red lines show connections between presynaptic Neurons and hidden layer 1 Neurons. Blue lines show connections between hidden layer 1 Neurons and hidden layer 2 Neurons. Finally, green lines show connections between hidden layer 2 Neurons and final postsynaptic Neurons. To be clear, the black lines simply represent a fully connected presynaptic and postsynaptic layer; the curves are to allow clearer visualization in the bottom images. The top three images, (a), (b), and (c), simply represent traditional multi-layer perceptron networks with various numbers of fully connected layers. (d) is a traditional ResNet block. It looks identical to the three layer network block in terms of full connectedness, with the addition of skip connections between the first presynaptic layer and the final output layer. (e) shows a PBDNN block where the postsynaptic Neurons have created two PBNodes each rather than having two hidden Neuron layers. Each PBNode is receiving input from the presynaptic Neurons, showing full connectedness of red and yellow lines. However, blue lines now only connect to a single PBNode in the following layer, and green lines connect only from the PBNodes to the single final Node itself. In the forward propagation step, activation passes through all colored connections. In the backward propagation step, error is eliminated rather than going through yellow and red connections.
Figure 4-4: The top image shows a traditional convolutional neural network setup. A presynaptic layer forms connections to all Neurons in the postsynaptic layer. These input connections are represented by the weight values of kernels in the postsynaptic layer, which convolve on the presynaptic layer to form a set of planes of activation values. These planes all form connections to all of the Neurons in the next layer in the same manner. The bottom image shows the planes of a PBDNN. PBNode kernels receive input and calculate activation planes in the same way, with each PBNode forming a plane of activation values based on its convolution around the input planes. However, the planes they form do not form connections to every Neuron kernel in the postsynaptic layer. The single activation value at each location in a PBNode plane is only passed as a single additional input to the corresponding Neuron at the same location postsynaptically. This means every Neuron kernel only adds one single input connection per PBNode kernel it adds, i.e., with an NxN kernel the postsynaptic Neuron will have N*N+1 input connections with a single PBNode and N*N+2 with two.
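The connection arithmetic in this caption can be checked with a two-line helper; the generalization to multiple input planes is my own assumption, since the caption counts weights for a single plane.

```python
def neuron_inputs(n, in_planes=1, pb_nodes=0):
    """Input connections per postsynaptic Neuron kernel: one weight per
    kernel location per input plane, plus one scalar per PBNode."""
    return n * n * in_planes + pb_nodes

assert neuron_inputs(3, pb_nodes=1) == 10  # N*N + 1 with one PBNode
assert neuron_inputs(3, pb_nodes=2) == 11  # N*N + 2 with two PBNodes
```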
Figure 4-5: Graph of training sum squared error on the two-layer XOR test. The first three blue bars correspond to the beginnings of epochs which did not add PBNodes; green bars correspond to the beginnings of epochs where PBNodes were added. The x-axis is batches run. The chance-level sum squared error is 0.125 (where the flat portion rests).
Figure 4-6: Graph of training error on the MNIST dataset with neurogenesis steps at epochs 90, 197, 262, 359, 411, and 448. These epochs are marked with vertical green lines. Before each time point the error can be seen to flatline, before being reduced once again after passing the neurogenesis epochs. Total error for the 29x29-40-10 network plateaus at 5.3 percent without any PBNodes and continues being reduced to 3.9 percent after 500 epochs and the addition of six PBNodes.
Figure 4-7: This test was also conducted using the method of early stopping, on the EMNIST Merged Dataset. After 200 epochs the network with the best accuracy is loaded before adding additional PBNodes. PBNodes are added after epochs 184, 202, 205, 251, 251, 260, 266, and 266, with duplicates meaning that a PBNode addition did not improve accuracy. The thick blue line indicates training error for the final 8-PBNode network while the thick red line indicates test error. All other dotted lines show how the accuracies progressed before loading the early-stopped time point of maximum validation accuracy, meaning this is how training would progress if more PBNodes were not added. As can be seen, each time a new PBNode is added accuracy continues to improve, while accuracies stagnate or become worse without the addition. Vertical black lines show time points where PBNodes were added, with double lines at 251 and 266 signifying that early stopping determined the first node added at that time step did not improve accuracy.
Figure 4-8: In this test only a single set of PBNodes is added to a network running on the EMNIST Merged Dataset. Training was performed on a traditional network with no PBNodes through 285 epochs, until accuracy no longer improved. At this time a set of PBNodes was created for all Neurons in the network. Tests were then continued by adding the PBNodes to all Neurons, to only the Neurons in the top layer, or to all Neurons excluding those in the top layer. Results show that adding PBNodes only to the top does not actually improve the network, while adding to just the bottom does improve results, but adding to the full network improves results the most significantly. The vertical green line shows the time point where PBNodes were added.
Figure 4-9: This test is conducted with a residual neural network on the EMNIST Merge and Balance datasets. Once again early stopping was used, with the best network after 200 epochs being loaded before PBNodes are added. For the Balance dataset this was at epoch 118, while the Merge dataset added PBNodes at 128, shown by the vertical blue and red lines. Dotted lines show how learning progressed without the addition of the PBNodes before loading the time point of maximum accuracy. These results show our PBDNN system is compatible with modern ResNets.
Figure 4-10: Box plots comparing training and test error for eight tests of normal (no PBNodes) training, continuous training, maximum accuracy reached by adding between two and eight PBNodes, and maximum accuracy reached after only one PBNode. The exact PBNodes created for the maximum condition are as follows: 2 Nodes: 2 Tests, 3 Nodes: 1 Test, 5 Nodes: 3 Tests, 7 Nodes: 1 Test, 8 Nodes: 1 Test. As can be seen, continuous training outperforms all others on the training data. However, on test data it is comparable to the accuracies reached with a single PBNode and worse than adding multiple PBNodes. More testing is still required on whether continuous training is possible with multiple PBNodes as well.
Figure 4-11: Box plots comparing training error of eight tests between a network with no nodes and the same network after adding between one and eight nodes. The exact PBNodes created for these tests are as follows: 0 Nodes: 8 Tests, 1 Node: 8 Tests, 2 Nodes: 8 Tests, 3 Nodes: 6 Tests, 4 Nodes: 5 Tests, 5 Nodes: 5 Tests, 6 Nodes: 2 Tests, 7 Nodes: 2 Tests, 8 Nodes: 1 Test.
Figure 4-12: Box plots comparing training error of eight tests between a network with no nodes and the same network after adding between one and eight nodes. The exact PBNodes created for these tests are as follows: 0 Nodes: 8 Tests, 1 Node: 8 Tests, 2 Nodes: 8 Tests, 3 Nodes: 6 Tests, 4 Nodes: 5 Tests, 5 Nodes: 5 Tests, 6 Nodes: 2 Tests, 7 Nodes: 2 Tests, 8 Nodes: 1 Test.
Figure 4-13: This graph shows training accuracy on the EMNIST Balance dataset with convolutional networks of depth three. The y-axis is accuracy, while the x-axis shows the total number of learnable parameters in the network. The three categories represent the width of the network. The width terminology is taken from [200]; it simply means a width of one is the starting architecture for a deep convolutional network, a width of three has three times as many kernels in each layer, and a width of twelve has twelve times as many kernels. Square points represent the number of parameters in the original convolutional network and each circular point represents a network after an additional set of PBNodes has been added. Each accuracy shown is the maximum accuracy achieved by the network, at which point more epochs do not continue improvement. Vertical green lines are shown to give a clear visual of the number of parameters in the baseline networks of each size. With these vertical lines it can be seen that after adding four PBNodes a network with initial width one has more parameters than the initial size of a network of width three. A network of width three must add five PBNodes before it contains more parameters than the initial size of a network of width twelve.
Figure 4-14: Training sum squared error rate while learning a vertices dataset. Blue bars represent the ends of epochs without adding PBNodes; green bars represent the ends of epochs where PBNodes were added. The x-axis is batches run.
Figure 4-15: Templates of nodes created by ART on alphanumeric characters. Figure from [30].
Chapter 5
Summary of Contributions and
References
To summarize this dissertation, two experiments were performed. Both made decisions founded on the idea that in modern artificial intelligence and computer vision the human brain still surpasses the best algorithms in a significant portion of applications. The human brain is composed of around 100 billion neurons with 100 trillion synapses and trains over a lifetime. Modern neural networks can only do so much to compete. Of these two projects, one performs a workaround to these deficits by using the human brain in the task, while the other uses biological inspiration to change artificial neural network learning in a way that has never been done before.
The first experiment uses a human-in-the-loop system of computer vision. Typically computer vision tasks must make decisions on their own, and are part of systems that are entirely artificial. In this project we instead use a system that involves both a digital camera with computer vision processing of images and a human who cannot provide the vision themselves. The computer vision system gives audio instructions to the user, who makes decisions about those instructions and moves through the experiment space. Using this protocol, 42 participants ran 15 trials each in a simulated grocery store setting while blindfolded, in which they had to select the correct grocery store item among 4 distractor objects from the same category. In all 630 trials across all subjects, the correct item was selected 100% of the time. This experiment shows an assistive vision technology which can provide perfect decision making to a user by relying on the user's own intelligence and mobility to supplement the system. In a field where even a small chance of mistakes might cause a user to decide not to use a device, 100% accuracy is a significant result.
The second project has shown a novel modification to artificial neural network learning. Based on biological dendrites' non-linear spiking, new nodes can also be added to artificial networks to model dendritic branches the same way perceptrons model neurons. These nodes are used to modify the way presynaptic input is processed so as to affect only a single neuron, rather than fully connected nodes like in traditional networks. We developed a unique change to backpropagation to allow these nodes to learn. This change revolves around the theory that in the biological brain learning is only affected by neuronal firing and not by the non-linearities within dendrites. Perforated Backpropagation follows this theory by only backpropagating error through the main neuron nodes of a network. Separate dendritic nodes learn as if the single output neuron they connect to were a single node in a final output layer, and the backpropagated gradient were the final output error. In our cascade correlation learning implementation, dendrite nodes learn input weights to help reduce error for only a single node in the network. On the forward pass they perform all math in the same way. Using this framework, our results reliably show that starting with a multilayer perceptron, a deep convolutional neural network, or a residual neural network, once error has reached a threshold it can be reduced further by adding dendrite nodes to the current, maximally trained neuron nodes and allowing training to continue.
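A minimal sketch of this learning rule for a single neuron with attached dendrite (PB) nodes is given below. It illustrates the perforation idea only; it is not the dissertation's cascade-correlation code, and the activation function, update rule, and names are my own assumptions.

```python
import numpy as np

def forward(x, w_neuron, dendrites):
    """Each dendrite node computes its own nonlinearity of the raw
    presynaptic input and feeds one extra scalar to its parent neuron."""
    d_out = [np.tanh(w_d @ x) for w_d in dendrites]
    y = np.tanh(w_neuron @ np.concatenate([x, d_out]))
    return y, d_out

def backward(x, d_out, delta, w_neuron, dendrites, lr=0.01):
    """delta is the parent neuron's error signal from backpropagation."""
    # The neuron node learns by ordinary gradient descent on its inputs.
    w_neuron += lr * delta * np.concatenate([x, d_out])
    # Perforation: the gradient is NOT chained through the dendrite
    # nonlinearities back to presynaptic nodes. Each dendrite instead
    # treats its parent as a final output node and its parent's error
    # as the output error, updating only its own input weights.
    for w_d, d in zip(dendrites, d_out):
        w_d += lr * delta * (1.0 - d * d) * x  # (1 - d^2) is tanh'
```

The key property is visible in `backward`: removing the dendrites' contribution from the gradient chain leaves the backward pass of the rest of the network exactly as it would be in a traditional network.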
This dissertation shows how research in neurobiology can inform choices of research paths to pursue in artificial intelligence and computer vision. The second project brings artificial neural networks a small step closer to following the functional system of the brain. However, with the brain being the incredible computational device that it is, artificial intelligence still has a lot of catching up to do before reaching human accuracy on most tasks. This also means there is still much about the brain that artificial intelligence researchers can learn from.
Bibliography
[1] Larry F Abbott and Sacha B Nelson. Synaptic plasticity: taming the beast. Nature Neuroscience, 3(11s):1178, 2000.
[2] Aminat Adebiyi, Nii Mante, Carey Zhang, Furkan E Sahin, Gerard G Medioni, Armand R Tanguay, and James D Weiland. Evaluation of feedback mechanisms for wearable visual aids. In Multimedia and Expo Workshops (ICMEW), 2013 IEEE International Conference on, pages 1–6. IEEE, 2013.
[3] Saleh Albelwi and Ausif Mahmood. A framework for designing the architectures of deep convolutional neural networks. Entropy, 19(6):242, 2017.
[4] John Aloimonos, Isaac Weiss, and Amit Bandyopadhyay. Active vision. International Journal of Computer Vision, 1(4):333–356, 1988.
[5] Ori Amir, Irving Biederman, and Kenneth J Hayworth. Sensitivity to nonaccidental properties across various shape dimensions. Vision Research, 62:35–43, 2012.
[6] Srdjan D Antic, Wen-Liang Zhou, Anna R Moore, Shaina M Short, and Katerina D Ikonomu. The decade of the dendritic NMDA spike. Journal of Neuroscience Research, 88(14):2991–3001, 2010.
[7] Sanjeev Arora, Aditya Bhaskara, Rong Ge, and Tengyu Ma. Provable bounds for learning some deep representations. In International Conference on Machine Learning, pages 584–592, 2014.
[8] Andreas Arvidsson, Tove Collin, Deniz Kirik, Zaal Kokaia, and Olle Lindvall. Neuronal replacement from endogenous precursors in the adult brain after stroke. Nature Medicine, 8(9):963, 2002.
[9] Michael A Babyak. What you see may not be what you get: a brief, nontechnical introduction to overfitting in regression-type models. Psychosomatic Medicine, 66(3):411–421, 2004.
[10] Andrew D Bagdanov, Alberto Del Bimbo, and Walter Nunziati. Improving evidential quality of surveillance imagery through active face tracking. In 18th International Conference on Pattern Recognition (ICPR'06), volume 3, pages 1200–1203. IEEE, 2006.
[11] Arnold Bakker, C Brock Kirwan, Michael Miller, and Craig EL Stark. Pattern separation in the human hippocampal CA3 and dentate gyrus. Science, 319(5870):1640–1642, 2008.
[12] Herbert Bay, Tinne Tuytelaars, and Luc Van Gool. SURF: Speeded up robust features. In European Conference on Computer Vision, pages 404–417. Springer, 2006.
[13] Bardia F Behabadi, Alon Polsky, Monika Jadi, Jackie Schiller, and Bartlett W Mel. Location-dependent excitatory synaptic interactions in pyramidal neuron dendrites. PLoS Computational Biology, 8(7):e1002599, 2012.
[14] Robert M Bell and Yehuda Koren. Lessons from the Netflix prize challenge. ACM SIGKDD Explorations Newsletter, 9(2):75–79, 2007.
[15] Yoshua Bengio et al. Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1):1–127, 2009.
[16] Yoshua Bengio, Pascal Lamblin, Dan Popovici, and Hugo Larochelle. Greedy layer-wise training of deep networks. In Advances in Neural Information Processing Systems, pages 153–160, 2007.
[17] Thomas Berg and Peter N Belhumeur. Tom-vs-Pete classifiers and identity-preserving alignment for face verification. In BMVC, volume 2, page 7. Citeseer, 2012.
[18] Ratan D Bhardwaj, Maurice A Curtis, Kirsty L Spalding, Bruce A Buchholz, David Fink, Thomas Björk-Eriksson, Claes Nordborg, Fred H Gage, Henrik Druid, Peter S Eriksson, et al. Neocortical neurogenesis in humans is restricted to development. Proceedings of the National Academy of Sciences, 103(33):12564–12568, 2006.
[19] Irving Biederman. Recognition-by-components: a theory of human image understanding. Psychological Review, 94(2):115, 1987.
[20] Irving Biederman and Eric E Cooper. Priming contour-deleted images: Evidence for intermediate representations in visual object recognition. Cognitive Psychology, 23(3):393–419, 1991.
[21] Marten Bjorkman and Jan-Olof Eklundh. Vision in the real world: Finding, attending and recognizing objects. International Journal of Imaging Systems and Technology, 16(5):189–208, 2006.
[23] Anna Bosc h, Andrew Zisserman, and Xa vier Munoz. Represen ting shap e with a
spatial p yramid k ernel. In Pr o c e e dings of the 6th ACM international c onfer enc e
on Image and vide o r etrieval , pages 401–408. A CM, 2007.
[24] Y-Lan Boureau, Jean P once, and Y ann LeCun. A theoretical analysis of feature
p o oling in visual recognition. In Pr o c e e dings of the 27th international c onfer enc e
on machine le arning (ICML-10) , pages 111–118, 2010.
[25] Calv ert L Bo w en I I I, Timoth y K Buennemey er, Ingrid Burb ey , and Vinit Joshi.
Using wireless net w orks to assist na vigation for individuals with disabilities.
In California State University, Northridge Center on Disabilities’ 21st Annual
International T e chnolo gy and Persons with Disabilities Confer enc e , 2006.
[26] Tiago Branco and Mic hael Häusser. Synaptic in tegration gradien ts in single
cortical p yramidal cell dendrites. Neur on , 69(5):885–892, 2011.
[27] Leo Breiman. Random forests. Machine le arning , 45(1):5–32, 2001.
[28] Rob ert Gro v er Bro wn, P atric k YC Hw ang, et al. Intr o duction to r andom signals
and applie d Kalman filtering , v olume 3. Wiley New Y ork, 1992.
[29] Xudong Cao, Da vid Wipf, F ang W en, Genquan Duan, and Jian Sun. A practical
transfer learning algorithm for face v erification. In Pr o c e e dings of the IEEE
International Confer enc e on Computer Vision , pages 3208–3215, 2013.
[30] Gail A Carp en ter and Stephen Grossb erg. Art 2: Self-organization of stable cat-
egory recognition co des for analog input patterns. Applie d optics , 26(23):4919–
4930, 1987.
[31] Gail A Carp en ter, Stephen Grossb erg, and John H Reynolds. Artmap: Su-
p ervised real-time learning and classification of nonstationary data b y a self-
organizing neural net w ork. Neur al networks , 4(5):565–588, 1991.
[32] Dong Chen, Xudong Cao, Liw ei W ang, F ang W en, and Jian Sun. Ba y esian face
revisited: A join t form ulation. In Eur op e an Confer enc e on Computer Vision ,
pages 566–579. Springer, 2012.
[33] Dong Chen, Xudong Cao, F ang W en, and Jian Sun. Blessing of dimensionalit y:
High-dimensional feature and its efficien t compression for face v erification. In
2013 IEEE Confer enc e on Computer Vision and Pattern R e c o gnition , pages
3025–3032. IEEE, 2013.
[34] Xiao w ei Chen, Ulric h Leisc hner, Nathalie L Ro c hefort, Israel Nelk en, and
Arth ur K onnerth. F unctional mapping of single spines in cortical neurons in
viv o. Natur e , 475(7357):501, 2011.
[35] Gary E Christensen, Ric hard D Rabbitt, Mic hael I Miller, et al. Deformable
templates using large deformation kinematics. IEEE tr ansactions on image
pr o c essing , 5(10):1435–1447, 1996.
123
[36] Dan Ciregan, Ueli Meier, and Jürgen Schmidhuber. Multi-column deep neural networks for image classification. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 3642–3649. IEEE, 2012.
[37] Dan C Cireşan, Ueli Meier, Jonathan Masci, Luca M Gambardella, and Jürgen Schmidhuber. High-performance neural networks for visual object classification. arXiv preprint arXiv:1102.0183, 2011.
[38] CD Clelland, M Choi, C Romberg, GD Clemenson, A Fragniere, P Tyers, S Jessberger, LM Saksida, RA Barker, FH Gage, et al. A functional role for adult hippocampal neurogenesis in spatial pattern separation. Science, 325(5937):210–213, 2009.
[39] Gregory Cohen, Saeed Afshar, Jonathan Tapson, and André van Schaik. EMNIST: an extension of MNIST to handwritten letters. arXiv preprint arXiv:1702.05373, 2017.
[40] Jose Demisio Simoes da Silva. The cascade-correlation neural network growing algorithm using the MATLAB environment, 1997.
[41] Navneet Dalal and Bill Triggs. Histograms of oriented gradients for human detection. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, volume 1, pages 886–893. IEEE, 2005.
[42] Hans P Op de Beeck, Katrien Torfs, and Johan Wagemans. Perceived shape similarity among unfamiliar objects and the organization of the human object vision pathway. Journal of Neuroscience, 28(40):10111–10123, 2008.
[43] Fernando De la Torre and Michael J Black. Robust principal component analysis for computer vision. In Computer Vision, 2001. ICCV 2001. Proceedings. Eighth IEEE International Conference on, volume 1, pages 362–369. IEEE, 2001.
[44] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 248–255. IEEE, 2009.
[45] Thomas G Dietterich. Ensemble learning. 2002.
[46] LA Doumas and John E Hummel. Approaches to modeling human mental representations: What works, what doesn't and why. The Cambridge Handbook of Thinking and Reasoning, ed. KJ Holyoak & RG Morrison, pages 73–94, 2005.
[47] Leonidas AA Doumas, John E Hummel, and Catherine M Sandhofer. A theory of the discovery and predication of relational concepts. Psychological Review, 115(1):1, 2008.
[48] Florian Dramas, Simon J Thorpe, and Christophe Jouffrais. Artificial vision for the blind: a bio-inspired algorithm for objects and obstacles detection. International Journal of Image and Graphics, 10(04):531–544, 2010.
[49] Dumitru Erhan, Yoshua Bengio, Aaron Courville, Pierre-Antoine Manzagol, Pascal Vincent, and Samy Bengio. Why does unsupervised pre-training help deep learning? Journal of Machine Learning Research, 11(Feb):625–660, 2010.
[50] Scott E Fahlman. Faster-learning variations of back-propagation: An empirical study. In Proc. 1988 Connectionist Models Summer School, pages 38–51. Morgan Kaufmann, 1988.
[51] Scott E Fahlman and Christian Lebiere. The cascade-correlation learning architecture. 1989.
[52] Pedro F Felzenszwalb, Ross B Girshick, and David McAllester. Cascade object detection with deformable part models. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pages 2241–2248. IEEE, 2010.
[53] Daniel Fok, Janice Miller Polgar, Lynn Shaw, and Jeffrey W Jutai. Low vision assistive technology device usage and importance in daily occupations. Work, 39(1):37–48, 2011.
[54] Marcus Frean. The upstart algorithm: A method for constructing and training feedforward neural networks. Neural Computation, 2(2):198–209, 1990.
[55] Simone Frintrop. B: The Viola-Jones classifier. In VOCUS: A Visual Attention System for Object Detection and Goal-Directed Search, pages 193–197. Springer, 2006.
[56] Bernd Fritzke. Growing cell structures - a self-organizing network for unsupervised and supervised learning. Neural Networks, 7(9):1441–1460, 1994.
[57] Bernd Fritzke. A growing neural gas network learns topologies. In Advances in Neural Information Processing Systems, pages 625–632, 1995.
[58] Kunihiko Fukushima. Neocognitron - a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. NHK Broadcasting Science Research Laboratories Report, (15):106–115, 1981.
[59] Kimberly Gerrow and Antoine Triller. Synaptic stability and plasticity in a floating world. Current Opinion in Neurobiology, 20(5):631–639, 2010.
[60] Federico Girosi and Tomaso Poggio. Representation properties of networks: Kolmogorov's theorem is irrelevant. Neural Computation, 1(4):465–469, 1989.
[61] Ian Goodfellow, Honglak Lee, Quoc V Le, Andrew Saxe, and Andrew Y Ng. Measuring invariances in deep networks. In Advances in Neural Information Processing Systems, pages 646–654, 2009.
[62] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. CoRR, 2015.
[63] Xavi Gratal, Javier Romero, Jeannette Bohg, and Danica Kragic. Visual servoing on unknown objects. Mechatronics, 22(4):423–435, 2012.
[64] Christine Grienberger, Xiaowei Chen, and Arthur Konnerth. Dendritic function in vivo. Trends in Neurosciences, 38(1):45–54, 2015.
[65] Patrick J Grother. NIST special database 19 handprinted forms and characters database. National Institute of Standards and Technology, 1995.
[66] Douglas M Hawkins. The problem of overfitting. Journal of Chemical Information and Computer Sciences, 44(1):1–12, 2004.
[67] Kenneth J Hayworth and Irving Biederman. Neural evidence for intermediate representations in object recognition. Vision Research, 46(23):4024–4031, 2006.
[68] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. In European Conference on Computer Vision, pages 346–361. Springer, 2014.
[69] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385, 2015.
[70] Marti A. Hearst, Susan T Dumais, Edgar Osuna, John Platt, and Bernhard Scholkopf. Support vector machines. IEEE Intelligent Systems and their Applications, 13(4):18–28, 1998.
[71] Robert Hecht-Nielsen. Theory of the backpropagation neural network. In Neural Networks for Perception, pages 65–93. Elsevier, 1992.
[72] Michael L Hines and Nicholas T Carnevale. The NEURON simulation environment. Neural Computation, 9(6):1179–1209, 1997.
[73] Geoffrey E Hinton. Deep belief networks. Scholarpedia, 4(5):5947, 2009.
[74] Geoffrey E Hinton. A practical guide to training restricted Boltzmann machines. In Neural Networks: Tricks of the Trade, pages 599–619. Springer, 2012.
[75] Geoffrey E Hinton and Ruslan R Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, 2006.
[76] Geoffrey E Hinton, Terrence J Sejnowski, et al. Learning and relearning in Boltzmann machines. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, 1:282–317, 1986.
[77] Geoffrey E Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan R Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580, 2012.
[78] Yoshio Hirose, Koichi Yamashita, and Shimpei Hijiya. Back-propagation algorithm which varies the number of hidden units. Neural Networks, 4(1):61–66, 1991.
[79] Sepp Hochreiter. The vanishing gradient problem during learning recurrent neural nets and problem solutions. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 6(02):107–116, 1998.
[80] Andreas Hub, Tim Hartter, and Thomas Ertl. Interactive tracking of movable objects for the blind on the basis of environment models and perception-oriented object recognition methods. In Proceedings of the 8th International ACM SIGACCESS Conference on Computers and Accessibility, pages 111–118. ACM, 2006.
[81] John E Hummel and Irving Biederman. Dynamic binding in a neural network for shape recognition. Psychological Review, 99(3):480, 1992.
[82] David Hunter, Hao Yu, Michael S Pukish III, Janusz Kolbusz, and Bogdan M Wilamowski. Selection of proper neural network sizes and architectures - a comparative study. IEEE Transactions on Industrial Informatics, 8(2):228–240, 2012.
[83] Jenq-Neng Hwang, Shih-Shien You, Shyh-Rong Lay, and I-Chang Jou. The cascade-correlation learning: A projection pursuit learning perspective. IEEE Transactions on Neural Networks, 7(2):278–289, 1996.
[84] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.
[85] Monika Jadi, Alon Polsky, Jackie Schiller, and Bartlett W Mel. Location-dependent effects of inhibition on local spiking in pyramidal neuron dendrites. PLoS Computational Biology, 8(6):e1002550, 2012.
[86] Monika P Jadi, Bardia F Behabadi, Alon Poleg-Polsky, Jackie Schiller, and Bartlett W Mel. An augmented two-layer model captures nonlinear analog spatial integration effects in pyramidal neuron dendrites. Proceedings of the IEEE, 102(5):782–798, 2014.
[87] Kevin Jarrett, Koray Kavukcuoglu, Yann LeCun, et al. What is the best multi-stage architecture for object recognition? In Computer Vision, 2009 IEEE 12th International Conference on, pages 2146–2153. IEEE, 2009.
[88] Sebastian Jessberger, Robert E Clark, Nicola J Broadbent, Gregory D Clemenson, Antonella Consiglio, D Chichung Lie, Larry R Squire, and Fred H Gage. Dentate gyrus-specific knockdown of adult neurogenesis impairs spatial and object recognition memory in adult rats. Learning & Memory, 16(2):147–154, 2009.
[89] Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the 22nd ACM International Conference on Multimedia, pages 675–678. ACM, 2014.
[90] Xiaoyi Jiang and Daniel Mojon. Adaptive local thresholding by verification-based multithreshold probing with application to vessel detection in retinal images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(1):131–137, 2003.
[91] L Jin, BF Behabadi, and BW Mel. Dimensionality of dendritic computation. In Proc. Neurosci. Meeting Planner, 2012.
[92] Simon J Julier and Jeffrey K Uhlmann. New extension of the Kalman filter to nonlinear systems. In Signal Processing, Sensor Fusion, and Target Recognition VI, volume 3068, pages 182–194. International Society for Optics and Photonics, 1997.
[93] Eric R Kandel, James H Schwartz, Thomas M Jessell, Steven A Siegelbaum, and AJ Hudspeth. Principles of Neural Science, volume 4. McGraw-Hill New York, 2000.
[94] Greet Kayaert, Irving Biederman, and Rufin Vogels. Shape tuning in macaque inferior temporal cortex. The Journal of Neuroscience, 23(7):3016–3027, 2003.
[95] Yan Ke and Rahul Sukthankar. PCA-SIFT: A more distinctive representation for local image descriptors. In Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on, volume 2, pages II-506. IEEE, 2004.
[96] Seyed-Mahdi Khaligh-Razavi and Nikolaus Kriegeskorte. Deep supervised, but not unsupervised, models may explain IT cortical representation. PLoS Comput Biol, 10(11):e1003915, 2014.
[97] Christof Koch, Tomaso Poggio, and Vincent Torre. Retinal ganglion cells: a functional interpretation of dendritic morphology. Phil. Trans. R. Soc. Lond. B, 298(1090):227–263, 1982.
[98] Christof Koch, Tomaso Poggio, and Vincent Torre. Nonlinear interactions in a dendritic tree: localization, timing, and role in information processing. Proceedings of the National Academy of Sciences, 80(9):2799–2802, 1983.
[99] Teuvo Kohonen. The self-organizing map. Proceedings of the IEEE, 78(9):1464–1480, 1990.
[100] Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009.
[101] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.
[102] Jonas Kubilius. A framework for streamlining research workflow in neuroscience and psychology. Frontiers in Neuroinformatics, 7:52, 2014.
[103] Jonas Kubilius, Annelies Baeck, Johan Wagemans, and Hans P Op de Beeck. Brain-decoding fMRI reveals how wholes relate to the sum of parts. Cortex, 72:5–14, 2015.
[104] Jonas Kubilius, Stefania Bracci, and Hans P Op de Beeck. Deep neural networks as a computational model for human shape sensitivity. PLoS Comput Biol, 12(4):e1004896, 2016.
[105] V Kulykukin, Chaitanya Gharpure, and Nathan DeGraw. Human-robot interaction in a robotic guide for the visually impaired. In AAAI Spring Symposium, pages 158–164, 2004.
[106] Vladimir Kulyukin, Chaitanya Gharpure, and John Nicholson. RoboCart: Toward robot-assisted navigation of grocery stores by the visually impaired. In 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 2845–2850. IEEE, 2005.
[107] Joyce W Lacy, Michael A Yassa, Shauna M Stark, L Tugan Muftuler, and Craig EL Stark. Distinct pattern separation related transfer functions in human CA3/dentate and CA1 revealed using high-resolution fMRI and variable mnemonic similarity. Learning & Memory, 18(1):15–18, 2011.
[108] Martin Lades, Jan C Vorbruggen, Joachim Buhmann, Jörg Lange, Christoph Von Der Malsburg, Rolf P Wurtz, and Wolfgang Konen. Distortion invariant object recognition in the dynamic link architecture. IEEE Transactions on Computers, (3):300–311, 1993.
[109] Matthew E Larkum, Thomas Nevian, Maya Sandler, Alon Polsky, and Jackie Schiller. Synaptic integration in tuft dendrites of layer 5 pyramidal neurons: a new unifying principle. Science, 325(5941):756–760, 2009.
[110] Matthew E Larkum and J Julius Zhu. Signaling of layer 1 and whisker-evoked Ca2+ and Na+ action potentials in distal and terminal dendrites of rat neocortical pyramidal neurons in vitro and in vivo. Journal of Neuroscience, 22(16):6991–7005, 2002.
[111] Hugo Larochelle, Dumitru Erhan, Aaron Courville, James Bergstra, and Yoshua Bengio. An empirical evaluation of deep architectures on problems with many factors of variation. In Proceedings of the 24th International Conference on Machine Learning, pages 473–480. ACM, 2007.
[112] Maria Lavzin, Sophia Rapoport, Alon Polsky, Liora Garion, and Jackie Schiller. Nonlinear dendritic processing determines angular tuning of barrel cortex neurons in vivo. Nature, 490(7420):397, 2012.
[113] Steve Lawrence, C Lee Giles, Ah Chung Tsoi, and Andrew D Back. Face recognition: A convolutional neural-network approach. IEEE Transactions on Neural Networks, 8(1):98–113, 1997.
[114] Svetlana Lazebnik, Cordelia Schmid, and Jean Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), pages 2169–2178. IEEE, 2006.
[115] Yann LeCun, Bernhard Boser, John S Denker, Donnie Henderson, Richard E Howard, Wayne Hubbard, and Lawrence D Jackel. Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4):541–551, 1989.
[116] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
[117] Honglak Lee, Roger Grosse, Rajesh Ranganath, and Andrew Y Ng. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 609–616. ACM, 2009.
[118] Honglak Lee, Roger Grosse, Rajesh Ranganath, and Andrew Y Ng. Unsupervised learning of hierarchical representations with convolutional deep belief networks. Communications of the ACM, 54(10):95–103, 2011.
[119] Honglak Lee, Peter Pham, Yan Largman, and Andrew Y Ng. Unsupervised feature learning for audio classification using convolutional deep belief networks. In Advances in Neural Information Processing Systems, pages 1096–1104, 2009.
[120] Kenneth R Leslie, Sacha B Nelson, and Gina G Turrigiano. Postsynaptic depolarization scales quantal amplitude in cortical pyramidal neurons. Journal of Neuroscience, 21(19):RC170–RC170, 2001.
[121] Dieter-Chichung Lie, Sophia A Colamarino, Hong-Jun Song, Laurent Désiré, Helena Mira, Antonella Consiglio, Edward S Lein, Sebastian Jessberger, Heather Lansford, Alejandro R Dearie, et al. Wnt signalling regulates adult hippocampal neurogenesis. Nature, 437(7063):1370, 2005.
[122] Min Lin, Qiang Chen, and Shuicheng Yan. Network in network. arXiv preprint arXiv:1312.4400, 2013.
[123] Pierre Lison. An introduction to machine learning, 2015.
[124] Guy Major, Matthew E Larkum, and Jackie Schiller. Active properties of neocortical pyramidal neuron dendrites. Annual Review of Neuroscience, 36:1–24, 2013.
[125] Guy Major, Alon Polsky, Winfried Denk, Jackie Schiller, and David W Tank. Spatiotemporally graded NMDA spike/plateau potentials in basal dendrites of neocortical pyramidal neurons. Journal of Neurophysiology, 99(5):2584–2601, 2008.
[126] Michael McCloskey and Neal J Cohen. Catastrophic interference in connectionist networks: The sequential learning problem. In Psychology of Learning and Motivation, volume 24, pages 109–165. Elsevier, 1989.
[127] Michele Merler, Carolina Galleguillos, and Serge Belongie. Recognizing groceries in situ using in vitro training data. In 2007 IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8. IEEE, 2007.
[128] Daniel Meyer, Tobias Bonhoeffer, and Volker Scheuss. Balance and stability of synaptic structures during synaptic plasticity. Neuron, 82(2):430–443, 2014.
[129] Marc Mézard and Jean-P Nadal. Learning in feedforward layered networks: The tiling algorithm. Journal of Physics A: Mathematical and General, 22(12):2191, 1989.
[130] Takeru Miyato, Shin-ichi Maeda, Masanori Koyama, Ken Nakae, and Shin Ishii. Distributional smoothing with virtual adversarial training. arXiv preprint arXiv:1507.00677, 2015.
[131] Shinichiro Mizumatsu, Michelle L Monje, Duncan R Morhardt, Radoslaw Rola, Theo D Palmer, and John R Fike. Extreme sensitivity of adult neurogenesis to low doses of X-irradiation. Cancer Research, 63(14):4021–4027, 2003.
[132] Erez Na'aman, Amnon Shashua, and Yonatan Wexler. User wearable visual assistance system, August 23 2012. US Patent App. 13/397,919.
[133] Vinod Nair and Geoffrey E Hinton. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), pages 807–814, 2010.
[134] Suranga Nanayakkara, Roy Shilkrot, and Pattie Maes. EyeRing: a finger-worn assistant. In CHI'12 Extended Abstracts on Human Factors in Computing Systems, pages 1961–1966. ACM, 2012.
[135] Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y Ng. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning, volume 2011, page 5, 2011.
[136] John Nicholson, Vladimir Kulyukin, and Daniel Coster. ShopTalk: independent blind shopping through verbal route directions and barcode scans. The Open Rehabilitation Journal, 2(1):11–23, 2009.
[137] Stevan V Odri, Dusan P Petrovacki, and Gordana A Krstonosic. Evolutional development of a multilevel neural network. Neural Networks, 6(4):583–595, 1993.
[138] Lawrence O'Gorman. Binarization and multithresholding of document images using connectivity. CVGIP: Graphical Models and Image Processing, 56(6):494–506, 1994.
[139] Sarah M Parker and Thomas Serre. Unsupervised invariance learning of transformation sequences in a model of object recognition yields selectivity for non-accidental properties. Frontiers in Computational Neuroscience, 9, 2015.
[140] Omkar M Parkhi, Andrea Vedaldi, Andrew Zisserman, et al. Deep face recognition. In BMVC, volume 1, page 6, 2015.
[141] Romedi Passini and Guyltne Proulx. Wayfinding without vision: an experiment with congenitally totally blind people. Environment and Behavior, 20(2):227–252, 1988.
[142] Betsy Phillips and Hongxin Zhao. Predictors of assistive technology abandonment. Assistive Technology, 5(1):36–45, 1993.
[143] Panayiota Poirazi, Terrence Brannon, and Bartlett W Mel. Pyramidal neuron as two-layer neural network. Neuron, 37(6):989–999, 2003.
[144] Alon Polsky, Bartlett Mel, and Jackie Schiller. Encoding and decoding bursts by NMDA spikes in basal dendrites of layer 5 pyramidal neurons. Journal of Neuroscience, 29(38):11891–11903, 2009.
[145] Wilfrid Rall. Theoretical significance of dendritic trees for neuronal input-output relations. Neural Theory and Modeling, 7397, 1964.
[146] Robert V Riccio and Murray A Matthews. Effects of intraocular tetrodotoxin on dendritic spines in the developing rat visual cortex: a Golgi analysis. Developmental Brain Research, 19(2):173–182, 1985.
[147] Maximilian Riesenhuber and Tomaso Poggio. Hierarchical models of object recognition in cortex. Nature Neuroscience, 2(11):1019, 1999.
[148] Maximilian Riesenhuber and Tomaso Poggio. Neural mechanisms of object recognition. Current Opinion in Neurobiology, 12(2):162–168, 2002.
[149] Irina Rish et al. An empirical study of the naive Bayes classifier. In IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence, volume 3, pages 41–46. IBM New York, 2001.
[150] Joaquim S Rodrigues and Luis B Almeida. Improving the learning speed in topological maps of patterns. In International Neural Network Conference, pages 813–816. Springer, 1990.
[151] Frank Rosenblatt. The perceptron, a perceiving and recognizing automaton (Project Para). Cornell Aeronautical Laboratory, 1957.
[152] Frank Rosenblatt. The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review, 65(6):386, 1958.
[153] Ethan Rublee, Vincent Rabaud, Kurt Konolige, and Gary Bradski. ORB: An efficient alternative to SIFT or SURF. In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 2564–2571. IEEE, 2011.
[154] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211–252, 2015.
[155] Amar Sahay, Kimberly N Scobie, Alexis S Hill, Colin M O'Carroll, Mazen A Kheirbek, Nesha S Burghardt, André A Fenton, Alex Dranovsky, and René Hen. Increasing adult hippocampal neurogenesis is sufficient to improve pattern separation. Nature, 472(7344):466, 2011.
[156] Amar Sahay, Donald A Wilson, and René Hen. Pattern separation: a common function for new neurons in hippocampus and olfactory bulb. Neuron, 70(4):582–588, 2011.
[157] Tara N Sainath, Brian Kingsbury, Abdel-rahman Mohamed, George E Dahl, George Saon, Hagen Soltau, Tomas Beran, Aleksandr Y Aravkin, and Bhuvana Ramabhadran. Improvements to deep convolutional neural networks for LVCSR. In Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on, pages 315–320. IEEE, 2013.
[158] Ruslan Salakhutdinov and Hugo Larochelle. Efficient learning of deep Boltzmann machines. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pages 693–700, 2010.
[159] R Sathya and Annamma Abraham. Comparison of supervised and unsupervised learning algorithms for pattern classification. International Journal of Advanced Research in Artificial Intelligence, 2(2):34–38, 2013.
[160] Dominik Scherer, Andreas Müller, and Sven Behnke. Evaluation of pooling operations in convolutional architectures for object recognition. In Artificial Neural Networks–ICANN 2010, pages 92–101. Springer, 2010.
[161] Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85–117, 2015.
[162] Scientist47. Homography-transl. In Wikimedia.org, 2008.
[163] Thomas Serre, Aude Oliva, and Tomaso Poggio. A feedforward architecture accounts for rapid categorization. Proceedings of the National Academy of Sciences, 104(15):6424–6429, 2007.
[164] Gordon M Shepherd and Robert K Brayton. Logic operations are properties of computer-simulated interactions between excitable dendritic spines. Neuroscience, 21(1):151–165, 1987.
[165] Patrice Y Simard, David Steinkraus, and John C Platt. Best practices for convolutional neural networks applied to visual document analysis. In ICDAR, volume 3, pages 958–962, 2003.
[166] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[167] Spencer L Smith, Ikuko T Smith, Tiago Branco, and Michael Häusser. Dendritic spikes enhance stimulus selectivity in cortical neurons in vivo. Nature, 503(7474):115, 2013.
[168] Paul Smolensky. Information processing in dynamical systems: Foundations of harmony theory. Technical report, Colorado Univ at Boulder Dept of Computer Science, 1986.
[169] Joan G Snodgrass and Mary Vanderwart. A standardized set of 260 pictures: norms for name agreement, image agreement, familiarity, and visual complexity. Journal of Experimental Psychology: Human Learning and Memory, 6(2):174, 1980.
[170] RF Spencer and PD Coleman. Influence of selective visual experience upon morphological maturation of visual-cortex. In Anatomical Record, volume 178, pages 469–469. Wiley-Liss Div John Wiley & Sons Inc, 605 Third Ave, New York, NY 10158-0012, 1974.
[171] Larry R Squire, John T Wixted, and Robert E Clark. Recognition memory and the medial temporal lobe: a new perspective. Nature Reviews Neuroscience, 8(11):872, 2007.
[172] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958, 2014.
[173] Nitish Srivastava and Ruslan R Salakhutdinov. Multimodal learning with deep Boltzmann machines. In Advances in Neural Information Processing Systems, pages 2222–2230, 2012.
[174] Ella Striem-Amit, Miriam Guendelman, and Amir Amedi. 'Visual' acuity of the congenitally blind using visual-to-auditory sensory substitution. PLoS ONE, 7(3):e33136, 2012.
[175] Greg Stuart and Nelson Spruston. Determinants of voltage attenuation in neocortical pyramidal neuron dendrites. The Journal of Neuroscience, 18(10):3501–3510, 1998.
[176] Richard S Sutton and Andrew G Barto. Reinforcement Learning: An Introduction. MIT Press, 2018.
[177] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1–9, 2015.
[178] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
[179] Kaveri A Thakoor, Sophie Marat, Patrick J Nasiatka, Ben P McIntosh, Furkan E Sahin, Armand R Tanguay, James D Weiland, and Laurent Itti. Attention biased speeded up robust features (AB-SURF): A neurally-inspired object recognition algorithm for a wearable aid for the visually-impaired. In Multimedia and Expo Workshops (ICMEW), 2013 IEEE International Conference on, pages 1–6. IEEE, 2013.
[180] Simon Thorpe, Denis Fize, and Catherine Marlot. Speed of processing in the human visual system. Nature, 381(6582):520, 1996.
[181] Simon J Thorpe and Michele Fabre-Thorpe. Seeking categories in the brain. Science, 291(5502):260–263, 2001.
[182] Stephan Trenn. Multilayer perceptrons: approximation order and necessary number of hidden units. IEEE Transactions on Neural Networks, 19(5):836–844, 2008.
[183] Matthew A Turk and Alex P Pentland. Face recognition using eigenfaces. In Computer Vision and Pattern Recognition, 1991. Proceedings CVPR'91., IEEE Computer Society Conference on, pages 586–591. IEEE, 1991.
[184] Ales Ude, Chris Gaskett, and Gordon Cheng. Foveated vision systems with two cameras per eye. In Proceedings 2006 IEEE International Conference on Robotics and Automation, 2006. ICRA 2006., pages 3457–3462. IEEE, 2006.
[185] UltraCane. UltraCane: Putting the world at your fingertips. In http://www.ultracane.com/about_the_ultracane, 2016.
[186] Facundo Valverde. Structural changes in the area striata of the mouse after enucleation. Experimental Brain Research, 5(4):274–292, 1968.
[187] Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning, pages 1096–1103. ACM, 2008.
[188] Athanasios Voulodimos, Nikolaos Doulamis, Anastasios Doulamis, and Eftychios Protopapadakis. Deep learning for computer vision: A brief review. Computational Intelligence and Neuroscience, 2018, 2018.
[189] Johan Wagemans, Joeri De Winter, Hans Op de Beeck, Annemie Ploeger, Tom Beckers, and Peter Vanroose. Identification of everyday objects on the basis of silhouette and outline versions. Perception, 37(2):207–244, 2008.
[190] WHO. World Health Organization fact sheet. WHO, N°282, 2014.
[191] Bernard Widrow and Michael A Lehr. 30 years of adaptive neural networks: perceptron, madaline, and backpropagation. Proceedings of the IEEE, 78(9):1415–1442, 1990.
[192] Bodgan M Wilamowski, David Hunter, and Aleksander Malinowski. Solving parity-n problems with feedforward neural networks. In Neural Networks, 2003. Proceedings of the International Joint Conference on, volume 4, pages 2546–2551. IEEE, 2003.
[193] Laurenz Wiskott, Malte J Rasch, and Gerd Kempermann. A functional hypothesis for adult hippocampal neurogenesis: avoidance of catastrophic interference in the dentate gyrus. Hippocampus, 16(3):329–343, 2006.
[194] Haibing Wu and Xiaodong Gu. Max-pooling dropout for regularization of convolutional neural networks. In International Conference on Neural Information Processing, pages 46–54. Springer, 2015.
[195] Xiang Wu, Ran He, Zhenan Sun, and Tieniu Tan. A light CNN for deep face representation with noisy labels. IEEE Transactions on Information Forensics and Security, 13(11):2884–2896, 2018.
[196] Daniel LK Yamins, Ha Hong, Charles F Cadieu, Ethan A Solomon, Darren Seibert, and James J DiCarlo. Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proceedings of the National Academy of Sciences, 111(23):8619–8624, 2014.
[197] Jian Yang, David Zhang, Alejandro F Frangi, and Jing-yu Yang. Two-dimensional PCA: a new approach to appearance-based face representation and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(1):131–137, 2004.
[198] Xin Y ao. Ev olving artificial neural net w orks. Pr o c e e dings of the IEEE ,
87(9):1423–1447, 1999.
[199] Lauren t Y ounes. On the con v ergence of mark o vian sto c hastic algorithms with
rapidly decreasing ergo dicit y rates. Sto chastics: An International Journal of
Pr ob ability and Sto chastic Pr o c esses , 65(3-4):177–228, 1999.
[200] Sergey Zagoruyk o and Nik os K omo dakis. Wide residual net w orks. arXiv
pr eprint arXiv:1605.07146 , 2016.
[201] Matthew D Zeiler and Rob F ergus. Sto c hastic p o oling for regularization of deep
con v olutional neural net w orks. arXiv pr eprint arXiv:1301.3557 , 2013.
[202] Matthew D Zeiler and Rob F ergus. Visualizing and understanding con v olu-
tional net w orks. In Eur op e an Confer enc e on Computer Vision , pages 818–833.
Springer, 2014.
[203] X Zhao and SH Ong. A daptiv e lo cal thresholding with fuzzy-v alidit y-guided
spatial partitioning. In Pattern R e c o gnition, 1998. Pr o c e e dings. F ourte enth In-
ternational Confer enc e on , v olume 2, pages 988–990. IEEE, 1998.
[204] Haining Zhong, Gek-Ming Sia, T ak ashi R Sato, Noah W Gra y , Tian yi Mao,
Zaza Kh uc h ua, Ric hard L Huganir, and Karel Sv ob o da. Sub cellular dynamics
of t yp e ii pk a in neurons. Neur on , 62(3):363–374, 2009.
137