EYE-TRACE SIGNATURES OF CLINICAL POPULATIONS
UNDER NATURAL VIEWING
by
Po-He Tseng
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)
December 2012
Copyright 2012 Po-He Tseng
Dedication
To my friends and family, especially my mom in heaven.
Acknowledgements
Accomplishment rarely comes without the support of a team, and my PhD study
was no exception. First, I would like to thank my advisor, Dr. Laurent Itti, who
has set up a great research environment. There is a Chinese saying that perfectly
describes my advisor: "still waters run deep; knowledgeable people speak softly." I really appreciate his patience, his character, and his attitude of excellence, which allowed me to freely explore not only my research topics but also my personal interests. I am also impressed by the way he constructs arguments and frames things in a positive manner, and these are things I will continue to learn.
I would also like to thank all my collaborators, Dr. Ian Cameron, Dr. Doug
Munoz, Dr. James Reynolds, and Dr. Pari Giovanna for making my work possible.
It was such a great experience working with them, and they definitely made me feel very welcome whenever I visited Kingston, Canada. I would like to thank Ian in particular. He ran my experiments on children who could be hard to control, provided useful insights in interpreting the results, revised manuscripts in a timely fashion, and always responded immediately whenever I needed help.
Next, I want to thank all iLab members who helped create an intellectual but
relaxing research environment, and they certainly made my PhD a great experi-
ence. I would like to thank soon Dr. David Berg and soon Dr. Farhan Baluch in
particular. We have fought together pretty much for our whole PhD career. We
iii
not only share the same office, but also lots of good times and bad times. This is
a time that I will never forget in my life.
I would also like to thank all my friends in LA, especially PYPD (Peter-Yili
families), CS gang (Shelly, I-Ting, Michael, Ethan), and Ben Malcolmson. Peter
and Yili relentlessly took me to church every Sunday, even when I didn't want to go. Without their efforts, I could have drifted away from God. Besides taking me to church, they took me to most of the interesting places in Southern California. They never left me behind and never minded me being the third wheel. I really appreciate that. With the CS gang, we had so much laughter whenever we gathered. For Ben, I can't thank you enough for giving me the best distraction of my PhD career: photographing USC football and many other USC sports. I can't think of any better way to learn American culture, besides watching South Park. Thank you all
for making my life in LA colorful.
I thank my family: Kuo-Lieh Tseng, Chin-Mei Lin, Hsiu-Ching Tseng, Po-Yen Tseng, Amy Chin, and I-Ping Liao. Despite the great distance between Taiwan and the USA, your unconditional love and support are always with me. My PhD degree could never have been accomplished without you, and the title Dr. Tseng is for all of you.
I would like to thank my girlfriend Yu-Chin Chiu for her support, love, tolerance, and advice. I treasure all the moments with you, I am proud of you, and I am looking forward to our journey in the future.
Finally, I thank the National Science Foundation (CRCNS grant number BCS-0827764), the Army Research Office (grant numbers W911NF-08-1-0360 and W911NF-11-1-0046), the Human Frontier Science Program (grant RGP0039/2005-C), and the Canadian Institutes of Health Research (grant number ELA 80227) for supporting my studies.
Table of Contents
Dedication ii
Acknowledgements iii
List of Tables viii
List of Figures ix
Abstract xv
Chapter 1: General Introduction 1
1.1 Visual Attention . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.1 Bottom-up attention . . . . . . . . . . . . . . . . . . . . . . 3
1.1.2 Top-down attention . . . . . . . . . . . . . . . . . . . . . . . 4
1.1.3 Neural circuitry of visual attention . . . . . . . . . . . . . . 5
1.1.3.1 Dorsal frontoparietal network . . . . . . . . . . . . 6
1.1.3.2 Ventral frontoparietal network . . . . . . . . . . . . 7
1.1.3.3 Subcortical network . . . . . . . . . . . . . . . . . 8
1.2 Disorders that Impair Visual Attention . . . . . . . . . . . . . . . . . 9
1.2.1 Parkinson’s Disease . . . . . . . . . . . . . . . . . . . . . . . 10
1.2.2 Attention Deficit Hyperactivity Disorder . . . . . . . . . . . 11
1.2.3 Fetal Alcohol Spectrum Disorder . . . . . . . . . . . . . . . 13
1.3 Computational Models of Visual Attention . . . . . . . . . . . . . . 14
1.3.1 The goal: identify regions of interest . . . . . . . . . . . . . 15
1.3.2 Psychophysical theories of visual attention . . . . . . . . . . 15
1.3.3 Bottom-up attention models . . . . . . . . . . . . . . . . . . 17
1.3.4 Top-down attention models . . . . . . . . . . . . . . . . . . 20
1.4 Emergent Experimental Stimuli to Study Visual Attention: Natural
Scenes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.4.1 Why use natural scene stimuli? . . . . . . . . . . . . . . . . 24
1.4.2 Neuroscience community . . . . . . . . . . . . . . . . . . . . 24
1.4.3 Psychology community . . . . . . . . . . . . . . . . . . . . . 25
1.4.4 Computer Vision Community . . . . . . . . . . . . . . . . . 26
1.5 Identifying Individuals with Attention Deficits from Their Natural
Viewing Eye Traces . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.6 Document Organization . . . . . . . . . . . . . . . . . . . . . . . . 30
Chapter 2: High-Throughput Classification of Human Clinical Pop-
ulations from Eye Movements while Viewing Natural Scenes 32
2.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.3 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.3.1 Standard protocol approvals and patient consent . . . . . . . 37
2.3.2 Participants . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.3.3 Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.3.4 Classification and Feature Selection . . . . . . . . . . . . . . 43
2.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.4.1 Classifying PD and Controls . . . . . . . . . . . . . . . . . . 44
2.4.2 Classifying ADHD, FASD and Control Children . . . . . . . 47
2.4.3 Classification accuracies throughout the process of feature
elimination . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.4.4 Sub-features selected by the SVM-RFE process . . . . . . . 51
2.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
2.6 Neurological Implications . . . . . . . . . . . . . . . . . . . . . . . 56
2.6.1 Hypotheses and Choice of Feature Categories . . . . . . . . 56
2.6.2 Parkinson’s Disease (PD). . . . . . . . . . . . . . . . . . . . 58
2.6.3 ADHD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
2.6.4 FASD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
2.6.5 ADHD versus FASD . . . . . . . . . . . . . . . . . . . . . . 66
2.7 Methods in Details . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
Chapter 3: Differentiating Children with FASD from Age-matched
Controls by Natural Viewing Patterns 81
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
3.2 Materials and Methods . . . . . . . . . . . . . . . . . . . . . . . . . 83
3.2.1 Standard protocol approvals and patient consent . . . . . . . 83
3.2.2 Participants . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
3.2.3 Stimuli . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
3.2.4 Data acquisition. . . . . . . . . . . . . . . . . . . . . . . . . 87
3.2.5 Data analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 88
3.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
3.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
Chapter 4: General Discussion 96
4.1 Understanding disorders of broad clinical spectra with machine learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
4.2 Representing eye movement traces . . . . . . . . . . . . . . . . . . . 97
4.3 Correlating natural viewing behavior with functional and structural
imaging data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
4.4 Assisting confirmatory diagnoses . . . . . . . . . . . . . . . . . . . . 99
References 100
Appendix A
The Impact of Maturation and Aging on Mechanisms of Attentional Se-
lection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
Appendix B
Factors that Guide Attentional Allocation during Natural Viewing . . . . 129
List of Tables
2.1 Demographic data of participants in analysis (after removing ineligible participants). ODD, Oppositional Defiant Disorder; DD, Developmental Delay; LD, Learning Disability; NS, Noonan's Syndrome; MMR, Mild Mental Retardation. †Participants were not required to finish the entire 20-minute-long experiment in order to be included. ‡For the 2 child clinical populations, 'None' means the child had never taken medicine for the disorder. If they took medicine regularly but not on the day of the experiment, they were listed in the table. If they took medicine on the day of the experiment, they were removed from all analyses. For the PD population, participants were listed in the table and included in the analysis even if they took medicine on the day of the experiment. . . . . . . . . . . . . . 37
2.2 Look-up table for feature abbreviations, whose format is [feature] [measure] [saccade]. †After onset of snippets . . . . . . . . . . . . . 42
3.1 Demographic data of children with FASD and control children. OCD: obsessive-compulsive disorder. . . . . . . . . . . . . . . . . . 86
List of Figures
1.1 Neural circuitry of visual attention. The retina sends visual infor-
mation to the SC and the visual areas. The visual information then
travelsfromthevisualareastothedorsalfrontoparietalnetworkand
the ventral frontoparietal network. The two networks not only mod-
ulate the visual areas, but also extensively interact with the sub-
cortical network, which controls the eyes. LGN, lateral geniculate
nucleus; MT, medial temporal; IT, inferior temporal; IPS, intrapari-
etal sulcus; LIP, lateral intraparietal; FEF, frontal eye field; SFG,
superior frontal gyrus; PFC, prefrontal cortex; MFG, medial frontal
gyrus; VFC, ventral frontal gyrus; TPJ, temporal parietal junction;
SC, superior colliculus. (see section 1.1.3 for the function of these
regions) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Parkinson’s disease causes degeneration of dopaminergic neurons in the substantia nigra pars compacta (SNpc) in the basal ganglia. . . . 10
1.3 (a) Attention Deficit Hyperactivity Disorder (ADHD) impairs the
processes and communications in the frontal cortex and/or basal
ganglia. (b) Fetal Alcohol Spectrum Disorder (FASD) influences
the brain globally and results in malformation of brain structures,
reduction of overall brain size, and decrease of white-matter volumes. 12
1.4 (a) The Koch-Ullman attention model (Adapted from [109]). (b)
The Guided Search Model by Wolfe (Reprinted from [233] with kind
permission from Springer Science and Business Media). . . . . . . . 16
1.5 Taxonomy of visual attention models by Borji and Itti (Reprinted from [20] © 2012 IEEE). . . . . . . . . . . . . . . . . . . . . . . . . 17
1.6 Itti’s saliency model (Reprinted from [95] © 1998 IEEE). . . . . . 19
1.7 (a) Contextual guidance model by Torralba and Oliva (Reprinted from [214] with permission). (b) Task-driven model by Peters and Itti (Reprinted from [162] © 2007 IEEE). . . . . . . . . . . . . . . 22
2.1 Experimental and classification paradigms. (a) Participants freely
viewed scene-shuffled videos (SV) and their eye movements were
recorded. Saliency maps of each SV were computed using a com-
putational model that mimics early visual processing. Next, we used
the recorded eye movements to compute (1) oculomotor-based sac-
cade metrics, (2) saliency-based correlations between saliency maps
and gaze (bottom-up attention), and (3) group-based similarities in
spatiotemporal distributions of gaze with reference to a database of
control eye traces (top-down attention). These features were used
in a classifier with a recursive feature selection method to identify
important features that distinguished populations. (b) Ten saliency
maps of different features (color, intensity, etc.) were computed,
here illustrated for the video frame shown in (a) under “Saliency
maps”. Brighter shades of grey indicate stronger feature contrast
at the corresponding image locations; for example the red and yel-
low flowers between the two people elicit a strong response in the
color contrast map. (c) To compute saliency-based or group-based
features, each saliency map was sampled around the saccade target
location (red circle) when a participant initiated a saccade (red dot).
At the same time, 100 map values were randomly sampled from the
map as a baseline (blue circles), for comparison. (d) Histograms
were generated from both the human and random sample values.
(e) Differences between human and random histograms were further
summarized by ordinal dominance analysis to quantify the extent to
which human observers gazed towards higher salience values than expected by chance, in terms of the area under the curve (AUC, yellow
region) (see section 2.7 Computing Features for more detail). . . . . 40
2.2 Classification performance in differentiating PD patients from elderly controls at three granularities of starting feature sets: (a) all features, (b) the 3 feature types, and (c) the 15 core features (biometric signatures). (a) Starting with all 224 sub-features, PD patients were distinguished from elderly controls with 89.6% accuracy after feature selection (SVM-RFE). Each row in the confusion matrix represents actual classes, and each column predicted classes. (b) PD and elderly controls differed significantly in oculomotor (starting with 48 sub-features) and group-based behavior (16 sub-features), but not in saliency processing (160 sub-features). Asterisks between bars indicate cases where the classifiers performed significantly better than permuted chance (computed from training a classifier with randomly permuted class labels). Dashed line represents prior chance based on the number of controls and patients. (c) PD patients exhibited differences in saccade amplitude, duration, peak velocity, inter-saccade interval, intensity variance processing, texture saliency processing, and similarity to normative young observers. This pattern of differences yields the 15-component biometric signature of PD. Dashed line is the prior chance. Background colors separate oculomotor-based, saliency-based, and group-based features from left to right. (Error bars indicate 95% confidence intervals after Bonferroni corrections. Significance level: p<0.01, one-tail paired t-test (df=29).) . . . . . . . . . . . . 45
2.3 Classification performance for children with ADHD, FASD, and control children, for: (a) all features, (b) the 3 feature types, and (c) the 15 core features (biometric signatures). (a) Starting with all sub-features, children with ADHD, FASD, and control children were best classified with 77.3% accuracy (ADHD: sensitivity 80%, specificity 90%; FASD: sensitivity 73%, specificity 91%) after feature selection (MSVM-RFE). Format is as in Fig. 2.2. (b) Classifying the 3 child groups with different feature sets demonstrated that they differed significantly in saliency-based behavior (upper-left sub-plot). Children with ADHD differed from control children in saliency-based features, whereas children with FASD differed from controls in both saliency-based and group-based features, and children with ADHD and those with FASD could only be distinguished with all three feature types together. (c) The 15-component biometric signature of ADHD and FASD. Children with ADHD, compared to control children, demonstrated significantly different sensitivity in color contrast and oriented edges, as well as increased sensitivity to texture contrast. Children with FASD, in contrast, showed a different signature that involved differences in similarity to young observers in gaze distribution, sensitivity to line junctions, and sensitivity to overall salience, as well as increased sensitivity to texture contrast. Background colors separate oculomotor-based, saliency-based, and group-based features from left to right. (See Fig. 2.2 for the computation of chance level, error bars, statistical tests, and significance level.) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.4 (a) Classification accuracy for differentiating PD and elderly controls, plotted as a function of the number of selected features during SVM-RFE. Maximum classification accuracy (89.6%) was obtained with 5 features (black arrow). Shaded region indicates mean ± 1 standard deviation over the repeated leave-one-out bootstrap validation. Chance performance (classification accuracy with permuted class labels, 52.0%) is indicated by the dashed curve. (b) Classification accuracy for differentiating ADHD, FASD, and control children is plotted as a function of the number of selected features during MSVM-RFE. The red line indicates the overall classification accuracy for differentiating the 3 populations, and the classifier reaches peak performance (77.3%) with 19 features (black arrow). The blue, green, and cyan lines are classification accuracies for classifying each pair of populations. The black dashed line is the chance level for the overall classifier. Shaded regions indicate mean ± 1 standard deviation. Chance performance (32.4%) is indicated by the dashed curve. . . . . . . . . 52
2.5 Sub-features selected by SVM-RFE. (a) Normalized values for the top 5 ranked sub-features selected by SVM-RFE for PD. Sub-features' names are displayed in grey (see Table 2.2 for the interpretation of the names), and their corresponding core features' names are in black. Feature values are standardized z-scores filtered by an arctangent function. Rows represent the top 5 ranked sub-features. Columns represent 38 participants, and the white vertical line separates the 2 populations. (b) Normalized feature values for the top 19 ranked sub-features selected by MSVM-RFE that best classified children with ADHD, FASD, and control children. Note that most of the sub-features discovered by the classifiers belonged to the saliency-based feature type. Features and participants were re-arranged so that high feature values were better clustered at the diagonal of the plot. (c) 14 PD and 24 elderly controls were separated into 2 different clusters as revealed by Linear Discriminant Analysis (LDA), which finds the dimensions (L1 and L2) from the top 5 sub-features in (a) that best distinguished the two groups. (d) Similarly, LDA found the 3 dimensions (L1, L2, and L3) from the top 19 sub-features in (b) that best differentiated every pair among 21 children with ADHD, 13 children with FASD, and 18 control children. The 3 child groups are clearly separated in these dimensions, even though clusters in (b) are less visually distinct. (**, ANOVA p<0.01; *, ANOVA p<0.05) . . 53
3.1 (a) The process of extracting a saliency trace. Observers’ eye move-
ments were recorded while free viewing videos of natural scenes that
changed every 2-4 seconds. A saliency map was computed for the
corresponding video frame (see (b)), and the normalized map val-
ues at the gaze location were extracted along the eye trace. (b) Nine
saliency maps (color, intensity, etc.) of different visual attributes and
one map generated from the instantaneous gaze position of norma-
tive young adult observers, here illustrated for the first video frame
from (a), the elephants. Brighter color indicates locations of the
video frame with stronger feature contrast. . . . . . . . . . . . . . . 84
3.2 (a) One layer of the convolutional deep neural network, where the
simple units are the squared weighted sum of the receptive field, and
the pooling units compute the square root of the sum of adjacent
simple units. (b) Sixty-four (out of 128) randomly selected first-layer
bases learned from normative young adults’ saliency traces. . . . . . 91
3.3 (a) The classifier differentiated children with FASD from age-matched
controls with 73.1% accuracy. (b) Confusion matrix, sensitivity, and
specificity in differentiating patients from controls. . . . . . . . . . . 94
A.1 Developmental trajectory of (a) oculomotor saccade dynamics, (b)
low-level visual feature processing, and (c) overall attention deploy-
ment. (*, p< 0.05) . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
Abstract
A significant problem in the clinical diagnosis of certain neurobehavioral disorders is the overlap in observed behavioral deficits, which extensively complicates diagnosis by requiring additional neuropsychometric testing. This thesis proposes and validates a novel method to reliably differentiate normal controls from patients with neurobehavioral attention deficits (Attention Deficit Hyperactivity Disorder, Fetal Alcohol Spectrum Disorder, and Parkinson's disease). The method alleviates the need for complex task instructions and instead quantifies how these populations (both patients and controls) deploy their overt attention differently while they freely view natural scene videos. We used a computational model of visual salience to analyze the videos, and computed the correlations between salience and participants' eye movements. These correlations, as well as saccade statistics and inter-observer similarities, were then fed into classifiers, which not only reliably differentiated the populations but also identified the most discriminative features. The proposed method can be used easily with populations less able to follow structured tasks, and its low-cost and high-throughput nature makes it viable as a unique new quantitative screening tool for clinical disorders. Moreover, the most discriminative features discovered by the method provide insights into the effects of these disorders on several aspects of attention and gaze control. We believe that this report is the first to show that there is a latent signature of the disease that affects everyday behavior and is detectable by our algorithms. In addition, this signature is expressed in terms of the basic features of early attentional processing, which we believe, based on our previous work, will interest a broad community of neuroscience, psychology, and computational researchers.
Chapter 1
General Introduction
Attention enables us to interact with complex environments by selecting relevant
information to be processed in the brain. Top-down (goal-oriented) control of at-
tention allows us to focus on tasks at hand without being distracted, while bottom-
up (stimulus-driven) attention keeps us aware of changes in the environment. To
properly allocate visual attention, a network of brain resources is engaged, from
low-level visual processing to motor control of gaze orienting [44]. This renders
visual attention vulnerable to neurological disorders. Several neuropsychological
studies have demonstrated that damage in any area of the attentional network can
impair performance in laboratory tasks that test for specific aspects of attentional
control [42,171]. We hypothesize that these deficits can be observed in the natural viewing behavior of patients with deficits in attention and oculomotor control, and that the disorder causing the deficits can be decoded from their natural eye movements.
In this introduction, section 1.1 first introduces the circuitry in the brain that
controls visual attention. Second, section 1.2 describes how the circuitry may be
impaired by neurological disorders. Third, section 1.3 reviews several important
computational models of visual attention that may be useful to quantify how patients and normative controls deploy their attention differently. Fourth, section 1.4 describes recent progress in using natural stimuli and the free-viewing paradigm in experiments to study the visual system and attentional control. Fifth, section 1.5 presents the benefits of identifying individuals with attention deficits from their natural eye movements. Finally, section 1.6 concludes the general introduction with the organization of the dissertation.
1.1 Visual Attention
Attention is a mechanism that selects relevant visual inputs for further processing
in the brain. We receive huge amounts of sensory information from the environment at every moment, but the brain cannot process it all because of its limited computational power. Attention is therefore critical for selecting the most relevant information from the sensory data, reducing the computational cost and improving our behavioral performance. Conversely, when attention is directed to irrelevant locations, say by a magician, it is difficult for the audience to comprehend the trick.
Visual attention is sometimes thought of as a spotlight in the dark, and a detailed understanding of a scene can only be achieved by moving this spotlight around [194,197,216]. At locations under the spotlight, studies have shown that neural processing is enhanced [34,103,134,142], contrast sensitivity is increased [175], neural responses are multiplicatively modulated [230], receptive field properties are changed dynamically [40,82,206,235], and, last but not least, neural responses at unattended locations are inhibited [140,176]. This enhanced neural processing leads to improved behavioral performance, such as better feature discrimination [119], higher spatial resolution [241], and improved signal detectability [29,83].
It is worthwhile to mention that attention and eye movements can be dissociated. That is, when one attends to a stimulus, one does not need to look straight at it (covert attention) [89,166]; however, attention and gaze usually go together (overt attention) in daily life. The advantage of overt attention is that the location of interest falls on the fovea, which has much higher spatial resolution than the rest of the retina, so that one can process the visual input in finer detail.
Researchers have long investigated how attention selects the most relevant information. A classic visual search paradigm was designed to study this question [216]. The experimental setup typically involves an array of visual stimuli. One of the stimuli is the target defined by the experimenter, and the rest of the stimuli are distractors. The observer is asked to find the target as fast as possible, and the target therefore becomes the most relevant visual information to be selected. By manipulating the properties of the target and the distractors and the size of the search array, two distinct modes of attention were demonstrated: bottom-up and top-down attention.
1.1.1 Bottom-up attention
The first mode is bottom-up attention, where attention is driven by salient and conspicuous external stimuli, such as a flashlight in the dark or a jack-in-the-box popping out. These salient external stimuli differ spatiotemporally from their surroundings, and one's attention is captured automatically, quickly, and involuntarily. Bottom-up attention is observed in the visual search experiment when the target in the search array is distinct from the distractors in one or more feature dimensions (e.g., a red bar among green bars; a pop-out search). In this case, the observers find the target very efficiently, and the time they spend searching for the target does not increase with the number of distractors in the search array [216]. Several computational models of bottom-up attention have been proposed. These models take external stimuli as inputs (images/videos), compute a topographic map of conspicuous locations across various features (a saliency map), and guide attention based on the saliency map [95,109,126].
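The core computation behind such saliency maps can be illustrated with a toy center-surround sketch. This is a hypothetical, minimal example in plain Python with invented function names and scales, not the actual models of [95,109], which use Gaussian pyramids, multiple feature channels, and across-scale normalization:

```python
# Toy center-surround saliency sketch (illustrative simplification only):
# local ("center") minus broader ("surround") mean intensity, so regions
# that differ from their surroundings score high.

def box_blur(img, radius):
    """Mean filter over a (2*radius+1)^2 neighborhood, clamped at borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, n = 0.0, 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        total += img[yy][xx]
                        n += 1
            out[y][x] = total / n
    return out

def saliency_map(intensity):
    """Center-surround contrast |fine blur - coarse blur|, normalized to [0, 1]."""
    center = box_blur(intensity, 1)    # fine scale ("center")
    surround = box_blur(intensity, 3)  # coarse scale ("surround")
    h, w = len(intensity), len(intensity[0])
    sal = [[abs(center[y][x] - surround[y][x]) for x in range(w)] for y in range(h)]
    peak = max(max(row) for row in sal) or 1.0
    return [[v / peak for v in row] for row in sal]

# A 9x9 uniform scene with a single bright "pop-out" patch at (4, 4):
scene = [[0.0] * 9 for _ in range(9)]
scene[4][4] = 1.0
sal = saliency_map(scene)
print(round(sal[4][4], 2), round(sal[0][0], 2))  # patch region peaks; background is flat
```

In a full model, such contrast maps would be computed for many features (color, intensity, orientation, motion) and combined into a single saliency map that predicts where gaze is drawn.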
1.1.2 Top-down attention
The second mode is top-down attention, where attention is guided internally by goals or tasks to achieve, such as writing this thesis (internal goal) while ignoring crying kids in the neighborhood (external distraction). Top-down attention can direct attention in various ways. For example, someone looking for a red pen can direct attention to the top of the desk (spatial attention [13,60,165,240]), to the color red (feature-based attention [74,128,216]), or to all the pens (object-based attention [9,55,58,185]). In a visual search experiment, when the target can only be distinguished from the distractors by a conjunction of features (conjunction search), attention is needed to integrate those features. Top-down attention then directs the observer's attention to each stimulus, one by one, to check whether it is the target. This search is inefficient, and the search time increases with the size of the search array [216]. However, if the target is cued to the observer (the observer knows what the target looks like), the time to detect the target is reduced, possibly due to enhanced target features [11]; on the other hand, if the cue is invalid, the observer takes longer to find the target (Posner cueing paradigm [165]). This cueing experiment demonstrates how top-down attention can affect one's behavior.
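The qualitative difference between the two search modes can be summarized in a toy reaction-time (RT) model. The parameter values below are illustrative guesses, not values fit to the data of [216]; the point is only the shape of the set-size functions: flat for pop-out search, linearly increasing for serial conjunction search.

```python
# Toy model contrasting efficient pop-out search with inefficient
# conjunction search. Parameter values are hypothetical, for illustration.

BASE_RT_MS = 250.0    # assumed perceptual + motor overhead
ITEM_COST_MS = 40.0   # assumed time to inspect one item serially

def popout_rt(set_size):
    """Single-feature target: detected in parallel, so RT ignores set size."""
    return BASE_RT_MS

def conjunction_rt(set_size):
    """Conjunction target: items inspected one by one; a self-terminating
    search checks half the display on average."""
    return BASE_RT_MS + ITEM_COST_MS * set_size / 2.0

for n in (4, 8, 16):
    print(n, popout_rt(n), conjunction_rt(n))
# → 4 250.0 330.0
# → 8 250.0 410.0
# → 16 250.0 570.0
```

The flat versus positive RT-by-set-size slope is the standard diagnostic for whether attention is deployed in parallel or serially in a given search task.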
Figure 1.1: Neural circuitry of visual attention. The retina sends visual infor-
mation to the SC and the visual areas. The visual information then travels from
the visual areas to the dorsal frontoparietal network and the ventral frontoparietal
network. The two networks not only modulate the visual areas, but also exten-
sively interact with the subcortical network, which controls the eyes. LGN, lateral
geniculate nucleus; MT, medial temporal; IT, inferior temporal; IPS, intrapari-
etal sulcus; LIP, lateral intraparietal; FEF, frontal eye field; SFG, superior frontal
gyrus; PFC, prefrontal cortex; MFG, medial frontal gyrus; VFC, ventral frontal
gyrus; TPJ, temporal parietal junction; SC, superior colliculus. (see section 1.1.3
for the function of these regions)
1.1.3 Neural circuitry of visual attention
Top-down and bottom-up attention are controlled by roughly three interconnected networks (Fig. 1.1). The first, the dorsal frontoparietal network, selects sensory information based on one's current goals and expectations. The second, the ventral frontoparietal network, responds to unexpected but behaviorally relevant sensory information. The third, the subcortical network, integrates top-down and bottom-up selections and generates the final overt/covert attention shift. The three systems interact seamlessly to enable us to select appropriate sensory information and react to our complex environment.
1.1.3.1 Dorsal frontoparietal network
To illustrate how these networks interact, let’s look at the visual system. When a
visual stimulus is presented, the visual information travels from the retina to visual
cortex, which extracts features such as oriented edges, color contrast, and motion.
This information then travels from there to the dorsal frontoparietal network, which
is mainly composed of the frontal eye field (FEF), prefrontal cortex (PFC) and
medial frontal gyrus (MFG) in the frontal cortex, as well as the intraparietal sulcus
(IPS) in the parietal cortex (which is equivalent to the lateral intraparietal regions
(LIP) in macaque). These areas are strongly connected both functionally and
anatomically.
These brain areas generate and maintain top-down signals, and they also modulate visual areas to facilitate the selection of goal-relevant stimuli in several ways. For example, during a visual search task, the signal enhances neuronal responses to a specific feature of the target (feature-based attention, e.g., enhancing responses to red when looking for a red apple). Besides biasing responses to features, the signal also biases responses to locations (spatial attention) by speeding visual processing when a target appears at the expected location [165,239]. Other studies have shown a causal relationship between behavioral responses and the top-down attention signal by artificially inducing behavioral responses via microstimulation of the PFC [158], FEF [139], and LIP [47], which leads to biased target selection. Besides microstimulating these regions, Rossi et al. [182] surgically lesioned the PFC of two macaques, which then showed severe deficits in tasks in which goals switched across trials, but not in tasks in which goals remained the same. This finding suggests a role for the PFC in generating top-down
signals and adapting to task demands. Besides the PFC, which performs many executive functions and generates the top-down signal, the FEF and IPS take the signal and compute a so-called attention map or priority map, a topographic map in which locations are prioritized to guide attention [14,209]. Moreover, the FEF can take the map information and directly initiate or suppress a saccade (overt attention shift), as it connects directly to the saccadic premotor circuit [188]. Note that the map reflects not only top-down but also bottom-up signals, because the FEF and IPS also receive information from the external environment. Unlike the PFC, FEF, and IPS, the MFG plays quite a different role: it may act as a communication channel between the dorsal and ventral networks, because its spontaneous activity is correlated with the two functionally distinct systems [66].
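The priority-map idea described above can be made concrete with a toy computation. This is purely illustrative: the maps, values, and the additive combination rule are assumptions for the sketch, not claims about FEF/IPS physiology.

```python
# Toy priority map: combine a bottom-up salience map with a
# top-down goal map and pick the winning location (argmax).
# The additive combination rule is an assumption for illustration.

bottom_up = [  # stimulus-driven salience, e.g. local contrast
    [0.1, 0.9, 0.2],
    [0.3, 0.2, 0.1],
    [0.2, 0.1, 0.4],
]
top_down = [  # goal relevance, e.g. an expected target location
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.8],
]

priority = [[b + t for b, t in zip(brow, trow)]
            for brow, trow in zip(bottom_up, top_down)]

# Winner-take-all: attend to the most prioritized location.
winner = max(((r, c) for r in range(3) for c in range(3)),
             key=lambda rc: priority[rc[0]][rc[1]])
print(winner)  # top-down bias shifts the winner to (2, 2)
```

In real models, the combination rule and the relative weighting of the two maps are themselves research questions.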
1.1.3.2 Ventral frontoparietal network
The visual cortex sends information not only to the dorsal frontoparietal network but also to the ventral frontoparietal network, which is mainly composed of the temporoparietal junction (TPJ) in the parietal cortex, as well as the ventral frontal cortex (VFC) and medial frontal gyrus (MFG) in the frontal cortex. The ventral system is lateralized to the right hemisphere, and these regions are highly activated when humans passively (i.e., with no response required) observe sudden changes in the environment [54].
However, when humans perform tasks actively, not all salient changes or events in the environment activate these regions. Only changes that are unexpected and task relevant activate the ventral system, even if they are not very salient [51,92,105,189]. This event-filtering mechanism is hypothesized to originate from the dorsal network or the frontal cortex, which continuously suppresses the ventral network unless the event is task relevant [43,193,212]. Therefore, both the
dorsal and ventral systems respond to unexpected but important events. Moreover, the timing of their responses to a novel stimulus is indistinguishable [43], and disruptions (e.g., by transcranial magnetic stimulation) or lesions to either system decrease the ability to detect unexpected but relevant events [31,42,59,84,137]. In summary, the ventral system is activated along with the dorsal system by unexpected but relevant events. Although the ventral system may not initiate the event response, it is required to detect these events.
1.1.3.3 Subcortical network
The visual information and attention signals from the dorsal and ventral systems also travel to the subcortical network, in which three structures are strongly involved in attentional control: the superior colliculus (SC), basal ganglia (BG), and thalamus. The SC receives top-down signals from the dorsal network (the FEF and LIP) and bottom-up signals from the retina and visual cortex. In addition to receiving these signals, the SC can directly initiate or suppress a saccade because it projects directly to the saccadic premotor circuit [147]. These dual roles (sensory and motor) make the SC an ideal structure to compute the final saliency/priority map that guides attention [63]. This idea is supported by studies that biased target selection [26] and enhanced performance in perceptual tasks [143] by microstimulating the SC.
Unlike the SC, the BG and thalamus not only receive signals from the dorsal and ventral systems, but also actively interact with them. For example, the BG first receives reward information from the frontal cortex, encodes the reward signals to bias attentional selection, and then sends both reward and attention signals to the SC and the frontal cortex via the pulvinar [18,76]. The pulvinar, part of the thalamus, acts as a hub that facilitates communication and enhances coordination among a large number of brain areas, such as the visual, parietal, frontal, and temporal cortices, as well as the SC [181].
The pulvinar is also involved in attentional modulation [192]. For example, when the pulvinar is lesioned, top-down attentional control is disrupted, and patients fail to discriminate targets from distractors if the distractors are salient [200]. Besides relaying attention signals to modulate other brain regions, the thalamus is also directly modulated by attention signals (e.g., in the lateral geniculate nucleus and the thalamic reticular nucleus [135]). In summary, the subcortical network strongly interacts with the dorsal and ventral systems by integrating bottom-up and top-down attention signals, encoding reward signals, and relaying information from one region to another.
1.2 Disorders That Impair Visual Attention
Neuroscientists have done a tremendous job of mapping brain structures to functions, but figuring out how and where the brain is impaired by behaviorally defined disorders, such as Autism Spectrum Disorder, depression, anxiety, and Attention Deficit Hyperactivity Disorder, remains a work in progress. Most of these disorders co-occur with some form of attention deficit, which is not surprising because a large network of brain regions is engaged to properly allocate visual attention (section 1.1.3). This renders visual attention vulnerable to neurological disorders.
In this chapter, one neurodegenerative and two neurodevelopmental disorders are reviewed because (1) they have been shown to involve deficits in visual attention and oculomotor function, and (2) they are directly involved in the studies in Chapters 2 and 3. The three disorders are Parkinson's Disease (PD), Attention Deficit Hyperactivity Disorder (ADHD), and Fetal Alcohol Spectrum
Figure 1.2: Parkinson's disease causes degeneration of dopaminergic neurons in the substantia nigra pars compacta (SNpc) in the basal ganglia.
Disorder (FASD). This review focuses on the relationship between the attention deficits and the damage to the attention circuitry of the brain induced by the three disorders.
1.2.1 Parkinson’s Disease
PD is a neurodegenerative disorder characterized by degeneration of dopaminergic neurons in the substantia nigra pars compacta (SNpc) (Fig. 1.2), affecting the motor loop of the basal ganglia, which subsequently impairs body movement (e.g., tremor, bradykinesia) and oculomotor movement (e.g., slower and shorter saccades). Early diagnosis of PD is of critical importance because the onset of dopaminergic neuronal loss precedes the first clinical motor symptom by approximately 4-6 years [62,130,141], and by that time as many as 58-64% of these neurons have already been lost [210,211].
Besides impairing the motor loop, PD also affects the prefrontal, premotor, motor, and parietal cortices [10,41,231], leading to deficient attention guidance. This cognitive deficit is demonstrated by the pro/antisaccade task. In the prosaccade task, an observer first fixates straight ahead at a fixation point, then makes a saccade toward a target when it suddenly appears in the peripheral visual field. This type of saccade is considered a reflexive, visually guided, bottom-up-driven saccade. In the antisaccade task, when the target suddenly appears, the observer must inhibit the automatic response to the target and make a saccade in the opposite direction. These saccades are considered goal-oriented, top-down-driven saccades. If an observer is slow to respond or makes more direction errors in the antisaccade task, this implies difficulty inhibiting the automatic response and possibly deficient top-down attention. Unfortunately, this is what has been observed in PD patients. However, PD patients respond faster (shorter reaction times) than elderly controls when making visually guided, bottom-up-driven saccades. Taken together, these two observations suggest that PD patients are more stimulus-driven [2,21,25,90,178].
1.2.2 Attention Deficit Hyperactivity Disorder
ADHD in childhood is characterized by a delay in cortical maturation, dysfunctional dopamine transmission in the frontal cortex and/or basal ganglia [73,123,220], reduced volume and cortical thickness in several frontal and subcortical regions (reviewed in [114,184]), and decreased activity in frontal and striatal regions [3,186,195,242] (Fig. 1.3a). These deficits result in various impairments of attentional control, which lead to three subtypes of ADHD: hyperactive, inattentive, and combined [5]. However, the current criteria for defining subtypes are heavily
Figure 1.3: (a) Attention Deficit Hyperactivity Disorder (ADHD) impairs processing and communication in the frontal cortex and/or basal ganglia. (b) Fetal Alcohol Spectrum Disorder (FASD) influences the brain globally and results in malformation of brain structures, reduction of overall brain size, and decreased white-matter volume.
criticized because of the heterogeneous symptoms of patients with ADHD [152]. At the same time, most research treats all patients with ADHD as a whole, regardless of their subtype.
Nevertheless, besides impairing executive control, these structural and functional impairments in the brain result in a characteristic behavior of ADHD patients: difficulty inhibiting premature responses, which makes patients appear more stimulus-driven [102,145]. This difficulty has been demonstrated consistently across studies using the pro/antisaccade task. ADHD patients make more premature saccades (saccades initiated before the target appears), and exhibit longer reaction times and greater intra-subject variance in both tasks. Most importantly, ADHD patients make more direction errors only in the antisaccade task, which reveals their difficulty with inhibitory control. Moreover, the pro/antisaccade
performance of ADHD patients has never matched that of their age-matched controls [102,107,145]. Oculomotor function (e.g., peak saccade velocity and smooth pursuit) seems relatively unimpaired, though previous studies have reported inconsistent differences compared to controls [102].
1.2.3 Fetal Alcohol Spectrum Disorder
Patients with either ADHD or FASD demonstrate comparable deficits in visual
attention tasks [110,170,222]; however, the underlying causes of these symptoms
may be different in the two disorders.
FASD is caused by excessive maternal alcohol consumption, which impacts the brain globally and results in reduced overall brain size [133]; decreased white-matter volumes, increased gray-matter density, and altered cortical surface of the temporal and parietal lobes [201,203,204]; and malformation of the frontal lobe, temporal lobe, parietal lobe, basal ganglia, corpus callosum, hippocampus, and cerebellum of the offspring [110,178,205,228] (Fig. 1.3b). The exposure to alcohol also causes characteristic patterns of craniofacial dysmorphology (short palpebral fissures, smooth philtrum, thin upper lip, flattened midface). Based on the presence of these facial features and other criteria (e.g., growth restrictions and structural and/or functional abnormalities of the central nervous system), FASD is categorized into three subtypes: fetal alcohol syndrome (FAS), partial FAS (pFAS), and alcohol-related neurodevelopmental disorder (ARND) [35]. Patients with FAS present the full facial features, patients with pFAS present some of the craniofacial features, and patients with ARND show few or no facial features. Studies suggest that the presence of the facial features correlates with the severity of patients' cognitive deficits [77,79] and the abnormalities of their brains [238].
The global impact of alcohol during development on cortical and subcortical structures causes a wide range of neurobehavioral deficits, affecting learning, memory, language, intelligence, executive function, motor function, visual-spatial ability, and attention (see [132] for a review). These patients' top-down attentional control is decreased [124], their bottom-up attention is weakened, possibly due to deficient visual sensory processing [28], and their oculomotor functions are impaired [101]. When the patients perform a pro/antisaccade task, they show longer and more variable saccadic reaction times and increased direction errors in both tasks [77], implying deficient attention guidance in both the top-down and bottom-up modes. Moreover, saccadic reaction times and direction errors increase from controls to ARND, pFAS, and FAS in both the prosaccade and antisaccade tasks, which demonstrates the correlation between the presence of the facial characteristics and the severity of the attention-control deficits.
See [132] for a comprehensive review of the neurobehavioral deficits of FASD patients, and for a comparison with patients with ADHD.
1.3 Computational Models of Visual Attention
As computational power has increased over the past decades, computational models of visual attention have been implemented for practical applications and have gained considerable interest. Psychologists and neuroscientists can now realize their theoretical models of attention computationally, so that their hypotheses can be tested and their predictions generated quantitatively. On the other hand, computer vision and robotics researchers use attention models as a quick filter to identify regions of interest, and then apply more costly algorithms (e.g., object recognition) only to these regions. In this section, the theories behind modeling visual attention are described, as well as how visual attention has been modeled computationally.
1.3.1 The goal: identify regions of interest
The main purpose of visual attention is to select the most relevant visual input to be processed in the brain. Therefore, all attention models must be capable of predicting regions of interest. These models should be validated quantitatively against human behavior, so that they can be compared quantitatively and can provide insights into human attentional selection processes.
1.3.2 Psychophysical theories of visual attention
The two most influential theories of visual attention in visual search are the Feature Integration Theory of Treisman and Gelade [216] and the Guided Search Model of Wolfe [234]. The Feature Integration Theory claims that primary visual features (e.g., color contrast, orientation) are processed pre-attentively and in parallel across the visual field. However, to integrate these features and identify an object, focused attention is required, and candidate objects can only be accessed serially. To support this claim, Treisman and Gelade designed search arrays in two conditions. In the first condition, the target differed from surrounding distractors in at least one feature dimension (pop-out search); in the second, the target could only be distinguished from distractors by a conjunction of multiple features (conjunction search). In the pop-out search, Treisman found that observers identified the target efficiently: their search time did not increase with the size of the search array. However, when observers performed a conjunction search, the search was inefficient, and search time increased with the size of the array.
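The qualitative reaction-time pattern this theory predicts can be sketched as a two-parameter model: a flat function of set size for pop-out (parallel) search, and a linear increase for conjunction (serial) search. The slope and intercept below are illustrative values, not fitted data.

```python
# Qualitative reaction-time pattern predicted by Feature Integration
# Theory: pop-out search is parallel (flat slope), conjunction
# search is serial (reaction time grows with set size).
# Slope and intercept values are illustrative, not fitted to data.

def search_rt(set_size, conjunction):
    base_ms = 400.0                          # perceptual + motor baseline
    slope_ms = 40.0 if conjunction else 0.0  # per-item serial cost
    return base_ms + slope_ms * set_size

for n in (4, 8, 16):
    print(n, search_rt(n, conjunction=False), search_rt(n, conjunction=True))
```

The flat versus positive slope of reaction time against set size is exactly the signature used to classify searches as efficient or inefficient.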
Figure 1.4: (a) The Koch-Ullman attention model (Adapted from [109]). (b) The
Guided Search Model by Wolfe (Reprinted from [233] with kind permission from
Springer Science and Business Media).
The Feature Integration Theory led Koch and Ullman [109] to the idea of feature maps and saliency maps, which serve as the framework of many computational models (Fig. 1.4a). A feature map represents the strength of a visual feature (e.g., 45° orientation) in a topographic manner, and all feature maps are computed in parallel. These feature maps are then integrated into a master saliency map that directs attention to the location of the strongest map response (winner-take-all).
However, neither the Feature Integration Theory nor the Koch-Ullman model specifies how to integrate top-down information, such as the appearance of the target being searched for. The Guided Search Model of Wolfe [234] (Fig. 1.4b) provides a way to do so, and explains the faster search times observed when observers know what the target looks like. Like the Koch-Ullman model, the Guided Search Model first computes all the basic feature maps. These feature maps are then given different weights and summed into a single activation map; the weights of the feature maps encode the top-down information. For example, if one is looking for a red apple, one can give the "red" feature map a higher weight. Because this activation map may have several peaks, Wolfe introduced an inhibition map that records visited locations to avoid repeated
Figure 1.5: Taxonomy of visual attention models by Borji and Itti (Reprinted from [20] © 2012 IEEE).
visits. Several later visual search models have tried to learn the weights in different ways to better predict regions of interest.
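A minimal sketch of this weighted-sum-plus-inhibition scheme follows. The feature maps, weights, and inhibition values are made-up illustrative numbers, and the "maps" are one-dimensional for brevity.

```python
# Guided-Search-style selection sketch: weight feature maps by
# top-down relevance, sum them into an activation map, and use an
# inhibition map to avoid revisiting locations. All maps and
# weights are illustrative.

feature_maps = {
    "red":      [0.1, 0.9, 0.8, 0.2],
    "vertical": [0.7, 0.1, 0.8, 0.3],
}
weights = {"red": 1.0, "vertical": 0.2}  # e.g. searching for a red item
inhibition = [0.0, 0.0, 0.0, 0.0]        # visited-location record

def next_fixation():
    activation = [
        sum(weights[f] * feature_maps[f][i] for f in feature_maps)
        - inhibition[i]
        for i in range(4)
    ]
    loc = max(range(4), key=lambda i: activation[i])
    inhibition[loc] = 10.0  # mark as visited so search moves on
    return loc

print(next_fixation(), next_fixation())  # visits the two reddest items first
```

Each call returns the currently most activated unvisited location, so repeated calls trace out a search path in decreasing order of weighted activation.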
As the Koch-Ullman model does not account for top-down information, it is a bottom-up attention model; the Guided Search Model, on the other hand, provides a way to integrate top-down information, so it is a top-down attention model. A taxonomy of models of visual attention is shown in Fig. 1.5. Top-down models can be further categorized into visual search models (to which the Guided Search Model belongs), context models, and task-driven models. Some of the most important models in these categories are presented here.
1.3.3 Bottom-up attention models
After the Koch-Ullman model was introduced, it became hard to find a pure bottom-up attention model, as researchers quickly learned that top-down information could easily be integrated by giving each feature map a different weight based on the target's appearance. Therefore, the models reviewed in this section are not pure bottom-up models; however, they either were important in advancing how the bottom-up saliency map is computed, or they do not focus on integrating top-down information.
Clark and Ferrier [37] were the first to implement the conceptual Koch-Ullman model. They built the attention model as part of an active vision system to control the cameras on their mobile robot. The Clark-Ferrier model first computes a set of feature maps, weights each map, and sums them linearly to generate a master saliency map. The strongest activation in the saliency map then guides the cameras. Interestingly, they designed the weights to decay over time so that attention can be shifted.
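The weight-decay mechanism can be sketched as follows. The decay rate and saliency values are illustrative assumptions, not the parameters of the original system.

```python
# Sketch of the Clark-Ferrier idea that gains on the saliency
# computation decay over time at the attended location, so
# attention (the camera) eventually shifts elsewhere.
# Decay rate and saliency values are illustrative.

saliency = [0.9, 0.7, 0.5]
gain = [1.0, 1.0, 1.0]   # per-location gain, decays when attended
DECAY = 0.5

fixations = []
for _ in range(4):
    loc = max(range(3), key=lambda i: saliency[i] * gain[i])
    fixations.append(loc)
    gain[loc] *= DECAY   # attending a location weakens it over time

print(fixations)  # attention cycles through the locations
```

Decay plays the same engineering role that inhibition-of-return plays in later models: without it, the system would fixate the single strongest location forever.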
Milanese [138] proposed another attention model a few years later, introducing several novel concepts, including the conspicuity map, the center-surround operation, and a non-linear way to generate the saliency map. The center-surround operation enhances the activity at the center and inhibits the activity at its neighbors. After this center-surround operation is applied to a feature map, the map becomes sparser as weak responses are inhibited; the resulting map is called the conspicuity map. All the conspicuity maps are integrated by a non-linear relaxation procedure. Milanese's model also includes a top-down attention component and an alerting system. The top-down component is an object detection system, which detects objects by matching against stored object models and gives rise to a top-down attention map. This map is integrated with the conspicuity maps by the same non-linear procedure, and the result is output as the saliency map. The alerting system is used to detect motion, and this motion/alerting map and the saliency map together decide the camera movement.
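A toy one-dimensional version of the center-surround operation shows how it sparsifies a feature map. The difference-from-neighbors rule below is a simplification for illustration, not Milanese's actual filters.

```python
# Center-surround sketch in 1-D: each location's response is its own
# activity minus the mean of its neighbors, with negative values
# clipped to zero. Uniform regions are suppressed and local peaks
# survive, yielding a sparser, conspicuity-like map.

def center_surround(feature_map):
    n = len(feature_map)
    out = []
    for i in range(n):
        neighbors = [feature_map[j] for j in (i - 1, i + 1)
                     if 0 <= j < n]
        surround = sum(neighbors) / len(neighbors)
        out.append(max(0.0, feature_map[i] - surround))
    return out

flat = [0.5, 0.5, 0.5, 0.5]   # uniform texture
peaky = [0.1, 0.1, 0.9, 0.1]  # isolated feature

print(center_surround(flat))   # uniform input is suppressed to zeros
print(center_surround(peaky))  # only the isolated peak survives
```

This is why a textured natural background behaves so differently from a blank one: surrounding activity feeds the inhibitory term.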
Itti et al. [95] present an attention model that is widely used by many research groups today. This model is also derived from the Koch-Ullman model, and incorporates the ideas of the conspicuity map and the center-surround operation from Milanese's model. In addition, Itti et al. propose multi-scale image pyramids to facilitate
Figure 1.6: Itti's saliency model (Reprinted from [95] © 1998 IEEE).
the center-surround operations on feature maps, and introduce a new normalization method that promotes feature maps with a few peak responses but suppresses feature maps with many peaks (Fig. 1.6). After each feature map is normalized, the feature maps of the same visual attribute (e.g., all the orientation feature maps) are summed across scales to compute conspicuity maps. The saliency map is the linear sum of all the conspicuity maps. A winner-take-all mechanism is used to select the most salient location. Once the most salient location has been visited, it is masked by an inhibition-of-return (IOR) mechanism so that attention can shift to the next winner. It is worth mentioning that although IOR has been demonstrated behaviorally [108], it may have little to do with how the brain prevents revisiting the same location. Nevertheless, IOR is widely included in attention models as an engineering solution [183].
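Two of these ingredients, the peak-promoting normalization and winner-take-all with IOR, can be sketched in one dimension. The (max - mean)^2 promotion factor below follows the spirit of the published normalization; pyramids, conspicuity maps, and all other machinery are omitted.

```python
# Minimal 1-D sketch of two ingredients of the Itti et al. model:
# (1) a normalization that promotes feature maps with one strong
# peak over maps with many comparable peaks, and (2) winner-take-all
# with inhibition-of-return.

def normalize(m):
    peak = max(m)
    mean_all = sum(m) / len(m)
    factor = (peak - mean_all) ** 2  # few peaks -> large factor
    return [v * factor for v in m]

one_peak   = [0.0, 1.0, 0.0, 0.0]
many_peaks = [1.0, 1.0, 1.0, 1.0]

saliency = [a + b for a, b in
            zip(normalize(one_peak), normalize(many_peaks))]

# Winner-take-all with inhibition-of-return: attend locations in
# decreasing saliency, masking each winner once it is visited.
scanpath = []
s = saliency[:]
for _ in range(2):
    w = max(range(len(s)), key=lambda i: s[i])
    scanpath.append(w)
    s[w] = float("-inf")  # IOR: suppress the attended location

print(scanpath)
```

Note how the map with many equal peaks is driven to zero by the normalization, so the unique peak of the other map dominates the saliency sum.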
1.3.4 Top-down attention models
There are three types of top-down attention models (Fig. 1.5): visual search models, context models, and task-driven models. They are reviewed in turn in this section.
Visual search models. The three attention models introduced in section 1.3.3 all belong to the category of visual search models, as they aim to find a target and have mechanisms to incorporate top-down information. One way to integrate such information is to change the weights of the feature/conspicuity maps before they are summed into the saliency map.
Navalpakkam and Itti [151] extend Itti's saliency model by changing the weights of both feature maps and conspicuity maps based on the image statistics of the target and the distracting background. The authors learn the weight of each feature and conspicuity map by maximizing a signal-to-noise ratio, the ratio of expected salience between the target and the distracting background. Similarly, the VOCUS goal-directed search system proposed by Frintrop et al. [68] also learns the weights from the properties of both target and background to speed up search. By tuning the weights of the maps with top-down information, regions that may contain the target can be identified quickly. More computationally expensive object recognition algorithms can then be applied to these regions to match the target.
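The signal-to-noise weighting idea can be sketched with made-up feature statistics. The simple target/background ratio below is a simplification of, not the exact formulation in, [151].

```python
# Sketch of SNR-based top-down weighting in the spirit of
# Navalpakkam & Itti: give each feature a weight proportional to how
# much more strongly it responds to the target than to the
# background. Feature statistics are made-up illustrative numbers.

# Mean response of each feature to the target vs. the distracting
# background (e.g. estimated from training images).
target_response     = {"red": 0.8, "green": 0.1, "vertical": 0.4}
background_response = {"red": 0.2, "green": 0.5, "vertical": 0.4}

weights = {f: target_response[f] / background_response[f]
           for f in target_response}

print(weights)  # "red" gets the highest weight
```

Features that respond equally to target and background (here, "vertical") get a neutral weight, and features dominated by the background are actively down-weighted.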
Another common way to integrate top-down information is to incorporate an object detection or recognition system to find the regions of interest that may contain the target. Besides Milanese's model [138], introduced in section 1.3.3, another interesting model was proposed by Backer et al. [7]. This two-stage model attempts to explain early and late attentional selection. The first stage, the early stage, uses a Koch-Ullman-like structure to compute a saliency
map. However, no winner-take-all is performed here, so that a few regions of interest remain on the map. The second selection stage, the late stage, then goes through all these salient locations and shifts attention to the one most likely to contain an object. The object is represented by a symbolic object file, and the authors define objects by continuity in shape, features, and position within a salient location.
Context (gist) models. The semantic content of a scene is another factor that influences the attentional selection process while searching for and recognizing a target. For example, while searching for a sofa in a room, one can focus the search on the floor and ignore the ceiling and walls. However, if the sofa is somehow floating in the air, which violates its typical configuration, it takes human observers longer to find it [12,52,86]. This configuration between objects and scenes can be learned implicitly through multiple exposures to similar displays [23,36,87,155]. Once the configuration is learned, human observers can recognize these scenes even before recognizing any object in them [187], which then triggers the attention bias automatically.
These observations led Torralba and Oliva to design a contextual guidance model [154,213] that incorporates contextual information into attention models (Fig. 1.7a). The contextual guidance model is based on coarse-level global statistical properties of a scene (the gist), rather than on objects in the scene. Therefore, the model does not need to parse a scene into objects, and it can work in parallel with bottom-up and object processing.
The contextual guidance model has two parallel pathways (Fig. 1.7a). The first pathway encodes local features using Gabor filters at different orientations and spatial frequencies, and the filter responses are integrated as in the Koch-Ullman framework. The other pathway encodes global features to represent the global statistics of the scene. Each value in the global feature vector is the summation of
Figure 1.7: (a) Contextual guidance model by Torralba and Oliva (Reprinted from [214] with permission). (b) Task-driven model by Peters and Itti (Reprinted from [162] © 2007 IEEE).
the responses of Gabor filters in different configurations. Given a task (e.g., looking for pedestrians), the contextual modulation is p(x|o,V_c): the probability of where one should look (x) given the task (o) and the scene (the global features V_c). This probability is learned in a Bayesian framework on a training dataset. The probability is then multiplied by the saliency map to produce a scene-modulated saliency map that guides attentional selection.
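The final modulation step reduces to a pointwise product of the two maps, sketched here with fabricated one-dimensional maps (the real model learns the prior and works on 2-D images).

```python
# Contextual-guidance sketch: a scene prior p(location | task, scene)
# multiplies the bottom-up saliency map, so salient regions in
# implausible places (e.g. pedestrians in the sky) are suppressed.
# Both maps are fabricated for illustration, not learned.

saliency    = [0.9, 0.4, 0.8]    # e.g. sky, horizon, road
scene_prior = [0.05, 0.6, 0.35]  # where pedestrians tend to appear

modulated = [s * p for s, p in zip(saliency, scene_prior)]
best = max(range(3), key=lambda i: modulated[i])
print(best)  # the salient-but-implausible sky region loses out
```

The multiplicative form means a region must be both visually salient and contextually plausible to win, which is exactly the behavior the model is designed to capture.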
Task-driven models. In the contextual models, the task is usually a given target category to be searched for. How, then, can the attentional selection process be modeled while one performs complex tasks, such as driving, cooking, or flying an airplane? Pioneering work on building a task-driven model was done by Peters and Itti [162]. The basic idea behind their model is to correlate the gist of the scenes with the eye movements (the overt attention) (Fig. 1.7b). The authors set up a video-game-playing experiment in which both eye movements and the corresponding video frames were recorded. To learn the correlation, they first compute the gist vector of each video frame from visual attributes such as color contrast, luminance contrast, orientations, temporal flicker, and motion. Then, least-squares linear regression is used to learn the correlation between the gist and the gaze position. During testing, two parallel pathways together determine the attended location. The first pathway is the Itti-Koch bottom-up model, and the second pathway is the top-down model that encodes the relationship between the scene and the gaze. The outputs of the two pathways are multiplied to generate the final attention guidance map. By examining the correlations between the gaze position and the outputs of the two pathways, one can examine how participants switch between top-down and bottom-up attention modes [161].
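The learning step reduces to least-squares regression from gist features to gaze. Below is a toy one-dimensional version with fabricated data; the real model regresses a full gist vector onto a 2-D gaze map.

```python
# Toy version of the Peters-Itti learning step: least-squares linear
# regression from a (here one-dimensional) "gist" feature of each
# frame to the recorded horizontal gaze position. Data are
# fabricated so that gaze = 2 * gist + 1 exactly.

gist = [0.0, 1.0, 2.0, 3.0]   # per-frame scene summary
gaze = [1.0, 3.0, 5.0, 7.0]   # recorded gaze position per frame

n = len(gist)
mean_g = sum(gist) / n
mean_y = sum(gaze) / n
slope = (sum((g - mean_g) * (y - mean_y) for g, y in zip(gist, gaze))
         / sum((g - mean_g) ** 2 for g in gist))
intercept = mean_y - slope * mean_g

def predict_gaze(g):  # top-down gaze prediction for a new frame
    return slope * g + intercept

print(slope, intercept, predict_gaze(4.0))
```

At test time, this top-down prediction would be combined multiplicatively with the bottom-up saliency output, as described above.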
A number of important computational models of visual attention have been reviewed in this section. For further reviews, please see [20,69,183].
1.4 Emergent Experimental Stimuli to Study Visual Attention: Natural Scenes
1.4.1 Why use natural scene stimuli?
Experiments using natural scenes have aroused considerable interest in the neuroscience, psychology, and computer vision communities because their common goal is to understand how our visual system works and how we deploy visual attention in the natural environment. Experiments using artificial stimuli (e.g., a white bar on a black background) have successfully characterized properties of the visual and attention systems; however, it is unclear whether these properties generalize to the natural environment. Therefore, it is critical to use natural stimuli to validate their functional significance, and to discover new properties of the visual and attention systems.
1.4.2 Neuroscience community
Taking the neuroscience community as an example, it has been shown that visual neurons respond differently to natural and artificial stimuli [64]. The response of the visual system is determined by the spatiotemporal properties of neurons' receptive fields (RFs), which have been characterized by studies using artificial stimuli. Recently, several studies have validated the basic spatial structure of the RF with natural stimuli [49,167,179,199,215]. However, with natural stimuli, studies have shown that the RFs have additional temporal and spatial inhibitory components, which are not observed with artificial stimuli [50]. This could be due to the plain background of artificial stimuli, which simply lacks the texture of natural stimuli that excites neighboring neurons to provide the inhibitory components [50]. Moreover, free viewing also changes the responses of visual neurons: they are less activated, and visual information is coded more efficiently (sparse coding), in comparison to viewing the same natural stimuli while maintaining fixation [72,225]. These studies have established connections between low-level visual processing and its functional relevance to the natural environment, and they have helped researchers understand how the visual system works naturally, by using natural stimuli and instructions.
1.4.3 Psychology community
The psychology community uses natural stimuli to validate and explore the factors that guide attention under natural viewing. The interaction between top-down and bottom-up attention has been studied extensively with artificial stimuli, but how the two jointly guide attention under natural viewing is still unclear. Using artificial stimuli, it has been established that attention is automatically attracted by abrupt onsets [239] and salient features [216] (bottom-up attention). Salient features seem to attract attention during natural viewing as well: studies have shown that the image patches people looked at had higher local contrast in color, intensity, and oriented edges than patches sampled randomly from the same images [174]. However, this correlation between fixated locations and high local contrast quickly diminished when observers were given a task (e.g., finding Waldo). This finding confirms that top-down attention can override bottom-up attention under natural conditions. Nevertheless, it is worth noting that although observers are free to deploy their own viewing agendas during free viewing (no given task), they look at locations that are fairly consistent across observers. This observation suggests that observers may watch natural stimuli in a more bottom-up fashion, or adopt similar agendas during free viewing. Besides validating findings obtained
with artificial stimuli, researchers also identified several other factors in natural
stimuli that guide attention, such as gaze cues and contextual information. Gaze
cues, which are the head and eye directions of the people in the presented images
or videos, affect attention allocation in natural viewing. If the object of interest is
located at the gaze direction, it is detected more rapidly and accurately than when
it is placed at other directions [116]. Besides gaze cues, contextual information
(gist) in natural stimuli has also been found to be a strong factor in guiding eye
movement. For example, an image of street scene gives observers a clue to look at
the road if they are interested in cars, or to look at the sidewalk if they are fascinated
by people. The gist of a scene is extracted very quickly from natural stimuli, in-
teracts with the observers’ knowledge of the scene, and affects their attention shift
even before their first eye movement is made [86,154]. For instance, if the object of
interest does not reside at locations that are consistent with the scene (incongruent
locations, e.g., a microscope on a stove), people spend more time finding the object
than when it is located at congruent places (e.g., a microscope on a table) [86].
1.4.4 Computer Vision Community
For the computer vision community, researchers focus on understanding visual at-
tention by computational modeling of the factors that guide attention allocation
(e.g. visual salience [96], gist [214], task [162]). These models aim to maximize
their predictive power over natural viewing eye traces (e.g., a sequence of fixation
locations) so that the models may better mimic the human attention allocation
process of the visual system, which selects the most important information in the
environment to accomplish goals and to survive. There are two major benefits of
the computational modeling approach. The first one is that it provides a quantita-
tive tool for testing scientific hypotheses using complex natural stimuli, which are
not as well controlled as artificial stimuli. Taking a computational model of visual
salience [95] for example, the model computes a topographic map of visual salience
(high local contrast in features such as color, intensity, oriented edges, flicker, and
motion), which has been used to quantify how well visual salience (an approximation
of bottom-up attention) is correlated with fixation locations in complex natural stimuli.
While it has been shown that visual saliency can predict fixations under natural
viewing significantly above chance, several other computational models that incorporate
task [162] or contextual information [214] predict fixation locations even
better than the saliency model alone. These models provide not only a way to test
hypotheses (e.g., whether contextual information is a significant predictor of fixation
locations), but also the functional significance of the hypotheses under natural conditions.
The second benefit of modeling for the computer vision community is that
these models can be used in real-world applications, such as video compression, target
detection, commercial evaluation, and medical assessment. Visual information
is overwhelming, and computers have limited computational power to process all of
it. Therefore, an attention-like mechanism is needed to select the most
critical information to process, so that the computer can maximize its performance
on its tasks. For video compression [93], video details are preserved at locations
that everyone looks at, while locations that people rarely look at are
compressed more heavily. For target detection (e.g., military structures in the desert) [125],
the models can be tuned to the properties of the target (e.g., enhancing the sensitivity
to oriented edges) and can quickly identify a few locations of interest for further
analysis, rather than analyzing the whole image. For commercial evaluation [53],
designers can get real-time feedback from the models about whether viewers look
at locations that the designers want them to pay attention to. For medical assess-
ment, the model can evaluate how well each feature is correlated to a participant’s
eye movement to learn whether he or she is deficient in processing certain features
(e.g., color for patients with Attention Deficit Hyperactivity Disorder, see Chapter
2).
In summary, the neuroscience, psychology, and computer vision communities
benefit from adopting natural stimuli. The neuroscience community has identified
spatial and temporal inhibitory components that were not found by artificial stim-
uli. The psychology community has discovered new factors that guide attention
such as gaze cues and contextual information with natural stimuli. The computer
vision community has modeled the attention allocation process to quantitatively
evaluate the importance of each factor that guides attention in a natural condition,
and has used these models for real-world applications.
1.5 Identifying Individuals with Attention Deficits from Their Natural Viewing Eye Traces
This introduction so far has introduced how the brain deploys visual attention,
how different disorders affect attentional selection by impairing different parts of
the attention circuitry, and how the attentional selection process is modeled by
computational models of visual attention with complex natural scenes. The next
natural step is to quantify how patients and age-matched controls deploy their
attention differently on natural scenes. This can be done by computing the correla-
tion between gaze positions and an attention model’s outputs, such as the saliency
maps, top-down maps, or context-modulated saliency maps, depending on the at-
tentional selection process under investigation. These correlations can be used to
classify patients and controls, and the correlations that are more useful in differentiating
the two groups imply that the corresponding attentional selection processes
may be more severely affected.
This natural viewing paradigm has three main benefits. The first advantage
of using this paradigm to classify patients is that the approach is task-free. In
other words, the paradigm does not require participants to either understand
or comply with task instructions; instead, the participants freely view natural video
clips while their eye movements are recorded. Therefore, this approach alleviates
the need for understanding task instructions, and hence the paradigm can be applied
to populations who have difficulties understanding or following tasks. Essentially,
as long as a participant is able to watch television and his or her eyes can be
tracked, this paradigm can be applied to him or her.
The second benefit is that a large amount of data is collected in a short period
of time. In a typical structured task like the pro/anti saccade task, participants
make only one response (saccade toward the left, or toward the right) every 2 to 3
seconds, which amounts to 1 bit of information. However, when participants freely view
natural scenes, they make 2 to 3 saccades per second, in all kinds of directions and
amplitudes. Moreover, the complex natural scene is rich in information, as each pixel
on the display contains 8 bits of color information, whereas the pro/anti saccade task
shows only one bright dot on the screen.
The third advantage is that the paradigm allows psychophysics researchers to
see whether the behavioral differences they find in patients under structured, labo-
ratory task can be generalized to the patients’ natural behavior. Structured tasks
are extremely valuable in identifying particular impairments, but the nature of
each task might dictate whether these disorders surface or not. For example, chil-
dren with Attention Deficit Hyperactivity Disorder (ADHD) fail to inhibit overt
orienting of attention to a visual target in the antisaccade task (i.e., they saccade
towards a briefly flashed peripheral target, instead of looking away from it as
instructed [145]); however, they can successfully inhibit saccades to salient distractors
in oculomotor capture tasks [222]. The inconsistency in these results indicates that
one cannot conclude from these tasks whether ADHD children are more stimulus-
driven. Therefore, our natural viewing paradigm may be used to investigate these
patients’ behavior under a more natural setup, and may help to settle some of the
issues like the one described above.
1.6 Document Organization
The following document presents two studies. The first study (chapter 2) proposes
and validates a novel method to reliably differentiate normal controls from patients
with neurobehavioral attention deficits (ADHD, FASD, and PD). This method
alleviates the need for complex task instructions and instead quantifies how these
populations (both patients and controls) deploy their overt attention differently
while they freely view natural scene videos. This proposed method can be used
easily with populations less able to follow structured tasks, and the low-cost and
high-throughput nature of the method makes it viable as a unique new quantitative
screening tool for clinical disorders. Moreover, the most discriminative features
discovered by the method also provide insights on the effects of disorders on several
aspects of attention and gaze control. The second study (chapter 3) extends the
first one by reducing the length of the experiment from 15 to 5 minutes, which
makes the paradigm work better on children. However, the method developed in
the first study is not well suited to the reduced amount of data, so a new method
was designed to classify children with FASD and controls in this short experiment.
This method also provides insights into how FASD affects several aspects of attention
and oculomotor control. Chapter 4 is the general discussion, which concludes these
two studies and lists some ideas for future work.
There are two chapters in the appendix. Appendix A describes how maturation
and aging affect attention and oculomotor control during natural viewing, using
a method similar to the first study without involving a classifier. Appendix B
investigates how several factors guide attention during free viewing of natural scenes,
and discusses how to better assess computational models of visual attention and
gaze using natural scene stimuli.
Chapter 2
High-Throughput Classification of Human
Clinical Populations from Eye Movements while
Viewing Natural Scenes
2.1 Summary
Many high-prevalence neurological disorders involve dysfunctions of oculomotor
control and attention, including Attention Deficit Hyperactivity Disorder (ADHD),
Fetal Alcohol Spectrum Disorder (FASD), and Parkinson’s Disease (PD). Previous
studieshaveexaminedthesedeficitswithclinicalneurologicalevaluation,structured
behavioral tasks, and neuroimaging. Yet, time and monetary costs prevent deploy-
ing these evaluations to large at-risk populations, which is critically important for
earlier detection and better treatment. We devised a high-throughput, low-cost
method where participants simply watched television while we recorded their eye
movements. We combined eye-tracking data from patients and controls with a
computational model of visual attention to extract 224 quantitative features. Using
machine learning in a workflow inspired by microarray analysis, we identified crit-
ical features that differentiate patients from control subjects. With eye movement
traces recorded from only 15 minutes of videos, we classified PD versus age-matched
controls with 89.6% accuracy (chance 63.2%), and ADHD versus FASD versus con-
trol children with 77.3% accuracy (chance 40.4%). Our technique provides new
quantitative insights into which aspects of attention and gaze control are affected
by specific disorders. There is considerable promise in using this approach as a
potential screening tool that is easily deployed, low-cost, and high-throughput for
clinical disorders, especially in young children and elderly populations who may be
less compliant with traditional evaluation tests.¹
2.2 Introduction
Visual attention and eye movements enable us to interact with complex environ-
ments by selecting relevant information to be processed in the brain. To properly
allocate attention, a network of brain resources is engaged, from low-level visual
processing to motor control of gaze orienting [44]. This renders visual attention
vulnerable to neurological disorders. Several neuropsychological and neuroimaging
studies have demonstrated that damage in different areas of the attentional network
can impair distinct aspects of task performance or can reveal unusual patterns of
brain activity, in laboratory tasks that test for specific aspects of attention [42].
However, while in-depth clinical evaluation, structured behavioral tasks, and neu-
roimaging are extremely valuable and are the current gold standard in identifying
particular impairments, they suffer from limitations which prevent their large-scale
deployment: time and cost by limited numbers of medical experts, and inability of
some patients (e.g., young children or some elderly) to either understand or comply
with structured task instructions, or with the testing machinery or protocol.
¹ The majority of Chapter 2 is reproduced from my work [217] with kind permission of Springer Science+Business Media.
Our core hypothesis is that natural attention and eye movement behavior - like
a drop of saliva - contains a biometric signature of an individual and of her/his
state of brain function or dysfunction. Such individual signatures, and especially
potential biomarkers of particular neurological disorders which they may contain,
however, have not yet been successfully decoded. This is likely because of the high
dimensionality and complexity of the natural stimulus (input space), of the stimu-
lus to behavior transfer function (brain function), and of the behavioral repertoire
itself (output space). We devised a simple paradigm that does not require expensive
machinery, involves no preparation and no cognitive task for participants, is com-
pleted in 15 minutes, is portable for use outside large medical centers, and (after
initial training of the machine-learning algorithms) autonomously provides detailed
decoding of an individual’s signature.
We validated our technique with one neurodegenerative and two neurodevelop-
mental disorders that have been shown to involve deficits in visual attention and
oculomotor functions. These deficits were exploited by our algorithm with features
corresponding to oculomotor control, stimulus-driven (bottom-up), and voluntary,
contextual (top-down) attention. We first tested the algorithm on elderly partici-
pants with the neurodegenerative disorder, Parkinson’s Disease (PD) and validated
the signature of PD discovered by our algorithm, because the behavioral deficits of
PD are well understood. In short, Parkinson’s disease is characterized by degenera-
tion of dopaminergic neurons in the substantia nigra pars compacta, affecting basal
ganglia processes, which subsequently impairs body movement (tremor, bradyki-
nesia) and oculomotor movement (slower and shorter saccades) [25,33,207]. PD
also impairs the prefrontal, premotor, motor, and basal ganglia networks [231], leading
to deficits in attention control; in particular, PD patients are less successful in
inhibiting automatic saccades to a salient stimulus compared to controls [25,33].
Therefore, we expected PD patients to show deficient oculomotor control, weakened
top-down control, and stronger bottom-up guidance in natural viewing.
Next, we tested the algorithm on the two neurodevelopmental disorders at the
other end of the age spectrum: Attention Deficit Hyperactivity Disorder (ADHD)
and Fetal Alcohol Spectrum Disorder (FASD). Patients with ADHD or FASD
demonstrate comparable deficits in visual attention tasks [78,79,102,110,145] but
for different reasons. ADHD in childhood is characterized by delayed cortical mat-
uration, dysfunction in dopamine transmission in the frontal cortex and/or basal
ganglia [220], and decreased activity in frontal and striatal regions [3,242]. These
deficits result in difficulties in inhibiting premature responses (weakened top-down
control), and thus patients appear more stimulus-driven (stronger bottom-up guid-
ance) [145]. Oculomotor function seems relatively unimpaired, though previous
studies have shown inconsistent findings [102]. On the other hand, FASD is caused
by excessive maternal alcohol consumption, which results in malformation of the
cerebral cortex, basal ganglia and cerebellum, and reduced overall brain and white-
matter volumes [110,178]. Deficits include impaired oculomotor functions [101],
decreased top-down attentional control [124], and weakened bottom-up attention
possibly due to deficient visual sensory processing [28].
The weakened bottom-up guidance of children with FASD could be a differen-
tial factor between FASD and ADHD, because children with ADHD appear to be
more stimulus-driven. For example, in pro/anti saccade tasks (where a prosaccade
requires participants to initiate an automatic eye movement to a visual stimulus,
and an antisaccade requires participants to make a voluntary eye movement in the
opposite direction) [147], children with ADHD or FASD both made more direc-
tional errors in the antisaccade task (implying difficulty in inhibiting automatic
responses), but only children with FASD made more directional errors and had
longer reaction time in the prosaccade task (implying weakened stimulus-driven
guidance) [79,145]. While diagnosis of some subtypes of FASD is often helped by
the presence of dysmorphic facial features [99], the majority of affected children do
not exhibit facial dysmorphology, and when these features are not obvious, there is a
significant risk of misdiagnosis with ADHD [46]. Thus, the differential classification
of ADHD versus FASD provides a difficult challenge for our method.
2.3 Methods
The experimental procedure is summarized in Fig. 2.1a. Participants’ eye traces
were recorded while watching 20 minutes of video. Participants were instructed
to “watch and enjoy the clips”. Five minutes of video were excluded from the
analysis due to different lengths of the clip snippets, for purposes beyond the scope
of this study (see section 2.7 Stimuli, and Data Acquisition for detail). Each 30-
second video clip was composed of 2-4 second clip snippets of unrelated scenes,
to minimize predictability and to emphasize attention deployment in new envi-
ronments. Saliency maps (Fig. 2.1b; topographic maps that predict the loca-
tions of visually conspicuous stimuli based on low-level image properties; section
2.7 Computing Saliency Maps from Stimuli) were computed for every frame [96],
and correlations between model-predicted salience values and measured human
saccade endpoints (gaze) were computed (Fig. 2.1c-e). Based on previous stud-
ies [3,25,28,33,78,79,101,102,110,124,145,178,207,220,231,242] of how the dis-
orders may affect eye movement, we extracted a large number of features from the
eye movement recordings (categorized into oculomotor-based, saliency-based, and
group-based features; see Methods: Features), and built a classifier to differentiate
patients and controls based on these features. We also analyzed the features for
biomarkers through recursive evaluation, selection and classification. Our workflow
was inspired by successful application of advanced machine-learning techniques to
microarray assay analysis [75], here using similar techniques for the first time in
high-throughput analysis of natural eye movement behavior.
2.3.1 Standard protocol approvals and patient consent
All experimental procedures were approved by the Human Research and Ethics
Board at Queen’s University, adhering to the guidelines of the Declaration of
Helsinki, and the Canadian Tri-Council Policy Statement on Ethical Conduct for
Research Involving Humans.
2.3.2 Participants
This study describes data collected from 21 children with ADHD, 13 children with
FASD, 14 elderly PD participants, 18 control children, 18 young controls, and 24
elderly controls (Table 2.1; section 2.7 Participants, Diagnostic Criteria).
Table 2.1: Demographic data of participants in the analysis (after removing ineligible
participants). ODD, Oppositional Defiant Disorder; DD, Developmental Delay;
LD, Learning Disability; NS, Noonan's Syndrome; MMR, Mild Mental Retardation.
†Participants were not required to finish the entire 20-minute-long experiment in
order to be included. ‡For the 2 child clinical populations, 'None' means the child
had never taken medicine for the disorder. If they took medicine regularly but
not on the day of the experiment, they are listed in the table. If they took
medicine on the day of the experiment, they were removed from all analyses. For the
PD population, participants were listed in the table and included in the analysis even
if they took medicine on the day of the experiment.
Category: Ctrl. Elderly | PD | Ctrl. Young | Ctrl. Child | ADHD | FASD
recruited n: 25 | 15 | 18 | 24 | 32 | 23
unfinished 20-minute-long experiment n†: 0 | 0 | 0 | 5 | 7 | 8
excluded n (<10 valid eye movement traces): 1 | 1 | 0 | 4 | 6 | 8
excluded n (medication on the day of experiment): 0 | 0 | 0 | 0 | 2 | 2
excluded n (to balance age): 0 | 0 | 0 | 2 | 3 | 0
n (analysis): 24 | 14 | 18 | 18 | 21 | 13
number of valid eye traces (time): 638 (19125.6 sec.) | 357 (10720.7 sec.) | 516 (15461.5 sec.) | 436 (13054.1 sec.) | 450 (13486.6 sec.) | 257 (7725.8 sec.)
number of valid eye traces / participant: 26.6 | 25.5 | 28.7 | 24.2 | 21.4 | 19.8
saccade frequency (number per second): 2.18 | 1.9 | 1.97 | 1.62 | 1.66 | 1.48
number of saccades: 41774 | 20338 | 30530 | 21098 | 22347 | 11402
age±SD (year): 70.33±7.53 | 67.43±6.62 | 23.17±2.60 | 10.67±1.82 | 11.19±1.83 | 12.31±2.10
male:female: 11:13 | 10:4 | 8:10 | 10:8 | 16:5 | 7:6
subtype/severity: PD: Hoehn and Yahr stage 2: 6, stage 2.5: 6, stage 3: 2; UPDRS motor: 25.17±7.22. ADHD: inattentive: 4, hyperactive: 0, combined: 19. FASD: FAS: 4, pFAS: 2, ARND: 7.
medication‡: PD: None: 0; Amantadine: 1; Clonazepam: 1; Entacapone: 1; Ldopa/carbidopa: 8; Ldopa/carbidopa-CR: 1; Pramipexole: 2; Ropinirole-HCl: 9. ADHD: None: 5; Non-stimulant: 2; Stimulant-LA: 7; Stimulant-SA: 7. FASD: None: 3; Anti-anxiety: 1; Anti-convulsant: 1; Anti-depressant: 2; Anti-psychotic: 7; Anti-hypertensive: 2; Non-stimulant: 1; Stimulant-LA: 6; Stimulant-SA: 2.
comorbidity: ADHD: LD: 3; ODD: 1; MMR: 1. FASD: ADHD: 10; Anxiety: 5; Bipolar: 1; Conduct: 1; DD: 2; Depression: 1; ODD: 3; NS: 1; LD: 2.
2.3.3 Features
From eye traces recorded while participants viewed short videos, we extracted three
types of features that we hypothesized would be differentially affected by disorders.
First, oculomotor-based features were computed (e.g., distributions of saccade amplitudes
and fixation durations), as they might reveal deficiencies in motor control of
attention and gaze. Second, saliency-based features correlated participants' gaze with
predictions from a computational model of visual salience [96], which has been pre-
viously shown to significantly predict which locations in a scene may more strongly
attract attention of control subjects. We hypothesized that these features would
reveal deficits in reflexive, stimulus-driven, or so-called bottom-up attention. The
[Figure 2.1 appears here. Panels: (a) workflow from recorded eye traces and scene-shuffled videos, through saliency maps and the gaze distribution of young observers (Yo), to the oculomotor-based, saliency-based, and group-based feature types, feature selection, and classification (patients or controls?); (b) saliency maps for Color (C), Intensity (I), Orientation (O), Flicker (F), Motion (M), Junction (J), Texture (Txt), Variance (Var), CIOFM, and CIOFMJ; (c) human vs. random sampling of a saliency map; (d) histograms of human and random salience values; (e) ordinal dominance curve of human hit rate vs. random hit rate.]
Figure 2.1: Experimental and classification paradigms. (a) Participants freely
viewed scene-shuffled videos (SV) and their eye movements were recorded. Saliency
maps of each SV were computed using a computational model that mimics early
visual processing. Next, we used the recorded eye movements to compute (1)
oculomotor-based saccade metrics, (2) saliency-based correlations between saliency
maps and gaze (bottom-up attention), and (3) group-based similarities in spatiotemporal
distributions of gaze with reference to a database of control eye traces
(top-down attention). These features were used in a classifier with a recursive
feature selection method to identify important features that distinguished populations.
(b) Ten saliency maps of different features (color, intensity, etc.) were computed,
here illustrated for the video frame shown in (a) under "Saliency maps". Brighter
shades of grey indicate stronger feature contrast at the corresponding image locations;
for example, the red and yellow flowers between the two people elicit a strong
response in the color contrast map. (c) To compute saliency-based or group-based
features, each saliency map was sampled around the saccade target location (red
circle) when a participant initiated a saccade (red dot). At the same time, 100
map values were randomly sampled from the map as a baseline (blue circles), for
comparison. (d) Histograms were generated from both the human and random
sample values. (e) Differences between human and random histograms were further
summarized by ordinal dominance analysis to quantify the extent to which
human observers gazed towards higher salience values than expected by chance, in
terms of the area under the curve (AUC, yellow region) (see section 2.7 Computing
Features for more detail).
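The sampling-and-AUC procedure of panels (c-e) can be sketched compactly. The snippet below is a minimal, numpy-only illustration (the helper name is mine, not the dissertation's code): it samples the saliency map at human saccade targets and at 100 random locations, then computes the area under the ordinal dominance curve via the pairwise-comparison (Mann-Whitney) formulation, where 0.5 is chance.

```python
import numpy as np

def salience_auc(sal_map, saccade_xy, n_random=100, rng=None):
    """AUC for human-targeted vs. randomly sampled salience values.

    Returns P(human sample > random sample) + 0.5 * P(tie), i.e. the
    area under the ordinal dominance curve; 0.5 means chance level.
    """
    rng = rng or np.random.default_rng(0)
    h, w = sal_map.shape
    human = np.array([sal_map[y, x] for x, y in saccade_xy], dtype=float)
    rand = sal_map[rng.integers(h, size=n_random),
                   rng.integers(w, size=n_random)].astype(float)
    # Compare every human sample against every random sample.
    greater = (human[:, None] > rand[None, :]).mean()
    ties = (human[:, None] == rand[None, :]).mean()
    return greater + 0.5 * ties
```

Saccades that consistently land on high-salience regions drive the AUC toward 1; a uniform map yields exactly 0.5.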
40
third type, group-based features, captured deviations in participants’ gaze allo-
cation onto our stimuli, compared to a normative group of young adult controls.
These features, we posited, might reveal impaired volitional, subject-dependent, or
top-down attentional control, especially if differences were observed in group-based
but not saliency-based features. Together, we utilized all these features to classify
participants into clinical groups based on natural viewing behavior, the complexity
of which imposed challenges in data analysis, but also revealed rich and profound
information about the different populations.
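As a hedged illustration of the group-based idea, the sketch below builds a coarse (un-smoothed) fixation histogram from a normative group's fixations and scores a participant's saccade endpoints against it with the same chance-corrected AUC measure; the actual study used smoothed spatiotemporal gaze distributions, and all names and the bin size here are illustrative.

```python
import numpy as np

def gaze_density_map(control_fixations, shape, bin_size=16):
    """Coarse spatial histogram of a control group's (x, y) fixations,
    normalized to [0, 1] (the real method uses smoothed maps)."""
    h, w = shape
    dens = np.zeros((h // bin_size + 1, w // bin_size + 1))
    for x, y in control_fixations:
        dens[y // bin_size, x // bin_size] += 1
    return dens / dens.max()

def similarity_to_group(dens, saccade_xy, bin_size=16, n_random=100, rng=None):
    """AUC: do the participant's saccades land where controls looked?"""
    rng = rng or np.random.default_rng(0)
    h, w = dens.shape
    human = np.array([dens[y // bin_size, x // bin_size] for x, y in saccade_xy])
    rand = dens[rng.integers(h, size=n_random), rng.integers(w, size=n_random)]
    greater = (human[:, None] > rand[None, :]).mean()
    ties = (human[:, None] == rand[None, :]).mean()
    return greater + 0.5 * ties
```

A participant who looks where the normative group looked scores near 1; one who looks elsewhere scores near chance (0.5) or below.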
The classifiers were built to discriminate patients from controls based on 15 core
features from our three types: four oculomotor-based core features (distributions
of saccade duration, inter-saccade interval, saccadic peak velocity, and saccade
amplitude), ten saliency-based core features (differential distributions of salience
values at human gaze vs. other locations, using the ten saliency maps of Fig.
2.1b), and one group-based core feature (correlation between a patient’s gaze and
aggregate eye traces from a normative group of young adult controls, Fig. 2.1a).
Each core feature was represented by several sub-features to capture the dynamics
of free-viewing: each oculomotor-based core feature was subdivided into 12 sub-
features (3 measures (lower-quartile, medium, upper-quartile) × 4 saccades (the
1st, 2nd, 3rd, and all saccades on each 2-4 second clip snippet) = 12 sub-features);
each saliency-based core feature was subdivided into 16 sub-features (4 measures
(Area under the ROC curve (AUC; see section 2.7 Computing Features), for
low-/medium-/high-salience bins) × 4 saccades), as was each group-based core feature
(4 measures (AUC, low-/medium-/high-similarity bins) × 4 saccades). In total,
thus, 15 core features subdivided into 224 sub-features were used (Table 2.2).
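The 224-sub-feature count can be sanity-checked with a little bookkeeping. Abbreviations follow Table 2.2; "med" standing in for the unsuffixed median measure is my shorthand.

```python
# Sub-feature bookkeeping for the 15 core features described above.
oculomotor = ["sacdur", "sacint", "pkvel", "sacamp"]          # 4 core features
saliency = ["C", "I", "O", "F", "M", "J", "Txt", "Var",
            "CIOFM", "CIOFMJ"]                                 # 10 core features
group = ["Yo"]                                                 # 1 core feature

saccades = ["sA", "s1", "s2", "s3"]                            # 4 saccade sets
oculo_measures = ["qL", "med", "qU"]                           # 3 quantile measures
hist_measures = ["AUC", "hL", "hM", "hH"]                      # 4 histogram measures

n = (len(oculomotor) * len(oculo_measures) * len(saccades)     # 4*3*4 = 48
     + len(saliency) * len(hist_measures) * len(saccades)      # 10*4*4 = 160
     + len(group) * len(hist_measures) * len(saccades))        # 1*4*4 = 16
# 48 + 160 + 16 = 224 sub-features
```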
Table 2.2: Lookup table for feature abbreviations, in the format [feature] [measure] [saccade]. †after onset of snippets.
Oculomotor-based features: pkvel (peak velocity), sacamp (saccade amplitude), sacdur (saccade duration), sacint (saccade interval); measures: qL (lower quartile), none (median), qU (upper quartile).
Saliency-based features: C (color contrast), I (intensity contrast), O (oriented edges), F (temporal flicker), M (motion contrast), J (line junction), Txt (texture contrast), Var (intensity variance), CIOFM (overall salience), CIOFMJ (overall salience and line junction); measures: hL (low-salience bin), hM (medium-salience bin), hH (high-salience bin), none (AUC).
Group-based feature: Yo (similarity to young-observer); measures: hL (low-similarity bin), hM (medium-similarity bin), hH (high-similarity bin), none (AUC).
Saccade (all categories): sA (all saccades), s1 (1st saccade†), s2 (2nd saccade†), s3 (3rd saccade†).
2.3.4 Classification and Feature Selection
Feature selection is a popular machine-learning method to identify useful features
and overcome situations where the number of features is possibly larger than the
number of samples when training a classifier [80]. We performed feature selection
with Support Vector Machine - Recursive Feature Elimination (SVM-RFE) [81],
which has been used with great success in other fields (e.g., cancer classification
with microarrays [81]). SVM-RFE consists of training a classifier and discarding
the weakest feature iteratively until all features are eliminated. We used SVM-RFE
to differentiate PD from elderly controls (binary classification), and Multiple SVM-
RFE (MSVM-RFE) [243] to distinguish children in the ADHD, FASD, and control
groups (3-way classification). All classification accuracies reported were obtained
using these two feature selection methods.
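The elimination loop at the heart of SVM-RFE can be sketched as follows. To keep the example dependency-free, a ridge-regression classifier stands in for the linear SVM; the ranking criterion (drop the feature with the smallest squared weight, then retrain) is the same, but the function name and the stand-in classifier are my assumptions, not the dissertation's implementation.

```python
import numpy as np

def rfe_ranking(X, y, n_keep=1, lam=1e-2):
    """Recursive feature elimination: fit a linear classifier, drop the
    feature with the smallest squared weight, and repeat until only
    n_keep features remain. Labels y are in {-1, +1}."""
    active = list(range(X.shape[1]))
    elimination_order = []
    while len(active) > n_keep:
        Xa = X[:, active]
        # Ridge solution w = (Xa'Xa + lam*I)^-1 Xa'y (SVM stand-in).
        w = np.linalg.solve(Xa.T @ Xa + lam * np.eye(len(active)), Xa.T @ y)
        weakest = int(np.argmin(w ** 2))
        elimination_order.append(active.pop(weakest))
    return active, elimination_order  # survivors rank highest
```

On synthetic data where only one column carries the class label, that column survives the elimination while the noise columns are discarded first.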
Performance of each classifier that used a particular selected subset of features
was computed using thirty iterations of a repeated leave-one-out bootstrap valida-
tion [97]. This validation method was very similar to the standard leave-one-out
validation, which leaves one participant out for testing, but here the classifier was
trained on the remaining participants, bootstrapped (sampled with replacement) to
ten times the number of these remaining participants. The performance was
tested against permuted chance, which was the classification accuracy of a classifier
trained on the same bootstrap structure but with randomly permuted class labels
(class labels were randomly rearranged). Because classification accuracy varied
with the number of features in the process of RFE, we tested the performance of
classifiers by comparing the maximum accuracy obtained by the classifier trained
with true labels, to that obtained by the classifier trained with randomly permuted
labels (permuted chance, the chance referred in this paper unless stated otherwise),
regardless of how many features each classifier used to obtain maximum accuracy
(one-tail paired t-test; section 2.7 Classification and Feature Selection). All tests
were Bonferroni corrected.
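A minimal version of this validation loop might look like the following. A nearest-centroid classifier stands in for the SVM to keep the sketch self-contained, and all names are illustrative: each participant is held out in turn, the rest are bootstrapped to ten times their number for training, and setting `permute=True` yields the permuted-label chance level.

```python
import numpy as np

def nearest_centroid_predict(X_train, y_train, x):
    """Stand-in classifier (the study uses an SVM)."""
    labels = np.unique(y_train)
    cents = [X_train[y_train == c].mean(axis=0) for c in labels]
    return labels[int(np.argmin([np.linalg.norm(x - c) for c in cents]))]

def loo_bootstrap_accuracy(X, y, boot_factor=10, permute=False, seed=0):
    """Leave-one-out with a bootstrapped training set; permute=True
    randomly rearranges the training labels to estimate chance."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    correct = 0
    for i in range(len(y)):
        train = np.delete(np.arange(len(y)), i)
        y_tr = y[train]
        if permute:
            y_tr = rng.permutation(y_tr)
        # Sample with replacement, 10x the number of remaining participants.
        idx = rng.choice(train, size=boot_factor * len(train), replace=True)
        pos = {t: k for k, t in enumerate(train)}
        yb = np.array([y_tr[pos[j]] for j in idx])
        pred = nearest_centroid_predict(X[idx], yb, X[i])
        correct += pred == y[i]
    return correct / len(y)
```

With well-separated classes the true-label accuracy is high while the permuted-label run hovers near the base rate, which is exactly the comparison the t-test above formalizes.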
2.4 Results
2.4.1 Classifying PD and Controls
Classification accuracy for 14 patients vs. 24 age-matched controls reached 89.6%
(chance: 63.2%, obtained by performing the same classification procedure with
permuted class labels; Fig. 2.2a), with only 5 of 224 sub-features selected as most
discriminative by the process of feature elimination (SVM-RFE). The confusion
and sensitivity/specificity matrices reveal that the classifier made slightly more
false negatives than false positives as we aimed to maximize overall classification
performance. In scenarios where the classifier may be used for screening purposes,
sensitivity of the classifier can be increased by assigning higher costs to missed PD
patients and lower costs to false positives during training.
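The cost argument has a simple decision-theoretic counterpart: with cost c_fn for a missed patient and c_fp for a false alarm, the expected-cost-optimal rule flags a participant whenever the classifier's patient-probability exceeds c_fp/(c_fp + c_fn), so raising c_fn lowers the threshold and raises sensitivity at the expense of specificity. A toy illustration follows (the scores and function names are invented for the example; the dissertation itself would apply such costs during SVM training rather than by thresholding):

```python
import numpy as np

def cost_threshold(c_fn, c_fp):
    """Bayes-optimal threshold on P(patient|x): predict 'patient'
    whenever the posterior exceeds c_fp / (c_fp + c_fn)."""
    return c_fp / (c_fp + c_fn)

def sensitivity_specificity(scores, labels, thresh):
    """Sensitivity and specificity of thresholding patient scores."""
    pred = scores >= thresh
    sens = (pred & labels).sum() / labels.sum()
    spec = (~pred & ~labels).sum() / (~labels).sum()
    return sens, spec
```

For instance, weighting a missed patient 3 times a false alarm moves the threshold from 0.5 down to 0.25, catching more patients.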
Our method not only differentiated PD from elderly controls (one-tail paired
t-test, t(29)=23.07, p<0.01), but also provided information about how PD af-
fects eye movements, obtained by separately studying classification accuracy for
oculomotor-based, saliency-based, or group-based features (Fig. 2.2b). PD pa-
tients demonstrated motor deficits as revealed by classification differences between
them and controls in oculomotor features (considering only the 48 oculomotor-based
sub-features, accuracy was 86.4%, t(29)=28.02, p<0.01). Oculomotor deficits have
been attributed to dysfunction in the basal ganglia [22,70,226], crucial for volun-
tary saccade control [147]. Patients’ top-down attention also differed from elderly
[Figure 2.2 appears here: (a) confusion and sensitivity/specificity matrices (overall accuracy 89.6%); (b) classification accuracy for the All, Oculomotor, Saliency, and Group feature sets; (c) classification accuracy minus chance for each of the 15 core features, from peak velocity to similarity to young-observer.]
Figure 2.2: Classification performance in differentiating PD patients from elderly
controls at three granularities of starting feature sets: (a) all features, (b) the 3 fea-
ture types, and (c) the 15 core features (biometric signatures). (a) Starting with all
224 sub-features, PD patients were distinguished from elderly controls with 89.6%
accuracy after feature selection (SVM-RFE). Each row in the confusion matrix rep-
resents actual classes, and each column predicted classes. (b) PD and elderly con-
trols differed significantly in oculomotor (starting with 48 sub-features) and group-
based behavior (16 sub-features), but not in saliency processing (160 sub-features).
Asterisks between bars indicate cases where the classifiers performed significantly
better than permuted chance (computed from training a classifier with randomly
permuted class labels). Dashed line represents prior chance based on the number
of controls and patients. (c) PD patients exhibited differences in saccade ampli-
tude, duration, peak velocity, inter-saccade interval, intensity variance processing,
texture saliency processing, and similarity to normative young observers. This pat-
tern of differences yields the 15-component biometric signature of PD. Dashed line
is the prior chance. Background colors separate oculomotor-based, saliency-based,
and group-based features from left to right. (Error bars indicate 95% confidence
intervals after Bonferroni corrections. Significance level: p<0.01, one-tail paired
t-test (df=29).)
controls (16 group-based sub-features, 74.6%, t(29)=11.58, p<0.01), in agreement
with previously reported impairment in voluntary attention, involving cortical and
sub-cortical attention networks [22,70,113,122,163]. However, counter to our
expectation that lower top-down control may give rise to higher reliance upon
stimulus-driven salience, bottom-up attention of PD patients seemed unaffected,
as saliency-based features showed no overall differences (160 saliency-based sub-
features, 63.16%, t(29)=-4.10, n.s.). It is possible that any higher reliance upon
visually salient stimuli to guide gaze may have been offset by impaired salience
computation due to deficient early visual processing in PD patients, as reported in
previous laboratory studies [15] (see section 2.6 Neurological Implications: Parkin-
son’s disease (PD) for more details relating the findings from previous studies to
the results from classification).
At a finer granularity, our method also permitted investigating whether each
of our 15 core features was affected by PD. We tested 15 separate classifiers, each
using only the 12 or 16 sub-features of a given core feature (with SVM-RFE).
This yielded a 15-component biometric signature of PD (Fig. 2.2c). During nat-
ural viewing, PD patients demonstrated motor deficits as their saccades were of
shorter amplitude and duration (classification accuracy: t(29)>9.62, p<0.01; di-
rection of the effect: two-sample t-test, t(36)>2.73, p<0.01; peak velocity and
inter-saccade interval were also affected (t(29)>6.31, p<0.01), but without a uni-
fied upward or downward direction of effect among the 12 sub-features; section
2.7 Direction of Effect). These observations are consistent with earlier structured-
task studies which showed shorter and slower voluntary saccades of PD patients
toward pre-determined visual locations [22,33,207,229], with less impairment for
visually-guided saccades [22,229]. The classifier also found that PD and elderly
controls differed in intensity variance (t(29)=4.96, p<0.01) and texture contrast
(t(29)=8.36, p<0.01), though with mixed upward and downward effects among the
involved sub-features, suggesting complex interactions between deficits that affect
behavior in opposite directions: e.g., weakened top-down control (stronger bottom-
up) and impaired saliency computation (weaker bottom-up). Deficits in voluntary
control and top-down attention were also revealed by different similarities to our
normative young observers between PD patients and elderly controls (t(29)=7.06,
p<0.01).
2.4.2 Classifying ADHD, FASD and Control Children
Classification accuracy with MSVM-RFE for 21 children with ADHD vs. 13 chil-
dren with FASD vs. 18 control children reached 77.3% (chance 40.4%) with 19 of
all 224 sub-features (Fig. 2.3a). With these 19 features, the average 2-way classifi-
cation accuracy for ADHD vs. control was 83.3% (chance 53.8%); FASD vs. control
was 79.2% (chance 58.1%); and ADHD vs. FASD was 90.4% (chance 61.8%). Rates
of miss and false alarm errors were balanced, except for a slightly higher miss rate
for FASD, as the classifier aimed to maximize overall accuracy.
Our method further examined which of the three feature types contained differ-
ential information among the 3 groups of children (Fig. 2.3b). Classification accu-
racies were significantly above chance with the saliency-based (50.8%, t(29)=4.04,
p<0.05), but not with the oculomotor-based features (40.5%, t(29)=-5.28, n.s.) and
the group-based features (45.7%, t(29)=1.03, n.s.). When comparing each pair of
the 3 child groups, first, children with ADHD and controls were distinguished sig-
nificantly in saliency-based features (78.2%, t(29)=12.68, p<0.01); second, children
with FASD and controls differed in both saliency-based features (77.6%, t(29)=9.95,
p<0.01) and group-based features (69.8%, t(29)=6.01, p<0.01); lastly, children with
[Figure 2.3 appears here: (a) confusion and sensitivity/specificity matrices (overall accuracy 77.3%); (b) classification accuracy for the All, Oculomotor, Saliency, and Group feature sets, for the 3-way classification and each pair of child groups; (c) classification accuracy minus chance for each of the 15 core features.]
Figure 2.3: Classification performance for children with ADHD, FASD, and control
children, for: (a) all features, (b) the 3 feature types, and (c) the 15 core features
(biometric signatures). (a) Starting with all sub-features, children with ADHD,
FASD, and control children were best classified with 77.3% accuracy (ADHD: sen-
sitivity 80%, specificity 90%; FASD: sensitivity 73%, specificity 91%) after feature
selection (MSVM-RFE). Format is as in Fig. 2.2. (b) Classifying the 3 child groups
with different feature sets demonstrated that they differed significantly in saliency-
based behavior (upper-left sub-plot). Children with ADHD differed from control
children in saliency-based features, whereas children with FASD differed from con-
trols in both saliency-based and group-based features, and children with ADHD and
those with FASD could only be distinguished with all three feature types together.
(c) The 15-component biometric signature of ADHD and FASD. Children with
ADHD, compared to control children, demonstrated significantly different sensitivity
to color contrast and oriented edges, as well as increased sensitivity to texture
contrast. Children with FASD, in contrast, showed a different signature that in-
volved differences in similarity to young observers in gaze distribution, sensitivity
to line junctions, and sensitivity to overall salience, as well as increased sensitivity
to texture contrast. Background colors separate oculomotor-based, saliency-based,
and group-based features from left to right. (see Fig. 2.2 for the computation of
chance level, error bars, statistical tests, and significance level.)
ADHD and FASD showed no differentiability by each feature type alone, but they
could be distinguished with all feature types together (t(29)<22.96, p<0.01). Al-
though we focus on classification performance, these results are in line with earlier
studies that showed how children with ADHD have difficulties in inhibiting pre-
mature responses and thus appear more stimulus-driven [145], as well as studies
that demonstrated how children with FASD have atypical top-down [78,79,124]
and bottom-up [28] attentional control (see section 2.6 Neurological Implications:
ADHD, FASD, and ADHD versus FASD for more details pertaining to previous
studies and the present results). However, when we examined whether the saliency-
based and group-based sub-features showed larger feature values in one population
than in the other, we found mixed directions of effect among the sub-features of
both feature types, indicating that the disorder impacts natural viewing behavior
in more than one single unified manner (e.g., impaired response inhibition [79,145],
but also possibly weakened early visual processing [24,100,221]). The quantitative
predictions of our classifier for every sub-feature provide for the first time a rich
basis to further investigate these complex effects from a neurological viewpoint.
At the level of the 15 core features, our method yielded clearly distinct bio-
metric signatures for ADHD versus FASD (Fig. 2.3c), thus successfully teasing
apart the two disorders along 15 important dimensions. For children with ADHD,
the best feature differentiating them from control children was texture processing
(t(29)=15.67, p<0.01; children with ADHD showed higher correlation to texture
contrast, two-sample t-test, t(37)=2.75, p<0.01; Fig. 2.3c), in line with previously
reported tactile texture sensitivity [61,129,159]. Thus, the current results suggest
this may not be limited to the tactile domain. Propensity to look toward color con-
trast [100,221] (t(29)=5.63, p<0.01) and oriented edges (t(29)=6.72, p<0.01) were
also discriminative between children with ADHD and controls. Oriented edges are
important to perceptually construct the contour and shape of objects. For children
with FASD, line junctions, overall salience, and texture contrast were discrimina-
tive (t(29)>4.92, p<0.01). To our knowledge, no previous study has investigated
how ADHD might affect processing of oriented edges, nor how different domains of
salient features may be affected by FASD. The discovery of these features by our
classifier thus suggests interesting new research directions.
2.4.3 Classification accuracies throughout the process of
feature elimination
PD vs. Controls. Classification accuracy reached 89.6% with only 5 features in
the process of feature selection (SVM-RFE). The classification accuracy along the
process is plotted in Fig. 2.4a. While using all 224 features, the classifier performed
significantly but only slightly better than chance. This was probably due to over-fitting
because there were more features than participants; therefore, the classifier per-
formed well in training, but poorly in testing. Nevertheless, as the process of feature
elimination went on, the classification accuracy started to increase, achieving peak
performance (89.6% accuracy) with 5 features (out of 224). Subsequently, when
more features were eliminated, the accuracy decreased due to too few features.
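The accuracy-versus-feature-count pattern described above can be reproduced in a small sketch: with many noise features a small-sample classifier over-fits, and accuracy peaks after most features are eliminated. The data, feature counts, and use of scikit-learn's `RFE` are assumptions for illustration only:

```python
# Sketch of tracking classification accuracy along a feature-elimination path.
# Only 5 of 224 synthetic features carry group information, echoing the
# "peak accuracy with 5 features" pattern reported in the text.
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(3)
y = np.array([0] * 14 + [1] * 24)
X = rng.normal(size=(38, 224))
X[:, :5] += 1.5 * y[:, None]           # only 5 informative features

accuracies = {}
for k in (224, 50, 5):
    if k == X.shape[1]:
        Xk = X                         # no elimination yet
    else:
        # keep the k features ranked highest by recursive elimination
        keep = RFE(SVC(kernel="linear"), n_features_to_select=k).fit(X, y).support_
        Xk = X[:, keep]
    accuracies[k] = float(cross_val_score(SVC(kernel="linear"), Xk, y, cv=5).mean())

for k, acc in accuracies.items():
    print(f"{k:3d} features: {acc:.2f}")
```

Note this sketch runs the selection on all data before cross-validating, which is optimistic; the dissertation's nested leave-one-out bootstrap avoids that bias.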
ADHD, FASD vs. Control Children. The classification accuracy through-
out MSVM-RFE is shown in Fig. 2.4b. The overall classifier (ADHD vs FASD vs
control children) reached the highest accuracy (77.3%) with 19 features. With these
19 features, the average 2-way classification accuracy for ADHD vs. control children
was 83.3% (chance 53.8%); FASD vs. control children was 79.2% (chance 58.1%);
and ADHD vs. FASD was 90.4% (chance 61.8%). When using all the features, chil-
dren with ADHD or FASD were harder to differentiate in comparison to ADHD vs
control and FASD vs control. Nevertheless, as the feature selection processes went
on and critical features were identified, all three classifiers’ performance improved,
especially the classifier for children with ADHD vs FASD.
2.4.4 Sub-features selected by the SVM-RFE process
Finally, we investigated which collections of sub-features best differentiated the
populations based on the result of feature selection (SVM-RFE). The top 5 sub-
features that classified PD from elderly controls and their normalized feature values
are shown in Fig. 2.5a. The feature selection method found a collection of 5
oculomotor sub-features that reliably differentiate PD from elderly controls. On
the other hand, the top 19 features for differentiating each pair of the 3 child
populations (ADHD, FASD and control) spanned all 3 broad feature types (Fig.
2.5b). While some core features of the selected sub-features failed to differentiate
the populations when considered in isolation, they are important complementary
features for the classifiers to separate the groups. Obviously the pattern of feature
values observed here is complex, indicating that sophisticated classifiers were indeed
necessary to discover the subsets of features that yielded the best classification
accuracy. To visualize how well our approach was able to cluster individuals into
separate groups, we further reduced the dimensionality of our results using Linear
Discriminant Analysis, which finds axes that best separate each pair of groups (Fig.
2.5c,d). This analysis allowed us to validate our method by demonstrating clearly
distinct clusters based on the features selected by our classifiers. We suggest that
similar clustering techniques could be employed in future studies of other disorders,
and to possibly discover different sub-population clusters within patient groups that
were previously considered homogeneous per standard medical assessment.
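The LDA projection step can be sketched as follows; the three synthetic "populations", group sizes, and 19-dimensional feature space are stand-ins for the selected sub-features, not the actual data:

```python
# Sketch of the cluster-visualization step: Linear Discriminant Analysis
# projects the selected sub-features onto axes that best separate the groups
# (at most n_classes - 1 axes). Data are synthetic.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(4)
means = {"control": 0.0, "ADHD": 3.0, "FASD": 6.0}   # toy group means
X = np.vstack([rng.normal(mu, 1.0, (18, 19)) for mu in means.values()])
y = np.repeat(np.arange(3), 18)                      # group labels

lda = LinearDiscriminantAnalysis(n_components=2)
Z = lda.fit_transform(X, y)                          # (54, 2) projected points
print("projected shape:", Z.shape)
```

Scatter-plotting the columns of `Z` (as in Fig. 2.5c,d) reveals whether the selected features place individuals into visually distinct clusters.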
[Figure 2.4 appears here: classification accuracy plotted against the number of selected features (log scale), (a) for PD vs. elderly and (b) for ADHD vs. FASD vs. control children, with dashed chance curves.]
Figure 2.4: (a) Classification accuracy for differentiating PD and elderly controls,
plotted as a function of the number of selected features during SVM-RFE. Max-
imum classification accuracy (89.6%) was obtained with 5 features (black arrow).
Shaded region indicates mean ± 1 standard deviation over the repeated leave-one-out
bootstrap validation. Chance performance (classification accuracy with permuted
class labels, 52.0%) is indicated by the dashed curve. (b) Classification accuracy
for differentiating ADHD, FASD, and control children is plotted as a function of
the number of selected features during MSVM-RFE. The red line indicates the
overall classification accuracy for differentiating 3 populations, and the classifier
reaches peak performance (77.3%) with 19 features (black arrow). The blue, green,
and cyan lines are classification accuracies for classifying each pair of populations.
The black dashed line is the chance level for the overall classifier. Shaded regions
indicate mean ± 1 standard deviation. Chance performance (32.4%) is indicated
by the dashed curve.
[Figure 2.5 appears here: (a) heat map of normalized values for the top 5 sub-features (PD vs. elderly controls); (b) heat map of the top 19 sub-features (control, ADHD, and FASD children); (c, d) LDA projections (axes L1, L2, L3) separating the groups.]
Figure 2.5: Sub-features selected by SVM-RFE. (a) Normalized values for the
top 5 ranked sub-features selected by SVM-RFE for PD. Sub-features’ names are
displayed in grey (see Table 2.2 for the interpretation of the names), and their
corresponding core feature names are in black. Feature values are standardized z-scores
filtered by an arctangent function. Rows represent the top 5 ranked sub-features.
Columns represent 38 participants, and the white vertical line separates the 2 pop-
ulations. (b) Normalized feature values for the top 19 ranked sub-features selected
by MSVM-RFE that best classified children with ADHD, FASD, and control chil-
dren. Note that most of the sub-features discovered by the classifiers belonged
to the saliency-based feature type. Features and participants were re-arranged so
that high feature values were better clustered at the diagonal of the plot. (c) 14
PD and 24 elderly controls were separated into 2 different clusters as revealed by
Linear Discriminant Analysis (LDA), which finds the dimensions (L1 and L2) from
the top 5 sub-features in (a) that best distinguished the two groups. (d) Similarly,
LDA found the 3 dimensions (L1, L2, and L3) from the top 19 sub-features in (b)
that best differentiated every pair among 21 children with ADHD, 13 children with
FASD, and 18 control children. The 3 child groups are clearly separated in these
dimensions, even though clusters in (b) are less visually distinct. (**, ANOVA
p<0.01; *, ANOVA p<0.05)
2.5 Discussion
This study revealed different biometric profiles of oculomotor function and attention
allocation among PD, ADHD and FASD patient groups through quantitative analy-
sis of natural-viewing eye traces. Our automated SVM-RFE process discovered that
PD patients were best discriminated from elderly controls by oculomotor-based fea-
tures, implying that motor deficits are more apparent than attention deficits for PD
patients during free viewing. In contrast, children with ADHD or FASD were best
distinguished from controls by saliency-based features, suggesting that the disor-
ders affect their bottom-up attention. The disorders also influence overall attention
allocation in every patient group, as group-based features showed differentiability
for clinical and control populations (see section 2.6 for our interpretations of the
particular features identified by our method, and the corresponding neurological
implications in each disorder). By identifying features that are most discrimina-
tive among populations, our technique provides new insights into the nature of the
different disorders and their interactions with attention control. The encouraging
results obtained here with diseases that lie on both ends of the age spectrum sug-
gest that the proposed approach may generalize to additional disorders that affect
attention and oculomotor systems. The fact that our paradigm alleviates the need
for structured tasks is of great importance because the approach can be applied to
a wider range of populations, including very young children who cannot understand
the instructions of experiments, or individuals who have cognitive impairment.
Our method robustly differentiates disorders that may have overlapping behav-
ioral phenotypes (ADHD and FASD), but that nonetheless affect visual processing
differently. Overall, we suggest that with natural scene videos, participants’ natu-
ral viewing behaviors are evoked, and their eye movement patterns contain unique
and revealing information about their cognitive and motor processes. One of the
strengths of this study is that it offers a general framework that can identify such
information across several patient populations. In the future, with better understanding
of differences in cognitive control, attention, and oculomotor systems of patients
with these disorders, the experiment could be further shortened by selecting stimuli
that maximally evoke different eye movement patterns between populations. This
would also provide for a better understanding of novel behavioral differences that
were revealed by this study, such as the discovery of edge processing differences in
children with ADHD. In summary, our method provides for the first time an ob-
jective, automated, high-throughput, time- and cost-effective tool that can screen
large populations and that, through clustering, may further discover new disease
subtypes and assist more precise medical diagnosis. Future benefits of our method
may include earlier and more accurate identification of neurological disorders and
subtypes.
Study Limitations. Because ADHD and FASD (and to a lesser extent PD)
have very broad clinical spectra (e.g., subtypes, disease severity), our limited num-
ber of participants might not have covered the entire extent of the neurobehavioral
profile. Thus, the usefulness of this approach for screening of early stage and yet
undiagnosed patients remains to be proven. This likely will require a large-scale,
multi-year screening study with many more participants. Our goal is to eventually
run such a study. Nevertheless, we believe that we have achieved an exciting
proof of concept for our methodology here. Second, different medications taken by
individuals further complicated the variability in our samples. We attempted to
minimize this confound in the child populations (children in the ADHD and FASD
groups did not take stimulant medication on the day of experiment); however,
whether the effects of medication taken in earlier days were completely ‘washed
out’ is unclear. PD patients were not required to withhold their medications, as
we, and others, have previously shown that medicated PD patients continue to dis-
play deficits in visual processing and saccade generation [2,25,33]. Nevertheless, a
stricter off-medication condition should be considered in the future. Third, while
our short video set was sufficient to reliably classify patient and control groups, the
limited content may not identify all the features that are discriminative between
the groups. It is possible that other features are discriminative with a different
stimulus set or with a given viewing instruction. Finally, the comorbidity of the
disorders made classification accuracy a challenge. Children with ADHD or FASD
were often co-morbid with other disorders (see Table 2.1). A majority of FASD
children included in our analysis had a co-morbid diagnosis of ADHD, although
they were still shown to be different from children with ADHD only. This is po-
tentially very important, since one of the best practical applications of a successful
classifier to different patient groups is one that can perform above chance when the
data from two populations display significantly overlapping symptoms. Therefore,
the classification method utilized in this study is robust in differentiating
disorders that may have overlapping behavioral phenotypes, but that nonetheless
affect visual processing differently.
2.6 Neurological Implications
2.6.1 Hypotheses and Choice of Feature Categories
We posited that the commonly observed deficits in attention allocation and ocu-
lomotor function are based on pathological neural substrates that also govern how
a person naturally directs attention in an everyday environment. We predicted
PD patients would show deficient oculomotor control (oculomotor-based features),
weakened top-down control (group-based features), and stronger bottom-up guid-
ance (saliency-based features) in natural viewing. Because PD patients have been
shown to exhibit shorter and slower saccades to pre-defined targets [33], we expected
to observe similar oculomotor deficits over our video stimuli even when there is no
pre-defined target, and hence our classifiers would reveal significant differences in
oculomotor-based features. PD also affects the frontal lobe and other parts of the
attention network, as mentioned above, so that they appear more stimulus-driven
due to weakened top-down control; hence we expected to see increased guidance of
gaze towards salient stimuli (saliency-based features) as well as lowered similarities
of gaze distributions between PD and controls (group-based features). With respect
to ADHD, as children with ADHD are primarily deficient in frontal cortical pro-
cessing, we expected that their weakened top-down control would give rise to more
saliency-driven saccades (saliency-based features), and thus they would attend to
different locations from controls (group-based features). Although ADHD affects
the basal ganglia as well, we did not predict that oculomotor-based features would
be different because previous literature shows inconsistent results in motor deficits.
Finally, as FASD influences the brain globally, we anticipated impairments in both
oculomotor function and attention control. First, we predicted slower saccades and
longer inter-saccade intervals (oculomotor-based features). Second, due to their
frontal impairments possibly affecting top-down control, we expected lower spa-
tiotemporal correlation with young adult controls (group-based features). We also
predicted differences, though initially not in any particular direction, in bottom-
up attention (saliency-based features): while weakened top-down attention control
could give rise to higher reliance on bottom-up stimuli, deficits in visual sensory
processing could decrease bottom-up attention process.
We constructed classifiers with each feature type to test our hypotheses. Fur-
thermore, to derive more precise conclusions than the previous studies which had
motivated our fairly coarse initial set of hypotheses, we quantified differences in
individual component features for each feature type. For example, saccade am-
plitude and peak velocity features were considered within the oculomotor-based
feature type, and color contrast, oriented edges, and motion features were con-
sidered within the saliency-based feature type. In summary, the rich information
exhibited in natural scenes and the corresponding eye traces enabled us to quantify
the differences in several aspects of oculomotor functions and attention allocation
in one simple paradigm, and we utilized these differences between patients and
controls to build a classifier that can reliably identify individual participants in
different clinical groups.
2.6.2 Parkinson’s Disease (PD)
As predicted, PD patients demonstrated deficits in saccade dynamics (oculomotor-
based features: e.g., shorter duration and smaller amplitude), which suggests dis-
ruptions in cortical-subcortical pathways as well as brainstem nuclei (as described
in the following paragraphs). Attention allocation (group-based features) of PD pa-
tients was different from elderly controls (mixed directions), implying impairment
in attention networks involving the frontal lobe, parietal cortex and basal ganglia.
However, counter to our expectation that lower top-down control may give rise to
higher reliance on salience, bottom-up attention of PD patients seemed to be un-
affected as saliency-based features showed no differences (except texture contrast)
between the two populations.
During natural viewing, PD patients demonstrated motor deficits as their sac-
cades were of shorter amplitude and duration. Previous studies have shown that
voluntary saccades of PD patients are smaller in amplitude and slower toward
pre-determined targets [22,33,45,104,106,148,191,223,229], while the impair-
ment is less pronounced when their saccades are visually guided [22,168,224,229].
This motor deficit has been primarily attributed to dysfunction in the basal gan-
glia [22,70,113,163], which is heavily involved in voluntary saccade control [147].
The smaller saccade amplitudes of the PD patients were not caused by ‘square wave
jerks’ [6,169,190] because only a few saccades were categorized as square wave
jerks during free viewing. In addition to motor deficits, other factors could con-
tribute to the smaller saccades in PD patients. PD patients have been shown to
have reduced ‘useful fields of view’ (i.e., attentional processing range) [219] so that
they may not have processed peripheral information as well as control subjects,
and therefore the range of possible next saccade locations is limited. We also observed
slightly longer inter-saccade intervals (two-sample t-test, t(36)=1.72, p=0.09) in
PD patients, which could be due to slower visual information processing [98,127]
and/or slower voluntary saccade initiation [2,33], which is shown to correlate with
reduced frontal cortical activation in PD [177].
Our data suggests that top-down attentional control, but not bottom-up re-
sponding, is selectively impaired in PD patients, because the classifiers showed
differences (mixed directions) in group-based features, but not in saliency-based
features. This finding is consistent with earlier studies showing that PD patients are im-
paired in executive functions including top-down attention, and these deficits have
been attributed to pathological changes in the striatum, and later, in frontal cor-
tex [122]. When these deficits in top-down attention are considered in the context of
task performance, PD and control participants differed behaviorally in tasks that
measure response times to visual targets under manipulation of attention direc-
tion; for instance, PD patients can show enhanced attentional ‘capture’ effects to
a visual stimulus, resulting in faster processing of subsequent targets that appear
at the same location [65,164]. This behavior points to intact bottom-up process-
ing. At the same time, PD patients can show impaired maintenance of attention
to specific locations [237], and stronger ‘inhibition of return’ (an initial cue slows
responding to targets that appear later at the same location) than controls for
advanced PD patients [21]. When one considers voluntary saccade deficits in PD,
combined with increased capture of attention to transient cues, but decreased main-
tenance of attention to those locations, a common trend of “hyper-reflexivity” in
PD in visual tasks emerges, predicting that we should have observed differences
in saliency-based features. However, hyper-reflexivity (measured in many cases by
prosaccade reaction times) was not consistently reported across studies in a recent
meta-analysis [32], which revealed that the disparities may relate, in part, to
differences in target eccentricity: PD patients being faster than controls for small
saccades (< 7 degrees) but slower for larger ones, which might be caused by inter-
actions between retinal center-surround inhibition and inhibition on the superior
colliculus by the basal ganglia (see [32] for details). One might expect that hyper-
reflexivity to abrupt salient scene onset might alter the overall saccade dynamics in
PD in the current experiment (independent of specific saliency-based features), but
PD patients were not any faster at initiating a saccade following scene change even
though the average amplitude was less than 7 degrees. Therefore, it is difficult to
take a hyper-reflexivity interpretation derived from cue-target tasks and apply it to
the current dynamic and free viewing environment, especially when one considers
that basic prosaccades are not always produced at faster latencies.
Nevertheless, our results are consistent with studies that show PD patients have
little impairment in reflexive prosaccades, and it may be interpreted as unimpaired
bottom-up processing. Reflexive prosaccades can be generated when incoming sen-
sory information directly inputs to saccade motor cells in the superior colliculus
(thus bypassing frontal cortex and basal ganglia circuits) [144]. Clinically, it is also
interesting that ‘freezing symptoms’ (i.e., hypokinetic movements) can be ameliorated by providing visual cues, suggesting that bottom-up processing may not only
be less impaired, but can be useful to guide voluntary behavior [156]. The neuronal
implications of these findings together point to a pathology in the basal ganglia
that affects circuits important to voluntary movement and voluntary attentional
orienting: those that include the premotor, prefrontal and motor cortices and basal
ganglia circuits. In contrast, bottom-up signals may be utilized by neural networks
that are less dependent on these brain regions.
In summary, contrary to our initial prediction, bottom-up attention of PD pa-
tients seemed to be unaffected because saliency-based features showed no differ-
ence (except intensity variance and texture contrast) compared to elderly controls.
While we expected PD patients to show higher correlation between gaze and vi-
sual salience (as described above), PD also impairs visual salience computation by
damaging retinal, LGN and V1 processing [15,30] and reducing pattern contrast
and flicker sensitivity [17,172]. Therefore, the effect of high correlation to visual
salience may have been offset by low contrast sensitivity. We must also acknowl-
edge the potential effect of medication on our results. First, Bodis-Wollner and colleagues [17] showed that PD patients displayed better contrast and flicker sensitivity
when they were ‘on’ dopamine medication compared to ‘off’ medication, suggesting
that PD patients will regain the sensitivity to visual salience to a certain degree
on-meds. Second, L-DOPA medication (taken by most PD patients in our study)
decreases error rates in the antisaccade task and increases reaction time in the
prosaccade task [90], implying better top-down control. Moreover, L-DOPA medication might also improve flexibility in attention shifting while having less of an
effect on maintaining attention [41], which perhaps impacted classifier performance
when new scenes demanded shifting attention. The exact effects of medication on
attentional selection during free viewing are unclear without performing an on/off
medication experiment. Nevertheless, our results show that even for patients who
were on medication, oculomotor-based and group-based features can differentiate
PD from control behavior, quantified here for the first time in a natural viewing
setting; thus, potential biomarkers may be revealed in free-viewing conditions.
2.6.3 ADHD
As predicted, no motor deficit was observed in children with ADHD (oculomotor-
based features). Children with ADHD showed slightly lower similarity to young
adults in gaze distribution than control children (group-based features), but the
difference did not reach significance (two-sample t-test, t(37) = 1.04, p = 0.31).
Children with ADHD also showed differences (in mixed directions) in correlation
to salience, indicating that their bottom-up attention was affected. The best feature
for differentiating children with ADHD from controls was texture processing. This is
interesting because children with ADHD have deficits in sensory processing and
modulation [57,61,129,159,160], which influences their early development and task
performance in daily life [56]. For example, they are responsive to certain food and
tactile textures but will try to avoid them (sensory avoidance) [57]. However, they
can also be very unresponsive to some sensory stimuli, leading them to keep seeking
more sensory experience (sensory seeking) [56]. Our results suggest that this
sensitivity to textures might not be limited to the tactile domain. These children
appear to be sensitive to texture salience (contrast in texture) because they made
saccades more often to locations of high or low texture salience values, but not to
medium values. In spite of this overall sensitivity, the actual behavior could result
from both sensory seeking and sensory avoidance. For example, high texture stim-
uli might initially capture their attention, but later cognitive processes may initiate
either salience-seeking or salience-avoiding behavior, similar to what is seen in the
tactile domain. The converse explanation (e.g., initial seeking or avoidance behav-
ior, followed by attentional capture) could be applied as well, when one considers
locations of low texture salience, however this is purely speculative. Nevertheless,
we have shown that children with ADHD appear to be sensitive to texture salience,
because they made saccades to locations of medium salience significantly less often
than control children.
Some other saliency-based features were found discriminative between children
with ADHD and control children with mixed directions of the effects. We believe
that it was the profile of these features, rather than the direction, that differen-
tiated the groups, indicating the disorder impacts natural viewing behavior in a
more complicated manner than expected. For example, ADHD impairs response
inhibition [145] so that children with ADHD may be more stimulus-driven; how-
ever, ADHD also delays development [102] and weakens early visual processing
(e.g., color contrast sensitivity [100,221]), and thus discounts their stimulus-driven
behavior during natural viewing. Nevertheless, the classifier still revealed several
saliency-based features that were discriminative between children with ADHD and
controls, such as color contrast and oriented edges. Oriented edges are important
to the perceptual construction of the contour and shape of objects. Edge detection
is achieved in the retina [227] and the visual cortex, and is related to visual acuity.
However, children with ADHD can show reduced visual acuity. This deficit can be
improved by psychostimulants that increase dopamine levels [131], and it has been
shown with fMRI that dopaminergic network activity differs in the brains of
ADHD and control children [220], suggesting that dopaminergic
changes in ADHD may impact edge detection, just as retinal dopamine depletion
has been suggested to impact contrast sensitivity in PD [16]. However, these links
drawn between these two disorders, dopamine, and retinal processing are specula-
tive. We also point out that the classifier showed that edge detection was discrim-
inative between control and ADHD children: the direction of effect did not reveal
whether ADHD children were more or less sensitive to edges, so it is difficult to
speculate on the underlying mechanisms. To our knowledge, there are no previous
studies investigating how ADHD might affect the processing of oriented edges, and
thus the discovery and selection of the oriented edge feature by the classifier points
to a novel finding for future investigation.
2.6.4 FASD
As expected, because children with FASD suffer from a global impact of alcohol
on brain development, they showed significant differences (p<0.01) in correlation
to salience (mixed directions, saliency-based features) as well as in similarity of gaze
distribution to young controls (slightly lower similarity, p = 0.15; group-based
features). Deficits in visual processing are consistently observed in subjects with
FASD [24,196]. These deficits could reflect problems with attention and/or processing of sensory information (they do not appear to be due to problems with motor
control), which may result in the different correlation to salience observed in this study.
Furthermore, deficits in top-down control can further disrupt bottom-up processing,
as shown in prosaccade tasks in which children with FASD make more direction errors than controls on automatic reflexive saccades [78,79]. Among saliency-based
features, line junction, overall salience, and texture contrast were found discrimi-
native between children with FASD and control children, and children with FASD
also showed higher correlation to texture contrast (p<0.01). We are not aware of
any research that has attempted to break down the deficits in visual processing in
children with FASD based on different domains of salient features in a visual dis-
play, but structural injury to the brain in children with FASD may lead to some of
the observed deficits. Children with FASD also exhibited lower similarity to young
adults’ gaze distribution, as predicted, which is consistent with previous literature
showing deficits in the frontal lobe for children with FASD [178].
Microcephaly is common in individuals with full FAS [99], and structural ab-
normalities have been observed in FASD that correlate to behavioral deficits (see
review [153]). Because these characteristics are more prominently associated with
FASD than ADHD, it is possible that there could be structurally based impacts on
visual and perceptual processes that could explain some of the differences between
the groups. In particular, structural dysmorphology and decreased brain size in
FASD have been observed in the parietal and temporal lobes [4,203], which should
be expected to influence bottom-up attention to visual objects. However, it has
also been observed that despite smaller overall brain size, there can be increased
grey matter density (with decreased white matter density) in FASD in superior-
posterior temporal and inferior parietal regions, though mostly laterally in the
perisylvian region [202]. A recent study also suggests that there is reduced inter-
hemispheric connectivity through the corpus callosum between parietal cortices in
FASD (measured by diffusion tensor imaging, and inter-hemispheric correlations
in fMRI BOLD signal) [236]. In any case, these studies suggest that both dor-
sal and ventral stream processes guiding visuomotor behavior might be affected in
some way if these structural abnormalities do impact visual processing. Function-
ally, however, little research examining dysfunction in visual processing in FASD in
these regions has been done, as most studies of dysfunction in fronto-parietal and
fronto-temporal networks are related to primarily cognitive tasks (e.g., working
memory [153], arithmetic operations [136]).
2.6.5 ADHD versus FASD
It is important to understand the differences between ADHD and FASD because
they can present with similar symptoms clinically, but have different underlying
pathologies and treatments. However, this study did not reveal differences between
ADHD and FASD in any feature type or core feature alone, but we will summarize
a few studies that have directly compared patients with ADHD or FASD in the
context of attentional control.
Coles and colleagues [39] measured the performance of children with ADHD
or FASD on four factors of attentional control: sustaining, focusing, encoding (se-
quential memory and learning), and shifting (flexibility and executive function),
and each factor was associated with different brain regions (e.g., sustain: the pre-
frontal cortex, parietal cortex; shift: the frontal eye field, posterior parietal cortex).
They found that children with ADHD suffered more than children with FASD in
sustaining and focusing attention, and children with FASD performed worse than
children with ADHD in encoding and shifting attention. We mainly discuss the
shifting attention factor, because our saliency-based and group-based features were
measured only when observers made an overt attention shift.
In shifting of attention, consistent with Coles et al. [39], children with FASD or
ADHD exhibit different patterns of deficits in structured pro/anti saccade tasks,
suggesting different pathological effects on oculomotor behavior [78,79,145]. In the
prosaccade task, children with FASD had longer reaction times and made more
directional errors than both children with ADHD and controls, implying that chil-
dren with FASD are more deficient in orienting than children with ADHD. This
would suggest that alcohol damaged the posterior parietal cortex, frontal cortex,
and basal ganglia [78]. In the antisaccade task, children with ADHD or FASD
made more directional errors compared to controls, implying a similar level of difficulty in inhibiting automatic responses for both groups. Interestingly, the ability
to inhibit impulsive responses may depend on different sub-types and event rate.
In a Go/No-Go task, Kooistra et al. [112] reported that children of the ADHD-
combined type made more mistakes in the slow-paced condition, and children with
FASD or ADHD-inattentive type performed worse in the fast-paced condition.
However, other studies that compared ADHD and FASD directly have arrived at
different conclusions, especially when patients of different subtypes or severity were
recruited, or different outcome measures were used. Taking sustaining attention for
example, while Coles et al. [39] reported children with ADHD were more deficient
than children with FASD, other studies found no differences between ADHD and
FASD [112,149]. Moreover, Kooistra et al. [111] looked into the ADHD inattentive-
type, and found that their performance in a flanker task was similar to that of
controls, implying no deficits in their sustaining attention.
Taken together, studies of attentional processing and response control have revealed
that FASD and ADHD are difficult to segregate based on an individual parameter.
Both can point to deficits in executive control over responding, as well as attentional
maintenance. This fits with why we found null results with
regards to the differentiability of each feature type, or core feature alone, between
ADHD and FASD. Nevertheless, this could potentially be due to the heterogeneity
of the patients recruited in this study. Future studies could investigate a greater
number of people with each subtype to better identify unique attention profiles.
2.7 Methods in Detail
This section describes the Methods in detail.
Stimuli. Sixty Scene-shuffle Videos (SV, approximately 30 seconds each) were
used. To create these, we filmed thirty 30-second continuous videos with a camcorder (Sony HandyCam DCR-HC21, NTSC, 640×480 pixels, MPEG-1), set either:
immobile on a tripod, to pan at a constant speed (6°/second, ranging 120° back and
forth horizontally), or to follow particular people at a university campus, a beach,
a shopping district, a ski resort, and a desert (filmed videos). We also recorded
ten 30-second continuous videos from television and video games (recorded videos).
The 30 filmed videos were randomly cut into snippets (2-4 seconds, uniformly
distributed), yielding a total of 291 snippets, and reassembled into thirty 30-second
SVs. Each SV (approximately 30 seconds) was made from 9 to 11 snippets without
any temporal gap in between, and no more than one snippet was included
from the same original video. One set of 30 SVs was made from the filmed videos
only. Another set of 30 SVs was made from the 10 recorded videos alone in the
same way, but contained snippets of different lengths (0.5-2.5, 1-3, or 2-4 seconds).
Within this set, the 10 recorded videos were cut into snippets whose lengths were
uniformly distributed from 0.5-2.5 seconds (200 snippets) and reassembled to create
the first group of 10 SVs. A second group of 10 SVs was made with the same 10
recorded videos, but they were cut into snippets whose lengths were uniformly distributed from 1-3
seconds (139 snippets). Similarly, a third group of 10 SVs had snippet lengths var-
ied uniformly from 2-4 seconds (93 snippets). Our choice of snippet length (2-4 s)
was within the range of our daily exposure to television and it enabled us to convey
the relative quickness of new, novel and dynamic stimuli (television commercials
have 1-2 seconds shot length (in 15- and 30-seconds commercials) in the United
States [19], and Hollywood films have an average of 5 seconds shot length [48].
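As a rough illustration, the cutting-and-reassembly procedure above can be sketched as follows. This is a minimal sketch, not the original stimulus-generation code; the function names and the (source, start, end) snippet representation are our own:

```python
import random

def cut_into_snippets(video_id, video_len_s, min_len=2.0, max_len=4.0, rng=None):
    """Cut one continuous video into snippets whose lengths are drawn
    uniformly from [min_len, max_len] seconds (the final snippet is clamped
    to whatever footage remains)."""
    rng = rng or random.Random()
    snippets, t = [], 0.0
    while t < video_len_s:
        length = min(rng.uniform(min_len, max_len), video_len_s - t)
        snippets.append((video_id, t, t + length))
        t += length
    return snippets

def assemble_scene_shuffle(snippet_pool, target_len_s=30.0, rng=None):
    """Reassemble snippets into one ~30 s Scene-shuffle Video with no
    temporal gaps and at most one snippet per original source video."""
    rng = rng or random.Random()
    pool = list(snippet_pool)
    rng.shuffle(pool)
    sv, used_sources, total = [], set(), 0.0
    for vid, start, end in pool:
        if vid in used_sources:
            continue  # at most one snippet from each original video
        sv.append((vid, start, end))
        used_sources.add(vid)
        total += end - start
        if total >= target_len_s:
            break
    return sv
```

With thirty 30-second source videos and 2-4 s snippets, each assembled SV draws roughly 9-11 snippets, matching the counts reported above.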
This snippet length was further motivated by our previous studies of saliency
effects during free viewing in normal volunteers [27]. That study investigated the
role of memory in guiding attention when watching continuous, uncut videos versus
watching video snippets of 1-3 seconds in length. Perceptual memory is critical in
guiding attention, and the authors suggested that perceptual memory of a scene
could be quickly replaced when the scene changes. Immediately after a scene
change, observers rely more on bottom-up guidance to deploy their attention
because it is faster than top-down attention [85,232]. Top-down control is then
expected to gradually take over once the scene is recognized.
Therefore, with longer snippet lengths, one may expect to observe stronger components of perceptual memory and top-down attentional control in guiding attention,
and we explicitly wanted to extract bottom-up measures.
Participants. This paper describes data collected from 21 children with ADHD,
13 children with FASD, 14 elderly PD participants, 18 control children, 18 young
controls, and 24 elderly controls (see Diagnostic Criteria and Table 2.1 for demographic data). More participants were recruited (3 patient groups: 32 ADHD, 22
FASD, 15 PD; 3 healthy control groups: 24 children, 18 young adults, 25 elderly)
than those entered in the final analysis; an individual participant was excluded if
he/she had too few (<10) valid eye traces (see Data Acquisition) or had received
medication on the day of the experiment in the child participant groups (PD pa-
tients were not required to withhold medication). A few of the youngest control
children and children with ADHD were excluded to match age across the 3 child
groups. All participants had normal or corrected-to-normal vision, were compen-
sated, and were naive to the purpose of the experiment. Young adult controls were
not directly involved in classification. They were recruited to (1) provide an inde-
pendent gaze distribution to compute group-based, young-observer similarities and
(2) perform saccade selection (see Saccades Deviated from Norm).
Diagnostic Criteria. Diagnosis of PD was established by one of the co-
authors, Dr. Giovanni Pari, a neurologist specializing in movement disorders, based
upon the Unified Parkinson's Disease Rating Scale. Children with ADHD were
diagnosed by licensed practitioners across Ontario, Canada. Diagnosis of ADHD
was confirmed and co-morbidity assessed using DSM-IV criteria and the Conners'
Parent Rating Scales (CPRS) for children. Inclusion criteria for the ADHD pool
included meeting DSM-IV criteria and criteria established from the CPRS. ADHD
subjects were excluded if we identified the following co-morbid signs: learning dis-
abilities resulting in delayed advancement in school, Tourette’s syndrome, or bipo-
lar disease. Children with FASD recruited in this study were previously assessed
at diagnostic clinics in Ontario and in accordance with the Canadian Diagnostic
guidelines [35]. The FASD group in the study contained children with one of three
diagnoses (Fetal Alcohol Syndrome (FAS), partial FAS, or Alcohol-Related Neurodevelopmental Disorder (ARND)) that fall under the FASD umbrella term. FAS
requires (i) the presence of a characteristic pattern of craniofacial dysmorphologies
(short palpebral fissures, smooth philtrum, thin upper lip, flattened midface); (ii)
pre- and/or postnatal growth restriction; and (iii) structural and/or functional abnormalities of the CNS. A diagnosis of FAS can be made in the absence of confirmed
maternal alcohol consumption. Partial FAS is the diagnosis when the presentation
of the child includes some of the craniofacial and physical features, and structural
and/or functional abnormalities of the CNS not explained by other causes, and
confirmed maternal alcohol consumption during pregnancy. ARND is the diagnosis
used when the child presents with structural and/or functional abnormalities of
the CNS not explained by other causes, confirmed maternal alcohol consumption
during pregnancy, but few or no physical features.
Data Acquisition. Forty SVs (30 from the filmed videos and 10 from the
recorded videos) were played in a random order. Participants were allowed to rest
after every 10 clips (about 5 minutes). A nine-point calibration was performed at
the beginning of each session. At the beginning of each SV, participants were re-
quired to fixate on a grey cross displayed at the center of the screen, but they could
then look anywhere on the screen at the beginning of a snippet. Instantaneous gaze
position was tracked by a head-mounted EyeLink II (SR Research Ltd.) from the
participants’ right eye, and 5,066 eye traces (137 participants × ≤40 SVs) were obtained. Eye-movement traces from the SVs made of recorded videos were not used
in the present study because of their different snippet lengths, and these clips were
used to explore other aspects of eye movements unrelated to this study. The remaining 3,763 eye traces (137 participants × ≤30 SVs from filmed videos) were further
analyzed, and each gaze position was classified as fixation, saccade, blink/artifact,
saccade during blink, smooth pursuit, and eye tracker drift/misclassification. After
excluding participants and discarding invalid eye traces (next paragraph), there
were 108 participants with 2,654 valid eye traces, and 147,489 saccades were ob-
tained from 79,574 seconds of eye-movement recording (see Table 2.1).
An eye trace needed to meet the following criteria to be considered valid: (1)
calibration error less than 0.75°, (2) drift-correction error less than 5° at the end of
an eye trace, (3) gaze position outside a 1° inside border of the screen occurring less
than 20% of the time during a trace, (4) loss of gaze tracking occurring less than
10% of the time during a trace, and (5) maximum length of a single fixation less
than 6 seconds; otherwise, the eye trace was removed (bad tracking quality).
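The five criteria amount to a simple conjunctive filter. A minimal sketch (threshold values taken from the text; argument names are our own, not from the original analysis code):

```python
def is_valid_trace(calib_err_deg, drift_err_deg, frac_offscreen,
                   frac_track_loss, max_fixation_s):
    """Return True if an eye trace passes all five validity criteria:
    (1) calibration error < 0.75 deg, (2) drift-correction error < 5 deg,
    (3) gaze outside a 1-deg screen border for < 20% of the trace,
    (4) tracking loss for < 10% of the trace, (5) longest fixation < 6 s."""
    return (calib_err_deg < 0.75 and
            drift_err_deg < 5.0 and
            frac_offscreen < 0.20 and
            frac_track_loss < 0.10 and
            max_fixation_s < 6.0)
```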
The major causes for excluding participants were high calibration and drift error
rates (17 of the 20 excluded participants) (Table 2.1). For the 3 child groups, it
was harder to obtain accurate calibration, which resulted in a higher exclusion rate
compared to the older participants. The child groups were less likely to complete
the entire 20-minute-long experiment; however, as long as they had 5 minutes (10
eye traces) of valid data, they were not excluded. Although children with FASD
had a lower completion rate compared to ADHD and control children, children with
FASD were engaged with the task. Interestingly, children with ADHD were the most
difficult as far as sitting still. The lower completion rate for children with FAS (a
subtype of FASD) might result from microcephaly, which in our experiment using
a head-mounted eye-tracker (EYELINK II) posed problems, because it was too big
and heavy for some children. We believe that our method would perform even
better by minimizing eye-tracking calibration issues, which experience has taught
is possible when using newer eye-tracker models with children (e.g., EYELINK 1000
with remote setup instead of head-mounted EYELINK II). Moreover, reducing the
length of the experiment would improve the completion rate as well. Finally, we
were strict in our criteria for inclusion, and we rejected all clips that followed a bad
calibration (10 eye traces). One could also decrease rejection rate by performing
calibration more often.
Computing Saliency Maps from Stimuli. The Itti and Koch saliency model
[94–96] is a biologically inspired computational model based on feature integration
theory [216] and the primate visual system. The model successfully explains human
performance in visual search [96], and has been widely used to predict human
gaze in both artificial and natural scene stimuli. The model computes saliency maps
for several low-level features, or combinations of features, for every video frame.
These saliency maps are topographic maps of conspicuity which highlight locations
that may attract attention in a stimulus-driven manner. It has been shown that
several regions in the brain resemble a saliency map, such as the superior colliculus
[115], frontal eye fields [208], posterior parietal cortex [38], and pulvinar [180].
The saliency model first applied a linear filter of a feature on a video frame
at several scales to generate multi-scale (i.e., fine to coarse) filtered maps. Fine
filtered maps were then subtracted from coarse filtered maps to simulate center-
surround operations in human vision and to produce feature maps. Next, these
multi-scale feature maps were normalized and combined together to generate con-
spicuity maps in a manner that favors feature maps with sparse peak responses.
Finally, conspicuity maps of different features were linearly summed together to
form a saliency map [95]. A brief description of each feature and the corresponding
citation are provided as follows:
Color contrast [95]: Color red (r_n), green (g_n), and blue (b_n) from a video frame
were normalized by intensity (see Intensity contrast) to decouple hue from intensity.
Next, four filters of different colors were created: red (R_n = r_n − (g_n + b_n)/2),
green (G_n = g_n − (r_n + b_n)/2), blue (B_n = b_n − (r_n + g_n)/2), and yellow
(Y_n = r_n + g_n − 2(|r_n − g_n|/2 + b_n)).
Intensity contrast [95]: An intensity filter was calculated as I_n = (r_n + g_n + b_n)/3.
Oriented edges [95]: Gabor filters of 4 different orientations (θ = {0°, 45°, 90°, 135°})
were generated to filter video frames.
Temporal flicker [94]: Flicker (F_n) was the absolute difference in Intensity between
the current frame and the previous frame (F_n = |I_n − I_{n−1}|).
Motion contrast [94]: Motion contrast was computed by the Reichardt model
[173]. For each scale, the motion filter M_n(θ) = |O_n(θ) • S_{n−1}(θ) − O_{n−1}(θ) • S_n(θ)|,
where • is the point-wise product, O_n(θ) is the Gabor filter of orientation θ, and S_n(θ)
is a spatially shifted difference between two frames that is orthogonal to O_n(θ).
Line junctions including corners and edge crossings [125]: Junction filters were
built on top of oriented Gabor filters. Four types of junction filters were created:
L-junction, T-junction, X-junction, and E-junction. The L-junction filter responds
to edge corners; the T-junction filter is sensitive to two perpendicular edges that
only one edge ends at the intersection; similarly, the X-junction filter responds to
two perpendicular edges that cross each other; the E-junction filter responds at the
endpoint of an edge.
Intensity variance [125]: Local intensity variance was computed over 16×16
image patches as Var(i,j) = sqrt(Σ_i Σ_j [I(i,j) − μ]² / (S_p − 1)), where S_p is
the size of the patch, and μ is the average intensity value of the image patch.
Texture contrast [125]: Texture contrast was computed as the spatial correlation
between a 16×16 image patch and its neighboring patches within a specified radius
(16 pixels, where 19 pixels equaled 1 degree of visual angle): Txt = 1 − (Σ_i ρ_{X,Y_i}) / N_p,
where N_p is the number of patches within the radius, and ρ_{X,Y} = cov(X,Y)/(σ_X σ_Y) =
(E(XY) − E(X)E(Y)) / (sqrt(E(X²) − E²(X)) · sqrt(E(Y²) − E²(Y))), where X and Y are two
image patches and E is the expected value. Therefore, high correlation indicates low
texture contrast; low correlation indicates high texture contrast.
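The intensity-variance and texture-contrast features are ordinary image statistics and are straightforward to reproduce. A minimal NumPy sketch of the two formulas above (our own simplification, not the original implementation; patch extraction from full frames is omitted):

```python
import numpy as np

def intensity_variance(patch):
    """Var = sqrt( sum_{i,j} [I(i,j) - mu]^2 / (S_p - 1) ): the sample
    standard deviation of a (e.g., 16x16) intensity patch."""
    mu = patch.mean()
    return float(np.sqrt(((patch - mu) ** 2).sum() / (patch.size - 1)))

def correlation(x, y):
    """rho_{X,Y} = (E(XY) - E(X)E(Y)) / (sigma_X * sigma_Y)."""
    x, y = x.ravel().astype(float), y.ravel().astype(float)
    num = (x * y).mean() - x.mean() * y.mean()
    den = (np.sqrt((x ** 2).mean() - x.mean() ** 2) *
           np.sqrt((y ** 2).mean() - y.mean() ** 2))
    return num / den

def texture_contrast(center, neighbors):
    """Txt = 1 - (sum_i rho_{X,Y_i}) / N_p over the N_p neighboring patches;
    high correlation with neighbors means low texture contrast."""
    return float(1.0 - np.mean([correlation(center, nb) for nb in neighbors]))
```

For example, a patch that is perfectly correlated with all of its neighbors yields a texture contrast of 0, while uncorrelated neighbors push the value toward 1.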
The choice of which features were included in the overall saliency maps was
mostly due to the history of model development. In the Itti and Koch saliency model
developed in 1998 [95], there were only features C, I, and O in the model to analyze
static images, but later on F and M [94] were added to analyze videos. Since then
we have been using CIOFM as the standard measure of “salience”. Recently, we
added line junctions as a new feature [125]. That is why we here used both CIOFM
and CIOFMJ. Features C, I, O, F, M, and J were implemented with an attempt
to mimic visual processing of primates. However, texture and variance were image
statistics commonly used by computer vision researchers to describe images. As C,
I, O, F, M, and J were derived from a different route than variance and texture, we
have not attempted to put all of them together into a single map.
Saccades after Scene Onset. Hypothesizing that some differences between
clinical populations are more governed by “stimulus-driven” processes, we attempted
to find the saccades that were more likely driven by bottom-up processes. We designed videos that changed scenes every 2-4 seconds because we assumed that
the onset of a new scene would interrupt observers’ top-down expectations of what
might occur, and the observers may rely more on their bottom-up attention mech-
anism to allocate their attention [27]. Therefore, we considered the first 3 saccades
individually as well as all the saccades combined to compare populations.
Saccades Deviated from Norm. To reveal differences between groups, we
selected saccades that deviated from the gaze of “young adult controls”, the
“norm”. Examining these deviated saccades may be more revealing of whether one
group is more easily attracted to salient features than the others: if all
saccades were included, differences between the groups might be diluted, which would
not only decrease classification performance but also hinder the identification
of any underlying differences in attention mechanisms across groups. Hence, a
spatiotemporal map of gaze distribution was first computed from the scanpaths
of young controls. Next, for each participant (except young adult controls), we
discarded the half of his/her saccades that were better predicted by the gaze distribution
of young adult controls (see next paragraph for the predictability index), and analyzed
only the other half of the saccades, which were more deviated. Note that this selection
did not involve any of the saliency-based features or saccade dynamics, nor comparing
saccades which were most predictable given the young adult controls. Then we
computed values of saliency-based and group-based features.
The predictability index was determined in the following manner. First, a spa-
tiotemporal map of gaze distribution was computed from the scanpaths of young
adult controls. Second, we extracted a map value at the saccade endpoint when a
saccade was initiated and compared that to 100 values randomly sampled from the
same map. Then, the predictability of this saccade was determined by the rank of
the map value at the saccade endpoint among the 100 randomly sampled map values. If
the saccade endpoint ranked No. 1 (had the highest map value), this
saccade could be easily predicted from the young adult sample. On the contrary, if
the saccade endpoint ranked No. 101 (had the lowest map value), then this saccade
was the most difficult to predict. This procedure was done on all saccades. Finally,
we sorted all saccades of an observer by their ease of prediction (from the easiest to
the most difficult), discarded the first half of saccades that were easier to predict,
and analyzed only the other half that was more difficult to predict.
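The ranking-and-discarding procedure can be sketched as follows. This is a minimal sketch with our own names and simplified map indexing (the spatiotemporal gaze-distribution map here is reduced to a single 2-D frame):

```python
import numpy as np

def predictability_rank(gaze_map, x, y, n_random=100, rng=None):
    """Rank the gaze-distribution map value at a saccade endpoint (x, y)
    against n_random values sampled from the same map.
    Rank 1 = easiest to predict; rank n_random + 1 = hardest to predict."""
    rng = rng or np.random.default_rng()
    h, w = gaze_map.shape
    endpoint_val = gaze_map[y, x]
    samples = gaze_map[rng.integers(0, h, n_random),
                       rng.integers(0, w, n_random)]
    # one rank position for every random sample strictly above the endpoint
    return 1 + int(np.sum(samples > endpoint_val))

def keep_deviated_half(ranks):
    """Sort saccades from easiest to hardest to predict, discard the easier
    half, and return the (sorted) indices of the more deviated half."""
    order = np.argsort(ranks, kind="stable")  # easiest-to-predict first
    return sorted(order[len(ranks) // 2:].tolist())
```

A saccade landing on the map's peak always receives rank 1, since no random sample can exceed it.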
Computing Features. Saliency-based and group-based features were titrated
by ordinal dominance analysis [8]. Both types of features reflected the correla-
tions/similarities between gaze and maps (saliency maps for saliency-based fea-
tures; gaze distribution maps of control young adults for group-based features). To
compute the correlation, a map value at the saccade endpoint (maximum value in a
2.5° circular window) was obtained when an observer initiated a saccade (Fig. 2.1c). One
hundred map values were also randomly sampled from the same map as a baseline
for comparison. These map values were normalized to values from 0 to 1 relative to
the minimum and maximum values of the map. With all the saccades, an observer
histogram and a random histogram (bin size was 0.1) were generated from the nor-
malized map values (Fig. 2.1d). The differences between the two histograms were
summarized as an ordinal dominance curve, which is the correlation between the
map and gaze. To create an ordinal dominance curve, we incremented a threshold
from 0 to 1 and calculated the percentage of sampled map values in each histogram
that were above the threshold (“hits”). The vertical axis of the rotated ordinal dominance curve was the percentage of “observer hits”, and the horizontal axis was the
percentage of “random hits”. Thus, the area under the curve (AUC) quantified how well the maps predicted observers’ saccade endpoints. AUC
values obtained from different feature maps typically ranged between chance (0.5)
and an upper bound computed from inter-observer gaze similarity (here, around
0.88 among young adults), depending on how predictive the feature of interest was
of the observer’s gaze. In addition to the AUC measure, the histograms provided the frequency with which an observer looked at locations of low, medium, or high map values.
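The ordinal dominance computation can be sketched as below (a simplified sketch: it thresholds the normalized samples directly and integrates with the trapezoid rule, rather than going through the 0.1-wide histogram bins described above):

```python
import numpy as np

def ordinal_dominance_auc(observer_vals, random_vals, n_thresholds=101):
    """AUC of the ordinal dominance curve from normalized ([0, 1]) map values.

    For each threshold, a "hit" rate is the fraction of samples above it; the
    curve plots observer hits against random hits, and its area summarizes how
    well the map predicted the observer's saccade endpoints (0.5 = chance).
    """
    thresholds = np.linspace(0.0, 1.0, n_thresholds)
    obs_hits = np.array([(observer_vals > t).mean() for t in thresholds])
    rnd_hits = np.array([(random_vals > t).mean() for t in thresholds])
    # both hit rates decrease as the threshold grows; reverse so x increases,
    # then integrate observer hits over random hits with the trapezoid rule
    x, y = rnd_hits[::-1], obs_hits[::-1]
    return float(np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2.0))
```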
An AUC of 0.5 means the maps predicted saccade endpoint no better than
random. An AUC above 0.5 indicates the maps predict saccade endpoints above
chance. If we assume a model cannot predict a human’s gaze better than another group of humans can, then we can use inter-observer similarity as the upper bound of
AUC. To do this, we took one person out of the group of control young adults,
and utilized the spatiotemporal gaze distribution of the remaining young adults to
predict his gaze. Next, an AUC value was obtained showing the similarity between
his gaze and the rest of the group. We did this for every individual in the group of
control young adults, and the averaged AUC, 0.88, was obtained.
The group-based features were derived in the same fashion as the saliency-based
features. The only difference was how maps were generated. For saliency-based
features, maps were computed by several saliency models of different features, but
maps for group-based features were generated by the spatiotemporal gaze distribu-
tion of the control young adults. To generate the gaze distributions, the instanta-
neous eye position of each young adult control was represented as a Gaussian blob (standard deviation = 2°), and combined into a single probability density map
across all young adult controls. Because it took roughly 80 ms from planning to
initiating a saccade, the timing of the instantaneous eye positions was shifted earlier so that the gaze distribution was predictive of the eye positions of other participants.
Once the gaze distribution maps were generated, the group-based features were
derived in the same way as the saliency-based features. The low, medium, and high
salience bins for the saliency-based features correspond to low, medium, and high
inter-observer similarity bins for the group-based features.
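A minimal sketch of building such a per-frame gaze density map (assuming, for illustration, that gaze is already in pixel coordinates and that the roughly 80 ms planning lag is approximated by a fixed shift of a couple of frames):

```python
import numpy as np

def gaze_distribution_map(eye_positions, frame_shape, sigma_px, frame_idx,
                          shift_frames=2):
    """Spatiotemporal gaze map for one video frame from a group of observers.

    eye_positions: array (n_observers, n_frames, 2) of (x, y) gaze in pixels.
    Each observer's instantaneous gaze becomes a Gaussian blob of width
    sigma_px (e.g. 2 deg converted to pixels); blobs are summed over observers
    and normalized into a probability density.
    """
    h, w = frame_shape
    ys, xs = np.mgrid[0:h, 0:w]
    # use gaze from ~80 ms later, so the map at frame_idx predicts saccades
    t = min(frame_idx + shift_frames, eye_positions.shape[1] - 1)
    density = np.zeros((h, w))
    for gx, gy in eye_positions[:, t, :]:
        density += np.exp(-((xs - gx) ** 2 + (ys - gy) ** 2)
                          / (2 * sigma_px ** 2))
    total = density.sum()
    return density / total if total > 0 else density
```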
Classification and Feature Selection. The classification and feature selec-
tion workflow was inspired by microarray analysis, which has two goals: (1) classifying patients from controls by gene expression in the microarray, and (2) identifying
genes relevant to the disease. One gene (similar to one of our core features,
e.g., color or motion) can express several different mRNA/proteins (similar to our
sub-features, e.g., color contrast at the target of the first saccade) by alternative
splicing. While microarray analysis usually has thousands of genes to be examined,
we had 224 oculomotor-based, saliency-based and group-based sub-features.
For feature selection across multiple classes, we used MSVM-RFE [243], an extension of SVM-RFE [81] to the multi-class setting. Note that MSVM-RFE
ranked features that were most useful in differentiating all populations, rather than
a pair of populations. Therefore, we looked at the weights of the features when
the overall classifier reached maximum classification accuracy. However, even when using the same set of features, some features might be more important to classify
a pair of clinical groups, e.g. ADHD vs. FASD, but other features might be more
important to classify another pair of populations, e.g. controls vs. children with
ADHD. Hence, features with larger squared weights were considered more important
in differentiating a pair of clinical groups. In any case, the overall ranking identified the features most useful in differentiating all groups, rather than any single pair. Thirty iterations of a repeated leave-one-out bootstrap [97] were
used to test the performance of each classifier that used a particular selected subset
of features. The average accuracy is reported (with significant deviations quantified
using one-tailed paired t-tests, df=29). Chance level was computed by training a
classifier with the same bootstrap structure but with randomly permuted class
labels. Because classification accuracy varied with the number of features in the
process of RFE, we tested the performance of classifiers by comparing the maximum
accuracy obtained by the classifier trained with true labels, to that obtained by the
classifier trained with randomly permuted labels (chance level), regardless of how
many features each classifier used to obtain maximum accuracy. In addition, to
test whether a core feature was different between populations, all its sub-features
were recruited to build a classifier, and we tested whether it performed better than
the one with permuted class labels, as above.
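The elimination loop at the heart of RFE can be sketched as follows; note this is a plain binary RFE sketch in which an ordinary least-squares linear fit stands in for the SVM weight vector (the study itself used MSVM-RFE with multi-class SVM weights):

```python
import numpy as np

def rfe_rank(X, y, n_keep=1):
    """Recursive feature elimination: repeatedly fit a linear model and drop
    the feature with the smallest squared weight. Returns feature indices
    ordered from first-eliminated (least useful) to last-surviving (most
    useful). A least-squares fit stands in for the linear SVM here.
    """
    remaining = list(range(X.shape[1]))
    eliminated = []
    while len(remaining) > n_keep:
        # weights of a linear model fit on the surviving features
        w, *_ = np.linalg.lstsq(X[:, remaining], y.astype(float), rcond=None)
        # smallest squared weight = most redundant / least useful feature
        eliminated.append(remaining.pop(int(np.argmin(w ** 2))))
    return eliminated + remaining
```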
Before training the classifier, features were standardized; outliers of each feature were identified before the standardization. If a feature value was
outside the upper or lower quartile by more than 1.5 times the inter-quartile range, it was considered an outlier. The mean and standard deviation of each feature were then calculated excluding the outliers. All the feature values then had the mean subtracted and were divided by the standard deviation. Finally,
standardized feature values were filtered by an arctangent function to diminish the
influence of the outliers.
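This outlier-robust standardization can be sketched per feature as:

```python
import numpy as np

def robust_standardize(values):
    """Standardize one feature with statistics computed without outliers, then
    squash with arctangent to limit the remaining influence of the outliers.

    An outlier lies more than 1.5 * IQR beyond a quartile (Tukey's fences).
    """
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    inliers = values[(values >= q1 - 1.5 * iqr) & (values <= q3 + 1.5 * iqr)]
    mu, sd = inliers.mean(), inliers.std()
    z = (values - mu) / sd if sd > 0 else values - mu
    return np.arctan(z)   # bounded in (-pi/2, pi/2)
```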
Direction of Effect. Once a feature (e.g. a core feature, or a feature type)
was found discriminative by the classifiers, we examined the direction of differences
between patients and controls by comparing, between populations, the averages of the sub-features of the discriminative feature (e.g., averages of the 12 saccade amplitude sub-features). Note that these sub-features can be averaged because they were
standardized into the same range. Next, the averages between two populations
were tested (two-sample t-test, p<0.01). If they were significantly different, then
the direction was reported.
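A sketch of this direction-of-effect test (the function name and array layout are illustrative assumptions):

```python
import numpy as np
from scipy import stats

def direction_of_effect(patient_subfeats, control_subfeats, alpha=0.01):
    """Average a discriminative feature's standardized sub-features per subject,
    then test whether the group means differ (two-sample t-test, p < alpha).

    Inputs have shape (n_subjects, n_subfeatures); returns the direction of
    the difference, or None when it is not significant.
    """
    p_avg = patient_subfeats.mean(axis=1)   # per-subject mean over sub-features
    c_avg = control_subfeats.mean(axis=1)
    _, p = stats.ttest_ind(p_avg, c_avg)
    if p >= alpha:
        return None
    return ('higher in patients' if p_avg.mean() > c_avg.mean()
            else 'lower in patients')
```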
Properties of Support Vector Machine Recursive Feature Elimination (SVM-RFE). One characteristic of SVM-RFE is that it does not select redundant
features. Mutual information between features is taken into account by the nature
of SVM. Consider, for instance, that saccade duration, amplitude and peak ve-
locity are highly correlated features (following the main sequence) that all exhibit
significant population differences (e.g. by t-test). If features were selected based on
individual significance values (e.g. p-value), then all 3 features would be selected.
However, because they are highly correlated, and thus highly redundant, including all of them adds little to differentiating the populations. Consequently, highly correlated features will all receive smaller weights in the SVM, with the one
that is least noisy receiving slightly higher weight (correlated features might ex-
hibit different noise levels as the correlation between saccade duration, amplitude
and peak velocity may not be perfect because of, say, curved saccade trajectories
or measurement noise). Therefore, the noisier ones will be eliminated first during
RFE, and the others will remain in subsequent steps with other complementary
features to maximize the performance of the classifier.
Chapter 3
Differentiating Children with FASD from
Age-matched Controls by Natural Viewing
Patterns
3.1 Introduction
Dysfunction in inhibitory control of attention has been shown in children with fe-
tal alcohol spectrum disorder (FASD). Previous studies have explored the deficits
in top-down (goal oriented) and bottom-up (stimulus driven) attention with a se-
ries of structured visual tasks. This study investigates differences in attentional selection mechanisms while patients freely viewed natural-scene videos without performing specific tasks, and capitalizes on their differences in allocating attention to develop classifiers that distinguish patients from controls.
Compared to the study described in chapter 2, this study designed a new set of
high-definition videos and dramatically reduced the length of the experiment from
15 to only 5 minutes. With only one-third of the saccades available, the method
used in chapter 2 no longer gave us reliable statistics, which hurt the performance
of the classifiers. Therefore, we devise a novel method that classifies children with
FASD and controls utilizing the whole eye trace, which includes saccades, fixations, and smooth pursuit.
Similar to the study of chapter 2, the videos are composed of short (2-4 second),
unrelated clips to reduce top-down expectation and emphasize the difference in
gaze allocation at every scene change. However, it is different from the previous
study in that no two clip snippets share the same scene. Gaze of children with FASD and normally developing children is tracked while they watch the videos.
A computational saliency model computes bottom-up saliency maps for each video
frame. The normalized map values along the recorded eye traces are extracted, resulting in saliency traces.
Next, a deep convolutional neural network was utilized to learn the sparse rep-
resentations of these saliency traces in an unsupervised manner. These sparse
representations served as the features for a set of weak classifiers, and a strong classifier was built by Adaboost from these weak classifiers. Leave-one-out cross-validation was used
to train and test the classifiers. The classifier differentiated children with FASD
and age-matched controls with 73.1% accuracy, and the analysis found that FASD significantly impacted most of the low-level visual attributes (e.g., color, motion) and overall
attention deployment. This study demonstrates that attentional selection mech-
anisms are influenced by FASD, and the behavioral difference is captured by the
correlation between salience and gaze. Furthermore, the short and task-free nature
of this method makes it more applicable to children with FASD compared to the
previous method, and shows considerable promise toward a screening tool in the
future.
3.2 Materials and Methods
The experimental procedure is summarized as follows. Participants’ eye movements
were recorded while they watched five 1-minute videos, and they were instructed to
“watch and enjoy the clips”. These clips were composed of content-unrelated clip
snippets of 2-4 seconds (a few were 7-8 seconds). The design of the videos attempted
to minimize participants’ expectations about what is going to happen next, and
to emphasize attention deployment in new scenes (Fig. 3.1a). For each video
frame, corresponding saliency maps were computed to predict visually conspicuous
locations from low-level visual attributes of the stimuli, and the normalized map
values were extracted along the eye trace (saliency trace). To extract features from
the saliency traces, a convolutional deep neural network was utilized to discover 256 bases from the saliency traces of control young adults. Next, we applied these bases to the children's data and obtained a sparse representation of their saliency traces. This representation served as the input of an SVM classifier after selecting features that best distinguish children with FASD from control children. We performed the classification on
each clip snippet independently, and used Adaboost to construct a strong classifier
that generates the final prediction.
3.2.1 Standard protocol approvals and patient consent
All experimental procedures were approved by the Human Research and Ethics
Board at Queen’s University, adhering to the guidelines of the Declaration of
Helsinki, and the Canadian Tri-Council Policy Statement on Ethical Conduct for
Research Involving Humans.
[Figure 3.1 panels: (a) scene changes every 2-4 seconds; computational maps of visual salience; normalized map values along eye traces over time (seconds). (b) Color (C), Intensity (I), Orientation (O), Flicker (F), Motion (M), Line Junctions (J), CIOFM, CIOFMJ, Texture (Txt), and young-observer similarity (yos).]
Figure 3.1: (a) The process of extracting a saliency trace. Observers’ eye move-
ments were recorded while free viewing videos of natural scenes that changed every
2-4 seconds. A saliency map was computed for the corresponding video frame (see
(b)), and the normalized map values at the gaze location were extracted along the
eye trace. (b) Nine saliency maps (color, intensity, etc.) of different visual at-
tributes and one map generated from the instantaneous gaze position of normative
young adult observers, here illustrated for the first video frame from (a), the ele-
phants. Brighter color indicates locations of the video frame with stronger feature
contrast.
3.2.2 Participants
The results reported in this study included 52 control children and 49 children with
FASD (Table 3.1). Eight more control children and fifteen more children with FASD were recruited, but they were excluded from the analysis due to poor tracking quality (see the next paragraph). Nineteen young adult controls (20.74±1.33 yr) were recruited to learn, in an unsupervised manner, the sparse representation of the eye movement signals (see 3.2.5 Data Analysis).
A participant was excluded from analysis if 3 or more eye traces out of 5 were invalid, and an eye trace was considered invalid if it met one of the following criteria: (1) 50% or more of the eye movements were labeled as blink, (2) 50% or more of the eye movements were found outside of the screen coordinates, (3) the observer continuously fixated for more than 30 seconds in a 60-second video, (4) the average calibration error was larger than 1 degree, and (5) the drift correction error was larger than 5 degrees.
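These exclusion criteria can be sketched as a per-trace validity check (argument names are illustrative; blink and on-screen labels are assumed to be per-sample booleans):

```python
import numpy as np

def trace_is_valid(blink, on_screen, calib_error_deg, drift_error_deg,
                   max_fixation_s):
    """Apply the five exclusion criteria to one 60-second eye trace."""
    if blink.mean() >= 0.5:              # (1) >= 50% of samples labeled blink
        return False
    if (~on_screen).mean() >= 0.5:       # (2) >= 50% of samples off-screen
        return False
    if max_fixation_s > 30.0:            # (3) continuous fixation > 30 s
        return False
    if calib_error_deg > 1.0:            # (4) calibration error > 1 degree
        return False
    if drift_error_deg > 5.0:            # (5) drift-correction error > 5 degrees
        return False
    return True

def include_participant(trace_validity):
    """Exclude a participant when 3 or more of the 5 traces are invalid."""
    return sum(not v for v in trace_validity) < 3
```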
3.2.3 Stimuli
The stimuli were continuous natural scene videos whose content changed every
2-4 seconds (61 snippets, length was uniformly and randomly distributed) or 7-8
seconds (9 snippets). Each video (about 1 minute) included 13 to 15 snippets.
These high-definition videos (1280×1024 pixels) were either shot with an HD camcorder (Canon Vixia HF S20) or extracted from BBC Planet Earth and the Pixar Short Films Collection (Vol. 1). Scenes in the videos included, but were not limited to, the ocean, animals, bird's-eye views of the earth, jungles, caves, deserts, snow
environment, mountains, seasonal forest, theme parks, streets, zoos, animations,
etc. (see Fig. 3.1a for examples).
Table 3.1: Demographic data of children with FASD and control children. OCD:
obsessive-compulsive disorder.
Category       FASD                            Ctrl. Child
n              49                              52
age (yr)       12.22±3.26                      11.07±3.33
male:female    25:24                           24:28
subtypes       ARND: 39, pFAS: 8, FAS: 2       -
comorbidity    ADHD: 29, Anxiety: 4,           ADHD: 0, Anxiety: 0,
               Depression: 2, Bipolar: 1,      Depression: 0, Bipolar: 0,
               OCD: 6, Other: 12               OCD: 0, Other: 0
medication     Risperidone: 7, Adderall: 6,    Ventolin: 2, Advair: 1,
               Strattera: 5, Biphentin: 4,     Singulair: 1
               Concerta: 4, Antidepressants: 2,
               Dexedrine: 2, Melatonin: 2,
               Risperdal: 2, Ritalin: 2,
               Epival: 1, Olanzapine: 1,
               Seroquel: 1, Wellbutrin: 1
3.2.4 Data acquisition
This experiment involved two eye tracking setups. The first setup was the mobile
eye-tracking laboratory from the Reynolds lab at Queen’s University. This setup traveled across Canada to record eye movements of children with FASD and age-matched controls. The second setup, located at iLab of the University of Southern California, was used to record eye movements of control young adults (compensated with participation credits), and the field of view of the stimuli was adjusted to be the same. Participants at both setups were instructed to “watch and enjoy the clips”.
The mobile eye-tracking laboratory was set up with an EyeLink 1000 (SR Re-
search Ltd.) in the standard remote configuration, and the participants sat 60 centimeters in front of the screen that came with the EyeLink (27.8°×22.3° field of view). The iLab eye-tracking setup also used an EyeLink 1000, in the desktop configuration, and the young adult participants sat 118 centimeters in front of the screen (40” Sony Bravia KDL-40V5100). These young adults’ chins were supported
by a chin-rest and their foreheads were attached to a head band for stabilization.
Although the two eye-tracking setups were different, the videos displayed in the iLab setup were resized so that both setups had an identical field of view of the stimuli. These young adult participants watched ten 1-minute videos, where the first
half of the videos were the same as those children watched, and the second half of
the videos were made in the same way as the first half but with different contents.
Only the right eye of the observers was tracked at 500 Hz in both setups.
The experiment was divided into two sessions, five 1-minute videos each. Par-
ticipants were allowed to take a break between the sessions. Before each session,
a 5- or 9-point calibration grid, depending on the tracking quality, was used to calibrate the eye tracker. An Elmo icon blinked at the center of the stimuli before
the start of each video to indicate a video was about to start. After each video was displayed, a drift correction was performed to monitor the tracking quality.
3.2.5 Data analysis
Extracting saliency traces. To extract the saliency traces, which served as the
input of later analysis, saliency maps of each video frame were computed first.
For each video frame, nine saliency maps of different visual attributes and one
young-observer similarity map were generated. These visual attributes included
color contrast (C), intensity contrast (I), oriented edges (O), temporal flicker (F),
motion contrast (M), line junctions (J), and texture contrast (Txt). Our lab has
been using CIOFM as the standard measure of “salience”. With the recently developed line-junction visual attribute, the junction channel was combined with the standard saliency map to form another saliency map, CIOFMJ. The young-observer similarity map (Yos) was not computed from low-level attributes. Instead, the map was generated from the instantaneous gaze positions of the young adult controls, which provided information about their attention deployment on top of the stimulus-driven attention estimated from the saliency maps.
Once the ten saliency/similarity maps were computed, the map values along the eye traces (saliency traces) were extracted and represented by the normalized saliency score (NSS). The NSS showed how many standard deviations away the map value at the gaze position was from 100 randomly sampled values of the same map: nss = (g − avg(r)) / std(r), where g was the map value at the gaze position, and r was the set of 100 randomly sampled map values. With the ten saliency/similarity maps, ten saliency/similarity traces were obtained. These traces were sampled at 500 Hz as they were computed from the eye traces, and resampled to 62.5 Hz to
reduce the computational load. Next, a principal component analysis (PCA) was performed on the 10 traces to whiten and decorrelate them, and the first 7 principal components were kept to retain 97.5% of the trace information.
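The NSS extraction and PCA whitening steps can be sketched as follows (a simplified sketch; the resampling from 500 Hz down to 62.5 Hz is omitted):

```python
import numpy as np

def nss_at_gaze(saliency_map, gaze_xy, rng, n_random=100):
    """Normalized saliency score: how many standard deviations the map value
    at the gaze position lies from 100 randomly sampled values of the map."""
    h, w = saliency_map.shape
    g = saliency_map[int(gaze_xy[1]), int(gaze_xy[0])]
    r = saliency_map[rng.integers(0, h, n_random),
                     rng.integers(0, w, n_random)]
    return (g - r.mean()) / r.std()

def pca_whiten(traces, var_keep=0.975):
    """Whiten and decorrelate stacked traces (n_samples, n_maps), keeping
    enough principal components to retain var_keep of the variance."""
    X = traces - traces.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
    order = np.argsort(eigvals)[::-1]          # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    k = int(np.searchsorted(np.cumsum(eigvals) / eigvals.sum(), var_keep)) + 1
    return (X @ eigvecs[:, :k]) / np.sqrt(eigvals[:k])
```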
Convolutional Deep Belief Network. Deep Belief Network (DBN) [88,120]
is a computational model that learns deep hierarchical representations of the inputs in an unsupervised manner. The deep hierarchical structure of a DBN can abstract the inputs layer by layer, and each layer of the DBN was trained to learn the sparse representation of the inputs from the previous layer. While searching for the sparse representation, the layer learned to represent the inputs by minimizing the input reconstruction errors using an overcomplete set of bases. This sparse representation had several advantages in representing input signals, and the one that benefited classification the most was denoising, because signals and noise were
likely to be separated by different bases. Several classification tasks (e.g., object
recognition [117,120], speech recognition [121]) using bases discovered by DBNs showed performance superior or comparable to other algorithms with well-engineered
features. Moreover, these DBN-discovered bases remarkably resembled the recep-
tive fields in the early visual/auditory cortex of the brain when trained with natural
images [157] or sound signals [198].
In this study, we used a two-layer convolutional deep neural network (CDNN)
[118] to discover the sparse representation of the saliency traces, and classified chil-
drenwithFASDfromcontrolsbasedonthisrepresentation. ComparedtofullDBN,
the convolutional version reduces number of parameters to be learned because it
shared the same basis across different part of the saliency traces. Each layer of
the CDNN was a topographic independent component analysis (TICA) network
(Fig. 3.2a) [91]. TICA is an unsupervised learning algorithm that discovers com-
ponents (bases) from unlabelled data in a way similar to independent component analysis (ICA). They differ in that TICA loosens the independence constraint between neighboring components so that similar components end up next to each other. Therefore, given a natural image set, Gabor-like bases discovered
by classic ICA were orderless due to the assumption of statistical independence; in
contrast, these bases uncovered by TICA were ordered like the orientation map in
the primary visual cortex. This orientation map was essential to give rise to the
properties of complex cells in the next layer, which could help that layer's processing. Another reason to use TICA was that, in practice, ICA often violated
the independence assumption and one could easily find components that depended
on each other (e.g., 90° and 80° Gabor filters that ICA discovered from natural images). Considering this dependency can be useful in estimating a more stable set of independent components when the number of potential independent components is larger than the number of independent components estimated (e.g., image decomposition). In this case, classic ICA gave a random subset of independent components depending on the initial conditions. In contrast, TICA
found a more stable and meaningful subset of independent components because the dependency between the independent components was considered.
The TICA network was composed of three components: input, simple, and pooling nodes (Fig. 3.2a). Given an input pattern x^(t), each pooling node was activated as

p_i(x^(t); W, V) = sqrt( sum_{k=1..m} V_{ik} (sum_{j=1..n} W_{kj} x_j^(t))^2 ),

where n was the size of the local receptive field (the size of the input) and m was the number of simple units. The weight matrix V ∈ R^{m×m} between the simple and pooling nodes was fixed (V_{ij} = 0 or 1) to represent the topographic structure. The weight matrix W ∈ R^{m×n} between the input
Figure 3.2: (a) One layer of the convolutional deep neural network, where the
simple units are the squared weighted sum of the receptive field, and the pooling
units compute the square root of the sum of adjacent simple units. (b) Sixty-
four (out of 128) randomly selected first-layer bases learned from normative young
adults’ saliency traces.
and the simple nodes held the sparse bases to be learned by TICA. TICA learned the weight W by solving

argmin_W sum_{t=1..T} sum_{i=1..m} p_i(x^(t); W, V), subject to W W^T = I,   (3.1)

where the input patterns {x^(t)}_{t=1..T} were whitened by PCA. This objective function minimized the sum of activities from all pooling nodes to reduce redundancies between them, and the constraint W W^T = I ensured sparseness between the bases.
This objective function corresponded to classic ICA if each pooling node p_i was connected to exactly one simple node h_i. In the implementation of this study, we set each pooling node (p_i) to connect to 3 contiguous simple nodes (h_{i−1}, h_i, h_{i+1}). The
size of the local receptive field was set to 4 (4 consecutive samples from the saliency trace), the step size was 2 (the local receptive field moved 2 samples forward), and
the number of simple and pooling nodes was 128 (128 sparse bases). These bases
were trained from the whitened saliency traces from normative young adults (Fig.
3.2b).
The sparse representation of children’s saliency traces was simply obtained by
propagating their whitened traces through the CDNN pretrained from the young
controls. Thechildren’ssaliencytraceswerewhitenedbythesameprinciplecompo-
nentsthatyoungcontrolsused. Thesewhitenedtracesthenpropagatedthroughthe
CDNN composed of two TICA networks whose weight matrices W were pretrained
by normative young adults. For each TICA network, the output of each pooling
node along the whole saliency trace was called a “map”, and all the maps together
should cover the entire saliency trace. We concatenated 256 maps (128 maps × 2 TICA networks) into one giant vector that sparsely represented the saliency traces
and would be the features used for classification.
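The forward pass of one such TICA layer over a trace can be sketched as follows (boundary pooling units here simply pool over fewer neighbors; the exact topology of V in the study may differ):

```python
import numpy as np

def tica_pool_activations(x, W, pool_size=3):
    """Pooling-unit activations of one TICA layer for a single input patch:
    square the simple-unit responses (W @ x), then take the square root of
    the sum over pool_size adjacent simple units (the fixed 0/1 matrix V)."""
    h_sq = (W @ x) ** 2                       # squared simple-unit responses
    m = len(h_sq)
    half = pool_size // 2
    return np.array([np.sqrt(h_sq[max(0, i - half):min(m, i + half + 1)].sum())
                     for i in range(m)])

def tica_layer_over_trace(trace, W, rf_size=4, step=2):
    """Slide the receptive field along a whitened trace (rf_size=4, step=2 as
    in the text) and stack the pooling activations into per-unit 'maps'."""
    windows = [trace[s:s + rf_size]
               for s in range(0, len(trace) - rf_size + 1, step)]
    return np.stack([tica_pool_activations(w, W) for w in windows], axis=1)
```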
Classifying children with FASD from controls. A weak classifier was built per clip snippet, and a strong classifier was built by classic Adaboost [67] from all the weak classifiers that performed better than the chance level (max_i(n_i) / sum_i(n_i), where n_i is the number of participants in group i). To build a weak classifier, a combination
of filter and wrapper methods was used to select features (parts of the sparsely
represented saliency traces) that discriminated patients from controls. For each
map in a trace, the filter method was a two-tailed t-test to identify features whose p-value ≤ 0.05 after Bonferroni correction within the map. Once the filter method selected a set of features, an L1-regularized support vector machine (SVM) was used to perform another round of feature selection and classification simultaneously, resulting in our weak classifier for this snippet.
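The filter step and the chance level can be sketched as below (the subsequent L1-regularized SVM step is not shown):

```python
import numpy as np
from scipy import stats

def filter_features(X_patients, X_controls, alpha=0.05):
    """Filter step of a weak classifier: keep the feature indices whose
    two-tailed t-test p-value survives Bonferroni correction within the map."""
    _, p = stats.ttest_ind(X_patients, X_controls, axis=0)
    return np.where(p <= alpha / X_patients.shape[1])[0]  # alpha / n_tests

def chance_level(group_sizes):
    """Chance accuracy max(n_i) / sum(n_i): always guessing the largest group."""
    return max(group_sizes) / sum(group_sizes)
```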
A leave-one-out cross validation (LOOCV) was used to test the performance of
the strong classifier. At each iteration, one participant per group was left out for
testing, and the remaining subjects were the training data. Within the training
data set, another LOOCV was used to search for the cost value of the SVM and to
estimate whether this weak classifier could perform above chance. If this weak
classifier was estimated to perform better than chance, it was trained again with all
the training samples and would be included in the strong classifier. The LOOCV
was repeated 10 times to evaluate the performance of the strong classifier, and a
one-tailed t-test was used for statistical testing.
3.3 Results
Classification accuracy for 49 children with FASD versus 52 age-matched controls
reached 73.1% (chance: 51.5%; t(9)=66.88, p<0.01; Fig. 3.3a). The sensitivity to
detect children with FASD was 68.4%, and the specificity was 77.5% (Fig. 3.3b).
[Figure 3.3 panels: (a) bar plot of classification accuracy (0.5-0.9 scale; * marks significance). (b) Confusion matrix (columns: true label; rows: predicted label): true Ctrl. predicted Ctrl. 0.78, true FASD predicted Ctrl. 0.32, true Ctrl. predicted FASD 0.22, true FASD predicted FASD 0.68; sensitivity = 68.4%, specificity = 77.5%.]
Figure 3.3: (a) The classifier differentiated children with FASD from age-matched
controls with 73.1% accuracy. (b) Confusion matrix, sensitivity, and specificity in
differentiating patients from controls.
3.4 Discussion
This study demonstrates that children with FASD can be identified from only five minutes of natural viewing eye traces. This result greatly increases the applicability of the natural viewing paradigm to children, and its potential to be used as a screening tool.
The significant classification result implies that FASD influences the processing in deploying overall attention and most of the low-level visual attributes, which agrees with previous findings that consistently report FASD affects visual processing [24,196]. The deficient visual processing may affect bottom-up attention guidance for children with FASD, as overall salience was found discriminative between children with FASD and controls. This finding is consistent with Green et al. [77,79], who report that children with FASD exhibit longer reaction times and higher direction error rates than age-matched controls in the prosaccade task, a task that tests reflexive, bottom-up driven saccadic responses. Nevertheless, their impaired prosaccade response may be due to deficient top-down control [178] that interrupts bottom-up attention guidance [77,79].
In summary, by shortening the experiment time to only five minutes, this study moves one step closer to using the natural viewing paradigm to screen individuals with FASD. Future work should develop new algorithms to understand how the natural viewing eye traces and the saliency traces should be represented such that the classification accuracy can be further improved.
Chapter 4
General Discussion
In summary, this work demonstrates for the first time that signatures of attention
deficits can be decoded from natural viewing eye traces. These natural viewing eye
traces contain unique and revealing information about observers’ attention process.
This information allows us to design an objective, automated, high-throughput,
time- and cost-effective method that has the potential to be used for screening
large populations, and the task-free nature of this method makes it suitable for a
wide range of populations, such as those who cannot understand task instructions.
This demonstration serves as the first step in decoding attention function from
natural behavior. This work opens up avenues for many research opportunities and
biomedical applications. A few of them are discussed here.
4.1 Understanding disorders of broad clinical
spectra with machine learning
Machine learning techniques can not only be used to classify patients from con-
trols or another group of patients with overlapping symptoms, but also to advance our understanding of disorders of broad clinical spectra, such as ADHD and FASD.
Two types of techniques can be used: supervised or unsupervised learning. For
supervised learning, the method can give each subtype a class label and train a
classifier. When the classifier learns to differentiate each subtype, the features that
are important to identify each subtype emerge, and thus the oculomotor and atten-
tion profile of each subtype is revealed. Similarly, if there are no categorical labels,
but a continuous scale such as a severity spectrum, we could learn how the profile changes along the severity scale by means of regression. For unsupervised learning,
which usually requires large amounts of data, clustering methods can be applied to
see whether the discovered clusters correspond to the known sub-types. If there are
more clusters than the number of subtypes, researchers can then investigate whether
a known subtype needs to be further divided. If there are fewer clusters than the
number of subtypes, then investigators can further examine whether certain known
subtypes actually show no differences in their oculomotor and attention profiles.
4.2 Representing eye movement traces
To have a reliable and accurate classifier, one needs a good classifier as well as a
good feature space to represent the data so that different classes are separable in
that feature space. Computer vision and speech recognition researchers have put
in decades of efforts to learn or to design features that best represent objects in an
image or speech in sound for their object/speech recognition tasks. However, no
work before this one attempts to find a feature space that best represents eye movement traces for decoding the observer’s attention function. The features used in chapter 2 are based on saccades, the overt attention shifts, because it is known that the three disorders affect the attention selection mechanisms of the patients. For
the study in chapter 3, a convolutional deep neural network is used to sparsely rep-
resent the saliency traces. The features discovered by the deep networks have given
rise to classification performance that is comparable to those with well-engineered
features (e.g. gabor-filters for vision and MFCC for speech) [118,120,121]. Because
the deep network successfully discovers features in several domains, this study also uses the deep network to discover features for eye movements. However, note that
the patients and controls look at the same regions in the videos during free view-
ing most of the time. When they both look at the same location, they cannot be
distinguished by their eye movement. Therefore, the information lies in the traces
where they look at different regions. However, the default deep network does not
consider this and it learns the features with the whole eye traces. Therefore, it
is very important to better represent the eye movements to decode the attention
function.
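The saccade-based representation can be sketched in a few lines: mark saccadic samples in a raw gaze trace with a simple velocity threshold, then summarize the detected saccades as features. The 250 Hz sampling rate, the 30 deg/s threshold, and the two summary features below are illustrative assumptions, not the exact feature set of chapter 2.

```python
# Velocity-threshold saccade detection on a synthetic gaze trace.
import math

FS = 250.0          # assumed sampling rate (Hz)
VEL_THRESH = 30.0   # deg/s; a common rule-of-thumb saccade threshold

def saccade_features(trace):
    """Return (saccade count, mean saccade amplitude in degrees)."""
    # Point-to-point velocity between consecutive gaze samples.
    vels = [math.hypot(x1 - x0, y1 - y0) * FS
            for (x0, y0), (x1, y1) in zip(trace, trace[1:])]
    amplitudes, start = [], None
    for i, v in enumerate(vels):
        if v > VEL_THRESH and start is None:
            start = i                      # saccade onset
        elif v <= VEL_THRESH and start is not None:
            x0, y0 = trace[start]          # saccade offset: measure amplitude
            x1, y1 = trace[i]
            amplitudes.append(math.hypot(x1 - x0, y1 - y0))
            start = None
    if not amplitudes:
        return 0, 0.0
    return len(amplitudes), sum(amplitudes) / len(amplitudes)

# Fixation at (0, 0), one rapid 5-degree horizontal shift, fixation at (5, 0).
trace = [(0.0, 0.0)] * 10 + [(2.5, 0.0)] + [(5.0, 0.0)] * 10
n_saccades, mean_amp = saccade_features(trace)
```

Real traces would add blink handling and noise filtering; per-subject summaries of such saccade statistics are one plausible form for the feature vectors fed to a classifier.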
4.3 Correlating natural viewing behavior with
functional and structural imaging data
The current work points to follow-up studies correlating impairments in the
attentional network with the deficient features identified by our method. With
brain imaging data, one no longer needs to speculate about what is happening in
these patients' brains. For example, future studies can address whether the
severity of frontal lobe impairment (MRI structural data) covaries with the same
observer's agreement with top-down guidance maps during natural viewing. How
does the degree of impairment in the connection between the frontal and parietal
lobes (DTI data) correlate with the deficits revealed in the classification procedure?
Or, more dynamically, what is the drive (bottom-up or top-down) of each saccade
during natural viewing, and how does it differ between patients and controls
(EEG, ECoG, MEG, or fMRI; e.g., [150])? Many interesting questions await to
be addressed with a rich dataset including both natural viewing and neuroimaging
data. In fact, our collaborators have launched data collection along these lines to
address some of these questions.
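In skeletal form, the first analysis proposed above amounts to a correlation between a per-subject structural measure and a per-subject behavioral measure. The severity and agreement values below are fabricated purely for illustration; real inputs would come from the MRI analysis and the guidance-map agreement scores.

```python
# Correlating a hypothetical impairment score with a hypothetical
# top-down agreement score (illustrative numbers only).
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

severity  = [0.2, 0.5, 0.9, 1.4, 2.0]   # hypothetical frontal impairment (MRI)
agreement = [0.8, 0.7, 0.5, 0.4, 0.2]   # hypothetical top-down agreement
r = pearson_r(severity, agreement)      # strongly negative for these numbers
```

A strongly negative r would be consistent with greater structural impairment accompanying weaker top-down guidance; with real data a rank correlation and a permutation test would be preferable to this bare coefficient.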
4.4 Assisting confirmatory diagnoses
Our method could also assist confirmatory diagnoses. Currently, confirmatory
diagnoses rarely depend on a single behavioral measure, especially for differential
diagnoses. Nevertheless, our approach has the potential to be used as a large-scale
screening tool that triggers further medical examination and diagnosis. Further-
more, our objective, quantitative results could be integrated into expert systems
that assist doctors in evaluating patients and diagnosing disorders.
References
[1] Alper Açık, Adjmal Sarwary, Rafael Schultze-Kraft, Selim Onat, and Peter
König. Developmental Changes in Natural Viewing Behavior: Bottom-Up
and Top-Down Differences between Children, Young Adults and Older
Adults. Frontiers in Psychology, 1(November):1–14, 2010.
[2] Silvia C Amador, Ashley J Hood, Mya C Schiess, Robert Izor, and
Anne B Sereno. Dissociating cognitive deficits involved in voluntary eye movement
dysfunctions in Parkinson's disease patients. Neuropsychologia, 44(8):1475–
1482, 2006.
[3] D G Amen and B D Carmichael. High-resolution brain SPECT imaging
in ADHD. Annals of clinical psychiatry : official journal of the American
Academy of Clinical Psychiatrists, 9(2):81–6, June 1997.
[4] S L Archibald, C Fennema-Notestine, A Gamst, Edward P Riley, S N Mattson,
and T L Jernigan. Brain dysmorphology in individuals with severe prenatal
alcohol exposure. Developmental medicine and child neurology, 43(3):148–
54, March 2001.
[5] American Psychiatric Association, Task Force on DSM-IV. Diagnostic and
Statistical Manual of Mental Disorders, Fourth Edition, Text Revision
(DSM-IV-TR), volume 1. American Psychiatric Association, Arlington, VA,
2000.
[6] L Averbuch-Heller, J S Stahl, M L Hlavin, and R J Leigh. Square-wave jerks
induced by pallidotomy in parkinsonian patients. Neurology, 52(1):185–188,
January 1999.
[7] G. Backer, B. Mertsching, and M. Bollmann. Data- and model-driven gaze
control for an active-vision system. IEEE Transactions on Pattern Analysis
and Machine Intelligence, 23(12):1415–1429, 2001.
[8] D. Bamber. The area above the ordinal dominance graph and the area below
the receiver operating characteristic graph. Journal of mathematical psychol-
ogy, 12(4):387–415, 1975.
[9] Ohad Ben-Shahar, Brian J Scholl, and Steven W Zucker. Attention, segre-
gation, and textons: bridging the gap between object-based attention and
texton-based segregation. Vision research, 47(6):845–60, March 2007.
[10] Simone A Betchen and Michael Kaplitt. Future and current surgical therapies
in Parkinson's disease. Curr Opin Neurol, 16(4):487–493, August 2003.
[11] Narcisse P Bichot, Andrew F Rossi, and Robert Desimone. Parallel and serial
neural mechanisms for visual search in macaque area V4. Science (New York,
N.Y.), 308(5721):529–34, April 2005.
[12] Irving Biederman, Robert J. Mezzanotte, and Jan C. Rabinowitz. Scene
perception: Detecting and judging objects undergoing relational violations.
Cognitive psychology, 14(2):143–177, April 1982.
[13] James W Bisley and Michael E Goldberg. Neuronal activity in the lat-
eral intraparietal area and spatial attention. Science (New York, N.Y.),
299(5603):81–6, January 2003.
[14] J.W. Bisley and M.E. Goldberg. Attention, intention, and priority in the
parietal lobe. Annual review of neuroscience, 33:1–21, 2010.
[15] Ivan Bodis-Wollner. Neuropsychological and perceptual defects in Parkin-
son’s disease. Parkinsonism & related disorders, 9 Suppl 2:S83–9, August
2003.
[16] Ivan Bodis-Wollner. Retinopathy in Parkinson Disease. Journal of neural
transmission (Vienna, Austria : 1996), 116(11):1493–501, November 2009.
[17] Ivan Bodis-Wollner, M S Marx, S Mitra, P Bobak, L Mylin, and M Yahr.
Visual dysfunction in Parkinson's disease. Loss in spatiotemporal contrast
sensitivity. Brain: a journal of neurology, 110(Pt 6):1675–98, December
1987.
[18] Susan E Boehnke and Douglas P Munoz. On the importance of the transient
visual response in the superior colliculus. Current opinion in neurobiology,
18(6):544–51, December 2008.
[19] David Bordwell. Intensified Continuity Visual Style in Contemporary Amer-
ican Film. Film Quarterly, 55(3):16–28, March 2002.
[20] Ali Borji and Laurent Itti. State-of-the-art in Visual Attention Modeling.
IEEE transactions on pattern analysis and machine intelligence, PP(99):1,
April 2012.
[21] K A Briand, W Hening, H Poizner, and A B Sereno. Automatic orienting of
visuospatial attention in Parkinson’s disease. Neuropsychologia, 39(11):1240–
1249, 2001.
[22] K A Briand, D Strallow, W Hening, H Poizner, and A B Sereno. Control of
voluntary and reflexive saccades in Parkinson’s disease. Experimental brain
research., 129(1):38–48, November 1999.
[23] James R. Brockmole and John M. Henderson. Using real-world scenes as
contextual cues for search. Visual Cognition, 13(1):99–108, January 2006.
[24] M J Burden, A Westerlund, G Muckle, N Dodge, E Dewailly, C A Nelson,
S W Jacobson, and J L Jacobson. The effects of maternal binge drinking
during pregnancy on neural correlates of response inhibition and memory
in childhood. Alcoholism, Clinical and Experimental Research, 35(1):69–82,
January 2011.
[25] Ian G M Cameron, Masayuki Watanabe, Giovanna Pari, and Douglas P
Munoz. Executive impairment in Parkinson’s disease: response automaticity
and task switching. Neuropsychologia, 48(7):1948–1957, June 2010.
[26] C.D. Carello and R.J. Krauzlis. Manipulating Intent: Evidence for a Causal
Role of the Superior Colliculus in Target Selection. Neuron, 43(4):575–583,
2004.
[27] Ran Carmi and Laurent Itti. Visual causes versus correlates of attentional
selection in dynamic scenes. Vision research, 46(26):4333–4345, December
2006.
[28] Joshua L Carr, Sabrina Agnihotri, and Michelle Keightley. Sensory process-
ing and adaptive behavior deficits of children across the fetal alcohol spec-
trum disorder continuum. Alcoholism, clinical and experimental research,
34(6):1022–32, June 2010.
[29] Carrasco M., Penpeci-Talgar C., and Eckstein M. Spatial covert attention
increasescontrastsensitivityacrosstheCSF:supportforsignalenhancement.
Vision Research, 40(10):13, 2000.
[30] M Castelo-Branco, M Mendes, F Silva, J Massano, G Januario, C Januario,
and A Freire. Motion integration deficits are independent of magnocellular
impairmentinParkinson’sdisease. Neuropsychologia,47(2):314–320,January
2009.
[31] CD Chambers and JM Payne. Fast and slow parietal pathways mediate
spatial attention. Nature ..., 2004.
[32] J M Chambers and T J Prescott. Response times for visually guided saccades
in persons with Parkinson's disease: a meta-analytic review. Neuropsycholo-
gia, 48(4):887–899, March 2010.
[33] Florence Chan, Irene T Armstrong, Giovanna Pari, Richard J Riopelle, and
Douglas P Munoz. Deficits in saccadic eye-movement control in Parkinson’s
disease. Neuropsychologia, 43(5):784–796, 2005.
[34] Leonardo Chelazzi, John Duncan, Earl K. Miller, and Robert Desimone. Re-
sponses of Neurons in Inferior Temporal Cortex During Memory-Guided Vi-
sual Search. J Neurophysiol, 80(6):2918–2940, December 1998.
[35] A E Chudley, J Conry, J L Cook, C Loock, T Rosales, N LeBlanc, and Public
Health Agency of Canada's National Advisory Committee on Fetal Alcohol
Spectrum Disorder. Fetal alcohol spectrum disorder: Canadian guidelines
for diagnosis. CMAJ: Canadian Medical Association journal = journal de
l'Association medicale canadienne, 172(5 Suppl):S1–S21, March 2005.
[36] M M Chun and Y Jiang. Contextual cueing: implicit learning and memory
of visual context guides spatial attention. Cognitive psychology, 36(1):28–71,
June 1998.
[37] J.J. Clark and N.J. Ferrier. Modal Control Of An Attentive Vision System.
In [1988 Proceedings] Second International Conference on Computer Vision,
pages 514–523. IEEE, 1988.
[38] C L Colby and M E Goldberg. Space and attention in parietal cortex. Annual
review of neuroscience, 22:319–49, January 1999.
[39] Claire D. Coles, Kathleen A. Platzman, Cheryl L. Raskind-Hood, Ronald T.
Brown, Arthur Falek, and Iris E. Smith. A comparison of children affected
by prenatal alcohol exposure and attention deficit, hyperactivity disorder.
Alcoholism, clinical and experimental research, 21(1):150–61, February 1997.
[40] Charles E. Connor, Dean C. Preddie, Jack L. Gallant, and David C. Van
Essen. Spatial Attention Effects in Macaque Area V4. J. Neurosci., 17(9):3201–
3214, May 1997.
[41] Roshan Cools. Dopaminergic modulation of cognitive function-implications
for L-DOPA treatment in Parkinson’s disease. Neurosci Biobehav Rev,
30(1):1–23, 2006.
[42] Maurizio Corbetta, Michelle J Kincade, Chris Lewis, Abraham Z Snyder, and
Ayelet Sapir. Neural basis and recovery of spatial attention deficits in spatial
neglect. Nature neuroscience, 8(11):1603–10, November 2005.
[43] Maurizio Corbetta, Gaurav Patel, and Gordon L Shulman. The reorienting
system of the human brain: from environment to theory of mind. Neuron,
58(3):306–24, May 2008.
[44] Maurizio Corbetta and Gordon L Shulman. Control of goal-directed
and stimulus-driven attention in the brain. Nature reviews. Neuroscience,
3(3):201–15, March 2002.
[45] T J Crawford, L Henderson, and C Kennard. Abnormalities of nonvisually-
guided eye movements in Parkinson's disease. Brain, 112(Pt 6):1573–1586,
December 1989.
[46] Nicole Crocker, Linnea Vaurio, Edward P Riley, and Sarah N Mattson. Com-
parison of adaptive behavior in children with heavy prenatal alcohol exposure
or attention-deficit/hyperactivity disorder. Alcoholism, clinical and experi-
mental research, 33(11):2015–23, November 2009.
[47] EB Cutrell and RT Marrocco. Electrical microstimulation of primate pos-
terior parietal cortex initiates orienting and alerting components of covert
attention. Experimental brain research, 144(1):103–113, 2002.
[48] J E Cutting, J E DeLong, and K L Brunick. Visual activity in Hollywood
film: 1935 to 2005 and beyond. Psychology of Aesthetics, Creativity, and the
Arts, 5(2):115–125, 2011.
[49] Y Dan. Efficient coding of natural scenes in the lateral geniculate nucleus:
experimental test of a computational theory. The Journal of Neuroscience,
1996.
[50] Stephen V David, William E Vinje, and Jack L Gallant. Natural stimulus
statistics alter the receptive field structure of V1 neurons. The Journal of
Neuroscience, 24(31):6991–7006, August 2004.
[51] J. de Fockert, G. Rees, C. Frith, and N. Lavie. Neural correlates of attentional
capture in visual search. Journal of Cognitive Neuroscience, 16(5):751–759,
2004.
[52] Peter De Graef, Dominie Christiaens, and Géry d'Ydewalle. Perceptual effects
of scene context on object identification. Psychological Research, 52(4):317–
329, January 1990.
[53] A.-Y. Debra Chiang, D. Berg, and Laurent Itti. Saliency, Memory, and At-
tention Capture in Marketing. Journal of Vision, 11(11):493–493, September
2011.
[54] J Downar, A P Crawley, D J Mikulis, and K D Davis. A multimodal cortical
network for the detection of changes in the sensory environment. Nature
neuroscience, 3(3):277–83, March 2000.
[55] J Duncan. Selective attention and the organization of visual information.
Journal of experimental psychology. General, 113(4):501–17, December 1984.
[56] W Dunn. The impact of sensory processing abilities on the daily lives of
young children and their families: A conceptual model. Infants and Young
Children, 9:23–35, 1997.
[57] W Dunn and Donna Bennett. Patterns of sensory processing in children
with attention deficit hyperactivity disorder. Occupational Therapy Journal
of Research, pages 4–15, 2002.
[58] Wolfgang Einhäuser, Merrielle Spain, and Pietro Perona. Objects predict
fixations better than early saliency. Journal of Vision, 8(14):1–26, January
2008.
[59] A Ellison and I Schindler. An exploration of the role of the superior temporal
gyrus in visual search and spatial perception using TMS. Brain, 2004.
[60] Charles W. Eriksen and James D. St. James. Visual attention within and
around the field of focal attention: A zoom lens model. Perception & Psy-
chophysics, 40(4):225–240, July 1986.
[61] J Ermer and W Dunn. The sensory profile: a discriminant analysis of chil-
dren with and without disabilities. The American journal of occupational
therapy: official publication of the American Occupational Therapy Associ-
ation, 52(4):283–90, April 1998.
[62] J M Fearnley and A J Lees. Ageing and Parkinson's disease: substantia nigra
regional selectivity. Brain: a journal of neurology, 114(Pt 5):2283–301,
October 1991.
[63] Jillian H Fecteau and Douglas P Munoz. Salience, relevance, and firing: a
priority map for target selection. Trends in cognitive sciences, 10(8):382–90,
August 2006.
[64] Gidon Felsen and Yang Dan. A natural approach to studying vision. Nature
neuroscience, 8(12):1643–6, December 2005.
[65] Joanne Fielding, Nellie Georgiou-Karistianis, and Owen White. The role of
the basal ganglia in the control of automatic visuospatial attention. Journal of
the International Neuropsychological Society: JINS, 12(5):657–67, September
2006.
[66] M.D. Fox, M. Corbetta, A.Z. Snyder, J.L. Vincent, and M.E. Raichle. Spon-
taneous neuronal activity distinguishes human dorsal and ventral attention
systems. Proceedings of the National Academy of Sciences, 103(26):10046,
2006.
[67] Yoav Freund and Robert Schapire. A desicion-theoretic generalization of on-
line learning and an application to boosting. Computational learning theory,
904(1):23–37, 1995.
[68] Simone Frintrop, Gerriet Backer, and Erich Rome. Selecting what is impor-
tant: Training visual attention. In KI 2005: advances in artificial intelli-
gence: 28th Annual German Conference on AI, KI 2005, Koblenz, Germany,
September 11-14, 2005: proceedings, volume 3698, page 351. Springer Verlag,
2005.
[69] Simone Frintrop, Erich Rome, and HI Christensen. Computational visual
attention systems and their cognitive foundations: A survey. ACM Transac-
tions on Applied ..., 7(1):1–39, January 2010.
[70] J Fukushima, K Fukushima, K Miyasaka, and I Yamashita. Voluntary con-
trol of saccadic eye movement in patients with frontal cortical lesions and
parkinsonian patients in comparison with that in schizophrenics. Biological
psychiatry, 36(1):21–30, July 1994.
[71] Junko Fukushima, Tatsuo Hatta, and Kikuro Fukushima. Development
of voluntary control of saccadic eye movements. Brain and Development,
22(3):173–180, May 2000.
[72] J.L. Gallant, C.E. Connor, and D.C. Van Essen. Neural activity in areas
V1, V2 and V4 during free viewing of natural scenes compared to controlled
viewing. Neuroreport, 9(9):2153, 1998.
[73] J N Giedd, J Blumenthal, E Molloy, and F X Castellanos. Brain imaging of
attention deficit/hyperactivity disorder. Ann N Y Acad Sci, 931:33–49, June
2001.
[74] B Giesbrecht, M G Woldorff, A W Song, and G R Mangun. Neural mecha-
nisms of top-down control during spatial and feature attention. NeuroImage,
19(3):496–512, July 2003.
[75] T.R. R. Golub, D.K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J.P.
Mesirov, H. Coller, M.L. Loh, J.R. Downing, M.A. Caligiuri, and Others.
Molecular Classification of Cancer: Class Discovery and Class Prediction by
Gene Expression Monitoring. Science, 286(5439):531–537, October 1999.
[76] A.M. Graybiel. The basal ganglia: learning new tricks and loving it. Current
Opinion in Neurobiology, 15(6):638–644, 2005.
[77] Courtney R Green, A M Mihic, D C Brien, I T Armstrong, S M Nikkel, B C
Stade, C Rasmussen, Douglas P Munoz, and J N Reynolds. Oculomotor con-
trol in children with fetal alcohol spectrum disorders assessed using a mobile
eye-tracking laboratory. The European journal of neuroscience, 29(6):1302–9,
March 2009.
[78] Courtney R Green, A M Mihic, S M Nikkel, B C Stade, C Rasmussen, Dou-
glas P Munoz, and J N Reynolds. Executive function deficits in children
with fetal alcohol spectrum disorders (FASD) measured using the Cambridge
Neuropsychological Tests Automated Battery (CANTAB). Journal of child
psychology and psychiatry, and allied disciplines, 50(6):688–97, June 2009.
[79] Courtney R Green, Douglas P Munoz, Sarah M Nikkel, and James N
Reynolds. Deficits in eye movement control in children with fetal alcohol spec-
trum disorders. Alcoholism, clinical and experimental research, 31(3):500–11,
March 2007.
[80] Isabelle Guyon and A. Elisseeff. An introduction to variable and feature
selection. The Journal of Machine Learning Research, 3:1157–1182, 2003.
[81] Isabelle Guyon, J Weston, S Barnhill, and V Vapnik. Gene selection for cancer
classification using support vector machines. Machine Learning, 46(1):389–
422, 2002.
[82] P E Haenny and P H Schiller. State dependent activity in monkey visual
cortex. I. Single cell activity in V1 and V4 on visual tasks. Experimental
brain research. Experimentelle Hirnforschung. Expérimentation cérébrale,
69(2):225–44, January 1988.
[83] Harold L. Hawkins, Steven A. Hillyard, Steven J. Luck, and Mustapha
Mouloua. Visual attention modulates signal detectability. Journal of Ex-
perimental Psychology: Human Perception and Performance, 16(4):802–811,
1990.
[84] B.J. He, A.Z. Snyder, J.L. Vincent, A. Epstein, G.L. Shulman, and M. Cor-
betta. Breakdown of functional connectivity in frontoparietal networks un-
derlies behavioral deficits in spatial neglect. Neuron, 53(6):905–918, 2007.
[85] J Henderson. Human gaze control during real-world scene perception. Trends
in Cognitive Sciences, 7(11):498–504, November 2003.
[86] J.M. Henderson, P.A. Weeks Jr, and A. Hollingworth. The effects of semantic
consistency on eye movements during complex scene viewing. Journal of
Experimental Psychology: Human Perception and Performance, 25(1):210,
1999.
[87] B. Hidalgo-Sotelo, A. Oliva, and A. Torralba. Human Learning of Contextual
Priors for Object Search: Where does the time go? In 2005 IEEE Computer
Society Conference on Computer Vision and Pattern Recognition (CVPR'05)
- Workshops, volume 3, pages 86–86. IEEE.
[88] Geoffrey E Hinton, Simon Osindero, and Yee-Whye Teh. A fast learning
algorithm for deep belief nets. Neural computation, 18(7):1527–54, July 2006.
[89] J E Hoffman. Visual attention and eye movements. In H Pashler, editor,
Attention, volume 31, chapter 3, pages 119–153. Psychology Press, 1998.
[90] A J Hood, S C Amador, A E Cain, K A Briand, A H Al-Refai, M C Schiess,
and A B Sereno. Levodopa slows prosaccades and improves antisaccades:
an eye movement study in Parkinson’s disease. Journal of neurology, neuro-
surgery, and psychiatry, 78(6):565–570, June 2007.
[91] A Hyvärinen, P O Hoyer, and M Inki. Topographic independent component
analysis. Neural computation, 13(7):1527–58, July 2001.
[92] I. Indovina and E. Macaluso. Dissociation of stimulus relevance and saliency
factors during shifts of visuospatial attention. Cerebral Cortex, 17(7):1701,
2007.
[93] Laurent Itti. Automatic Foveation for Video Compression Using a Neurobio-
logical Model of Visual Attention. IEEE Transactions on Image Processing,
13(10):1304–18, October 2004.
[94] Laurent Itti, N Dhavale, and F Pighin. Realistic Avatar Eye and Head An-
imation Using a Neurobiological Model of Visual Attention. In B Bosacchi,
D B Fogel, and J C Bezdek, editors, Proc. SPIE 48th Annual International
Symposium on Optical Science and Technology, volume 5200, pages 64–78,
Bellingham, WA, August 2003. SPIE Press.
[95] Laurent Itti, C. Koch, and E. Niebur. A model of saliency-based visual
attention for rapid scene analysis. IEEE Transactions on Pattern Analysis
and Machine Intelligence, 20(11):1254–1259, November 1998.
[96] Laurent Itti and Christof Koch. Computational modelling of visual attention.
Nature reviews. Neuroscience, 2(3):194–203, March 2001.
[97] Wenyu Jiang and Richard Simon. A comparison of bootstrap methods and an
adjusted bootstrap approach for estimating the prediction error in microarray
classification. Statistics in medicine, 26(29):5320–5334, 2007.
[98] Andrew M Johnson, Quincy J Almeida, Con Stough, James C Thompson,
Rene Singarayer, and Mandar S Jog. Visual inspection time in Parkinson’s
disease: deficits in early stages of cognitive processing. Neuropsychologia,
42(5):577–583, 2004.
[99] K L Jones and D W Smith. Recognition of the fetal alcohol syndrome in
early infancy. Lancet, 302(7836):999–1001, November 1973.
[100] L M Jonkman, J L Kenemans, C Kemner, M N Verbaten, and H van Enge-
land. Dipole source localization of event-related brain activity indicative of
an early visual selective attention deficit in ADHD children. Clinical neuro-
physiology : official journal of the International Federation of Clinical Neu-
rophysiology, 115(7):1537–1549, July 2004.
[101] W O Kalberg, B Provost, S J Tollison, B G Tabachnick, L K Robinson,
H Eugene Hoyme, P M Trujillo, D Buckley, A S Aragon, and P A May.
Comparison of motor delays in young children with fetal alcohol syndrome to
those with prenatal alcohol exposure and with no prenatal alcohol exposure.
Alcoholism, ClinicalandExperimentalResearch,30(12):2037–2045,December
2006.
[102] C Karatekin. Eye tracking studies of normative and atypical development.
Developmental Review, 27(3):283–348, September 2007.
[103] Sabine Kastner, Mark A. Pinsk, Peter De Weerd, Robert Desimone, and
Leslie G. Ungerleider. Increased Activity in Human Visual Cortex during
Directed Attention in the Absence of Visual Stimulation. Neuron, 22(4):751–
761, April 1999.
[104] H Kimmig, K Haussmann, T Mergner, and C H Lucking. What is pathological
with gaze shift fragmentation in Parkinson's disease? Journal of neurology,
249(6):683–692, June 2002.
[105] J.M. Kincade, R.A. Abrams, S.V. Astafiev, G.L. Shulman, and M. Corbetta.
An event-related functional magnetic resonance imaging study of voluntary
and stimulus-driven orienting of attention. The Journal of neuroscience,
25(18):4593, 2005.
[106] M Kitagawa, J Fukushima, and K Tashiro. Relationship between antisaccades
and the clinical symptoms in Parkinson's disease. Neurology, 44(12):2285–
2289, December 1994.
[107] C H Klein, A Raschke, and A Brandenbusch. Development of pro- and an-
tisaccades in children with attention-deficit hyperactivity disorder (ADHD)
and healthy controls. Psychophysiology, 40(1):17–28, January 2003.
[108] Raymond M. Klein. Inhibition of return. Trends in Cognitive Sciences,
4(4):138–147, April 2000.
[109] C Koch and S Ullman. Shifts in selective visual attention: towards the un-
derlying neural circuitry. Human neurobiology, 4(4):219–27, January 1985.
[110] P W Kodituwakku. Defining the behavioral phenotype in children with fetal
alcoholspectrumdisorders: areview. Neuroscienceandbiobehavioralreviews,
31(2):192–201, January 2007.
[111] Libbe Kooistra, Susan Crawford, Ben Gibbard, Bonnie J Kaplan, and Jin
Fan. Comparing Attentional Networks in fetal alcohol spectrum disorder
and the inattentive and combined subtypes of attention deficit hyperactivity
disorder. Developmental neuropsychology, 36(5):566–77, January 2011.
[112] Libbe Kooistra, Susan Crawford, Ben Gibbard, Barbara Ramage, and Bon-
nie J Kaplan. Differentiating attention deficits in children with fetal alcohol
spectrum disorder or attention-deficit-hyperactivity disorder. Developmental
medicine and child neurology, 52(2):205–11, February 2010.
[113] A Kori, N Miyashita, M Kato, O Hikosaka, S Usui, and M Matsumura. Eye
movements in monkeys with local dopamine depletion in the caudate nucleus.
II. Deficits in voluntary saccades. The Journal of neuroscience: the official
journal of the Society for Neuroscience, 15(1 Pt 2):928–941, January 1995.
[114] Amy L Krain and F Xavier Castellanos. Brain development and ADHD.
Clinical psychology review, 26(4):433–44, August 2006.
[115] A A Kustov and D L Robinson. Shared neural control of attentional shifts
and eye movements. Nature, 384(6604):74–7, November 1996.
[116] Stephen R H Langton, Christopher O'Donnell, Deborah M Riby, and Carrie J
Ballantyne. Gaze cues influence the allocation of attention in natural scene
viewing. Quarterly journal of experimental psychology (2006), 59(12):2056–
64, December 2006.
[117] Q V Le, R Monga, M Devin, K Chen, G S Corrado, J Dean, and A Y Ng.
Building High-level Features Using Large Scale Unsupervised Learning. Pro-
ceedings of the Twenty-Ninth International Conference on Machine Learning,
2012.
[118] Quoc V Le, Jiquan Ngiam, Zhenghao Chen, Daniel Chia, Pang Wei
Koh, and Andrew Y. Ng. Tiled convolutional neural networks.
In J Lafferty, C K I Williams, J Shawe-Taylor, R S Zemel, and A Culotta,
editors, Advances in Neural Information Processing Systems 23, pages 1279–
1287. 2010.
[119] D K Lee, L Itti, C Koch, and J Braun. Attention activates winner-take-all
competition among visual filters. Nature neuroscience, 2(4):375–81, April
1999.
[120] Honglak Lee, Roger Grosse, Rajesh Ranganath, and Andrew Y. Ng. Convo-
lutional deep belief networks for scalable unsupervised learning of hierarchical
representations. In Proceedings of the 26th Annual International Conference
on Machine Learning - ICML '09, pages 1–8, New York, New York, USA,
June 2009. ACM Press.
[121] Honglak Lee, Peter Pham, Yan Largman, and Andrew Ng. Unsupervised
feature learning for audio classification using convolutional deep belief net-
works. In Y Bengio, D Schuurmans, J Lafferty, C K I Williams, and A Cu-
lotta, editors, Advances in Neural Information Processing Systems 22, pages
1096–1104. 2009.
[122] S E Leh, M Petrides, and A P Strafella. The neural circuitry of executive func-
tions in healthy subjects and Parkinson's disease. Neuropsychopharmacology:
official publication of the American College of Neuropsychopharmacology,
35(1):70–85, January 2010.
[123] F Levy. The dopamine theory of attention deficit hyperactivity disorder
(ADHD). The Australian and New Zealand Journal of Psychiatry, 25(2):277–
283, June 1991.
[124] Muriel Deutsch Lezak. Neuropsychological Assessment. Oxford University
Press, USA; 3 edition, 1995.
[125] Z Li and Laurent Itti. Saliency and Gist Features for Target Detection in
Satellite Images. IEEE transactions on image processing : a publication of
the IEEE Signal Processing Society, December 2010.
[126] Zhaoping Li. A saliency map in primary visual cortex. Trends in Cognitive
Sciences, 6(1):9–16, January 2002.
[127] K Lieb, S Brucker, M Bach, T Els, C H Lucking, and M W Greenlee. Impair-
ment in preattentive visual processing in patients with Parkinson's disease.
Brain: a journal of neurology, 122(Pt 2):303–313, February 1999.
[128] T. Liu. Cortical Mechanisms of Feature-based Attentional Control. Cerebral
Cortex, 13(12):1334–1343, December 2003.
[129] Shanley Donelan Mangeot, Lucy Jane Miller, Daniel N McIntosh, Jude
McGrath-Clarke, Jody Simon, Randi J Hagerman, and Edward Goldson. Sen-
sory modulation dysfunction in children with attention-deficit hyperactivity
disorder. Developmental Medicine and Child Neurology, 43(06):399, June
2001.
[130] K. Marek, R. Innis, C. van Dyck, B. Fussell, M. Early, S. Eberly, D. Oakes,
and J. Seibyl. [123I]-CIT SPECT imaging assessment of the rate of Parkin-
son's disease progression. Neurology, 57(11):2089–2094, December 2001.
[131] Lene Martin, Eva Aring, Magnus Landgren, Ann Hellström, and Marita An-
dersson Grönlund. Visual fields in children with attention-deficit/hyperac-
tivity disorder before and after treatment with stimulants. Acta ophthalmo-
logica, 86(3):259–64, May 2008.
[132] Sarah N Mattson, Nicole Crocker, and Tanya T Nguyen. Fetal alcohol spec-
trum disorders: neuropsychological and behavioral features. Neuropsychology
review, 21(2):81–101, June 2011.
[133] Sarah N Mattson, Terry L Jernigan, and Edward P Riley. MRI and prenatal
alcohol exposure: Images provide insight into FAS. Alcohol Health Research
World, 18(1):49–52, 1994.
[134] Carrie J. McAdams and John H.R. Maunsell. Effects of Attention on
Orientation-Tuning Functions of Single Neurons in Macaque Cortical Area
V4. J. Neurosci., 19(1):431–441, January 1999.
[135] K. McAlonan, J. Cavanaugh, and R.H. Wurtz. Guarding the gateway to
cortex: attention in visual thalamus. Nature, 456(7220):391, 2008.
[136] E M Meintjes, J L Jacobson, C D Molteno, J C Gatenby, C Warton, C J
Cannistraci, H E Hoyme, L K Robinson, N Khaole, J C Gore, and S W
Jacobson. An fMRI study of number processing in children with fetal alcohol
syndrome. Alcoholism, Clinical and Experimental Research, 34(8):1450–1464,
August 2010.
[137] IG Meister, M Wienemann, and D Buelte. Hemiextinction induced by tran-
scranial magnetic stimulation over the right temporo-parietal junction. Neu-
roscience, 2006.
[138] R Milanese. Integration of bottom-up and top-down cues for visual attention
using non-linear relaxation. In Proceedings of IEEE Conference on Computer
Vision and Pattern Recognition CVPR-94, pages 781–785. IEEE Comput.
Soc. Press, 1994.
[139] T. Moore and K.M. Armstrong. Selective gating of visual signals by micros-
timulation of frontal cortex. Nature, 421(6921):370–373, 2003.
[140] J Moran and R Desimone. Selective attention gates visual processing in
the extrastriate cortex. Science (New York, N.Y.), 229(4715):782–4, August
1985.
[141] P K Morrish, J S Rakshi, D L Bailey, G V Sawle, and D J Brooks. Measuring
the rate of progression and estimating the preclinical period of Parkinson's
disease with [18F]dopa PET. Journal of Neurology, Neurosurgery & Psychi-
atry, 64(3):314–319, March 1998.
[142] B. C. Motter. Focal attention produces spatially selective processing in vi-
sual cortical areas V1, V2, and V4 in the presence of competing stimuli. J
Neurophysiol, 70(3):909–919, September 1993.
[143] J.R. Müller, M.G. Philiastides, and W.T. Newsome. Microstimulation of the
superior colliculus focuses attention without moving the eyes. Proceedings of
the National Academy of Sciences of the United States of America, 102(3):524,
2005.
[144] Douglas P Munoz. Commentary: saccadic eye movements: overview of neural
circuitry. Progress in brain research, 140:89–96, January 2002.
[145] Douglas P Munoz, Irene T Armstrong, Karen A Hampton, and Kim-
berly D Moore. Altered control of visual fixation and saccadic eye move-
ments in attention-deficit hyperactivity disorder. Journal of neurophysiology,
90(1):503–14, July 2003.
[146] Douglas P Munoz, J R Broughton, J E Goldring, and I T Armstrong. Age-
related performance of human subjects on saccadic eye movement tasks.
Experimental brain research. Experimentelle Hirnforschung. Expérimentation
cérébrale, 121(4):391–400, August 1998.
[147] Douglas P Munoz and Stefan Everling. Look away: the anti-saccade task
and the voluntary control of eye movement. Nature reviews. Neuroscience,
5(3):218–28, March 2004.
[148] T Nakamura, A M Bronstein, C Lueck, C D Marsden, and P Rudge. Vestibu-
lar, cervical and visual remembered saccades in Parkinson's disease. Brain,
117(Pt 6):1423–1432, December 1994.
[149] J. L. Nanson and M. Hiscock. Attention Deficits in Children Exposed to Alcohol Prenatally. Alcoholism: Clinical and Experimental Research, 14(5):656–661, October 1990.
[150] Davide Nardo, Valerio Santangelo, and Emiliano Macaluso. Stimulus-driven
orienting of visuo-spatial attention in complex dynamic environments. Neu-
ron, 69(5):1015–28, March 2011.
[151] V. Navalpakkam and L. Itti. An Integrated Model of Top-Down and Bottom-
Up Attention for Optimizing Detection Speed. In 2006 IEEE Computer So-
ciety Conference on Computer Vision and Pattern Recognition - Volume 2
(CVPR’06), volume 2, pages 2049–2056. IEEE, 2006.
[152] Joel T Nigg, Rosemary Tannock, and Luis A Rohde. What is to be the fate
of ADHD subtypes? An introduction to the special section on research on
the ADHD subtypes and implications for the DSM-V. Journal of clinical
child and adolescent psychology : the official journal for the Society of Clin-
ical Child and Adolescent Psychology, American Psychological Association,
Division 53, 39(6):723–5, January 2010.
[153] Elizabeth D O’Hare, Lisa H Lu, Suzanne M Houston, Susan Y Bookheimer,
Sarah N Mattson, Mary J O’Connor, and Elizabeth R Sowell. Altered
frontal-parietal functioning during verbal working memory in children and
adolescents with heavy prenatal alcohol exposure. Human brain mapping,
30(10):3200–8, October 2009.
[154] Aude Oliva, A. Torralba, M.S. Castelhano, and J.M. Henderson. Top-down control of visual attention in object detection. In Image Processing, 2003.
ICIP 2003. Proceedings. 2003 International Conference on, volume 1, pages
I–253. IEEE, 2003.
[155] Aude Oliva, Jeremy M. Wolfe, and Helga C. Arsenio. Panoramic search: the
interaction of memory and vision in search through a familiar scene. Journal
of experimental psychology. Human perception and performance, 30(6):1132–
46, December 2004.
[156] R. M Oliveira, J. M Gurd, P. Nixon, J. C Marshall, and R. E Passingham.
Micrographia in Parkinson’s disease: the effect of providing external cues.
Journal of Neurology, Neurosurgery & Psychiatry, 63(4):429–433, October
1997.
[157] B A Olshausen and D J Field. Emergence of simple-cell receptive field prop-
erties by learning a sparse code for natural images. Nature, 381(6583):607–9,
June 1996.
[158] I. Opris, A. Barborica, and V.P. Ferrera. Microstimulation of the dorsolat-
eral prefrontal cortex biases saccade target selection. Journal of cognitive
neuroscience, 17(6):893–904, 2005.
[159] S Parush, H Sohmer, A Steinberg, and M Kaitz. Somatosensory function-
ing in children with attention deficit hyperactivity disorder. Developmental
medicine and child neurology, 39(7):464–8, July 1997.
[160] S Parush, H Sohmer, A Steinberg, and M Kaitz. Somatosensory function in
boys with ADHD and tactile defensiveness. Physiology & behavior, 90(4):553–
8, March 2007.
[161] RJ Peters and Laurent Itti. Congruence between model and human atten-
tion reveals unique signatures of critical visual events. Advances in neural
information processing systems, pages 1–8, 2007.
[162] Robert J. Peters and Laurent Itti. Beyond bottom-up: Incorporating task-dependent influences into a computational model of spatial attention. In 2007
IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8.
IEEE, June 2007.
[163] Charles Pierrot-Deseilligny, Dan Milea, and René M Müri. Eye movement
control by the cerebral cortex. Current opinion in neurology, 17(1):17–25,
February 2004.
[164] P M Pollux and C Robertson. Voluntary and automatic visual spatial shifts
of attention in Parkinson’s disease: an analysis of costs and benefits. Journal
of clinical and experimental neuropsychology, 23(5):662–670, October 2001.
[165] M I Posner, C R Snyder, and B J Davidson. Attention and the detection of
signals. Journal of experimental psychology, 109(2):160–74, June 1980.
[166] C. C. Pratt. Helmholtz’s treatise on physiological optics. Journal of Applied
Psychology, 10(2):294–295, 1926.
[167] R Prenger and MCK Wu. Nonlinear V1 responses to natural scenes revealed
by neural network analysis. Neural Networks, 2004.
[168] O Rascol, M Clanet, J L Montastruc, M Simonetta, M J Soulier-Esteve,
B Doyon, and A Rascol. Abnormal ocular movements in Parkinson’s disease.
Evidence for involvement of dopaminergic systems. Brain : a journal of neurology, 112(Pt 5):1193–214, October 1989.
[169] O Rascol, U Sabatini, M Simonetta-Moreau, J L Montastruc, A Rascol, and
M Clanet. Square wave jerks in parkinsonian syndromes. Journal of neurol-
ogy, neurosurgery, and psychiatry, 54(7):599–602, July 1991.
[170] Carmen Rasmussen. Executive functioning and working memory in fetal
alcohol spectrum disorder. Alcohol Clin Exp Res, 29(8):1359–1367, August
2005.
[171] S. M. Ravizza and R. B. Ivry. Comparison of the basal ganglia and cerebellum in shifting attention. Journal of Cognitive Neuroscience, 13(3):285–
297, April 2001.
[172] D Regan and C Maxner. Orientation-selective visual loss in patients with
Parkinson’s disease. Brain : a journal of neurology, 110 ( Pt 2(Pt 2):415–
432, April 1987.
[173] W Reichardt. Evaluation of optical motion information by movement detec-
tors. J Comp Physiol A, 161(4):533–547, September 1987.
[174] P Reinagel and A M Zador. Natural scene statistics at the centre of gaze.
Network (Bristol, England), 10(4):341–50, November 1999.
[175] J H Reynolds, T Pasternak, and R Desimone. Attention increases sensitivity
of V4 neurons. Neuron, 26(3):703–14, June 2000.
[176] John H. Reynolds, Leonardo Chelazzi, and Robert Desimone. Competitive
Mechanisms Subserve Attention in Macaque Areas V2 and V4. J. Neurosci.,
19(5):1736–1753, March 1999.
[177] Jochem W Rieger, Alexander Kim, Miklos Argyelan, Mark Farber, Sofya Glazman, Marc Liebeskind, Thomas Meyer, and Ivan Bodis-Wollner. Cortical
functional anatomy of voluntary saccades in Parkinson disease. Clinical EEG
and neuroscience : official journal of the EEG and Clinical Neuroscience
Society (ENCS), 39(4):169–74, October 2008.
[178] Edward P Riley and Christie L McGee. Fetal alcohol spectrum disorders:
an overview with emphasis on changes in brain and behavior. Experimental
biology and medicine (Maywood, N.J.), 230(6):357–65, June 2005.
[179] DL Ringach. Receptive field structure of neurons in monkey primary visual
cortex revealed by stimulation with natural image sequences. Journal of
Vision, 2002.
[180] D L Robinson and S E Petersen. The pulvinar and visual salience. Trends in
neurosciences, 15(4):127–32, April 1992.
[181] LM Romanski, M. Giguere, JF Bates, and PS Goldman-Rakic. Topographic
organization of medial pulvinar connections with the prefrontal cortex in
the rhesus monkey. The Journal of Comparative Neurology, 379(3):313–332,
1997.
[182] A.F. Rossi, N.P. Bichot, R. Desimone, and L.G. Ungerleider. Top-down attentional deficits in macaques with lesions of lateral prefrontal cortex. The
Journal of Neuroscience, 27(42):11306, 2007.
[183] Albert L. Rothenstein and John K. Tsotsos. Attention links sensing to recog-
nition. Image and Vision Computing, 26(1):114–126, January 2008.
[184] Katya Rubia. ”Cool” inferior frontostriatal dysfunction in attention-
deficit/hyperactivity disorder versus ”hot” ventromedial orbitofrontal-limbic
dysfunction in conduct disorder: a review. Biological psychiatry, 69(12):e69–
87, June 2011.
[185] B J Scholl, Z W Pylyshyn, and J Feldman. What is a visual object? Evidence
from target merging in multiple object tracking. Cognition, 80(1-2):159–77,
June 2001.
[186] J B Schweitzer, T L Faber, S T Grafton, L E Tune, J M Hoffman, and C D
Kilts. Alterations in the functional anatomy of working memory in adult
attention deficit hyperactivity disorder. Am J Psychiatry, 157(2):278–280,
February 2000.
[187] Philippe G Schyns and Aude Oliva. From blobs to boundary edges: Evidence for time- and spatial-scale-dependent scene recognition. Psychological Science, 5(4):195–200, July 1994.
[188] M. A. Segraves. Activity of monkey frontal eye field neurons projecting to
oculomotor regions of the pons. J Neurophysiol, 68(6):1967–1985, December
1992.
[189] J. T. Serences, S. Shomstein, A. B. Leber, X. Golay, H. E. Egeth, and S. Yantis. Coordination of voluntary and stimulus-driven attentional control in human cortex. Psychological Science, 16(2):114, 2005.
[190] A G Shaikh, M Xu-Wilson, S Grill, and D S Zee. 'Staircase' square-wave jerks in early Parkinson's disease. The British journal of ophthalmology, 95(5):705–709, May 2011.
[191] H Shibasaki, S Tsuji, and Y Kuroiwa. Oculomotor abnormalities in Parkin-
son’s disease. Arch Neurol, 36(6):360–364, June 1979.
[192] Stewart Shipp. The brain circuitry of attention. Trends in cognitive sciences,
8(5):223–30, May 2004.
[193] G.L. Shulman, M.P. McAvoy, M.C. Cowan, S.V. Astafiev, A.P. Tansy,
G. D’Avossa, and M. Corbetta. Quantitative analysis of attention and de-
tection signals during visual search. Journal of Neurophysiology, 90(5):3384,
2003.
[194] Gordon L. Shulman, Roger W. Remington, and John P. McLean. Moving
attention through visual space. Journal of Experimental Psychology: Human
Perception and Performance, 5(3):522–526, 1979.
[195] K G Sieg, G R Gaffney, D F Preston, and J A Hellings. SPECT brain
imaging abnormalities in attention deficit hyperactivity disorder. Clin Nucl
Med, 20(1):55–60, January 1995.
[196] Roger W Simmons, Jennifer D Thomas, Susan S Levy, and Edward P Riley.
Motor response selection in children with fetal alcohol spectrum disorders.
Neurotoxicology and teratology, 28(2):278–85, 2006.
[197] D J Simons and D T Levin. Change blindness. Trends in cognitive sciences,
1(7):261–7, October 1997.
[198] Evan C Smith and Michael S Lewicki. Efficient auditory coding. Nature,
439(7079):978–82, February 2006.
[199] D Smyth and B Willmore. The receptive-field organization of simple cells in
primary visual cortex of ferrets under natural scene stimulation. The Journal
of ..., 2003.
[200] JC Snow and HA Allen. Impaired attentional selection following lesions to
human pulvinar: evidence for homology between human and monkey. Pro-
ceedings of the ..., 2009.
[201] E Sowell. Mapping Cortical Gray Matter Asymmetry Patterns in Adoles-
cents with Heavy Prenatal Alcohol Exposure. NeuroImage, 17(4):1807–1819,
December 2002.
[202] E R Sowell, P M Thompson, S N Mattson, K D Tessner, T L Jernigan, E P Riley, and A W Toga. Voxel-based morphometric analyses of the brain in children and adolescents prenatally exposed to alcohol. Neuroreport, 12(3):515–
23, March 2001.
[203] E R Sowell, Paul M Thompson, Sarah N Mattson, Kevin D Tessner, Terry L
Jernigan, Edward P Riley, and Arthur W Toga. Regional brain shape ab-
normalities persist into adolescence after heavy prenatal alcohol exposure.
Cerebral cortex (New York, N.Y. : 1991), 12(8):856–65, August 2002.
[204] E.R. Sowell, S.N. Mattson, P.M. Thompson, T.L. Jernigan, E.P. Riley, and
A.W. Toga. Mapping callosal morphology and cognitive correlates: Effects
of heavy prenatal alcohol exposure. Neurology, 57(2):235–244, July 2001.
[205] Andrea D Spadoni, Christie L McGee, Susanna L Fryer, and Edward P Riley.
Neuroimaging and fetal alcohol spectrum disorders. Neurosci Biobehav Rev,
31(2):239–245, 2007.
[206] H Spitzer, R Desimone, and J Moran. Increased attention enhances
both behavioral and neuronal performance. Science (New York, N.Y.),
240(4850):338–40, May 1988.
[207] Yasuo Terao, Hideki Fukuda, Akihiro Yugeta, Okihide Hikosaka, Yoshiko Nomura, Masaya Segawa, Ritsuko Hanajima, Shoji Tsuji, and Yoshikazu Ugawa. Initiation and inhibitory control of saccades with the progression of Parkinson's disease - changes in three major drives converging on the superior colliculus. Neuropsychologia, 49(7):1794–806, June 2011.
[208] K G Thompson and J D Schall. Antecedents and correlates of visual detection and awareness in macaque prefrontal cortex. Vision research, 40(10-12):1523–38, January 2000.
[209] K. G. Thompson and N. P. Bichot. A visual salience map in the primate frontal eye field. Progress in brain research, 147:249–262, 2005.
[210] G. Tissingh, P. Bergmans, J. Booij, A. Winogrodzka, E. A. van Royen, J. C.
Stoof, and E. C. Wolters. Drug-naive patients with Parkinson’s disease in
Hoehn and Yahr stages I and II show a bilateral decrease in striatal dopamine
transporters as revealed by [ 123 I]β-CIT SPECT. Journal of Neurology,
245(1):14–20, December 1997.
[211] G Tissingh, J Booij, P Bergmans, A Winogrodzka, A G Janssen, E A van
Royen, J C Stoof, and E C Wolters. Iodine-123-N-omega-fluoropropyl-2beta-carbomethoxy-3beta-(4-iodophenyl)tropane SPECT in healthy controls and
early-stage, drug-naive Parkinson’s disease. Journal of nuclear medicine :
official publication, Society of Nuclear Medicine, 39(7):1143–8, July 1998.
[212] J J Todd. Visual short-term memory load suppresses temporo-parietal junction activity and induces inattentional blindness. Psychological Science, 2005.
[213] Antonio Torralba. Modeling global scene factors in attention. Journal of the
Optical Society of America. A, Optics, image science, and vision, 20(7):1407–
18, July 2003.
[214] Antonio Torralba, Aude Oliva, Monica S Castelhano, and John M Henderson. Contextual guidance of eye movements and attention in real-world scenes: the
role of global features in object search. Psychological review, 113(4):766–86,
October 2006.
[215] J. Touryan, G. Felsen, and Y. Dan. Spatial structure of complex cell receptive
fields measured with natural images. Neuron, 45(5):781–791, 2005.
[216] A.M. Treisman and G. Gelade. A feature-integration theory of attention.
Cognitive psychology, 12(1):97–136, 1980.
[217] Po-He Tseng, Ian G M Cameron, Giovanna Pari, James N Reynolds, Dou-
glas P Munoz, and Laurent Itti. High-throughput classification of clinical
populations from natural viewing eye movements. Journal of neurology, Au-
gust 2012.
[218] Po-He Tseng, Ran Carmi, Ian G M Cameron, Douglas P Munoz, and Laurent Itti. Quantifying center bias of observers in free viewing of dynamic natural
scenes. Journal of vision, 9(7):4, January 2009.
[219] E Y Uc, M Rizzo, S W Anderson, S Qian, R L Rodnitzky, and J D Daw-
son. Visual dysfunction in Parkinson disease without dementia. Neurology,
65(12):1907–1913, December 2005.
[220] C J Vaidya, G Austin, G Kirkorian, H W Ridlehuber, J E Desmond, G H
Glover, and J D Gabrieli. Selective effects of methylphenidate in attention
deficit hyperactivity disorder: a functional magnetic resonance study. Pro-
ceedings of the National Academy of Sciences of the United States of America,
95(24):14494–9, November 1998.
[221] O. van der Stelt, M. van der Molen, W. Boudewijn Gunning, and Al-
bert Kok. Neuroelectrical signs of selective attention to color in boys with
attention-deficit hyperactivity disorder. Brain research. Cognitive brain re-
search, 12(2):245–64, October 2001.
[222] S van der Stigchel, N N J Rommelse, J B Deijen, C J A Geldof, J Witlox,
J Oosterlaan, J A Sergeant, and J Theeuwes. Oculomotor capture in ADHD.
Cogn Neuropsychol, 24(5):535–549, July 2007.
[223] A I Vermersch, S Rivaud, M Vidailhet, A M Bonnet, B Gaymard, Y Agid, and
C Pierrot-Deseilligny. Sequences of memory-guided saccades in Parkinson’s
disease. Annals of neurology, 35(4):487–90, April 1994.
[224] Marie Vidailhet, Sophie Rivaud, N. Gouider-Khouja, Bernard Pillon, A.M.
Bonnet, Bertrand Gaymard, Yves Agid, and C. Pierrot-Deseilligny. Eye
movements in parkinsonian syndromes. Annals of neurology, 35(4):420–426,
1994.
[225] W E Vinje and J L Gallant. Sparse coding and decorrelation in primary visual
cortex during natural vision. Science (New York, N.Y.), 287(5456):1273–6,
February 2000.
[226] Masayuki Watanabe and Douglas P Munoz. Probing basal ganglia func-
tions by saccade eye movements. The European journal of neuroscience,
33(11):2070–90, June 2011.
[227] Frank S Werblin. The retinal hypercircuit: a repeating synaptic interactive
motif underlying visual function. The Journal of physiology, 589(Pt 15):3691–
702, August 2011.
[228] James R West. Alcohol and Brain Development. Oxford University Press,
USA, 1986.
[229] O B White, J A Saint-Cyr, R D Tomlinson, and J A Sharpe. Ocular motor
deficits in Parkinson’s disease. II. Control of the saccadic and smooth pursuit
systems. Brain : a journal of neurology, 106 (Pt 3):571–87, September 1983.
[230] Tori Williford and John H R Maunsell. Effects of spatial attention on contrast response functions in macaque area V4. Journal of neurophysiology, 96(1):40–54, July 2006.
[231] S P Wise, E A Murray, and C R Gerfen. The frontal cortex-basal ganglia
system in primates. Critical reviews in neurobiology, 10(3-4):317–56, January
1996.
[232] J M Wolfe, G A Alvarez, and T S Horowitz. Attention is fast but volition is
slow. Nature, 406(6797):691, August 2000.
[233] Jeremy M. Wolfe. Guided Search 2.0 A revised model of visual search. Psy-
chonomic Bulletin & Review, 1(2):202–238, June 1994.
[234] Jeremy M. Wolfe, Kyle R. Cave, and Susan L. Franzel. Guided search: an
alternative to the feature integration model for visual search. Journal of
experimental psychology. Human perception and performance, 15(3):419–33,
August 1989.
[235] Thilo Womelsdorf, Katharina Anton-Erxleben, Florian Pieper, and Stefan
Treue. Dynamic shifts of visual receptive fields in cortical area MT by spatial
attention. Nature neuroscience, 9(9):1156–60, September 2006.
[236] Jeffrey R Wozniak and Ryan L Muetzel. What does diffusion tensor imaging
reveal about the brain and cognition in fetal alcohol spectrum disorders?
Neuropsychology review, 21(2):133–47, June 2011.
[237] M J Wright, R J Burns, G M Geffen, and L B Geffen. Covert orientation
of visual attention in Parkinson’s disease: an impairment in the maintenance
of attention. Neuropsychologia, 28(2):151–159, 1990.
[238] Yaling Yang, Florence Roussotte, Eric Kan, Kathleen K Sulik, Sarah N Mattson, Edward P Riley, Kenneth L Jones, Colleen M Adnams, Philip A May,
Mary J O’Connor, Katherine L Narr, and Elizabeth R Sowell. Abnormal
cortical thickness alterations in fetal alcohol spectrum disorders and their
relationships with facial dysmorphology. Cerebral cortex (New York, N.Y. :
1991), 22(5):1170–9, May 2012.
[239] Steven Yantis and John Jonides. Abrupt visual onsets and selective attention:
voluntary versus automatic allocation. Journal of experimental psychology.
Human perception and performance, 16(1):121–34, February 1990.
[240] Steven Yantis, Jens Schwarzbach, John T Serences, Robert L Carlson,
Michael A Steinmetz, James J Pekar, and Susan M Courtney. Transient neu-
ral activity in human parietal cortex during spatial attention shifts. Nature
neuroscience, 5(10):995–1002, October 2002.
[241] Y Yeshurun and M Carrasco. Attention improves or impairs visual perfor-
mance by enhancing spatial resolution. Nature, 396(6706):72–5, November
1998.
[242] A J Zametkin, T E Nordahl, M Gross, A C King, W E Semple, J Rumsey,
S Hamburger, and R M Cohen. Cerebral glucose metabolism in adults with
hyperactivity of childhood onset. The New England journal of medicine,
323(20):1361–6, November 1990.
[243] Xin Zhou and David P Tuck. MSVM-RFE: extensions of SVM-RFE for mul-
ticlass gene selection on DNA microarray data. Bioinformatics, 23(9):1106–
1114, May 2007.
Appendix A
The Impact of Maturation and Aging on
Mechanisms of Attentional Selection
This appendix provides details of the work: P. Tseng, I. G. M. Cameron, D. P.
Munoz, L. Itti, Effects of development on low-level feature processing during nat-
ural viewing of dynamic scenes, In: Proc. Vision Science Society Annual Meeting
(VSS11), May 2011.
Eye movements have been widely used to examine many aspects of brain function in normal development, such as reflexive responses, inhibitory control, and working memory. However, there is little research on whether developmental differences observed with artificial stimuli scale to natural stimuli, and it is unclear how normal development affects eye movements during natural viewing, especially how the relative roles of top-down (context-guided) and bottom-up (stimulus-driven) processes change in guiding attention when viewing complex natural scenes.
This study thus examined the developmental trajectory of (1) low-level visual feature processing (bottom-up attention), (2) overall attention deployment (bottom-up and top-down attention), and (3) oculomotor saccade dynamics, while participants freely viewed videos of natural scenes. The bottom-up process was estimated by the correlation between observers' overt attention shifts and saliency maps, which are topographic maps that predict the visually conspicuous locations of a stimulus based on low-level image properties. A computational saliency model [96] was used to compute bottom-up saliency maps for each video frame. These saliency maps were computed from either a single visual attribute (color contrast, intensity contrast, oriented edges, temporal flicker, and motion contrast) or a combination of them (section 2.7: Computing Saliency Maps from Stimuli). Overall attention deployment was assessed by the similarities of instantaneous gaze positions between observers of different age groups (inter-observer similarities). Oculomotor saccade dynamics reflected observers' motor functions through measures of saccade amplitude, saccade peak velocity, and inter-saccade interval. The correlations (quantified by the area under the ROC curve, AUC, based on the first 5 saccades after scene onset) between the gaze of each participant and the saliency/similarity maps were computed as described in section 2.7: Computing Features, and the saccade dynamics of each participant were represented by the median of each measure.
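The AUC correlation between gaze and a saliency map can be sketched as follows. This is a minimal illustration, not the exact pipeline of section 2.7: the function name, the use of scipy, and the choice of control samples are assumptions. Saliency values sampled at human saccade endpoints are treated as positives, values sampled at control locations as negatives, and the area under the ROC curve is obtained through the Mann-Whitney rank-sum identity.

```python
import numpy as np
from scipy.stats import rankdata

def auc_saliency_score(saliency_at_saccades, saliency_at_controls):
    """AUC for discriminating saliency sampled at human saccade endpoints
    (positives) from saliency sampled at control locations (negatives).
    0.5 means saliency does not predict gaze; 1.0 means perfect prediction.
    Uses the Mann-Whitney U / ROC-AUC identity, with tied values averaged."""
    pos = np.asarray(saliency_at_saccades, dtype=float)
    neg = np.asarray(saliency_at_controls, dtype=float)
    ranks = rankdata(np.concatenate([pos, neg]))  # average ranks on ties
    u = ranks[:len(pos)].sum() - len(pos) * (len(pos) + 1) / 2.0
    return u / (len(pos) * len(neg))
```

Control locations could be, for example, saccade endpoints of other observers on unrelated clips; one such AUC would be computed per participant and per feature channel (color, intensity, orientation, flicker, motion, or their combination).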
The videos that participants watched in this study were composed of short (24 seconds), unrelated clips. This design was intended to reduce top-down expectation and to magnify differences in gaze allocation at every scene change (section 2.7: Stimuli). The gaze of 3 groups of participants (18 children, 10.7±1.8 yr; 18 young adults, 23.2±2.6 yr; 24 elderly, 70.3±7.5 yr; see Table 2.1 for demographic data) was tracked while they watched the videos for 20 minutes (section 2.7: Data Acquisition). To reveal the developmental trajectory, a one-way ANOVA was performed on each measure across the three age groups, and post-hoc multiple comparisons (two-tailed two-sample t-tests) were done if statistical significance was revealed by the ANOVA. All reported p-values were Bonferroni-corrected.
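The statistical procedure above can be sketched as below; a minimal illustration using scipy, where the group names are made up and the Bonferroni correction simply multiplies each raw p-value by the number of pairwise comparisons.

```python
from itertools import combinations
from scipy.stats import f_oneway, ttest_ind

def anova_with_posthoc(groups, alpha=0.05):
    """One-way ANOVA across groups (dict: name -> list of per-participant
    values). If the ANOVA is significant at alpha, run two-tailed two-sample
    t-tests on every pair of groups and Bonferroni-correct the p-values."""
    names = list(groups)
    f_stat, p_anova = f_oneway(*(groups[n] for n in names))
    posthoc = {}
    if p_anova < alpha:
        pairs = list(combinations(names, 2))
        for a, b in pairs:
            t, p = ttest_ind(groups[a], groups[b])  # two-tailed by default
            posthoc[(a, b)] = (t, min(1.0, p * len(pairs)))  # Bonferroni
    return f_stat, p_anova, posthoc
```

For instance, `anova_with_posthoc({"child": [...], "young": [...], "elderly": [...]})` could be applied to the per-participant median inter-saccade intervals.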
For oculomotor functions, the saccade amplitude (F(2,297) = 12.64, p < 0.01), saccade peak velocity (F(2,297) = 8.88, p < 0.01), and inter-saccade interval (F(2,297) = 14.75, p < 0.01) all differed significantly among the three age groups (Fig. A.1a). Specifically, as the brain matured, young adults exhibited shorter inter-saccade intervals than children (t(178) = 3.69, p < 0.01). Pro/anti-saccade studies showed that children's saccadic peak velocity for prosaccades (externally guided saccades) is as fast as young adults' by age 4-5 [71,146], but children's reaction times in both pro- and anti-saccade tasks and their error rates on antisaccades (internally guided saccades) do not reach adult levels until much later (12-20 years old [71,146]). Therefore, the shorter inter-saccade interval may result from faster visual processing, faster decisions, and/or faster saccade initiation. On the other hand, as the brain aged, saccade peak velocity slowed down (t(208) = −4.11, p < 0.01) with no change in saccade amplitude, which is consistent with earlier studies showing that saccadic peak velocity does not decrease until age 60 [146].
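The three saccade measures can be extracted from a raw gaze trace with a simple velocity-threshold detector. The sketch below is illustrative only: the 30 deg/s threshold and 500 Hz sampling rate are assumptions, not the dissertation's actual parameters, and a real pipeline would also smooth the velocity signal and reject blinks.

```python
import numpy as np

def saccade_metrics(x, y, fs=500.0, vel_thresh=30.0):
    """Detect saccades in a gaze trace (x, y in degrees of visual angle,
    sampled at fs Hz) as runs of samples whose speed exceeds vel_thresh,
    then summarize the trace by the median saccade amplitude, peak
    velocity, and inter-saccade (onset-to-onset) interval."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    speed = np.hypot(np.gradient(x), np.gradient(y)) * fs  # deg/s
    fast = speed > vel_thresh
    # Group consecutive above-threshold samples into saccade episodes.
    edges = np.diff(fast.astype(int))
    starts = np.flatnonzero(edges == 1) + 1
    ends = np.flatnonzero(edges == -1) + 1
    if fast[0]:
        starts = np.r_[0, starts]
    if fast[-1]:
        ends = np.r_[ends, len(fast)]
    amps = [np.hypot(x[e - 1] - x[s], y[e - 1] - y[s])
            for s, e in zip(starts, ends)]
    peaks = [speed[s:e].max() for s, e in zip(starts, ends)]
    isi_ms = np.diff(starts) / fs * 1000.0  # onset-to-onset, in ms
    return {"amplitude": float(np.median(amps)),
            "peak_velocity": float(np.median(peaks)),
            "isi_ms": float(np.median(isi_ms)) if len(isi_ms) else float("nan")}
```

Per participant, the resulting medians are the three oculomotor features compared across age groups above.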
For the estimated bottom-up process (Fig. A.1b), ANOVA revealed significant differences among the three age groups for both the individual low-level visual attributes (F(2,297) ≥ 15.08, p < 0.01) and their combination (F(2,297) = 30.41, p < 0.01). Moreover, children looked at locations of lower salience more often than young adults (t(178) ≤ −4.77, p < 0.01) and elderly adults (t(208) ≤ −4.28, p < 0.01) when freely viewing natural scenes (with no given cognitive task). Considering that in children, improvements in prosaccade reaction times do not reach a plateau until age 12-20 [71,146], it is possible that their visual system (e.g., contrast sensitivity) had not yet fully developed. Furthermore, since natural scenes are much more complex than the bright dot used in the prosaccade task, the children
Figure A.1: Developmental trajectory of (a) oculomotor saccade dynamics (inter-saccade interval, saccade amplitude, saccadic peak velocity), (b) low-level visual feature processing (saliency scores, AUC, for color, intensity, orientation, flicker, motion, and the combined CIOFM channel), and (c) overall attention deployment (inter-observer similarities, AUC, within group and relative to young adults). (*, p < 0.05) [plot panels omitted in transcript]
may still have been learning how to extract visual information effectively. It is also possible that the children's under-developed top-down attention guidance (longer reaction times and higher error rates than young adults in the antisaccade task [71,146]) influenced the bottom-up attention process. Nevertheless, our results contrast with [1], which reported that children fixated on locations of higher salience while free viewing natural images followed by a patch-recognition task. This discrepancy may be due to whether the additional patch-recognition task was given. When children were asked to recognize image patches, they relied more on local image patch properties compared to older adults [1]. However, when no task was given, they freely explored the natural scenes, and may not have relied on low-level visual attributes as much as they did when memorizing image patches.
For overall attention allocation (Fig. A.1c), ANOVA showed that the inter-observer similarities differed significantly among the three age groups (F(2,297) = 57.55, p < 0.01), but the similarities within each group did not. Specifically, children allocated attention very differently from young adults (t(178) = −11.12, p < 0.01), but similarly to other children. Likewise, elderly adults deployed their attention differently from young adults (t(208) = −8.14, p < 0.01), but similarly to other elderly adults. Moreover, the elderly adults' attention allocation was more similar to that of young adults than to that of children (t(208) = −3.31, p = 0.036).
In summary, this study showed that as the brain matures, visual processing time, decision-making time, and/or saccade initiation time shortens, which results in shorter inter-saccade intervals. Young adults looked more toward highly salient locations during natural viewing, possibly due to a better balance between top-down and bottom-up attention guidance and/or improved visual information extraction. As the brain aged, bottom-up attention remained as effective as that of young adults during free viewing when no task was given. However, although overall attention guidance changed significantly (presumably via top-down attention) in the elderly, they were still more similar to young adults than to children.
Appendix B
Factors that Guide Attentional Allocation during
Natural Viewing
Please see my work [218]:
P. Tseng, R. Carmi, I. G. M. Cameron, D. P. Munoz, L. Itti, Quantifying center
bias of observers in free viewing of dynamic natural scenes, Journal of Vision, Vol.
9, No. 7:4, pp. 1-16, July 2009. [2007 impact factor: 3.791] (Cited by 39)
Abstract
A significant problem in clinical diagnosis of certain neurobehavioral disorders is the overlap in observed behavioral deficits, which extensively complicates diagnosis by requiring additional neuropsychometric testing. The thesis proposes and validates a novel method to reliably differentiate normal controls from patients with neurobehavioral attention deficits (Attention Deficit Hyperactivity Disorder, Fetal Alcohol Spectrum Disorder, and Parkinson's disease). This method alleviates the need for complex task instructions and instead quantifies how these populations (both patients and controls) deploy their overt attention differently while they freely view natural scene videos. We used a computational model of visual salience to analyze the videos, and the correlations between salience and participants' eye movements were computed. These correlations, as well as saccade statistics and inter-observer similarities, were then fed into classifiers, which not only reliably differentiated these populations but also identified the most discriminative features. The proposed method can be used easily with populations less able to follow structured tasks, and its low-cost and high-throughput nature makes it viable as a unique new quantitative screening tool for clinical disorders. Moreover, the most discriminative features discovered by the method also provide insight into the effects of the disorders on several aspects of attention and gaze control. We believe that this report is the first to show that there is a latent signature of the disease that affects everyday behavior and is detectable by our algorithms. In addition, this signature is expressed in terms of the basic features of early attentional processing, which we believe, based on our previous work, will interest a broad community of neuroscience, psychology, and computational researchers.
Asset Metadata
Creator: Tseng, Po-He (author)
Core Title: Eye-trace signatures of clinical populations under natural viewing
School: Viterbi School of Engineering
Degree: Doctor of Philosophy (Computer Science)
Publication Date: 12/05/2012
Defense Date: 10/16/2012
Publisher: University of Southern California
Advisor: Itti, Laurent (committee chair); Liu, Yan (committee member); Tjan, Bosco S. (committee member)
Creator Email: pohetsn@gmail.com
Language: English
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c3-125265
Legacy Identifier: etd-TsengPoHe-1382.pdf
Tags: attention deficit hyperactivity disorder; eye movements; fetal alcohol spectrum disorder; machine learning; natural scenes; Parkinson's disease
Rights: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law.
Repository: USC Digital Library, University of Southern California