Crowding and form vision deficits in peripheral vision
CROWDING AND FORM VISION DEFICITS IN PERIPHERAL VISION
by
Anirvan S. Nandy
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(PSYCHOLOGY)
August 2010
Copyright 2010 Anirvan S. Nandy
Epigraph
When I’m not in my right mind,
my left mind gets pretty crowded.
Stephen Wright
Dedication
To my father
Acknowledgments
First and foremost, I would like to thank my thesis advisor Dr. Bosco Tjan
for guiding me through this exhilarating (and sometimes exhausting) journey.
Dr. Tjan has been an excellent mentor and his wit, optimism, enthusiasm and
breadth of knowledge have been invaluable in bringing out the best in me. I
consider myself very fortunate to have spent innumerable hours exchanging ideas
with him. I would also like to thank Bosco for his kindness, compassion and
moral support through very difficult times in my personal life. For this, I owe
him a debt of gratitude that I can never hope to repay.

I would like to thank Dr. Irving Biederman, Dr. Norberto Grzywacz, Dr. Laura
Baker and Dr. Laurent Itti for being on my thesis committee. Special thanks to
Norberto for his efforts in promoting a thriving vision research community at
USC and to Dr. Biederman for the many stimulating discussions we have had
over afternoon tea in the Hedco Neurosciences building.

I would like to thank Dr. Preeti Verghese at the Smith-Kettlewell Eye Research
Institute, who helped me develop several of the ideas regarding future research
projects.

T-Lab is an excellent place to do science and meet wonderful people. It also
offers some of the best views of Los Angeles. Special thanks to Pinglei Bao
for all your help over the years. Thanks to all our human subjects who ran
these incredibly long experiments, without which this thesis could not have
been written.

Finally, I am indebted to my family for their support and love. To my mother
for her care. To Monika, my co-conspirator in science and life, for her love and
for allowing me to fulfill my dreams. To our two children for filling my life with
delight - to Gargi, 8, for thinking that some of the figures in this thesis can be
found inside our brains; to Mihir, 2.5, for thinking that I go to the University to
do our grocery.
Table of Contents
Epigraph
Dedication
Acknowledgments
List of Tables
List of Figures
Abstract
Chapter 1: Overview
1.1 Research questions and summary
1.2 Layout of the rest of the thesis
Chapter 2: Classification Images for Invariant Systems
2.1 Introduction
2.2 Overview
2.3 Experiments
2.4 General methods
2.5 Experiment 2.1: Letter identification
2.6 Experiment 2.2: Letter detection
2.7 Experiment 2.3: Letter identification in the periphery
2.8 General discussions
2.9 Summary
Chapter 3: Crowding and Classification Images
3.1 Introduction
3.2 Analysis procedures
3.3 Methods
3.4 Results and discussions
3.5 Summary
Chapter 4: Integration across Spatial Frequencies Channels
4.1 Introduction
4.2 Analysis procedure
4.3 Methods
4.4 Experiment 4.1: Measuring the letter tuning function (LTF)
4.5 Experiment 4.2: Estimating the integration index (Φ)
4.6 Ideal observer analysis
4.7 General discussion
Chapter 5: A Unified Model of Visual Crowding
5.1 Introduction
5.2 Theory
5.3 Results
5.4 Discussion
Chapter 6: Conclusions and Future Directions
References
Appendix A: Signal Clamping
Appendix B: Expected Value of Noise from Error Trials
Appendix C: Signal Clamping: Beyond First Order Analysis
C.1 Signal-clamped classification images
C.2 Flanker analysis
C.3 Feature maps
C.4 Optimal region of interest
Appendix D: Analytical Forms for Fitting the CSF
Appendix E: Ideal-Observer Model
Appendix F: Optimal Integration
F.1 Optimal integration for white-noise limited ideal observer
F.2 Optimal integration for ideal observer with multiplicative noise
Appendix G: Grating Orientation Discrimination
Appendix H: Saccade-Confounded Image Statistics: Methods
H.1 Geometry of V1
H.2 Saccadic eye movements
H.3 Eye movement simulations and image statistics
Appendix I: Detailed Measurement of Crowding Zones
I.1 Methods
I.2 Results
List of Tables
2.1 Estimated spatial uncertainty for letter identification
2.2 Estimated spatial uncertainty for letter detection
2.3 Estimated spatial uncertainty for the periphery
3.1 Stimuli size for classification image experiment
3.2 Efficiency of human performance
3.3 rSNR of the classification images
4.1 Stimuli size for measuring LTF
4.2 LTF parameters
I.1 Aspect ratios of elliptical fits to crowding zones
List of Figures
1.1 Crowding
2.1 Stimulus and experimental protocol
2.2 Classification images for letter identification
2.3 rSNR versus number of trials (Experiment 2.1)
2.4 Classification images for letter detection
2.5 rSNR versus number of trials (Experiment 2.2)
2.6 Classification images in the periphery
2.7 Summary of spatial uncertainty
3.1 Stimuli
3.2 logSNR and first-order classification images
3.3 Flanker analysis results
3.4 Feature utilization zones
3.5 Feature utilization zones (summary)
3.6 Ideal-observer feature maps
3.7 Human feature maps
3.8 Feature maps summary
4.1 Decomposition of broadband stimuli
4.2 Stimuli for measuring the LTF
4.3 Stimuli for measuring integration index
4.4 Human LTF and Φ
4.5 Contrast Sensitivity Function
4.6 Ideal-Observer Model
4.7 Ideal-Observer Simulation Results
4.8 Ideal-Observer Simulation Analysis
5.1 Characteristics of visual crowding
5.2 The interaction of spatial attention and saccades
5.3 The geometry of V1
5.4 Pair-wise image statistics
5.5 Zones of inappropriate integration
5.6 Zones of inappropriate integration: effect of λ
5.7 Predicted crowding zones at and around the PRL
5.8 Crowding zones in the upper and lower visual fields
A.1 Ideal observer simulations - letter identification
A.2 Ideal observer simulations - letter detection
C.1 Flanker analysis procedure
C.2 Calculation of second-order feature maps
C.3 Estimation of optimal ROI
G.1 Φ for grating orientation discrimination
I.1 Bias-free estimate of the crowding zone
Abstract
Visual crowding is a ubiquitous limitation of peripheral vision and manifests
itself as the marked inability to identify shapes when targets are flanked by
other objects. It presents a fundamental bottleneck to object recognition in
peripheral vision. Although the phenomenon has been widely studied over the last
four decades, the neural mechanisms underlying crowding remain unsettled.
Such an understanding is critical for the development of visual enhancement
aids for patients with central field loss.

Here we first investigate the nature of form vision deficits in the periphery
through a series of psychophysical experiments. We develop a novel method of
classification images to overcome the intrinsic spatial uncertainty in the
periphery (Tjan and Nandy, 2006), and show that the perceptual templates utilized
in the periphery are undistorted. By using higher order reverse correlation
analysis, we show that the form of flanking objects greatly influences target
recognition errors under crowding and that crowding is associated with an
inefficient selection and usage of low-level features (Nandy and Tjan, 2007). We
further show that feature integration across spatial frequency channels is optimal
(for letter identification) in both central and peripheral vision (Nandy and Tjan,
2008).

We next develop a unified model of visual crowding. Our theory, guided
by empirical findings, including the ones mentioned above, views crowding as
a necessary consequence of gathering image statistics during eye movements.
We show that any temporal overlap between spatial attention in the peripheral
visual field that precedes a saccadic eye movement and the motion blur
due to the subsequent saccade can cause a misrepresentation of image statistics
in peripheral V1, where saccadic suppression is weak. We demonstrate
with simulations that the strength and shape of long-range horizontal connections
formed under such conditions quantitatively explain the three hallmark
signatures of crowding: (a) the spatial extent of crowding scales linearly with
eccentricity (Bouma, 1970); (b) crowding is asymmetric with respect to the
target (Bouma, 1973; Petrov et al., 2007); and (c) the zone of crowding is anisotropic
(Toet and Levi, 1992). The model provides a basis for understanding specific
target-flanker interactions at the feature level and makes predictions about
cortical reorganization in patients with central field loss.
Chapter 1
Overview

Despite many years of research, understanding the mechanisms underlying the
ability of the human visual system to decipher the shape and structure of objects
in a scene remains a challenging problem. A comprehensive understanding of
these mechanisms of form vision is a necessary stepping stone toward the
development of neuro-biologically inspired algorithms that will enable the
deployment of various object recognition tasks (tasks that are seemingly effortlessly
executed by our visual system) on computing devices. Furthermore,
understanding the neuro-computational nature of form vision in human subjects has
significant clinical implications concerning visual impairments. For example,
patients with various forms of disease in the macula of the retina must rely on
the peripheral visual fields to recognize objects, identify faces and read. Lack of
a comprehensive understanding of form-vision mechanisms outside of macular
(or central) vision adversely affects the development of effective rehabilitation
regimens and assistive technologies for such patients.

Peripheral vision is deficient both in terms of sampling at low spatial
resolution and neural under-representation. The density of cone photoreceptors,
which are the primary inputs to the parvocellular pathway
(Livingstone and Hubel, 1987) that mediates form and color vision, falls very
rapidly with eccentricity (Osterberg, 1935; Curcio et al., 1987). Similarly, in the
cortex, there is an over-representation of the central visual field at the expense
of the periphery (Tootell et al., 1988). This is commonly referred to as cortical
magnification. Consequently, visual acuity falls off rapidly with eccentricity
(Berkley et al., 1975). The obvious question to ask at this point is: what is form
vision in the periphery really like? Is it simply a blurred version of the fovea, as
the fall-off in spatial acuity might suggest?

Although object recognition in a cluttered scene is a task that is performed
seamlessly by the central visual system, there exists within the visual apparatus
a natural breakdown of this ability in the phenomenon known as visual
crowding (Korte, 1923). Visual crowding is the inability to recognize objects
(which are otherwise easily identifiable in isolation) when they are flanked by
other items (Figure 1.1A); it is ubiquitous in the periphery while being virtually
non-existent in central vision. Studying the phenomenon of crowding provides
a unique opportunity to address some fundamental questions of form vision,
since it serves as a natural complementary condition to central vision, which is
unaffected by crowding.

Figure 1.1B demonstrates that simply scaling the objects to adjust for peripheral
loss of acuity does not relieve crowding (Pelli et al., 2004). What seems to
matter is the spacing between the central object and the flanking objects. As
long as the flanking objects are within a certain critical spacing they will impair
the recognition of the central object (Pelli and Tillman, 2008).
[Figure 1.1 image not reproduced. Labels in the figure: (A) crowding demo with the
letter r; (B) critical spacing, after Pelli (2008), Curr. Op. NeuroBio.; (C) Bouma’s
scaling law at 2.5°, 5° and 10° from the fovea, radial-tangential anisotropy, and
inward-outward asymmetry (d_in, d_out), adapted from Toet & Levi (1992).]
Figure 1.1: (A) Crowding demo: fixating on the red ‘−’, it should be easy to identify the
letter r on the left; the equidistant r on the right, which is flanked (crowded) by other letters, is
much harder to identify. When fixation is shifted to the ‘+’, the r becomes easier to identify. (B)
Crowding is a general phenomenon and is not limited to letter stimuli. The central object in each
of the set of three objects becomes very difficult to identify when it is farther in the peripheral
field (fixate on the ‘−’) than when it is closer to central vision (fixate on the ‘+’). Scaling the
objects to adjust for peripheral loss of acuity does not relieve crowding. What seems to matter
is only the spacing between the central object and the flanker (critical spacing). (C) The extent
of crowding (“crowding zone”) can be estimated by measuring the performance threshold for
target identification at peripheral locations (demarcated by ⋆) with flankers placed at various
relative positions around the target. The estimated zones have three robust signatures: critical
spacing scales up with eccentricity (Bouma’s Law); the zone is not isotropic but is markedly
elongated in the direction toward the fovea; outward flankers are more effective in crowding
than inward flankers.
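
The linear scaling of critical spacing with eccentricity (Bouma’s Law) can be
written as a one-line rule. The sketch below is only an illustration: the
proportionality constant of 0.5 is the approximate value commonly quoted from
Bouma (1970) and is treated here as an assumption, the function names are
hypothetical, and the radial-tangential anisotropy and inward-outward asymmetry
described in the caption are deliberately ignored.

```python
def critical_spacing(eccentricity_deg, bouma_factor=0.5):
    """Critical spacing (deg) under a purely linear Bouma-style rule.

    bouma_factor is an assumed constant (about 0.5 is commonly quoted);
    anisotropy and inward-outward asymmetry are not modeled.
    """
    return bouma_factor * eccentricity_deg


def is_crowded(target_eccentricity_deg, flanker_distance_deg, bouma_factor=0.5):
    """True if a flanker lies within the critical spacing of the target."""
    return flanker_distance_deg < critical_spacing(target_eccentricity_deg, bouma_factor)


# Critical spacing at the three eccentricities labeled in Figure 1.1C
spacings = {ecc: critical_spacing(ecc) for ecc in (2.5, 5.0, 10.0)}
```

Under this assumed constant, a flanker 3 deg from the target would crowd a target
at 10 deg eccentricity but not one at 2.5 deg, which is the qualitative point of
the scaling law.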
1.1 Research questions and summary

These facts suggest that form vision deficits in the periphery cannot simply be
explained by a lack of spatial resolution and that there are other non-linear
interactions in the cortex limiting peripheral vision. This brings us to the set of
research questions that I wish to address in this thesis. We first address some
critical issues regarding the nature of form vision in the periphery through a
series of psychophysical experiments¹. We then propose a unified computational
model of visual crowding.

¹ In most of these psychophysical experiments, we use letter stimuli as targets for object
identification. We believe that letter stimuli serve as good surrogates for “natural stimuli”,
to which our visual systems are attuned, for three reasons: (a) They possess the 1/f fall-off
in Fourier amplitude spectra that is a ubiquitous characteristic of natural scenes (Field, 1987;
Ruderman and Bialek, 1994); (b) Letters are ecologically relevant in modern societies and are
likely over-learned like natural stimuli. They are at a level of complexity intermediate between
that of sinusoidal gratings (low complexity) and that of natural scenes (high complexity). At the
same time letters are amenable to systematic manipulations; and (c) Letters, like natural stimuli,
are necessarily broadband, and the narrowband components that can be extracted from them (as is
done in the early visual cortex) are confined to a specific set of phase configurations that lead to
sharp edges, contours and other common broadband features.

Are there distortions in perceptual templates?

Some researchers (Hess and Field, 1993; Hess and McCarthy, 1994) have argued
that the form vision deficits are due to distortions in perceptual templates
(“uncalibrated neural disarray”) in the periphery. In Chapter 2 we address this
issue by first developing a new reverse correlation technique, which we will
refer to as “signal-clamping” (Tjan and Nandy, 2006). Reverse correlation is a
standard linear method for uncovering the pattern-matching templates used by
a pattern classifier, from a neuron to a human observer. Our method overcomes
a basic limitation of this standard technique, namely its inability to recover the
template of a system (e.g. peripheral vision) that is invariant to the spatial location
of a target. We look at signal-clamped classification images of single objects
(Chapter 2) and flanked objects (Chapter 3) in peripheral vision. We find that
there are no perceptual template distortions in the periphery.

Is integration across spatial locations erroneous?

Although many models have been proposed for visual crowding, most
researchers in the field believe that it is due to some form of inappropriate
feature integration (see Levi, 2008 for a review). In Chapter 3 we extend the
method of “signal-clamping” to examine if there is evidence for inappropriate
feature integration across spatial locations in the periphery (Nandy and Tjan,
2007). We find that, in sharp contrast with the fovea, target recognition errors in
the periphery are strongly driven by the shape of flanking objects. We also show
that crowding is associated with an inefficient selection and usage of low-level
features.

Is integration across spatial frequency channels sub-optimal?

Another form of inappropriate feature integration could be in the domain of
spatial frequency channels (Graham et al., 1978). Chapter 4 examines whether
there is inappropriate integration in the spatial frequency domain in the
periphery and whether there are any differences between central and peripheral vision
in this respect (Nandy and Tjan, 2008). We find that the periphery is not deficient
as compared to the fovea, and that feature integration across spatial frequency
channels is optimal in both the fovea and the periphery.

Can we explain the shape of the crowding zone?

Figure 1.1C shows that crowding zones have three distinct and robust shape
characteristics. To date there does not exist a unified model that explains
this curious shape. In Chapter 5, we propose such a unified model that not only
explains all the shape characteristics but also provides a basis for understanding
specific target-flanker interactions at the feature level. Our model makes
predictions about cortical reorganization that are clinically relevant for patients
with central vision loss.

1.2 Layout of the rest of the thesis

The following four chapters are self-contained, the first three of which have
been published in peer-reviewed journals. Although any of these chapters can
be read in isolation, they form an integral part of the research agenda outlined
above. For ease of readability, most of the mathematical equations and derivations
appear in separate Appendices at the end. Chapter 6 provides an outline
of future research that can be pursued based on the foundation laid out in the
previous chapters.
Chapter 2
Classification Images for Invariant Systems

2.1 Introduction

If a system responds linearly to its input by correlating it with a single template
(by taking the dot product), then this template can be recovered by presenting
the system with samples of white noise and averaging those noise samples that
led to the same response. Since the 1980s, this simple form of reverse correlation,
also known as spike-triggered averaging, has been routinely applied to map
the receptive fields of neurons in the early stages of the sensory systems (e.g.,
de Boer and de Jongh, 1978; de Boer and Kuyper, 1968; Jones and Palmer, 1987).
When applied to visual psychophysics, where the stimulus noise is in the form
of an image (or a movie), the technique is often referred to as the
“classification-image method” (Ahumada, 2002; Beard and Ahumada, 1999), which owes its
roots to the early work of Ahumada and colleagues in auditory psychophysics
(Ahumada and Lovell, 1971; Ahumada and Marken, 1975).
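
The basic reverse-correlation idea described above can be sketched in a few
lines. This is a minimal illustration under stated assumptions, not the
procedure used in the experiments: the simulated observer is a linear noisy
correlator with a hypothetical 8 x 8 vertical-bar template, it is shown white
noise only, it answers "yes" whenever its noisy dot product is positive, and
the function name, trial count and internal-noise level are all arbitrary
choices made for the example.

```python
import numpy as np


def classification_image(template, n_trials=20000, internal_noise=1.0, seed=0):
    """Estimate a linear observer's template by reverse correlation."""
    rng = np.random.default_rng(seed)
    # One white-noise stimulus per trial, flattened to a vector
    noise = rng.standard_normal((n_trials, template.size))
    # Linear noisy correlator: dot product with the template plus internal noise
    decision = noise @ template.ravel() + internal_noise * rng.standard_normal(n_trials)
    said_yes = decision > 0
    # Average the noise fields by response and take the difference of the means
    ci = noise[said_yes].mean(axis=0) - noise[~said_yes].mean(axis=0)
    return ci.reshape(template.shape)


# Hypothetical template: a vertical bar in an 8 x 8 image
template = np.zeros((8, 8))
template[2:6, 3:5] = 1.0

ci = classification_image(template)
# Pixel-wise correlation between the recovered image and the true template
r = np.corrcoef(ci.ravel(), template.ravel())[0, 1]
```

For this idealized observer the averaged noise is proportional to the template
(plus estimation noise that shrinks with the number of trials), so `r` should
be close to 1.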
In recent years, classification image and similar techniques have been
applied to study vernier acuity (Beard and Ahumada, 1999), stereopsis
(Neri et al., 1999), illusory-contour perception (Gold et al., 2000), identification
of facial expression (Adolphs et al., 2005; Gosselin and Schyns, 2003), and
surround effect on contrast discrimination (Shimozaki et al., 2005), to name a few.
We can divide these applications into two broad categories. In one category,
the primary goal of the investigation was to discern from where in the stimulus
an observer extracts information. For example, Gold et al. (2000) found
that when observers were asked to judge whether the shape of an illusory
square was “thin” or “fat”, they often based their decisions on the left and
right illusory edges, while ignoring the top and bottom ones, which are equally
informative. The work of Adolphs et al. (2005), showing that a patient’s failure to use
information in the eye region of faces impaired the perception of fear, is another
example of this category. In the second category, the main purpose of using
the classification-image method was to infer the “perceptual template” used by
an observer to perform a given task. For example, Beard and Ahumada (1999)
showed with classification images that vernier discrimination was mediated by
an orientation-tuned mechanism, as had been previously suggested.

The method of classification image can recover a mechanism’s template
if the mechanism is equivalent to a linear noisy correlator (Ahumada, 2002).
Murray et al. (2002) argued that this requirement can be relaxed to include
observer models that have an additive noise whose variance is proportional
to the contrast energy of the input (as opposed to being a constant) and to
models with nonlinear transducer functions when tested over a narrow range
(for a more precise description of the requirements regarding nonlinear
transducer functions, see Neri, 2004). Even with these generalizations, the range of
observer models for which the classification-image method is valid for inferring
the perceptual template appears to be restricted. Physiologists have long
maintained that spike-triggered averaging (identical to classification image)
is of very limited use for uncovering the receptive field structures of higher
order visual neurons. Various higher order techniques, such as spike-average
covariance (Steveninck and Bialek, 1988; Rust et al., 2004, 2005), are used to
augment spike-triggered averaging. Neri and Heeger (2002) recently extended the
classification-image method to include the analysis of covariance. Despite its
theoretical limitation, the linear version of the classification-image method has
been applied to increasingly complex visual tasks, such as face recognition and
object categorization, yielding intriguing results.

A simple yet ubiquitous form of nonlinearity generally believed to pose
a severe problem to the method of classification image is uncertainty.
Murray et al. (2002) described this problem succinctly:

    One type of nonlinearity that does pose a problem for the noisy
    cross-correlator model is stimulus uncertainty. Even when observers
    are told the exact shape and location of the signals that they are to
    discriminate between, they sometimes behave as if they are uncertain
    as to exactly where the stimulus will appear or what shape it
    will take (e.g., Manjeshwar and Wilson, 2001; Pelli, 1985). We can
    model spatial uncertainty by assuming that the observer has many
    identical templates that he applies over a range of spatial locations
    in the stimulus, but the effects of this operation are complex, and
    it is not obvious precisely how a classification image is related to
    the template of such an observer, or how the SNR of the classification
    image is related to quantities such as the observer’s performance
    level or internal-to-external noise ratio. If an observer is very
    uncertain about some stimulus properties, such as the phase of a grating
    signal, a response classification experiment may produce no
    classification image at all (Ahumada and Beard, 1999).

This problem is more serious because of the equivalence between feature
invariance and intrinsic uncertainty (an uncertainty internal to the observer, as
opposed to that in the stimuli, or extrinsic uncertainty), which we shall explain
next.

Visual processing entails the extraction of “features” from retinal inputs that
are relevant to behavior. In theories of object perception, the degree of invariance
that a feature possesses is a central issue. Biederman (1987) and Marr
(1982), for example, viewed visual processing as a stage-wise process designed
to recover, from retinal images, non-accidental features of increasing complexity
and invariance (edges, contours, corners, simple volumes, and structural
description of volumes). For example, an edge feature is invariant to local
contrast and immune to changes in local illumination; a volumetric feature is
invariant not only to local and global illumination but also to the observer’s
viewpoint. All theories of object recognition involve invariance but differ in
the degree of invariance they rely on to make the final determination of object
identity (cf. Tjan, 2002; Tjan and Legge, 1998).

Consider a detector that signals the presence of a particular feature (e.g.,
an edge) while ignoring the specific image properties that the feature was
rendered with (e.g., the colors across the edge). It is as if the detector is obligatorily
considering all possible versions of the feature (e.g., white-black edge,
white-gray edge, red-green edge, etc.). Such a feature detector will exhibit an amount
of intrinsic uncertainty, equal to the effective number of orthogonal instances
in the equivalent set of the input images that lead to the same response. The
notions of “invariance” and “uncertainty”, albeit different in their historical and
theoretical origins, are therefore the same.
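
The problem that uncertainty poses for the conventional method can be
illustrated with a toy position-invariant detector. The sketch below rests on
several assumptions that are not taken from the thesis: the observer takes the
maximum response over circularly shifted copies of one template (a common way
of modeling uncertainty), the stimulus is a 32-pixel 1-D white-noise field with
no signal, and the "yes" criterion is set at the median response. All names and
parameter values are hypothetical.

```python
import numpy as np


def noise_only_classification_image(template, shifts, n_trials=20000, seed=1):
    """Conventional classification image for a max-over-shifts observer."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((n_trials, template.size))
    # Bank of shifted copies of the same template, shape (n_shifts, n_pixels)
    bank = np.stack([np.roll(template, s) for s in shifts])
    # Invariant detector: respond with the largest match over all shifts
    response = (noise @ bank.T).max(axis=1)
    said_yes = response > np.median(response)
    return noise[said_yes].mean(axis=0) - noise[~said_yes].mean(axis=0)


n_pixels = 32
template = np.zeros(n_pixels)
template[15:17] = 1.0  # hypothetical two-pixel feature

# No uncertainty: a single template at a known position
ci_certain = noise_only_classification_image(template, shifts=[0])
# Full positional uncertainty: the same template at every position
ci_uncertain = noise_only_classification_image(template, shifts=range(n_pixels))

# Spatial contrast (standard deviation across pixels) of each image
contrast_certain = ci_certain.std()
contrast_uncertain = ci_uncertain.std()
```

With a single known position the classification image peaks at the template
pixels. With full positional uncertainty every pixel is, by symmetry, equally
related to the response, so the image is essentially flat and carries no
picture of the template.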
If the method of classification image indeed could not handle uncertainty,
it would be of limited use as a tool to reveal the mechanisms of vision, which
undoubtedly involve invariance. Limitations imposed by uncertainty have been
noted and partially addressed in the past. Using a vernier-offset detection task,
Barth et al. (1999) rejected the linear observer model (one that does not have
uncertainty) by showing a significant discrepancy between classification images
from human observers and those from a linear model, when the classification
images from the offset-present and offset-absent trials were considered
separately. They also estimated the amount of positional and orientation uncertainty
in the human observers by explicitly modeling uncertainty as a small Gaussian
weighting function. Solomon (2002) likewise pointed out that for a yes-no
signal detection task, any difference in the shapes of templates estimated from
target-present trials as compared with target-absent trials may be due to
uncertainty. Abbey and Eckstein (2002) extended this observation to 2AFC tasks and
provided a statistical test for using classification images to detect the presence
of observer nonlinearities. Although these studies showed how observer
nonlinearity, such as uncertainty, could be detected from classification images, they
did not provide a general method for template estimation in the face of large
uncertainty. Eckstein et al. (2002) went one step further to show that, at least
for a small amount of positional uncertainty in the task (two possible positions),
the classification image computed at each possible position was an unbiased
estimate of the underlying templates of a Bayesian ideal observer.

The goal of this chapter and Appendix A is to show that with a slight
modification to the current practice, the method of classification image is generally
applicable even when the task, the visual system, or both possess a great deal
of uncertainty (and invariance). This is achieved by understanding the role
that a signal plays in a classification-image experiment in the context of a
well-established uncertainty model (cf. Pelli, 1985), first proposed by Tanner (1961).
Specifically, we will demonstrate the theoretical feasibility and empirical
practicality of recovering the perceptual templates of an observer for tasks with a
high degree of spatial uncertainty. We will also demonstrate how the degree of
uncertainty may be estimated from the resulting classification images.

2.2 Overview

Appendix A explores the theoretical underpinnings that allow the use of
classification-image methods in conditions with high uncertainty. We will
illustrate the various aspects of our proposed method analytically and by means of
simulations using an ideal-observer model for which we know the ground truth
about the observer’s templates. We will then demonstrate the practicality of our
method via three sets of experiments with human observers. Experiments 2.1
and 2.2 will show that we can uncover, within a reasonable number of trials, the
perceptual templates for letter identification and detection tasks in conditions
with varying degrees of spatial uncertainty. We will also show that the degree
of uncertainty can be estimated from the classification images. In Experiment
2.3 (§2.7), we will demonstrate the potential of our method by using it to measure
both the quality of the perceptual templates and the amount of intrinsic
spatial uncertainty in human peripheral vision.

Summary of the method of signal-clamped classification image

Our main finding here is that by presenting a relatively strong signal in the
stimulus, the observer template for the presented signal can be imaged using the
conventional classification-image method in the face of a high degree of intrinsic
spatial uncertainty. We call this type of classification image, obtained with
a relatively strong signal embedded in the stimulus, the “signal-clamped”
classification image. If spatial uncertainty is extrinsic (i.e., in the stimulus), then the
only change needed in the calculation of classification images, a minor one, is to
shift the noise pattern (with wraparound) to re-center the presented signal in the
image (Equation A.21). How this finding may be generalized to other types of
uncertainties will be addressed in the General discussions section.
We have shown analytically and with simulations the following properties
of signal-clamped classification images obtained with a high degree of spatial
uncertainty:
1. Each of the classification sub-images from the error trials contains a clear
negative image of the observer’s template for the presented signal, unaffected
by spatial uncertainty intrinsic or extrinsic to the observer. However,
in the presence of uncertainty, the clarity of the template image
markedly deteriorates if the contrast of the presented signal is not
sufficiently high. The need for a high-contrast signal runs counter to the
conventional practice of using a low-contrast signal to increase the effect
of noise on the observer’s response.
2. Any positive image of the alternative template in a classification
sub-image for the error trials is blurred by spatial uncertainty, often rendering
it indiscernible.
3. The extent to which these positive template images are blurred provides
an estimate of the spatial extent of the uncertainty.
4. Because of the presence of a relatively strong signal in the stimulus, the
classification sub-images from the correct trials contain very little contrast
and are relatively uninformative. As a result, we do not advocate combining
the sub-images to form a single classification image as in the conventional
approach.
Our discussion has focused, and will continue to focus, on the sub-images
from the error trials, although the general properties of signal-clamped
classification images derived in this section also apply to the sub-images from
correct trials with merely a sign change. We ignore the correct-trial sub-images
for the sake of simplicity. We do not lose much because, with a relatively strong
signal in the stimulus, the signal-to-noise ratio (SNR) of the correct-trial sub-images
is often quite low.
2.3 Experiments
Three sets of human experiments were conducted to determine the practical-
ity and utility of the signal-clamped classification-image method. Experiments
2.1 (§2.5) and 2.2 (§2.6) paralleled the simulation studies and aimed to demon-
strate the feasibility of the proposed method and to empirically validate the
various properties of signal-clamped classification images. Experiment 2.1 used
the two-letter identification task, whereas Experiment 2.2 used the single-letter
detection task. To compare the effects of spatial uncertainty, both experiments
were performed in the fovea, where the spatial uncertainty of a human observer can
be effectively manipulated with the stimulus. We introduced spatial uncertainty
into the task by randomizing the signal position within a given region in the
stimulus display. Knowing the actual spatial extent of the stimulus-level spatial
uncertainty provides a reference for evaluating the estimated spatial extent
obtained from the signal-clamped classification-image method.
Experiment 2.3 (§2.7) tested letter identification in the periphery. The visual
periphery is known to have a considerable amount of intrinsic spatial uncertainty
(Hess and Field, 1993; Hess and McCarthy, 1994; Levi and Klein, 1996;
Levi et al., 1987). No spatial uncertainty was added to the stimulus. The objective
of this experiment was to demonstrate that the method of signal-clamped
classification images can be used to uncover the perceptual template in the presence
of spatial uncertainty and to estimate the spatial extent of the uncertainty.
2.4 General methods
Procedure
In the identification experiments (Experiments 2.1 and 2.3), the task was to
indicate which of the two lowercase letters “o” or “x” was presented. In the
detection experiment (Experiment 2.2), the task was to indicate whether the
lowercase letter “o” was presented.
Each experiment consisted of 10 blocks with 1050 trials per block. In each
trial, a white-on-black letter was presented in a field of Gaussian white noise.
The noisy stimuli (letter + noise) were presented at the fovea for Experiments 2.1
and 2.2 and at 10° in the inferior visual field for Experiment 2.3. The first 50 trials
in each block were calibration trials in which the letter contrast was dynamically
adjusted using the QUEST procedure (Watson and Pelli, 1983) as implemented
in the Psychophysics Toolbox extension in MATLAB (Brainard, 1997; Pelli, 1997)
to obtain a “calibrated” threshold letter contrast for reaching an accuracy level
of 75%. The remaining 1000 trials were divided into five subblocks of 200 trials
each, and QUEST was reinitialized to the calibrated value at the beginning of
each subblock. During the initial 50 calibration trials, the standard deviation of
the prior distribution of the threshold value was set to 5 log units (a practically
flat prior), but for each subblock, the prior was narrowed to a standard deviation
of 1 log unit. This restricted the variability of the test contrast but still allowed
adequate flexibility for the procedure to adapt to the observers’ continuously
improving threshold levels.
For the foveal experiments, the letter size was fixed at 48 pt in Times New
Roman font (x-height = 22 pixels). For the peripheral experiments, an acuity
measurement was first performed for each subject, in which the subject was
instructed to identify any of the 26 letters presented at 10° retinal eccentricity
in the inferior field. The size of the presented letter was varied using the QUEST
procedure to achieve an identification accuracy of 79%. Twice the acuity size so
determined was used in the main experiment.
Stimuli
The stimulus for each trial consisted of a white-on-black letter added to a
Gaussian, spectrally white noise field of 128×128 pixels. Before being presented to
the observers, each pixel of this noisy stimulus was duplicated by a factor of 2
in each dimension, such that four screen pixels were used to render a single pixel
in the stimulus. This was done to increase the spectral density of the noise. The
noise contrast was fixed at 25% rms. At a viewing distance of 105 cm, the noisy
stimulus was of size 4.7°, and the noise had a two-sided spectral density of
85.5 μdeg². The mean luminance of the noisy background was 19.8 cd/m².
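The field size and noise spectral density quoted above can be cross-checked with a short calculation. The following Python sketch is illustrative only: the conversion factor is assumed from the 2.37° reported for a 64-stimulus-pixel square, and the relation used for the spectral density (squared rms contrast times the area of one noise check) is a standard one for pixelated white noise that we assume applies here. The function names are hypothetical and not taken from the original software.

```python
# Conversion assumed from the reported 2.37 deg of visual angle per 64 stimulus pixels.
DEG_PER_STIM_PX = 2.37 / 64

def stim_px_to_deg(n_px):
    """Convert a length in stimulus pixels to degrees of visual angle."""
    return n_px * DEG_PER_STIM_PX

def noise_spectral_density_udeg2(rms_contrast, check_px=1):
    """Assumed relation: N = c_rms^2 * (area of one noise check), in micro-deg^2."""
    check_deg = stim_px_to_deg(check_px)
    return rms_contrast ** 2 * check_deg ** 2 * 1e6

field_deg = stim_px_to_deg(128)               # noise field, reported as 4.7 deg
density = noise_spectral_density_udeg2(0.25)  # reported as 85.5 micro-deg^2
```

Both values land close to the reported figures; the small residual differences are within the rounding of the assumed conversion factor.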
For the fovea experiments (Experiments 2.1 and 2.2), the letters were of size
0.81° (x-height) in visual angle. For the periphery experiment (Experiment 2.3),
the letters were of size 0.85° for one subject and 1.15° for the other subject. The
periphery letter size was 0.3 log units above the subject’s letter acuity at 10°
eccentricity. The contrast of the target letter was adjusted with a QUEST
procedure as described in the Procedure section.
For the experiments with spatial uncertainty, 1000 uniformly distributed random
positions, representing the center of a presented letter, were preselected
with replacement from an imaginary square centered in the noise field. The
extent of the spatial uncertainty was manipulated by changing the size of the
imaginary square: 32 stimulus pixels on a side (i.e., 64 screen pixels because of
the factor-of-2 blocking to increase noise spectral density, 1.18° of visual angle)
for the “medium” level of uncertainty and 64 stimulus pixels (128 screen pixels,
2.37° of visual angle) for the “high” level of uncertainty. For the experiments
without spatial uncertainty, the letter was always presented at the center of the
noise field, marked by a fixation cross before and after stimulus presentation.
Figure 2.1a depicts a noisy stimulus used in the experiment.
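The preselection of letter positions can be sketched as follows. This is a minimal Python illustration, not the original MATLAB code: the function name, the use of integer pixel coordinates, and the seed are assumptions made for the example.

```python
import numpy as np

def preselect_centers(extent_px, n_positions=1000, field_px=128, seed=0):
    """Draw letter centers uniformly, with replacement, from a centered square."""
    rng = np.random.default_rng(seed)
    low = (field_px - extent_px) // 2          # first row/column inside the square
    # Each row is a (row, column) center in stimulus-pixel coordinates.
    return rng.integers(low, low + extent_px, size=(n_positions, 2))

medium = preselect_centers(32)   # "medium" uncertainty: 32 stimulus pixels on a side
high = preselect_centers(64)     # "high" uncertainty: 64 stimulus pixels on a side
```

Because the draws are made with replacement, the same center can occur more than once among the 1000 positions.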
The stimuli were displayed in the center of a 19-in. CRT monitor (Sony Trinitron
CPD-G400), and the monitor was placed at a distance of 105 cm from a subject.
The monitor has 11 bits (2048 levels) of linearly spaced contrast levels. All 11
bits of the contrast levels were addressable to render the noisy stimulus for each
trial. This was achieved by using a passive video attenuator (Pelli and Zhang,
1991) and custom-built contrast calibration and control software implemented
in MATLAB. Only the green channel of the monitor was used to present the
stimuli.
Figure 2.1: (a) A sample of the noisy stimulus. (b) Timing of stimulus presentation: (1) fixation
beep immediately followed by a fixation screen for 500 ms, (2) stimulus presentation for 250 ms,
(3) subject response period (variable) with positive feedback beep for correct trials, and (4) 500
ms delay before onset of next trial.
The stimuli were presented according to the following temporal design: (1) a
fixation beep immediately followed by a fixation screen for 500 ms, (2) a stimulus
presentation for 250 ms, (3) a subject response period (variable) with positive
feedback beep for correct trials, and (4) a 500 ms delay before onset of the next
trial (see Figure 2.1b).
At the end of each trial, the following data were collected for the subsequent
classification-image reconstruction: the center position of the target letter, the
state of the pseudorandom number generator used to produce the noise field,
the identity and contrast of the presented letter, and the response of the subject.
Subjects
Five subjects (one of the authors and four paid students at the University of
Southern California who were unaware of the purpose of the study) with normal
or corrected-to-normal vision participated in the experiments. All had (corrected)
acuity of 20/20 in both eyes. Subjects viewed the stimuli binocularly in a
dark room. Written informed consent was obtained from each subject before the
commencement of data collection. Because of the monotonous nature and long
duration of each experiment (approximately 8–10 hr), subjects were allowed
(and encouraged) to take breaks whenever they so desired. All the subjects
completed their respective experiments in three to five sessions.
Classification-image reconstruction
For the purpose of reconstructing the classification images, the calibration trials
in each experiment block were ignored. For the remaining 10000 trials, the noise
field was first regenerated using the stored random number state. Next, the
noise field was shifted with wraparound based on the stored target position
information so as to re-center the presented letter (Equation A.21). This shifting
procedure was unnecessary when there was no spatial uncertainty at
the stimulus level (Experiment 2.3, and one condition in Experiment 2.1). For
each trial, the re-centered noise field was then classified into one of four bins
based on the presented stimulus and the subject’s response. The noise fields in
each bin were then averaged pixel-wise to form the corresponding classification
sub-images.
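The reconstruction steps described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the original MATLAB implementation: each trial record is a hypothetical dictionary, an integer seed stands in for the stored random number state, and the letter position is assumed to be stored as a (row, column) offset from the center of the noise field.

```python
import numpy as np
from collections import defaultdict

def reconstruct_subimages(trials, size=128):
    """Average re-centered noise fields within each (stimulus, response) bin."""
    sums = defaultdict(lambda: np.zeros((size, size)))
    counts = defaultdict(int)
    for t in trials:
        # Regenerate the noise field from the stored generator seed (assumed format).
        noise = np.random.default_rng(t["seed"]).standard_normal((size, size))
        # Shift with wraparound so that the presented letter is re-centered.
        dy, dx = t["offset"]
        recentered = np.roll(noise, shift=(-dy, -dx), axis=(0, 1))
        key = (t["stimulus"], t["response"])   # one of four bins in a two-choice task
        sums[key] += recentered
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

# Toy usage with two hypothetical error trials ("o" shown, "x" reported).
toy_trials = [
    {"seed": 1, "offset": (2, 3), "stimulus": "o", "response": "x"},
    {"seed": 2, "offset": (0, 0), "stimulus": "o", "response": "x"},
]
sub_images = reconstruct_subimages(toy_trials, size=8)
```

The key `("o", "x")` in the returned dictionary then plays the role of the error-trial sub-image for a presented “o” that was reported as “x”.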
Relative SNR of classification sub-images
The most practical concern in the signal-clamped classification method is
whether the method would require an unreasonably large number of trials to
make up for the loss in the number of error trials due to the need to use a
relatively strong signal. For our experiments, as it will transpire, 10000 trials
were sufficient to obtain classification images of good quality. We sought to
estimate from our data the minimum number of trials that would be needed
when uncertainty is high. We did so by computing the relative SNR (rSNR;
Murray et al., 2002) as a function of the number of trials; we then compared this
function across different uncertainty levels.
Murray et al. (2002) defined the rSNR of a classification image C as:
\[ \mathrm{rSNR} = \frac{\left(T'^{\mathsf{T}} C\right)^{2}}{\sigma_{C}^{2}} - 1, \qquad \lVert T' \rVert = 1 \tag{2.1} \]
where T′ is an assumed template and σ_C is the pixel-wise standard deviation
of the image C. Murray et al. showed that the discrepancy between T′ and the
observer’s actual template only leads to a reduction in the amplitude of rSNR
by a constant factor relative to the inherent variability of a classification image,
thereby making the measurement less reliable. We modified this approach to
measure only the classification sub-images of the error trials (e.g., CI_OX and
CI_XO for the letter identification experiment) and only the negative template
images in these sub-images.
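For the two-letter identification task, we define rSNR as follows:
\[ \mathrm{rSNR} = \frac{\left(O^{\mathsf{T}}\,\mathrm{CI}_{OX}\right)^{2}}{\left(O^{\mathsf{T}} O\right)\sigma_{OX}^{2}} + \frac{\left(X^{\mathsf{T}}\,\mathrm{CI}_{XO}\right)^{2}}{\left(X^{\mathsf{T}} X\right)\sigma_{XO}^{2}} - 2 \tag{2.2} \]
Here, X and O are the presented letter stimuli. Equation 2.2 is applicable to the
letter detection task by setting X to zero. In essence, Equation 2.2 measures the
SNR of the pixels that overlap the negative O template in the sub-image CI_OX
and the negative X template in the sub-image CI_XO.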
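Equation 2.2 translates directly into code. The sketch below is a minimal Python rendering in which the function name and argument layout are hypothetical; the pixel-wise variances of the two sub-images are passed in as numbers because the way they were estimated is not specified here.

```python
import numpy as np

def rsnr_two_letter(o, x, ci_ox, ci_xo, var_ox, var_xo):
    """Relative SNR of the two error-trial sub-images, following Equation 2.2.

    o, x         : presented letter images used as the assumed templates
    ci_ox, ci_xo : error-trial classification sub-images
    var_ox/xo    : pixel-wise variances of the two sub-images (supplied by the caller)
    """
    o, x = o.ravel(), x.ravel()
    term_o = (o @ ci_ox.ravel()) ** 2 / ((o @ o) * var_ox)
    term_x = (x @ ci_xo.ravel()) ** 2 / ((x @ x) * var_xo)
    return term_o + term_x - 2.0

# Toy check with one-pixel "letters": sub-images that are scaled negatives
# of the templates give (2^2) + (3^2) - 2.
o_toy = np.array([[1.0, 0.0], [0.0, 0.0]])
x_toy = np.array([[0.0, 1.0], [0.0, 0.0]])
value = rsnr_two_letter(o_toy, x_toy, -2.0 * o_toy, -3.0 * x_toy, 1.0, 1.0)
```

Because each projection is squared, the measure is insensitive to the sign of the template image; a negative image of the template contributes as much as a positive one of the same amplitude.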
2.5 Experiment 2.1: Letter identification
Experiment 2.1 was conducted in two different conditions, as was the case
for the simulation study: The first condition (no uncertainty) was intended
to replicate past findings without spatial uncertainty; the second condition
was intended to verify our signal-clamped classification-image method and the
associated theoretical claim that perceptual templates can be uncovered under
conditions of spatial uncertainty.
Two subjects (AO and BB) participated in the no-uncertainty condition. In
this condition, the letters (“o” and “x”) were presented at fixation without any
spatial uncertainty. The task was to indicate which of the two letters (“o” or
“x”) was presented in each trial. The subjects were explicitly told that the letters
were always centered at fixation.
Subject AO, and a third subject, ASN, participated in the high-uncertainty
condition in which the letter stimuli (“x” or “o”) were presented at any one of
1000 different random positions (see General methods, §2.4). The set of random
positions was chosen from within a square of 64×64 stimulus pixels (128 screen
pixels or 2.37° on a side) centered in the stimulus area. The extent of the spatial
uncertainty was not known explicitly to the subjects (except the author ASN).
Results and discussions
Introducing a high degree of uncertainty into the stimulus elevated the contrast
threshold by a factor of 1.74 on average across subjects, although the effect
of uncertainty on contrast threshold is not of interest here. The left column of
Figure 2.2 shows the classification sub-images for both levels of spatial
uncertainty. The results of Experiment 2.1 (§2.5) bear out the theoretical predictions
described earlier, that a clear classification image showing what could be an
observer’s perceptual templates can be obtained under high spatial uncertainty
within a reasonable number of trials (10000 in this case). Consider only the
classification sub-images from the error trials (top right: CI_OX; bottom left:
CI_XO). The most crucial finding is that across uncertainty conditions, there was
little or no difference between the negative components of the error-trial
classification sub-images. This was true both within and between subjects, confirming
the general validity of the signal-clamping approximation (Equations A.9 and
A.15).
There was a subtle difference in the estimated “x” templates from the two
uncertainty conditions. One stroke appeared missing in the high-uncertainty
condition. With the Times New Roman font used in the experiment, the missing
stroke was about one third the width of the other stroke. As a result, the
lowercase “x” is not isotropic in its ability to limit spatial uncertainty. It is less able to
“clamp” the spatial shift of an observer’s internal template along the thicker stroke
than across the thicker stroke. Shift, or spatial uncertainty, along the thicker
stroke blurred the image of the thinner stroke, rendering it invisible for subject
ASN and only partly visible for subject AO. In other words, we do not think
that the observer template for “x” changed as a function of spatial uncertainty;
rather, the difference in the observed templates was a result of imperfect signal
clamping, which is not always avoidable. We will return to this issue when
we consider the perceptual templates in the visual periphery in Experiment 2.3
(§2.7).
Figure 2.2: Classification images for the human observers in the letter identification task
(Experiment 2.1, §2.5): top two rows, no spatial uncertainty (M = 1); bottom two rows, high
spatial uncertainty (M = 1000, d = 64 stimulus pixels); left column, classification images at a
signal contrast corresponding to 75% correct; middle column, the spatial extent of the
uncertainty estimated from the classification images in the left column; the value of d at the minimum
of each curve (marked by the gray arrow) represents the mean spatial extent of the uncertainty;
right column, blurred versions of the classification images from the left column using a
Gaussian kernel with space constant of 1.4 stimulus pixels for visualization purposes only. Image
intensities in each column are identically scaled to facilitate across-condition comparisons.
The most noticeable difference between the classification sub-images
obtained from the two uncertainty conditions is that for the condition without
uncertainty in the stimuli (top two rows), the sub-images from the error
trials showed both a negative and a positive component; for the condition with
a high degree of uncertainty (bottom two rows), only the negative component
was apparent, with the positive component being smeared out due to the spatial
uncertainty. This is predicted by Equations A.18, A.23, and A.24 and is consistent
with our simulation results.
To aid visual inspection, particularly regarding the absence of the positive
components in the high-uncertainty condition, we blurred the classification
sub-images with a Gaussian kernel with a space constant of 1.4 stimulus pixels (right
column of Figure 2.2). In the condition where there was no spatial uncertainty
in the stimulus, the positive component in the error-trial sub-images appeared
considerably weaker and less defined than the negative component. Because
the positive component is susceptible to uncertainty (Equations A.23 and A.24),
it stands to reason that there existed measurable amounts of spatial uncertainty
internal to the observers. Having no uncertainty in the stimuli does not guarantee
the absence of uncertainty intrinsic to an observer.
To estimate the spatial extent of the uncertainty (extrinsic and intrinsic) from
the classification images, we fitted Equations A.23 and A.24 to the two
error-trial sub-images to obtain a numerical estimate of d in terms of stimulus
pixels, using the lowercase stimuli as the presumed templates. As demonstrated
in the simulation, the choice of the presumed templates, which may be different
from the actual observer templates, does not significantly affect the
estimated value of d. The residual landscapes are plotted in the middle column of
Figure 2.2. The standard error of the estimate was determined by bootstrapping
(Efron and Tibshirani, 1993). The results are summarized in Table 2.1. As
expected, the estimated spatial extent (d) of the combined uncertainty (extrinsic
and intrinsic) was significantly higher in the high-uncertainty condition than
in the no-uncertainty condition. Moreover, these values are in reasonable
agreement with the veridical values (1 for the no-uncertainty condition and 64 for the
high-uncertainty condition).
Condition           Subject   d ± SE
No uncertainty      BB        5 ± 1.0
                    AO        9 ± 10.4
High uncertainty    AO        35 ± 4.6
                    ASN       51 ± 8.5
Table 2.1: The estimated extents of spatial uncertainty for conditions in
Experiment 2.1 (§2.5) in units of stimulus pixels.
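A bootstrap standard error of the kind reported for d can be sketched generically. The Python example below assumes that trials are resampled with replacement and that an estimator is re-run on each resample; the function name and the toy estimator (a simple mean) are illustrative stand-ins, since the actual fit of Equations A.23 and A.24 is not reproduced here.

```python
import numpy as np

def bootstrap_se(trials, estimator, n_boot=500, seed=0):
    """Standard error of `estimator` under resampling of trials with replacement."""
    rng = np.random.default_rng(seed)
    trials = np.asarray(trials)
    n = len(trials)
    estimates = [estimator(trials[rng.integers(0, n, size=n)]) for _ in range(n_boot)]
    return float(np.std(estimates, ddof=1))

# Toy usage: the estimator here is just a mean. In the analysis described above it
# would be the fit that returns d from sub-images rebuilt from the resampled trials.
toy = np.random.default_rng(1).standard_normal(400)
se = bootstrap_se(toy, np.mean)
```

For the toy data the result should be near the textbook value of the sample standard deviation divided by the square root of the number of trials.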
Lastly, we sought to determine the minimum number of trials that would
be required to obtain classification sub-images of sufficient quality. Figure 2.3
plots the rSNR (Equation 2.2) of the error-trial classification sub-images for
subject AO as a function of the number of trials for both the no-uncertainty and
high-uncertainty conditions. rSNR increased linearly as a function of the number
of trials. This is expected because the pixel-wise variance of a classification
sub-image decreases in inverse proportion to the number of trials. What is noteworthy is
that the rSNR for the high-uncertainty condition was higher than that for the
no-uncertainty condition, which is opposite to the results of the ideal-observer
simulation (Figure A.1). This remained the case even when we changed
Equation 2.2 to include both the negative and positive template images in the
calculation. We will address the relationship between rSNR and uncertainty in
the General discussions section.
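The linear growth of rSNR with the number of trials can be made explicit with a short derivation based on Equation 2.1. This is a sketch under simplifying assumptions that are introduced here for illustration: the noise in the averaged sub-image is independent across pixels, N is the number of trials averaged, σ_n² is the single-trial pixel variance, μ is the expected sub-image, and σ_C² is treated as known.

```latex
\begin{align*}
C &= \mu + \varepsilon, \qquad
  \varepsilon_i \sim \mathcal{N}\!\left(0, \sigma_C^2\right), \qquad
  \sigma_C^2 = \frac{\sigma_n^2}{N}, \\
\mathbb{E}\!\left[\left(T'^{\mathsf{T}} C\right)^2\right]
  &= \left(T'^{\mathsf{T}} \mu\right)^2 + \sigma_C^2
  \qquad (\text{since } \lVert T' \rVert = 1), \\
\mathbb{E}\!\left[\mathrm{rSNR}\right]
  &= \frac{\left(T'^{\mathsf{T}} \mu\right)^2 + \sigma_C^2}{\sigma_C^2} - 1
   = \frac{\left(T'^{\mathsf{T}} \mu\right)^2}{\sigma_n^2}\, N .
\end{align*}
```

Under these assumptions the subtraction of 1 removes the contribution of noise alone, so the expected rSNR is zero when the sub-image contains no template image and grows in proportion to N otherwise.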
Subjectively speaking, with 10000 trials, both of the error-trial sub-images
for the no-uncertainty condition were of sufficient quality. If we use this as a
standard, then we only need about 8000 trials in the high-uncertainty condition
to reach the same level of rSNR. We note with interest that although uncertainty
leads to an increase in threshold, the increase in threshold, in turn, keeps in
check the number of trials required for a signal-clamped classification-image
experiment.
Figure 2.3: rSNR versus number of trials for subject AO, who participated in both conditions
of Experiment 2.1 (§2.5). The gray arrow marks the approximate number of trials that would be
needed in the high-uncertainty condition to achieve the same classification-image quality as the
no-uncertainty condition. Error bars are bootstrap standard errors of the mean.
2.6 Experiment 2.2: Letter detection
The ideal-observer simulations described earlier (see Figure A.2) predict that
the extent of the spatial uncertainty (d as opposed to M) can be estimated from
the classification images. This prediction was tested in Experiment 2.2. The
task was to detect a lowercase letter “o” in noise. In each trial, the target was
either presented at any one of 1000 different random positions (see General
methods, §2.4) or not presented, with equal probability. Subjects were asked to
indicate whether the letter was presented or not. In the medium-uncertainty
condition, the total set of random positions was chosen from within a central
square of 32×32 stimulus pixels (1.18°). The extent of the spatial uncertainty
was indicated to the subjects by means of a white rectangular bounding box
that was displayed during the fixation period immediately prior to the stimulus
onset. Subjects JH and MJ participated in this condition.
In the high-uncertainty condition, the 1000 different random positions were
chosen from a central square of 64×64 stimulus pixels (2.37°), and the extent
of this uncertainty range was not explicitly indicated to the subjects. In all
other respects, the high-uncertainty condition was identical to the
medium-uncertainty condition. Two subjects, JH (who also participated in the
medium-uncertainty condition) and BB, participated in this condition.
Results and discussions
The resulting classification images and the estimation of the spatial extent of
the uncertainty are shown in Figure 2.4. As predicted by Equation A.25 and
consistent with the simulation result, a clear negative signal was visible in CI_miss
(the sub-image from the miss trials) in the high-uncertainty condition. Also, as
predicted, there was no clear image of the target in CI_FA (the sub-image from the
false-alarm trials). The positive haze in CI_FA is not as pronounced as that in the
simulation, probably due to the presence of internal noise and intrinsic spatial
uncertainty. The presence of a significant amount of intrinsic uncertainty in the
observers may also explain the absence of any blurring of the negative template
image in CI_miss in the medium-uncertainty condition, a blurring that was observed in
the simulation.
The positive haze in CI_FA is more visible for the medium-uncertainty condition
if we blur the sub-images (using a Gaussian kernel with a space constant
of 14.1 stimulus pixels, right column of Figure 2.4). Such a positive haze around
the center of the image appears to be absent from CI_FA in the high-uncertainty
condition.
Figure 2.4: Classification images for the human observers in the letter detection task
(Experiment 2.2, §2.6): top two rows, medium spatial uncertainty; bottom two rows, high spatial
uncertainty; left column, classification images at a signal contrast corresponding to 75% correct;
middle column, estimation of spatial extent of the uncertainty from the classification images in the
left column; the estimated value of d with the minimum residual error is marked by the gray
arrows; right column, blurred versions of the classification images in the left column using a
Gaussian kernel of space constant equal to 14.1 stimulus pixels to visualize the positive haze in
the false-alarm trials.
The quantitative results for the estimation of spatial extent are depicted as
plots of residual versus d (middle column of Figure 2.4) and summarized in
Table 2.2. These results were obtained by fitting Equation A.25 to the
classification sub-images, using the target letter “o” as the presumed observer
template. The spatial extent of the uncertainty (d) was significantly higher in the
high-uncertainty condition as compared with the medium-uncertainty
condition, both within and between subjects. The standard errors were estimated
with bootstrap.
Condition            Subject   d ± SE
Medium uncertainty   MJ        21 ± 22
                     JH        31 ± 13
High uncertainty     JH        127 ± 45.3
                     BB        65 ± 28
Table 2.2: The estimated extents of spatial uncertainty for conditions in
Experiment 2.2 (§2.6) in units of stimulus pixels.
For the subject who participated in both of the uncertainty conditions (JH),
we plotted rSNR versus number of trials in Figure 2.5. Consistent with the result
of Experiment 2.1 (§2.5), we found that the rSNR was higher for the condition
with a large extent in spatial uncertainty (and a higher detection threshold).
However, unlike Experiment 2.1, this result is consistent with the result of the
corresponding ideal-observer model (Figure A.2), which also exhibited a higher
rSNR in the high-uncertainty condition.
2.7 Experiment 2.3: Letter identification in the
periphery
We explicitly manipulated spatial uncertainty in Experiments 2.1 and 2.2 to test
if the various properties of signal clamping derived from analysis and
simulation were empirically relevant. The results from the two preceding experiments
suggest that these properties are indeed valid. In Experiment 2.3, we used the
method of signal-clamped classification image to estimate the letter templates
in the visual periphery and to determine the level of intrinsic spatial uncertainty
in the periphery (10° in the inferior field). It has been suggested that one
reason for impoverished form vision in the periphery is a high
degree of intrinsic spatial uncertainty. The
intrinsic spatial uncertainty may be due to under-sampling of the visual space
(Levi and Klein, 1996; Levi et al., 1987) or to an uncalibrated disarray in
spatial sampling (Hess and Field, 1993; Hess and McCarthy, 1994). The theory of
uncalibrated disarray would predict a distorted perceptual template, whereas
that of under-sampling would not. These predictions are contingent on the
possibility of recovering the observer’s template despite the high intrinsic spatial
uncertainty in the periphery.
Figure 2.5: Plot of rSNR versus number of trials for subject JH, who participated in both
conditions of Experiment 2.2 (§2.6). The gray arrow marks the approximate number of trials
needed in the high-uncertainty condition to achieve the same classification-image quality as the
medium-uncertainty condition.
Prior to the main experiment, an acuity measurement was first performed
on each subject (see General methods for details, §2.4). In the main experiment,
letter stimuli (lowercase “x” and “o”) were presented at a fixed retinal
eccentricity of 10°. There was no stimulus-level spatial uncertainty, and the letter was
always presented at the center of the noise field. The subjects were apprised of
this fact before the commencement of data collection. Subjects maintained
fixation at a green LED and were asked to identify which letter was presented in
each trial.
The experiment was conducted on two subjects who had previously
participated in one of the earlier experiments. Subject ASN (who had participated in
Experiment 2.1 in the high-uncertainty condition) had a peripheral acuity
measurement of 0.42° in x-height. A letter of 50 pt Times New Roman (x-height
= 0.85° in visual angle) was used for ASN. Subject BB (who had participated
in Experiment 2.1 in the no-uncertainty condition) had an acuity of 0.57° in
x-height. A letter size of 66 pt (1.15° in x-height) was used for BB.
Results and discussions
The classification images for the two subjects are shown in the left column of
Figure 2.6. Qualitatively, the classification images in the periphery are very
similar to those in the fovea with high stimulus-level spatial uncertainty (Rows
3 and 4 of Figure 2.2) and differ noticeably from the fovea results without
stimulus-level uncertainty (Rows 1 and 2 of Figure 2.2). The recovered
templates are not distorted in shape and are almost identical to those obtained in the
foveal conditions. As was the case for the fovea condition with high extrinsic
uncertainty, the observers’ “x” templates obtained in the periphery, without any
extrinsic uncertainty, appear to involve only one stroke. We have attributed this
effect to the possibility that spatial uncertainty was not equally reduced in all
directions with the Times New Roman “x” stimulus because the two strokes of
“x” differ in width by a factor of 3.
Condition                                Subject   d ± SE
Fovea, no stimulus uncertainty           AO        9 ± 10.4
(Experiment 2.1)                         BB        5 ± 1.0
Periphery, no stimulus uncertainty       BB        67 ± 31
                                         ASN       29 ± 9.4
Fovea, high stimulus uncertainty         ASN       51 ± 8.5
(Experiment 2.1)                         AO        35 ± 4.6
Table 2.3: The estimated extents of spatial uncertainty for the periphery
condition in Experiment 2.3 (§2.7) in units of stimulus pixels, as compared with those
for the fovea conditions in Experiment 2.1 (§2.5).
A very weak or nonexistent positive image in the error-trial sub-images
implies that there was a significant amount of intrinsic spatial uncertainty in
the periphery. Unlike the fovea experiment (Experiment 2.1), the uncertainty in
this experiment was entirely intrinsic to the observers. We estimated the spatial
extent of this intrinsic uncertainty using Equations A.23 and A.24. The
residual functions for the estimation are plotted in the middle column of Figure 2.6,
and the estimated spatial extents, in units of stimulus pixels, are summarized in
Table 2.3. Table 2.3 also restates the results from the fovea experiment
(Experiment 2.1) for comparison. Comparing the fovea (from Experiment 2.1) and
periphery results obtained without any spatial uncertainty in the stimuli, it is
clear and not surprising that intrinsic spatial uncertainty in the visual periphery
is much higher than that in the fovea. Averaging across the two subjects (BB
and ASN), the intrinsic spatial uncertainty as measured with an isolated letter
target in noise at 10° eccentricity was 48 pixels or 1.78°, compared with 0.25° in
the fovea. Figure 2.7 plots the estimated extent of spatial uncertainty in units of
visual angle for the fovea and the periphery conditions across subjects.
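The averages quoted above follow from the per-subject estimates in Table 2.3 and a pixel-to-degree conversion. The Python sketch below assumes a conversion factor derived from the 2.37° reported for 64 stimulus pixels; the function name is hypothetical.

```python
# Assumed conversion, from the reported size (2.37 deg) of the 64-stimulus-pixel square.
DEG_PER_STIM_PX = 2.37 / 64

def mean_extent_deg(d_values_px):
    """Average the per-subject estimates of d and convert to degrees of visual angle."""
    return sum(d_values_px) / len(d_values_px) * DEG_PER_STIM_PX

periphery_deg = mean_extent_deg([67, 29])   # BB and ASN, periphery rows of Table 2.3
fovea_deg = mean_extent_deg([9, 5])         # AO and BB, fovea rows without uncertainty
```

With this assumed conversion the periphery average of 48 pixels comes out at roughly 1.78°, and the fovea average of 7 pixels at about 0.26°, close to the 0.25° quoted in the text.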
Figure 2.6: Classification images for the human observers performing a letter identification
task in the periphery (Experiment 2.3) with no stimulus-level (extrinsic) spatial uncertainty: left
column, classification images at a letter contrast sufficient to obtain 75% correct; middle column,
estimation of the spatial extent of the intrinsic uncertainty from the classification images in the
left column; the value of d with the minimum error is marked by the gray arrows; right column,
blurred versions of the classification images in the left column using a Gaussian kernel of space
constant equal to 1.4 stimulus pixels for visualization.
As to the debate of whether the primary source of spatial uncertainty in the
periphery is uncalibrated disarray (Hess and Field, 1993; Hess and McCarthy,
1994) or (calibrated) under-sampling (Levi and Klein, 1996; Levi et al., 1987),
our results side with the latter. This is because the negative templates for the
letter “o” (and for the visible stroke of the letter “x”) are sharp and undistorted,
despite the sizable amount of intrinsic spatial uncertainty revealed by the lack
of positive template images and the estimated value of d.
Figure 2.7: The spatial extent of uncertainty, d, in degrees of visual angle, for the subjects who
participated in the letter identification task in the periphery (10° inferior field, Experiment 2.3)
without stimulus-level spatial uncertainty (green). For comparison, the results for the same task
in the fovea with (blue) and without (red) stimulus-level spatial uncertainty are also shown.
2.8 Generaldiscussions
We showed that by presenting a signal of sufficient contrast in noise, we could
uncover the linear kernel (template) of a shift-invariant mechanism, using an
otherwise conventional classification-image method (or reverse correlation). In
the context of a well-established uncertainty model (cf. Pelli, 1985), spatial
uncertainty or shift invariance can be modeled with a set of linear front-end
channels of identical kernels at different spatial positions. The responses from
these channels are pooled by a max operator. A signal of sufficient strength can
positively bias one of these channels, making it the most likely one to drive the
system’s response. Noise samples that are negatively correlated with the kernel
of the selected channel will suppress its response, occasionally leading to
an error. Hence, by averaging the noise samples from the error trials associated
with a particular signal, we can obtain a negative image of the linear kernel of
the channel that normally responded to this signal. We demonstrated the validity
of this theory with simulations and in three human experiments. We also
showed how the spatial extent of the uncertainty could be estimated from the
classification images.
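
The max-pooled uncertainty model summarized above can be illustrated with a minimal simulation sketch. It is not the simulation reported in this chapter: it assumes a one-dimensional stimulus, a yes/no detection task with the signal present on every trial, orthogonal channels, and illustrative values for contrast and criterion. The function name and all parameters are hypothetical. The noise averaged over miss trials should project strongly and negatively onto the channel clamped by the signal and only weakly onto the other channels.

```python
import numpy as np

def error_trial_image(n_trials=20000, size=32, n_channels=8, step=4,
                      contrast=3.0, criterion=2.0, seed=0):
    """Average the noise over miss trials of a max-pooled detector (toy model)."""
    rng = np.random.default_rng(seed)

    # Hypothetical template: a short bar, normalised to unit energy.
    template = np.zeros(size)
    template[0:3] = 1.0
    template /= np.linalg.norm(template)

    # Identical kernels at different positions (non-overlapping, hence orthogonal).
    channels = np.stack([np.roll(template, step * k) for k in range(n_channels)])

    # The signal always matches channel 0; white Gaussian noise is added.
    noise = rng.standard_normal((n_trials, size))
    stimulus = contrast * channels[0] + noise

    # Linear front end followed by a max operator and a fixed criterion.
    responses = stimulus @ channels.T
    said_yes = responses.max(axis=1) > criterion

    # Misses are the error trials; average their noise fields.
    return noise[~said_yes].mean(axis=0), channels

ci, channels = error_trial_image()
projections = channels @ ci   # projection of the error-trial image on each kernel
print(np.round(projections, 2))
```

In this sketch the first entry of `projections` is expected to be clearly negative while the remaining entries stay close to zero, which is the negative image of the clamped channel described in the text.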
The key to this method is to present the signal at a sufficient strength such
that one particular channel often generates the highest response. In the simulations,
we showed that the resulting classification images revealed the observer’s
internal template and not the presented signal.
Another important departure from the conventional classification-image
method is that we do not combine the classification sub-images. Keeping the
sub-images separate allows us to preserve the blurry positive template images
such that we can numerically estimate the spatial extent of the uncertainty.
Although this chapter has focused on spatial uncertainty (or shift invariance),
the signal-clamped classification-image method can be generalized to
other types of uncertainty. This is because the signal-clamping approximation
(Equation A.9) depends solely on the validity of the uncertainty model, which
is not specific to spatial uncertainty.
The following discussions consider the potentials and limitations of how
this signal-clamped classification-image method may in general be applied to
uncover internal representations.
Feature specificity and invariance
The spatial structure of the receptive field of a neuron in a higher cortical area
(e.g., V4, IT) is hard to characterize because the cell’s responses are both specific
and invariant. Specificity means that the cell may respond to a face but not to
the lower half of a face. Invariance means that the cell may respond equally well
to either a frontal view of a face or a quarter view, although these two images
are very different.
Increases in both specificity and invariance are the hallmarks of visual pro-
cessing. However, both are forms of nonlinearities that render the conven-
tional reverse correlation or classification-image method inapplicable. Speci-
ficity implies that it is statistically unlikely to come across a noise pattern that
happens to activate a mechanism (a neuron) because a partially composed target
may not elicit any response. Invariance causes distinct image patterns to be
sorted into the same response bin. Averaging such patterns often results in a
blur and a pattern to which the mechanism does not respond at all.
We have shown that in the case of shift invariance, the problem associated
with invariance can be resolved by signal clamping. This method can be generalized
to other types of invariance or uncertainty because the uncertainty model
(and as a consequence, the signal-clamping approximation of Equation A.9) is
not specific to spatial uncertainty. If a fixed signal is used to probe a mechanism,
it will remain the case that the signal will bias precisely one channel from among
the many to respond. Noise patterns with pixels that are negatively correlated
with this channel will likely lead to an error in the response. Hence, a classification
sub-image obtained from the error trials will contain a clear negative
image of the template of that one channel of the invariant mechanism. Furthermore,
the absence of any clear positive image in an error-trial sub-image will
indicate that the mechanism is indeed invariant to some aspect of the stimulus,
although the precise nature of the invariance is not known. However, unlike
shift invariance, there is no general method to normalize the equivalent templates
of an arbitrary invariant mechanism. The template images revealed in a
signal-clamped classification-image experiment correspond only to those channels
that responded to the presented signals. For example, if the side view of a
face was used as the signal to probe a face-selective neuron, then only the template
responding to the side view of the face will be revealed by the experiment,
although the mechanism may respond equally to all views of a face (i.e., the
mechanism is viewpoint invariant).
It may come as a surprise that signal clamping can also be useful to overcome
the difficulties associated with feature specificity of a high-level mechanism.
Recall that feature specificity means that a mechanism is highly nonlinear
such that a partial signal often leads to no response. The mechanism requires
a conjunction of features to be present before it generates a response. Random
noise patterns are therefore unlikely to elicit any response. Signal clamping gets
around this problem by using the noise to disrupt, as opposed to activate, a
mechanism. For example, if a mechanism is tuned to the conjunction of two
features (a AND b), and such a mechanism is activated by a stimulus, then a
noise sample that masks either feature “a” or feature “b” will lead to an error
response, and averaging such noise samples will reveal both features “a” and
“b”.
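
The (a AND b) example can be made concrete with a small simulation sketch. It assumes a hypothetical one-dimensional mechanism that fires only when two non-overlapping features both exceed a threshold; the feature shapes, contrast, and threshold are illustrative choices, not values from the experiments. Averaging the noise over the trials in which the mechanism failed to fire should leave a negative imprint on both features.

```python
import numpy as np

rng = np.random.default_rng(1)
size = 20

# Two hypothetical, non-overlapping features, each with unit energy.
feat_a = np.zeros(size)
feat_a[2:6] = 0.5
feat_b = np.zeros(size)
feat_b[12:16] = 0.5

contrast, threshold, n_trials = 3.0, 2.0, 20000

# The clamping signal contains both features; white Gaussian noise is added.
noise = rng.standard_normal((n_trials, size))
stimulus = contrast * (feat_a + feat_b) + noise

# Conjunction: the mechanism fires only if BOTH features exceed the threshold.
fires = (stimulus @ feat_a > threshold) & (stimulus @ feat_b > threshold)

# Error trials are those in which noise masked feature a OR feature b.
error_image = noise[~fires].mean(axis=0)

print("projection on a:", round(float(error_image @ feat_a), 2))
print("projection on b:", round(float(error_image @ feat_b), 2))
```

Both projections are expected to be negative, while pixels outside the two features average to roughly zero, so the error-trial image shows the whole conjunction even though any single error may have been caused by masking only one feature.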
It is often possible to find a pseudo-minimal stimulus that sufficiently activates
a mechanism. A classic example is the reduction method that Saleem et al.
(1993) used to investigate shape tuning of IT neurons. The reduction method,
which starts with a stimulus that the neuron is known to respond to and successively
reduces the features and complexity of the stimulus until the cell’s
response drops significantly, is an effective way of obtaining a seemingly minimal
stimulus that the neuron is tuned to. However, the process of reduction is
subjective, and a choice of reduction made several steps ago may lead to an end
pattern that is neither minimal nor optimal, and there is no way to know which
way it is.
We observed that such a pseudo-minimal or suboptimal stimulus could be
used as the signal in a signal-clamped classification-image experiment to select
a channel of an invariant mechanism. If this initial signal contains a part that is
superfluous, the noise component that happens to mask that part will not have
any effect on the mechanism’s response. If a noise component masks a part of
the signal that is crucial to the mechanism, then the mechanism’s response will
be suppressed, leading to an error (miss). This is particularly true if the mechanism
has a high feature specificity and does not respond to a partial target.
Critically, the noise patterns that masked the different crucial parts of the signal
during different trials can be “ORed” together by averaging [NOT(a AND
b) = (NOT(a) OR NOT(b))], revealing the complete signal that the mechanism is
tuned to as a negative image in the classification sub-image from the error trials.
Hence, regarding both invariance and specificity, the “trick” is to present a
signal that can effectively elicit a response from the mechanism of interest and
collect the noise patterns that suppress the response. In some sense, what we
propose here is the opposite of spike-triggered averaging. Rather than adding
up the noise patterns that led to a spike, we propose to add up the noise patterns
that suppressed a spike.
Detecting invariance in a mechanism
Consider the letter identification experiment in the periphery (Experiment 2.3,
§2.7). Had we summed the classification sub-images as is conventionally done,
we would have obtained a dual-template image similar to what was obtained in
the fovea condition without spatial uncertainty (assuming good signal clamping).
There would not be any indication from the classification image alone
that the periphery had a high degree of intrinsic spatial uncertainty. However,
by examining the individual sub-images separately, particularly those from
the error trials, it is very clear that the fovea and the periphery results differ
qualitatively: the positive component is largely absent from the error-trial
sub-images in the periphery condition.
The signal-clamped classification-image method provides a qualitative
means to detect the presence of intrinsic uncertainty in a mechanism, as well
as a quantitative method to estimate the uncertainty. In principle, the method is
generally applicable to all types of intrinsic uncertainty and is not restricted to
spatial uncertainty. In short, when the signal-clamping approximation (Equation
A.9) is valid, the negative component in the error-trial sub-images will correspond
to the template of a single channel in a possibly invariant mechanism
that responded to the presented signal, and the positive component will correspond
to the average of all the templates for all the equivalent signals associated
with the erroneous response of the mechanism (Equation A.16). For a two-way
discrimination task (e.g., our “o” vs. “x” task), the negative component from
one type of error trials (say, XO: signal was “x”, response was “o”) can be
compared to the positive component from the complementary error trials (OX).
The discrepancy between the two, aside from a sign difference, is indicative of
intrinsic uncertainty. This line of reasoning is similar to those in previous works
on detecting observer nonlinearity based on the differences between classification
sub-images (Abbey and Eckstein, 2002; Barth et al., 1999; Solomon, 2002).
Whether the discrepancy between the negative and positive components is
easily discernible depends on the type of uncertainty and the stimuli used to
probe it. For example, consider a mechanism that has an uncertainty in the size
but not in the position of a signal. If we tested this system with the “x” versus
“o” task, then the positive component from the OX trials would be an average
of x’s of all sizes, centered on one another. The result would still resemble
an “x” but with a bright and well-defined center and graded strokes extending
outward. That is, the average of x’s of all sizes may not be sufficiently different
from a single, medium-sized “x”. In contrast, the average of o’s of all sizes,
which would look like a haze, will be quite different from an “o” of any particular
size. Thus, it would be easier to detect the presence of size uncertainty
with an “o” rather than with an “x” as a signal. We note with interest that the
classification sub-images from the letter identification experiment in the fovea
with no extrinsic spatial uncertainty (Figure 2.2, Rows 1 and 2) appear to show
this kind of size uncertainty.
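
The size-uncertainty argument can be checked numerically with a small sketch. The letter renderings below are crude hypothetical stand-ins (one-pixel-wide diagonals for “x”, one-pixel-wide rings for “o”), the range of sizes is arbitrary, and cosine similarity is used only as a convenient measure of resemblance; none of these choices come from the experiments.

```python
import numpy as np

def draw_x(half, n=25):
    """Binary 'x': both diagonals out to `half` pixels from the centre."""
    c = n // 2
    i, j = np.indices((n, n))
    di, dj = np.abs(i - c), np.abs(j - c)
    return ((di == dj) & (di <= half)).astype(float)

def draw_o(radius, n=25):
    """Binary 'o': a one-pixel-wide ring of the given radius."""
    c = n // 2
    i, j = np.indices((n, n))
    r = np.hypot(i - c, j - c)
    return (np.rint(r) == radius).astype(float)

def cosine(u, v):
    """Cosine similarity between two images."""
    return float((u * v).sum() / (np.linalg.norm(u) * np.linalg.norm(v)))

sizes = range(2, 11)                      # assumed range of size uncertainty
mean_x = np.mean([draw_x(s) for s in sizes], axis=0)
mean_o = np.mean([draw_o(s) for s in sizes], axis=0)

# Compare each size-averaged letter with a single medium-sized exemplar.
sim_x = cosine(mean_x, draw_x(6))
sim_o = cosine(mean_o, draw_o(6))
print("x:", round(sim_x, 2), "o:", round(sim_o, 2))
```

Under these assumptions the averaged “x” remains highly similar to a medium-sized “x”, whereas the averaged “o” is a diffuse disc that resembles no single “o”, in line with the reasoning above.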
Task requirements and invariance
A mechanism of sufficient flexibility (e.g., a human observer) may adjust its
degree of invariance to suit the task. For example, when there is no positional
uncertainty in the stimuli, it would be suboptimal to use a mechanism that has
a high degree of positional invariance. The form vision mechanism in the fovea,
for example, seems to be capable of limiting its degree of positional invariance,
and hence the amount of intrinsic spatial uncertainty, when the target position
is precisely known (Experiment 2.1, no-uncertainty condition). In contrast, the
form-vision mechanism in the periphery appears to be unable to make the same
adjustment (Experiment 2.3).
Likewise, a flexible mechanism must increase its degree of invariance along
the relevant stimulus dimension when the task requires it to do so. The letter
detection experiment (Experiment 2.2) showed that the foveal mechanism
appears to make the appropriate adjustment when the extent of the spatial
uncertainty of the stimulus changed across conditions.
A mechanism that flexibly adapts to the task, such as a human observer,
poses problems for the signal-clamping method. Given that relatively strong signals
must be used in an experiment, the presence of these signals can influence
how the mechanism will otherwise perform the task. For example, a mechanism
that normally has a high degree of positional invariance may limit its processing
to a particular region on the display if the signal is always presented at the same
location. In the case of spatial uncertainty, we may assume that the mechanism
is shift invariant, which allowed us to present a single signal at different positions
on the display, and then shift the noise patterns to normalize the position
of the presented signal before averaging. However, as we noted earlier, there
does not exist a general method for normalizing other types of uncertainty or
invariance. Without such methods, an impractically large number of trials may
be needed to obtain the classification images and to maintain a task requirement
of high invariance.
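
The position normalization mentioned above (shifting each noise pattern so that the presented signal lands at a common location before averaging) can be sketched as follows. The function name is hypothetical, and the use of a circular shift is an assumption made for brevity; an actual analysis might crop or pad the noise fields instead of wrapping them around.

```python
import numpy as np

def align_to_centre(noise_fields, rows, cols):
    """Shift each noise field so the signal position maps to the array centre.

    noise_fields: (n_trials, h, w) noise patterns
    rows, cols:   per-trial position at which the signal was presented
    """
    n, h, w = noise_fields.shape
    aligned = np.empty_like(noise_fields)
    for t in range(n):
        # Circular shift (assumption): pixels pushed off one edge wrap around.
        shift = (h // 2 - int(rows[t]), w // 2 - int(cols[t]))
        aligned[t] = np.roll(noise_fields[t], shift=shift, axis=(0, 1))
    return aligned

# Toy check: a single marker pixel at each trial's signal position.
fields = np.zeros((3, 9, 9))
rows = np.array([1, 4, 7])
cols = np.array([2, 4, 8])
fields[np.arange(3), rows, cols] = 1.0
aligned = align_to_centre(fields, rows, cols)
print(aligned[:, 4, 4])   # every marker now sits at the centre pixel
```

After alignment, noise fields from trials with different signal positions can be averaged within each stimulus-response class as if the signal had always appeared at the centre.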
The components of a representation
Consider the “o” versus “x” experiment. What if a mechanism for this task
represents “x” as “either a left slash (\) or a right slash (/)”? The psychometric
function (d′ vs. signal contrast) of such a mechanism is nonlinear, as opposed to
the linear psychometric function of a mechanism that represents “x” as a single
template. Unfortunately, linearity of a psychometric function is non-diagnostic
in practice because other factors, such as other types of uncertainty, can also
lead to a nonlinear psychometric function. In fact, the psychometric function of
a human observer is rarely linear for just about any task tested.
For the two-component mechanism, the classification image for the “x” template
will look exactly like “x”. There will be no indication of the two distinct
components in the representation. Signal clamping will not help to resolve this
problem without any a priori assumptions about the possible components. In
fact, this is a general problem in all classification-image methods that involve
averaging noise samples across trials. For each type of trial, although the signal
and the response were the same, the cause of the response might vary from trial
to trial. Averaging assumes that the mechanism does not distinguish the higher
order structures within a trial from those across trials, which is clearly incorrect.
Accumulating higher order statistics across trials seems necessary. Techniques
that involve obtaining the covariance (Steveninck and Bialek, 1988) in
addition to the mean appear promising, but their applications remain restricted
to relatively simple systems or stimuli (e.g., the spatiotemporal receptive field
of macaque V1 neurons, Rust et al., 2005; complex cells in cats, Touryan et al.,
2002; bar detection by human observers, Neri and Heeger, 2002). The major
challenges to these high-order techniques include the determination of what
high-order statistics to collect and whether the number of trials required to
obtain such statistics is practical.
To decipher a complex mechanism, intuitions about the underlying representation
are not just helpful but essential. Returning to our toy example, if we
have an a priori reason to suspect that the “x” might be represented by a disjunction
of two slashes (i.e., “x” = “/” or “\”), we may test this hypothesis with
signal clamping by presenting randomly either the left or the right slash in a
trial when “x” is supposed to be the target. Assuming that adequate measures
have been taken to ensure that the nature of the task is not changed by the presented
partial signal, we can then left-right reverse the noise patterns from the
error trials when “/” was presented and average them with the noise patterns
from the error trials when “\” was presented. If the resulting negative image is
not that of a single slash, then the hypothesis of “x” being represented as either
the left or the right slash can be rejected.
This toy example highlights a general property of the signal-clamping method:
it is as much a hypothesis-driven method as it is the kind of hypothesis-free
exploration tool that the conventional classification-image method is.
Other methods for measuring uncertainty
Signal-clamped classification images provide one method of estimating the
amount of intrinsic uncertainty (Equations A.23, A.24, and A.25) that is considerably
different from the traditional approach. The traditional method for
quantifying intrinsic uncertainty is to estimate M, the number of orthogonal
channels possessed by the observer, by measuring the extent to which the psychometric
function (d′ vs. signal contrast) of the observer deviates from linearity
or, equivalently, its log–log slope deviates from unity (e.g., Foley and Legge,
1981; Green, 1964; Nachmias and Sansbury, 1974; Stromeyer and Klein,
1974; Tanner and Swets, 1954). Pelli (1985) used a Weibull approximation to
the psychometric function and established via numerical simulations the relationship
between the parameters of the Weibull function and M. Later work
(e.g., Eckstein et al., 1997; Tyler and Chen, 2000; Verghese and McKee, 2002)
departed from the Weibull approximation and/or derived analytically the relationship
between M and the parameters of a psychometric function. All these
approaches assumed the Max-rule model of uncertainty (observer’s response
is determined by the maximally responding channel) and that the M channels
are orthogonal. Most critically, these approaches treat uncertainty in a generic
sense and make no distinction regarding the feature dimension of the uncertainty.
For example, uncertainty about a signal’s position is not distinguished
in these formulations from uncertainty about its orientation. All types of uncertainty
are characterized in terms of M, the equivalent number of orthogonal
channels that the observer possesses.
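
The link between M and the log–log slope of the psychometric function can be illustrated with a Monte Carlo sketch of the Max-rule model. The sketch assumes a two-interval forced-choice detection task, M orthogonal channels with unit-variance Gaussian noise, and the conversion d′ = √2 · z(proportion correct); the function names, contrast values, and trial counts are illustrative and are not taken from the cited studies.

```python
import numpy as np
from statistics import NormalDist

def dprime_2ifc(contrast, n_channels, n_trials=40000, seed=0):
    """d' of a Max-rule observer with n_channels orthogonal channels (2IFC)."""
    rng = np.random.default_rng(seed)
    signal_interval = rng.standard_normal((n_trials, n_channels))
    signal_interval[:, 0] += contrast          # only one channel carries the signal
    blank_interval = rng.standard_normal((n_trials, n_channels))
    # The observer picks the interval containing the largest channel response.
    correct = signal_interval.max(axis=1) > blank_interval.max(axis=1)
    pc = float(correct.mean())
    return np.sqrt(2.0) * NormalDist().inv_cdf(pc)

def loglog_slope(n_channels, c_lo=1.5, c_hi=3.0):
    """Slope of log d' against log contrast between two contrast levels."""
    d_lo = dprime_2ifc(c_lo, n_channels, seed=1)
    d_hi = dprime_2ifc(c_hi, n_channels, seed=2)
    return float(np.log(d_hi / d_lo) / np.log(c_hi / c_lo))

slope_1 = loglog_slope(1)        # no uncertainty
slope_100 = loglog_slope(100)    # M = 100
print(round(slope_1, 2), round(slope_100, 2))
```

With a single channel the slope stays near unity, whereas with M = 100 it is noticeably steeper. The traditional approach described above inverts this relationship to infer M from the measured slope.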
An alternative approach is to use an image-based decision model and mea-
sure the intrinsic uncertainty of an observer by matching the model’s per-
formance to that of the observer by varying the amount of uncertainty in
the model. With this method, uncertainty must be introduced along one or
more specific dimensions of the stimuli. For example, Tjan and Legge (1998)
studied the effect of viewpoint uncertainty on 3-D object-recognition tasks,
whereas Manjeshwar and Wilson (2001) measured positional uncertainty in a
line-detection task. Both studies assumed a sum-of-likelihood decision model.
These image-based methods can characterize uncertainty in units specific to the
feature dimension of the uncertainty (e.g., visual angle for positional uncertainty
or angle of rotation for viewpoint). Moreover, these methods do not
require the templates considered by the observer to be orthogonal.
Our method of using signal-clamped classification images to estimate intrinsic
uncertainty is similar to the image-based approach except that it is less model
specific. The method works as long as signal clamping is reasonably effective
(i.e., Equation A.9 is a reasonable approximation). Our method measures
intrinsic uncertainty in terms of the spread of the noise patterns along a feature
dimension of interest (e.g., spatial positions of the perceived signal) that led to
false alarms. Such noise patterns cannot be clamped or normalized by the signal
and can therefore be separated from the noise patterns that led to misses,
which are clamped by the signal. Unlike the image-based methods used in earlier
studies, our method does not require defining a specific image-based model
of the observer.
rSNR and spatial uncertainty
We noted with interest that the quality of the signal-clamped classification
images from human observers, when measured in terms of rSNR (Equation 2.2),
increased as spatial uncertainty in the stimuli increased. In contrast, we
found with ideal-observer simulations that the relationship between uncertainty
(intrinsic or extrinsic) and rSNR was actually quite complex. Consider
the letter detection task (Figure A.2). When spatial uncertainty increased from a
spatial extent of 32×32 stimulus pixels (M = 250) to 64×64 pixels (M = 1000),
the rSNR of the model’s classification images increased from 627 to 751, whereas
the model’s log threshold contrast increased from −1.29 to −1.14 (a factor of 1.4
in contrast). When there was no uncertainty, the model rSNR was 2020 (not
shown in Figure A.2) at a log threshold contrast of −1.48. This U-shaped function
of rSNR in terms of uncertainty was also evident for the letter discrimination
task (Figure A.1). The rSNRs of the model’s classification images were
1180, 856, and 973 for spatial extents of 1×1, 32×32 (M = 250, not shown in
Figure A.1), and 64×64, respectively.
We do not fully understand why rSNR versus spatial extent is a U-shaped
function because we do not yet have a closed-form expression relating rSNR to
uncertainty. We suspect that the U-shaped function was a result of the interplay
between uncertainty and contrast threshold. Given that the signal-clamping
approximation (Equation A.9) is never perfect, we expect rSNR to decrease as
uncertainty increases. However, as uncertainty increases, so does the threshold
contrast. Because masking a signal of higher contrast requires larger instantaneous
amplitude in the noise, the negative correlation between the presented
signal and the noise pixels in the error-trial classification sub-images must therefore
be stronger, resulting in a higher rSNR. In short, an increase in uncertainty
can cause either a decrease or an increase in rSNR, depending on the amount of
uncertainty and the amount of threshold elevation caused by the uncertainty.
In the experiments reported here, human rSNR always increased with extrinsic
spatial uncertainty, which ranged from none to 64×64 pixels (at M = 1000). This
pattern of results can be reconciled with data from the ideal-observer models
by noting that intrinsic spatial uncertainty was always present in the human
observers, even when there was no uncertainty in the stimulus (Table 2.1). Such
intrinsic uncertainty might place human data on the increasing portion of the
U-shaped function. In addition, internal noise in human observers may also play
a role. A more thorough analysis of rSNR versus intrinsic uncertainty in human
observers awaits future studies.
2.9 Summary
Most human experiments using the classification-image methods present a signal
in each trial primarily to keep the observers engaged. Here, we showed that
such a signal, if of sufficient strength, could limit or even eliminate the effect of
uncertainty on the resulting classification images. As examples, we successfully
obtained clear images of human observers’ perceptual templates in the face of a
high degree of spatial uncertainty.
A hallmark of visual processing is the progressive increase of invariance.
Because invariance is a form of uncertainty, our method offers a new tool for
uncovering the underlying representations in a visual processing system.
Chapter 3
Crowding and Classification Images
3.1 Introduction
Crowding refers to the marked inability to make perceptual judgments when
a target object is flanked by other objects (Bouma, 1970; Flom, 1991; Flom et al.,
1963; Stuart and Burian, 1962; Townsend et al., 1971). It is most prominent in
peripheral vision and cannot be explained by a lack of spatial resolution. As
such, crowding may reveal the critical differences between central and peripheral
vision in their mechanisms for perceiving shape. However, the functional
and physiological causes of crowding are as yet unsettled.
The attentional accounts of crowding suggest that features from the target
and flankers are not bound properly due to a lack of spatial resolution
in the attentional mechanism subserving the visual periphery (He et al.,
1996; Intriligator and Cavanagh, 2001; Leat et al., 1999; Strasburger et al., 1991;
Tripathy and Cavanagh, 2002). The attentional accounts of crowding are often
associated with the claim that crowding originates at a higher level of visual
processing. For example, He et al. (1996) showed that a crowded target with
indiscernible orientation could produce orientation-specific adaptation, suggesting
that crowding occurs at a stage beyond the primary visual cortex. However,
using a similar paradigm, Blake et al. (2006) showed that the strength of
orientation-specific adaptation could be modulated by the severity of crowding
when the contrast of the adapting pattern was kept low. Contrary to He et al.,
this later finding argues for a low-level origin of crowding while being ambivalent
about the attentional account in general.
Arguing against the attentional accounts, Pelli et al. (2004) stated that a parsimonious
explanation of crowding need not involve attention. They went as
far as suggesting that phenomena that were generally associated with attention,
such as illusory conjunction (Treisman and Schmidt, 1982), shared the same
mechanistic root as crowding in lower level visual processing. This view implies
an intriguing possibility that finding the mechanism(s) of crowding will also
resolve a broader set of issues concerning attention.
Regardless of whether the root cause of crowding involves attention, most
theories propose some form of interaction between simple features as a proximal
cause of crowding. At least three forms of interaction have been proposed:
masking, inappropriate feature integration, and source confusion or feature
mislocalization.
A masking account of crowding argues that the sensitivity to the simple
features constituting the target is suppressed by the presence of the flankers.
Chung et al. (2001) identified the similarities and differences between ordinary
masking (target and masker spatially overlap) and crowding (target and
masker do not overlap). Chung et al. suggested that crowding and ordinary
masking share the same linear filtering stage early in visual processing but
differ in how responses are pooled spatially in the later stages. Later studies
emphasized the difference between crowding and ordinary masking. A critical
difference is that whereas ordinary masking affects both detection and identification
of visual patterns (Thomas, 1985), crowding in the periphery affects only
identification (Levi et al., 2002; Pelli et al., 2004). Another masking account of
crowding is surround suppression. Surround suppression refers to the physiological
finding of a reduction in a neuron’s response to an otherwise optimal
stimulus when certain patterns are presented outside the neuron’s “classical”
receptive field (Allman et al., 1985; Hubel and Wiesel, 1965). Petrov and McKee
(2006) showed a number of similarities between crowding and surround suppression,
including the anisotropy of the spatial interaction zones observed in
crowding (Toet and Levi, 1992). Although the current consensus is that crowding
cannot be explained by ordinary masking, surround suppression as a mechanism
for crowding remains viable.
That crowding in the periphery affects identification but not detection suggested
a dissociation between the process of detecting simple features and the
subsequent process of integrating these features into a recognizable pattern
(Levi et al., 2002; Pelli et al., 2004). An inappropriate integration account of
crowding postulates that the deficiencies of feature integration in the periphery
lead to crowding. Levi et al. (2002) suggested that the flaw of feature integration
originates from the use of a defective template that is not well matched to the
target at a processing stage beyond feature detection. Parkes et al. (2001) argued
that feature attributes such as orientation appear to be compulsorily averaged
such that although the mean can be accurately recorded, the individual feature
attributes are inaccessible. A general form of this claim is that feature integration
in the periphery amounts to computing group statistics of the various feature
attributes within a spatial region. While these group statistics are accessible,
the individual sample values are lost.
A source-confusion account of crowding suggests that features from the
flankers are mistaken to be features of the target (Krumhansl and Thomas,
1977; Wolford, 1975). While this account is often associated with the attentional
account of crowding (e.g., Strasburger, 2005; Strasburger et al., 1991), it
is equally consistent with the fact that spatial uncertainty in the periphery is
high (Levi and Klein, 1986; Levi et al., 1987; Pelli, 1985).
It should be noted that these accounts of crowding are not mutually exclusive.
The goal of this study was to use a data-driven approach to elucidate the
mechanisms of letter crowding without making a priori assumptions about the
possible mechanisms. We will stay mostly neutral as to whether attention is a
root cause of crowding. However, our results will address the three proximal
accounts of crowding: masking, inappropriate feature integration, and source
confusion.
Building on our method of signal-clamped classification images
(Tjan and Nandy, 2006), we developed three novel analytic procedures to
reveal the features and their spatial configurations used by the central and
peripheral visual systems in a letter-identification task when the target letter
was either presented alone or flanked by other letters.
Classification-image methods (Ahumada, 2002; Beard and Ahumada, 1999)
have been very useful in revealing an observer’s visual strategy (e.g., vernier
acuity: Beard and Ahumada, 1999; stereopsis: Neri et al., 1999; illusory-contour
perception: Gold et al., 2000; identification of facial expression: Adolphs et al.,
2005; Gosselin and Schyns, 2003; surround effect on contrast discrimination:
Shimozaki et al., 2005). By “visual strategy”, we mean the parts of the stimulus that
are used by a subject to perform the task. We will loosely refer to such a visual
strategy revealed by a classification-image method as a “template” and take the
term to mean the spatiotemporal average of the features used by the visual system.
We recently showed that the standard classification-image technique can
be easily extended to overcome the spatial uncertainty that is either present in
the stimuli or intrinsic to the visual system, thereby revealing the shift-invariant
template used by the visual system (Tjan and Nandy, 2006). Because the human
periphery is known to exhibit a significant amount of intrinsic spatial uncertainty
(Levi and Klein, 1986; Levi et al., 1987; Pelli, 1985), our signal-clamping
method affords us the ability to examine the perceptual templates that are used
with and without letter crowding. This helps to address one of the fundamental
questions regarding the nature of crowding: Is crowding caused by a distortion
in the perceptual template?
To address the issue of source confusion or feature mislocalization, we developed
a method to infer the specific structural characteristics of the flankers that
correlate with crowding-induced errors. The results allow us to posit the possible
contributors to crowding.
To address the issues related to feature detection and integration, we showed
that the noise fields in signal-clamped classification images contain sufficient
information to reveal the second-order correlation structures of sub-template
features. Computing these correlation structures allows us to infer the possi-
ble shape of the putative features, compare them to features used by an ideal-
observer model, and determine the spatial region from which these features
were extracted by the visual system.
In summary, we probed into the nature of letter crowding by estimating and
comparing the following between crowded and non-crowded conditions: (1)
first-order classification images, which are the spatiotemporal average of fea-
tures; (2) structural effect of the flanking letters; (3) spatial extent of feature
utilization; and (4) second-order feature maps. To preview, we found that (1)
crowding did not cause distortions in the perceptual templates, (2) response
errors during crowding were strongly correlated with spatial structures of the
flankers, (3) intrinsic spatial uncertainty was not systematically affected by
crowding, and (4) crowding reduced the amount of valid features utilized by
the visual system and, at the same time, increased the amount of invalid features
used.
3.2 Analysis procedures
In this section, we will outline three novel procedures for the analysis of crowding:
(1) a procedure to calculate and visualize any systematic structures in the
flankers that lead to errors under crowding, (2) a procedure to estimate the spatial
region over which features are detected and utilized, and (3) a procedure to
calculate and visualize the features in terms of their second-order statistics. Of
these three procedures, the first is applicable only to classification-image experiments
with flankers present. The last two procedures are broader in scope and
can be applied to any signal-clamped classification-image data. In the next few
paragraphs, we will provide a brief conceptual overview of the analytical procedures.
Details of the procedures are given in Appendix C.
As we shall describe in greater detail in the Methods section, the general
setup of the experiment was that of letter identification in noise. Briefly, in dif-
ferent blocks, the target letter (“x” or “o”) was either presented alone or was
flanked by other letters. Both the target and the flankers were masked by Gaussian
luminance noise with a constant spectral density. The contrast of the target
and the flankers was the same and was adjusted to maintain an accuracy level
of 75% correct. We tested at the fovea and in the periphery.
Obtaining classification images in the periphery can be challenging because
of the high intrinsic spatial uncertainty in the periphery. We were able to significantly
reduce the effect of uncertainty by presenting a relatively high contrast
signal in our classification-image experiments. This “signal-clamping” technique
was developed in Tjan and Nandy (2006). Appendix A provides
a detailed analysis of its various properties. A brief intuitive exposition is provided
in section C.1.
To assess and visualize the effect of the flankers on an observer’s perception
of the target letter, we extended the first-order classification-image analysis
to include the flanking letters as part of the masking noise. We constructed
the classification images by testing for deviations from the expected pixel-wise
mean of this composite noise when the flankers and the masking noise were
sorted according to the observer’s response for each presented target letter
(see §C.2). The result is a map that reveals the structural elements of the
flankers that influenced the observer’s response, thereby allowing a direct
assessment of source confusion of visual features under crowding.
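The idea of testing for pixel-wise deviations from an expected mean can be
sketched as follows. This is only an illustration and not the procedure of §C.2:
the function names `z_map` and `threshold_map`, the tiny three-pixel "images",
and the assumption that the expected mean and standard deviation of every pixel
are known in advance are all hypothetical choices made for this example. The
1.96 criterion is the two-tailed value for an α level of .05.

```python
import math

def z_map(trials, expected_mean, expected_sd):
    """Pixel-wise z-scores of the mean composite noise over a set of trials.

    trials        : list of equal-length lists, one flattened image per trial
    expected_mean : per-pixel mean expected if the noise did not affect responses
    expected_sd   : per-pixel standard deviation under the same assumption
    """
    n = len(trials)
    z = []
    for i in range(len(trials[0])):
        mean_i = sum(t[i] for t in trials) / n
        z.append((mean_i - expected_mean[i]) / (expected_sd[i] / math.sqrt(n)))
    return z

def threshold_map(z, crit=1.96):
    """Zero out pixels whose |z| does not exceed the criterion."""
    return [v if abs(v) > crit else 0.0 for v in z]

# Four hypothetical error trials in which the middle pixel is consistently bright.
error_trials = [
    [0.1, 1.2, -0.1],
    [-0.1, 0.8, 0.1],
    [0.1, 1.1, -0.1],
    [-0.1, 0.9, 0.1],
]
z = z_map(error_trials, expected_mean=[0.0, 0.0, 0.0], expected_sd=[1.0, 1.0, 1.0])
significant = threshold_map(z)
```

In this toy case only the middle pixel survives thresholding, which is the kind
of localized structure the flanker maps are meant to expose.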
A methodological point of departure between the current study and most
of the early work involving classification images is that we were able to reveal
properties of the sub-template features used by the visual system beyond their
spatiotemporal averages (which are the conventional classification images). For
each trial, we computed the correlations between pairs of noise pixels as a
function of their relative separations in space. Because the masking noise was
white, the expected pairwise correlation would be zero. Any accidental
configuration of noise pixels that resembles a feature used by the visual system
would systematically affect the observer’s response. Thus, any nonzero
correlations that emerge as a result of the analysis reveal the second-order
correlational structure of the sub-template features (see §C.3). Comparing human
feature maps with those obtained from an ideal-observer model provides an
assessment of the utilization and validity of the features used by the visual
system under different experimental conditions.
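A minimal sketch of a correlation computed as a function of relative pixel
separation is given here. It is not the estimator of §C.3: the function name
`offset_correlation`, the assumption that pixel values are already normalized
to zero mean and unit variance, and the example fields are hypothetical. In an
actual analysis such values would presumably be accumulated over many trials of
one stimulus-response category.

```python
import random

def offset_correlation(field, dx, dy):
    """Mean product of all pixel pairs separated by (dx, dy) in one noise field.

    field is a list of rows whose values are assumed to have zero mean and
    unit variance, so the mean product estimates a correlation.
    """
    h, w = len(field), len(field[0])
    total, count = 0.0, 0
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                total += field[y][x] * field[y2][x2]
                count += 1
    return total / count

# A structured field: identical rows of alternating values.
stripes = [[1.0, -1.0, 1.0, -1.0] for _ in range(4)]
vertical = offset_correlation(stripes, dx=0, dy=1)    # neighbours in a column agree
horizontal = offset_correlation(stripes, dx=1, dy=0)  # neighbours in a row disagree

# A white-noise field: correlations at nonzero offsets should be near zero.
rng = random.Random(0)
white = [[rng.gauss(0.0, 1.0) for _ in range(64)] for _ in range(64)]
white_corr = offset_correlation(white, dx=1, dy=0)
```

The structured field gives correlations of +1 and -1 at the two offsets, whereas
the white-noise field gives a value close to zero, which is the baseline against
which feature-driven correlations are detected.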
A prerequisite for computing the pairwise correlations is to decide on a
region in the noise field for carrying out the computation. If the region is too
small, the correlations will be weak due to insufficient sample size. If it is too
large, it will include regions of the noise ignored by the visual system, thereby
diluting the correlations that were due to the features used. We developed a
method to search for a region in the stimulus that leads to a maximum level of
correlation (see §C.4). An important by-product of finding this optimal region
of interest (ROI_opt) is that it reveals the region in the image where
task-relevant features were extracted by the visual system.
3.3 Methods
Human experiments were conducted with four different experimental conditions.
There were two viewing conditions: foveal and peripheral at 10° in the
inferior visual field. In each of the viewing conditions, lowercase target
letters “o” and “x” were presented either singly (unflanked) or flanked by two
other lowercase letters, one on each side. The foveal conditions (flanked and
unflanked) served primarily as control conditions to enable direct
(within-subject) comparison with the corresponding peripheral conditions.
Subjects
Three subjects (BB, LM, and OR, all students at the University of Southern
California) with normal or corrected-to-normal vision and naïve to the purpose
of the study participated in the experiments. All had (corrected) acuity of 20/20
or better in both eyes. Subjects viewed the stimuli binocularly in a dark room
with a dim night-light. Written informed consent was obtained from each subject
before the commencement of data collection. Because of the monotonous
nature and long duration of each experiment (approximately 24 hr), subjects
were allowed (and encouraged) to take breaks whenever they so desired. All
the subjects completed their respective experiments in 14–16 sessions over a
span of about 2 weeks.
Procedure
In each of the four experimental conditions, the task was a 2AFC
letter-identification task in which the subject had to indicate whether the
letter at the target position was “o” or “x”, irrespective of the flankers if
they were present.
Each experimental condition consisted of 10 blocks with 1100 trials per
block. Thus, the entire experiment consisted of 40 blocks with the four
experimental conditions randomly intermixed. For the two unflanked conditions,
a white-on-black letter was presented on each trial in a Gaussian luminance
noise field. For the two flanked conditions, three white-on-black letters (target
+ one flanker on each side) were presented in the noise field. The target was
always at the center of the noise field. The noisy stimuli were presented at a
viewing distance of 154 cm for the foveal viewing conditions and at 105 cm for
the peripheral viewing conditions. In the peripheral viewing condition, subjects
fixated at an LED 10° above the center of the noise field. The first 100 trials in
each block were calibration trials in which the letter contrast was dynamically
adjusted using the QUEST procedure (Watson and Pelli, 1983) as implemented
in the Psychophysics Toolbox (Brainard, 1997; Pelli, 1997) to estimate a
“calibrated” threshold contrast for reaching an accuracy level of 75%. The
remaining 1000 trials within a block were divided into 5 sub-blocks of 200
trials each, and QUEST was reinitialized to the calibrated value at the
beginning of each sub-block. During the initial 100 calibration trials, the
standard deviation of the prior distribution of the threshold value was set to
5 log units (practically a flat prior), but for each sub-block, the prior was
narrowed to a standard deviation of 1 log unit. This restricted the variability
of the test contrast but still allowed adequate flexibility for the procedure to
adapt to the observer’s continuously improving threshold levels.
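The block structure described above implies the following trial counts. The
short sketch below only restates that arithmetic; the constant and variable
names are chosen here for illustration.

```python
# Numbers taken from the Procedure description.
N_CONDITIONS = 4             # fovea/periphery x flanked/unflanked
BLOCKS_PER_CONDITION = 10
TRIALS_PER_BLOCK = 1100
CALIBRATION_TRIALS = 100     # QUEST with a nearly flat prior (SD = 5 log units)
SUB_BLOCKS = 5               # QUEST reinitialized with a narrower prior (SD = 1 log unit)
TRIALS_PER_SUB_BLOCK = 200

# Each block is 100 calibration trials followed by 5 x 200 further trials.
post_calibration_per_block = SUB_BLOCKS * TRIALS_PER_SUB_BLOCK
block_total = CALIBRATION_TRIALS + post_calibration_per_block

total_blocks = N_CONDITIONS * BLOCKS_PER_CONDITION
total_trials = total_blocks * TRIALS_PER_BLOCK
post_calibration_per_condition = BLOCKS_PER_CONDITION * post_calibration_per_block

print(total_blocks, total_trials, post_calibration_per_condition)
```

Each subject therefore completed 44,000 trials in total, of which 10,000 per
condition followed the calibration phase.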
Prior to the main experiment, letter acuity was measured separately for the
foveal and peripheral viewing conditions. The subjects were asked to identify
any of the 26 lowercase letters, while the size of the presented letter was
varied using QUEST to achieve an identification accuracy of 79%.
Subject   Fovea letter size   Periphery letter size
BB        0.15°               0.81°
LM        0.20°               1.14°
OR        0.18°               0.89°
Table 3.1: Stimuli letter size (x-height) for each subject in units of degrees of
visual angle.
Stimuli
The stimulus for each trial consisted of either a white-on-black letter
(unflanked) or three white-on-black letters (flanked) added to a Gaussian,
spectrally white noise field of 128 × 128 pixels. Before being presented to the
observers, each pixel of this noisy stimulus was enlarged by a factor of 2, such
that four screen pixels were used to render a single pixel in the stimulus. This
was done to increase the spectral density of the noise. The noise contrast was
fixed at 25% rms. In the foveal viewing conditions (distance = 154 cm), the noise
field was of size 3.2° with a spectral density of 39.8 μdeg², whereas in the
peripheral viewing conditions (distance = 105 cm), the noise field was of size
4.7° with a spectral density of 85.5 μdeg². The mean luminance of the noisy
background was 19.8 cd/m². In the flanked condition, the target and flankers had
the same contrast.
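As a rough plausibility check on the reported spectral densities, the sketch
below uses a common definition of the spectral density of pixel noise, namely
the squared rms contrast multiplied by the area of one noise pixel. That formula
is an assumption here, since this section does not state how the values were
computed, and the function name `spectral_density_udeg2` is hypothetical.

```python
FIELD_PIXELS = 128    # noise pixels per side of the field
RMS_CONTRAST = 0.25   # 25% rms noise contrast

def spectral_density_udeg2(pixel_deg, rms=RMS_CONTRAST):
    """Assumed definition: N = rms^2 * pixel area, returned in micro-deg^2."""
    return rms ** 2 * pixel_deg ** 2 * 1e6

# Noise-pixel size follows from the reported field sizes (already rounded).
fovea_pixel_deg = 3.2 / FIELD_PIXELS
periphery_pixel_deg = 4.7 / FIELD_PIXELS

n_fovea = spectral_density_udeg2(fovea_pixel_deg)          # reported: 39.8
n_periphery = spectral_density_udeg2(periphery_pixel_deg)  # reported: 85.5

# Doubling the linear pixel size quadruples the spectral density, which is
# why each stimulus pixel was rendered with four screen pixels.
gain_from_enlargement = n_fovea / spectral_density_udeg2(fovea_pixel_deg / 2)

print(round(n_fovea, 1), round(n_periphery, 1), gain_from_enlargement)
```

Under these assumptions the computed values come out within about 2% of the
reported ones; the remaining gap may simply reflect rounding of the field sizes
to one decimal place.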
The letter stimuli were presented in “Arial” font (Mac OS 9). The target
letters were lowercase “o” and “x”. The flanking letters were randomly chosen
from the following set: “a”, “c”, “e”, “n”, “r”, “s”, “u”, “v”, and “z”. The
flanking letters did not include the target letters and were chosen such that
none of them had any ascenders (e.g., “b”) or descenders (“p”) or were of
unusual width (“w”, “i”). Letter size was set to twice the subject’s letter
acuity at the respective eccentricity. The letter size (the height of a lowercase
“x”) used for each of our subjects is shown in Table 3.1. The center-to-center
letter spacing in the flanked conditions was 1 x-height. The contrast of the
target letter (and the flankers, if any) was adjusted with a QUEST procedure as
described in the Procedure section. Figure 3.1A depicts examples of the noisy
stimuli used in the four experimental conditions.
The stimuli were displayed in the center of a 19-in. CRT monitor (Sony
Trinitron CPD-G400), and the monitor was placed at a distance of 105 or 154
cm (depending on the experimental condition) from the subject. After calibration
and gamma-linearization, the monitor had 11 bits (2048 levels) of linearly
spaced contrast levels. The experiments and the monitor were controlled from a
Mac G4 running OS 9.2.2. All 11 bits of the contrast levels were addressable to
render the noisy stimulus for each trial. This was achieved by using a passive
video attenuator (Pelli and Zhang, 1991) and custom-built contrast calibration
and control software implemented in MATLAB. Only the green channel of the
monitor was used to present the stimuli.
The stimuli were presented according to the following temporal design: (1)
a fixation beep immediately followed by a fixation screen for 500 ms, (2)
stimulus presentation for 250 ms, (3) subject response period (variable) with
positive feedback beep for correct trials, and (4) 500 ms delay before onset of
the next trial (see Figure 3.1B).
At the end of each trial, the following data were collected for subsequent
data analysis: the state of the pseudorandom number generator used to produce
the noise field, the identity and contrast of the target letter, the identity of
the flankers (only for the two flanked conditions), and the response of the
subject.
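Storing the generator state instead of the noise field itself keeps the trial
record small while still allowing the exact field to be regenerated for
analysis. The sketch below illustrates that idea with Python's standard
`random` module; the original software was written in MATLAB, so this is an
analogy and not the original code, and the helper `make_noise_field`, the seed,
and the record layout are hypothetical.

```python
import random

def make_noise_field(rng, size, rms=0.25):
    """Draw a size x size field of Gaussian noise from the given generator."""
    return [[rng.gauss(0.0, rms) for _ in range(size)] for _ in range(size)]

# During the experiment: save the generator state just before drawing the noise.
rng = random.Random(2010)
trial_record = {
    "rng_state": rng.getstate(),
    "target": "x",
    "flankers": ("a", "s"),
    "response": "o",
}
presented_field = make_noise_field(rng, size=8)

# During analysis: restore the state and draw again to recover the same field.
replay = random.Random()
replay.setstate(trial_record["rng_state"])
regenerated_field = make_noise_field(replay, size=8)
```

Because the state is restored before any values are drawn, the regenerated field
matches the presented one value for value.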
Figure 3.1: (A) Examples of stimuli used in the four experimental conditions
(fovea and periphery, flanked and unflanked). (B) Timing of stimuli
presentation: (1) fixation beep immediately followed by a fixation screen for
500 ms, (2) stimulus presentation for 250 ms, (3) subject response period
(variable) with positive feedback beep for correct trials, and (4) 500 ms delay
before onset of next trial.
3.4 Results and discussions
Threshold SNR
Figure 3.2A depicts the log SNR threshold (contrast energy vs. noise spectral
density), or log(E/N), for the four experimental conditions. We use SNR
thresholds instead of the contrast thresholds to allow direct comparison between
the two viewing conditions, which had different stimuli sizes and noise spectral
densities. For all subjects, there was a significant threshold elevation in the
periphery-flanked condition as compared with the periphery-unflanked
condition, showing that we had indeed induced crowding in the periphery. In
sharp contrast, there was a small but significant opposite effect in the fovea:
The flankers actually aided recognition. As will transpire from the results of
the feature utilization zone (Figures 3.4 and 3.5), the facilitation appears to
be the result of a reduction in intrinsic spatial uncertainty. The flankers may
act as guideposts marking the target position.
The human E/N threshold can also be compared to the ideal-observer model
described earlier in the Feature maps section. We will use this ideal-observer
model to evaluate the human feature maps. Recall that the ideal-observer model
optimally performs the letter-identification task without flankers but with a
spatial uncertainty equated to ±1.5 x-height in both horizontal and vertical
dimensions. Table 3.2 summarizes the total efficiency (defined as the SNR
threshold [E/N] of the ideal-observer model divided by the SNR threshold of the
human observer) for the four experimental conditions. These values are higher
than what would be expected from the literature (e.g., Nasanen and O’Leary,
1998; Pelli et al., 2004; Solomon and Pelli, 1994; Tjan et al., 1995) because our
ideal-observer model is limited by spatial uncertainty, thus making it
suboptimal with respect to the actual stimuli, which did not have any spatial
uncertainty.
                      BB     LM     OR
Fovea unflanked       12.8   11.3   13.5
Fovea flanked         24.7   25.5   20.6
Periphery unflanked   19.4   19.2   13.5
Periphery flanked     4.4    4.0    0.6
Table 3.2: Efficiency (%) of human performance with respect to that of the
ideal-observer model.
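Because efficiency is the ideal E/N threshold divided by the human E/N
threshold, the efficiencies in Table 3.2 can be converted into ratios of human
thresholds. The sketch below does this under one assumption that is made here
and not stated in the text: for a given subject and eccentricity, the
ideal-observer threshold is the same in the flanked and unflanked conditions
(the model performs the task without flankers). The function name
`human_threshold_ratio` is hypothetical.

```python
# Efficiencies (%) copied from Table 3.2.
efficiency_pct = {
    "fovea unflanked":     {"BB": 12.8, "LM": 11.3, "OR": 13.5},
    "fovea flanked":       {"BB": 24.7, "LM": 25.5, "OR": 20.6},
    "periphery unflanked": {"BB": 19.4, "LM": 19.2, "OR": 13.5},
    "periphery flanked":   {"BB": 4.4,  "LM": 4.0,  "OR": 0.6},
}

def human_threshold_ratio(cond_a, cond_b, subject):
    """Human E/N threshold in cond_a divided by that in cond_b.

    With efficiency = ideal / human and a shared ideal threshold (assumed),
    human_a / human_b = efficiency_b / efficiency_a.
    """
    return efficiency_pct[cond_b][subject] / efficiency_pct[cond_a][subject]

for subject in ("BB", "LM", "OR"):
    crowding = human_threshold_ratio("periphery flanked", "periphery unflanked", subject)
    facilitation = human_threshold_ratio("fovea flanked", "fovea unflanked", subject)
    print(subject, round(crowding, 2), round(facilitation, 2))
```

Read this way, the table implies a several-fold elevation of the E/N threshold
by flankers in the periphery and a reduction of the threshold by flankers in
the fovea for every subject.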
First-order classification images
The reconstructed classification images for subjects BB, LM, and OR are
depicted in Figure 3.2B. The images were reconstructed separately for the four
experimental conditions; the sub-images for the individual stimulus–response
categories were not combined, for the reasons we have previously argued (Tjan
and Nandy, 2006), and we limited our analyses to the error-trial sub-images. We
found that, with the exception of the periphery-flanked condition for subject
OR, the signal-clamped classification-image technique was able to reveal a
first-order template associated with the presented signal for all conditions.
The templates were virtually identical for the two foveal conditions: we can
clearly see the “ring” for the “o” and the “dot” for the “x”. The “dot”
corresponds to the region where the two oblique strokes of the “x” intersect. In
the periphery-unflanked condition for all subjects, the templates were of a
lower contrast than the fovea-unflanked condition (see Table 3.3, third row vs.
first row) but were nevertheless undistorted. They also had the same structural
characteristics as the foveal templates. Computational simulations in Tjan and
Nandy (2006) showed that if an observer mechanism uses a template that differs
from the presented stimulus, it is the perceptual template, not the presented
stimulus, which forms the negatively correlated component in the classification
image of the error trials. Hence, we conclude that there was no uncalibrated
sampling disarray (contrary to Hess and Field, 1993; Hess and McCarthy, 1994)
at the scale of the letters in the periphery.
Figure 3.2: (A) Log thresholds in SNR (contrast energy E divided by noise
spectral density N) for the four experimental conditions. There is a significant
increase in E/N in the periphery-flanked condition as compared with the
periphery-unflanked condition. There is also a small but significant opposite
effect in the fovea. Note that the standard error bars are smaller than the plot
symbol. (B) First-order classification images. The raw images have been filtered
by a Gaussian kernel (space constant = 1.4 pixels) to aid visualization. The
numbers indicate the number of trials for the corresponding stimulus–response
category. For the fovea conditions with small letter size, only the central
portion of the classification images is magnified and shown.
                      BB      LM      OR
Fovea unflanked       175.4   261.2   108.8
Fovea flanked         150.9   214.1   82.1
Periphery unflanked   152.5   154.4   78.9
Periphery flanked     26.3    35.6    ≈ 0
Table 3.3: rSNR of the classification images (Murray et al., 2002, with
modification for signal-clamped classification images as described in Tjan and
Nandy, 2006) estimated for the experimental conditions.
More important, for subjects BB and LM in the crowding condition (periphery
flanked), the template for the letter “o” was undistorted as well, albeit of a
lower contrast (combined rSNR for “o” and “x”: 26.3 [BB], 35.6 [LM]). This
finding suggested that perceptual templates in the periphery, even under
crowding, are not distorted and that crowding could not be attributed to
aberrations in the template.
The contrast of the classification images in the periphery-flanked condition
was substantially lower than that in the unflanked conditions (Table 3.3, fourth
vs. third row). For subject OR, the contrast was too low to visualize the
classification images. For the other two subjects, the template for the letter
“x” was also invisible. A reduction in the contrast of the classification images
implies that there were sources in addition to the masking noise that led to the
response errors. The possibilities include feature source confusion (i.e.,
mistaking features from the flankers as if they were from the target) and
deficiencies in feature detection and utilization. We will address these
possibilities next.
Feature mislocalization
Following the procedure outlined in the Flanker analysis section, we obtained
the z-score (Equation C.3) and t-test (Equation C.4) maps shown in Figure 3.3.
As can be seen in Figure 3.3A, there was a strong effect of the flankers in the
error-trial sub-maps in the periphery-flanked condition, Z_OX (upper right) and
Z_XO (lower left), which was the case for all subjects. There were distinct
structural features of the flankers that biased the observer’s response:
specifically, the presence of horizontal or oblique strokes or the absence of
vertical strokes biased toward an “x” response, and the presence of ring-like
curves or vertical strokes or the absence of a central dot-like patch biased
toward an “o” response. This is particularly clear in the t-test maps (Figure
3.3B), where we contrasted the error trials of the two response types. It was as
if the observers mistook these target-like structures of the flankers for
features of the target. This result provides direct evidence that a significant
cause of crowding is source confusion of features, as previously suggested
(Krumhansl and Thomas, 1977; Strasburger, 2005; Wolford, 1975), although our
result is equivocal as to the causal connection between source confusion and the
attention-deficiency account of crowding (e.g., He et al., 1996; Tripathy and
Cavanagh, 2002).
There also appear to be significant flanker effects in the correct trials.
However, the flanker classification images from the correct and error trials for
the same presented stimulus are dual to each other. They provide the same
information but with a sign change and at a different level of significance. Had
we not thresholded the maps at the α level of .05, the maps from the correct and
error trials would show the same pattern. Consider, for example, a list of 1000
random numbers with a mean of μ. If the 10 largest numbers are removed to form
a new list, then the new list will likely have a mean significantly higher than
μ. The mean of the old list with the 10 largest items removed will be slightly
lowered but may not reach statistical significance according to a z test. For
this reason, we only analyze the error trials.
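The list example above can be simulated directly. The sketch below is an
illustration under assumptions made here: the numbers are drawn from a standard
normal distribution with a fixed seed, μ and σ are taken from the full list of
1000, and each sub-list is compared with μ using a simple z statistic.

```python
import math
import random
import statistics

rng = random.Random(0)
numbers = [rng.gauss(0.0, 1.0) for _ in range(1000)]
mu = statistics.fmean(numbers)
sigma = statistics.pstdev(numbers)

ordered = sorted(numbers)
removed = ordered[-10:]     # the 10 largest values (the small "new list")
remaining = ordered[:-10]   # the old list after removal (990 values)

def z_score(sample):
    """z statistic for the sample mean against mu, using the full-list sigma."""
    return (statistics.fmean(sample) - mu) / (sigma / math.sqrt(len(sample)))

z_removed = z_score(removed)
z_remaining = z_score(remaining)
print(round(z_removed, 2), round(z_remaining, 2))
```

The small list shows a large positive z, while the long list shows a small
negative z that does not reach the 1.96 criterion. The two carry the same
information with opposite sign, which mirrors the relation between the
error-trial and correct-trial maps.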
In contrast to results in the periphery, the errors in the fovea were entirely
driven by the noise in the target region. There was no effect of the flankers.
Neither the z-score nor the t-test maps show any significant correlation between
pixels in the flanker locations and the error response. Most of the pixels with
significant correlations were located in the target position (i.e., congruent
with the classification images).
Feature utilization zones
Excluding feature mislocalization, are there other causes of crowding? In this
subsection as well as in the next, we will analyze the second-order correlation
structures in the residual noise fields. A residual noise field (Equation C.5) is
a noise field that does not contain the flankers and is orthogonal to the
corresponding first-order classification sub-image. As mentioned earlier, the
method of signal-clamped classification images separates the mechanism selective
to the presented target from the mechanism selective to the alternative target by
their relative effects on the first-order classification sub-images. This is
because a high-contrast signal reduces the intrinsic spatial uncertainty of
only the mechanism selective to the presented signal. With high intrinsic
spatial uncertainty in peripheral vision, the resulting first-order
signal-clamped classification sub-images from error trials will be dominated by
the perceptual template (spatiotemporal average of features) of the mechanism
selective to the presented target. By projecting out this first-order
classification sub-image (Equation C.5), the residual noise fields will contain
the higher order correlation structures associated with the mechanism selective
to the target that was not presented and, thus, not spatially clamped by the
stimulus. The spatial dispersion of these correlation structures from the target
position provides a measure of the intrinsic uncertainty and the spatial range
within which features were detected and utilized to affect response. This
spatial range is referred to as a feature utilization zone and is defined by
Equation C.14. It is important to bear in mind that this feature utilization is
not to be confused with feature mislocalization estimated in the previous
section. The two are independent because the residual noise fields do not
contain the flankers.
Figure 3.3: Flanker analysis results: (A) z-score maps (Z_OX, upper right box;
Z_XO, lower left box) thresholded at α = .05 (|Z| > 1.96) indicate the presence
(positive contrast) or absence (negative contrast) of significant features that
bias an observer’s response; (B) t-test maps, also thresholded at α = .05,
directly compare the error trials of the two response categories. The “hot”
regions indicate features that bias toward “o”, whereas the “cold” regions are
indicative of features that bias toward “x”. For the fovea conditions, only the
central portion of the classification images is magnified and shown.
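One standard way to make a noise field orthogonal to a classification sub-image
is to subtract its projection onto that sub-image. The sketch below shows this
generic linear-algebra step. Equation C.5 is not reproduced in this section, so
the sketch should not be read as that equation; the function name `project_out`
and the three-pixel vectors are hypothetical.

```python
def project_out(noise, template):
    """Remove from `noise` its component along `template` (both flattened)."""
    dot_nt = sum(n * t for n, t in zip(noise, template))
    dot_tt = sum(t * t for t in template)
    scale = dot_nt / dot_tt
    return [n - scale * t for n, t in zip(noise, template)]

noise = [1.0, 2.0, 3.0]      # a flattened noise field without flankers
template = [1.0, 0.0, 1.0]   # a flattened first-order classification sub-image
residual = project_out(noise, template)

# The residual has no first-order component along the template.
overlap = sum(r * t for r, t in zip(residual, template))
print(residual, overlap)
```

Any structure that remains in such residuals cannot be explained by the
first-order template, which is why it is attributed to higher order correlations.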
The feature utilization zones are depicted in Figure 3.4. The significance
(mean log p values, Equation C.13) of the different candidate ROIs (Equation
C.12) is color coded, and these color-coded regions are superimposed in
ascending order of significance (descending order of mean log p values); the
most significant region (demarcated by a blue dotted line) represents the
optimal ROI (Equation C.14) and, hence, the feature utilization zone. The
positions of the target and the flankers are superimposed on the maps to give a
sense of the extent of the utilization zones.
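The relation between the mean log p value of a region and the geometric mean of
its p values, and the selection of the most significant candidate, can be
sketched as follows. Equations C.12 to C.14 are not reproduced in this section,
so the candidate regions, their p values, the use of natural logarithms, and
the function names are all hypothetical.

```python
import math

def mean_log_p(p_values):
    """Average of log(p); more negative means more significant."""
    return sum(math.log(p) for p in p_values) / len(p_values)

def geometric_mean_p(p_values):
    """Geometric mean of the p values, i.e. exp of the mean log p."""
    return math.exp(mean_log_p(p_values))

# Hypothetical p values of the pairwise correlations inside three candidate ROIs.
candidates = {
    "small":  [0.40, 0.60],
    "medium": [0.10, 0.20],
    "large":  [0.30, 0.50],
}

# The optimal ROI is the candidate with the lowest mean log p.
roi_opt = min(candidates, key=lambda name: mean_log_p(candidates[name]))
print(roi_opt, round(geometric_mean_p(candidates[roi_opt]), 3))
```

Ordering candidates by mean log p or by geometric mean of p gives the same
ranking, since the exponential is monotonic.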
Figure 3.5 summarizes the horizontal extent of the feature utilization zones
and the significance level of the optimal ROIs for the different experimental
conditions for each of our subjects. The significance level (mean log p values)
associated with the optimal ROI can be taken to indicate the sensitivity and
trial-to-trial consistency of feature detection. By this measure, there is no
coherent pattern between subjects in terms of feature detection (Figure 3.5A).
Comparing the two periphery conditions, we found that feature detection is
numerically weaker in the flanked condition for subject BB and that there is no
difference for subjects LM and OR. This result is consistent with most theories
of crowding: that the deficiency lies in second-stage feature integration and
not in the detection of simple features (e.g., He et al., 1996; Pelli et al.,
2004).
For subject BB, the feature utilization zone (Figure 3.4, demarcated in blue)
shrinks vertically in the flanked conditions for both foveal and peripheral
presentations. Thus, for this subject, who was our most experienced subject and
who had performed in similar experiments for over 120,000 trials, the zone of
feature utilization is quite well bounded around the target region even under
crowding. In both the fovea and the periphery conditions, the presence of
flankers led to a sizable reduction in the subject’s intrinsic spatial
uncertainty. For subject OR, the utilization zone similarly shrinks vertically in
the fovea-flanked condition. In the periphery, the zone of utilization in the
flanked condition remains roughly the same as in the unflanked condition. For
subject LM, the utilization zones remain roughly the same under flanking in both
fovea and periphery conditions. In general, the utilization zone either shrank
vertically, orthogonal to the letter arrangement, or remained unchanged in the
flanked condition as compared with the unflanked conditions at the corresponding
eccentricity. Relative to the stimulus size, the utilization zones in the
periphery-flanked (crowded) condition were not much larger than those in the
fovea-flanked (non-crowded) condition. As such, crowding cannot be attributed to
an increase in the intrinsic spatial uncertainty associated with feature
utilization beyond any feature mislocalization induced by the flankers. This
conclusion is further strengthened by a lack of change in the horizontal extent
of the feature utilization zones between the crowded and non-crowded conditions
in the periphery (Figure 3.5B).
The relatively larger horizontal extent of the feature utilization zones in the
fovea compared to the periphery (subjects BB and OR) in units relative to the
stimulus size deserves further commentary. The feature utilization zone reveals
the image locations from which simple features were extracted, excluding any
direct influence of the flankers. It does not, however, tell us what happens
after those features have been extracted. In the fovea, although the spatial
uncertainty of a letter stimulus may be high (hence, a large feature utilization
zone), the relative uncertainty between the letter features can still be very
low. For example, a widely separated pair of “/” and “\” may not be put together
to form “X” in the fovea. In contrast, despite having a smaller feature
utilization zone, the relative spatial uncertainty between features may still be
very large in the periphery. It is also possible that feature utilization in the
periphery is sparse, such that the mere presence of a simple feature may be
sufficient to trigger a false alarm. A zone with either sparse feature
utilization (improvised feature integration) or high relative spatial uncertainty
between features (improper feature integration) cannot effectively exclude the
influence from flankers.
Second-order feature maps
Using the feature utilization zones (optimal ROI) identified in the previous
section, we calculated the second-order (pairwise) correlations between pixels in
the residual noise fields (see §C.3). Because the masking noise was uncorrelated
white noise, any significant correlations detected in the residual noise
field can only be due to the coincidental features in the noise that were
consistently detected by the observer as features that affected the behavioral
response. Unlike the first-order classification images, which reveal the spatial
configuration of features averaged across trials but not the features
themselves, these correlation maps reveal the second-order structure of the
extracted sub-template features. By “feature”, we specifically mean a fragment of
the target (a group of pixels) that is detected as a whole within a trial (as
opposed to pixels averaged across trials).
Figure 3.4: Estimated ROI_opt or feature utilization zones. The significance
(mean of log p values) of the different candidate ROIs is color coded, and these
color-coded regions are superimposed in ascending order of significance; the
most significant regions (demarcated by the blue dotted lines) represent the
optimal ROIs. The positions of the target and the flankers are superimposed to
give a sense of the extent of the utilization zones.
Figure 3.5: (A) Geometric mean of the p values in the ROI_opt, which is a
measure of the amount of features present in the ROI. (B) Horizontal extent of
the feature utilization zones (width of ROI_opt in units of x-height).
The second-order feature maps for subjects BB, LM, and OR are shown in
Figure 3.7. For purposes of comparison, the feature maps calculated from an
ideal-observer model performing the same 2AFC letter-identification task in
the unflanked conditions are shown in Figure 3.6. The ideal-observer model
(Tjan and Nandy, 2006) used the letter stimuli corresponding to the ones used
in the human experiment and is limited by a considerable amount of spatial
uncertainty (±1.5 x-height). The positive and negative correlation zones from
the ideal-observer maps at rZ = ±1.0 are overlaid on the human correlation
maps as black (+) and white (−) contours, respectively. One important fact to
bear in mind is that the ideal-observer model uses the entire letter as a single
feature for its pattern matching operations. If a subject were using a fixed set
of letter fragments as features (e.g., a pair of parallel bars to detect the
letter “o”), then the positive part of the feature map of the subject would be
strictly a subset of the significant positive correlations found in the
corresponding ideal-observer feature map. It is also possible that a subject used
features that are considered inefficient by the ideal-observer model. In
general, we may not expect any good match between the human and ideal-observer
feature maps, but a comparison between the two is informative.
Let us first consider the maps for subject BB. In both the fovea conditions
regardless of flanking, the subject (Figure 3.7: rows 1 and 2, columns 1 and
2) seems to be using a very similar set of second-order correlations as that of
the ideal-observer model, suggesting that the subject used the entire letter as
a unitary template. The resemblance is more striking for the letter “x” than
for the letter “o”. In the periphery, however, the subject seems to be using a
different feature set as compared with the ideal-observer model. The feature
set in the periphery-unflanked condition (Figure 3.7: row 1, columns 3 and 4)
is sparser, but it still appears to draw from the more complete set used by the
ideal-observer model. For example, whereas the ideal-observer model utilizes
a complete set of diagonal correlations for the letter “x”, the subject seems to
be using correlated fragments at the extremities of a diagonal. In the
periphery-flanked condition (Figure 3.7: row 2, columns 3 and 4), the deviation
from the ideal-observer model is more pronounced.
To quantify feature detection and utilization for all of our subjects, we
compare a human feature map with the corresponding ideal-observer feature map
in terms of the three quantities defined in the Feature maps section: quality of
match (Q_m) between the human and ideal-observer feature maps, fraction of the
ideal-observer features that are also used by the human observer (U, or feature
utilization), and the fraction of the human features that are also features for
the ideal-observer model (V, or feature validity). Recall that for U and V, we
consider only positively correlated features with rZ > +1.0. Recall also that we
will separately analyze the mechanisms used for detecting “o” from those used
for detecting “x”. The results are summarized in Figure 3.8.
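The verbal definitions of U and V can be sketched as set overlaps between
thresholded maps. The exact definitions belong to the Feature maps section,
which is not reproduced here, so this is only one reading of the sentences
above; the function name `utilization_and_validity` and the five-element maps
are hypothetical, and Q_m is omitted because its formula is not given in this
section.

```python
def utilization_and_validity(human_rz, ideal_rz, crit=1.0):
    """Return (U, V) for two flattened rZ maps of equal length.

    U: fraction of ideal-observer features (rZ > crit) also present in the human map.
    V: fraction of human features (rZ > crit) also present in the ideal-observer map.
    """
    human = {i for i, v in enumerate(human_rz) if v > crit}
    ideal = {i for i, v in enumerate(ideal_rz) if v > crit}
    shared = human & ideal
    u = len(shared) / len(ideal) if ideal else 0.0
    v = len(shared) / len(human) if human else 0.0
    return u, v

# Hypothetical rZ values at five map locations.
human_map = [2.0, 0.2, 1.5, 1.2, -1.4]
ideal_map = [1.8, 1.3, 0.1, 2.2, 1.1]
U, V = utilization_and_validity(human_map, ideal_map)
print(U, round(V, 3))
```

In this toy case the human map uses half of the ideal-observer features
(U = 0.5), and two of its three features are valid (V ≈ 0.67).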
Fovea conditions
The quality of match (Q_m) will be close to 0.5 if a human feature map bears
no resemblance to the ideal-observer feature map. In the fovea, flanking the
target led to better performance in terms of contrast threshold; it also led to a
statistically significant increase in Q_m for all mechanisms in all subjects. In
the fovea-flanked condition, all Q_m values are significantly above 0.5, with the
two lowest Q_m values of 0.64 and 0.67 coming from the “o” mechanisms of
subjects BB and OR, respectively. Visually inspecting the feature maps in Figure
3.7 confirms this quantification.
With the exception of the “x” mechanism for BB, feature utilization (U) for
all subjects was below 20%, with an average of about 10% (excluding BB [“x”]).
This is a clear piece of evidence that human observers do not use a whole
letter as a single unitary feature. This result is comparable to the total
efficiency measured with respect to the ideal-observer model (see Table 3.2).
Flankers had a mixed impact on feature utilization for subject LM, increasing U
for the “x” mechanism and decreasing it for the “o” mechanism. For both BB and
OR, flanking the target led to an overall increase of feature utilization. The
effect of flankers on feature utilization being small and somewhat equivocal is
consistent with the fact that the effect of flankers on threshold was small in
the fovea conditions.
Figure 3.6: Ideal-observer simulations. (A) Second-order feature maps obtained
from an ideal-observer model using letter stimuli (shown above the corresponding
feature map) identical to those used in the foveal experimental conditions
(panel title: letter x-height = 8, uncertainty = 3 x 3 x-height). White and
black contour lines demarcate the positive and negative correlation zones at
rZ = ±1.0. (|rZ| > 1.96 corresponds to an α level of .05.) The axes of the maps
are in units of x-height. (B) Feature maps obtained using the peripheral stimuli
(panel title: letter x-height = 22, uncertainty = 3 x 3 x-height).
Figure 3.7: Second-order feature maps obtained from the human observer’s data
overlaid with positive (demarcated by the black contour lines) and negative
correlation zones (white contour lines) from the corresponding ideal-observer
feature maps (Figure 3.6). The axes of the maps are in units of x-height.
|rZ| > 1.96 corresponds to an α level of .05.
Figure 3.8: Quality of match (Q_m), feature utilization (U), and feature validity
(V) of the human feature maps (Figure 3.7) as compared with the ideal-observer
maps (Figure 3.6). The error bars represent standard error estimates obtained by
bootstrapping on the human feature maps.
With one exception, feature validity (V) for both mechanisms of all subjects
was less than 50% (average V = 28%). This means that more than half (and
in some cases, substantially more than half) of the features used by a human
observer are not features considered by an ideal-observer model. Inspecting
Figure 3.6 suggested that many of these spurious features were not as egregious
as the low values of V might suggest. The human features are often slightly
displaced versions of the ideal-observer features. Take as an example the feature
map of the “o” mechanism for BB in the fovea-unflanked condition. The
locations of the positive correlations seem to shift toward the center of the map
as compared with the ideal-observer overlay (Figure 3.7, first row, first column).
The human feature set in this case was consistent with an “o” that is narrower
by about 10–20% than the template used by the ideal-observer model. For
the same subject, feature validity for the “o” mechanism improved significantly
from 22% to 33% in the flanked condition.
Periphery conditions
In the periphery-flanked condition, the quality of match ($Q_m$) between human
and ideal-observer feature maps was at chance (0.5) for subject OR and close to
chance for subject LM. Remarkably, the quality of match for BB remained high
for both flanked and unflanked conditions. Flanking the target in the periphery,
which led to a large increase in contrast threshold, either reduced $Q_m$ (for OR,
who had an SNR threshold elevation of 1.5 log units) or had no effect on $Q_m$
(for subjects BB and LM, for whom SNR threshold elevation was about 0.6 log
units).
The negative effects of flanking were more consistently observed in terms of
feature utilization (U). For all subjects, flanking led to a significant reduction
in U and a corresponding decrease in feature validity (V). In general, for both
the fovea and periphery conditions, the effect of flanking on feature validity (V)
mirrored its effect on feature utilization (U). That is, a decrease in U was almost
always paired with a corresponding decrease in V. In the periphery conditions,
this means that flankers reduced letter-identification performance not only by
reducing the number of valid features used by a human subject but also by
increasing the number of invalid features.
General discussion
We developed a series of classification-image techniques to investigate the
nature of crowding without presupposing any model of crowding. Our findings
can be summarized as follows: (1) crowding significantly reduced the contrast
of first-order classification images, although it did not alter the shape of the
classification images; (2) errors during crowding were strongly correlated with
the spatial structures of the flankers that resembled those of the erroneously
perceived targets; (3) crowding did not have any systematic effect on intrinsic
spatial uncertainty, nor did it suppress feature detection; and (4) analysis
of the second-order statistics of the classification images revealed that crowding
reduced the number of valid features used by the visual system and, at the same
time, increased the number of invalid features used.
Our data are informative about the three proximal causes of crowding concerning
interactions between features from the target and the flankers, which
we have referred to as masking, inappropriate feature integration, and source
confusion (see §3.1). Our findings, however, do not directly address whether
a lack of spatial resolution in the attention mechanism is the root cause of
visual crowding (He et al., 1996; Intriligator and Cavanagh, 2001; Leat et al.,
1999; Strasburger et al., 1991; Tripathy and Cavanagh, 2002).
Source confusion
Our data clearly support the view that source confusion between target
and flankers is a main source of crowding (Krumhansl and Thomas, 1977;
Strasburger, 2005; Strasburger et al., 1991; Wolford, 1975). In the periphery,
there was a strong correlation between a subject’s erroneous responses and parts
of the flankers that resemble target features (Kooi et al., 1994). Such correlation
was completely absent in the fovea when acuity-scaled letter targets were similarly
flanked.
We used the spatial distribution of second-order correlations present in the
noise fields to measure the spatial region within which features are extracted by
the visual system. The feature utilization zone thus determined was a graded
property. Two findings concerning the feature utilization zone are particularly
interesting. First, crowding did not systematically affect the horizontal extent of the
feature utilization zone (see Figure 3.5B). Second, relative to target size, the feature
utilization zone appeared to be larger in the fovea than in the periphery.
Combined with the results of the flanker analysis, a consistent view of feature
processing in the periphery emerged. In our experiments, flankers had the same
contrast as the target. The fact that high-contrast features from the flankers were
confused with those from the target cannot be attributed to the periphery being
more promiscuous about where to look for the relevant features. Both foveal
and peripheral mechanisms look for features outside the region of the target.
Both are equally likely to be “tricked” by the low-contrast spurious features in
the noise field. However, the fovea can distinguish a high-contrast feature
that comes from the flankers from one that is extracted from the target, whereas
the periphery cannot. In other words, both foveal and peripheral mechanisms
know where to “look” for features, but once this is accomplished, the peripheral
mechanism no longer knows where a feature came from.
If by “spatial attention” one means the control in defining the spatial region
from which to extract features, then our data are inconsistent with the view
that crowding is due to a limited spatial resolution of the attention system. On
the other hand, if by “spatial attention” one means the ability and precision to
maintain the location tag of a detected feature, then our data are consistent with
the attention hypothesis.
Masking
Consistent with other studies, our data do not support masking being the cause
of crowding (Chung et al., 2001; Levi et al., 2002; Majaj et al., 2002; Pelli et al.,
2004). Specifically, we found no evidence that crowding suppressed feature
detection. If fewer features were being detected, then there would be fewer
consistent second-order correlations in the error-trial noise fields, which was
not the case (Figures 3.5B and 3.7). Consistent with this result is our finding that
although crowding led to a decrease in the utilization of valid features, it also
led to an increase in the proportion of invalid features (Figure 3.8). A masking
account would predict that the invalid features are random from trial to trial
and, as such, would not be detected by our second-order correlation analysis and
classified as invalid features. Therefore, we should see a reduction in U but
no change in V if masking were at work.
Our results, however, do not rule out the possibility that suppression of feature
detection occurs somewhere along the visual processing hierarchy. For example, it is
possible that surround suppression leads to a degradation of feature detection in
the early visual stages. To compensate, the later stages try to infer the missing
features by an error-prone process. For this scenario to be consistent with our
data, the error-prone process of feature inference must be relatively deterministic
with respect to the stimuli; otherwise, it would not be possible to generate any
consistent second-order correlations associated with the invalid features, as we
found in our data.
Inappropriate feature integration
Our method of analysis identifies features in terms of their second-order statistics.
A feature, in the strict sense of our analysis, refers to a pair of correlated
pixels, which we can think of as the most basic form of a feature. A full analysis
of feature integration will require analysis beyond the second order. Nevertheless,
a partial analysis is possible with our current data set. Erroneous feature
integration by itself will lead to an increase in the number of invalid second-order
features. We found a decrease in feature validity during crowding, which
is consistent with this prediction (Figure 3.8, periphery, third row). However,
we also found a decrease in the number of valid features (Figure 3.8, periphery,
second row), which is not predicted by spurious feature integration per
se. To account for our result, the process of inappropriate feature integration
(Levi et al., 2002; Pelli et al., 2004) must also somehow suppress the detection of
valid features. This can be the case if the process of integration is a competitive
one, a scenario that is highly probable. For instance, the idea of association fields
(Field et al., 1993) for contour completion is an example of a competitive
feature-integration process; there are situations when the visual system can complete a
contour one way or another, but never both, although the decision is ambiguous
at a local detector level. A phenomenon known as biased competition found in
V4 and higher cortical areas (Chelazzi et al., 2001; Desimone and Duncan, 1995;
Luck et al., 1997; Moran and Desimone, 1985; Reynolds et al., 1999), where disjointed
patterns in the receptive field of a single neuron will “compete” for
the control of the neuron’s firing rate, may serve as a neural substrate for the
competitive feature-integration process. The ubiquitous divisive normalization
in the visual cortex (e.g., Carandini et al., 1997; Heeger, 1992; Legge and Foley,
1980) provides a computational basis for a competitive process.
Beyond a two-letter discrimination task
We believe that by deliberately introducing a large amount of spatial uncertainty
to the stimulus, our method of signal-clamped classification images and
the accompanying analysis procedures to infer the perceptual template and the
sub-template features (in terms of their second-order statistics) could be suitably
and practically extended to nAFC tasks by using n × N trials, where 2N is
the number of trials needed to form a classification image of sufficient quality
in a 2AFC task (N ≈ 6,000 for the “o” vs. “x” task according to the analysis
in Tjan and Nandy, 2006). In contrast, such a task would require n² × N trials
under the standard classification-image paradigm. This enormous saving in the
number of trials results from the fundamental premise of the signal-clamped
method: that under high spatial uncertainty, only the mechanism selective to
the presented target can form a clear first-order classification image in the error
trials. Contributions from all the other mechanisms that “false-alarmed” in the
error trials are spatially dispersed due to the high spatial uncertainty in the stimulus
(and/or spatial uncertainty intrinsic to the observer). Hence, in a 26-letter
identification task, a first-order characterization of the mechanism for detecting
the letter “A” among other letters can be obtained by averaging the noise fields
from all the error trials when “A” was presented (i.e., when the observer committed
a miss on the letter “A”) regardless of the observer’s response (as long
as it is not “A”). This will yield a clear first-order template for “A” with
non-significant contributions from the other mechanisms. To obtain a second-order
characterization of “A”, we would collect all the error trials when the observer
false-alarmed on “A” (responded “A” when another letter was presented) and
apply the second-order feature-map and optimal ROI analyses on the corrected
noise fields (i.e., after projecting out the average of this collection of noise fields).
For a 26-letter identification task, we realize that 26 × 6,000 is still a very large
number of trials, but it is certainly far better than 26² × 6,000. Finally, to fully
investigate letter crowding, testing 10 letters will be more than sufficient. It is
also not necessary to exclude target letters from the flanker positions. To simplify
the analysis, all that is required is that the target letter is different from the
flanking letters in a trial.
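As a rough check of the trial counts quoted above, the following sketch tabulates n × N
against n² × N for a few values of n. The function names are illustrative only, and
N = 6,000 is simply the figure cited in the text for the “o” vs. “x” task; nothing here is
taken from the original analysis code.

```python
def trials_signal_clamped(n_alternatives: int, n_per_letter: int) -> int:
    """Trial count n x N quoted for the signal-clamped method."""
    return n_alternatives * n_per_letter


def trials_standard(n_alternatives: int, n_per_letter: int) -> int:
    """Trial count n^2 x N quoted for the standard classification-image paradigm."""
    return n_alternatives ** 2 * n_per_letter


N = 6_000  # value cited in the text for the two-letter task
for n in (2, 10, 26):
    clamped = trials_signal_clamped(n, N)
    standard = trials_standard(n, N)
    print(f"n = {n:2d}: signal-clamped {clamped:>9,}  standard {standard:>9,}")
```

For n = 26 this gives 156,000 trials against 4,056,000, a 26-fold difference; for n = 10
the counts are 60,000 and 600,000.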
3.5 Summary
With respect to the three proximal causes of visual crowding that have been
hypothesized, our data strongly support the source-confusion or feature-mislocalization
hypothesis and, at the same time, argue against a front-end version
of the spatial attention account. Our data also support the inappropriate
feature-integration hypothesis but require feature integration to be a competitive
process. Our data reject the feature-masking hypothesis; we do not rule out
feature masking entirely but require that a suppressed feature-detection process
be paired with a feature-inference process that leads to a consistent set of
spurious features.
Chapter 4
Integration across Spatial Frequency Channels
4.1 Introduction
Reading and object recognition are usually associated with the clear and sharp
central vision that is afforded by the fovea. Compared to the fovea, the periphery
is far less capable of these types of form vision, even after its poor spatial
resolution has been compensated for by magnification and contrast enhancement.
For example, reading in the periphery is laboriously slow and objects
often cannot be identified in a cluttered scene. At present, we do not sufficiently
understand why peripheral form vision is qualitatively inferior to that of central
vision. This is not only a crucial issue for basic research but is also vital
for developing effective rehabilitation regimens and adaptive technologies for
patients with central vision loss. Past studies, particularly those on “crowding”
(the inability to identify objects against a cluttered background), suggest that
peripheral deficits in form vision result from inadequate selection and inappropriate
integration of simple features into complex features at a relatively early
stage of visual processing (Blake et al., 2006; He et al., 1996; Levi et al., 2002;
Nandy and Tjan, 2007; Pelli et al., 2004). However, the specific nature of these
deficits in feature selection and integration is unknown.
A conventional starting point for investigating feature processing is to measure
an observer’s spatial tuning properties when performing a form-vision
task. Chung et al. (2002) found that for both foveal and peripheral vision, the
peak spatial tuning frequency and tuning bandwidth for identifying an isolated
letter can be adequately modeled by an ideal observer with a limited spatial
resolution as described by the subject’s contrast sensitivity function (CSF) at
the test eccentricity. Chung and Tjan (2007) extended this result to letter identification
under crowding. Because the ideal-observer models in these studies
made use of all stimulus-level information within the imposed spatial resolution
limits, their results imply that feature selection along the dimension of spatial
frequency (for the purpose of letter identification) is optimal in both the fovea
and the periphery. In the current study, we ask the complementary question:
Is feature integration across spatial frequencies efficient in the fovea and in the
periphery for the identification of an isolated letter? “Feature” refers to any
aspect of a stimulus that carries information relevant to a given task. A feature
is therefore task dependent. A feature that is useful for detecting a target may
not be useful for discriminating it from a distracter. Letters, similar to most
visual forms, comprise a broad range of spatial-frequency components that
bear shape information. Efficient integration across spatial frequencies is thus a
prerequisite for efficient form vision.
Classical findings on grating detection and discrimination suggest that integration
across features is inefficient when there is a large difference in the spatial
frequency of the features (e.g., Graham et al., 1978; Quick et al., 1978). For example,
the probability of detecting a compound grating that is composed of two
simple gratings with spatial frequencies more than a factor of two (one octave)
apart is found to be approximately equal to the sum of the probabilities of
detecting each of the component gratings presented alone. This “probability
summation” is consistent with the idea that each component grating is detected by a
different narrowly tuned spatial frequency channel (Blakemore and Campbell,
1969; Campbell and Robson, 1968), which is “blind” to the other grating of a
very different spatial frequency.
Probability summation is essentially the absence of summation (integration).
In comparison, the probability of detection for an ideal observer (a statistically
optimal decision rule) is determined by the sum of the contrast energy of the
component gratings. Energy summation, which is optimal in this case, predicts
a much larger effect of summation than probability summation when the
detectabilities of the components are comparable. Moreover, when the contrast
energies of the components are sub-threshold, while their sum is supra-threshold,
probability summation predicts the compound grating to be sub-threshold,
while energy summation predicts it to be supra-threshold.
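The contrast between the two summation rules can be illustrated with a small numerical
sketch. It rests on two textbook assumptions that are not spelled out in the text:
probability summation is modeled as two independent channels, each with its own chance
to detect, and energy summation is modeled for an ideal observer whose squared
detectabilities (d′²) add for orthogonal components. The operating points (p = 0.3,
d′ = 0.8) and the function names are hypothetical, chosen only for illustration.

```python
import math


def probability_summation(p1: float, p2: float) -> float:
    """Detection probability for two independent channels (assumed model)."""
    return 1.0 - (1.0 - p1) * (1.0 - p2)


def energy_summation_dprime(d1: float, d2: float) -> float:
    """Ideal-observer d' for two orthogonal components: the d'^2 values add."""
    return math.hypot(d1, d2)


# Hypothetical operating points, for illustration only.
p_compound = probability_summation(0.3, 0.3)
d_compound = energy_summation_dprime(0.8, 0.8)
print(f"probability summation: p  = {p_compound:.2f}")
print(f"energy summation:      d' = {d_compound:.2f}")
```

With each component at d′ = 0.8, below an assumed criterion of d′ = 1, the
energy-summation prediction for the compound is about 1.13, which is above that
criterion. Under probability summation the compound detection probability (0.51) stays
slightly below the plain sum of the component probabilities (0.6).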
Let us now return to letter identification, a more complex form-vision task.
Our goal was to measure and compare how efficiently the foveal and peripheral
vision systems integrate features across spatial frequencies. We wanted to
determine if the form-vision deficits in the periphery are attributable to suboptimal
feature integration along the spatial-frequency dimension. We adapted a
typical summation paradigm. In the critical experiment, we measured contrast
thresholds for identifying band-pass filtered letters at two frequencies that are
two octaves apart (f/2 and 2f). We also measured the threshold for identifying
letters composed by adding these two frequency components together.
To preview, we found the periphery to be equally efficient at integrating
features across spatial frequencies as the fovea. More surprisingly, both fovea
and periphery exhibit optimal summation in the letter identification task, in
sharp contrast to the sub-optimal probability summation for a simple grating
detection task. The significant implications of these results for identifying
form-vision deficits in the periphery and for the understanding of form vision
processes in general will be addressed in the General discussion section.
4.2 Analysis procedure
In this section, we describe the steps needed to quantify the extent of
integration across spatial frequencies. A broad-band stimulus like the letter “a”
shown in Figure 4.1A can be decomposed into narrow-band components by
a set of band-pass filters whose center frequencies form a progression in frequency
space (Figure 4.1B). We can then choose two narrow-band components
that are sufficiently far apart in frequency space so as to have negligible overlap
in their frequency contents. Moreover, if the separation is greater than the
typical bandwidth of a spatial-frequency channel (between 1 and 2 octaves), we
can ensure that the components will be processed by different spatial frequency
channels in the visual front-end. Let the center frequencies of these components
be $f_1$ and $f_2$, respectively. We can form a composite stimulus by simply adding
the two narrow-band components (Figure 4.1C).
Let $c_f$ be the contrast threshold for identifying a narrow-band letter at center
frequency $f$ and $\mathit{CS}_f = 1/c_f$ be the corresponding contrast sensitivity. If
there is no overlap in the frequency domain between the components with center
frequencies $f_1$ and $f_2$ (i.e., the components are orthogonal, with a zero dot
product), then for an ideal observer limited by an additive Gaussian equivalent
input noise, the contrast sensitivity for the composite can be predicted from
those for the components (Appendix F):
\[ \mathit{CS}_{f_1+f_2}^{2} = \mathit{CS}_{f_1}^{2} + \mathit{CS}_{f_2}^{2} \tag{4.1} \]
[Figure 4.1 image: panels A to D. Panel C shows the components $f_1$ and $f_2$ and the composite
$f_1 + f_2$; panel D shows the three outcomes Φ = 1, Φ < 1 and Φ > 1.]
Figure 4.1: A broadband stimulus like the letter “a” (A) can be decomposed into narrow-band
components (B) by filtering the stimulus with a set of band-pass filters centered at different
spatial frequencies. (C) Two narrow-band components that are two octaves apart in frequency space
and their composite sum are used to measure the index of integration Φ (Equation 4.2) in
Experiment 4.2 (§4.5). (D) Three possible outcomes of the integration index: optimal summation (left
panel), sub-optimal summation (middle panel) and nonorthogonal summation (right panel).
To quantify a visual system’s ability to integrate information across spatial
frequencies, we define an index of integration Φ as follows:
\[ \Phi = \frac{\mathit{CS}_{f_1+f_2}^{2}}{\mathit{CS}_{f_1}^{2} + \mathit{CS}_{f_2}^{2}} \tag{4.2} \]
This index has a simple interpretation if we assume that the visual system is
like a Gaussian-noise-limited maximum a posteriori ideal observer but differs
from it by making use of only a constant fraction of the signal-to-noise ratio of
the stimulus (Legge et al., 1987; Peli, 1990; Tjan et al., 1995; see also Appendix F).
If the component stimuli are handled independently by the visual system and
if the information across frequency channels is optimally combined, then Φ will
be equal to 1. If, on the other hand, the information across the frequency channels
is not optimally combined, Φ will be less than 1.
There exists a third possibility where Φ can be greater than 1. In this case,
even though the component stimuli are far apart in frequency space, the underlying
spatial mechanisms utilized for the task are not independent. Example
scenarios include inappropriately broad channels for the orthogonal components
and performance-dominating “late” noise situated after the outputs of
the spatial mechanisms have been combined. In the first scenario, thresholds
for identifying the orthogonal components in the stimulus are elevated by the
use of inappropriately broad channels, which admit more noise from positions
or spatial frequencies where there is no signal. The integration index is greater
than 1.0 not because the composite is more efficiently identified, but because
the components are poorly identified. In the second scenario, the threshold for
the composite is lower than expected because there is only one dominant noise
source as opposed to two when the components are combined.
The three categories of the integration index are summarized as follows:
\[ \Phi \begin{cases} < 1 & \text{sub-optimal integration} \\ = 1 & \text{optimal integration} \\ > 1 & \text{non-orthogonal integration} \end{cases} \tag{4.3} \]
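Equations 4.1 to 4.3 can be turned into a short calculation. The sketch below computes Φ
from three contrast thresholds and labels the outcome; the function names, the example
thresholds, and in particular the tolerance used to decide whether Φ is "equal" to 1 are
hypothetical choices made for illustration, not part of the analysis described here (in
practice that decision would depend on the measurement error of the thresholds).

```python
import math


def integration_index(c_f1: float, c_f2: float, c_composite: float) -> float:
    """Index of integration (Equation 4.2) from three contrast thresholds."""
    cs_1 = 1.0 / c_f1            # contrast sensitivity CS = 1 / threshold
    cs_2 = 1.0 / c_f2
    cs_12 = 1.0 / c_composite
    return cs_12 ** 2 / (cs_1 ** 2 + cs_2 ** 2)


def classify(phi: float, tolerance: float = 0.1) -> str:
    """Label an index per Equation 4.3; `tolerance` is an arbitrary assumption."""
    if phi < 1.0 - tolerance:
        return "sub-optimal integration"
    if phi > 1.0 + tolerance:
        return "non-orthogonal integration"
    return "optimal integration"


# Hypothetical thresholds: both components at 10% nominal contrast.
c1 = c2 = 0.10
for c12 in (0.10 / math.sqrt(2), 0.10, 0.05):
    phi = integration_index(c1, c2, c12)
    print(f"composite threshold {c12:.4f}: phi = {phi:.2f} ({classify(phi)})")
```

With equally detectable components, optimal integration corresponds to a composite
threshold lower by a factor of √2; a composite threshold no better than either component
alone gives Φ = 0.5.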
We note that this interpretation of the integration index is contingent on the
validity of Equation 4.1, which in turn relies on the observer being a maximum
a posteriori observer with additive Gaussian equivalent input noise. The derivation
of the base case for Equation 4.1 relies on the linear proportionality between
signal contrast and the Euclidean distance between pairs of alternatives in the
internal decision space. For a 2-way discrimination task, this means that we are
assuming that the psychometric function of d′ vs. contrast is linear, which has
been shown to be the case for a contrast discrimination task when the target
contrast is above detection threshold (“supra-threshold”) (Legge et al., 1987).
The details of this derivation are provided in Appendix F. Appendix F also
shows that this base case can be generalized to other types of observer models
for which the psychometric function is not linear. In particular, we are able to
show that Equation 4.1 either holds analytically or is a good approximation for
observers with contrast-dependent noise (“multiplicative” noise) and in supra-threshold
conditions for observers with a nonlinear transducer.
We also note that insofar as Equation 4.1 is valid, the interpretations of the
cases Φ < 1 and Φ > 1 are unequivocal, as described in Equation 4.3. However,
there exist conditions such that Φ = 1 does not necessarily imply optimal
feature integration. For example, a visual system can have an integration index
of 1 or close to 1 by having nonorthogonal spatial mechanisms for the components
and a suboptimal integration mechanism to combine the results. Of
course, in this example, the nonorthogonality would have to precisely balance
out the suboptimal integration. Furthermore, if the primary goal of measuring
Φ is to compare feature integration performance across experimental conditions
(in our case, foveal vs. peripheral vision), the ambiguity in the interpretation of
Φ = 1 is less of a concern.
To empirically distinguish between the three cases of feature integration
described in Equation 4.3, the choice of the center frequencies, $f_1$ and $f_2$, of
the component stimuli is important. If there is a large difference in contrast
sensitivity between the components, then it follows from Equation 4.1 that the
sensitivity for the composite under optimal integration will be very similar to the
sensitivity for the more sensitive component, making it difficult to distinguish
between optimal and sub-optimal integration. We chose to address this issue
by first measuring an observer’s spatial tuning function (contrast sensitivity vs.
stimulus center frequency) for letter identification. Previous work (Chung et al.,
2002) has shown that such a tuning function is roughly symmetric about the
peak tuning frequency when frequency is expressed in log units. Given the tuning
function, we chose two components of roughly equal sensitivity by selecting
their center frequencies as plus and minus one octave from the peak tuning
frequency:
\[ f_{\mathrm{low}} = f_{\mathrm{peak}}/2, \qquad f_{\mathrm{high}} = f_{\mathrm{peak}} \times 2 \tag{4.4} \]
The two center frequencies are therefore two octaves apart. To ensure
orthogonality at the stimulus level, we used components with a bandwidth of
one octave. With this arrangement, we expect the components to be encoded
by different spatial frequency channels since the typical channel bandwidth
is believed to be between 1 and 2 octaves (Blakemore and Campbell, 1969;
Stromeyer and Julesz, 1972). The composite stimulus was constructed by
summing the two components.
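The reasoning behind this choice of frequencies can be checked numerically. The sketch
below applies Equation 4.4 to a hypothetical peak tuning frequency and then uses
Equation 4.1 to show how little the composite gains when one component is much more
detectable than the other. The peak frequency, the sensitivity values, and the function
names are assumptions made for illustration.

```python
import math


def component_frequencies(f_peak: float) -> tuple[float, float]:
    """Center frequencies one octave below and above the peak (Equation 4.4)."""
    return f_peak / 2.0, f_peak * 2.0


def predicted_composite_sensitivity(cs_1: float, cs_2: float) -> float:
    """Composite sensitivity under optimal integration (Equation 4.1)."""
    return math.hypot(cs_1, cs_2)


# Hypothetical peak tuning frequency, in cycles/letter.
f_low, f_high = component_frequencies(2.5)
print(f"f_low = {f_low}, f_high = {f_high}, "
      f"separation = {math.log2(f_high / f_low):.1f} octaves")

# Gain of the composite over the better component, for equal and unequal pairs.
for cs_1, cs_2 in [(10.0, 10.0), (10.0, 1.0)]:
    gain = predicted_composite_sensitivity(cs_1, cs_2) / max(cs_1, cs_2)
    print(f"CS = ({cs_1}, {cs_2}): composite / best component = {gain:.3f}")
```

For equally sensitive components the optimal composite is about 41% more sensitive
than either component, whereas for a 10:1 sensitivity ratio the predicted gain is only
about 0.5%, which would be hard to tell apart from no integration at all.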
4.3 Methods
Experiments were conducted at the fovea and at 10° in the lower right quadrant
of the visual field. Two experiments were conducted. In both experiments,
26 lowercase filtered target letters were presented in a letter identification task.
The first experiment was to measure the spatial tuning functions; the second
experiment was to measure the index of integration. In addition to the two
main experiments, for the purpose of ideal-observer analysis, we also measured
the contrast sensitivity function (CSF) with static sinusoidal gratings at both the
fovea and 10°.
Subjects
Four subjects (ASN, BW, PLB, and JS), including one of the authors, participated
in the experiments. All subjects had normal or corrected-to-normal vision and
three of the subjects (BW, PLB, and JS) were naïve to the purpose of the study.
All had (corrected) acuity of 20/20 or better in both eyes. Subjects viewed the
stimuli binocularly in a dark room with a dim night-light. Written informed
consent was obtained from the naïve subjects before the commencement of data
collection.
General procedure
All the experiments followed a block design and each experimental condition
consisted of 4 blocks with 60 trials per block. The blocks were randomly ordered
with the constraint that the nth repeat of a particular center frequency block
occurred only after all frequencies had been tested for at least n−1 blocks. This
was done in order to distribute the blocks of each condition evenly throughout
the experiment to prevent practice effects from confounding experimental
manipulations.
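One simple way to satisfy the ordering constraint described above is to shuffle the
conditions independently within each round of repeats, so that every condition completes
its (n−1)th block before any condition starts its nth. The sketch below is only an
illustration of that idea; it is not the code used in the experiments, and the function
name and seed are hypothetical.

```python
import random


def block_order(conditions, repeats=4, rng=None):
    """Return a block sequence in which conditions are shuffled within each round."""
    rng = rng or random.Random()
    order = []
    for _ in range(repeats):
        round_blocks = list(conditions)
        rng.shuffle(round_blocks)   # random order inside the round
        order.extend(round_blocks)  # a round finishes before the next begins
    return order


# Example with six center frequencies (cycles/letter) and 4 blocks per frequency.
center_frequencies = [1.25, 1.77, 2.5, 3.54, 5.0, 7.07]
sequence = block_order(center_frequencies, repeats=4, rng=random.Random(0))
print(sequence)
```

Each consecutive group of six blocks in the returned sequence is a permutation of the six
conditions, which is exactly what the constraint requires.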
For Experiments 4.1 (§4.4) and 4.2 (§4.5), a filtered letter stimulus (see
Stimuli section for details) was presented in each trial and the task of the subject
was to identify the presented letter by pressing the appropriate key. Details
of the filtering conditions are provided in the appropriate experiment sections
below. The stimuli were presented at a viewing distance of 105 cm. Depending
on the eccentricity condition, the subjects had to fixate either at a fixation mark
at the center of the screen (foveal viewing) or at a green LED 10° above and
to the left of the center of the screen (peripheral viewing). The letter contrast
was adjusted using the QUEST procedure (Watson and Pelli, 1983) as implemented
in the Psychophysics Toolbox (Brainard, 1997; Pelli, 1997) to estimate
threshold contrast for reaching an accuracy level of 50% (chance is 1/26). Following
Chung et al. (2002), we defined the contrast of a filtered letter in units of
nominal contrast with respect to an unfiltered letter. Specifically, we assign a
(nominal) contrast of x% to a filtered letter that was derived from an unfiltered
letter of x% Weber contrast and a filter with a modulation transfer function of unit
peak amplitude. Prior to the main experiment, letter acuity was measured at 10°
in the inferior right peripheral field. The subjects were asked to identify any of
the 26 lowercase letters, while the size of the presented letter was varied using
QUEST to achieve an identification accuracy of 79%.
Subject   Letter size
ASN       0.96°
BW        1.15°
PLB       1.11°
JS        1.18°
Table 4.1: Stimuli letter size (x-height) for each subject in units of degrees of
visual angle.
Stimuli
The 26 lowercase letters in Arial font (Mac OS 9) were generated separately on a
background of 256 × 256 pixels. Letter size was set to twice the subject’s peripheral
letter acuity in both foveal and peripheral viewing conditions. The letter
size (the height of a lowercase “x”) used for each of our subjects is shown in
Table 4.1. The letter images were each spatially filtered by a set of raised cosine
filters (Alexander et al., 1994; Chung et al., 2002, 2001; Peli, 1990). Each filter has
a bandwidth (full width at half height) of 1 octave and is radially symmetric in
the log-frequency domain. The transfer function of the filter at radial frequency
$f_r$ is given by
\[ G(f_r) = \frac{1}{2}\left[1.0 + \cos\left(\pi\,\frac{\log(f_r) - \log(f_{\mathrm{ctr}})}{\log(f_{\mathrm{cut}}) - \log(f_{\mathrm{ctr}})}\right)\right] \tag{4.5} \]
where $f_{\mathrm{ctr}}$ is the spatial frequency corresponding to the peak amplitude of the
filter (center frequency) and $f_{\mathrm{cut}}$ is the frequency at which the amplitude of the
filter drops to zero (cutoff frequency). Figure 4.2 depicts examples of filtered
images for the letter “a”. The stimuli were displayed at the center of a 19-in.
CRT monitor (Sony Trinitron CPD-G400) and the monitor was placed at a distance
of 105 cm from the subject. After calibration and gamma-linearization, the
monitor has 11 bits (2048 levels) of linearly spaced contrast levels. The experiments
and the monitor were controlled from a Mac G4 running OS 9.2.2. All
11 bits of the contrast levels were addressable to render the stimulus for each
trial. This was achieved by using a passive video attenuator (Pelli and Zhang,
1991) and custom-built contrast calibration and control software implemented
in MATLAB. Only the green channel of the monitor was used to present the
stimuli.
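The transfer function in Equation 4.5 can be sketched in a few lines. The version below
evaluates the gain at a single radial frequency and sets it to zero beyond the cutoff on
either side of the center in log frequency; that restriction, the function name, and the
choice $f_{\mathrm{cut}} = 2 f_{\mathrm{ctr}}$ are assumptions. The last of these follows from the equation
if the full width at half height is to be 1 octave, since the gain falls to 0.5 halfway
(in log units) between center and cutoff.

```python
import math


def raised_cosine_gain(f_r: float, f_ctr: float, f_cut: float) -> float:
    """Gain of a raised cosine filter in log frequency (after Equation 4.5)."""
    if f_r <= 0.0:
        return 0.0  # no gain at DC; log frequency is undefined there
    x = (math.log(f_r) - math.log(f_ctr)) / (math.log(f_cut) - math.log(f_ctr))
    if abs(x) > 1.0:
        return 0.0  # assumed: zero gain beyond the cutoff on either side
    return 0.5 * (1.0 + math.cos(math.pi * x))


# Example: center at 2.5 cycles/letter, cutoff assumed one octave above.
f_ctr, f_cut = 2.5, 5.0
for f in (1.25, 2.5 / math.sqrt(2), 2.5, 2.5 * math.sqrt(2), 5.0):
    print(f"f = {f:5.2f}: gain = {raised_cosine_gain(f, f_ctr, f_cut):.3f}")
```

Under these assumptions the gain is 1 at the center, 0.5 at half an octave on either side,
and 0 at one octave on either side, so two such filters centered two octaves apart do not
overlap.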
The stimuli were presented according to the following temporal design: (a)
a fixation beep immediately followed by a fixation screen for 500 ms, (b) stimulus
presentation for 250 ms, (c) subject response period (variable) with positive
feedback beep for correct trials, and (d) 500-ms delay before onset of the next
trial. On each trial, we collected the identity and contrast of the target letter,
and the response of the subject for subsequent data analysis.
Contrast sensitivity functions
For the purpose of an ideal-observer analysis, the contrast sensitivity function
(CSF) was measured using a 2IAFC task in which vertically oriented cosine
phase Gabors were presented in one of two intervals. For each trial, the task
of the subject was to identify the interval in which the grating was presented.
Gratings were presented at 0.1, 1.0, 2.0, 4.0, 8.0, and 16.0 cycles/deg for the
foveal viewing condition and at 0.5, 1.0, 2.0, 4.0, and 8.0 cycles/deg for
peripheral viewing. The subjects were tested with one spatial frequency in each block,
and there were 4 blocks of 60 trials each for each spatial frequency. The space
constant (size at 1/e) of the Gaussian envelope of the gratings was set to 4.7 deg.
A QUEST procedure was used to adjust the contrast of the gratings to achieve an
accuracy level of 79%.
[Figure 4.2 image: the letter “a” passed through raised cosine filters with
$f_{ctr}$ = 1.25, 1.77, 2.50, 3.54, 5.00, and 7.07.]
Figure 4.2: Examples of stimuli used for measuring the letter tuning function (Experiment
4.1, §4.4). The unfiltered letters are passed through a set of unit-gain raised cosine filters
(Equation 4.5) at different center frequencies in half-octave steps.
To incorporate the CSFs into a model, the data points at which the
measurements were made were fitted by a biparabolic function for the fovea and a
biphasic function (flat at low frequencies followed by parabolic roll-off) for the
periphery. Appendix D specifies the equations used for the fit.
4.4 Experiment 4.1: Measuring the letter tuning function (LTF)
The letter tuning function (Chung et al., 2002) or LTF was measured by presenting
letter stimuli filtered with a set of six unit-gain 1-octave wide raised cosine
filters (Equation 4.5) whose center frequencies ($f_{ctr}$) were logarithmically spaced
at 1.25, 1.77, 2.5, 3.54, 5.0, and 7.07 cycles/letter, respectively. Figure 4.2 shows
an example of the filtered stimuli. For each trial, the nominal contrast of the
filtered letters was adjusted using a QUEST procedure to achieve a performance
threshold of 50% correct (chance was 1/26 or 3.8%). There were 240 trials per
center frequency, broken into 4 blocks of 60 trials each.
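As a rough illustration of the filter bank described above, the following sketch evaluates the raised cosine transfer function of Equation 4.5 and generates the six center frequencies. It is not the software used in the experiments. The function names are hypothetical, and two points are assumptions: the cutoff is set to f_cut = 2 × f_ctr (which follows from requiring a 1-octave full width at half height in Equation 4.5), and the gain is taken to be zero beyond the cutoff on either side of the center frequency.

```python
import math

def raised_cosine_gain(f_r, f_ctr, bandwidth_octaves=1.0):
    """Gain of the raised cosine filter (Equation 4.5) at radial frequency f_r.

    Assumption: f_cut = f_ctr * 2**bandwidth_octaves, which yields a full width
    at half height of `bandwidth_octaves`; the gain is set to zero at and
    beyond the cutoff on both sides of f_ctr.
    """
    if f_r <= 0.0:
        return 0.0
    f_cut = f_ctr * 2.0 ** bandwidth_octaves
    x = (math.log(f_r) - math.log(f_ctr)) / (math.log(f_cut) - math.log(f_ctr))
    if abs(x) >= 1.0:
        return 0.0
    return 0.5 * (1.0 + math.cos(math.pi * x))

def center_frequencies(f_first=1.25, n_filters=6, step_octaves=0.5):
    """Center frequencies (cycles/letter) spaced in half-octave steps."""
    return [f_first * 2.0 ** (k * step_octaves) for k in range(n_filters)]

centers = center_frequencies()
rounded_centers = [round(f, 2) for f in centers]

# Gain at the center, half an octave above it, and at the cutoff of one filter.
gain_center = raised_cosine_gain(2.5, 2.5)
gain_half_octave = raised_cosine_gain(2.5 * 2.0 ** 0.5, 2.5)
gain_cutoff = raised_cosine_gain(5.0, 2.5)
```

Half-octave spacing starting at 1.25 cycles/letter gives the six center frequencies listed in the text once rounded to two decimals. The gain falls to one half at half an octave from the center, which is what a 1-octave full width at half height requires.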
Results
Figure 4.4A shows the LTF (contrast sensitivity for letter identification vs.
filter center frequency) obtained for our subjects in the fovea (blue) and in the
periphery (red). Each data point was estimated from 240 trials with QUEST.
The 95% confidence intervals were estimated using a bootstrap procedure
(Efron and Tibshirani, 1993). The six data points for each viewing condition
were well described by a parabolic function in log-log coordinates ($r^2 > 0.97$).
The equation of the parabolic function is given by

$$\log(CS_f) = \log(A) - \frac{4}{\sigma^2 \log(2)}\left(\log(f) - \log(f_{peak})\right)^2 \qquad (4.6)$$

where $CS_f$ is the contrast sensitivity at frequency $f$, $A$ is the peak sensitivity,
$f_{peak}$ is the peak tuning frequency, and $\sigma$ is the bandwidth of the function in
octaves.
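The role of the bandwidth parameter in Equation 4.6 can be checked numerically. The sketch below is illustrative only: the function name is hypothetical and the peak sensitivity is a made-up value, while the peak frequency and bandwidth are those reported for subject ASN in the fovea (Table 4.2). It confirms that the sensitivity falls to half its peak at σ/2 octaves on either side of the peak, so σ is the full width at half height.

```python
import math

def ltf_sensitivity(f, peak_sensitivity, f_peak, sigma_octaves):
    """Contrast sensitivity from the log-parabola of Equation 4.6.

    Natural logarithms are used throughout; the result does not depend on the
    base as long as the same base is used for every log in the equation.
    """
    dx = math.log(f) - math.log(f_peak)
    log_cs = math.log(peak_sensitivity) - 4.0 / (sigma_octaves ** 2 * math.log(2.0)) * dx ** 2
    return math.exp(log_cs)

A = 30.0        # hypothetical peak sensitivity, for illustration only
f_peak = 2.54   # cycles/letter (subject ASN, fovea, Table 4.2)
sigma = 1.96    # octaves (subject ASN, fovea, Table 4.2)

f_upper = f_peak * 2.0 ** (sigma / 2.0)   # sigma/2 octaves above the peak
f_lower = f_peak / 2.0 ** (sigma / 2.0)   # sigma/2 octaves below the peak

cs_peak = ltf_sensitivity(f_peak, A, f_peak, sigma)
cs_upper = ltf_sensitivity(f_upper, A, f_peak, sigma)
cs_lower = ltf_sensitivity(f_lower, A, f_peak, sigma)
```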
              f_peak (cycles/letter)     Bandwidth (octaves)
Subject       Fovea       10°            Fovea       10°
ASN           2.54        2.07           1.96        1.67
BW            2.99        2.23           2.09        1.72
PLB           2.46        2.29           2.14        1.62
JS            2.82        2.25           2.00        1.74
Table 4.2: Peak tuning frequencies (cycles/letter) and bandwidths (full-width at
half-height in octaves) of the letter tuning functions obtained in Experiment 4.1 (§4.4)
The peak tuning frequencies as determined by the fitted parabolic functions
are shown in Table 4.2 in units of cycles/letter (1 letter = 1 x-height). For the
same letter size, the average peak tuning frequency in the periphery was lower
than that in the fovea (t(6) = 3.7258, p < 0.01). The average tuning bandwidth
was about 2.05 and 1.69 octaves in the fovea and periphery, respectively
(Table 4.2). The tuning bandwidths in the periphery were slightly but
significantly narrower than those in the fovea (t(6) = 7.3294, p < 0.01). As will be
shown with the ideal-observer analysis, an LTF with a narrower bandwidth is
indicative of the observer using perceptual templates of broader bandwidths.
These results are generally consistent with those obtained in Chung et al. (2002).
By using the same letter size in both the fovea and the periphery conditions,
the current study is more sensitive to differences in spatial tuning properties
between foveal and peripheral vision.
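The text does not name the test behind the t(6) statistics. Six degrees of freedom with four subjects per condition is what an independent-samples t-test with pooled variance would give, so the sketch below assumes that test and applies it to the values in Table 4.2. The function name is hypothetical.

```python
import math

def pooled_t(a, b):
    """Two-sample t statistic with pooled variance; returns (t, degrees of freedom)."""
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((x - mean_b) ** 2 for x in b)
    df = na + nb - 2
    pooled_var = (ss_a + ss_b) / df
    t = (mean_a - mean_b) / math.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    return t, df

# Values from Table 4.2, in subject order ASN, BW, PLB, JS.
f_peak_fovea = [2.54, 2.99, 2.46, 2.82]
f_peak_periphery = [2.07, 2.23, 2.29, 2.25]
bandwidth_fovea = [1.96, 2.09, 2.14, 2.00]
bandwidth_periphery = [1.67, 1.72, 1.62, 1.74]

t_peak, df_peak = pooled_t(f_peak_fovea, f_peak_periphery)                  # t is about 3.73
t_bandwidth, df_bandwidth = pooled_t(bandwidth_fovea, bandwidth_periphery)  # t is about 7.33

mean_bandwidth_fovea = sum(bandwidth_fovea) / 4
mean_bandwidth_periphery = sum(bandwidth_periphery) / 4
```

Under this assumption the tabulated values give t statistics that agree with the reported ones to the rounding of the table entries, and the mean bandwidths match the 2.05 and 1.69 octaves quoted in the text.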
4.5 Experiment 4.2: Estimating the integration index (Φ)
For each subject in Experiment 4.1 (§4.4), we generated the stimuli for estimating
the integration index (Equation 4.2) with respect to the subject’s peak tuning
frequencies using Equation 4.4. Specifically, we generated (a) the low frequency
component by setting the center frequency of the raised cosine filters
(Equation 4.5) to one half of $f_{peak}$; (b) the high frequency component by setting the
center frequency to twice $f_{peak}$; and (c) the composite stimulus by summing the
two components at the same contrast ratio as the components would have in an
unfiltered letter (i.e., a contrast ratio of 1 in units of nominal contrast). See Figure 4.3
for an example of the stimuli for this experiment. As in Experiment 4.1 (§4.4),
we measured the nominal contrast thresholds of these three stimulus categories
by using an adaptive QUEST procedure to attain a performance level of 50%
accuracy. The integration index was then calculated according to Equation 4.2.
[Figure 4.3 image: low + high = composite.]
Figure 4.3: Examples of stimuli used for estimating the integration index (Experiment 4.2,
§4.5). The upper row shows the low and high spatial frequency components and the composite
stimuli for the letter “c”. The lower row shows the corresponding amplitude spectra. Since the
components are separated by 2 octaves in frequency space, they have non-overlapping spectra.
Results
[Figure 4.4 image: (A, B) contrast sensitivity vs. spatial frequency (cycles/letter) for
subjects ASN, BW, PLB, and JS in the fovea and periphery; (C) integration index Φ in the
fovea and periphery for each subject, with the values predicted by probability summation.]
Figure 4.4: (A) The letter tuning functions (LTFs: contrast sensitivity (1/threshold contrast)
for identifying the 26 letters at 50% correct vs. spatial frequency) obtained from four human
observers for the narrow-band letter stimuli (Experiment 4.1, §4.4) are shown for both the fovea
(blue) and the periphery (red). The dotted gray lines mark the peak tuning frequency. (B)
The observers’ contrast sensitivities for identifying the stimuli used in Experiment 4.2 (§4.5) are
shown overlaid on the corresponding letter tuning functions. The square data points represent
the sensitivity to the low (left) and high (right) spatial frequency components, respectively; the
horizontal lines represent the sensitivity to the composite stimulus. (C) Integration indices for
the fovea and the periphery are shown along with the predicted indices for probability
summation (green dots). Error bars are bootstrapped 95% confidence intervals.
Figure 4.4B shows the contrast sensitivities for the component and the
composite stimuli overlaid on the LTF curves obtained in Experiment 4.1 (§4.4). As
expected, the contrast sensitivities of the components lie on or very close to the
respective LTF curves, indicating robust test-retest reliability. The sensitivities
to the composite stimuli are significantly higher than those to the components
for all subjects and all viewing conditions.
Figure 4.4C shows the integration indices obtained from the fovea (blue)
and periphery (red). The confidence intervals were obtained by bootstrapping.
Unlike the typical finding in grating detection or discrimination, integration
across spatial frequency components two octaves apart was optimal for all four
subjects in the fovea. Even more surprisingly, peripheral integration was
optimal for subjects ASN, BW, and PLB. For subject JS, we find that Φ is significantly
greater than 1 in the periphery. This could be due to the nonorthogonality of the
underlying mechanism or perceptual templates used to encode the composite.
We will return to a more detailed discussion of this in the section on the ideal
observer analysis.
Also shown in Figure 4.4C (green symbols) are the predicted integration
indices for probability summation (see Equation G.2). This would be the case
if the visual system were to use information from the component it was most
sensitive to, as opposed to integrating information from both components.
Subjects’ performance in both foveal and peripheral viewing conditions was
significantly better than that predicted by probability summation. To further
confirm that these results are qualitatively different from those obtained with
grating detection and discrimination, we measured contrast thresholds for
subjects ASN, BW, and PLB in a compound-grating orientation discrimination task
(Appendix G). The spatial frequencies of the component gratings were the
same as the center frequencies of the component letters in Experiment 4.2 (§4.5).
Indeed, integration was suboptimal in this case and in the range predicted by
probability summation (Figure G.1). These results suggest that for a
letter-identification task, the visual system can indeed integrate information across
spatial frequency channels and do so optimally in both the fovea and periphery.
4.6 Ideal observer analysis
What is the most concise model that can quantitatively account for our
letter tuning function (LTF) and index of integration data? Chung et al. (2002)
found that for both foveal and peripheral vision and over a range of letter sizes,
the LTFs can be adequately modeled by an ideal observer with a limited
spatial resolution defined as the subject’s contrast sensitivity function (CSF) at the
test eccentricity. When it comes to accounting for both the peak tuning
frequency and tuning bandwidth, this CSF-limited ideal-observer model has no
free parameter. Chung et al. essentially argued that to a good approximation,
the visual system is optimal in selecting the spatial-frequency features for a
letter identification task, after discounting a linear limitation in spatial resolution.
Their model provided a mechanistic explanation of the observed letter channel
(Solomon and Pelli, 1994) without postulating any specialized channels.
We measured the CSF for subjects ASN, BW, and PLB at the test
eccentricities used in Experiments 4.1 (§4.4) and 4.2 (§4.5). The CSFs and the fits to an
analytic form (Appendix D) are shown in Figure 4.5. The CSF for subject JS could
not be collected because the subject was unavailable. When applied to the
composite letters used in Experiment 4.2 (§4.5), this CSF-limited ideal-observer
model predicts an integration index of 1.0, which matched the results of all of
our subjects in the foveal condition, and three of the four subjects in the
peripheral condition. The model, however, cannot account for the integration index of
1.68 of JS in the periphery. Quantitatively, this model also slightly overestimated
the tuning bandwidth for all subjects and underestimated the peak tuning
frequency for the fovea condition for subject BW. Similar quantitative deviations
were observed in Chung et al. (2002) as well as in Chung and Tjan (2007).
[Figure 4.5 image: contrast sensitivity vs. spatial frequency (cycles/letter) in the fovea
and periphery for subjects ASN, BW, and PLB.]
Figure 4.5: Observers’ contrast sensitivity functions (CSFs) for the fovea and the periphery are
shown relative to the corresponding letter tuning functions. The ranges of measurement of
the CSFs encompass the letter tuning functions. The analytical functions used to fit the CSF
(Appendix D) are used as the front-end filters in the ideal-observer models. Error bars are
bootstrapped 95% confidence intervals.
The CSF-limited ideal-observer model of Chung et al. does not include
any front-end spatial frequency channels. The presence of low-level spatial
frequency channels followed by transducer nonlinearity will impose a lower
bound on the bandwidth of the perceptual templates utilized by an observer.
This is because if, for example, everything is seen through two-octave
channels, then the basic features that comprise a perceptual template cannot be
less than two octaves wide. We can capture this effect of front-end
spatial-frequency channels by specifying the bandwidth of the templates used by the
ideal-observer model.
We simulated a family of CSF-limited ideal-observer models, each using
internal templates of a different bandwidth, ranging from 1 (same as the stimuli)
to 8 octaves. The center frequency of the templates matches the center frequency
of the signal. We also tested a model that uses unfiltered letters as templates (i.e.,
templates of infinite bandwidth). The band-limited letter templates were generated by
filtering the unfiltered letter images by raised cosine filters (Equation 4.5) at the
same center frequencies as the stimuli, but with different cutoff frequencies.
Our model therefore deviated from the true ideal observer in two respects:
(a) its spatial resolution was limited by the subject’s CSF, and (b) the perceptual
templates it used had a bandwidth that might not match that of the stimuli.
Barring these limitations, the models were optimal and made their decision by
maximizing the posterior probability among the list of candidate templates that
are known to the system. Figure 4.6 shows a schematic of a model. Appendix E
provides the detailed formulation of the ideal-observer model.
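The decision rule described above can be illustrated with a toy example. This is only a sketch under stated assumptions, not the formulation of Appendix E: the stimuli are short hypothetical coefficient vectors rather than images, the CSF is a made-up per-coefficient gain, the priors over templates are equal, and the templates are assumed to be passed through the same CSF gain before comparison. With equal priors and white Gaussian noise, maximizing the posterior probability is equivalent to choosing the template at the smallest Euclidean distance from the noisy input.

```python
import random

def map_decision(observation, templates):
    """Index of the maximum-posterior template.

    With equal priors and white Gaussian noise of standard deviation sd, the
    log posterior of template T is -||observation - T||^2 / (2 * sd^2) plus a
    constant, so the MAP choice is the template at the smallest distance.
    """
    def squared_distance(template):
        return sum((o - t) ** 2 for o, t in zip(observation, template))
    return min(range(len(templates)), key=lambda i: squared_distance(templates[i]))

def run_trial(stimulus, csf_gain, templates, noise_sd, rng):
    """Filter the stimulus by the CSF gain, add white noise, then decide."""
    filtered = [g * s for g, s in zip(csf_gain, stimulus)]
    noisy = [v + rng.gauss(0.0, noise_sd) for v in filtered]
    # Assumption: the internal templates are seen through the same CSF gain.
    filtered_templates = [[g * t for g, t in zip(csf_gain, tpl)] for tpl in templates]
    return map_decision(noisy, filtered_templates)

# Hypothetical three-"letter" alphabet described by four coefficients each.
letters = [[1.0, 0.2, 0.0, 0.1], [0.1, 1.0, 0.3, 0.0], [0.0, 0.1, 0.2, 1.0]]
csf_gain = [1.0, 0.8, 0.5, 0.2]   # made-up CSF attenuation per coefficient
rng = random.Random(0)

# Without noise every stimulus is identified as itself.
noiseless_choices = [run_trial(s, csf_gain, letters, 0.0, rng) for s in letters]
```

Replacing `letters` in the template list with broader-band versions of the same items, while keeping the stimuli narrow-band, would be the analogue of the template-limited models described in the text.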
[Figure 4.6 image: stimulus → CSF filter → + white noise → ideal observer with internal
templates T_1, T_2, T_3, ..., T_n → decision.]
Figure 4.6: Schematic of the ideal-observer models. The stimulus is first linearly filtered by
the human CSF. The result is then perturbed by white Gaussian pixel noise. This noisy and
filtered signal is then fed into an ideal observer that chooses among a set of responses (defined
by the internal templates) the one that maximizes the posterior probability.
The CSF and template limited ideal-observer models were executed to
perform the tasks that were identical to those in the human experiments
(Experiments 4.1 and 4.2), using the same stimulus sets as the corresponding human
subjects. Figure 4.7 shows the results of the model simulations overlaid on the
corresponding human data. The ideal-observer model’s tuning curves were
fitted by parabolic functions (Equation 4.6, $r^2 = 0.97$). The internal noise of
the model was adjusted such that at the 50% correct threshold criterion, the peak
amplitude of the model’s most sensitive tuning function (obtained with the
1-octave templates that matched the bandwidth of the stimuli) is the same as the
peak amplitude of the subject’s tuning function. Figure 4.8 compares the
bandwidths and peak tuning frequencies of the tuning functions for our family of
ideal-observer models to those of the human observers.
Results
As the bandwidth of the templates increases, the model’s peak tuning frequency
remains unchanged, while its tuning bandwidth decreases (Figures 4.7 and 4.8).
In general, the bandwidth of the letter tuning function (or the letter channel) is
at its widest when the bandwidth of the templates matches the bandwidth of
the stimuli. This theoretical result means that compared to a CSF-limited ideal
observer, the efficiency of a CSF and template limited observer with templates
broader than the signal decreases as the center frequency of the signal moves
away from the peak tuning frequency of the CSF-limited ideal observer. This
result makes sense because the alternative would mean that at some signal
frequency away from the peak tuning frequency, the efficiency would approach
1.0, which is impossible because the templates used by the observer are broader
than the signal by a constant number of octaves.
[Figure 4.7 image: (A) contrast sensitivity vs. spatial frequency (cycles/letter) for subjects
ASN, BW, and PLB in the fovea and periphery, showing human data and ideal-observer models
with 1-, 2-, 3-, 4-, 8-octave, and wide-band templates; (B) integration index Φ in the fovea
and periphery for each subject.]
Figure 4.7: (A) The letter tuning functions from the family of six ideal-observer models with
internal template bandwidth of 1, 2, 3, 4, 8, and infinite octaves are shown overlaid on the
corresponding human results. (B) The indices of integration obtained from the ideal-observer models
are compared with those obtained from the human observers.
We found that a CSF and template limited ideal-observer model using
templates with a bandwidth of around 2 octaves accounts quite well for the human data
in the fovea and the periphery, both in terms of matching the human letter
tuning functions (Figure 4.7A) and in reproducing the observed integration
indices (Figure 4.7B). Ideal-observer models that use internal templates beyond
3 octaves wide produce tuning functions that are significantly narrower than
the human curves (Figure 4.8A) and also lead to integration indices that are
significantly greater than 1.0. The latter is due to the fact that the templates used
for the component stimuli are no longer orthogonal. The integration indices for
these nonorthogonal channels increase monotonically with increasing degree of
overlap.
[Figure 4.8 image: (A) tuning function bandwidth (octaves) and (B) $f_{peak}$ (cycles/letter),
each plotted against template bandwidth (1, 2, 3, 4, 8, and ∞ octaves) for subjects ASN, BW,
and PLB in the fovea and periphery.]
Figure 4.8: (A) Bandwidths of the models’ letter tuning functions are plotted as a function of
template bandwidth. Letter tuning function bandwidth decreases monotonically with
increasing template bandwidth. For comparison, the bandwidths of the human letter tuning functions
are shown by the horizontal solid lines (dotted lines representing the bootstrapped 95%
confidence intervals). (B) A similar comparison is made for the peak frequencies of the letter tuning
functions. The model peak tuning frequencies are invariant to template bandwidth.
These results corroborate a two-stage processing hypothesis: a front-end
system consisting of independent feature detectors or spatial frequency channels
that are narrowly tuned (about 2 octaves) followed by a back-end system that
optimally combines the detected features across spatial frequencies.
Due to events beyond our control, subject JS was not available for the CSF
measurement, and as a result, no model simulation could be performed on his
data set. Nevertheless, we can account for JS’s peripheral integration index,
which is significantly greater than 1, from the simulation results for the other
subjects. Ideal-observer models that use 3 to 4 octave wide internal templates yield
indices similar to that of JS in the periphery. This suggests that the peripheral
visual field for subject JS may be mediated by spatial frequency channels that
have wider bandwidths as compared to the other subjects.
4.7 General discussion
Our results show that while contrast sensitivity to filtered letters was
significantly lower in the periphery than in the fovea, the periphery is as efficient at
integrating features across spatial frequencies as the fovea. That both fovea and
periphery exhibit optimal integration across spatial frequencies in a letter
identification task is surprising and is in sharp contrast to the sub-optimal
probability summation for a compound-grating detection or discrimination task. What
might account for this apparent discrepancy? There are at least two possibilities.
Letters as “natural” stimuli
First, the answer may lie in the overlearning of broadband edges and bars.
Compound gratings are not familiar objects, and it is possible that the visual
system never develops the mechanisms to optimally combine information across
the spatial frequency components in arbitrary phase alignments. In contrast,
natural (or over-learned) stimuli like letters are necessarily broadband, and
feature detection must be distributed across many narrowly tuned mechanisms
(or channels) found in the early stages of visual processing. More importantly,
unlike that of a compound grating, the narrowband components extracted from
a natural stimulus are often confined to a specific set of phase configurations
that lead to the sharp edges and contours and other common broadband
features. It is reasonable to expect the visual system to develop mechanisms that
are efficient detectors of these naturally occurring combinations of spatial
frequency and phase arrangements. Using an orientation discrimination task,
Thomas and Olzak (2001) showed the existence of an efficient summing circuit
for gratings that are phase-aligned. The role of familiarity is also hinted at by
a finding in Meinhardt (2001), showing that learning a spatial-frequency
discrimination task with gratings over a period of weeks resulted in broadening
of the spatial tuning function for grating detection (i.e., better cross-frequency
summation). However, to our knowledge, there is no study to date showing the
emergence of optimal summation following learning a fixed but arbitrary phase
alignment between the components of a compound grating.
One way to test if the efficient integration with letter stimuli depends on
naturally occurring phase alignments common to broadband shapes with sharp
edges is to randomly jitter the components in a composite letter stimulus. We
would expect suboptimal integration in the jittered condition if specific phase
alignment is required. Conversely, one can systematically measure integration
indices using gratings with relative amplitude and phase that are either
consistent with those found in natural images or are parametrically deviated from the
naturally occurring ones. To test whether familiarity with a fixed but unnatural
phase alignment plays an important role, we can train participants to identify
composite letters and gratings with components that are relatively displaced
by a fixed amount and compare the integration indices between the naturally
occurring phase alignment and the trained and untrained displacements.
Letter identification and cue integration
A second explanation for the optimal summation observed with letters is
that the sensitivity to the components, against which the sensitivity of the
composite was compared, is not limited by the front-end spatial frequency
channels but by a second-stage feature integration process. It is worth
noting that whereas optimal summation has rarely been observed in simple
grating detection and discrimination tasks, optimal cue integration is
relatively common in tasks such as 3-D depth and slant perception (Blake et al.,
1993; Ernst and Banks, 2002; Hillis et al., 2004), texture-defined pattern
detection and identification (Landy and Kojima, 2001; Meinhardt et al., 2006), and
visual search (Shimozaki et al., 2002).
A common theme among the tasks showing optimal cue integration is that
the low-level features that make up the individual cues are supra-threshold such
that the visibility of the individual cues is not limited by the sensitivity of
the front-end feature detectors. Another common theme is that perceiving the
quantity of interest (e.g., slant) using any individual cue (e.g., stereo-disparity)
requires integration across low-level features (e.g., local disparity in a random
dot stereogram). While different cues may be optimally integrated (e.g.,
disparity and texture), the low-level integration process subserving each
individual cue is often suboptimal (e.g., statistical efficiency for detecting
disparity-defined edges is below 20%; Harris and Parker, 1992; Wallace and Mamassian,
2004). This last point is akin to our findings.
Let us assume that oriented narrowband gratings (Gabors) are the features
extracted by the early stages of visual processing. To identify a narrowband
letter, these oriented Gabors must be integrated spatially. Different spatial
configurations of these Gabor features form different letters. The highest reported
detection efficiency for Gabors is about 70% (Burgess et al., 1981), while the
highest reported efficiency for identifying 2-octave band-pass filtered letters is
42% (Parish and Sperling, 1991), with the more typical values being in the
single digits (Gold et al., 1999; Pelli et al., 2006; Tjan et al., 1995). The large
difference in efficiency is indicative of the presence of additional processing between
the stages of Gabor detection and narrowband letter identification. For our
tasks, to identify letters at a single spatial frequency band, the Gabor features
must be integrated across spatial locations and orientations; to identify letters
rendered at two spatial frequency bands, feature integration must also occur
across spatial frequencies. Our results suggest that in spite of the inefficiency in
cross-location or cross-orientation integration of visual features, the process of
integration across spatial frequencies is optimal within the second-stage feature
space, irrespective of eccentricity.
What constitutes a letter channel?
Our results also elaborate the notion of a letter channel (Majaj et al., 2002;
Solomon and Pelli, 1994). The finding of optimal summation across letter
components widely separated in spatial frequencies clearly argues against treating
a letter channel as analogous to a spatial frequency channel. Chung et al. (2002)
have shown that to a good approximation the feature selection process
underlying the identification of isolated band-pass filtered letters can be explained by
the two factors that limit the performance of their CSF-limited ideal-observer
models: the letter-identity information in the spatial frequency spectra of the
stimulus, called the letter sensitivity function, or LSF, and the observer’s
contrast sensitivity function, or CSF. Chung et al. showed that an observer’s letter
tuning function (LTF) is proportional to the product of CSF and LSF, that is:
LTF(f) = k × CSF(f) × LSF(f). This simple model argues against the need to
posit an active channel selection mechanism that scales with the size of the
stimulus. This model corresponds to our ideal-observer model with 1-octave
templates. Our simulation results show that while the Chung et al. model is able to
predict human performance for cross-spatial-frequency integration, a better fit
to the human letter tuning functions can be achieved with 2-octave templates
at the same center frequency as the test stimulus. With or without modification
to the template bandwidth, the view expressed in Chung et al. (2002) against
an active scale-dependent channel selection process remains unchanged. Our
results also support the general claim of Chung et al. that the peripheral visual
system, like the foveal system, uses the appropriate set of spatial-frequency
features from the input when performing a letter identification task.
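To see how the product rule LTF(f) = k × CSF(f) × LSF(f) can shift the peak of the tuning function without any active channel selection, consider the toy case below. It is an illustration only: both the CSF and the LSF are assumed to be parabolas in log-log coordinates, all parameter values are made up, and the function names are hypothetical. Under that assumption the LTF is also a log-parabola, and its peak sits at the curvature-weighted mean of the two peak positions in log frequency.

```python
import math

def log_parabola(log_f, log_peak_value, log_f_peak, curvature):
    """Log sensitivity of a function that is parabolic in log-log coordinates."""
    return log_peak_value - curvature * (log_f - log_f_peak) ** 2

def ltf_log(log_f, k, csf_params, lsf_params):
    """log LTF(f) = log k + log CSF(f) + log LSF(f)."""
    return math.log(k) + log_parabola(log_f, *csf_params) + log_parabola(log_f, *lsf_params)

def predicted_log_peak(csf_params, lsf_params):
    """Peak of the sum of two parabolas: curvature-weighted mean of their peaks."""
    _, x_csf, c_csf = csf_params
    _, x_lsf, c_lsf = lsf_params
    return (c_csf * x_csf + c_lsf * x_lsf) / (c_csf + c_lsf)

# Hypothetical parameters: (log peak value, log peak frequency, curvature).
csf = (math.log(100.0), math.log(1.5), 0.8)   # toy CSF peaking at 1.5 cycles/letter
lsf = (math.log(1.0), math.log(3.0), 2.0)     # toy LSF peaking at 3.0 cycles/letter

log_peak = predicted_log_peak(csf, lsf)
ltf_peak_frequency = math.exp(log_peak)

# Brute-force check on a fine grid of log frequencies.
grid = [math.log(0.5) + i * 0.001 for i in range(3001)]
grid_peak = max(grid, key=lambda x: ltf_log(x, 1.0, csf, lsf))
```

In this toy setting, lowering the CSF peak pulls the LTF peak toward lower frequencies, which is the kind of passive shift the product model relies on.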
Feature integration in the periphery
If cross-spatial-frequency selection and integration are optimal in the
periphery, why is form vision so much poorer in the periphery than in the fovea?
It is likely that the cause of form-vision deficits in the periphery is not about
feature integration across spatial frequency, but across spatial locations. It is
known that spatial uncertainty is high in the periphery (Hess and Field, 1993;
Hess and McCarthy, 1994; Levi and Klein, 1996; Levi et al., 1987), and there is
emerging evidence that crowding in the periphery is due to uncertainty about
target or feature locations (Nandy and Tjan, 2007; Strasburger, 2005). This view
is consistent with the second explanation of our finding of optimal integration,
i.e., from the perspective of cue combination.
Summary
Our findings suggest that both the fovea and the periphery are equally efficient
at integrating information across spatial frequency channels. In fact, this
integration is optimal for letter stimuli. In light of the sub-optimal summation that
is the norm for compound gratings (also replicated in our study), our results
suggest the existence of a stage of feature integration that combines
information across spatial frequency channels efficiently, regardless of eccentricity. This
stage plays an integral role in the identification of complex broadband
stimuli. Taking into account the CSF and the bandwidth of the front-end spatial
frequency channels, a simple ideal-observer model explains our data.
Chapter 5
A Unified Model of Visual Crowding
5.1 Introduction
With saccadic eye movements, the human visual system continuously brings
objects of interest to the central visual field for attentive processing. Being
extensively represented in both the retina (cone density in the fovea) and the
primary visual cortex, the central visual field is ideally suited for form vision
tasks. However, the bulk of visual space, beyond the central 2°, projects onto
the peripheral retina. The processing of this peripheral information is inferior
due to retinal and cortical under-representation. Among the various deficits
in peripheral field processing, perhaps the one that could potentially be most
disruptive is the phenomenon of crowding (Korte, 1923). This is the marked
inability to recognize target objects in clutter (Figure 5.1A). Distracter objects
(flankers) that are within a critical distance from the target impair target
identification in the periphery. Crowding is typically inconsequential for normally
sighted individuals, but is detrimental for patients with central visual field loss,
since such individuals must rely on their peripheral visual fields for everyday
tasks such as reading and object recognition.
[Figure 5.1 image: (A) crowding demonstration with an isolated letter “s”, fixation marks,
and the flanked string “asn”; (B) crowding zones at 2.5°, 5°, and 10° from the fovea, with
radial and tangential axes and inward and outward extents $d_{in}$ and $d_{out}$.]
Figure 5.1: (A) Demonstration of crowding: fixating on the red “-” it should be easy to identify
the letter s on the left; the equidistant s on the right, which is flanked (crowded) by other letters,
is much harder to identify. When fixation is shifted to the green “+” the s becomes easier to
identify. (B) The extent of crowding (crowding zone, orange polygons) can be estimated by
measuring the performance threshold for target identification at peripheral locations (demarcated by *)
with flankers placed at various relative positions around the target. The estimated zones have
three robust signatures: they scale up linearly with eccentricity of the target (Bouma’s Law); they
are markedly elongated along the axis connecting the target to the fovea (radial axis); flankers
that are more eccentric than the target are more effective in crowding the target than are flankers
that are less eccentric. (A: adapted from Pelli, 2008; B: adapted from Toet and Levi, 1992)
Research in crowding addresses basic questions on form vision. There has
been a robust but unresolved debate about the neural underpinnings of
crowding (see Levi (2008) for a review) since Bouma formally described it in 1970
(Bouma, 1970). Most theories invoke some form of pre-attentive processing in
the early stages of visual processing, such as source confusion (Krumhansl and Thomas,
1977), inappropriate feature integration (Levi et al., 2002; Pelli et al., 2004), or
positional averaging (Greenwood et al., 2009), as the underlying cause. Others claim
a lack of spatial resolution in the attentional mechanism needed for feature
segmentation and binding as the primary cause (He et al., 1996). Our work with
classification images has found strong evidence for source confusion and
inappropriate feature integration (Nandy and Tjan, 2007).
The zone of crowding (the spatial extent over which flankers have a
detrimental effect on target identification) exhibits several robust characteristics
(Figure 5.1B). First, the crowding zone along the radial axis (the line connecting the
fovea to the target) scales up linearly with eccentricity and extends roughly to
half the target eccentricity (Bouma, 1970). This is often referred to as Bouma’s
scaling law. Second, flankers have an asymmetric effect on the target in that
an outward (more eccentric than the target) flanker has a greater crowding
effect than an equally spaced inward (less eccentric) flanker (Bouma, 1973;
Petrovetal., 2007). We will refer to this as the inward-outward asymmetry.
Third, the crowding zone is not round but is markedly elongated along the
radial axis (ToetandLevi, 1992). We will refer to this as the radial-tangential
anisotropy. Any viable model of crowding must reproduce these well-defined
properties of the crowding zone.
Several studies have offered explanations for some of these characteristics.
Pelli (2008) has addressed the issue of the scaling law in terms of “combining
fields” that are implemented by a fixed number of cortical neurons irrespective
of eccentricity; due to the roughly logarithmic mapping between visual and
cortical space, fixed cortical combining fields translate to eccentricity scaling
in visual space. Motter and Simoni (2007) have offered an explanation for the
inward-outward asymmetry in terms of asymmetric cortical separations of near
and far flankers that have the same angular separations from the target in visual
space. Currently, there is no satisfactory explanation for the radial-tangential
anisotropy. Moreover, no single model of crowding can simultaneously account
for all the three spatial characteristics of the crowding zone as well as provide
the neural underpinnings of crowding. In this paper, our aim is to propose
such a unified model with all but one parameter constrained by anatomical and
behavioral data from studies unrelated to crowding. We provide testable
predictions of the model, some of which have important clinical implications.
5.2 Theory
It is widely believed that the statistical properties of the visual environment play
a key role in shaping the properties of the visual cortex (Geisler, 2008). It has
been shown that the receptive field properties of simple and complex cells in V1
can be derived from the statistics of natural images (Olshausen and Field, 1996;
Karklin and Lewicki, 2009). Natural image statistics also predict human
performance in contour grouping, suggesting a fundamental role of image statistics in
shaping the weights of lateral connections in V1 (Geisler et al., 2001). However,
all these studies focus on image statistics of the external visual world that have
been veridically acquired. We propose that the image statistics acquired in the
periphery are not veridical, leading to crowding. Specifically, we hypothesize
that the mis-learning is due to a temporal overlap between spatial attention,
which modulates learning, and the saccadic eye movements elicited by the
spatial attention deployed in the periphery. Since foveal attention does not often
lead to saccades, the mis-learning, and thus crowding, is most prominent
in the periphery.
The interaction between spatial attention and saccadic eye
movements
Our theory rests on three assumptions, the first two of which are well
established. First, we assume that the acquisition of image statistics occurs
primarily at attended spatial locations via a gating mechanism (Ito and Gilbert, 1999;
Gilbert et al., 2000). Second, we assume that the physiological footprint of
spatial attention is constant in size on the primary visual cortex and independent
of eccentricity (Stettler et al., 2002). Finally, we assume that spatial attention
and any subsequent eye movement that it elicits overlap in time; i.e., the eyes
move before the spotlight of attention is fully retracted. We show that in the
primary visual cortex, where saccadic suppression is weak (García-Pérez and Peli,
2001) or nonexistent, learning image statistics under these conditions can lead
to the formation of lateral (long-range horizontal) connections that misrepresent
the true image statistics in the peripheral field, which in turn leads to the
telltale properties of crowding. We will refer to such misrepresented statistics as
saccade-confounded image statistics.
The interaction between covert shifts of spatial attention and subsequent
saccadic eye movements that bring the peripherally attended targets to the center
of gaze is schematically illustrated in Figure 5.2. The critical difference between
central and peripheral vision is that the window of probable temporal
overlap (red box) between spatial attention and the motion blur due to the saccade
(Castet and Masson, 2000; Castet et al., 2002) is present only in the periphery
and not in the fovea. This is because the saccadic eye movement is often
preceded by a shift of attention (Shepherd et al., 1986; Deubel and Schneider, 1996).
When the learning of image statistics is gated by attention, motion blur due to
the saccade will cause a bias in the co-occurrence of patterns, and consequently
the co-activation of cortical hypercolumns, along the radial direction that
connects a peripheral location to the fovea. The weights of the lateral connections
between the hypercolumns should reflect this radial bias, forming the basis of
an elongated crowding zone, with the long axis pointing toward the fovea.
[Figure 5.2, panel labels. (A) fixation; attention; image affected by eye movement; time points t0, t1, t2, t3. (B) deploying attention; saccade planning; eye movement; time points t0, t1, t2, t3; modulation of attention at the fovea and at the periphery.]
Figure 5.2: (A) Typical sequence of fixation, covert deployment of attention to a salient object
in the periphery and subsequent saccade to the attended spot. (B) Schematic of temporal
modulation of attention at the fovea and at the peripheral retinal location where covert attention was
deployed. We assume that at this covertly attended peripheral location, during the time
interval from the start of the execution of the saccade (t2) till the next fixation (t3) [red box], there is
overlap of attention and eye movement, resulting in the acquisition of image statistics during
eye movements.
5.3 Results
Geometry of lateral interactions explains Bouma’s law and the
inward-outward asymmetry
The underlying cortical architecture of our model (Figure 5.3A; see Appendix H
for details) consists of a mosaic of cortical hypercolumns. The receptive fields
(RFs) of the hypercolumns scale up linearly with eccentricity, with the diameter
of an RF equal to 0.1 times eccentricity (Motter, 2002). The initial lateral
(long-range horizontal) connections that exist between hypercolumns are assumed to
be isotropic in cortical space. We will refer to the set of hypercolumns (blue
circles in Figure 5.3A) to which a reference hypercolumn (red circle) has lateral
connections as the lateral interaction zone. The modification of the lateral
connection weights is enabled (or gated) by spatial attention (Ito and Gilbert, 1999).
We assume that the physiological footprint of spatial attention and the
underlying interaction zone of a reference hypercolumn are congruent in V1 (see
Discussion, §5.4). The attentional footprint and the interaction zone each form an
isotropic region on the cortex with a radius of 6 hypercolumns in V1 (Stettler et al., 2002),
and this is independent of the eccentricity of the reference hypercolumn. Since
the modification of the lateral connection weights is gated by spatial attention,
a geometric analysis of the footprint of spatial attention, and hence the
interaction zone, should reveal the maximum spatial extent of crowding.
[Figure 5.3, panel labels. (A) V1 hypercolumn; lateral interaction zone; receptive fields (visual space); d = 0.1 × eccentricity; fovea. (B) Interaction zones in visual space, with d_in, d_out and D_radial marked relative to the fovea. (C) Bouma’s Law: 0.5 × D_radial versus eccentricity (deg). (D) Distance (deg), d_in and d_out, versus eccentricity (deg).]
Figure 5.3: (A) A simple geometry of V1 is assumed in which cortical hypercolumns are
arranged in a hexagonal mosaic. The receptive fields of the computational elements within the
hypercolumns scale up linearly with eccentricity. Each hypercolumn is assumed to have lateral
(long-range horizontal) connections to an isotropic neighborhood (lateral interaction zone) of
hypercolumns on the cortex. The radius of the neighborhood is independent of eccentricity.
(B) The extent of the lateral interaction zone is shown in visual space for three reference
hypercolumns at eccentricities 2°, 4° and 6°. The radius of the zones is 6 hypercolumns (see Methods
and Materials). (C) Half the end-to-end distance of the interaction zones along the radial axis
(the line joining the receptive field center of the hypercolumn to the fovea) is plotted against
the eccentricity of the corresponding reference hypercolumn. The dotted line is the prediction
of Bouma’s scaling law (Fig 1B; Bouma, 1970). (D) The radial distance from the receptive field
center of a reference hypercolumn to the outer extremity (d_out) and to the inner extremity (d_in)
of the interaction zone is plotted against the eccentricity of the reference hypercolumn. That
d_out is always greater than d_in explains the inward-outward asymmetry.
Figure 5.3B shows the extent of the RFs in the interaction zone of 3
hypercolumns at 2°, 4° and 6° in the periphery. Clearly, the spatial extent scales up
with eccentricity. To quantify the result, we measure the end-to-end extent of
the RFs along the radial axis (i.e., the line joining the fovea and the RF
center of the reference hypercolumn) and this is plotted versus eccentricity in
Figure 5.3C. The coincidence with Bouma’s Law (dotted line) is simply due to the
linear scaling of the RFs with eccentricity and the cortical size of the interaction
zone being independent of eccentricity. The radius of the interaction zone that is
required to match Bouma’s Law is about 6 hypercolumns (Appendix H), which
is in good agreement with the measured extent of horizontal connections in V1
(Stettler et al., 2002).
Further, if we split up the end-to-end radial extent into two parts (the
distance from the RF center of the reference hypercolumn to the outer extremity
and to the inner extremity of the radial extent), we can clearly see the
asymmetry (Figure 5.3D). Although RFs at the outer extremity are farther away in
visual space from the RF of the reference hypercolumn than the RFs at the inner
extremity, the corresponding hypercolumns are equidistant from the reference
hypercolumn on the cortex. Consider a target feature that is placed within the
RF of the reference hypercolumn. A flanker feature at the outer extremity
(further away from the target in visual space) and a flanker feature at the inner
extremity (closer to the target) will thus have similar degrees of lateral
interaction with the target. At equal separations in visual space, an outward flanker
therefore interacts more strongly than an inward flanker, which is the
inward-outward asymmetry (Motter and Simoni, 2007).
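
The geometric argument can be illustrated with a short numerical sketch. It is not the model of Appendix H: it assumes, purely for illustration, that the RFs of neighboring hypercolumns abut along the radial axis, so that successive RF centers form a geometric progression. The function name `radial_extent` and its parameters are hypothetical.

```python
def radial_extent(eccentricity_deg, radius_hc=6, rf_scale=0.1):
    """Return (d_in, d_out) in degrees for a reference RF centred at eccentricity_deg.

    Illustrative assumption: RFs of neighbouring hypercolumns abut along the
    radial axis, so consecutive centres e_n and e_next satisfy
        e_next - e_n = (rf_scale / 2) * (e_n + e_next).
    """
    half = rf_scale / 2.0
    step = (1.0 + half) / (1.0 - half)  # ratio between consecutive RF centres
    outer = eccentricity_deg * step ** radius_hc  # centre radius_hc steps outward
    inner = eccentricity_deg / step ** radius_hc  # centre radius_hc steps inward
    return eccentricity_deg - inner, outer - eccentricity_deg


for ecc in (2.0, 4.0, 6.0):
    d_in, d_out = radial_extent(ecc)
    half_extent = (d_in + d_out) / 2.0
    print(f"{ecc:.0f} deg: d_in={d_in:.2f}  d_out={d_out:.2f}  half extent={half_extent:.2f}")
```

Under this tiling assumption both distances are proportional to eccentricity, and d_out exceeds d_in at every eccentricity. The exact constants depend on the tiling rule and need not match those of Figure 5.3.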
We have shown that the simple assumption that the zones of lateral
interaction are of a constant size on the primary visual cortex explains the properties
of scaling (Bouma’s Law) and the inward-outward asymmetry. However, this
assumption alone cannot explain the anisotropy of the crowding zone, nor does
it specify the lateral interactions that lead to crowding.
Saccade-confounded image statistics
We performed simulations of eye movements to measure pair-wise joint
statistics (mutual information; Appendix H, Equation H.11) between oriented filters
in a reference hypercolumn and oriented filters in neighboring hypercolumns
within the lateral interaction zone of the reference. We reason that such
pair-wise statistics would determine the strength of lateral interactions between V1
neurons. The simulated system makes saccades to different attended locations
in the periphery. The modulation of spatial attention (Figure 5.2B) gates the
acquisition of image statistics. The time constant of the decay of spatial
attention, λ, is the only free parameter of our model (others are set by anatomical and
behavioral data from published studies unrelated to crowding).
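
For readers unfamiliar with the statistic, the following is a minimal sketch of pair-wise mutual information between two discrete variables, using the textbook definition. Equation H.11 is not reproduced in this section, so the sketch should not be read as the exact estimator used in the simulations; the binarized filter responses and the function name are assumptions made for the example.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Mutual information (in bits) between two discrete variables.

    `pairs` is a list of (x, y) samples, e.g. the joint responses of two
    oriented filters over many exposures.
    """
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


# Hypothetical binarized responses (1 = active, 0 = silent).
coupled = [(0, 0), (1, 1)] * 50                      # filters always agree
independent = [(0, 0), (0, 1), (1, 0), (1, 1)] * 25  # filters are unrelated

print(mutual_information(coupled))      # high: one response predicts the other
print(mutual_information(independent))  # zero: responses carry no shared information
```

Filters that are frequently co-activated, as along a motion streak, yield high mutual information; statistically independent filters yield none.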
Figures 5.4D and 5.4E show the strength of pair-wise mutual information
with respect to two oriented elements in the reference hypercolumn (2° in the
periphery) as a result of exposure to 30000 simulated saccades (λ = 16 ms). The
reference elements are radially (oriented parallel to the direction of the fovea)
and tangentially (oriented perpendicular to the direction of the fovea) aligned
in Figures 5.4D and 5.4E respectively. For comparison, Figures 5.4B-C show
the corresponding mutual information strength had there been identical
exposure to the stimuli (i.e., similar attention profile) without any accompanying eye
movement (or if eye movement could be completely discounted). In effect, the
mutual information in Figures 5.4B-C reflects the veridical statistics of the visual
environment. We can see that the veridical statistics dictate a co-circular pattern
of connectivity (Sigman et al., 2001), as has been demonstrated in
psychophysical experiments (Geisler et al., 2001) and has been proposed in mathematical
models (Ben-Shahar and Zucker, 2004). In contrast, the saccade-confounded
statistics deviate from the true statistics in two major aspects: (a) there is a
preference for iso-orientation (Bosking et al., 1997; Stettler et al., 2002) and (b)
the spatial extent of the mismatch between the veridical and the confounded
statistics has a strong radial bias irrespective of the reference orientation. An
intuitive explanation for this is that the motion streak due to the saccade
co-activates similarly oriented elements in hypercolumns whose receptive fields
lie along the eye-movement trajectory towards the fovea (the radial direction).
The iso-oriented preferential pattern of connectivity in the periphery offers a
simple explanation for the recent suggestion that crowding causes the
peripheral field to be integrated into a texture field (Levi and Carney, 2009), as well as
for the averaging model of crowding (Parkes et al., 2001).
[Figure 5.4, panel labels. (A) Reference hypercolumn and neighbor hypercolumn, with radial and tangential axes relative to the fovea; pair-wise mutual information. (B), (C) Veridical statistics. (D), (E) Saccade-confounded statistics. Color bar: mutual information, low to high.]
Figure 5.4: (A) Pair-wise mutual information between an oriented filter in a reference
hypercolumn at 2° and each of the oriented filters within the interaction zone. (B)-(C) Veridical
statistics. One of the filters in the reference hypercolumn (green) is aligned along the radial axis, while
the other (blue) is aligned along the orthogonal tangential axis. The pair-wise mutual
information gathered without eye movements, thus representing the veridical statistics, is shown for
all oriented filters within the lateral interaction zone (blue circles in A) of the reference. The
color bar shows the magnitude of the mutual information. For each neighboring hypercolumn,
the oriented filter with the highest mutual information is highlighted with thick lines. (D)-(E)
Saccade-confounded statistics. Mutual information gathered under the interaction of attentional
deployment and subsequent saccades for the radially oriented and the tangentially oriented
filters respectively. While the veridical statistics implicate smooth continuation of contours, the
saccade-confounded ones favor repetition of iso-oriented fragments.
Computing the discrepancy between the saccade-confounded statistics and
the veridical image statistics reveals the extent and strength of inappropriate
feature integration that underlies crowding. Figure 5.5 shows the normalized
difference between saccade-confounded and veridical pair-wise mutual
information on feature orientation, pooled over all orientations (Appendix H,
Equation H.13), at three locations in the visual field (2° horizontal, 4° lower-right
and 6° horizontal). The blue contours mark the zones of diminished mutual
information in the saccade-confounded statistics as compared to the true
statistics. Image features from the subservient hypercolumns would thus be loosely
bound to the reference features, leading to an under-integration of features.
Conversely, the red contours depict the zones of excessive mutual information
in the saccade-confounded statistics. Hence features from these regions would
strongly influence the reference, leading to excessive and erroneous feature
integration. We can see that the zone of under-integration is in the proximal
neighborhood of the reference hypercolumn, while the zone of over-integration is in
the distal neighborhood. This suggests that the process of inappropriate
integration is two-fold: features from the target object are weakly bound while
features from the clutter surrounding the target are excessively bound. The
overall region of inappropriate integration is elongated along the radial axis and
has the shape and anisotropic extent of the psychophysically measured zone
of crowding (cf. Figure 5.1B), with an aspect ratio between 1.54 and 2.48. The
qualitative shape of the zones (radial elongation, proximal under-integration,
distal over-integration) is preserved across moderate values of the parameter λ
(λ = 4, 8 ms; Figure 5.6). However, the simulations suggest that a larger time
constant (about 16 ms as in Figure 5.5) is necessary to match the spatial extent
dictated by Bouma’s scaling law.
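
The pooled, normalized difference described above can be sketched as follows. Equation H.13 is not reproduced in this section, so pooling by summation over orientations is an assumption, as are the function name and the mutual information values, which were chosen only to show the sign convention.

```python
def normalized_mi_difference(confounded, veridical):
    """Normalized difference between pooled confounded and pooled veridical MI.

    Each argument holds one mutual information value per neighbour orientation.
    Pooling by summation is an assumption made for this sketch.
    """
    pooled_confounded = sum(confounded)
    pooled_veridical = sum(veridical)
    if pooled_veridical == 0:
        raise ValueError("veridical mutual information pools to zero")
    return (pooled_confounded - pooled_veridical) / pooled_veridical


# Hypothetical MI values for four orientations at two neighbouring hypercolumns.
proximal = normalized_mi_difference([0.10, 0.05, 0.05, 0.10], [0.20, 0.10, 0.10, 0.20])
distal = normalized_mi_difference([0.12, 0.02, 0.02, 0.02], [0.03, 0.02, 0.02, 0.02])

print(proximal)  # negative: under-integration (blue in Figure 5.5)
print(distal)    # positive: over-integration (red in Figure 5.5)
```

A negative value marks a neighbor whose features are bound more weakly than the veridical statistics warrant, and a positive value marks one whose features are bound too strongly.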
[Figure 5.5, panel labels. Color bar: normalized Δ mutual information, decrease to increase. Zones are shown in visual space relative to the fovea.]
Figure 5.5: The normalized difference between saccade-confounded and veridical pair-wise
mutual information on feature orientation between a reference hypercolumn and neighboring
hypercolumns, pooled over all orientations (cf. Figure 5.4; time constant of the decay of
spatial attention, λ = 16 ms), normalized with respect to the pooled veridical statistics, is shown in
visual space for three reference hypercolumns at 2°, 4° and 6°. The color bar shows the strength
of the deviation from the veridical statistics, indicative of inappropriate integration: shades of
red indicate that the mutual information on feature orientation between a reference
hypercolumn and an adjacent hypercolumn is higher in the saccade-confounded statistics than expected;
shades of blue indicate lower mutual information than expected. Elliptical fits (dotted lines at
40% of peak normalized difference) illustrate the elongated shape of the spatial extent of
inappropriate integration.
[Figure 5.6, panel labels. (A) λ = 16 ms. (B) λ = 8 ms. (C) λ = 4 ms. Color bar: normalized Δ mutual information, decrease to increase.]
Figure 5.6: (A) The normalized difference between saccade-confounded and veridical
pair-wise mutual information on feature orientation between a reference hypercolumn and
neighboring hypercolumns, pooled over all orientations (cf. Figure 5.4 and Equation H.13), is shown
for three reference hypercolumns at 2°, 4° and 6°. The color bar shows the strength of the
deviation from the veridical statistics, indicative of inappropriate integration: shades of red indicate
that the mutual information on feature orientation between a reference hypercolumn and an
adjacent hypercolumn is higher in the saccade-confounded statistics than expected; shades of
blue indicate lower mutual information than expected. Elliptical fits (dotted lines at 40% of
peak normalized difference) illustrate the elongated shape of the spatial extent of
inappropriate integration. The time constant of the decay of spatial attention (λ) is 16 ms. (B)-(C) The
corresponding zones are shown for λ = 8 ms and λ = 4 ms respectively.
5.4 Discussion
Starting with a simple model and a minimal set of assumptions, we have shown
that the scaling law and the asymmetry in the crowding zone are simple
consequences of the following: (a) the extent of lateral connections in V1 is isotropic
and independent of eccentricity; (b) the sizes of the receptive fields of V1
neurons increase linearly with eccentricity. The anisotropy (elliptical shape) of the
crowding zone is caused by the distorted image statistics encoded in lateral
connections between V1 hypercolumns. The distortion is due to the following:
(c) spatial attention gates the acquisition of image statistics at a retinal location;
(d) there is a temporal overlap between the duration of the spatial attention at a
retinal location and the subsequent saccade it elicits. Since saccades are always
radial with respect to the fovea, the acquired image statistics are mostly
confounded in the radial direction.
Our modeling results provide clues to the nature of the anomalous feature
integration process underlying the crowding effect that has not been studied
empirically: sub-optimal binding of target features due to proximal weakening
of connectivity, coupled with inappropriate binding of distracter features due
to distal strengthening of connectivity in the lateral interaction zone. This dual
nature of the binding process is in agreement with our previous finding with
classification images that crowding reduces the number of valid features used while
at the same time increasing the number of invalid features used by the visual system
(Nandy and Tjan, 2007). Further, the iso-oriented connectivity pattern suggests
a texture-like processing of the peripheral field (Levi and Carney, 2009), rather
than a Gestalt-like smooth contour integration process. It is important to note
that our model is parsimonious in that it has only one free parameter: the time
constant of the temporal decay of spatial attention. All other parameters of the
model are derived from empirical studies that are not about crowding; they are
not set by fitting our model to crowding data.
The spatio-temporal footprint of attention
It has been proposed that contextual effects in V1, mediated by long-range horizontal
connections, are modulated by spatial attention via a gating mechanism
(Ito and Gilbert, 1999). The gating shapes the responses to stimulus
configuration and mediates learning (Gilbert et al., 2000). Further, there is evidence that
the anatomical extent of the feedback connections from V2 to V1, which
potentially mediate top-down attention, is roughly the same as the extent of intrinsic
lateral connections within V1 (Stettler et al., 2002). Together, these findings
support the assumption in our model that the extent of spatial attention is
congruent with the extent of the lateral connections in V1.
Although attention has been a very active area of research, the temporal
dynamics of attention during saccadic eye movement, as opposed to
immediately before or after a saccade, have not been characterized. In our model we
assumed an exponential decay function and chose to parametrically explore the
effect of varying the time constant of the decay. Our simulation results show
that even moderate values of overlap between attention and saccadic eye
movement, as little as 4 ms, are able to produce the anisotropy in lateral connection
weights. Further experiments, both electrophysiological and psychophysical,
are needed to address this issue.
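
As a rough illustration of what the time constant controls, the sketch below computes the attention-weighted exposure accumulated during a saccade. It assumes that attention is at full strength at saccade onset and decays as exp(-t/λ), so that the exposure over a saccade of duration T is λ(1 - exp(-T/λ)). The 30 ms saccade duration and the function name are assumptions made for this example, not values taken from the model.

```python
import math


def attention_weighted_exposure(lam_ms, saccade_ms):
    """Integral of exp(-t / lam_ms) for t from 0 to saccade_ms, in ms.

    Assumes attention equals 1 at saccade onset and decays exponentially
    with time constant lam_ms while the eye is moving.
    """
    return lam_ms * (1.0 - math.exp(-saccade_ms / lam_ms))


SACCADE_MS = 30.0  # assumed saccade duration, for illustration only

for lam in (4.0, 8.0, 16.0):
    exposure = attention_weighted_exposure(lam, SACCADE_MS)
    print(f"lambda = {lam:4.1f} ms -> attention-weighted exposure = {exposure:.2f} ms")
```

The exposure is bounded above by λ, so a longer time constant lets more of the saccadic motion streak enter the learned statistics.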
It is possible that there could be a small but significant temporal overlap
between spatial attention and eye movement even at the fovea. For example,
this could happen if attention is divided between the fovea and the periphery.
Even in this case, the periphery will continue to exhibit the radial bias, while
the bias at the fovea will essentially be omni-directional (isotropic).
Saccadic suppression
One of the objections that could be raised against our model is that the
phenomenon of saccadic suppression would prevent the retinal motion blur from
affecting the plasticity of the early visual cortex. There is a considerable amount
of debate in the literature about the mechanisms underlying saccadic
suppression; some studies have argued for an extra-retinal suppressive
mechanism (Diamond et al., 2000) while others have argued for a visual masking
mechanism (Matin et al., 1972; Campbell and Wurtz, 1978). While both
mechanisms might contribute toward the suppression, albeit unequally (Wurtz,
2008), there is little evidence of any suppression in the early visual cortex
(García-Pérez and Peli, 2001). Instead there is growing consensus that
peri-saccadic stimuli are indeed processed by the early visual system and that these
signals are prevented from reaching awareness at a later stage in visual
processing (Watson and Krekelberg, 2009). Based on current evidence, we thus have
reason to be confident that our model is immune to the debate regarding the
exact nature of the suppressive mechanism accompanying saccadic eye
movements.
Anisotropy and the neural loci of crowding
Although area V4 has been suggested as a possible locus of crowding (Levi,
2008) due to the reported anisotropy in V4 receptive field size (Piñon et al.,
1998), there is recent evidence that a V4 receptive field represents a conver-
gence of information from a circular patch of V1 (Motter, 2009). The observed
asymmetry and anisotropy in a V4 receptive field is then completely deter-
mined by the transformation of visual space according to the cortical magni-
fication factor (CMF) of V1. As illustrated in our geometric analysis of the lat-
eral interaction zone, this anisotropy is insufficient to explain the human data.
This finding lends credence to our theory that crowding originates in V1 due
to extra-classical interactions. At the same time, our theory does not preclude
the possibility that crowding occurs at multiple levels in the visual system
(Farzin et al., 2009).
This finding also suggests that the anisotropy in the crowding zone is not
simply due to differential critical spacing on the cortex along the radial and
tangential axes (Pelli, 2008; van den Berg et al., 2010). The differential critical
spacing, which has been attributed to an anisotropy in the CMF measured in an
fMRI study (Larsson and Heeger, 2006), does not take into account the
relationship between the maps of different features (e.g., orientation, spatial frequency,
ocular dominance) and the map of visual space. For example, many studies
have shown that the local anisotropy in the retinotopic map is correlated with
the layout of ocular dominance columns in V1 (Tootell et al., 1982; Rosa et al.,
1988; Blasdel and Campbell, 2001). At the same time, Yu et al. (2005) show, with
both simulations and optical imaging, that the coordinated mapping of
multiple inter-dependent feature maps results in a locally smooth representation of
visual space that cannot be inferred from the retinotopic map alone.
Age-related macular degeneration and the preferred retinal
locus
Many patients with central vision loss due to age-related macular degeneration
(AMD) develop the use of a stable peripheral location in the retina for fixations
during form-vision tasks. This is known as the preferred retinal locus (PRL)
and is typically located just outside the central scotoma. Since the stable PRL
is used for fixations, saccadic eye movements for such patients are now radial
with respect to the PRL. If our theory about crowding is correct, and if visual
plasticity persists, then we can make two predictions regarding the crowding
zone for such patients: (a) the crowding zone measured at the PRL should no
longer be elongated since the PRL no longer experiences any radial bias in eye
movements and (b) the elongated axes of the crowding zones at other peripheral
locations should point toward the PRL (Figure 5.7). Preliminary results from
AMD patients measured with a scanning laser ophthalmoscope suggest that
the zone of crowding measured at the PRL is indeed rounded (Chung, 2009,
personal communication; Chung and Lin, 2008, ARVO).
[Figure 5.7, panel labels: pre-scotoma; 20000; 50000; 80000; post-scotoma; PRL location. Color bar: normalized Δ mutual information, decrease to increase.]
Figure 5.7: (A) The predicted zones of inappropriate integration (normalized difference
between pooled saccade-confounded and veridical statistics) are shown at two locations in a
normal visual field: 4° to the right and 4° to the lower right. (B)-(C) A central scotoma is
introduced (zone demarcated by gray circles, indicating the receptive fields affected by the scotoma)
and the retinal location at 4° to the right (starred location) is assumed to be the preferred retinal
locus (PRL), serving as the center of gaze for all fixations subsequent to the development of the
central scotoma. The number at the bottom of each panel indicates the number of simulated
saccades. (D) The post-scotoma stable statistics at the two retinal locations. The elliptical fits
(dotted lines) are at 40% of peak normalized difference. At the PRL, the zone progresses from being
anisotropic to isotropic. At the other location, the zone progresses from fovea-centric anisotropy
(elongation with the long axis pointing towards the fovea) to PRL-centric anisotropy.
Crowding zones in the upper and lower visual fields
The upper visual field in natural environments has less structure from which
an eye movement can create a visual pattern. Our model would thus predict a more
rounded crowding zone in the upper visual field as compared to the lower
field. We measured the shape of the crowding zone at 10° in the upper and
lower visual fields of normally sighted individuals (see Appendix I for details
of the experiment design). Maps of the crowding zones are shown in Figure 5.8.
Clearly, the crowding zones in the upper visual field are more rounded than
those in the lower visual field (compare the elliptical fits; cf. Table I.1), which is
in accord with our model prediction.
Other implications from the model
Since most naturally occurring human saccades have magnitudes of 15° or
less (Bahill et al., 1975), our theory would predict that the radial-tangential
anisotropy would be less pronounced for eccentricities beyond 15°. Moreover,
the crowding zone in infants should be more rounded since their visual systems
would not have had sufficient exposure to the biased statistics due to saccades.
[Figure 5.8, panel labels: ASN, MS, DL, MA; UPPER; LOWER. Color bar: proportion correct, 0.5 to 1.0.]
Figure 5.8: Maps of the crowding zone at 10° in the upper visual field (left column) and at
10° in the lower visual field (right column). The maps are interpolated from percentage correct
data measured by placing a single letter target at the starred location (white star) and a single
letter flanker at one of sixty positions around the target (black dots; see Appendix I for details).
A contour at 65% correct (magenta) and its 95% confidence interval (translucent bounds) are
superimposed on the maps. The confidence intervals were computed using a bootstrap
procedure (Efron and Tibshirani, 1993). A least squares elliptical fit (dotted black ellipse) to the
contour is also superimposed. The orange circle represents half the target eccentricity (Bouma’s
limit).
Chapter 6
Conclusions and Future Directions
In the previous chapters we first investigated form vision deficits in
peripheral vision through a series of psychophysical experiments. We concluded that
there were no perceptual template distortions in the periphery, nor was
integration across spatial frequency channels suboptimal. However, we found strong
evidence for inappropriate feature integration across spatial locations in the
periphery and concluded that this was the primary cause of perceptual errors
in crowded stimuli. We also found that crowding was associated with an
inefficient selection and usage of low-level features. We next developed a novel and
unified theory of visual crowding. The theory postulates a simple yet
overlooked origin of crowding: the extent and shape of non-classical receptive
fields in the primary visual cortex that are shaped by the interaction of spatial
attention and saccadic eye movements. We argued that this interaction causes
a misrepresentation of image statistics in the strength and extent of long-range
horizontal connections in peripheral V1. These misrepresentations, in turn,
underlie inappropriate contextual interactions that lead to crowding. Our
theory quantitatively accounts for the key properties of crowding, and it generates
new hypotheses on the shaping of non-classical receptive fields in the periphery.
Below we outline several lines of research that emerge from our current
theoretical and empirical foundation regarding the phenomenon of crowding.
Characterizing the spatiotemporal profile of attention
around a saccade target
As proposed in the crowding model (Chapter 5), image statistics are improperly
encoded in peripheral V1 due to the temporal overlap between spatial
attention in the peripheral visual field that precedes a saccadic eye movement and
the motion blur due to the subsequent saccade. Although attention has been
an active area of research, the spatiotemporal dynamics of attention around a
saccade target in the periphery have not been systematically studied. There is
anatomical evidence that the extent of feedback connections from V2 to V1 is
roughly congruent with the extent of lateral (long-range horizontal) connections
within V1 (Stettler et al., 2002). This suggests that the spatial footprint of
attention is roughly constant on the cortex and is independent of eccentricity. The
spatiotemporal profile of attention around a peripheral saccade target could be
estimated by measuring the visibility of a dim probe flashed around the
target. The probe would be presented at various locations around the target and
at various times after the onset of the cue until the completion of the saccade.
Observers would be required to make an accurate saccade to the target and
report the location of the probe. Visibility contours around the target could be
mapped by varying the contrast of the target. The 3-D probe detectability map
(2 dimensions of space, 1 dimension of time) would provide an estimate of the
spatiotemporal profile of attention.
Temporal characteristics of the cortical reorganization
process after onset of central scotoma
Patients with AMD typically use a stable location in their peripheral retina for
fixation. This location, which is often task-dependent, is usually located near
the central scotoma and is known as the preferred retinal locus (PRL). Such
patients rely on retinal locations at and near the PRL for object recognition. Our
proposed model of crowding predicts that with increasing exposure to
PRL-centric saccades (Figure 5.7): (a) the shape of the crowding zone at the PRL
should evolve from pre-scotoma anisotropy (elongated zone pointing toward
the fovea) to isotropy; and (b) at intact non-PRL locations, the crowding zone
should undergo reorganization from anisotropy pointing toward the fovea
(pre-scotoma) to anisotropy pointing toward the PRL (post-scotoma). Preliminary
studies (Chung, 2009, personal communication; Chung and Lin, 2008, ARVO)
have shown that the crowding zone measured at the PRL does not exhibit the
marked anisotropy that is a hallmark of crowding in the normal periphery
(Toet and Levi, 1992). This suggests that there indeed is a cortical
reorganization process after the onset of the central scotoma, in accord with the predictions
of our model.
However, the temporal characteristics of the reorganization are unknown
and would be difficult to assess in patients recently diagnosed with AMD,
who might not yet have developed a stable PRL. Instead, the question can be
addressed by simulating a scotoma in normally sighted individuals with a
gaze-contingent display (Geisler and Perry, 2002). The zone of crowding
can be measured prior to and immediately after exposure to the simulated
environment. The temporal dynamics of the reorganization process can be
characterized by varying the duration of exposure.
Fast techniques to assess crowding zones in patient
populations
Psychophysical methods that are used to quantify crowding and assess the
shape of the crowding zone are accurate, but tedious (Bouma, 1970; Chung,
2007; Chung et al., 2001; Pelli et al., 2004; Strasburger et al., 1991). They are
certainly not suited to assessing crowding zones in AMD patients. It might be
possible to considerably shorten the assessment time by using a frequency-tagged
EEG source imaging procedure (Appelbaum et al., 2006). Flickering targets
(oriented grating flickering at frequency $f_1$) and flankers (bow-tie annulus
flickering at frequency $f_2$) could be used to measure the presence of sum and
difference terms of the two stimulus frequencies. Such interaction terms would provide
clear support for an (inappropriate) interaction between target and flankers. By
varying the radius of the flanking annulus and its orientation with respect to the
target, the extent of inappropriate integration could be mapped by determining
the interaction terms of the two frequency components. If this technique is
successful, it would make it considerably easier to conduct crowding studies on
patient populations.
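The logic behind looking for sum and difference terms can be illustrated with a toy signal. The sketch below is not a model of EEG data or of the procedure in Appelbaum et al. (2006); the sampling rate, the tag frequencies, and the multiplicative interaction term are all hypothetical choices made for illustration. It only shows that a linear sum of two tagged inputs contains energy at the two tag frequencies alone, whereas a nonlinear interaction between them adds components at the difference and sum of the two frequencies.

```python
import numpy as np

def amplitude_spectrum(signal, fs):
    """Amplitude spectrum of a real signal sampled at fs Hz.

    Scaling is correct for bins other than DC and Nyquist, which are not used here.
    """
    n = len(signal)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    amps = 2.0 * np.abs(np.fft.rfft(signal)) / n
    return freqs, amps

def amplitude_at(freqs, amps, f):
    """Amplitude in the frequency bin closest to f."""
    return float(amps[np.argmin(np.abs(freqs - f))])

# Hypothetical values, chosen so that every frequency falls on an exact FFT bin.
fs = 1000.0          # sampling rate (Hz)
n_samples = 10_000   # 10 s of signal, i.e. 0.1 Hz resolution
f1, f2 = 3.0, 5.0    # tag frequencies of target and flanker (Hz)

t = np.arange(n_samples) / fs
target = np.sin(2 * np.pi * f1 * t)
flanker = np.sin(2 * np.pi * f2 * t)

# Independent processing: the response is a linear sum of the two inputs.
independent = target + flanker
# Interacting processing: a multiplicative term stands in for a nonlinear interaction.
interacting = target + flanker + 0.5 * target * flanker

for name, response in [("independent", independent), ("interacting", interacting)]:
    freqs, amps = amplitude_spectrum(response, fs)
    print(name,
          "difference term:", round(amplitude_at(freqs, amps, f2 - f1), 3),
          "sum term:", round(amplitude_at(freqs, amps, f1 + f2), 3))
```

In this toy case the interaction terms appear only in the second signal, which is the signature the proposed measurement would look for as the flanker geometry is varied.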
A cautionary note on the development of visual
enhancements for AMD patients
The development of visual aids and enhancements is an important area of
low-vision research that can greatly benefit patients with central vision loss.
Typically, such enhancements are initially tested for their efficacy in the peripheral
visual fields of normally sighted subjects. Any improvement in performance
in a normally sighted population is taken as having the potential to improve
visual performance in patient populations. Conversely, enhancements that fail
to improve performance in normal subjects are not chosen as candidates for
testing in patient populations. Given our model predictions regarding cortical
reorganization at and around the PRL of AMD patients, I offer a cautionary note
regarding the standard practice. If indeed there is reorganization such that the
crowding zone at the PRL is isotropic and reflects the veridical statistics (albeit
at a coarser spatial resolution depending on the eccentricity of the PRL), then it
would be prudent to develop enhancements that target this reorganized visual
apparatus. And it would be reasonable to expect that coarse image enhancements
that improve foveal performance in normal populations would benefit
AMD patients with a stable PRL.
Image segmentation in the normal periphery
Our model suggests that the incorrect representation of image statistics in the
peripheral cortex impairs not only object recognition (crowding) but also the
strongly interlinked process of segmentation. Given a thorough understanding
of the incorrect image statistics as provided by our model, it may be possible to
design image enhancements (filters) that specifically counter the inappropriate
contextual interactions in the cortex, thereby leading to better segmentation and
object identification.
References
Stromeyer, C. F., 3rd and Julesz, B. (1972). Spatial-frequency masking in vision: critical bands and spread of masking. J Opt Soc Am, 62(10):1221–32.
Stromeyer, C. F., 3rd and Klein, S. (1974). Spatial frequency channels in human vision as asymmetric (edge) mechanisms. Vision Res, 14(12):1409–1420.
Abbey, C. K. and Eckstein, M. P. (2002). Classification image analysis: estimation and statistical inference for two-alternative forced-choice experiments. JOV, 2(1):66–78.
Adolphs, R., Gosselin, F., Buchanan, T., Tranel, D., Schyns, P., and Damasio, A. R. (2005). A mechanism for impaired fear recognition after amygdala damage. Nature, 433(7021):68–72.
Ahumada, A. J. (2002). Classification image weights and internal noise level estimation. JOV, 2(1):121–31.
Ahumada, A. J. and Beard, B. L. (1999). Classification images for detection. Invest Ophthalmol Vis Sci, 40:S572.
Ahumada, A. J. and Lovell, J. (1971). Stimulus features in signal detection. J Acoust Soc Am.
Ahumada, A. J. and Marken, R. (1975). Time and frequency analyses of auditory signal detection. J Acoust Soc Am, 57(2):385–390.
Alexander, K., Xie, W., and Derlacki, D. (1994). Spatial-frequency characteristics of letter identification. J Opt Soc Am A, 11(9):2375–82.
Allman, J., Miezin, F. M., and McGuinness, E. (1985). Stimulus specific responses from beyond the classical receptive field: neurophysiological mechanisms for local-global comparisons in visual neurons. Annu Rev Neurosci, 8:407–30.
Appelbaum, L. G., Wade, A. R., Vildavski, V. Y., Pettet, M. W., and Norcia, A. M. (2006). Cue-invariant networks for figure and background processing in human visual cortex. J Neurosci, 26(45):11695–708.
Bahill, A. T., Adler, D., and Stark, L. (1975). Most naturally occurring human saccades have magnitudes of 15 degrees or less. Invest Ophthalmol Vis Sci, 14(6):468–9.
Barth, E., Beard, B. L., and Ahumada, A. J. (1999). Nonlinear features in vernier acuity. Proc. SPIE.
Beard, B. L. and Ahumada, A. J. (1999). Detection in fixed and random noise in foveal and parafoveal vision explained by template learning. J Opt Soc Am A, 16(3):755–63.
Ben-Shahar, O. and Zucker, S. (2004). Geometrical computations explain projection patterns of long-range horizontal connections in visual cortex. Neural computation, 16(3):445–76.
Berkley, M. A., Kitterle, F., and Watkins, D. W. (1975). Grating visibility as a function of orientation and retinal eccentricity. Vision Res, 15(2):239–44.
Biederman, I. (1987). Recognition-by-components: a theory of human image understanding. Psychol Rev, 94(2):115–147.
Blake, A., Bulthoff, H., and Sheinberg, D. (1993). Shape from texture: ideal observers and human psychophysics. Vision Res, 33(12):1723–37.
Blake, R., Tadin, D., Sobel, K. V., Raissian, T. A., and Chong, S. C. (2006). Strength of early visual adaptation depends on visual awareness. Proc Natl Acad Sci USA, 103(12):4783–8.
Blakemore, C. and Campbell, F. W. (1969). On the existence of neurones in the human visual system selectively sensitive to the orientation and size of retinal images. J Physiol, 203(1):237–60.
Blasdel, G. G. and Campbell, D. (2001). Functional retinotopy of monkey visual cortex. J Neurosci, 21(20):8286–301.
Bosking, W. H., Zhang, Y., Schofield, B., and Fitzpatrick, D. (1997). Orientation selectivity and the arrangement of horizontal connections in tree shrew striate cortex. J Neurosci, 17(6):2112–27.
Bouma, H. (1970). Interaction effects in parafoveal letter recognition. Nature, 226(5241):177–8.
Bouma, H. (1973). Visual interference in the parafoveal recognition of initial and final letters of words. Vision Res, 13(4):767–82.
Brainard, D. (1997). The psychophysics toolbox. Spat Vis, 10(4):433–6.
Burgess, A., Wagner, R., Jennings, R., and Barlow, H. (1981). Efficiency of human visual signal discrimination. Science, 214(4516):93–4.
Campbell, F. W. and Robson, J. (1968). Application of Fourier analysis to the visibility of gratings. J Physiol, 197(3):551–66.
Campbell, F. W. and Wurtz, R. H. (1978). Saccadic omission: why we do not see a grey-out during a saccadic eye movement. Vision Res, 18(10):1297–303.
Carandini, M., Heeger, D. J., and Movshon, J. A. (1997). Linearity and normalization in simple cells of the macaque primary visual cortex. J Neurosci, 17(21):8621–44.
Castet, E., Jeanjean, S., and Masson, G. S. (2002). Motion perception of saccade-induced retinal translation. Proc Natl Acad Sci USA, 99(23):15159–63.
Castet, E. and Masson, G. S. (2000). Motion perception during saccadic eye movements. Nat Neurosci, 3(2):177–83.
Chelazzi, L., Miller, E. K., Duncan, J., and Desimone, R. (2001). Responses of neurons in macaque area V4 during memory-guided visual search. Cereb Cortex, 11(8):761–72.
Chung, S. T. L. (2007). Learning to identify crowded letters: does it improve reading speed? Vision Res, 47(25):3150–9.
Chung, S. T. L., Legge, G. E., and Tjan, B. S. (2002). Spatial-frequency characteristics of letter identification in central and peripheral vision. Vision Res, 42(18):2137–152.
Chung, S. T. L., Levi, D. M., and Legge, G. E. (2001). Spatial-frequency and contrast properties of crowding. Vision Res, 41(14):1833–50.
Chung, S. T. L. and Tjan, B. S. (2007). Shift in spatial scale in identifying crowded letters. Vision Res, 47(4):437–51.
Curcio, C. A., Sloan, K. R., Packer, O., Hendrickson, A. E., and Kalina, R. E. (1987). Distribution of cones in human and monkey retina: individual variability and radial asymmetry. Science, 236(4801):579–82.
de Boer, E. and de Jongh, H. R. (1978). On cochlear encoding: potentialities and limitations of the reverse-correlation technique. J Acoust Soc Am, 63(1):115–135.
de Boer, R. and Kuyper, P. (1968). Triggered correlation. IEEE Trans Biomed Eng, 15(3):169–179.
Desimone, R. and Duncan, J. (1995). Neural mechanisms of selective visual attention. Annu Rev Neurosci, 18:193–222.
Deubel, H. and Schneider, W. X. (1996). Saccade target selection and object recognition: evidence for a common attentional mechanism. Vision Res, 36(12):1827–37.
Diamond, M. R., Ross, J., and Morrone, M. C. (2000). Extraretinal control of saccadic suppression. J Neurosci, 20(9):3449–55.
Duda, R. O. and Hart, P. E. (1973). Pattern classification and scene analysis.
Eckstein, M. P., Ahumada, A. J., and Watson, A. B. (1997). Visual signal detection in structured backgrounds. II. Effects of contrast gain control, background variations, and white noise. J Opt Soc Am A, 14(9):2406–2419.
Eckstein, M. P., Shimozaki, S. S., and Abbey, C. K. (2002). The footprints of visual attention in the Posner cueing paradigm revealed by classification images. JOV, 2(1):25–45.
Efron, B. and Tibshirani, R. (1993). An introduction to the bootstrap, volume 57.
Ernst, M. and Banks, M. S. (2002). Humans integrate visual and haptic information in a statistically optimal fashion. Nature, 415(6870):429–33.
Farzin, F., Rivera, S. M., and Whitney, D. (2009). Holistic crowding of Mooney faces. JOV, 9(6):18.1–15.
Field, D. J. (1987). Relations between the statistics of natural images and the response properties of cortical cells. J Opt Soc Am A, 4(12):2379–94.
Field, D. J., Hayes, A., and Hess, R. F. (1993). Contour integration by the human visual system: evidence for a local "association field". Vision Res, 33(2):173–93.
Flom, M. C. (1991). Contour interaction and the crowding effect. Problems in Optometry, 3:237–257.
Flom, M. C., Weymouth, F., and Kahneman, D. (1963). Visual resolution and contour interaction. J Opt Soc Am, 53:1026–32.
Foley, J. M. and Legge, G. E. (1981). Contrast detection and near-threshold discrimination in human vision. Vision Res, 21(7):1041–1053.
García-Pérez, M. A. and Peli, E. (2001). Intrasaccadic perception. J Neurosci, 21(18):7313–22.
Geisler, W. S. (2008). Visual perception and the statistical properties of natural scenes. Annual review of psychology, 59:167–92.
Geisler, W. S. and Perry, J. S. (2002). Real-time simulation of arbitrary visual fields. ETRA ’02: Proceedings of the 2002 symposium on Eye tracking research & applications.
Geisler, W. S., Perry, J. S., Super, B. J., and Gallogly, D. P. (2001). Edge co-occurrence in natural images predicts contour grouping performance. Vision Res, 41(6):711–24.
Gilbert, C. D., Ito, M., Kapadia, M. K., and Westheimer, G. (2000). Interactions between attention, context and learning in primary visual cortex. Vision Res, 40(10-12):1217–26.
Gold, J., Bennett, P. J., and Sekuler, A. B. (1999). Identification of band-pass filtered letters and faces by human and ideal observers. Vision Res, 39(21):3537–60.
Gold, J., Murray, R. F., Bennett, P. J., and Sekuler, A. B. (2000). Deriving behavioural receptive fields for visually completed contours. Curr Biol, 10(11):663–6.
Gosselin, F. and Schyns, P. (2003). Superstitious perceptions reveal properties of internal representations. Psychol Sci, 14(5):505–9.
Graham, N., Robson, J., and Nachmias, J. (1978). Grating summation in fovea and periphery. Vision Res, 18(7):815–25.
Green, D. M. (1964). Signal detection and recognition by human observers: Contemporary readings, chapter Psychoacoustics and detection theory, pages 58–91.
Green, D. M. and Swets, J. A. (1966). Signal detection theory and psychophysics.
Greenwood, J., Bex, P., and Dakin, S. C. (2009). Positional averaging explains crowding with letter-like stimuli. Proc Natl Acad Sci USA.
Harris, J. and Parker, A. (1992). Efficiency of stereopsis in random-dot stereograms. J Opt Soc Am A, 9(1):14–24.
He, S., Cavanagh, P., and Intriligator, J. (1996). Attentional resolution and the locus of visual awareness. Nature, 383(6598):334–7.
Heeger, D. J. (1992). Normalization of cell responses in cat striate cortex. Vis Neurosci, 9(2):181–97.
Hess, R. F. and Field, D. J. (1993). Is the increased spatial uncertainty in the normal periphery due to spatial undersampling or uncalibrated disarray? Vision Res, 33(18):2663–70.
Hess, R. F. and McCarthy, J. (1994). Topological disorder in peripheral vision. Vis Neurosci, 11(5):1033–6.
Hillis, J., Watt, S., Landy, M., and Banks, M. S. (2004). Slant from texture and disparity cues: optimal cue combination. JOV, 4(12):967–92.
Hubel, D. H. and Wiesel, T. N. (1965). Receptive fields and functional architecture in two nonstriate visual areas (18 and 19) of the cat. J Neurophysiol, 28:229–89.
Intriligator, J. and Cavanagh, P. (2001). The spatial resolution of visual attention. Cognit Psychol, 43(3):171–216.
Ito, M. and Gilbert, C. D. (1999). Attention modulates contextual influences in the primary visual cortex of alert monkeys. Neuron, 22(3):593–604.
Jones, J. P. and Palmer, L. A. (1987). The two-dimensional spatial structure of simple receptive fields in cat striate cortex. J Neurophysiol, 58(6):1187–1211.
Karklin, Y. and Lewicki, M. S. (2009). Emergence of complex cell properties by learning to generalize in natural scenes. Nature, 457(7225):83–6.
Kooi, F., Toet, A., Tripathy, S., and Levi, D. M. (1994). The effect of similarity and duration on spatial interaction in peripheral vision. Spat Vis, 8(2):255–79.
Korte, W. (1923). Über die Gestaltauffassung im indirekten Sehen. Zeitschrift für Psychologie, 93:17–82.
Krumhansl, C. L. and Thomas, E. A. (1977). Effect of level of confusability on reporting letters from briefly presented visual displays. Percept Psychophys, 21:269–279.
Landy, M. and Kojima, H. (2001). Ideal cue combination for localizing texture-defined edges. J Opt Soc Am A, 18(9):2307–20.
Larsson, J. and Heeger, D. J. (2006). Two retinotopic visual areas in human lateral occipital cortex. J Neurosci, 26(51):13128–42.
Leat, S., Li, W., and Epp, K. (1999). Crowding in central and eccentric vision: the effects of contour interaction and attention. Invest Ophthalmol Vis Sci, 40(2):504–12.
Lebedev, S., Gelder, P. V., and Tsui, W. H. (1996). Square-root relations between main saccadic parameters. Invest Ophthalmol Vis Sci, 37(13):2750–8.
Legge, G. E. and Foley, J. M. (1980). Contrast masking in human vision. J Opt Soc Am, 70(12):1458–71.
Legge, G. E., Kersten, D., and Burgess, A. (1987). Contrast discrimination in noise. J Opt Soc Am A, 4(2):391–404.
Levi, D. M. (2008). Crowding–an essential bottleneck for object recognition: a mini-review. Vision Res, 48(5):635–54.
Levi, D. M. and Carney, T. (2009). Crowding in peripheral vision: Why bigger is better. Curr Biol.
Levi, D. M., Hariharan, S., and Klein, S. (2002). Suppressive and facilitatory spatial interactions in peripheral vision: peripheral crowding is neither size invariant nor simple contrast masking. JOV, 2(2):167–77.
Levi, D. M. and Klein, S. (1986). Sampling in spatial vision. Nature, 320(6060):360–2.
Levi, D. M. and Klein, S. (1996). Limitations on position coding imposed by undersampling and univariance. Vision Res, 36(14):2111–20.
Levi, D. M., Klein, S., and Yap, Y. (1987). Positional uncertainty in peripheral and amblyopic vision. Vision Res, 27(4):581–97.
Livingstone, M. S. and Hubel, D. H. (1987). Psychophysical evidence for separate channels for the perception of form, color, movement, and depth. J Neurosci, 7(11):3416–68.
Luck, S., Chelazzi, L., Hillyard, S., and Desimone, R. (1997). Neural mechanisms of spatial selective attention in areas V1, V2, and V4 of macaque visual cortex. J Neurophysiol, 77(1):24–42.
Majaj, N. J., Pelli, D. G., Kurshan, P., and Palomares, M. C. (2002). The role of spatial frequency channels in letter identification. Vision Res, 42(9):1165–84.
Manjeshwar, R. M. and Wilson, D. L. (2001). Hyperefficient detection of targets in noisy images. J Opt Soc Am A, 18(3):507–513.
Marr, D. (1982). Vision: a computational investigation into the human representation and processing of visual information.
Matin, E., Clymer, A. B., and Matin, L. (1972). Metacontrast and saccadic suppression. Science, 178(57):179–82.
Meinhardt, G. (2001). Learning a grating discrimination task broadens human spatial frequency tuning. Biol Cybern, 84(5):383–400.
Meinhardt, G., Persike, M., Mesenholl, B., and Hagemann, C. (2006). Cue combination in a combined feature contrast detection and figure identification task. Vision Res, 46(23):3977–93.
Moran, J. and Desimone, R. (1985). Selective attention gates visual processing in the extrastriate cortex. Science, 229(4715):782–4.
Motter, B. C. (2002). Crowding and object integration within the receptive field of V4 neurons. JOV, 2(7):274–274.
Motter, B. C. (2009). Central V4 receptive fields are scaled by the V1 cortical magnification and correspond to a constant-sized sampling of the V1 surface. J Neurosci, 29(18):5749–57.
Motter, B. C. and Simoni, D. A. (2007). The roles of cortical image separation and size in active visual search performance. JOV, 7(2):6.1–15.
Movshon, J. A., Thompson, I. D., and Tolhurst, D. J. (1978). Receptive field organization of complex cells in the cat’s striate cortex. J Physiol, 283:79–99.
Murray, R. F., Bennett, P. J., and Sekuler, A. B. (2002). Optimal methods for calculating classification images: weighted sums. JOV, 2(1):79–104.
Nachmias, J. and Sansbury, R. V. (1974). Letter: Grating contrast: discrimination may be better than detection. Vision Res, 14(10):1039–1042.
Nandy, A. S. and Tjan, B. S. (2007). The nature of letter crowding as revealed by first- and second-order classification images. JOV, 7(2):5.1–26.
Nandy, A. S. and Tjan, B. S. (2008). Efficient integration across spatial frequencies for letter identification in foveal and peripheral vision. JOV, 8(13):3.1–20.
Nasanen, R. and O’Leary, C. (1998). Recognition of band-pass filtered hand-written numerals in foveal and peripheral vision. Vision Res, 38(23):3691–701.
Neri, P. (2004). Estimation of nonlinear psychophysical kernels. JOV, 4(2):82–91.
Neri, P. and Heeger, D. J. (2002). Spatiotemporal mechanisms for detecting and identifying image features in human vision. Nat Neurosci, 5(8):812–816.
Neri, P., Parker, A., and Blakemore, C. (1999). Probing the human stereoscopic system with reverse correlation. Nature, 401(6754):695–8.
Nolte, L. and Jaarsma, D. (1967). More on the detection of one of m orthogonal signals. The Journal of the Acoustical Society of America.
Olshausen, B. A. and Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583):607–9.
Osterberg, G. (1935). Topography of the layer of rods and cones in the human retina. Acta Ophthalmologica, Supplement, 6:1–103.
Papoulis, A. (1990). Probability & statistics.
Parish, D. and Sperling, G. (1991). Object spatial frequencies, retinal spatial frequencies, noise, and the efficiency of letter discrimination. Vision Res, 31(7-8):1399–415.
Parkes, L., Lund, J., Angelucci, A., Solomon, J. A., and Morgan, M. (2001). Compulsory averaging of crowded orientation signals in human vision. Nat Neurosci, 4(7):739–44.
Peli, E. (1990). Contrast in complex images. J Opt Soc Am A, 7(10):2032–40.
Pelli, D. G. (1985). Uncertainty explains many aspects of visual contrast detection and discrimination. J Opt Soc Am A, 2(9):1508–32.
Pelli, D. G. (1997). The VideoToolbox software for visual psychophysics: transforming numbers into movies. Spat Vis, 10(4):437–42.
Pelli, D. G. (2008). Crowding: a cortical constraint on object recognition. Current Opinion in Neurobiology.
Pelli, D. G., Burns, C., Farell, B., and Moore-Page, D. (2006). Feature detection and letter identification. Vision Res, 46(28):4646–74.
Pelli, D. G., Palomares, M. C., and Majaj, N. J. (2004). Crowding is unlike ordinary masking: distinguishing feature integration from detection. JOV, 4(12):1136–69.
Pelli, D. G. and Tillman, K. A. (2008). The uncrowded window of object recognition. Nat Neurosci, 11(10):1129–35.
Pelli, D. G. and Zhang, L. (1991). Accurate control of contrast on microcomputer displays. Vision Res, 31(7-8):1337–50.
Peterson, W., Birdsall, T., and Fox, W. (1954). The theory of signal detectability. Information Theory.
Petrov, Y. and McKee, S. P. (2006). The effect of spatial configuration on surround suppression of contrast sensitivity. JOV, 6(3):224–38.
Petrov, Y., Popple, A. V., and McKee, S. P. (2007). Crowding and surround suppression: not to be confused. JOV, 7(2):12.1–9.
Piñon, M. C., Gattass, R., and Sousa, A. P. (1998). Area V4 in Cebus monkey: extent and visuotopic organization. Cereb Cortex, 8(8):685–701.
Quick, R., Mullins, W., and Reichert, T. (1978). Spatial summation effects on two-component grating thresholds. J Opt Soc Am, 68(1):116–24.
Reynolds, J. H., Chelazzi, L., and Desimone, R. (1999). Competitive mechanisms subserve attention in macaque areas V2 and V4. J Neurosci, 19(5):1736–53.
Rosa, M. G., Gattass, R., and Júnior, M. F. (1988). Complete pattern of ocular dominance stripes in V1 of a New World monkey, Cebus apella. Experimental brain research. Experimentelle Hirnforschung. Expérimentation cérébrale, 72(3):645–8.
Ruderman, D. and Bialek, W. (1994). Statistics of natural images: Scaling in the woods. Phys Rev Lett, 73(6):814–817.
Rust, N. C., Schwartz, O., Movshon, J. A., and Simoncelli, E. P. (2004). Spike-triggered characterization of excitatory and suppressive stimulus dimensions in monkey V1. Neurocomputing, 58-60:793–799.
Rust, N. C., Schwartz, O., Movshon, J. A., and Simoncelli, E. P. (2005). Spatiotemporal elements of macaque V1 receptive fields. Neuron, 46(6):945–56.
Saleem, K. S., Tanaka, K., and Rockland, K. S. (1993). Specific and columnar projection from area TEO to TE in the macaque inferotemporal cortex. Cereb Cortex, 3(5):454–464.
Shepherd, M., Findlay, J. M., and Hockey, R. J. (1986). The relationship between eye movements and spatial attention. The Quart. J. of Expt. Psych., 38(3):475–91.
Shimozaki, S. S., Eckstein, M. P., and Abbey, C. K. (2002). Stimulus information contaminates summation tests of independent neural representations of features. JOV, 2(5):354–70.
Shimozaki, S. S., Eckstein, M. P., and Abbey, C. K. (2005). Spatial profiles of local and nonlocal effects upon contrast detection/discrimination from classification images. JOV, 5(1):45–57.
Sigman, M., Cecchi, G. A., Gilbert, C. D., and Magnasco, M. O. (2001). On a common circle: natural scenes and Gestalt rules. Proc Natl Acad Sci USA, 98(4):1935–40.
Solomon, J. A. (2002). Noise reveals visual mechanisms of detection and discrimination. JOV, 2(1):105–20.
Solomon, J. A. and Pelli, D. G. (1994). The visual filter mediating letter identification. Nature, 369(6479):395–7.
Stettler, D. D., Das, A., Bennett, J., and Gilbert, C. D. (2002). Lateral connectivity and contextual interactions in macaque primary visual cortex. Neuron, 36(4):739–50.
Steveninck, R. and Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: coding and .... Proc R Soc Lond, B, Biol Sci.
Strasburger, H. (2005). Unfocused spatial attention underlies the crowding effect in indirect form vision. JOV, 5(11):1024–37.
Strasburger, H., Harvey, L., and Rentschler, I. (1991). Contrast thresholds for identification of numeric characters in direct and eccentric view. Percept Psychophys, 49(6):495–508.
Stuart, J. and Burian, H. (1962). A study of separation difficulty. Its relationship to visual acuity in normal and amblyopic eyes. Am J Ophthalmol, 53:471–7.
Tanner, W. (1961). Physiological implications of psychophysical data. Ann N Y Acad Sci, 89:752–765.
Tanner, W. and Swets, J. A. (1954). The human use of information–I: Signal detection for the case of the signal known exactly. Information Theory.
Thomas, J. (1985). Effect of static-noise and grating masks on detection and identification of grating targets. J Opt Soc Am A, 2(9):1586–92.
Thomas, J. and Knoblauch, K. (2005). Frequency and phase contributions to the detection of temporal luminance modulation. J Opt Soc Am A, 22(10):2257–61.
Thomas, J. and Olzak, L. (2001). Spatial phase sensitivity of mechanisms mediating discrimination of small orientation differences. J Opt Soc Am A, 18(9):2197–203.
Tjan, B. S. (2002). The handbook of brain theory and neural networks, chapter Object recognition.
Tjan, B. S., Braje, W. L., Legge, G. E., and Kersten, D. (1995). Human efficiency for recognizing 3-D objects in luminance noise. Vision Res, 35(21):3053–69.
Tjan, B. S. and Legge, G. E. (1998). The viewpoint complexity of an object-recognition task. Vision Res, 38(15-16):2335–2350.
Tjan, B. S. and Nandy, A. S. (2006). Classification images with uncertainty. JOV, 6(4):387–413.
Toet, A. and Levi, D. M. (1992). The two-dimensional shape of spatial interaction zones in the parafovea. Vision Res, 32(7):1349–57.
Tootell, R. B. H., Silverman, M. S., Switkes, E., and Valois, R. L. D. (1982). Deoxyglucose analysis of retinotopic organization in primate striate cortex. Science, 218(4575):902–4.
Tootell, R. B. H., Switkes, E., Silverman, M. S., and Hamilton, S. L. (1988). Functional anatomy of macaque striate cortex. II. Retinotopic organization. J Neurosci, 8(5):1531–68.
Touryan, J., Lau, B., and Dan, Y. (2002). Isolation of relevant visual features from random stimuli for cortical complex cells. J Neurosci, 22(24):10811–10818.
Townsend, J., Taylor, S., and Brown, D. (1971). Lateral masking for letters with unlimited viewing time. Percept Psychophys.
Treisman, A. and Schmidt, H. (1982). Illusory conjunctions in the perception of objects. Cognit Psychol, 14(1):107–41.
Tripathy, S. and Cavanagh, P. (2002). The extent of crowding in peripheral vision does not scale with target size. Vision Res, 42(20):2357–69.
Tyler, C. W. and Chen, C. C. (2000). Signal detection theory in the 2AFC paradigm: attention, channel uncertainty and probability summation. Vision Res, 40(22):3121–3144.
van den Berg, R., Roerdink, J. B. T. M., and Cornelissen, F. W. (2010). A neurophysiologically plausible population code model for feature integration explains visual crowding. PLoS Comput Biol, 6(1):e1000646.
Verghese, P. and McKee, S. P. (2002). Predicting future motion. JOV, 2(5):413–423.
Wallace, J. and Mamassian, P. (2004). The efficiency of depth discrimination for non-transparent and transparent stereoscopic surfaces. Vision Res, 44(19):2253–67.
Watson, A. B. and Pelli, D. G. (1983). Quest: a Bayesian adaptive psychometric method. Percept Psychophys, 33(2):113–20.
Watson, T. and Krekelberg, B. (2009). The relationship between saccadic suppression and perceptual stability. Curr Biol.
Wilson, J. R. and Sherman, S. M. (1976). Receptive-field characteristics of neurons in cat striate cortex: Changes with visual field eccentricity. J Neurophysiol, 39(3):512–33.
Wolford, G. (1975). Perturbation model for letter identification. Psychol Rev, 82(3):184–99.
Wurtz, R. H. (2008). Neuronal mechanisms of visual stability. Vision Res, 48(20):2070–89.
Yu, H., Farley, B. J., Jin, D. Z., and Sur, M. (2005). The coordinated mapping of visual space and response features in visual cortex. Neuron, 47(2):267–80.
Appendix A
Signal Clamping
We first consider an ideal observer for identifying known patterns in additive
Gaussian noise (Tjan et al., 1995; Tjan and Legge, 1998). An ideal observer is
a theoretically optimal decision mechanism for a given task and its stimuli.
Strictly speaking, an ideal observer is not a model of any actual observer. Its
formulation is completely determined by the given task and its stimuli. An ideal
observer establishes the upper bound of the level of performance achievable by
any observer, biological or otherwise, and often provides a good starting point
for modeling human observers.
A typical task used in a classification-image experiment is to discriminate
between two patterns embedded in additive Gaussian white noise. A detection
task is a special case of this, where one of the patterns is a blank (noise only)
display. For each of the two patterns, there may be one or more instances.
Consider for example a task to identify whether the noisy stimulus contains the letter “O”
or “X”. A single-instance version of this task is one where there is only one
version of “X” and one version of “O”. For a single-instance task, the signal for
each response is known exactly, and there is no stimulus uncertainty. Stimulus
uncertainty is introduced when different image patterns are to be associated
with the same response. For example, in a multiple-instance version of the task,
the letters may appear in different fonts, sizes, or positions.
Let $T_{r,j}$ be the $j$th version of a noise-free contrast pattern with a response label
$r$. (Unless the context suggests otherwise, we generally present a 2-D pattern as
a column vector by concatenating all columns of an image into a single column.)
Let $N_\sigma$ be a sample of Gaussian white noise (a multivariate normal distribution
of zero mean and diagonal covariance $\sigma^2 \mathbf{I}$, with $\mathbf{I}$ the identity matrix).
A noisy stimulus with a signal contrast of $c$ is

$$I = cS + N_\sigma, \quad S \in \{T_{r,j}\} \tag{A.1}$$
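Equation A.1 can be made concrete with a short sketch. The function name, the 3x3 template, and the parameter values below are hypothetical and serve only to illustrate the construction: the template is vectorized column by column, scaled by the contrast, and added to independent zero-mean Gaussian noise of standard deviation σ.

```python
import numpy as np

def make_noisy_stimulus(template, contrast, sigma, rng):
    """Return I = c*S + N_sigma for a 2-D template S, as a 1-D vector."""
    # order="F" stacks the image columns one after another into a single vector.
    s = np.asarray(template, dtype=float).flatten(order="F")
    # Independent zero-mean Gaussian noise with standard deviation sigma per pixel.
    noise = rng.normal(loc=0.0, scale=sigma, size=s.shape)
    return contrast * s + noise

rng = np.random.default_rng(seed=0)
# Hypothetical 3x3 contrast pattern standing in for the letter "X".
template_X = np.array([[1.0, 0.0, 1.0],
                       [0.0, 1.0, 0.0],
                       [1.0, 0.0, 1.0]])
stimulus = make_noisy_stimulus(template_X, contrast=0.5, sigma=0.1, rng=rng)
```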
The general form of the ideal observer for identifying the embedded pattern
in $I$ with maximum accuracy is to select the response label $r$ that maximizes
the posterior probability (Duda and Hart, 1973; Green and Swets, 1966;
Peterson et al., 1954). That is,

$$
\begin{aligned}
r &= \arg\max_{r} P(r \mid I) \\
  &= \arg\max_{r} \sum_{j} P(r, j \mid I)
\end{aligned}
\tag{A.2}
$$

The summation over $j$ (marginalization) in the second expression follows
strictly from probability theory because the occurrence of the different versions
of a pattern is mutually exclusive in a single presentation.
Assuming that all patterns are equally likely to occur, by applying Bayes'
theorem and the probability density function (p.d.f.) of a normal distribution
and by collecting into a constant the terms that do not vary with either $r$ or $j$,
we have:

$$
\begin{aligned}
P(r \mid I) &= \sum_{j} P(r, j \mid I) \\
&= \sum_{j=1}^{M} \frac{P(I \mid T_{r,j})\, P(T_{r,j})}{P(I)} \\
&= k \sum_{j=1}^{M} P(I \mid T_{r,j}) \\
&= k' \sum_{j=1}^{M} \exp\!\left( -\frac{\lVert I - cT_{r,j} \rVert^{2}}{2\sigma^{2}} \right) \\
&= k'' \sum_{j=1}^{M} \exp\!\left( \frac{2 I^{T} c T_{r,j} - c^{2} T_{r,j}^{T} T_{r,j}}{2\sigma^{2}} \right)
\end{aligned}
\tag{A.3}
$$

where $M$ is the number of distinct patterns with the same response label, $k$, $k'$, and $k''$
are constants, and the superscript $T$ denotes matrix transpose. We note that $I^{T} I$
does not vary with either $r$ or $j$ and has therefore been treated as a constant.
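The step between the last two expressions of Equation A.3 can be spelled out by expanding the squared norm. The following is a worked restatement of that step, using only the symbols already defined:

```latex
\begin{align*}
\lVert I - cT_{r,j} \rVert^{2}
  &= (I - cT_{r,j})^{T}(I - cT_{r,j}) \\
  &= I^{T}I - 2c\,I^{T}T_{r,j} + c^{2}\,T_{r,j}^{T}T_{r,j}, \\[4pt]
\exp\!\left(-\frac{\lVert I - cT_{r,j} \rVert^{2}}{2\sigma^{2}}\right)
  &= \exp\!\left(-\frac{I^{T}I}{2\sigma^{2}}\right)
     \exp\!\left(\frac{2I^{T}cT_{r,j} - c^{2}T_{r,j}^{T}T_{r,j}}{2\sigma^{2}}\right).
\end{align*}
```

The first factor on the right-hand side is the same for every $r$ and $j$, so it can be moved outside the sum and absorbed into the constant $k''$.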
Equation A.3 provides us with the optimal decision rule for pattern identi-
fication with or without stimulus uncertainty. The optimal decision rule is to
choosetheresponser thatmaximizesaunivariatedecisionvariableλ(r):
$$\lambda(r) = \sum_{j=1}^{M} \exp\left(\frac{2I^T cT_{r,j} - c^2 T_{r,j}^T T_{r,j}}{2\sigma^2}\right) \tag{A.4}$$
An appendix in Tjan and Legge (1998) provides a computationally efficient
way of implementing this decision mechanism when the stimulus uncertainty
($M$) is large (in the tens of thousands).
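For small $M$, the decision variable of Equation A.4 can be evaluated directly. The sketch below is a naive implementation under our own assumptions, not the efficient method of Tjan and Legge (1998); the names `log_lambda` and `ideal_response` are hypothetical. It works with $\log \lambda(r)$, which preserves the rank order of the responses, and uses the usual log-sum-exp rearrangement to avoid overflow when $\sigma$ is small.

```python
import numpy as np

def log_lambda(stimulus, templates, contrast, sigma):
    """Log of the decision variable in Equation A.4 for one response label.

    templates: array of shape (M, n_pixels), one row per T_{r,j}.
    """
    templates = np.asarray(templates, dtype=float)
    match = templates @ stimulus                    # I^T T_{r,j} for every j
    energy = np.sum(templates ** 2, axis=1)         # T_{r,j}^T T_{r,j}
    exponents = (2.0 * contrast * match - contrast ** 2 * energy) / (2.0 * sigma ** 2)
    peak = exponents.max()
    # log(sum(exp(e))) computed relative to the largest exponent for stability.
    return peak + np.log(np.sum(np.exp(exponents - peak)))

def ideal_response(stimulus, templates_by_label, contrast, sigma):
    """Return the label r with the largest lambda(r)."""
    scores = {label: log_lambda(stimulus, bank, contrast, sigma)
              for label, bank in templates_by_label.items()}
    return max(scores, key=scores.get)
```

A call such as `ideal_response(I, {"O": o_bank, "X": x_bank}, c, sigma)` returns the chosen label, where each bank holds the $M$ templates for that response as rows.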
Two special cases of the optimal decision rule are noteworthy. For a task
where all signal patterns have the same contrast energy, the dot product
$T_{r,j}^T T_{r,j}$ is a positive constant and can be removed from the decision rule:

$$\lambda(r) = \sum_{j=1}^{M} \exp\left(\frac{I^T cT_{r,j}}{\sigma^2}\right) \tag{A.5}$$
For a task with equal-energy signals and no stimulus uncertainty ($M = 1$),
the optimal decision rule can be further reduced to that of a linear correlator
by taking advantage of the fact that the exponential function is monotonically
increasing and by removing all constant terms:

$$\lambda(r) = I^T T_r \tag{A.6}$$
What we have shown is that the popular linear observer model (Equation A.6),
which makes a decision by linearly correlating the input with a template,
is the optimal decision mechanism when there is no stimulus uncertainty,
the signals have equal energy, and the stimulus noise is white, a well-known
result that is worth reiterating. To maintain optimality under these conditions,
it is not necessary to know either the signal contrast ($c$) or the noise variance
($\sigma^2$). The assumption of a linear observer is the single most important
assumption for the classification-image method (Ahumada, 2002; Murray et al., 2002).
Also evident from our derivation is the reason why uncertainty presents a
significant challenge to the classification-image method: there are no apparent
means of approximating the optimal decision variable of Equation A.5 by something
similar to Equation A.6.
The uncertainty model
When stimulus uncertainty is due to the task (multiple input patterns are to
be associated with the same response per task requirement), such uncertainty is
often referred to as “extrinsic uncertainty” because it is external to an observer.
With extrinsic uncertainty, Equation A.4 (or Equation A.5, if the equal-contrast-energy
condition is met) is the optimal decision rule. Extrinsic uncertainty is contrasted with
“intrinsic uncertainty”, which refers to the uncertainty assumed by the observer.
For example, in a letter identification task where there is only one instance of
“X” and one instance of “O”, observers may still insist on considering different
versions of the letters during a trial, either because they lack the precision
for encoding certain attributes of the instances (e.g., the exact stimulus size or
position) or because they are misinformed about the task. With intrinsic uncertainty,
Equation A.4 or A.5 becomes an ideal-observer model of the observer.
When $M$ in Equation A.4 or A.5 is greater than 1, the decision rule is
suboptimal for the task, which has no uncertainty, but it is optimal for the
observer given the explicit limitation that the observer assumes that there
is uncertainty in the task. Tanner (1961) pointed out that if an observer did
not know the signal exactly and had to consider a number of possibilities, the
observer, which could be otherwise ideal, would have a steeper psychometric
function compared with that of an ideal observer. Early studies in audition (cf.
Green, 1964) and vision (e.g., Foley and Legge, 1981; Nachmias and Sansbury,
1974; Stromeyer and Klein, 1974; Tanner and Swets, 1954) found that when
a subject was asked to detect a faint but precisely defined signal, the resulting
psychometric function had a slope consistent with the presence of a significant
intrinsic uncertainty.
In a seminal paper, Pelli (1985) made the case that intrinsic uncertainty could
account for a large range of psychophysical data related to contrast detection
and discrimination. Pelli demonstrated that a simple model of intrinsic uncertainty,
which was already quite popular at the time of his writing but with properties
not well understood, provided an excellent fit to psychophysical data for
contrast detection and discrimination in many different conditions. In a nutshell,
the uncertainty model makes a decision based on a decision variable of
the form:

$$\lambda(r) = \max_j I^T T_{r,j} \tag{A.7}$$
The model essentially says that the observer selects a response associated
with the “loudest” channel. With hindsight, it is not difficult to see why
Equation A.7 is a reasonable approximation to the optimal decision rule
(Equation A.5):

$$
\begin{aligned}
\lambda(r) &= \sum_{j=1}^{M} \exp\left(\frac{I^T cT_{r,j}}{\sigma^2}\right) \\
&\approx \max_{j\in[1,M]} \exp\left(\frac{I^T cT_{r,j}}{\sigma^2}\right) \\
&\leftrightarrow \max_{j\in[1,M]} \frac{I^T cT_{r,j}}{\sigma^2} \\
&\leftrightarrow \max_{j\in[1,M]} I^T T_{r,j}
\end{aligned}
\tag{A.8}
$$
We use $\leftrightarrow$ to indicate that two functions are monotonically related such that
replacing one with the other does not affect the rank order of the values of the
function. The only approximation in Equation A.8 is the replacement of the sum
of a set of exponentials by the largest value from the set (Equations 12 and 13 in
Nolte and Jaarsma, 1967). This approximation is reasonable if the largest value
is very large relative to the other values to be summed, as is often the case with
an exponential function.
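The quality of the sum-to-max replacement can be checked numerically. In the sketch below, the exponent values are arbitrary numbers chosen for illustration (they are not from any experiment). In the log domain, the exact value always lies between the largest exponent and the largest exponent plus $\log M$, so the gap shrinks as one exponent pulls away from the rest.

```python
import numpy as np

def log_sum_exp(values):
    """Numerically stable log(sum(exp(values)))."""
    values = np.asarray(values, dtype=float)
    peak = values.max()
    return peak + np.log(np.sum(np.exp(values - peak)))

# Hypothetical exponents I^T c T_{r,j} / sigma^2 for M = 4 channels,
# with one channel responding much more strongly than the others.
exponents = np.array([12.0, 3.0, 1.0, -2.0])

exact = log_sum_exp(exponents)   # log of the sum in Equation A.5
approx = exponents.max()         # log of the max used in Equation A.8
gap = exact - approx             # error introduced by the approximation
```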
The uncertainty model (Equation A.7) is the key theoretical foundation that
led to our proposed method for obtaining a classification image in the face of
uncertainty. The results of Pelli (1985), showing the general validity of this
model for a large set of empirical data, and the rather ubiquitous applications
of the model in visual psychophysics justify this starting point. Nevertheless,
we note that our approach does not depend on any subtle assumptions of the
uncertainty model beyond Equation A.7 and is theoretically robust.
Isolating a channel in the uncertainty model by using a signal
If there is no uncertainty and if the linear observer model (Equation A.6) is a
good approximation of an actual observer, then it has been established that
the classification-image method can uncover the observer templates $T_r$ (cf.
Ahumada, 2002). The same, however, cannot be said when there is significant
extrinsic or intrinsic uncertainty.
An inherent property of the uncertainty model (Equation A.7) offers a way
to reduce or eliminate intrinsic uncertainty and thus reduce the uncertainty
model to a linear observer model. Because the channel with the highest
response drives the net output of the uncertainty model, the presence of a
relatively strong signal in the noisy stimulus will bias one channel over the others in
terms of its contribution to the observer’s response. When the observer makes an
incorrect response while the signal is present, we know with relative certainty that
the channel that was suppressed is the one that most often responds maximally
to the signal. The linear kernel associated with this channel can then be recovered
using the conventional classification-image technique.
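We can illustrate this logic more precisely by combining Equation A.7 with
the definition of the stimulus (Equation A.1):

$$
\begin{aligned}
\lambda(r) &= \max_{j\in[1,M]} I^T T_{r,j} \\
&= \max_{j\in[1,M]} \left( cS^T T_{r,j} + N_\sigma^T T_{r,j} \right) \\
&\approx cS^T T_{r,z} + N_\sigma^T T_{r,z} \\
&= I^T T_{r,z}
\end{aligned}
\tag{A.9}
$$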
where we let $T_{r,z}$, $z \in [1, M]$, denote the channel that has the highest response for
signal $S$. The approximation step is justified because (1) for equal-energy
signals, $N_\sigma^T T_{r,j}$ is statistically identical for all channels $j$, and (2) the term
$S^T T_{r,z}$ leads one particular channel to have the highest response most of the time and
thus to single-handedly drive the decision variable $\lambda(r)$. What is critical for this
approximation is that the response $S^T T_{r,z}$ must be significantly larger than the
responses from the other channels. We refer to this requirement as the
“signal-clamping” requirement and the approximation in Equation A.9 as the
signal-clamping approximation.
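The signal-clamping requirement can be illustrated with a small Monte Carlo sketch. Everything in it is a toy assumption made for the example: 50 orthonormal one-hot channels, a signal that matches channel $z = 0$ exactly, and the hypothetical helper name `clamp_rate`. It counts how often the signal-matched channel supplies the maximum in Equation A.9.

```python
import numpy as np

def clamp_rate(contrast, sigma=0.25, n_channels=50, n_trials=2000, seed=0):
    """Fraction of trials on which channel z gives the maximum response."""
    rng = np.random.default_rng(seed)
    templates = np.eye(n_channels)       # toy equal-energy, orthogonal channels
    z = 0
    signal = templates[z]                # the signal matches channel z exactly
    noise = rng.normal(0.0, sigma, size=(n_trials, n_channels))
    responses = (contrast * signal + noise) @ templates.T   # I^T T_{r,j}
    return np.mean(np.argmax(responses, axis=1) == z)

low = clamp_rate(contrast=0.25)    # weak signal: clamping mostly fails
high = clamp_rate(contrast=1.5)    # strong signal: channel z dominates
```

With the weak signal, other channels frequently win the maximum, which is the regime in which the approximation in Equation A.9 breaks down.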
In short, we are using a fixed signal to hold on to a specific channel and a
varying noise to map the linear kernel of the channel. We refer to this approach
as the signal-clamped classification-image method. Our logic is essentially the
same as that of the two-bar method of Movshon et al. (1978) for mapping the linear
component of the receptive field of a complex cell. We can think of a complex
cell as an observer with uncertainty in the phase of a grating, approximately
equivalent to a detector that does a max-response pooling from a large
set of detectors, each selective to a specific phase. With this perspective, we can
think of the two-bar method as using one bar to select a channel of a specific
phase, and the other bar, with varying positions relative to the first bar, to map
the receptive field of the selected channel.
Properties of signal-clamped classification images
Signal-clamped classification images have distinct properties that can be
exploited to estimate the amount of intrinsic uncertainty (or equivalently, the
degree of invariance) of an observer. We will illustrate these properties, first
analytically and then by simulation using an ideal-observer model
(Equation A.4), for which we know the ground truth about the observer’s internal
templates and the amount of intrinsic uncertainty. We used an ideal-observer
model in the simulation instead of the uncertainty model (Equation A.7) to
show that the analytical properties of signal clamping derived from the
uncertainty model do not depend on the absolute validity of the uncertainty model.
Whether these properties are valid for a human observer is an empirical question.
The three human experiments in Chapter 2 will confirm that these analytical
properties of signal-clamped classification images are indeed valid and
robust.
Contrast of signal-clamped classification images
For the rest of this appendix, we will consider a two-letter identification task
(“O” vs. “X”) and a single-letter detection task (detecting “O” against a noisy
background). We restrict the form of uncertainty to the uncertainty about the
location of the stimulus on the display. The templates (or channels) for a given
response are shifted versions of one another but are otherwise identical; that is,

$$T_{r,j} = \mathrm{shift}(T_r, p_j) \tag{A.10}$$
where $T_r$ is the position-normalized template for response $r$ and $p_j$ is a position
on the display. Our goals are to recover $T_r$ and the range of $p_j$. Possible
generalizations of the signal-clamping technique to other types of uncertainty beyond
that of shift invariance will be addressed in the General discussions section.
A conventional classification image is a composition of a set of classification
sub-images. A sub-image $CI_{AB}$ is the average of all the noise patterns $N_\sigma$
(Equation A.1) from trials where the signal in the stimulus was A and the observer’s
response was B. Consider the two-letter identification task (“O” vs. “X”). The
sub-image $CI_{OX}$ is the average of the noise patterns $N_{OX}$ from trials where “O”
was in the stimulus but the observer responded “X” (we refer to this as an OX
trial). An “X” response implies that the internal decision variable for an “X”
response was greater than that for an “O” response; that is,
$\lambda(\text{“X”}) > \lambda(\text{“O”})$.
Appealing to the uncertainty model (Equation A.7) and the composition of a
stimulus (Equation A.1) and letting $X_j = T_{x,j}$ and $O_j = T_{o,j}$ to improve
readability, we have
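$$
\begin{aligned}
&\lambda(\text{“X”}) > \lambda(\text{“O”}) \\
\Leftrightarrow\; &\max_{j\in[1,M]} \left( cO^T X_j + N_{OX}^T X_j \right)
> \max_{j\in[1,M]} \left( cO^T O_j + N_{OX}^T O_j \right)
\end{aligned}
\tag{A.11}
$$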
where $O$ (without any subscript) is the “O” signal in the noisy stimulus presented
to the observer. If there is no uncertainty ($M = 1$), Equation A.11
becomes the familiar form that underlies the conventional classification image:
$$
\begin{aligned}
&cO^T X_1 + N_{OX}^T X_1 > cO^T O_1 + N_{OX}^T O_1 \\
\Rightarrow\; &N_{OX}^T (X_1 - O_1) > (O^T O_1 - O^T X_1)\,c
\end{aligned}
\tag{A.12}
$$
The right-hand side of the inequality is a positive number because a noiseless
“O” stimulus will activate the “O” channel ($O_1$) more than the “X” channel ($X_1$);
that is, $O^T O_1 > O^T X_1$. For this inequality to hold, the average noise pattern on
the left-hand side must have a positive correlation with the X template and a
negative correlation with the O template. Ahumada (2002) showed analytically
that
$$E[N_{OX}] \propto (X_1 - O_1) \tag{A.13}$$
where $E[\cdot]$ denotes a mathematical expectation (see also Abbey and Eckstein,
2002; Murray et al., 2002). The proportionality constant is affected by the
probability of an OX trial (stimulus “O”, response “X”) and the internal-to-external
noise ratio (the ratio between the variances of the noise internal to an observer and
that in the stimuli; e.g., see Equation A3 in Murray et al., 2002). $CI_{OX}$ approaches
$E[N_{OX}]$ as the number of OX trials ($n_{OX}$) approaches infinity. For a finite number
of trials, the variance of $CI_{OX}$ is rather cumbersome because the probability
density of $CI_{OX}$ is a truncated version of the multidimensional Gaussian ($N_\sigma$) used
to form the stimuli. Ahumada (2002) pointed out that the variance of $CI_{OX}$ is
upper bounded by the variance of the non-truncated distribution. Murray et al.
(2002, Appendices A and F) further argued that the difference between the
upper bound and the actual variance is negligible for a typical classification-image
experiment where (1) the amount of the stimulus noise is comparable to
the level of the observer’s internal noise, (2) the number of independent image
pixels (and hence the dimensionality of the stimulus) is large, and (3) the accuracy
level is above 75%. All of the experiments discussed in Chapter 2 met these
three conditions. Thus, $CI_{OX}$ can be approximated as
$$CI_{OX} \approx E[N_{OX}] + \frac{N_\sigma}{\sqrt{n_{OX}}} \tag{A.14}$$
where $N_\sigma$ is a sample of white noise from the distribution used to form the
stimuli (Equation A.1).
Equations A.13 and A.14 show that in a conventional classification-image
experiment, where a great deal of effort is directed toward the elimination of
uncertainty in the experiment, each classification sub-image contains both a
positive image of one template and a negative image of the alternative template.
In the case of an error trial, the negative image is the template for the presented
signal, and the positive image is the template associated with the response.
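The no-uncertainty case can be reproduced with a toy simulation. The sketch below assumes a noiseless linear observer (Equation A.6) and two hypothetical, non-overlapping 8-pixel templates; none of these values come from the experiments. It averages the noise over the OX trials and measures how closely the resulting sub-image is aligned with $X_1 - O_1$.

```python
import numpy as np

rng = np.random.default_rng(1)
n_pixels, sigma, contrast, n_trials = 64, 0.25, 0.2, 20000

# Hypothetical templates with equal energy and no overlap.
O1 = np.zeros(n_pixels)
O1[:8] = 1.0
X1 = np.zeros(n_pixels)
X1[8:16] = 1.0

noise = rng.normal(0.0, sigma, size=(n_trials, n_pixels))
stimuli = contrast * O1 + noise            # every trial presents "O"
said_x = stimuli @ X1 > stimuli @ O1       # linear observer responds "X"
ci_ox = noise[said_x].mean(axis=0)         # classification sub-image CI_OX

# Cosine similarity between CI_OX and the prediction of Equation A.13.
difference = X1 - O1
cosine = ci_ox @ difference / (np.linalg.norm(ci_ox) * np.linalg.norm(difference))
```

A cosine close to 1 indicates that the sub-image contains a positive image of the “X” template and a negative image of the “O” template in roughly equal proportion.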
Now consider a condition where there is no extrinsic uncertainty (i.e., the
Xs and Os were always presented at the same position on the display) but with
a significant amount of intrinsic uncertainty ($M \gg 1$). Applying the
signal-clamping approximation (Equation A.9) to the right-hand side of Equation A.11,
we have
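$$
\begin{aligned}
&\lambda(\text{“X”}) > \lambda(\text{“O”}) \\
\Leftrightarrow\; &\max_{j\in[1,M]} \left( cO^T X_j + N_{OX}^T X_j \right)
> cO^T O_z + N_{OX}^T O_z
\end{aligned}
\tag{A.15}
$$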
The signal-clamping approximation applies only to $\lambda(\text{“O”})$ because the “O”
signal in the stimulus consistently biases one particular “O” channel ($O_z$ in the
equation). There is no such trial-to-trial consistency among the “X” channels
because none of them are tuned to the “O” signal. Hence, the signal-clamping
approximation does not apply to $\lambda(\text{“X”})$. Following the logic of Ahumada
(2002), we can show that (Appendix B)
$$E[N_{OX}] \propto (E[X_j] - O_z) \tag{A.16}$$
Furthermore, the relationship between the expected value of the noise ($E[N_{OX}]$)
and the classification sub-image ($CI_{OX}$) remains the same as stated in
Equation A.14.
Equation A.16 shows that the average of the noise patterns of the error trials
contains a negative image of exactly one of the many templates for the presented
signal. Important for our purpose is that this negative image is not affected by
uncertainty and thus provides a good estimate of the unknown template. This
is due to the signal-clamping approximation applied to the right-hand side of
Equation A.11. That is, the presence of a relatively strong “O” signal in the
stimulus biased the signal response to precisely one of the many “O” channels; when
the observer made an error and responded “X”, we are relatively certain that the
noise pattern suppressed the particular “O” channel ($O_z$ in Equations A.15 and
A.16) that would otherwise be responding.
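A toy version of this situation can be simulated with the max rule of Equation A.7. The sketch below is built on assumptions made only for illustration: 16 one-hot “O” channels on pixels 0 to 15, 16 one-hot “X” channels on pixels 16 to 31, and an “O” signal that always falls on channel $z = 5$. The error-trial average should then show one sharp negative pixel at $z$ and only a faint positive haze spread over the “X” pixels.

```python
import numpy as np

rng = np.random.default_rng(2)
n_channels, z = 16, 5                  # channels per response; z is the clamped channel
sigma, contrast, n_trials = 0.25, 0.5, 20000

# Toy one-hot templates: "O" channels first, "X" channels second.
templates = np.eye(2 * n_channels)
o_bank, x_bank = templates[:n_channels], templates[n_channels:]

noise = rng.normal(0.0, sigma, size=(n_trials, 2 * n_channels))
stimuli = contrast * o_bank[z] + noise             # "O" always at the same position
lambda_o = (stimuli @ o_bank.T).max(axis=1)        # Equation A.7 for response "O"
lambda_x = (stimuli @ x_bank.T).max(axis=1)        # Equation A.7 for response "X"
ci_ox = noise[lambda_x > lambda_o].mean(axis=0)    # error-trial sub-image CI_OX

negative_peak = ci_ox[z]               # negative image of the clamped "O" channel
positive_haze = ci_ox[n_channels:]     # diffuse positive image, E[X_j]
```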
Critically, the signal-clamping approximation is applicable only when the
signal contrast in the noisy stimulus is sufficiently strong. The disadvantage
of this requirement is that when the signal contrast is high, the number of
error trials, which are more informative than the correct trials, will be low, and
the average of the noise patterns from the error trials will have a high variance
(Equation A.14). Hence, the contrast of the signal must be sufficiently high but
not too high. As will be shown in the simulations and with human data, a
contrast that achieves an accuracy of 75% reaches this balance.
Unlike the negative image, the positive image in the average noise pattern
is severely affected by uncertainty. This positive image ($E[X_j]$ of Equation A.16)
corresponds to the average of all the channels associated with the response (“X”
in our example). As a result, there will not be any clear positive image in the
classification sub-images when there is significant intrinsic uncertainty. The
clarity of the positive image provides a way to estimate the degree of uncertainty.
Estimation of spatial uncertainty
In the case of spatial uncertainty, the channels (or templates) are assumed to
be shifted versions of one another (Equation A.10). If we represent the spatial
distribution of the channels with an image $S$, with each pixel corresponding
to a location in the image and the pixel value representing the probability of a
channel at the location responding erroneously to noise, then

$$E[X_j] = X_z * S \tag{A.17}$$
where $*$ denotes a convolution and $X_z$ is the position-normalized template for
“X”. Combining Equations A.16 and A.17, we have

$$E[N_{OX}] \propto (X_z * S - O_z) \tag{A.18}$$
If $S$ can be parameterized with a small number of parameters (e.g., $S$ being a
square region with uniform distribution), then Equation A.18 provides a way to
estimate both the perceptual templates and the amount of spatial uncertainty.
We can obtain these estimates in stages. The classification sub-image $CI_{OX}$,
which, in the limit, approaches $E[N_{OX}]$, contains a negative image of the “O”
template, unaffected by uncertainty. Likewise, the sub-image $CI_{XO}$ provides a
direct estimate of the “X” template. Knowing both the “O” and “X” templates,
Equation A.18 and the corresponding equation for $E[N_{XO}]$ can be used to
estimate the spatial uncertainty $S$.
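The convolution in Equation A.17 can be sketched as a probability-weighted sum of shifted copies of the template. The helper names `uniform_square` and `expected_channel`, and the choice of a centred uniform square for $S$, are assumptions made for this illustration; shifts wrap around the image edges, which is harmless as long as the template sits well inside the image.

```python
import numpy as np

def uniform_square(size, d):
    """Uniform probability over a centred d-by-d square of a size-by-size image."""
    s = np.zeros((size, size))
    start = size // 2 - d // 2
    s[start:start + d, start:start + d] = 1.0 / d ** 2
    return s

def expected_channel(template, s):
    """E[X_j] = X_z * S as a weighted sum of circularly shifted templates."""
    centre = template.shape[0] // 2
    out = np.zeros_like(template, dtype=float)
    for row, col in zip(*np.nonzero(s)):
        offset = (row - centre, col - centre)     # displacement from the centre
        out += s[row, col] * np.roll(template, offset, axis=(0, 1))
    return out

template = np.zeros((16, 16))
template[7:10, 7:10] = 1.0                        # toy 3 x 3 "letter"
blurred = expected_channel(template, uniform_square(16, 6))
```

The blurred image keeps the total contrast of the template but has a much lower peak, which is why the positive image becomes a haze as the spatial extent grows.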
In practice, the estimation of the templates is never precise, and methods for
removing the noise term in the sub-images tend to introduce various idiosyncratic
artifacts. Fortunately, we will show with simulation that estimation of
the spatial uncertainty $S$ appears robust, particularly if it can be parameterized
with very few parameters.
Classification images with extrinsic uncertainty
So far, we have assumed that there is no spatial uncertainty in the experiment,
and the only uncertainty is intrinsic to the observer. In this case, the presentation
of an “O” signal at a fixed location will most likely elicit a response from
one particular “O” channel. The classification sub-images (e.g., $CI_{OX}$) can be
calculated by averaging the noise patterns in the conventional manner:

$$CI_{OX} = \frac{1}{n_{OX}} \sum_i N_{OX,i} \tag{A.19}$$
Because a relatively strong signal is presented at a fixed position, we can obtain
a clear template despite the spatial uncertainty. No special operation is needed to
reconstruct the classification images.
With a small but important modification, Equations A.16 and A.18 will hold
even when the spatial uncertainty is both in the stimuli and the observer. Such a
condition arises in experiments when we want to test a shift-invariant observer
by using signals whose positions vary from trial to trial. The modification is to
simply shift the noise pattern (with wraparound) by an amount that either
recenters the signal with respect to the image or otherwise normalizes its spatial
position. That is, if a stimulus at trial $i$ was created by shifting the stimulus $O_1$
by an amount $p_i$:

$$O = \mathrm{shift}(O_1, p_i) + N_{OX,i} \tag{A.20}$$
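then we will replace $N_{OX}$ in all of the preceding equations with a shifted version
${}^{S}N_{OX}$, where

$$
{}^{S}N_{OX,i} = \mathrm{shift}(N_{OX,i}, -p_i), \quad \text{and} \quad
CI_{OX} = \frac{1}{n_{OX}} \sum_i {}^{S}N_{OX,i}
= \frac{1}{n_{OX}} \sum_i \mathrm{shift}(N_{OX,i}, -p_i)
\tag{A.21}
$$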
This modification is valid under the assumption that the templates for a given
response are shifted versions of one another (Equation A.10).
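The shift with wraparound and the recentred average of Equation A.21 can be sketched as follows. The function names `shift` and `recentred_sub_image` are hypothetical, and positions are assumed here to be integer (row, column) offsets; `np.roll` provides the circular shift.

```python
import numpy as np

def shift(image, offset):
    """Circularly shift a 2-D image by (rows, cols), i.e. with wraparound."""
    return np.roll(image, shift=offset, axis=(0, 1))

def recentred_sub_image(noise_fields, offsets):
    """Average of shift(N_i, -p_i) over trials, as in Equation A.21."""
    undone = [shift(field, (-rows, -cols))
              for field, (rows, cols) in zip(noise_fields, offsets)]
    return np.mean(undone, axis=0)
```

If every noise field is the same pattern displaced by its trial offset, undoing the offsets before averaging recovers that pattern exactly, whereas a plain average would smear it.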
Simulations
To illustrate the various properties of the signal-clamped classification images,
we consider an observer model that is otherwise optimal except for two
limitations: (1) it uses templates that are slightly different from the presented
signal and (2) it may have a high degree of intrinsic spatial uncertainty, that is, spatial
uncertainty that is not present in the stimuli but nevertheless assumed by the
observer. The decision rule for such an ideal-observer model is given by
Equation A.5. We assume that there is no internal noise in the ideal-observer model.
The presence of internal noise before or after the template comparison stage
will lower the contrast of the resulting classification images without
qualitatively affecting the critical properties that we are trying to illustrate. In contrast,
internal noise during template matching will interact with intrinsic uncertainty
and can lead to complex effects on the classification images. Template
estimation by the signal-clamping method will remain robust under this type of noise;
however, such noise will lead to a biased estimation of the spatial extent of an
observer’s intrinsic uncertainty when the method that we will be describing in
Equations A.23 and A.24 is used.
We simulated two tasks, a two-letter identification task and a single-letter
detection task. For each task, we simulated two levels of intrinsic spatial
uncertainty. For each pair of conditions (task and uncertainty level), we estimated
the observer templates and the amount of the intrinsic spatial uncertainty from
the classification images. We also illustrated the effect of signal clamping by
simulating the tasks at two different signal contrast levels, one leading to a 55%
correct performance level and another to a 75% correct level.
For the letter identification task, the signals were lowercase “o” and “x” in
Times New Roman font with an x-height of 21 pixels. The signals were always
presented at the center of a 128 × 128 pixel image. The ideal-observer model
used lowercase “p” and “k” from the same font and size as its templates for
“o” and “x”, respectively. In the case of no uncertainty ($M = 1$), the templates
were positioned to have the maximum overlap with the signal. In the case of high
spatial uncertainty, the center position of a template was uniformly distributed
within the central 64 × 64 pixels of the image. There were 1000 spatially shifted
templates for each response ($M = 1000$). The relative positions of the signals
and the templates are shown in Figure A.1a. For each trial, the observer model
made a decision according to Equation A.5. The external noise had a variance
of 1/16 ($\sigma = 0.25$), identical to that used in the human experiments. The signal
contrast was set to a level to obtain an accuracy of 55% correct (low contrast)
or 75% correct (high contrast). The observer model was assumed to know the
signal contrast (parameter $c$ in Equation A.5).
Figure A.1b shows the four sets of classification sub-images from these four
simulated conditions. Consider the high signal contrast conditions (middle
column). When there was no spatial uncertainty (first row), the sub-images contain
an equal portion of both a positive and a negative image of the two templates
(“p” and “k”) used by the observer model. As predicted by Equation A.13,
these are the templates of the observer and not the presented stimuli. Compare
these sub-images to the ones obtained with a high degree of spatial uncertainty
(second row, middle column). As predicted by Equation A.16, only one clear
template is visible in each sub-image. Specifically, for the trials where the
signal was “o” and the response was “x”, the classification sub-image $CI_{OX}$
contains a clear negative image of the template for the “o” response, which in this
case was the letter “p”, the template we built into the ideal-observer model.
Remarkably, this image is sharp and unaffected by the high degree of intrinsic
uncertainty. This is the main result of the signal-clamping technique. Also, as
predicted by Equation A.16, there is no clear positive template in $CI_{OX}$, which is
the single most important difference between the two uncertainty levels ($M = 1$
vs. $M = 1000$). It is important to reiterate the point that the negative image in
$CI_{OX}$ resembles the observer’s template “p” and not the signal “o” that was
presented. The “o” signal biased a “p” template at a particular location, allowing
the effect of noise on that particular template to accumulate over all the error
trials when the presented signal was “o”. The effect of the noise was on the
nonzero regions of the biased template, although these regions may not overlap
with the signal (e.g., the descender of the lowercase “p”).
The signal-clamping approximation that led to Equation A.16 relies on the
fact that there is sufficient signal contrast in the stimulus to select a particular
channel for imaging. When the signal contrast was reduced, the image quality
of the signal-clamped classification images was markedly degraded (left column
of Figure A.1b). This is in stark contrast to the conventional classification-image
method (or reverse correlation) without uncertainty. When there is no
uncertainty, the overall image quality of the classification images improves with
a decrease in signal contrast, as is commonly observed. The improvements are
due to a decrease in noise for the error-trial sub-images (because the number of
error trials increases) and an increase in signal for the correct-trial sub-images
(because with a weak signal, correct responses are often aided by coincidence
with noise). With uncertainty, however, these improvements were overridden
by a failure of the signal-clamping approximation, allowing the uncertainty to
affect the accumulated template images and rendering the templates invisible.
This effect is clearly shown in the left column, second row of Figure A.1b, where
signal contrast was set to a low value to achieve an accuracy of 55%.
We next turn to the estimation of the extent of the spatial uncertainty intrinsic
to the observer using Equation A.18. We assumed $S$ to be a uniform square
region centered in the image with $d$ pixels on a side. Thus,

$$
S_d(x, y) =
\begin{cases}
\dfrac{1}{d^2} & \text{if } |x| \le d/2 \text{ and } |y| \le d/2 \\
0 & \text{otherwise}
\end{cases}
\tag{A.22}
$$
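From Equations A.14, A.18, and A.22, we have

$$
\begin{aligned}
CI_{OX} &\approx E[N_{OX}] + \frac{N_\sigma}{\sqrt{n_{OX}}} \\
&= k(X_z * S_d - O_z) + \frac{N_\sigma}{\sqrt{n_{OX}}}
\end{aligned}
\tag{A.23}
$$

Likewise,

$$CI_{XO} \approx k(O_z * S_d - X_z) + \frac{N_\sigma}{\sqrt{n_{XO}}} \tag{A.24}$$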
The noise terms in Equations A.23 and A.24 are white and can be made to have
the same variance if we multiply both sides of Equations A.23 and A.24 by
$\sqrt{n_{OX}}$ and $\sqrt{n_{XO}}$, respectively. If we knew the observer’s signal-clamped
templates ($O_z$ and $X_z$), then $k$, and most importantly the extent of the spatial
uncertainty $d$, could be estimated from the classification sub-images for the error trials
by minimizing the least-squared error. The right-most column of Figure A.1b
plots the residual sum-of-squares error for different values of $d$ (with the value
of $k$ chosen to minimize the residual at each level of $d$). The solid green curves
were obtained using the veridical observer templates (lowercase “p” and “k”).
The value of $d$ at which a global minimum is achieved provides the estimate
of the extent of the spatial uncertainty. The estimated values for the two levels
of uncertainty are 1 and 35 pixels, respectively, and are indicated by the first
character of the template label “pk”. For the high-uncertainty condition, the
residual landscape suggests that although the lower bound of $d$ is well defined,
the upper bound is not. In the context of this limitation, the estimated values are
in good agreement with the veridical values (1 for the no-uncertainty condition
and 64 for the high-uncertainty condition).
The black curves and the one red curve represent the residual landscape of
$d$ computed using incorrect observer templates. Each of the three black curves
was obtained with a pair of lowercase letters (other than “p” and “k”) that
resembled the classification sub-images as the presumed observer templates. The red
curve was obtained with the presented signals (“o” and “x”) as the presumed
observer templates. Note that the values of $d$ at the global minimum of each
of these residual curves are very similar. This result demonstrates the
robustness of the estimate of the spatial extent $d$ of the underlying uncertainty, even
when the observer template is not precisely known. In practice, this means that
we can obtain a reasonable estimate of the spatial extent by assuming that the
observer templates were identical to the presented signals.
Figure A.2 shows the results of the single-letter detection task. The signal
in this task is the lowercase letter “o” from the two-letter identification task.
The ideal-observer model used a lowercase “e” as the template to detect the
signal (Figure A.2a). Two levels of intrinsic uncertainty were simulated: spatial
extents with a uniform distribution of 32 (medium uncertainty) and 64 (high
uncertainty) pixels on a side of a square centered on the image. For the
condition with the smaller spatial extent, two types of spatial uncertainty were
considered: one with a constant $M$ for both levels of spatial extent ($M = 1000$) and
another with a constant density ($M = 1000$ for high uncertainty, $M = 250$ for
medium uncertainty). The telltale sign of uncertainty is evident in the
classification sub-images for all conditions (Figure A.2b). In particular, the classification
sub-image from the miss trials ($CI_{miss}$) shows a negative image of the observer’s
template (a lowercase letter “e”), whereas the sub-image of the false-alarm
trials ($CI_{FA}$) shows only a positive haze (if there were no uncertainty, it would be
a positive image of the observer’s template).
Performance of the ideal-observer model in the two medium-uncertainty
conditions was essentially the same in terms of threshold contrast
($C_{250}/C_{1000} = 1.1$) and classification images (Figure A.2b, first row, left and middle columns).
This is consistent with the finding of Tjan and Legge (1998) that there exists a
task-dependent upper bound of the effective level of uncertainty, which can
be substantially less than the highest possible level of physical uncertainty. With
respect to our current letter detection task, this means that increasing $M$ beyond
a density of 250 possible positions per 32 × 32 pixels has no consequence in
performance.
For the signal-clamping approximation (Equation A.9) to be exact, an
observer’s internal templates should be orthogonal, the signal should be strong,
or both. Orthogonality is effectively reduced when the spatial extent of the
templates is confined to a smaller space. That is, a randomly selected channel
will tend to be in closer proximity to the channel at the stimulus position. A
reduction in the spatial extent also reduced the threshold contrast for detection
(by a factor of about 1.5 for the ideal-observer model). The combined effect
of reduced orthogonality and reduced signal contrast was incomplete signal
clamping, which resulted in the noticeable dark haze around the negative image
of the observer template in $CI_{miss}$ in both of the medium-uncertainty conditions.
This dark haze was absent in the high-uncertainty condition.
The white haze in $CI_{FA}$ is noticeably broader and fainter in the
high-uncertainty condition compared with the medium-uncertainty condition.
Equations A.23 and A.24 were used to estimate the spatial extent ($d$) of the
uncertainty. Note that for a detection task, one of the templates ($X$ in this case)
is an image of zeros; that is,

$$
\begin{aligned}
CI_{miss} &\approx k(-O_z) + \frac{N_\sigma}{\sqrt{n_{miss}}} \\
CI_{FA} &\approx k(O_z * S_d) + \frac{N_\sigma}{\sqrt{n_{FA}}}
\end{aligned}
\tag{A.25}
$$
The residual landscape for estimating $d$ is plotted in the second row of
Figure A.2b. As with the case of the letter identification simulation, the green curve
represents using the veridical observer template (“e”) to perform the estimation,
the red curve represents using the signal in the stimuli as the template, and the
three black curves were obtained using other lowercase letters that resembled
the classification sub-images. Again, the values of $d$ that minimize these
residual functions are relatively independent of the assumed observer templates. The
averaged estimated value of $d$ was 14.6 pixels for the medium-uncertainty
condition and 37.4 pixels for the high-uncertainty condition. Although showing
roughly the same ratio as the veridical values (32 vs. 64 pixels, respectively), the
estimated values are admittedly about a factor of 2 smaller. This is probably because the
simulation used only 1000 positions within $S_d$, as opposed to a true uniform
distribution of positions.
Figure A.1: (a) Signals and templates used for simulating the letter
identification task using an ideal-observer model. The white haze shows the
spatial extent of the intrinsic spatial uncertainty of the model for M = 1000
and spatial extent (d) equal to 64 pixels. The templates used by the model are
shown in green. The letter stimuli are shown in red and overlapping regions in
yellow. (b) Classification images from the ideal-observer model for the letter
identification task: first row, simulations with no spatial uncertainty
(M = 1); second row, simulations with high spatial uncertainty (M = 1000,
d = 64); left column, low signal-contrast simulations at an accuracy criterion
of 55% correct; middle column, high signal-contrast simulations at an accuracy
criterion of 75% correct; right column, estimations of the spatial extent (d)
of the uncertainty for the high signal-contrast condition (middle column).
Each curve is an error function labeled by the templates used to obtain the
estimate. The value of d at the minimum of each error function represents the
estimated spatial extent of the uncertainty. The minimum of each curve is
marked by the position of the first character of the corresponding label. The
green curves were obtained using the actual observer templates from the model,
the red curves were obtained using the stimuli letters as templates, and the
black curves were obtained using pairs of letters that closely resembled (in
terms of r.m.s. distance) the true templates. The high degree of similarity in
the estimated values of d using different putative templates shows the
robustness of the method. The stimulus noise had a pixel-wise standard
deviation of 0.25. rSNR was computed using only the error trials, as described
in Equation 2.2.
Figure A.2: (a) Signal and template used for simulating the letter detection
task with an ideal-observer model. The white haze shows the extent of the
intrinsic spatial uncertainty of the model observer for M = 1000 and spatial
extent (d) equal to 64. The template used by the model is shown in green. The
letter stimulus is shown in red. The overlapping regions are shown in yellow.
(b) Classification images from the ideal-observer model performing the letter
detection task at an accuracy level of 75% correct: first column,
classification-image and spatial-extent estimations for the medium spatial
uncertainty condition (M = 1000, d = 32); second column, classification-image
and spatial-extent estimations for a medium spatial uncertainty condition
(M = 250, d = 32), which has the same spatial density of templates as the
high-uncertainty condition; third column, classification-image and
spatial-extent estimations for the high spatial uncertainty condition
(M = 1000, d = 64). The error functions of spatial-extent estimations are
labeled by the putative template used for the estimation. The value of d at
the minimum of each curve represents the estimated spatial extent and is
marked by the position of the corresponding label. The green curves were
obtained using the model’s template, the red curves were obtained using the
stimulus letter as the template, and the black curves were obtained using
letters that resembled (in terms of r.m.s. distance) the model template.
Appendix B
Expected Value of Noise from Error Trials
We want to show that the noise sample $N_{OX}$ from the OX error trials, where
the signal was “O” but the response was “X”, has the mathematical expectation
described in Equation A.16, where $O_z$ is the channel that is tuned to the
presented “O” signal, $X_j$, $j \in [1, M]$, are the channels that are tuned to
the possible signals for the “X” response, and $E[X_j]$ denotes the average
across all $X_j$s. Our starting points are (1) the result from Ahumada (2002)
for M = 1 (Equation A.13) and (2) the internal decision variable of the
observer during these trials, with the signal-clamping approximation applied
(Equation A.15).
We shall prove Equation A.16 by mathematical induction on M. The case of
M = 1 is true from Ahumada (2002) (i.e., Equation A.13). Assuming the case
M = k is true, we consider the case of M = k + 1. Let v be the number of trials
where $X_j$, $j \le k$, were the maximum-responding X channels on the left-hand
side of Equation A.15. For these trials only, it was as if M = k, and
Equation A.16 is true by assumption. The sum of the noise samples from these
trials is

$$vE[N_{OX}] \propto (vE[X_j] - vO_z), \quad j \in [1, k] \tag{B.1}$$
Let w be the number of trials where $X_{k+1}$ is the maximum-responding X
channel. For these trials, it was as if M = 1, for which the result of Ahumada
(2002) (Equation A.13) applies. The sum of the noise samples from these trials
is

$$wE[N_{OX}] \propto (wX_{k+1} - wO_z) \tag{B.2}$$
Adding Equation B.1 to Equation B.2 and dividing the sum by the total number
of trials (v + w), we have

$$E[N_{OX}] \propto (E[X_j] - O_z), \quad j \in [1, k + 1] \tag{B.3}$$

Thus, Equation A.16 will be true for M = k + 1 if it is true for M = k.
Because it is true for M = 1, by mathematical induction, Equation A.16 is true
for all $M \ge 1$.
Appendix C
Signal Clamping: Beyond First Order Analysis

C.1 Signal-clamped classification images
Signal-clamped classification images (Tjan and Nandy, 2006) are simply the
classification images obtained at high signal contrast (which serves to “clamp
down” a particular perceptual channel, thus minimizing the effect of
uncertainty) and without adding up the sub-images from different
stimulus–response categories. The analysis of signal-clamped classification
images focuses on the data from the error trials. As shown in Tjan and Nandy
(2006), a unique property of signal-clamped classification images is that any
spatial uncertainty intrinsic to the observer is significantly reduced or even
eliminated for the “miss” component of an error trial. For conditions in which
there is a significant amount of intrinsic spatial uncertainty in the visual
system (e.g., form vision in the periphery), this property of the
signal-clamped classification-image method dissociates the miss components
from the false-alarm components of an error trial and allows separate imaging
of the shift-invariant perceptual template for each of the stimuli. In this
study, we began with experimental conditions suitable for extracting
signal-clamped classification images and proceeded to analyze only the error
trials. The detailed experimental setup will be given in the Methods section.
The analytic and empirical properties of signal-clamped classification images
in general are given in the study of Tjan and Nandy. For the purpose of this
appendix, it is sufficient to bear in mind that an error-trial sub-image
contains the following: (1) a perceptual template that is negatively
correlated with the clamped target mechanism and (2) a spatially dispersed
“haze” that corresponds to the unclamped alternative mechanism.
C.2 Flanker analysis
In conditions when flankers were present and influenced performance, we want
to determine if specific parts from the flankers affected an observer’s
response, leading to crowding, and if these parts resembled those of the
targets. The procedure is similar to that used to obtain the standard
classification sub-images with a few modifications. First, instead of
classifying only the noise field at each trial, the sum of the noise and the
flanker images is classified, according to the stimulus presented and the
observer’s response, and then averaged. We thus obtain classification
sub-images of the following form, exemplified here for the case in which the
target presented was “o” while the response was “x”:
$$I_{OX} = \frac{1}{n_{OX}} \sum_i \left(N_{OX,i} + c_{OX,i} F_{OX,i}\right) \tag{C.1}$$
where $N_{OX,i}$ is the noise field from trial i, $F_{OX,i}$ is the flanker
image at unit contrast, $c_{OX,i}$ is the flanker (and target) contrast for the
trial, and $n_{OX}$ is the number of trials in that category. This
classification sub-image represents the category mean for the population of
trials in which the presented target was “o” and the response was “x”.
Similarly, we can obtain a classification sub-image for all the trials in
which the presented target was “o”, irrespective of the response. Let us call
this $I_{O*}$. This sub-image represents the expected mean for the population
of trials in which the target was “o”, regardless of the behavioral response.
As one further step, this expected mean (under the null hypothesis that
flankers have no effect on response) needs to be corrected for contrast to
account for the slight difference in contrast between the error trials and the
correct trials. The difference exists because (1) the contrast of the target
and the flankers was varied dynamically from trial to trial using QUEST (see
§3.3) and (2) the probability of a subject making a discrimination error is
higher when the trial contrast is slightly below the average population
contrast. Thus, the average contrast for the error trials is slightly lower
than the average population contrast. If we do not correct for this
difference, it will lead to a biased estimate of the z scores. The correction
can be expressed as follows:
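$$\bar{c}_{OX} = \frac{1}{n_{OX}} \sum_i c_{OX,i}$$
$$r = \frac{\bar{c}_{OX} - \bar{c}_{O*}}{\bar{c}_{O*}}$$
$$I^{C}_{O*} = I_{O*} + \frac{r}{n_{O*}} \sum_i c_{O*,i} F_{O*,i} \tag{C.2}$$

where $\bar{c}_{OX}$ is the average contrast for the target “o”–response “x”
trials, and $\bar{c}_{O*}$ is the average contrast for all $n_{O*}$ target “o”
trials (irrespective of the response). Thus, $I^{C}_{O*}$ represents the
contrast-corrected version of $I_{O*}$.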
Finally, under assumptions of normality, because $n_{OX}$ is typically large,
we can perform a z test between the category mean and the expected mean under
the null hypothesis and obtain a z-score map as follows:

$$SS_{I_{O*}} = \sum_i \left(N_{O*,i} + c_{O*,i} F_{O*,i}\right)^2$$
$$\sigma_{I_{O*}} = \frac{\sqrt{SS_{I_{O*}} - n_{O*} I_{O*}^2}}{n_{O*}}$$
$$Z_{OX} = \frac{I_{OX} - I^{C}_{O*}}{\sigma_{I_{O*}} \sqrt{n_{O*} / n_{OX}}} \tag{C.3}$$

where $\sigma_{I_{O*}}$ is the standard error of the expected mean under the
null hypothesis.
Under the null hypothesis that there is no effect of the flankers, we would
not expect to see any significant z scores in the regions where the flankers
were presented. On the other hand, if we do find significant z scores in these
regions, then displaying the z-score map ($Z_{OX}$) as a color-coded image
provides a way to visualize the presence (indicated by statistically
significant positive z scores) or the absence (negative z scores) of pixels in
the flankers (and elsewhere) that bias the observer toward responding “x”.
Similarly, we can obtain the z-score map, $Z_{OX}$, for the “o” response. This
procedure is illustrated in Figure C.1.
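The steps in Equations C.1 to C.3 can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the analysis code used in this study: the function name `flanker_zscore_map`, its argument names, and the array layout (trials × rows × columns) are assumptions made for the example.

```python
import numpy as np

def flanker_zscore_map(noise, flankers, contrast, is_error):
    """Z-score map for one error category, following Equations C.1-C.3.

    noise, flankers: arrays of shape (n_trials, height, width) holding all
        trials that share the same target (the "O*" population).
    contrast: array of shape (n_trials,) with the per-trial contrast.
    is_error: boolean array, True where the response was the other letter.
    """
    noise = np.asarray(noise, dtype=float)
    flankers = np.asarray(flankers, dtype=float)
    contrast = np.asarray(contrast, dtype=float)
    is_error = np.asarray(is_error, dtype=bool)

    scaled_flankers = contrast[:, None, None] * flankers
    classified = noise + scaled_flankers          # N_i + c_i F_i
    n_all = classified.shape[0]                   # n_O*
    n_err = int(is_error.sum())                   # n_OX

    mean_all = classified.mean(axis=0)            # I_O*
    mean_err = classified[is_error].mean(axis=0)  # I_OX (Equation C.1)

    # Contrast correction of the expected mean (Equation C.2).
    c_all = contrast.mean()
    c_err = contrast[is_error].mean()
    r = (c_err - c_all) / c_all
    mean_all_corrected = mean_all + r * scaled_flankers.mean(axis=0)

    # Standard error of the population mean, then the z score (Equation C.3).
    sum_sq = (classified ** 2).sum(axis=0)
    sem_all = np.sqrt(sum_sq - n_all * mean_all ** 2) / n_all
    return (mean_err - mean_all_corrected) / (sem_all * np.sqrt(n_all / n_err))
```

Pixels whose values drive the error response should stand out with large |z| in the returned map, whereas pixels unrelated to the response should stay near zero.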
[Figure C.1 panels: many trials sorted into four stimulus–response categories
(stimulus o or x, response “o” or “x”), each yielding a z-score map]

Figure C.1: Flanker analysis procedure. The masking noise plus the flankers
are classified akin to the conventional first-order classification images, and
a z test is performed for each category against the expected mean for that
category under the null hypothesis that neither the flankers nor the noise
affects the behavioral response. The z scores are plotted on a color-coded map
to reveal statistically significant structures of the flankers and noise that
are correlated with the subject’s response. The regions where the target and
the flankers were presented are demarcated by the bounding boxes of the
letters.
Another way to visualize the same information is to perform a t test between
the two error response categories:

$$\overline{I^2_{OX}} = \frac{1}{n_{OX}} \sum_i \left(N_{OX,i} + c_{OX,i} F_{OX,i}\right)^2$$
$$\sigma_{pool} = \sqrt{\frac{n_{OX}\left(\overline{I^2_{OX}} - I_{OX}^2\right) + n_{XO}\left(\overline{I^2_{XO}} - I_{XO}^2\right)}{n_{OX} + n_{XO} - 2}}$$
$$T = \frac{I_{XO} - I_{OX}}{\sigma_{pool} \sqrt{\frac{1}{n_{OX}} + \frac{1}{n_{XO}}}} \tag{C.4}$$

The t-test map, T, provides a direct contrast of the pixels (significant
positive t values) that bias toward the “o” response versus the features
(significant negative t values) that bias toward the “x” response.
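As a rough illustration of Equation C.4, the pooled t map can be computed pixel by pixel as below. The function name `pooled_t_map` and the convention that each input already holds the per-trial sums $N + cF$ for one error category are assumptions of this sketch.

```python
import numpy as np

def pooled_t_map(samples_ox, samples_xo):
    """Pixel-wise pooled-variance t map between two error categories (Eq. C.4).

    samples_ox, samples_xo: arrays of shape (n_trials, height, width) holding
        the classified images (noise plus contrast-scaled flankers).
    """
    samples_ox = np.asarray(samples_ox, dtype=float)
    samples_xo = np.asarray(samples_xo, dtype=float)
    n_ox, n_xo = samples_ox.shape[0], samples_xo.shape[0]

    mean_ox = samples_ox.mean(axis=0)            # I_OX
    mean_xo = samples_xo.mean(axis=0)            # I_XO
    meansq_ox = (samples_ox ** 2).mean(axis=0)   # mean of squares, OX trials
    meansq_xo = (samples_xo ** 2).mean(axis=0)   # mean of squares, XO trials

    pooled = np.sqrt(
        (n_ox * (meansq_ox - mean_ox ** 2) + n_xo * (meansq_xo - mean_xo ** 2))
        / (n_ox + n_xo - 2)
    )
    return (mean_xo - mean_ox) / (pooled * np.sqrt(1.0 / n_ox + 1.0 / n_xo))
```

With this sign convention, positive values mark pixels that are brighter on average in the XO (“o” response) trials than in the OX (“x” response) trials.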
C.3 Feature maps
For our “o” versus “x” task, we can postulate two competing mechanisms, one
selective to “o” and the other selective to “x”. The error-trial noise fields
of signal-clamped classification images provide a natural way of separating
these two mechanisms. This is because the high-contrast target reduces the
effect of intrinsic positional uncertainty only in the mechanism selective to
the presented target. Briefly, an error response occurs when the “target”
mechanism (the one selective to the presented target) misses or the
“alternative” mechanism (the one selective to the target not presented)
false-alarms or both. In Tjan and Nandy (2006), we showed that by presenting a
high-contrast target at a fixed location, one of the many spatially dispersed
channels in the target mechanism can be localized, in the sense that it leads
to most of the responses. When a miss occurs in this mechanism, it is most
likely because the masking noise suppresses this one channel that would
otherwise be “clamped” to the presented target. As a result, the average of
the error-trial noise fields yields a clear template that is almost unaffected
by intrinsic spatial uncertainty and is negatively correlated with the
template used by the target mechanism. In contrast, none of the channels in
the alternative mechanism are clamped or biased by the presented signal. When
an error is caused by a false alarm from this alternative mechanism, the false
alarm can originate from any of its spatially dispersed channels. As a result,
the average of the noise fields does not contain any clear template for this
mechanism if the intrinsic spatial uncertainty in the mechanism is high. In
sum, in the average (thus, first-order) noise field from the error trials, the
template for the mechanism that missed is revealed but the template for the
mechanism that false-alarmed is not.
If we remove, by projection, the spatially localized first-order template
(say, for the “o” mechanism) from the corresponding error-trial noise samples
($N_{OX,i}$), the residual noise samples will contain mostly the spatially
dispersed structures corresponding to the alternative (“x”) mechanism.
Extracting the second-order statistics from the residual noise fields can
therefore reveal the spatial structure of the sub-template fragments
(“features”) of the alternative (“x”) mechanism. Similarly, we can obtain the
second-order feature statistics for the “o” mechanism from the error-trial
noise samples $N_{XO}$. In other words, the first-order statistics of the
noise fields $N_{OX}$ give us the shift-invariant template of the “o”
mechanism, which represents the spatiotemporal average of the sub-template
features used by the mechanism (excluding the effect of spatial uncertainty),
whereas the second-order statistics of the noise fields $N_{XO}$ (note the
swapping of the subscripts) reveal the sub-template features. We next describe
the steps to extract the second-order feature statistics.
As indicated above, we first project out the first-order signal-clamped
template from the error-trial noise fields. This is shown below for the case
in which the clamped mechanism was for the target “o” (exchange the O and X
suffix to project out the “x” template):
$$M = \frac{1}{f^2} \begin{bmatrix} 1 & \cdots & 1 \\ \vdots & \ddots & \vdots \\ 1 & \cdots & 1 \end{bmatrix}_{f \times f}$$
$$N^{d}_{OX,i}(x, y) = [N_{OX,i} \otimes M](xf, yf)$$
$$\vec{N}'_{OX,i} = \vec{N}^{d}_{OX,i} - \left(\vec{N}^{d}_{OX,i} \cdot \vec{\bar{N}}^{d}_{OX,i}\right) \vec{\bar{N}}^{d}_{OX,i} \tag{C.5}$$

where $N_{OX,i}$ is the noise field for the ith trial when the stimulus was “o”
and the response was “x”, $N^{d}_{OX,i}$ is a down-sampled version of the noise
field, and $\bar{N}^{d}_{OX,i}$ is the average down-sampled noise field, which
contains the clamped template for the “o” mechanism to be removed by
projection. In the above equation, $\vec{N}^{d}_{OX,i}$ and
$\vec{\bar{N}}^{d}_{OX,i}$ represent the vectorized forms of the corresponding
noise fields. Removal by projection amounts to subtracting out the vector
component in the direction of the vector representing the first-order
template. The purpose of the down-sampling operation is to preclude any
spurious correlations due to the stroke widths of the letter stimuli. For
example, if a particular stroke of the letter “x” is 3 pixels wide, this will
introduce correlation in the error-trial noise pixels that are congruent with
the stroke width. Down-sampling the noise field by a factor of 3 will reduce
the effect of this correlation in the final feature maps. A letter in a
down-sampled image would have a stroke width of 1 pixel. The down-sampling is
accomplished by first convolving the noise field with an f × f rectangular
mask M that preserves the signal-to-noise ratio (SNR), followed by a
down-sampling operation by a linear factor f. The down-sampling factor f is
chosen such that it matches the mean stroke width among the presented target
letters. For the foveal viewing conditions, the down-sampling operation is not
performed because the letter stimuli have a stroke width of 1.
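The down-sampling and projection steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the analysis: the helper names `block_downsample` and `project_out_template` are hypothetical, the box filter is applied as non-overlapping f × f block means (which assumes the sampling grid is aligned with the blocks), and the average field is normalized to unit length before the projection is subtracted.

```python
import numpy as np

def block_downsample(field, f):
    """Average non-overlapping f x f blocks (box filter, then sample by f)."""
    h, w = field.shape
    h2, w2 = h // f, w // f
    trimmed = field[:h2 * f, :w2 * f]   # drop rows/columns that do not fill a block
    return trimmed.reshape(h2, f, w2, f).mean(axis=(1, 3))

def project_out_template(fields_ds):
    """Remove the component of each field along the mean (first-order) field.

    fields_ds: array of shape (n_trials, height, width) of down-sampled fields.
    """
    vecs = fields_ds.reshape(fields_ds.shape[0], -1)
    template = vecs.mean(axis=0)                  # average down-sampled field
    unit = template / np.linalg.norm(template)    # assumed unit normalization
    residual = vecs - np.outer(vecs @ unit, unit)
    return residual.reshape(fields_ds.shape)
```

After the projection, every residual field is orthogonal to the average field, so the first-order template no longer contributes to the second-order statistics computed from the residuals.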
Next, we identify an optimal region of interest ($ROI_{opt}$) in the residual
noise fields $N'_{OX,i}$ for extracting the second-order statistics (see
Equation C.14); this critical step of identifying the optimal ROI will be
discussed in the next section. We then calculate, in the standard fashion, the
pixel-wise correlation coefficient between all points in $ROI_{opt}$ and
another region $ROI_{off}$ within the same noise field that is offset from
$ROI_{opt}$ by (Δx, Δy) and accumulate the correlation coefficient across all
the residual noise fields, $N'_{OX,i}$:
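$$\mu_{ROI_{opt}} = \frac{1}{n_{ROI_{opt}}\, n_{OX}} \sum_{i=1}^{n_{OX}} \sum_{(j,k) \in ROI_{opt}} N'_{OX,i}(j, k)$$
$$\mu_{ROI_{off}} = \frac{1}{n_{ROI_{opt}}\, n_{OX}} \sum_{i=1}^{n_{OX}} \sum_{(j,k) \in ROI_{opt}} N'_{OX,i}(j + \Delta x, k + \Delta y)$$
$$SS_{ROI_{opt}} = \sum_{i=1}^{n_{OX}} \sum_{(j,k) \in ROI_{opt}} \left(N'_{OX,i}(j, k) - \mu_{ROI_{opt}}\right)^2$$
$$SS_{ROI_{off}} = \sum_{i=1}^{n_{OX}} \sum_{(j,k) \in ROI_{opt}} \left(N'_{OX,i}(j + \Delta x, k + \Delta y) - \mu_{ROI_{off}}\right)^2$$
$$SS_{ROI_{opt},ROI_{off}} = \sum_{i=1}^{n_{OX}} \sum_{(j,k) \in ROI_{opt}} \left(N'_{OX,i}(j + \Delta x, k + \Delta y) - \mu_{ROI_{off}}\right) \times \left(N'_{OX,i}(j, k) - \mu_{ROI_{opt}}\right)$$
$$r_{OX}(\Delta x, \Delta y) = \frac{SS_{ROI_{opt},ROI_{off}}}{\sqrt{SS_{ROI_{opt}} \cdot SS_{ROI_{off}}}} \tag{C.6}$$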
This correlation coefficient, $r_{OX}(\Delta x, \Delta y)$, is evaluated for a
range of displacements (Δx, Δy) within ±1 letter size measured in units of the
height of the letter “x” (i.e., $-L \le \Delta x, \Delta y \le L$, where L is
the x-height in pixels). To aid direct comparisons across subjects and
conditions with different sizes of $ROI_{opt}$ and different numbers of error
trials, we convert the correlation coefficient to Fisher’s Z (Papoulis, 1990),
such that the correlation at a displacement (Δx, Δy) is expressed in terms of
a random variable $rZ_{OX}(\Delta x, \Delta y)$ with an approximate standard
normal distribution:

$$rZ_{OX}(\Delta x, \Delta y) = \frac{1}{2} \ln\left(\frac{1 + r_{OX}(\Delta x, \Delta y)}{1 - r_{OX}(\Delta x, \Delta y)}\right) \sqrt{n_{OX} - 3} \tag{C.7}$$
The ensemble of correlation coefficients (expressed in terms of Fisher’s Z)
can then be plotted on a color-coded map $rZ_{OX}$. The map (see Figure C.2)
thus obtained can be interpreted as follows: The center of the map represents
any pixel in $ROI_{opt}$; a “hot” spot or a positive correlation at an offset
(Δx, Δy) from the center indicates that the observer is biased toward a
particular response when two pixels separated by (Δx, Δy) have the same
contrast polarity; similarly, a “cold” spot or a negative correlation at an
offset (Δx, Δy) from the center indicates that the response is partly driven
by two pixels separated by (Δx, Δy) with opposite contrast polarity. This set
of correlated spots represents the second-order statistics of the features
that are recruited to identify a letter (“x” for the correlation map derived
from $N'_{OX,i}$).
In general, we shall refer to the correlation map ($rZ_{OX}$) derived from
$N'_{OX,i}$ as the second-order feature map for the “x” mechanism, or simply
$rZ_X$; likewise, we shall call the correlation map derived from $N'_{XO,i}$
the second-order feature map for “o”, or simply $rZ_O$.
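One entry of such a feature map can be sketched as follows, combining the pooled correlation of Equation C.6 with the Fisher transform of Equation C.7. The function names, the (row, column) slicing convention with Δx along columns and Δy along rows, and the bounds check are assumptions of this illustration rather than details taken from the original analysis.

```python
import math
import numpy as np

def offset_correlation(residuals, rows, cols, dx, dy):
    """Pooled correlation between an ROI and the same ROI shifted by (dx, dy).

    residuals: array of shape (n_trials, height, width) of residual noise fields.
    rows, cols: (start, stop) index pairs delimiting the ROI.
    """
    r0, r1 = rows
    c0, c1 = cols
    _, height, width = residuals.shape
    if (min(r0, r0 + dy, c0, c0 + dx) < 0
            or max(r1, r1 + dy) > height or max(c1, c1 + dx) > width):
        raise ValueError("ROI or offset ROI falls outside the noise field")
    a = residuals[:, r0:r1, c0:c1]
    b = residuals[:, r0 + dy:r1 + dy, c0 + dx:c1 + dx]
    a = a - a.mean()   # subtract the mean pooled over trials and ROI pixels
    b = b - b.mean()
    return float((a * b).sum() / math.sqrt((a ** 2).sum() * (b ** 2).sum()))

def fisher_z(r, n_trials):
    """Fisher-transformed correlation scaled by sqrt(n - 3), as in Equation C.7."""
    return 0.5 * math.log((1.0 + r) / (1.0 - r)) * math.sqrt(n_trials - 3)
```

Evaluating `offset_correlation` over a grid of (dx, dy) values and passing each result through `fisher_z` would yield a map of the kind described above; the zero-offset entry has r = 1 and must be skipped, since the transform diverges there.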
By comparing feature maps obtained from an ideal-observer model (for which we
know the ground truth) to that of a human observer, one can uncover the nature
of the strategy employed by the visual system at an atomic level that has not
been possible with standard classification images.

[Figure C.2 panels: residual noise fields 1, 2, ..., n; regions ROI_opt and
ROI_off; correlation (CORR) map with axes ΔX and ΔY]

Figure C.2: Procedure to calculate the second-order feature maps. Correlation
coefficients between each pixel in an $ROI_{opt}$ and the corresponding pixel
in an offset region ($ROI_{off}$) are accumulated over all residual noise
fields for a particular error-response category and are plotted in terms of
Fisher’s Z to facilitate comparison across conditions and subjects on a
color-coded map. This map reveals the second-order statistics of the features
that comprise an observer’s perceptual template. The center of the map
represents any pixel in $ROI_{opt}$; a “hot” spot or a positive correlation at
an offset (Δx, Δy) from the center indicates that the observer is biased
toward a particular response when two pixels separated by (Δx, Δy) have the
same contrast polarity; a “cold” spot or a negative correlation at an offset
(Δx, Δy) from the center indicates that the response is partly driven by two
pixels separated by (Δx, Δy) with opposite contrast polarity.

Figure 3.6 shows the ideal-observer feature maps obtained from stimuli used in
our study. The ideal-observer model, detailed in Tjan and Nandy (2006), was
limited by static white contrast noise and an intrinsic spatial uncertainty
equated to ±1.5 letter size (x-height) in both horizontal and vertical
directions. For our purpose, the ideal-observer model was given the task of
discriminating the letter “o” from “x” using the same letter images shown to
our subjects but in the absence of flankers. The level of the white noise is
inconsequential and was set to an r.m.s. contrast of 1.0. Figure 3.7 shows the
human feature maps. Many challenges abound in trying to quantitatively compare
the model and human data. These include the following: (1) the correlation
coefficients and the corresponding Fisher’s Z values are not directly
comparable because of the differences in internal noise and uncertainty
between model and human observers; (2) the representation used by the human
observers may correspond to letters of a slightly different shape and size
than the ones used in the experiments, resulting in second-order correlations
that are morphologically similar to features used by the ideal-observer model
but are spatially displaced; (3) the process of estimating the correlation
coefficients from human data is noisy. We overcame these challenges by
considering thresholded versions of the correlation maps to extract three
quantities (separately for the two putative “o” and “x” mechanisms): (1)
quality of match ($Q_m$) between human and ideal-observer feature maps, (2)
proportion of ideal-observer features used by the human observer (U), and (3)
proportion of features used by the human observer that are valid according to
the ideal-observer model (V).
Because each of the target letters was essentially a binary image, with
background at zero contrast and foreground (letter) at a positive contrast,
the second-order features directly associated with a letter always have a
positive correlation coefficient. The negative correlation in the
ideal-observer (and human) feature maps originated from the difference between
the two letters in the presence of spatial uncertainty. Therefore, in our
analysis, we consider only the positively correlated features. To further
simplify the analysis, we consider only the presence or absence of a feature,
discarding the magnitude of the correlation coefficients by thresholding the
second-order feature maps.
The basic problem of comparing the human-observer feature map to an
ideal-observer feature map lies in the arbitrariness of setting thresholds for
the maps to decide what counts as a feature. This problem can be overcome by
performing the relevant computations at all possible threshold settings, akin
to tracing out receiver operating characteristic (ROC) curves.
Thus, for the purpose of calculating $Q_m$ (e.g., for the “o” mechanism), we
evaluate a family of ROC curves parameterized by a threshold $\lambda_{hm}$ for
defining the level of correlation that may constitute a feature used by a
human observer. Each ROC curve is in turn defined by a set of points
$(h_{O,\lambda_{hm}}(\lambda), f_{O,\lambda_{hm}}(\lambda))$ depending on a
threshold λ for demarcating the ideal-observer features (a “hit” [h]
corresponds to the presence of an ideal-observer feature at threshold λ when
the same feature is also present in the human feature map; similarly, a
“false-alarm” [f] corresponds to the presence of an ideal-observer feature
without the presence of the same feature in the human map):
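$$h_{O,\lambda_{hm}}(\lambda) = \Pr\left(rZ^{ideal}_{O} > \lambda \mid rZ^{hm}_{O} > \lambda_{hm}\right)$$
$$f_{O,\lambda_{hm}}(\lambda) = \Pr\left(rZ^{ideal}_{O} > \lambda \mid rZ^{hm}_{O} \le \lambda_{hm}\right) \tag{C.8}$$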
with $rZ^{ideal}_{O}$ and $rZ^{hm}_{O}$ being the second-order feature maps for
the “o” mechanism obtained from the ideal-observer model and a human observer,
respectively. $\lambda_{hm}$ is chosen to span the positive range of rZ values
in the human map. For the ideal-observer feature map, the threshold λ spans
the full range of the rZ values comprising the map, thereby tracing out the
ROC curve. We define the quality of match between the model and human feature
map as the area under the ROC curve (AUC) maximized with respect to
$\lambda_{hm}$:
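$$AUC_{O}(\lambda_{hm}) = \int_{\lambda=-\infty}^{\infty} h_{O,\lambda_{hm}}(\lambda)\, d\left(f_{O,\lambda_{hm}}(\lambda)\right)$$
$$Q_{m,O} = \max_{\lambda_{hm}} AUC_{O}(\lambda_{hm}) \tag{C.9}$$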
Note that $Q_m$ does not depend on any arbitrary thresholds for either the
ideal-observer model or the human. It is also insensitive to the raw amplitude
of correlation coefficients of either human or ideal-observer feature maps and
is robust with respect to the level of an observer’s intrinsic noise, provided
that the number of trials used to estimate the feature maps is sufficient.
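A minimal sketch of how the quality of match could be computed from two feature maps is given below. It is an illustration under assumptions, not the original implementation: the function names, the trapezoidal integration of the empirical ROC curve, and the evenly spaced grid of 20 candidate human thresholds are all choices made for this example, and the human map is assumed to contain positive values.

```python
import numpy as np

def roc_auc(ideal, human, lam_hm):
    """Area under the ROC curve of Equation C.8 for one human threshold."""
    ideal = np.asarray(ideal, dtype=float).ravel()
    human = np.asarray(human, dtype=float).ravel()
    present = human > lam_hm            # human feature present at this pixel
    if present.all() or not present.any():
        return float("nan")             # ROC undefined without both classes
    # Sweep the ideal-observer threshold from +inf down to -inf.
    thresholds = np.concatenate(([np.inf], np.unique(ideal)[::-1], [-np.inf]))
    hits = np.array([(ideal[present] > t).mean() for t in thresholds])
    false_alarms = np.array([(ideal[~present] > t).mean() for t in thresholds])
    # Trapezoidal integration of hit rate against false-alarm rate.
    return float(np.sum(np.diff(false_alarms) * (hits[1:] + hits[:-1]) / 2.0))

def quality_of_match(ideal, human, n_steps=20):
    """Maximum AUC over human thresholds spanning the positive rZ range."""
    human = np.asarray(human, dtype=float)
    lam_grid = np.linspace(0.0, human.max(), n_steps, endpoint=False)
    return float(np.nanmax([roc_auc(ideal, human, lam) for lam in lam_grid]))
```

Identical maps give a quality of match of 1, while unrelated maps should give values near 0.5.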
The ideal-observer model uses the entire letter as a unitary template
“feature”. Its second-order feature map therefore contains all pairwise
correlations between pixels on a letter (excluding correlation of equal
strength on both letters). If the human visual system uses a combination of
sub-template fragments as features to identify the letters (e.g., using a pair
of vertical parallel lines to identify the letter “o”), the human feature map,
in terms of its regions with positive correlation coefficients, will be a
subset of the ideal-observer feature map. A corollary of this observation is
that any human feature that is not present in the ideal-observer feature map
is a spurious feature, one that may lead to erroneous performance. Therefore,
we are interested in quantifying the proportion of ideal-observer features
that are also used by a human observer (U), as well as the proportion of human
features that are valid features (V). To estimate these two quantities, we
first set a threshold of rZ = 1, which, in terms of Fisher’s Z, corresponds to
1 SD away from zero correlation in both the human and ideal-observer maps;
that is, we consider a feature as any correlation with Fisher’s Z greater than
1.0. The qualitative results are not sensitive to this somewhat arbitrary
threshold. We define the proportion of ideal-observer features used by a human
observer (U, or “feature utilization”) as

$$U_{O} = \Pr\left(rZ^{hm}_{O} > 1 \mid rZ^{ideal}_{O} > 1\right) \tag{C.10}$$
We define the proportion of human features used by the ideal-observer model
(V, or “feature validity”) as

$$V_{O} = h_{O,1}(1) \tag{C.11}$$

where h is defined in Equation C.8. Standard error estimates for the three
quantities $Q_m$, U, and V are obtained by bootstrapping on the human feature
maps (Efron and Tibshirani, 1993).
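Feature utilization and feature validity reduce to two conditional proportions over thresholded maps, which can be sketched as below. The function name `utilization_and_validity` is hypothetical, and the sketch assumes that both maps contain at least one above-threshold entry.

```python
import numpy as np

def utilization_and_validity(ideal, human, threshold=1.0):
    """Return (U, V) of Equations C.10 and C.11 for two Fisher-Z feature maps."""
    ideal_feat = np.asarray(ideal) > threshold   # ideal-observer features
    human_feat = np.asarray(human) > threshold   # human features
    shared = np.count_nonzero(ideal_feat & human_feat)
    utilization = shared / np.count_nonzero(ideal_feat)  # Pr(human | ideal)
    validity = shared / np.count_nonzero(human_feat)     # Pr(ideal | human)
    return utilization, validity
```

For example, if three map locations are ideal-observer features, two are human features, and one location is shared, then U = 1/3 and V = 1/2.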
C.4 Optimal region of interest
Let us return to the estimation of the second-order feature maps (rZ). With
one critical exception, our method of computing the correlation coefficients
is mathematically analogous to averaging the power spectra of the noise sample
within each stimulus–response category, which has been applied with stimuli of
narrow spatial bandwidth (e.g., Gabors) and large spatial extent relative to
the noise field (Solomon, 2002; Thomas and Knoblauch, 2005). A significant
difficulty of applying the same technique to spatially broadband stimuli such
as letters is that while the stimuli are highly localized in space, there can
be a significant amount of spatial uncertainty intrinsic to the human
observer. Computing the spectrum of the entire noise field will render any
highly localized correlation undetectable. However, because of intrinsic
spatial uncertainty, it is also unwise to compute the spectrum (or
equivalently, pairwise correlation) only in the target region. The approach we
took was to search for an $ROI_{opt}$ within the noise field that maximized a
measure of pairwise correlation. This ROI reveals the spatial range within
which features are extracted by the observer.
The intuition is as follows (see Figure C.3): Consider a noise field with a
central correlated region; if the ROI chosen to calculate the correlation
coefficient is small, then the level of significance will be low (or
equivalently, with a high p value) because the ROI is insufficient to capture
all the correlations in the data; the level of significance will increase as
the size of the ROI is increased; if, however, the ROI is too large such that
it includes mostly the uncorrelated noise, the level of significance again
drops. Thus, the optimal ROI is the one with the maximum level of significance
(or alternatively, the minimum p value).
For computational tractability, we restrict our candidate ROIs to be
rectangular and centered in a residual noise field (Equation C.5) to coincide
with the center of the target letter. First, an ROI of size h × w pixels
($ROI_{h,w}$) is selected from the set

$$h, w \in \{0.5L, 1.0L, 1.5L, 2.0L, 2.5L, 3.0L\} \tag{C.12}$$

where L is the letter size (x-height) in pixels. The significance level of
this ROI is then calculated as the mean of log p values:

$$P_{h,w} = \frac{1}{L^2} \sum_{\Delta x} \sum_{\Delta y} \log\left(p_{\Delta x, \Delta y}\right) \tag{C.13}$$

where the range of Δx and Δy is restricted to within ±1 letter size (i.e.,
$-L \le \Delta x, \Delta y \le L$), where L is the x-height in pixels, and
$p_{\Delta x, \Delta y}$ is the p value of the correlation coefficient
r(Δx, Δy) between $ROI_{h,w}$ and a region that is offset from $ROI_{h,w}$ by
(Δx, Δy) as defined in Equation C.6.
$P_{h,w}$ is calculated for all possible combinations of h and w chosen from
the set given in Equation C.12. Finally, the optimal ROI is selected as
follows:

$$(h_{min}, w_{min}) = \arg\min_{h,w} P_{h,w}$$
$$ROI_{opt} = ROI_{h_{min}, w_{min}} \tag{C.14}$$
In essence, the optimal ROI is the ROI that minimizes the geometric mean of p
values in the resulting correlation map according to Equation C.6. The optimal
ROI defined in Equation C.14 is used to calculate the second-order feature
maps described in the previous section. It also provides an estimate of the
spatial extent over which features are detected and utilized. In this light,
we will refer to the optimal ROI as the feature utilization zone and we will
compare the feature utilization zones between different experimental
conditions (flanked vs. unflanked; foveal vs. peripheral).
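The selection rule in Equations C.12 to C.14 can be sketched as a small search over candidate sizes. In this illustration the p values for each candidate ROI are assumed to have been computed already (how they are derived from the correlation coefficients is not specified here), and the names `candidate_roi_sizes`, `mean_log_p`, and `select_optimal_roi` are hypothetical.

```python
import math

def candidate_roi_sizes(letter_size):
    """All (h, w) pairs from the set in Equation C.12, in pixels."""
    multiples = (0.5, 1.0, 1.5, 2.0, 2.5, 3.0)
    sides = [int(round(m * letter_size)) for m in multiples]
    return [(h, w) for h in sides for w in sides]

def mean_log_p(p_values, letter_size):
    """Significance score of Equation C.13: sum of log p divided by L squared."""
    return sum(math.log(p) for p in p_values) / letter_size ** 2

def select_optimal_roi(p_by_roi, letter_size):
    """Return the (h, w) key whose p values give the smallest score (Eq. C.14).

    p_by_roi: dict mapping (h, w) to an iterable of p values, one per offset.
    """
    return min(p_by_roi, key=lambda hw: mean_log_p(p_by_roi[hw], letter_size))
```

Because the same normalizing constant is applied to every candidate, the ranking of candidates depends only on the summed log p values.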
[Figure C.3 axes: ROI size (horizontal) versus −(1/N) Σ log(p) (vertical)]

Figure C.3: Rationale behind the estimation of the $ROI_{opt}$. Regions smaller
than the optimal ROI do not capture all the correlation in the data and,
hence, have lower significance (higher p value). Regions larger than the
optimal also have lower significance due to the inclusion of uncorrelated
noise.
Appendix D
Analytical Forms for Fitting the CSF
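The foveal contrast sensitivity function (CSF) data points were fitted with
the following bi-parabolic function:

$$\log(CS_f) = \begin{cases} \log(A) - \dfrac{4}{\sigma_1^2 \log(2)} \left(\log(f) - \log(f_{peak})\right)^2 & \text{for } f \le f_{peak} \\ \log(A) - \dfrac{4}{\sigma_2^2 \log(2)} \left(\log(f) - \log(f_{peak})\right)^2 & \text{for } f > f_{peak} \end{cases} \tag{D.1}$$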
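The peripheral CSF data points were fitted with a biphasic function, which is
flat to the left of the peak frequency and has a parabolic roll-off to the
right:

$$\log(CS_f) = \begin{cases} \log(A) & \text{for } f \le f_{peak} \\ \log(A) - \dfrac{4}{\sigma^2 \log(2)} \left(\log(f) - \log(f_{peak})\right)^2 & \text{for } f > f_{peak} \end{cases} \tag{D.2}$$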
In both Equations D.1 and D.2, A is the peak sensitivity of the CSF, f is the
spatial frequency, $CS_f$ is the contrast sensitivity at frequency f,
$f_{peak}$ is the spatial frequency at peak sensitivity, and σ ($\sigma_1$ or
$\sigma_2$ in Equation D.1) is the bandwidth of one limb of the function in
octaves. Equations D.1 and D.2 provide accurate interpolation of the CSF
within the relevant range of spatial frequencies needed for simulating our
ideal-observer model.
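As a rough illustration, the two fitting functions can be written as below. The sketch assumes that the roll-off term is quadratic in log frequency (as the descriptions “bi-parabolic” and “parabolic roll-off” imply) and that σ acts as a full width at half height, so that sensitivity falls to half its peak at σ/2 octaves from the peak; the function names are hypothetical.

```python
import math

def log_parabola_drop(f, f_peak, sigma):
    """Drop in log sensitivity at frequency f for a limb of bandwidth sigma."""
    return 4.0 / (sigma ** 2 * math.log(2.0)) * (math.log(f) - math.log(f_peak)) ** 2

def foveal_csf(f, peak_sensitivity, f_peak, sigma1, sigma2):
    """Bi-parabolic CSF: separate bandwidths below and above the peak."""
    sigma = sigma1 if f <= f_peak else sigma2
    return math.exp(math.log(peak_sensitivity) - log_parabola_drop(f, f_peak, sigma))

def peripheral_csf(f, peak_sensitivity, f_peak, sigma):
    """Biphasic CSF: flat below the peak, parabolic roll-off above it."""
    if f <= f_peak:
        return peak_sensitivity
    return math.exp(math.log(peak_sensitivity) - log_parabola_drop(f, f_peak, sigma))
```

Under these assumptions, a limb with a bandwidth of 2 octaves halves the sensitivity exactly 1 octave away from the peak frequency.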
Appendix E
Ideal-Observer Model
Here we briefly describe the mathematical formulation of the CSF- and
template-limited ideal-observer models used in Chapter 4. A schematic of the
model is shown in Figure 4.6, which is an extension of the model used in Chung
et al. (2002). As in Chung et al., we assumed that a linear filter derived from
the human CSF and an additive white Gaussian noise source were situated
between the input signal, S, and the ideal observer (a maximum a posteriori
(MAP) classifier). A more detailed description of the decision rule for such a
classifier can be found elsewhere (Tjan et al., 1995). Here we restate the
derivations that are specific to our current application.
Let G(f) denote the transfer function of the CSF (Appendix D) and $N_{int}$ be
the noise source with each noise pixel being normally distributed with a mean
of zero and standard deviation σ. Then the resulting input that is fed into
the MAP classifier is given by

$$I = \mathcal{F}^{-1}\{\mathcal{F}\{S\} \cdot G\} + N_{int} \tag{E.1}$$

where $\mathcal{F}\{\cdot\}$ and $\mathcal{F}^{-1}\{\cdot\}$ represent forward
and inverse Fourier transform operations and S is the filtered letter
stimulus.
Templates used by the model were generated by filtering broadband letters with
raised cosine filters of various bandwidths at the same center frequency as
that of the stimulus. Because the templates can be of a different bandwidth
than the stimulus, the contrast of the internal templates is estimated as part
of the decision procedure. In other words, the ideal-observer model has an
uncertainty in stimulus contrast.
To be maximally correct on average, the MAP classifier selects the response
(corresponding to the internal template $T_j$) that is the most probable given
the input I:

$$p(T_j \mid I) = \int P(cT_j \mid I)\, dc = \int \frac{P(I \mid cT_j)\, P(cT_j)}{P(I)}\, dc \quad \text{(Bayes rule)} \tag{E.2}$$
where c is the internal template contrast, which is not necessarily the same
as the stimulus contrast. Since the prior probability $P(cT_j)$ is constant
within the range of feasible contrasts (all letters are equally likely), and
P(I) does not depend on $T_j$, we can further simplify the posterior
probability as follows:

$$p(T_j \mid I) = k \int P(I \mid cT_j)\, dc = K \int \exp\left(-\frac{1}{2\sigma^2} \|I - cT_j\|^2\right) dc \tag{E.3}$$
The chosen response of the MAP classifier is then the signal corresponding to
the template $T_{j_{MAP}}$ where

$$j_{MAP} = \arg\max_j P(T_j \mid I) \tag{E.4}$$
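Since taking the integral in Equation E.3 is nontrivial, we adopted the
following approximation by replacing integration with maximization. That is,
we computed:
$$
\begin{aligned}
j_{MAP} &= \arg\max_j P(T_j \mid I) \\
&= \arg\max_j \int P(I \mid cT_j)\,dc \\
&\approx \arg\max_j \left\{ \max_c P(I \mid cT_j) \right\} \\
&= \arg\max_j \max_c \exp\left(-\frac{1}{2\sigma^2}\,\lVert I - cT_j \rVert^2\right) \\
&= \arg\min_j \left\{ \min_c \lVert I - cT_j \rVert^2 \right\}
\end{aligned}
\tag{E.5}
$$
The last step follows because the exponential of a negated squared distance is
largest where the squared distance is smallest.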
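The decision rule in Equation E.5 can be illustrated with a minimal sketch. It assumes that images and templates are flat lists of pixel values and that the feasible contrast range is an interval [c_min, c_max]; the function names, the toy templates and the noise level are hypothetical and are not taken from the thesis. For a fixed template, the contrast that minimizes the squared distance is the least-squares projection, clipped to the feasible range.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def best_contrast(image, template, c_min, c_max):
    """Contrast c minimizing ||I - cT||^2, clipped to the feasible range."""
    c = dot(image, template) / dot(template, template)
    return min(max(c, c_min), c_max)

def classify(image, templates, c_min=0.0, c_max=1.0):
    """Index j that minimizes min_c ||I - c T_j||^2 (last line of E.5)."""
    best_j, best_err = None, float("inf")
    for j, template in enumerate(templates):
        c = best_contrast(image, template, c_min, c_max)
        err = sum((i - c * t) ** 2 for i, t in zip(image, template))
        if err < best_err:
            best_j, best_err = j, err
    return best_j

# Hypothetical 4-pixel templates; the input is template 1 at contrast 0.5
# plus white Gaussian noise (assumed standard deviation 0.05).
rng = random.Random(0)
templates = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0], [1.0, 1.0, 0.0, 0.0]]
true_contrast, sigma = 0.5, 0.05
image = [true_contrast * t + rng.gauss(0.0, sigma) for t in templates[1]]
choice = classify(image, templates)
```

Because the squared error is a convex quadratic in the contrast, clipping the unconstrained minimizer to the interval gives the constrained minimum, so no search over contrast values is needed in this sketch.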
Appendix F
Optimal Integration
F.1 Optimal integration for white-noise limited ideal observer
Claim F.1.1. Let $c_f$ be the contrast threshold in nominal contrast units for
identifying a narrow-band letter at center frequency $f$. If there is no overlap
in the frequency domain between the components with center frequencies $f_1$
and $f_2$ (i.e., the components are orthogonal, with a zero dot product), then
for an ideal observer limited by an invariant (stimulus-independent) additive
white input noise (at the front-end), the contrast sensitivity (reciprocal of
the threshold contrast) for the composite can be predicted from those for the
components:
$$
\frac{1}{c_{f_1+f_2}^2} = \frac{1}{c_{f_1}^2} + \frac{1}{c_{f_2}^2} \tag{F.1}
$$
Proof. An ideal observer (a maximum a posteriori classifier) with white
additive input noise makes decisions by maximizing the posterior probability
given by Equation E.3. The posterior is computed in terms of the squared
Euclidean distance between the input image ($I$) and the letter templates at
the test contrast ($cT_j$), normalized by the noise variance, which is a
constant for invariant additive noise. As a result, the pairwise Euclidean
distances between the letters at the test contrast jointly determine the
average accuracy (Tjan et al., 1995, Appendix E). Hence, without loss of
generality, we can confine our derivation to the Euclidean distance between
two randomly chosen letters. At a given criterion accuracy, the Euclidean
distance between the generic letter pair is a constant.
Let $A$ and $B$ be the templates of two letters such that at contrast $c$, the
signals corresponding to these letters are $cA$ and $cB$, and their Euclidean
distance is $c\lVert A - B \rVert$. For a composite at contrast $c$, the nominal
contrast of its components is also $c$ by definition. This definition of
contrast conveniently reflects the equality
$cA_{f_1+f_2} = c(A_{f_1} + A_{f_2}) = cA_{f_1} + cA_{f_2}$, where $c$ is the
nominal contrast of $A_{f_1}$, $A_{f_2}$, and $A_{f_1+f_2}$. Now consider the
Euclidean distance between two composite letters at nominal contrast $c$:
$$
\begin{aligned}
c\,\lVert A_{f_1+f_2} - B_{f_1+f_2} \rVert
&= c\sqrt{(A_{f_1+f_2} - B_{f_1+f_2})^2} \\
&= c\sqrt{A_{f_1}^2 + A_{f_2}^2 - 2A_{f_1}B_{f_1} - 2A_{f_2}B_{f_2} + B_{f_1}^2 + B_{f_2}^2} \\
&= c\sqrt{(A_{f_1} - B_{f_1})^2 + (A_{f_2} - B_{f_2})^2}
\end{aligned}
\tag{F.2}
$$
This is because the dot products of the cross-frequency terms (e.g.,
$A_{f_1}A_{f_2}$ or $A_{f_1}B_{f_2}$) are zero since the components are
orthogonal. Equivalently:
$$
c^2 \lVert A_{f_1+f_2} - B_{f_1+f_2} \rVert^2
= c^2 \lVert A_{f_1} - B_{f_1} \rVert^2 + c^2 \lVert A_{f_2} - B_{f_2} \rVert^2
\tag{F.3}
$$
As noted earlier, at a given criterion accuracy, the Euclidean distance
between a generic pair of letter templates is a constant. Hence,
$$
c_{f_1}^2 \lVert A_{f_1} - B_{f_1} \rVert^2
= c_{f_2}^2 \lVert A_{f_2} - B_{f_2} \rVert^2
= c_{f_1+f_2}^2 \lVert A_{f_1+f_2} - B_{f_1+f_2} \rVert^2
\tag{F.4}
$$
where $c_{f_1}$, $c_{f_2}$, and $c_{f_1+f_2}$ are the threshold nominal contrasts
for the components $f_1$ and $f_2$ and the composite $f_1+f_2$, respectively.
Expressing the composite of Equation F.4 in terms of its components using
Equation F.3, we have:
$$
c_{f_1}^2 \lVert A_{f_1} - B_{f_1} \rVert^2
= c_{f_2}^2 \lVert A_{f_2} - B_{f_2} \rVert^2
= c_{f_1+f_2}^2 \lVert A_{f_1} - B_{f_1} \rVert^2 + c_{f_1+f_2}^2 \lVert A_{f_2} - B_{f_2} \rVert^2
\tag{F.5}
$$
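Solving the set of equations in Equation F.5 results in Equation F.1, hence
proving Claim F.1.1:
$$
\begin{aligned}
1 &= \frac{c_{f_2}^2 \lVert A_{f_2} - B_{f_2} \rVert^2}{c_{f_1}^2 \lVert A_{f_1} - B_{f_1} \rVert^2} \\
1 &= \frac{c_{f_1+f_2}^2 \lVert A_{f_1} - B_{f_1} \rVert^2 + c_{f_1+f_2}^2 \lVert A_{f_2} - B_{f_2} \rVert^2}{c_{f_1}^2 \lVert A_{f_1} - B_{f_1} \rVert^2} \\
\Rightarrow\quad 1 &= \frac{c_{f_1+f_2}^2}{c_{f_1}^2} + \frac{c_{f_1+f_2}^2}{c_{f_2}^2} \\
\Rightarrow\quad \frac{1}{c_{f_1+f_2}^2} &= \frac{1}{c_{f_1}^2} + \frac{1}{c_{f_2}^2}
\end{aligned}
\tag{F.6}
$$
The third line uses the first to substitute
$\lVert A_{f_2} - B_{f_2} \rVert^2 / \lVert A_{f_1} - B_{f_1} \rVert^2 = c_{f_1}^2 / c_{f_2}^2$.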
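The quadratic summation rule in Equation F.1 can be checked numerically with a short sketch. The templates below are hypothetical six-element vectors in which the $f_1$ component occupies the first three coordinates and the $f_2$ component the last three, so the components have a zero dot product; the criterion distance of 1.0 is likewise an arbitrary assumption.

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def diff(u, v):
    return [a - b for a, b in zip(u, v)]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

# Hypothetical orthogonal components of two letter templates A and B.
A_f1, B_f1 = [0.9, -0.2, 0.4, 0.0, 0.0, 0.0], [0.1, 0.5, -0.3, 0.0, 0.0, 0.0]
A_f2, B_f2 = [0.0, 0.0, 0.0, 0.3, 0.8, -0.6], [0.0, 0.0, 0.0, -0.4, 0.2, 0.7]

d_criterion = 1.0  # assumed Euclidean distance required at criterion accuracy

def threshold(a, b):
    """Contrast c at which c * ||a - b|| equals the criterion distance."""
    return d_criterion / norm(diff(a, b))

c_f1 = threshold(A_f1, B_f1)
c_f2 = threshold(A_f2, B_f2)
c_comp = threshold(add(A_f1, A_f2), add(B_f1, B_f2))

lhs = 1 / c_comp**2               # left side of Equation F.1
rhs = 1 / c_f1**2 + 1 / c_f2**2   # right side of Equation F.1
```

With these values the two sides agree to floating-point precision, and the composite threshold is lower than either component threshold, as the claim predicts.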
Corollary F.1.2. Equation F.1 holds if the additive input noise of the ideal
observer is a multivariate Gaussian and not necessarily uncorrelated or white.
Proof. A linear transformation (a matrix multiplication) that decorrelates the
noise can be applied simultaneously to the input, noise, and letter templates,
such that a white-noise ideal observer formulation is applicable to the
transformed input. Let $P$ be the "pre-whitening" matrix. We can follow the
derivation of Equation F.1 in Claim F.1.1 by replacing $A$ and $B$ by $PA$ and
$PB$, respectively.
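As an illustration of what a pre-whitening matrix does, the sketch below builds one for a hypothetical 2x2 noise covariance using the inverse of its Cholesky factor. This is only one possible construction, chosen here for brevity; the thesis does not specify how $P$ is obtained, and the covariance values are made up.

```python
import math

def cholesky_2x2(cov):
    """Lower-triangular L with L L^T = cov for a 2x2 covariance matrix."""
    l11 = math.sqrt(cov[0][0])
    l21 = cov[1][0] / l11
    l22 = math.sqrt(cov[1][1] - l21 * l21)
    return [[l11, 0.0], [l21, l22]]

def inverse_lower_2x2(L):
    """Inverse of a 2x2 lower-triangular matrix."""
    return [[1.0 / L[0][0], 0.0],
            [-L[1][0] / (L[0][0] * L[1][1]), 1.0 / L[1][1]]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

cov = [[2.0, 0.6], [0.6, 1.0]]            # hypothetical correlated noise covariance
P = inverse_lower_2x2(cholesky_2x2(cov))  # one possible pre-whitening matrix
whitened_cov = matmul(matmul(P, cov), transpose(P))  # should be the identity
```

After the transformation the noise covariance is the identity matrix, so the white-noise derivation can be applied to the transformed templates $PA$ and $PB$.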
Corollary F.1.3. Equation F.1 holds if a linear filter is placed at the
front-end of an ideal observer (as in the CSF-limited ideal observer) before
the input noise.
Proof. We can filter both the input and the templates and apply a
Gaussian-noise ideal observer to the transformed input. If $F$ is a linear
filter, we can derive Equation F.1 by replacing $A$ and $B$ by $FA$ and $FB$,
respectively, in Claim F.1.1.
Corollary F.1.4. Equation F.1 holds for an observer that is otherwise a
Gaussian-noise ideal observer but with sampling efficiencies $\eta_1$ for $f_1$
and $\eta_2$ for $f_2$.
Proof. If the sampling efficiency is $\eta$, then the effective Euclidean
distance between a generic pair of letters is scaled by $\eta$. In the
derivation of Claim F.1.1, this scaling can be implemented by replacing
$\lVert A_{f_1} - B_{f_1} \rVert$ by $\eta_1 \lVert A_{f_1} - B_{f_1} \rVert$ and
$\lVert A_{f_2} - B_{f_2} \rVert$ by $\eta_2 \lVert A_{f_2} - B_{f_2} \rVert$,
starting at Equation F.5.
Corollary F.1.5. Equation F.1 holds for an observer with a nonlinear contrast
transducer situated after the input noise.
Proof. Such a nonlinearity does not affect pixel-wise signal-to-noise ratio and
can be undone by taking its inverse. The effective Euclidean distance between
any pair of letters is unaffected; hence the derivation of Claim F.1.1 still
applies.
F.2 Optimal integration for ideal observer with multiplicative noise
Claim F.2.1. Equation F.1 holds if a stimulus-dependent Gaussian noise source
is added after the invariant input noise, such that the variance of this second
noise source is proportional to the sum of the contrast energy of the signal
and the variance of the invariant input noise. Such a noise source is often
called a "multiplicative" noise. The invariant noise is commonly referred to as
an "additive" noise. The additive and multiplicative noises are stochastically
independent.
Proof. As stated earlier, the relevant quantity in computing the posterior is
the ratio between the squared Euclidean distance between the input and a
template, and the variance of the noise (Equation E.3). In the derivation of
Claim F.1.1, the variance of the noise is ignored because it is a constant. In
the case of having a contrast-dependent noise, the proof of Claim F.1.1 can be
replayed to arrive at a version of Equation F.6 by considering the equality in
the "effective" Euclidean distance between a pair of generic letters, which is
the Euclidean distance normalized by the total variance of the noise sources.
To simplify notation, we will consider only the case when both noise sources
are white and independent. By applying Corollary F.1.2, the proof can be
extended to correlated Gaussian noise sources.
Let $m$ be the proportionality constant that relates the variance of the
multiplicative noise to the sum of the contrast energy and the variance of the
additive noise. Let $a$ be the image area in units of pixels, $X$ be an
arbitrary letter and $[\cdot]$ denote expected value. The equality of effective
Euclidean distances at threshold nominal contrast can be expressed as:
$$
\begin{aligned}
\frac{c_{f_1}^2 \lVert A_{f_1} - B_{f_1} \rVert^2}
     {\sigma^2 + \left(\sigma^2 + \left[\lVert X_{f_1} \rVert^2\right] c_{f_1}^2 / a\right) m}
&= \frac{c_{f_2}^2 \lVert A_{f_2} - B_{f_2} \rVert^2}
        {\sigma^2 + \left(\sigma^2 + \left[\lVert X_{f_2} \rVert^2\right] c_{f_2}^2 / a\right) m} \\
&= \frac{c_{f_1+f_2}^2 \lVert A_{f_1+f_2} - B_{f_1+f_2} \rVert^2}
        {\sigma^2 + \left(\sigma^2 + \left[\lVert X_{f_1+f_2} \rVert^2\right] c_{f_1+f_2}^2 / a\right) m}
\end{aligned}
\tag{F.7}
$$
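Following the derivation of Claim F.1.1, we arrive at a normalized version of
Equation F.6:
$$
\begin{aligned}
\frac{\sigma^2 + \left(\sigma^2 + \left[\lVert X_{f_1+f_2} \rVert^2\right] c_{f_1+f_2}^2 / a\right) m}{c_{f_1+f_2}^2}
&= \frac{\sigma^2 + \left(\sigma^2 + \left[\lVert X_{f_1} \rVert^2\right] c_{f_1}^2 / a\right) m}{c_{f_1}^2} \\
&\quad + \frac{\sigma^2 + \left(\sigma^2 + \left[\lVert X_{f_2} \rVert^2\right] c_{f_2}^2 / a\right) m}{c_{f_2}^2}
\end{aligned}
\tag{F.8}
$$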
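Dividing both sides of the equality by $(1+m)\sigma^2$ and rearranging terms,
we have:
$$
\begin{aligned}
\frac{1}{c_{f_1+f_2}^2} &= \frac{1}{c_{f_1}^2} + \frac{1}{c_{f_2}^2} \\
&\quad + \frac{m}{(1+m)\,\sigma^2 a} \times
\left( \left[\lVert X_{f_1} \rVert^2\right] + \left[\lVert X_{f_2} \rVert^2\right] - \left[\lVert X_{f_1+f_2} \rVert^2\right] \right)
\end{aligned}
\tag{F.9}
$$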
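Moreover, since the $f_1$ and $f_2$ components are orthogonal to each other,
$$
\begin{aligned}
\left[\lVert X_{f_1+f_2} \rVert^2\right]
&= \left[\lVert X_{f_1} + X_{f_2} \rVert^2\right] \\
&= \left[\lVert X_{f_1} \rVert^2 + \lVert X_{f_2} \rVert^2 + 2X_{f_1}^{T} X_{f_2}\right] \\
&= \left[\lVert X_{f_1} \rVert^2\right] + \left[\lVert X_{f_2} \rVert^2\right]
\end{aligned}
\tag{F.10}
$$
Hence, the second line of Equation F.9 is zero, which gives us Equation F.1.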
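The argument of Claim F.2.1 can be checked numerically under stated assumptions. The sketch below solves one equality of Equation F.7 for the squared threshold contrast, given a common criterion value $K$ for the effective squared distance, and then evaluates both sides of Equation F.1. All numbers, and the name `threshold_sq`, are hypothetical; orthogonality enters only through the assumption that squared distances and expected energies add for the composite (Equations F.3 and F.10).

```python
def threshold_sq(d_sq, energy, K, sigma_sq, m, a):
    """Squared threshold contrast c^2 solving
    c^2 * d_sq / (sigma_sq + (sigma_sq + energy * c^2 / a) * m) = K.

    d_sq   : squared template distance ||A - B||^2
    energy : expected template energy [||X||^2]
    K      : assumed criterion value of the effective squared distance
    """
    denom = d_sq - K * m * energy / a
    if denom <= 0:
        raise ValueError("criterion cannot be reached at any contrast")
    return K * (1 + m) * sigma_sq / denom

def effective_distance_sq(c_sq, d_sq, energy, sigma_sq, m, a):
    """Left-hand side of one equality in Equation F.7."""
    return c_sq * d_sq / (sigma_sq + (sigma_sq + energy * c_sq / a) * m)

# Hypothetical parameter values.
K, sigma_sq, m, a = 4.0, 0.01, 0.2, 1024
d1_sq, d2_sq = 30.0, 12.0   # squared distances for the f1 and f2 components
e1, e2 = 200.0, 90.0        # expected energies for the f1 and f2 components

c1_sq = threshold_sq(d1_sq, e1, K, sigma_sq, m, a)
c2_sq = threshold_sq(d2_sq, e2, K, sigma_sq, m, a)
# Orthogonal components: distances and energies of the composite are sums.
c12_sq = threshold_sq(d1_sq + d2_sq, e1 + e2, K, sigma_sq, m, a)
```

Under these assumptions, `1 / c12_sq` equals `1 / c1_sq + 1 / c2_sq` up to rounding error, in line with the claim that the multiplicative noise leaves Equation F.1 intact.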
Corollary F.2.2. Equation F.1 holds under supra-threshold conditions for an
observer whose front-end consists of an additive "peripheral" noise, followed
by a pixel-wise logarithmic compressive nonlinearity, followed by a second
additive "central" noise, and is otherwise ideal.
Proof. Legge et al. (1987) showed in their Appendix A that for a contrast
discrimination task when the pedestal is above detection threshold
(supra-threshold), an observer with a nonlinear transducer situated between two
invariant noise sources is equivalent to an observer with contrast-invariant
peripheral noise followed by a contrast-dependent central noise, without any
nonlinearity in between. They did so by expressing the nonlinearity with a
Taylor series and considering the net signal-to-noise ratio. They showed that
the variance of the contrast-dependent noise is inversely proportional to the
squared derivative of the transducer function under supra-threshold conditions.
If we are to require the variance of the central noise to be proportional to
input contrast energy, as is required by Claim F.2.1, then the derivative of
the nonlinearity has to be inversely proportional to input contrast, thus
implying that the nonlinearity must be logarithmic (and thus compressive) in
contrast. Claim F.2.1 applies under these conditions, implying Equation F.1.
Although the derivation of Legge et al. concerns a contrast discrimination
task, it is equally applicable to an n-way identification task since, as we
have mentioned in the derivation of Claim F.1.1, all that is relevant for
determining the performance of a maximum a posteriori observer for an n-way
discrimination task is the generic pairwise distance ($d'$ in the case of a
2-way discrimination) between a pair of alternatives.
Corollary F.2.3. For an observer with an expansive nonlinearity in between the
peripheral and central noise, the threshold contrast sensitivities to the
components and the composite stimuli approach Equation F.1 in supra-threshold
conditions as stimulus contrast increases.
Proof. Recall that Legge et al. (1987) showed that the variance of the
equivalent central contrast-dependent noise is inversely proportional to the
squared derivative of the transducer function. For an expansive nonlinearity,
the derivative increases as a function of contrast. High contrast thus reduces
the variance of the central noise and diminishes its influence. In the limit,
the central noise becomes irrelevant and the performance is only limited by the
invariant peripheral noise, meeting the conditions of Claim F.1.1. Hence,
Equation F.1 holds in the limit of high stimulus contrast for an expansive
nonlinearity.
Appendix G
Grating Orientation Discrimination
Our goal in this supplementary experiment was to measure the index of
integration for gratings in a way that is comparable to the spatial frequencies
and size of the letter stimuli in Experiment 4.2 (§4.5). The component gratings
were tested at the same frequencies as the center frequencies of the letter
components for subjects ASN, BW, and PLB in Experiment 4.2. For ASN, this
corresponded to $f_{low} = 1.63$ and $f_{high} = 6.51$ cpd in the fovea and
$f_{low} = 1.3$ and $f_{high} = 5.2$ cpd in the periphery. The compound gratings
were formed by combining (in sine phase) the corresponding component gratings
according to the ratio of their detection thresholds, estimated from the
subject's CSF. For example, the detection threshold in the periphery of ASN was
0.86% in Weber contrast for $f_{low}$ and 1.92% for $f_{high}$; as a result, the
contrast ratio of the components in the composite was 0.86 ($f_{low}$) to 1.92
($f_{high}$). Similar in principle to the definition used for letters, we define
the nominal contrast of a component or the composite to be the corresponding
Weber contrast of the $f_{low}$ component in the composite.
The task was a 2-way discrimination task between horizontally and vertically
oriented Gabor patches. The space constant of the Gaussian envelope of the
gratings was set to the same size as that of the letter stimuli (0.96° for ASN,
1.15° for BW, 1.11° for PLB). The QUEST procedure was used to adjust the
contrast of the gratings to achieve an accuracy level of 74%. As with
Experiment 4.2, the component and the composite gratings were tested in
interleaved blocks. Stimulus presentation timing was identical to that of
Experiment 4.2. The number of trials per condition was 180, divided over three
blocks. The integration index was estimated as in Experiment 4.2:
$$
\Phi = \frac{CS_{f_{low}+f_{high}}^2}{CS_{f_{low}}^2 + CS_{f_{high}}^2} \tag{G.1}
$$
Figure G.1 shows the obtained integration indices in the fovea and in the
periphery. Also shown are the predicted indices for probability summation,
which were estimated by considering only the response to the maximally
sensitive component:
$$
\Phi_{prob.sum.} = \frac{\max\left(CS_{f_{low}},\, CS_{f_{high}}\right)^2}{CS_{f_{low}}^2 + CS_{f_{high}}^2} \tag{G.2}
$$
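Equations G.1 and G.2 can be written as two small functions. The sensitivity values below are hypothetical and serve only to show the two reference cases: an optimal integrator, for which Equation F.1 gives an index of 1, and an observer that relies on the more sensitive component alone.

```python
def integration_index(cs_low, cs_high, cs_composite):
    """Equation G.1: squared composite sensitivity over summed squared
    component sensitivities."""
    return cs_composite**2 / (cs_low**2 + cs_high**2)

def probability_summation_index(cs_low, cs_high):
    """Equation G.2: index if only the more sensitive component is used."""
    return max(cs_low, cs_high)**2 / (cs_low**2 + cs_high**2)

# Hypothetical contrast sensitivities in nominal contrast units.
cs_low, cs_high = 40.0, 30.0

# Optimal integration (Equation F.1): squared sensitivities add.
optimal_composite = (cs_low**2 + cs_high**2) ** 0.5

phi_optimal = integration_index(cs_low, cs_high, optimal_composite)
phi_prob_sum = probability_summation_index(cs_low, cs_high)
```

With these made-up sensitivities the probability-summation index is 0.64, which illustrates why an index clearly below 1 indicates suboptimal integration.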
[Figure G.1: three panels (ASN, BW, PLB); horizontal axis: fovea, periphery;
vertical axis: Φ, from 0 to 1.2; legend: predicted by probability summation]
Figure G.1: Integration indices of observers ASN, BW, and PLB for the
orientation discrimination task with sinusoidal gratings. The human indices are
suboptimal for both the fovea and periphery conditions and include within their
95% CIs the predicted indices for probability summation (green data symbols).
As has been shown in the past (Graham et al., 1978), the process of integration
of compound gratings is suboptimal and in the range predicted by probability
summation. We found no practice effect on integration index across blocks of
trials (highest positive change in integration index between first and final
blocks was 0.15 for BW in the fovea; highest negative change was 0.22 for PLB
in the periphery).
Appendix H
Saccade-Confounded Image Statistics: Methods
This appendix elaborates on the model and analysis methods developed in
Chapter 5.
H.1 Geometry of V1
We assume a columnar architecture of V1 with the cortical hypercolumns
packed hexagonally in cortical space (Figure 5.2A). The computational units
within each hypercolumn have receptive fields centered at the same retinal
location but each tuned to different orientations and spatial scales. For our
purposes, we are concerned mainly with the orientation tuning of these units.
We thus use 8 broad-band oriented filters which extract the corresponding
orientation energy in the image patch under the RF
($\theta = [0, 1, \ldots, 7] \times \frac{\pi}{8}$ radians; see Figure 5.4A). The
average diameter of the receptive fields increases linearly with eccentricity
with a slope of 0.1 (Motter, 2002).
We can show that this geometry captures the logarithmic cortical
magnification (Tootell et al., 1982) as follows. Let $s$ be the slope of the
linear function relating RF diameters to eccentricity and $\gamma$ be the
proportion of RF overlap between adjacent hypercolumns (assumed constant across
all eccentricities). For any hypercolumn, the eccentricity of the RF ($\phi$)
and the cortical distance $p$ from the cortical location in V1 that represents
the center of the fovea (in units of hypercolumns) can be related as follows:
$$
\frac{d\phi}{dp} = s(1-\gamma)\,\phi \tag{H.1}
$$
For our model, $s = 0.1$ (Motter, 2002) and $\gamma = 0.3$ (average value in
Wilson and Sherman, 1976). Solving the differential equation, we get
$$
\phi(p) = e^{s(1-\gamma)p} \tag{H.2}
$$
Inverting the function, we can express $p$ as a function of $\phi$ as
$$
p(\phi) = \frac{\ln(\phi)}{s(1-\gamma)} \tag{H.3}
$$
Equation H.3 thus gives the logarithmic cortical magnification.
If we assume the critical spacing in the visual field for crowding to be
$\Delta\phi = b\phi$, where $b$ is Bouma's constant (about 0.5), then the
critical spacing in the cortex is
$$
\begin{aligned}
\Delta p &= p(\phi + \Delta\phi) - p(\phi) \\
&= \frac{\ln(\phi + \Delta\phi) - \ln(\phi)}{s(1-\gamma)} \\
&= \frac{\ln(\phi + b\phi) - \ln(\phi)}{s(1-\gamma)} \\
&= \frac{\ln(1+b)}{s(1-\gamma)}
\end{aligned}
\tag{H.4}
$$
Using $b = 0.5$ we get $\Delta p \approx 6$. That is, the critical spacing for
crowding corresponds to six hypercolumns in V1, independent of eccentricity.
This is in agreement with the anatomical extent of lateral (long-range
horizontal) connections in V1 (Stettler et al., 2002) and with the estimated
extent of "combining fields" in V1 (Pelli, 2008) if each hypercolumn is roughly
1 mm on the cortex.
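The eccentricity independence of the cortical critical spacing can be verified with a few lines of code. The sketch below uses the parameter values quoted in the text (s = 0.1, γ = 0.3, b = 0.5); the function names and the set of test eccentricities are illustrative choices.

```python
import math

S = 0.1      # slope of RF diameter versus eccentricity (Motter, 2002)
GAMMA = 0.3  # proportion of RF overlap between adjacent hypercolumns
B = 0.5      # Bouma's constant

def cortical_position(phi):
    """Equation H.3: cortical distance p (in hypercolumns) at eccentricity phi."""
    return math.log(phi) / (S * (1 - GAMMA))

def eccentricity(p):
    """Equation H.2: eccentricity at cortical distance p."""
    return math.exp(S * (1 - GAMMA) * p)

def cortical_critical_spacing(phi, b=B):
    """First line of Equation H.4: p(phi + b*phi) - p(phi)."""
    return cortical_position(phi + b * phi) - cortical_position(phi)

# Evaluate at several eccentricities (degrees); the result should not vary.
spacings = [cortical_critical_spacing(phi) for phi in (2.0, 5.0, 10.0, 20.0)]
closed_form = math.log(1 + B) / (S * (1 - GAMMA))  # last line of Equation H.4
```

Each entry of `spacings` matches `closed_form`, which evaluates to about 5.8 hypercolumns and rounds to the value of 6 used in the model.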
Conversely, Equation H.4 shows that if a computational unit in a particular
hypercolumn has lateral connections to all computational units in neighboring
hypercolumns up to an isotropic extent of a constant number of hypercolumns
on the cortex, then the resulting spatial interaction in the visual field must
follow Bouma's Law of linear scaling ($\Delta\phi = b\phi$). We will refer to
the set of hypercolumns (blue circles in Figure 5.2A) to which a reference
hypercolumn (red circle in Figure 5.2A) has lateral connections as the lateral
interaction zone. In our model, we set the radius of the lateral interaction
zone to 6 hypercolumns.
H.2 Saccadic eye movements
For the purpose of eye movement simulations, the saccadic velocity profile was
modeled as follows. Let $A$ be the saccade amplitude (the distance between
successive fixations in degrees of visual angle), $T$ the duration and $v(t)$
the velocity profile of a saccade. $v(t)$ must satisfy the following conditions
(Lebedev et al., 1996):
$$
\begin{aligned}
v(0) &= 0 \\
v(T) &= 0 \\
v_{peak} = v\left(\tfrac{T}{2}\right) &= k\sqrt{A} \\
\left.\frac{dv}{dt}\right|_{t=\frac{T}{2}} &= 0
\end{aligned}
\tag{H.5}
$$
where $k$ is a constant. A sinusoidal velocity profile of the following form
satisfies the constraints in Equation H.5 in the range $0 \le t \le T$:
$$
v(t) = k\sqrt{A}\,\sin\left(\frac{\pi t}{T}\right) \tag{H.6}
$$
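Since $A = \int_0^T v(t)\,dt$, we have
$$
T = \frac{\pi}{2k}\sqrt{A} \tag{H.7}
$$
The distance traversed, $D(\tau)$, in time $\tau$ is therefore given by
$$
\begin{aligned}
D(\tau) &= \int_0^{\tau} v(t)\,dt \\
&= \frac{A}{2}\left(1 - \cos\left(\frac{2k\tau}{\sqrt{A}}\right)\right)
\end{aligned}
\tag{H.8}
$$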
The distribution of saccade amplitudes along the radial axis from the fovea was
modeled as an exponential distribution (Bahill et al., 1975) with the following
p.d.f.: $f(x) = \mu e^{-\mu x}$, $\mu = \frac{1}{7.6}$. The distribution along the
iso-eccentric axis was assumed to be uniform.
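The saccade kinematics of Equations H.6 to H.8 and the amplitude distribution can be sketched together as follows. The value of the constant k is not given in this appendix, so the value used below is purely hypothetical, as are the function names and the random seed. The final lines integrate the velocity profile numerically as a sanity check that the saccade covers its full amplitude.

```python
import math
import random

MEAN_AMPLITUDE = 7.6  # 1/mu, in degrees of visual angle

def sample_amplitude(rng):
    """Draw a radial saccade amplitude from f(x) = mu * exp(-mu * x)."""
    return rng.expovariate(1.0 / MEAN_AMPLITUDE)

def duration(A, k):
    """Equation H.7: saccade duration T."""
    return math.pi * math.sqrt(A) / (2.0 * k)

def velocity(t, A, k):
    """Equation H.6: sinusoidal velocity profile for 0 <= t <= T."""
    return k * math.sqrt(A) * math.sin(math.pi * t / duration(A, k))

def distance(tau, A, k):
    """Equation H.8: distance traversed by time tau."""
    return 0.5 * A * (1.0 - math.cos(2.0 * k * tau / math.sqrt(A)))

k = 100.0  # hypothetical constant; its value is not stated in the text
rng = random.Random(1)
A = sample_amplitude(rng)
T = duration(A, k)

# Midpoint-rule integral of v(t) over [0, T]; it should recover A.
n = 10000
numeric_A = sum(velocity((i + 0.5) * T / n, A, k) for i in range(n)) * T / n
```

The closed-form distance at the end of the saccade, `distance(T, A, k)`, and the numerical integral both return the sampled amplitude, and the peak velocity at T/2 equals k times the square root of A, as required by Equation H.5.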
H.3 Eye movement simulations and image statistics
Using the distribution of saccade amplitudes and the corresponding velocity
profile described above, we simulated saccadic eye movements in which the
visual stimulus presented to the system was a random clutter of uppercase
letters (Palatino Linotype font) at various sizes and orientations. For
computational tractability we calculated the outputs of the set of 8 broad-band
oriented filters for each hypercolumn at discrete time points in the interval
$[0 \ldots T]$. Let $r[t]$ denote the response of a filter at time $t$. The
cumulative response of the filter over the time course of the eye movement is
$$
r = \sum_{t=0}^{T} r[t]\,e^{-t/\lambda} \tag{H.9}
$$
where the modulation of spatial attention during its overlap with saccadic eye
movement (Figure 5.2B) is modeled as an exponential decay function with a
time constant $\lambda$, a free parameter of the model. For the purpose of
calculating joint image statistics, the cumulative filter response, $r$, is
first converted into a firing probability $p$ with a saturating non-linearity:
$$
p = \tanh(kr) \tag{H.10}
$$
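Equations H.9 and H.10 amount to an exponentially weighted sum followed by a squashing function. A minimal sketch is given below; it assumes that time is counted in discrete steps starting at zero and that filter responses are non-negative, and the response values, the decay constant and the gain are all made-up numbers.

```python
import math

def cumulative_response(responses, lam):
    """Equation H.9: responses r[t] at t = 0, 1, ..., T weighted by exp(-t/lam)."""
    return sum(r_t * math.exp(-t / lam) for t, r_t in enumerate(responses))

def firing_probability(r, k):
    """Equation H.10: saturating non-linearity; maps r >= 0 into [0, 1)."""
    return math.tanh(k * r)

# Hypothetical filter responses at five discrete time steps of one saccade.
responses = [0.8, 0.6, 0.3, 0.1, 0.0]
lam = 2.0   # attention decay time constant (a free parameter of the model)
gain = 0.5  # hypothetical value of k in Equation H.10

r = cumulative_response(responses, lam)
p = firing_probability(r, gain)
```

Responses early in the saccade, when attention is still strong, dominate the sum, and the tanh keeps the resulting value in a range that can be read as a probability.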
Let $\Theta_{i,R}$ be a random variable associated with a filter with
orientation $\theta_i$ in the reference hypercolumn $R$. $\Theta_{i,R}$ is equal
to 1 if the cell fires, else it is zero. The joint probability distribution
$P(\Theta_{i,R}, \Theta_{j,N})$ between the oriented filter in the reference
hypercolumn and another oriented filter in a neighboring hypercolumn ($N$) can
be calculated by accumulating and averaging the joint firing probabilities
across many eye-movement traces (30000 in our simulations). To obtain robust
estimates of the joint probability distribution we used the Bootstrap procedure
(Efron and Tibshirani, 1993). For any saccade trace, the probabilities are
accumulated only if both the reference and the neighboring hypercolumn are
under the spotlight of attention. Finally, the statistical dependence between
$\Theta_{i,R}$ and $\Theta_{j,N}$ can be calculated in terms of the pair-wise
mutual information
$$
I(\Theta_{i,R}; \Theta_{j,N}) =
\sum_{\substack{\Theta_{i,R} \in \{0,1\} \\ \Theta_{j,N} \in \{0,1\}}}
P(\Theta_{i,R}, \Theta_{j,N}) \log_2 \frac{P(\Theta_{i,R}, \Theta_{j,N})}{P(\Theta_{i,R})\,P(\Theta_{j,N})}
\tag{H.11}
$$
The mutual information is zero when the two random variables are statistically
independent.
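For two binary variables, Equation H.11 reduces to a sum over four cells of the joint distribution. The sketch below assumes the joint distribution is supplied as a 2x2 nested list that sums to one; the two example distributions are hypothetical and were chosen to give the limiting values of 0 and 1 bit.

```python
import math

def mutual_information(joint):
    """Equation H.11 for two binary variables, in bits.

    joint[a][b] is P(Theta_iR = a, Theta_jN = b) for a, b in {0, 1}.
    Cells with zero joint probability contribute nothing to the sum.
    """
    p_r = [sum(joint[a]) for a in (0, 1)]              # marginal of Theta_iR
    p_n = [joint[0][b] + joint[1][b] for b in (0, 1)]  # marginal of Theta_jN
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            if joint[a][b] > 0:
                mi += joint[a][b] * math.log2(joint[a][b] / (p_r[a] * p_n[b]))
    return mi

# Joint equal to the product of marginals (0.7, 0.3) and (0.6, 0.4).
independent = [[0.42, 0.28], [0.18, 0.12]]
# The two units always fire together or stay silent together.
identical = [[0.5, 0.0], [0.0, 0.5]]

mi_independent = mutual_information(independent)
mi_identical = mutual_information(identical)
```

The independent case gives zero up to rounding error, consistent with the statement above, while two perfectly coupled units with equal firing probability share exactly one bit.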
For a reference hypercolumn $R$, pooled mutual information (pooled across
all orientations) between $R$ and a neighboring hypercolumn $N$ is defined as
$$
\begin{aligned}
I_{SC}(R;N) &= \sum_i \sum_j I_{SC}(\Theta_{i,R}; \Theta_{j,N}) \\
I_{V}(R;N) &= \sum_i \sum_j I_{V}(\Theta_{i,R}; \Theta_{j,N})
\end{aligned}
\tag{H.12}
$$
where $I_{SC}$ and $I_{V}$ are the pair-wise mutual information for the
saccade-confounded and veridical conditions respectively (Equation H.11). We
express the gross difference between the saccade-confounded and veridical
statistics in terms of the normalized difference between saccade-confounded and
veridical mutual information:
$$
\Delta I(R;N) = \frac{I_{SC}(R;N) - I_{V}(R;N)}{I_{V}(R;N)} \tag{H.13}
$$
This normalized difference, when plotted in visual space for all neighboring
hypercolumns, maps the spatial extent of inappropriate integration for a
reference hypercolumn.
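The pooling and normalization in Equations H.12 and H.13 can be sketched as below. The pairwise mutual-information values are hypothetical, and a 2x2 grid is used only to keep the example short; in the model each grid would be 8x8, one entry per pair of orientations.

```python
def pooled_mi(pairwise_mi):
    """Equation H.12: sum of pairwise mutual information over all (i, j)."""
    return sum(sum(row) for row in pairwise_mi)

def normalized_difference(mi_saccade, mi_veridical):
    """Equation H.13: (I_SC - I_V) / I_V for one pair of hypercolumns."""
    i_sc = pooled_mi(mi_saccade)
    i_v = pooled_mi(mi_veridical)
    return (i_sc - i_v) / i_v

# Hypothetical pairwise mutual information values in bits.
mi_veridical = [[0.010, 0.002], [0.002, 0.010]]
mi_saccade = [[0.015, 0.006], [0.006, 0.015]]

delta_i = normalized_difference(mi_saccade, mi_veridical)
```

With these made-up numbers the pooled values are 0.042 and 0.024 bits, so the normalized difference is 0.75; a positive value means the saccade-confounded statistics carry more pooled dependence than the veridical ones.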
Appendix I
Detailed Measurement of Crowding Zones
Many of the current studies that map out the shape of the crowding zone (Figure
1.1C) have two limitations: (a) symmetric flankers are displayed on either side
of the target and, due to the inward-outward asymmetry, the measured zone is
biased toward the fovea, and (b) performance is measured at only a few cardinal
directions (typically 4 or 8) around the target. We wanted to obtain a bias-free
estimate of the crowding zone by (a) using a single flanker and (b) measuring
performance at multiple locations around the target.
I.1 Methods
Human experiments were conducted in which observers had to perform a letter
identification task in the presence of a single letter flanker. The target was
one of 10 letters ('C', 'D', 'H', 'K', 'N', 'O', 'R', 'S', 'V', 'Z') rendered in
Sloan font, while the flanker was the letter 'E' rotated at one of 4
orientations (0°, 90°, 180°, 270°). The flanker was presented at one of 12
azimuth angles around the target and at one of 5 separations from the target,
thus yielding a total of 60 flanker positions (Figure I.1). The center-to-center
separation between the target and the flanker was 0.125, 0.25, 0.375, 0.5 or
0.625 times the target eccentricity.
[Figure I.1 schematic. Labels: fixation cross (+); target ∈ {C, D, H, K, O, N,
R, S, V, Z} (Sloan letters) at eccentricity Φ; single flanker 'E' in one of 4
orientations; Bouma's circle; radial and tangential axes; flanker locations
(r, θ) with r = (0.125, 0.25, 0.375, 0.5, 0.625) × Φ and θ = (0°, 30°, 60°, ...,
330°); 100 trials per flanker location; measure percentage correct; full
contrast target and flanker]
Figure I.1: Experiment design to measure the fine structure of the crowding
zone. A single flanker ('E' at any one of 4 orientations) is placed at 60
different locations around the target (one of the 10 Sloan letters).
Performance (percentage correct) is measured at each flanker location.
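The 60 flanker positions described above can be enumerated with a short sketch. It assumes visual-field coordinates in degrees with fixation at the origin, and it measures azimuth counterclockwise from the positive horizontal axis; that angular convention, like the function name, is an assumption made for illustration and is not stated in the text.

```python
import math

SEPARATION_FRACTIONS = (0.125, 0.25, 0.375, 0.5, 0.625)
AZIMUTHS_DEG = tuple(range(0, 360, 30))  # 12 azimuths around the target

def flanker_positions(target_xy, eccentricity):
    """Return (x, y) flanker centers for every separation and azimuth.

    Separations are fractions of the target eccentricity; azimuth is
    measured counterclockwise from the positive x axis (assumed convention).
    """
    tx, ty = target_xy
    positions = []
    for frac in SEPARATION_FRACTIONS:
        r = frac * eccentricity
        for az in AZIMUTHS_DEG:
            theta = math.radians(az)
            positions.append((tx + r * math.cos(theta),
                              ty + r * math.sin(theta)))
    return positions

# Target at 10 degrees eccentricity in the upper visual field.
positions = flanker_positions((0.0, 10.0), 10.0)
```

For a target at 10° eccentricity the separations range from 1.25° to 6.25°, so the outermost ring lies beyond the 5° critical spacing implied by a Bouma's constant of 0.5.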
The stimuli were presented on a 19" CRT monitor (Sony Trinitron CPD-G400) at a
viewing distance of 80 cm. In each trial, the subjects had to fixate at a cross
while the stimulus was presented according to the following temporal design:
(1) a fixation beep immediately followed by a fixation screen for 250 ms,
(2) stimulus presentation for 250 ms, (3) subject response period (variable)
with positive feedback beep for correct trials, and (4) 250 ms delay before
onset of the next trial. Depending on the experimental condition, the target
was presented either at 10° in the upper or at 10° in the lower visual field.
The target and the orientation of the flanker were randomly chosen in each
trial. Both the target and the flanker were presented at full contrast. The
response of the subject was stored in each trial as either correct or
incorrect.
Each experiment consisted of 48 blocks, with 125 trials per block. In each
block, the flanker was presented at a fixed azimuth angle with respect to the
target but at any of the 5 separations. Within a block, there were 25 trials at
each separation and the order of the presentations was randomly shuffled. There
were 4 such blocks for each of the 12 azimuth angles. The blocks were randomly
ordered with the constraint that the $n$th repeat of a particular azimuth
occurred only after all azimuths had been tested for at least $n-1$ blocks.
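One way to satisfy the ordering constraint is to run the blocks in four rounds, each round being an independent random permutation of the 12 azimuths. The sketch below takes that approach; it is an illustration of a schedule that meets the stated constraint, not a reconstruction of the software actually used, and the seed and names are arbitrary.

```python
import random

AZIMUTHS_DEG = list(range(0, 360, 30))         # 12 azimuths
SEPARATIONS = [0.125, 0.25, 0.375, 0.5, 0.625]  # fractions of eccentricity
REPEATS_PER_AZIMUTH = 4
TRIALS_PER_SEPARATION = 25

def block_order(rng):
    """Azimuth for each of the 48 blocks, shuffled within each round so that
    no azimuth is repeated before every azimuth has been tested."""
    order = []
    for _ in range(REPEATS_PER_AZIMUTH):
        round_order = AZIMUTHS_DEG[:]
        rng.shuffle(round_order)
        order.extend(round_order)
    return order

def trials_in_block(rng):
    """Shuffled flanker separations for the 125 trials of one block."""
    trials = SEPARATIONS * TRIALS_PER_SEPARATION
    rng.shuffle(trials)
    return trials

rng = random.Random(2010)
blocks = block_order(rng)
trials = trials_in_block(rng)
```

This schedule yields 4 blocks x 25 trials = 100 trials at each of the 60 flanker locations, matching the trial count used in the analysis.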
Prior to the main experiment, letter acuity was measured at the desired
target location. The subjects were asked to identify any of the 10 target
letters, while the size of the presented letter was varied using QUEST
(Watson and Pelli, 1983) to achieve an identification accuracy of 79%.
Analysis procedure
For each of the 60 flanker locations, we first obtained a percentage correct
measure (100 trials per location) from the experimental data. We next performed
a 2-D surface interpolation of the data points. This 2-D interpolated map gives
a visual map of the crowding zone. To further quantify the shape
characteristics, we obtained a bootstrapped estimate (Efron and Tibshirani,
1993) of the 95% confidence interval for a particular map contour at 65%
correct. This was done by sampling with replacement the percentage correct data
collected from the experiment multiple times (n = 100). For each such sample,
we obtained an interpolated map as above and a contour at the same criterion.
From the ensemble of such contours we could thus calculate a confidence
interval of the contour obtained from the raw data. A least squares elliptical
fit to the contour provides a summary of the shape of the zone at the
particular criterion.
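The resampling step of the bootstrap can be sketched for a single flanker location. The example below is deliberately simplified: it assumes that resampling is done over the 100 individual trial outcomes at a location, and it stops at the percentage correct, omitting the 2-D interpolation, contour extraction and ellipse fitting that the full procedure applies to each resample. The outcome data and the percentile method for the interval are assumptions made for illustration.

```python
import random

def bootstrap_percent_correct(outcomes, n_resamples, rng):
    """Percent correct for each bootstrap resample of the trial outcomes."""
    n = len(outcomes)
    estimates = []
    for _ in range(n_resamples):
        resample = [outcomes[rng.randrange(n)] for _ in range(n)]
        estimates.append(100.0 * sum(resample) / n)
    return estimates

def percentile_interval(values, lower=2.5, upper=97.5):
    """Percentile interval using nearest-rank indexing into sorted values."""
    ordered = sorted(values)
    lo_idx = int(round(lower / 100.0 * (len(ordered) - 1)))
    hi_idx = int(round(upper / 100.0 * (len(ordered) - 1)))
    return ordered[lo_idx], ordered[hi_idx]

# Hypothetical outcomes at one flanker location: 65 correct out of 100 trials.
outcomes = [1] * 65 + [0] * 35
rng = random.Random(0)
estimates = bootstrap_percent_correct(outcomes, n_resamples=100, rng=rng)
ci_low, ci_high = percentile_interval(estimates)
```

In the full analysis, each resample would produce a complete 60-location map, and the confidence interval would be computed over the resulting contours rather than over a single percentage.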
I.2 Results
A comparison of the crowding zone between the upper and lower visual fields
is shown in Figure 5.8. The zones in the upper field are less elongated than
those in the lower field. The aspect ratios of the fitted ellipses are shown in
Table I.1. The relationship between this empirical finding and our proposed
model of crowding is discussed in Chapter 5 (§5.4).
Upper        Lower
1.15 (MS)    4.32 (MS)
1.10 (MA)    2.09 (ASN)
1.53 (DL)    2.65 (DL)
Table I.1: Aspect ratios of the least squares elliptical fits to the contours
at 65% correct (Figure 5.8) at 10° in the upper visual field (left column) and
at 10° in the lower visual field (right column). Subject initials are given in
parentheses.
Abstract
Visual crowding is a ubiquitous limitation of peripheral vision and manifests itself as the marked inability to identify shapes when targets are flanked by other objects. It presents a fundamental bottleneck to object recognition in peripheral vision. Although the phenomenon has been widely studied over the last four decades, the neural mechanisms underlying crowding remain unsettled. Such an understanding is critical for the development of visual enhancement aids for patients with central field loss.
Asset Metadata
Creator: Nandy, Anirvan S. (author)
Core Title: Crowding and form vision deficits in peripheral vision
School: College of Letters, Arts and Sciences
Degree: Doctor of Philosophy
Degree Program: Psychology
Publication Date: 06/10/2010
Defense Date: 04/23/2010
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: AMD, computational modeling, crowding, form vision, OAI-PMH Harvest, peripheral vision, visual psychophysics
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Tjan, Bosco S. (committee chair), Baker, Laura A. (committee member), Biederman, Irving (committee member), Grzywacz, Norberto M. (committee member), Itti, Laurent (committee member)
Creator Email: anirvan.nandy@gmail.com, nandy@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-m3122
Unique identifier: UC1129968
Identifier: etd-Nandy-3824 (filename), usctheses-m40 (legacy collection record id), usctheses-c127-335299 (legacy record id), usctheses-m3122 (legacy record id)
Legacy Identifier: etd-Nandy-3824.pdf
Dmrecord: 335299
Document Type: Dissertation
Rights: Nandy, Anirvan S.
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Repository Name: Libraries, University of Southern California
Repository Location: Los Angeles, California
Repository Email: cisadmin@lib.usc.edu