THE NEURAL REPRESENTATION OF FACES
by
Xiaomin Yue
_____________________________________________________________
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(PSYCHOLOGY)
August 2007
Copyright 2007 Xiaomin Yue
ACKNOWLEDGEMENTS
First of all, I would like to thank, from the bottom of my heart, my
advisor, Professor Irving Biederman. Without his relentless intellectual
support and tremendous encouragement during all phases of my graduate
career, this dissertation would have been “mission impossible.” He is a
fabulous, brilliant scientist with a great heart. I am so lucky to be one of his
graduate students.
I would also like to thank my lovely wife, Rosemary Behizadeh Yue, for her continual support and encouragement, which so very much buoyed my spirits on this long and challenging journey.
Special thanks are due to Prof. Bosco Tjan and Prof. Zhong-Lin Lu, who taught me fMRI design and data analysis; Prof. Laura Baker, who taught me the multivariate regression that is at the core of fMRI data analysis; Prof. Bartlett Mel, who prompted me to think rigorously about a number of issues; and Prof. Christoph von der Malsburg, who allowed me to use his Face Recognition System to scale the physical similarity of the images and provided a number of useful discussions.
The research described in this dissertation owes much to my fellow graduate students: Dr. Michael C. Mangini, for writing the Matlab code to generate the blobs; Marissa Nederhouser, for training the blob experts; Shuang Wu, for helping me use von der Malsburg's face recognition system,
and Kenneth Hayworth for many invaluable and critical discussions. Thanks
are due to the Image Understanding Laboratory work-study students, including Henry Nguyen, Tiger Nguyen, Christine Nguyen, and Kyle Huynh, for running the subjects.
TABLE OF CONTENTS
Acknowledgements ii
List of Figures vi
Abstract viii
Chapter 1: A theory of face recognition 1
1.1 Behavioural differences between face and object recognition 1
Table 1: A summary of behavioural differences between face and object recognition 6
1.2 Gabor-jet model of face recognition 7
Chapter 2: Stimuli generation and testing 15
2.1 Stimuli generation 15
2.2 Gabor-jet similarity measurement of pairs of images 19
2.2.1 Participants and experimental procedures 19
2.2.2 Results 21
2.2.3 Conclusion regarding the scaling of physical similarity 22
Chapter 3: A behavioural study 23
3.1 Methods 23
3.1.1 Stimuli generation 23
3.1.2 Expertise training 26
3.1.3 Participants and experimental procedures 26
3.2 Results 29
3.2.1 Behavioural results 29
3.2.2 Ideal Observer results 30
3.3 Discussion 32
Chapter 4: An fMRI study 35
4.1 Methods 36
4.1.1 Stimuli 36
4.1.2 Expertise training 36
4.1.3 fMRI design 37
4.1.3.1 Imaging parameters 38
4.1.3.2 Region of interest (ROI) 38
4.1.3.3 fMRI data analysis 41
4.2 Results 42
4.2.1 Behavioural results 42
4.2.2 Ideal observer analysis 45
4.2.3 fMRI results 46
4.3 Discussion 51
Chapter 5: Contributions and speculations 56
5.1 Contributions 56
5.2 Some additional speculations 59
Bibliography 62
LIST OF FIGURES
Figure 1.1: Contrast negation and reversal effect of face recognition. 3
Figure 1.2: Gabor-jet model for face recognition. 9
Figure 1.3: Composite face demonstration. 11
Figure 2.1: Generation of visual stimuli. 17
Figure 2.2: Match-to-Sample paradigm. 21
Figure 2.3: Mean correct reaction times and error rates as a function of similarity between distractor and matching stimuli. 22
Figure 3.1: Generation of complementary images. 25
Figure 3.2: Illustration of Match-to-Sample displays in the complementary experiment. 28
Figure 3.3: Match-to-Sample error rates for matching identical and complementary faces and blobs for novices, experts, and the ideal observer. 31
Figure 4.1: Localizer images and BOLD responses at the right FFA and LOC. 40
Figure 4.2: Same-different error rates for matching identical and complementary faces and blobs for novices, experts and the ideal observer. 44
Figure 4.3: Hemodynamic response functions of blob experts and blob novices to blobs and faces at FFA. 47
Figure 4.4: Hemodynamic response functions for correct trials when matching faces in the right FFA. 48
Figure 4.5: Hemodynamic response functions in LOC for changes in Identity and Frequency-Orientation combinations when viewing faces. 49
Figure 4.6: Hemodynamic response functions in LOC for blob experts and novices. 50
Figure 4.7: Hemodynamic response functions in the right LOC for blob experts when matching blobs. 51
ABSTRACT
What is the nature of the representation of faces and objects that results
in the striking behavioral differences in their recognition? Biederman &
Kalocsai (1997) proposed that faces were represented in terms of their original
spatial-filter excitation values with allowance for scale and position invariance;
objects were represented as a structural description specifying arrangements
of parts and relations in terms of their edge discontinuities. Behavioural and
fMRI studies were conducted to examine this hypothesis. A comparison of the representations of faces and nonface objects requires that the two kinds of stimuli impose the same low-level processing demands. To achieve this, a set of
smooth, irregular 3D blobs were generated that differed only metrically (rather
than nonaccidentally) and without the discontinuous edges that characterize
most objects. During the fMRI experiment, subjects matched a sequence of
two filtered images, each containing the content of every other combination of
eight spatial frequency scales and eight orientations, of faces or of these
blobs, judging whether the person or blob was the same or different. On a
match trial, the images were either identical or complementary (containing the
remaining spatial frequency and orientation content). The physical similarity of
faces and blobs on non-match trials was equated according to a Gabor-jet
model of V1 similarity (Lades et al., 1993). Relative to an identical pair of
images, a complementary pair of faces, but not blobs, reduced matching
accuracy and released fMRI adaptation in the right fusiform face area (FFA).
These results held for both blob novices and experts who had been trained for
over 8,000 trials in discriminating a different set of blobs. Moreover, the
magnitude of the BOLD response to blobs in the left and right FFA was
identical for experts and novices. This outcome is contrary to claims that
extensive discrimination training in adulthood will produce face-like representations for non-face objects. Taken together, the results provide strong evidence behaviorally and neurally that faces, but not nonface objects, are represented spatially (e.g., as Gabor activation values) in the FFA and that extensive adult discrimination training cannot produce such coding for nonface objects.
CHAPTER 1: A THEORY OF FACE RECOGNITION
At a glance, we quickly, accurately, and effortlessly recognize objects and individual faces. Most advanced computer algorithms developed so far cannot match human recognition performance (Zhao et al., 2003; Hancock,
Bruce, & Burton, 1998). Some of the most challenging questions in cognitive
neuroscience center on the representations of objects and faces that allow
this extraordinary achievement. This dissertation investigates the nature of
these representations to determine whether they differ for faces and objects,
and, if so, what the differences would be.
1.1 Behavioural differences between face and object recognition.
There are a number of striking differences in the recognition of faces
and objects, even when the to-be-distinguished objects are as similar as
faces. Faces are much more affected by contrast reversal (Subramaniam &
Biederman, 1997; Galper, 1970) and orientation inversion (Yin, 1969). Faces
show “configural” effects (Leder & Bruce, 2000; Tanaka & Farah, 1993).
Differences between similar faces are extraordinarily difficult to articulate
whereas differences between similar objects tend to be readily describable in
terms of their part differences (Biederman & Kalocsai, 1997).
Contrast negation and reversal. Contrast negation is defined here as
the case when both to-be-matched stimuli are negative. Contrast reversal is
defined here as the case when one stimulus is positive and the other negative. Face images presented in negative contrast are known to be
difficult to recognize (Galper, 1970; Galper & Hochberg, 1971; Phillips, 1972,
1979; Luria & Strauss, 1978; Hayes, Morrone & Burr, 1986; Kemp, McManus
& Pigott, 1990; Johnston, Hill & Carman, 1992; Bruce & Langton, 1994;
Kemp, Pike, White, & Musselman, 1996; Liu & Chaudhuri, 1997; Gauthier,
Williams, Tarr & Tanaka, 1998; Liu & Chaudhuri, 1998), while recognition of
nonface objects is not affected by presenting the object images in negative
contrast. For instance, it is difficult to identify a person from a negative
picture of his or her face while objects remain readily recognizable. Given
that the memory of the person’s face is a positive representation, the
matching of a negative to such a case would be an example of contrast
reversal. In general, it is remarkably hard to match two faces that have
opposite (i.e., reversed) contrast during brief presentations, an effect not witnessed for non-face objects (Nederhouser, Yue, Mangini, & Biederman, 2007; Vuong, Peissig, Harrison, & Tarr, 2005).
Using sequentially presented stimuli, Subramaniam and Biederman
(1997) compared the costs of contrast reversal and negation on the same-
different matching of faces and chairs of equal average shape dissimilarity on
“different” trials. There was almost no cost of contrast reversal on matching
chairs but a massive 20% increase in error rates (chance = 50%) and an 80
msec increase in reaction times when matching faces of opposite contrast
polarity. They reported only a slight effect of negation on either class of
stimuli, suggesting that positive faces might be represented differently from
negative faces, but representation of objects in positive contrast, chairs in
this case, does not differ from those of negative contrast objects (Fig. 1.1).
Figure 1.1. Contrast negation and reversal effect of face recognition.
Stimuli and Results are adapted from Subramaniam & Biederman
(1997). Reversing the contrast of to-be-matched faces resulted in
much larger increases in RTs and error rates than chairs. Cont.
Rev. (Contrast Reversed) by the usage adopted here would be
termed Contrast Negation.
The face inversion effect. Since Yin (1969) first demonstrated that face recognition was disproportionately affected by orientation inversion compared to object recognition, the face inversion effect (FIE) has become a significant hallmark distinguishing face recognition from object recognition (Jolicoeur, 1985; Johnston, Hill, & Carman, 1992; Freire, Lee, & Symons, 2000; Leder & Bruce, 2000; Collishaw & Hole, 2000; Rock, 1974; Yarmey, 1971).
Nederhouser and her colleagues (2005) compared the orientation inversion effect for faces and objects (blobs, to be described below) by using
a match-to-sample paradigm in which three images were displayed on a
computer screen simultaneously, one on the upper part of the screen
(sample) and two on the lower part of the screen (target and distractor). All
three images could be in the same orientation or the orientation of the
sample could differ from that of the target and distractor (both of which were
always in the same orientation). Subjects were instructed to respond by
pressing a key to indicate which of the two pictures on the lower part of the
screen was identical to the image on the upper part of the screen. A
mismatch of the orientation of faces produced an increase of 25% in error
rates compared to an increase of only 6% for the blobs (chance was 50%).
This marked difference in the susceptibility of recognition of faces and
objects to inversion was witnessed under conditions where target and
distractor blobs were distinguished only by metric variations and the physical similarity of target and distractor blobs and faces was matched in terms of the Gabor-jet measure (to be described below), suggesting that faces might be
represented differently from objects.
                                  FACES            OBJECTS
SENSITIVE TO
  Contrast Polarity               Yes              No
  Illumination Direction          Yes              No
  Rotation in Depth               Yes              No, within part aspects (~60º)
  Rotation in Plane               Yes              Slightly
  Metric Variation                Yes              Slightly
  Configural Effect               Yes              No
Differences Verbalizable          No (faces        Yes
                                  are "ineffable")
Basis of Expertise                Configural       Distinctive feature
                                  representation   discovery

Table 1. A summary of behavioural differences between face and object
recognition, adapted and modified from Biederman & Kalocsai (1997).
The noticeable differences between face and object recognition are not
limited to contrast polarity and rotation in the plane. In fact, our recognition ability for faces and objects is differentially affected by a number of variables. For example, face recognition is more sensitive to illumination variation than object recognition (Hill & Bruce, 1996; Liu, Collin, Burton, & Chaudhuri, 1999). A relatively complete comparison between face and object recognition is summarized in Table 1, which indicates that faces might be
represented differently from objects in such a manner that face recognition is
more vulnerable to variables that have little or no effect on object recognition.
1.2 Gabor-jet model of face recognition.
What is the nature of the representation of a face—not shared with objects—that causes its reduced recognition with a mismatch of contrast and with inversion? Biederman and Kalocsai (1997) proposed that these differences
could be understood if the representation of faces retained aspects of the
original (V1 to V4) spatial filter representation, in a manner similar to that
proposed by C. von der Malsburg’s Gabor-jet model (Lades et al., 1993), with
translation and scale invariance (with scale expressed as cycles per stimulus
rather than cycles per degree). Retention of the spatial frequency and
orientation information allows storage of the fine metrics, pigmentation, and
the surface luminance distribution important for the individuation of similar
faces. However, such a representation is susceptible—as is human face matching performance—to variations in lighting conditions (Hill & Bruce, 1996;
Liu, et al, 1999), viewpoint (Bruce, 1982; Hill, Schyns, & Akamatsu, 1997),
and direction of contrast (Kemp, Pike, White, & Musselman, 1996).
The Gabor-jet model (Lades et al., 1993) was inspired by the biology of
the primary visual cortex. The fundamental unit of representation in the
Gabor-jet model is the Gabor wavelet as indicated in the name of this model.
The Gabor function describes a radially symmetric Gaussian window
applied to a sine (or cosine) wave (Fig. 1.2a). This function provides an
excellent approximation to the receptive field properties of V1 simple cells
(DeValois and DeValois, 1990), and reduces the redundancy of visual
information while keeping the representation as sparse as possible (Olshausen & Field, 1996). A number of such wavelet filters differing in orientation and scale (8 orientations and 5 scales in the model) (Fig. 1.2b), with receptive fields centered at the same location of an input image, is termed a Gabor “jet” and is analogous to a V1 hypercolumn (Fig. 1.2c). In
the model, an input image is filtered by the Gabor-jet, and then the activation
pattern of Gabor coefficients in the selected position is stored in the gallery
for later comparison (Fig. 1.2c). Similarity of two images is taken to be the
sum of the correlation of the magnitudes of activation values in
corresponding jets (Gabor coefficients) for the 80 corresponding kernels.
The correlation for each pair of jets is the cosine of the angular difference
between the vectors of the kernels in a 40-dimensional space (40-
dimensional ‘power spectrum vector’). The correlations over jets are
averaged to get a total similarity score, expressed as a proportion of the
maximum score.
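The similarity computation just described can be sketched in Python. This is an illustrative reconstruction, not the Lades et al. (1993) implementation: the kernel window size, bandwidths, and scale spacing below are assumptions, and only kernel magnitudes (the 40-dimensional vector per jet) are compared.

```python
import numpy as np

def gabor_kernel(size, freq, theta, sigma):
    """A Gabor kernel: a Gaussian window applied to a complex sinusoid.

    The complex exponential combines the sine and cosine kernels into one
    equation, as in the Gabor-jet model; taking the magnitude of the complex
    response yields the activation values compared below.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    gauss = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    carrier = np.exp(1j * 2 * np.pi * freq * (x * np.cos(theta) + y * np.sin(theta)))
    return gauss * carrier

def jet(image, cy, cx, n_orient=8, n_scale=5, size=31):
    """A 'jet' of kernels (8 orientations x 5 scales) centered at (cy, cx),
    returned as a 40-dimensional vector of response magnitudes."""
    patch = image[cy - size // 2: cy + size // 2 + 1,
                  cx - size // 2: cx + size // 2 + 1]
    mags = []
    for s in range(n_scale):
        freq = 0.25 / (2 ** s)     # assumed one-octave scale spacing
        sigma = 2.0 * (2 ** s)     # assumed bandwidth growing with scale
        for o in range(n_orient):
            theta = o * np.pi / n_orient
            k = gabor_kernel(size, freq, theta, sigma)
            mags.append(abs(np.sum(patch * np.conj(k))))
    return np.array(mags)

def jet_similarity(j1, j2):
    """Cosine of the angle between two jets' 40-dimensional magnitude vectors."""
    return float(j1 @ j2 / (np.linalg.norm(j1) * np.linalg.norm(j2)))

def image_similarity(im1, im2, grid_points):
    """Average jet similarity over corresponding grid positions, giving a
    total score expressed as a proportion of the maximum (1.0 = identity)."""
    sims = [jet_similarity(jet(im1, y, x), jet(im2, y, x)) for y, x in grid_points]
    return float(np.mean(sims))
```

An identical pair of images scores 1.0 by construction; dissimilar images score lower, which is the property exploited when scaling the blob and face stimuli in Chapter 2.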
Figure 1.2. Gabor-jet model for face recognition. a) a single Gabor kernel. b) a Gabor jet in the Fourier domain with 8 orientations and 5 scales that scale (lowest to highest) outward from the center. For efficiency, the sine and cosine kernels are combined into a single equation so that the kernels in the Fourier domain are asymmetric. c) a face and the positions of the centers of the Gabor jets. d, e, f) Gabor coefficients of the face at different scales, 32 cycles/stimulus (d), 16 cycles/stimulus (e), 8 cycles/stimulus (f), and orientations, 0 degrees = vertical (d), 90 degrees = horizontal (e), and 135 degrees (f). The contrast indicates the magnitude of the Gabor coefficients.
Unlike the Gabor-jet computer-based system, which essentially
converts simple cell activations to complex cell activations, as an aid to
reduce the effect of small image variations on matching, our application to
biological vision retains the original simple cell activations, thus rendering the
representation sensitive to variations in the direction of contrast. Although
nonface objects (as well as any visual stimulus) are similarly encoded in
the early stages of visual processing, the ultimate representation would not
be defined by the initial filter values, but “moderately complex features”
(Kobatake & Tanaka, 1994) often corresponding to simple parts largely
defined by orientation and depth discontinuities (Biederman, 1987; Kobatake
& Tanaka, 1994; Grill-Spector, Kourtzi, & Kanwisher, 2001), allowing
invariance to lighting conditions (Vogels & Biederman, 2002), viewpoint
(Biederman & Bar, 1999), and direction of contrast (Subramaniam & Biederman, 1997). These features might be at a small scale when
distinguishing among highly similar members of a subordinate-level class
(Biederman, et al., 1999).
Configural effects. Faces are said to be represented “configurally” or
“holistically”, which means that faces are recognized as a whole instead of an
assembly of different features (e.g. eyes, nose, and mouth). The most
convincing demonstration of a configural effect has been provided by Young, Hellawell, & Hay (1987), which they called the ‘composite face effect’ (Fig. 1.3). They combined the top half of one face with the bottom half of another.
When correctly aligned, it was hard to recognize the individual identities of
the two halves; when misaligned, identification was much easier. Biederman
and Kalocsai (1997) suggested that the misalignment provided a parsing cue
that allowed the separation of the composites, and the shift cued which jets
could be turned off to make the partial comparison possible. This
phenomenon demonstrates that when upright faces are processed, the
internal features are so strongly tied together that it is almost impossible to
parse the face into isolated features for feature-by-feature comparisons
(Hole, 1994). Although the configural effect makes it difficult to express
verbally the differences between two similar faces, it could facilitate
recognition of facial features in context. Tanaka & Farah (1993)
demonstrated that subjects are more accurate in recognizing the identity of a facial feature when it is presented in the context of a whole face rather than as an isolated feature.
Figure 1.3. Composite face demonstration. The faces of the two individuals
comprising the left composite (aligned) are more difficult to discern than those in the composite in the right panel (misaligned). Perceptually, we treat the aligned composite face as a new face. This image is adapted and
modified from Schwaninger, Carbon, & Leder (2003).
Perhaps the only neurocomputational account of the face configural
effect is offered by the Gabor-jet model (Lades et al., 1993). Biederman and
Kalocsai (1997) argued that the coverage of the face by Gabor kernels,
especially those with larger receptive fields, would produce configural effects
insofar as variations in contrast in any one region of the face would affect
activation of kernels whose receptive fields were not centered at that region
and that, conversely, any one kernel would be affected by contrast from all
regions of the face within its receptive field. Such activation would be
extremely difficult to articulate, thus contributing to the ineffability of
describing the differences between similar faces, a difficulty rarely witnessed
when people discriminate highly similar nonface objects (Biederman &
Kalocsai, 1997; Mangini & Biederman, 2004).
Expertise. Some researchers have suggested that the mechanisms (and the representations) underlying face and object recognition are the same, differing only in within-category expertise (Gauthier & Tarr, 1997). They have
argued that experts who have achieved their expertise as adults distinguish
objects in the domain of their expertise in the same way that faces are
distinguished, viz., configurally, and that expertise for nonface objects is
expressed at the same neural locus as faces (Tarr & Gauthier, 2000;
Riesenhuber & Poggio, 2000).
This expertise hypothesis in face and object recognition can be tested
using behavioural paradigms. Face recognition uniquely suffers significantly
when a face is negated or inverted compared to an object. If face recognition
is not special, but just a reflection of expertise with any object class, then
recognizing other object categories in which people have gained expertise
should show significantly greater costs when the objects are inverted or negated for experts than for novices. Gauthier and her colleagues (1997, 1998)
trained subjects over 3,000 trials to become experts at distinguishing a family of
“Greebles”, artificial, symmetrical, 3D objects with human-like configurations
with ears, a nose, and an erect penis. Greeble experts then were tested in
an array of experiments of matching Greebles in reversed contrast, inverted
orientation, and composite Greebles (Gauthier, Williams, Tarr, & Tanaka,
1998). Consistent with the expertise hypothesis, the Greeble experts performed “significantly” worse than the Greeble novices in all of those conditions.
However, attempts to replicate Gauthier et al.’s findings using a variety
of different paradigms have been unsuccessful (Nederhouser, Mangini, &
Biederman, 2002, 2005; Nederhouser, Yue, Mangini, & Biederman, 2007;
Robbins & McKone, 2007; McKone, Kanwisher, & Duchaine, 2007). More
recently, McKone & Robbins (2007) published a paper re-examining Gauthier and her colleagues’ data and found no significant cost for experts relative to novices in contrast reversal, orientation inversion, or composite matching, casting further doubt on the expertise hypothesis.
Even so, the question still remains whether the unique effects
associated with face but not object recognition might be due to the object stimuli not matching the face stimuli in low-level information. In particular, many of those stimuli, such as Greebles, have nonaccidental features
present that can easily be employed to distinguish one exemplar from another. Faces—at least those employed in the experiments under examination—do not differ in such nonaccidental features. It is possible that some
of the small costs of rotation or contrast reversal with Greebles and other
object stimuli could be a consequence of the employment of such features.
In the following chapters, stimuli will be created to address this issue, and
experiments will be presented to examine the spatial-filter hypothesis of the representation of faces and the expertise hypothesis using those stimuli.
CHAPTER 2: STIMULUS GENERATION AND TESTING
One difficulty in comparing face to object recognition is manufacturing nonface subordinate-level stimuli that capture the variations of faces without the discontinuous edges and nonaccidental properties that readily distinguish most objects. The goal of this chapter is to demonstrate how the experimental stimuli were created and how well the human discriminability of a pair of such stimuli correlates with a measure of physical similarity, that offered by Gabor-jet filtering.
2.1 Stimuli generation
The stimulus generation was inspired by Shepard and Cermak’s (1973)
production of a set of complex, asymmetric, 2D novel shapes. Michael
Mangini created 3D artificial objects of that kind, which differed from each other only metrically. These stimuli were produced in three steps.
The first step was to add a given orientation (one of eight) of the 2nd and 3rd harmonics, 3D shapes with either two or six equally spaced convex lobes respectively, to a sphere and the 4th harmonic of a sphere using Matlab (Fig. 2.1a).
Second, by rotating the 2nd and 3rd harmonics, a toroidal space of smooth, asymmetric 3D blobs was created (Fig. 2.1b). The space is toroidal, as was Shepard and Cermak's, in that it curves around on itself so that the
top and bottom rows are identical and the left and right columns are identical
so that there are no boundaries delineating the edges of the space. This
characteristic avoids the problem that stimuli at the edges of a non-toroidal
space produce enhanced discriminabilities with the absence of near
neighbours on one side.
Third, from this toroidal space four equally spaced blobs (circled in Fig. 2.1b) with maximal diagonal distances for a given pair were selected to form the seeds of the experimental stimuli, in which the amplitudes of the 2nd and 3rd harmonics were varied along eight different values (Fig. 2.1c), so that the stimuli mimicked the way in which faces systematically vary in the magnitude of their protuberances within the same general shape, i.e., all have cheekbones, a nose, etc., in approximately fixed positions and orientations.
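The construction can be sketched as follows. The sketch uses simple sectoral harmonics as stand-ins; the exact harmonic components of Mangini's Matlab code are not specified in the text, so the lobe counts, amplitudes, and the 4th-harmonic term here are illustrative assumptions.

```python
import numpy as np

def blob_radii(theta, phi, amp2, amp3, orient2, orient3, amp4=0.1):
    """Radial distance of the blob surface at spherical angles (theta, phi).

    Perturbs a unit sphere with illustrative 2nd-, 3rd-, and 4th-harmonic
    terms; each harmonic has an orientation (phase) and an amplitude, the
    two parameters varied to build the blob spaces.
    """
    r = np.ones_like(theta)                                         # base sphere
    r = r + amp2 * np.cos(2 * (phi - orient2)) * np.sin(theta) ** 2  # 2nd harmonic
    r = r + amp3 * np.cos(3 * (phi - orient3)) * np.sin(theta) ** 3  # 3rd harmonic
    r = r + amp4 * np.cos(4 * phi) * np.sin(theta) ** 4              # 4th harmonic
    return r

def blob_mesh(amp2, amp3, orient2, orient3, n=64):
    """Cartesian surface mesh of one blob, suitable for 3D rendering."""
    theta, phi = np.meshgrid(np.linspace(0, np.pi, n),
                             np.linspace(0, 2 * np.pi, n), indexing="ij")
    r = blob_radii(theta, phi, amp2, amp3, orient2, orient3)
    x = r * np.sin(theta) * np.cos(phi)
    y = r * np.sin(theta) * np.sin(phi)
    z = r * np.cos(theta)
    return x, y, z

# Stepping each harmonic's orientation through eight values that wrap around
# the circle yields a boundary-free (toroidal) family, as described above.
orientations = np.linspace(0, 2 * np.pi, 8, endpoint=False)
```

Because the orientation parameters are periodic, shifting either orientation by a full cycle reproduces the same blob, which is what makes the stimulus space toroidal rather than bounded.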
Figure 2.1. Generation of visual stimuli. a) Blobs were generated by adding the 2nd and 3rd harmonics of a sphere in one of eight different orientations to a sphere and the 4th harmonic. b) Blob space produced by combining different orientations of the 2nd and 3rd harmonics, as specified by the orientations shown above and to the left of the blob space. Proximate blobs were highly similar in shape and those distant were less similar, as confirmed by a Gabor-jet similarity measure. The four circled blobs were the seeds used to generate the blob spaces, defined by variation in the sizes of the 2nd and 3rd harmonics. c) A blob space generated by holding constant the orientation of the harmonics but only varying their size, as shown above and to the left of the blob space. The illustrated space is generated from the upper left circled blob in Fig. 2.1b. The variations in sizes of the harmonics are taken to somewhat mimic the variation in the sizes and distances of facial parts within a given configuration as specified by the orientations of the harmonics. Numbers along arrowed lines pointing to pairs of blobs show the Gabor-jet similarity values for those pairs.
2.2 Gabor-jet similarity measurement of pairs of images
Are the metric variations scaled by the Gabor-jet measure
psychologically valid? Are larger differences in Gabor-jet similarity more
readily discriminated than smaller distances and, if so, to what extent?
Gabor-jet similarity (Lades et al., 1993) can provide a principled scaling
of image similarity because it is designed to mimic V1 filtering, the first step
of cortical visual processing. Presumably, the differences in the processing of faces and objects are a consequence of later stages, so the V1 scaling can provide a picture of how stages after V1 (and, likely, after V2 and V4 as well) distinguish faces from objects.
The following experiment was designed to test how well the Gabor-jet
similarity measurement of images is correlated with human performance in discriminating blobs that mimic the variation of faces without introducing the nonaccidental properties that distinguish most object categories.
2.2.1 Participants and Experimental Procedures
Five undergraduate and graduate students with normal or corrected-to-
normal vision from the University of Southern California participated in the
experiment for monetary compensation (Mean age = 21.20, SD = 1.64, two
females, three males, all right-handed). The experiment was approved by the USC IRB at the University Park campus.
Sixty-four blobs were used in the experiment, yielding 2,016 pairs of images. On each trial of a forced-choice, match-to-sample task, three stimuli were presented simultaneously for 2000 ms (Fig. 2.2). The sample stimulus was
above the two possible matches. The relatively long presentation duration
was chosen to provide the subjects sufficient time to process all three
images. The diagonal position of the sample to each of the matching stimuli
undoubtedly rendered the task more difficult than if the stimuli were arranged
vertically or horizontally. Subjects were instructed to press the left or right
arrow key on the keyboard to indicate which (left or right) image matched the sample (the one on top) (Fig. 2.2), but only after the display was terminated at 2000 ms. Error feedback was provided by a tone immediately following
an incorrect response.
Figure 2.2. Match-to-Sample paradigm. When the distractor stimulus is less similar to the matching stimulus (left panel), as measured by Gabor-jet similarity (= 85.2), the task is easy to perform. When the two are more similar (right panel, Gabor-jet similarity = 98.1), matching is more difficult. In both
panels, the left images are identical to the top images (so that the correct
response should be left). Only the distractor images differ between the two
panels.
2.2.2 Results
When distractor stimuli were more similar to target stimuli in terms of
Gabor-jet similarity, subjects made more errors and responded much more slowly than when distractor stimuli were less similar, as demonstrated in Fig. 2.2.
Fig. 2.3 shows that Gabor-jet similarity between matching and distractor
stimuli is positively correlated with reaction times (r = 0.7670) and error rates
(r = 0.8538). These correlations will undoubtedly increase as more subjects
are run so that the RTs and error rates for a particular level of Gabor
similarity become more reliable.
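The binning and correlation behind these values can be sketched as follows. This is an illustrative reconstruction, not the original analysis code: the function name and the equal-trial-count binning rule are assumptions based on the description of the eight intervals in the figure caption.

```python
import numpy as np

def binned_correlation(similarity, rt, n_bins=8):
    """Bin trial pairs into n_bins similarity intervals holding roughly
    equal numbers of trials, then correlate the bin mean similarity with
    the bin mean of a performance measure (RT or error rate)."""
    order = np.argsort(similarity)
    bins = np.array_split(order, n_bins)            # ~1/8 of pairs per interval
    mean_sim = np.array([similarity[b].mean() for b in bins])
    mean_rt = np.array([rt[b].mean() for b in bins])
    r = np.corrcoef(mean_sim, mean_rt)[0, 1]        # Pearson r over the bins
    return mean_sim, mean_rt, r
```

Running the same function on error rates instead of RTs yields the second correlation; with more subjects, the bin means stabilize and the correlations should rise, as noted above.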
Figure 2.3. Mean correct reaction times (left) and error rates (right) as a
function of similarity between distractor and matching stimuli (with 100 =
identity). The positive correlation indicates that the more similar the
distractor and matching stimuli, the slower the response and the greater the
error rate. The error bars are the means of the standard errors of the
individual subjects and thus do not include the between-subjects variance.
The eight intervals were selected so that approximately 1/8 of the trials (~244
target-distractor pairs) fell in each of the intervals.
2.2.3 Conclusion regarding the scaling of physical similarity.
In the absence of salient nonaccidental features (e.g., differences in
parts or whether a particular contour is straight vs. curved), a model based
on V1 computations does remarkably well in scaling the psychophysical
similarity of metrically-varying, complex, novel shapes. The model is thus justified as a basis for equating the image similarities between pairs of faces and objects so that the processing of these different classes of stimuli can be compared.
CHAPTER 3: A BEHAVIOURAL STUDY
The spatial-filter hypothesis of face representation holds that faces are
represented by the original spatial-filter activation values so that the fine metric and surface information critical for individualizing faces is preserved. Objects, in
contrast, are represented by a non-accidental characterization of parts and
relations activated by orientation and depth discontinuities. The present
experiment tested this theory by creating complementary pairs of images
(details described below) of faces and objects in the Fourier domain. These
images were filtered into 8 scales and 8 orientations with each member of a
complementary pair having every other scale-orientation combination. In a
match-to-sample task, the target could either be identical to the sample or it
could be the sample’s complement. The distractor was a face of a different
person or a different object (a blob). The prediction was that, compared to
matching targets identical to the sample, matching complements should be
more disruptive for faces than for objects.
3.1 Methods
3.1.1 Stimulus generation
Eight-bit grey-level images of faces (ten male faces without hair and
ears) and blobs (ten blobs) were Fourier transformed and filtered by two
complementary filters (Fig. 3.1). Both filters cut off the highest (above 181
cycles/image) and lowest (below 12 cycles/image, about 8 cycles/face)
spatial frequencies. The remaining part of the Fourier domain was
divided into 64 areas, 8 orientations by 8 spatial frequencies. The orientation
borders of the Fourier spectrum were set up in successive 22.5º steps. The
spatial frequency range covered four octaves in steps of 0.5 octaves. Each
member of a complementary pair was comprised of the content of every
other region of the radial checkerboard patterns based on these divisions in
the Fourier domain and shared no common combinations of spatial
frequency and orientation. For example in Fig. 3.1, the upper member of a
complementary pair (either the face or the blob) included the content
centered at the vertical orientation (0º±22.5º) at the highest spatial
frequency (most peripheral in the radial checkerboard) but not the adjacent
orientations (315º±22.5º and 45º±22.5º) at that frequency. Content at the
next lower scale was contained at those orientations in that image. Thus all
images had content from all scales and all orientations but differed in the
specific combinations of orientations and scales. The average similarity of
the pairs of faces, as assessed by the Gabor-jet model, was equal to that of
the blobs.
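The radial-checkerboard partition described above can be sketched as follows. This is an illustration under simplifying assumptions (square images, orientation bins of 22.5º over the half-plane, four octaves of spatial frequency, and the unpaired Nyquist row/column of an even-sized FFT dropped), not the exact stimulus-generation code:

```python
import numpy as np

def complementary_pair(image, n_orient=8, n_scale=8, f_lo=12.0, f_hi=181.0):
    """Split an image's Fourier content into two complementary members.

    The annulus between f_lo and f_hi cycles/image is divided into
    n_orient x n_scale regions; alternating regions -- a radial
    checkerboard -- go to one member of the pair, the rest to the other.
    """
    h, w = image.shape
    F = np.fft.fftshift(np.fft.fft2(image))
    FX, FY = np.meshgrid(np.arange(w) - w // 2, np.arange(h) - h // 2)
    radius = np.hypot(FX, FY)                      # cycles/image

    # Canonical orientation in [0, pi): f and -f map to the same angle,
    # so each mask is conjugate-symmetric and the members stay real.
    sgn = np.where((FY < 0) | ((FY == 0) & (FX < 0)), -1, 1)
    angle = np.arctan2(FY * sgn, FX * sgn)

    octave = np.log2(np.maximum(radius, 1e-9) / f_lo)  # 0..4 inside the band
    scale_idx = np.clip((octave / 4.0 * n_scale).astype(int), 0, n_scale - 1)
    orient_idx = np.clip((angle / np.pi * n_orient).astype(int), 0, n_orient - 1)

    # Band-pass; drop the unpaired Nyquist row/column of even-sized FFTs.
    band = ((radius >= f_lo) & (radius <= f_hi)
            & (FX != -(w // 2)) & (FY != -(h // 2)))
    checker = ((scale_idx + orient_idx) % 2 == 0) & band
    member_a = np.fft.ifft2(np.fft.ifftshift(F * checker)).real
    member_b = np.fft.ifft2(np.fft.ifftshift(F * (band & ~checker))).real
    return member_a, member_b
```

Because the two masks partition the band, the members share no combination of spatial frequency and orientation, yet each contains content at every scale and every orientation.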
Figure 3.1. Generation of complementary images. Complementary images
were created by filtering an image in the Fourier domain into 8 orientations
by 8 spatial frequencies. The content of 32 alternating frequency-orientation
combinations, as illustrated by the circular checkerboards, was assigned to
one image of a complementary pair and the remaining content to the other
member of that pair. Each member of a complementary pair thus had all 8
orientations and all 8 frequencies but in different combinations. The Fourier-
domain images were then converted to images in the spatial domain by
inverse FFT. a) An example with a face. b) An example with a blob.
3.1.2 Expertise training
Seven subjects, two male and five female, all right-handed (mean age =
22.14 years, SD = 2.79), were trained on discriminating blobs for 8,192 trials over
eight hour-long sessions on one of four blob configurations (Fig. 2.1b) to
make them “blob experts.” They demonstrated superior performance with a
significantly lower error rate than novice subjects on their last training block.
They were subsequently tested on a different set of blobs (i.e., with a
configuration from one of the other circled blobs in Fig. 2.1b) which were
identical to those presented to novices. If expertise is to be relevant to face
recognition, then the training to expertise should transfer to new instances of
a class. This did occur. The superior performance of the experts on the last
training block resulted in a significantly lower error rate than novice subjects
when transferred, on their ninth session, to a new blob configuration (which
matched that of the novices). This demonstration that the expertise was not
to a specific set of blobs but to blobs in general is important in that
proponents of an expertise account of face recognition wish to address
phenomena, such as the effects of inversion, contrast reversal, and
configuration, that are readily demonstrated with unfamiliar faces.
3.1.3 Participants and Experimental Procedures
Twenty-one USC undergraduate and graduate students, “blob novices,”
with normal or corrected-to-normal vision participated in the experiment for
course credit or money (mean age = 20.72 years, SD = 2.94; seven males,
fourteen females; two left-handed).
Subjects performed a two-alternative, simultaneous match-to-sample
forced-choice task with 1,440 trials. The match-to-sample task eliminates the
criterion-bias effects apparent with same-different tasks, and the
simultaneous presentation minimizes memory requirements compared to
when the stimuli are presented sequentially. On each trial, three images
(sample, distractor, and target) of either faces or blobs were briefly presented
on the screen for 294 ms (25 frames on an 85 Hz CRT monitor) within a
horizontal visual angle of 9º. The individual faces subtended a horizontal
visual angle of 3º. Subjects pressed the left or right arrow key to indicate
which (left or right) image matched the sample (the one on top) (Fig. 3.2).
On half the trials the target was on the left (so the correct response was
left); on the other half it was on the right. A tone was played if the
response was correct. No sound was played after an error. On the identical
trials, the matching stimulus was the one identical to the sample; on
complementary trials, the matching stimulus was a complement to the
sample (Fig. 3.2). Prior to the main experiment, subjects were given 20
practice trials with blobs and faces that were not used in the main
experiment. All conditions were fully balanced, and the physical similarity of
matching and distractor faces and blobs was equated according to the
Gabor-jet model.
Figure 3.2. Illustration of match-to-sample displays in the complementary
experiment. a) Examples of the “identical” condition, where the matching
stimulus was an image identical to the sample person (top) or blob
(bottom). b) Examples of the “complementary” condition, where the image
matching the sample was a complement of the image of the same person
(top) or blob (bottom) presented as the sample. In all four panels, the
matching stimulus is on the left for both the identical and the
complementary conditions. Note that whereas the matching complementary
blob looks identical to the sample, the matching face looks like a different
person.
3.2 Results
3.2.1 Behavioural Results
Fig. 3.3 shows the error rates on the match-to-sample tasks for Novices
(a) and Experts (b), respectively. The expertise training had its expected
effect, with the experts showing a significant advantage over novices when
matching blobs: t(26) = -2.67, p < 0.02 (for blob experts: M = 0.24; for blob
novices: M = 0.33). There was no significant difference between blob
experts and novices in reaction times (t(26) = 1.18, ns; for blob experts: M =
472 ms; for blob novices: M = 504 ms), indicating that the higher error rates
for blob novices were not due to a speed-accuracy trade-off.
Combining the performance of novices and experts, matching
Complements of faces produced significantly higher error rates than
matching Identical face images: t(27) = -10.16, p < 0.001. This cost of
matching complements was absent for blobs, where there was no significant
difference in the error rates for Identical and Complementary matches: t(27)
= -1.36, ns. These results held for both blob experts and novices (for experts,
t(6) = 0.47, ns; for novices, t(20) = -1.73, p = 0.10). The modest difference
between the Identical and Complementary conditions for the novices and experts
on the blob-matching task was in a direction opposite to what would have
been expected if blob expertise produced performance that resembled that
for the matching of faces. This interaction between Complementation and
Expertise for the blobs was significant: F(1, 25) = 7.12, p = 0.013.
3.2.2 Ideal Observer Results
It is possible that the greater cost of matching complements of faces
compared to blobs was a result of intrinsic differences between these classes
of stimuli. To assess whether this was the case, an ideal observer model
(Tjan, Braje, Legge, & Kersten, 1995; Pelli, Farell, & Moore, 2003; Gold,
Bennett, & Sekuler, 1999) was defined, which was assumed to have perfect
pixel knowledge of all 20 images. White noise was added to the images to
limit performance (Pelli, 1990). The ideal observer sums the likelihoods of all
possible combinations of images corresponding to the “same” response and
compares that sum against the summed likelihood for the “different” response.
There was no difference in the ideal observer’s performance between faces
and blobs on the identical and complementary conditions (Fig. 3.3c),
suggesting that the results with the human observers were not a
consequence of inherent differences in the stimuli that rendered the matching
of complements of faces more susceptible to error than the matching of
complements of blobs.
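A minimal sketch of such an ideal observer for a same-different decision, assuming known pixel templates and i.i.d. white Gaussian pixel noise. The template count, noise level, and flat priors below are illustrative, not the parameters actually used:

```python
import numpy as np

def logsumexp(a):
    """Numerically stable log of a sum of exponentials."""
    a = np.asarray(a, dtype=float)
    m = a.max()
    return float(m + np.log(np.sum(np.exp(a - m))))

def log_gauss(noisy, template, sigma):
    """Log-likelihood of a noisy image given a noise-free template,
    under i.i.d. white Gaussian pixel noise of SD sigma."""
    return -np.sum((noisy - template) ** 2) / (2 * sigma ** 2)

def ideal_same_different(img1, img2, templates, sigma):
    """Respond 'same' or 'different' by summing likelihoods over every
    pair of templates consistent with each response (flat priors)."""
    n = len(templates)
    ll1 = np.array([log_gauss(img1, t, sigma) for t in templates])
    ll2 = np.array([log_gauss(img2, t, sigma) for t in templates])
    same = logsumexp(ll1 + ll2)                      # identical identities
    diff = logsumexp([ll1[i] + ll2[j]
                      for i in range(n) for j in range(n) if i != j])
    return "same" if same > diff else "different"
```

Performance is then limited only by the added noise, so any face-blob asymmetry the ideal observer shows would reflect the stimuli themselves rather than the observer.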
a
b
Figure 3.3. Match-to-Sample error rates for matching Identical and
Complementary faces and blobs for a) novices, b) experts, and c) the Ideal
Observer. Error bars are standard error of the mean.
c
Figure 3.3: Continued
3.3 Discussion
The major findings of the current research are consistent with the
Biederman and Kalocsai findings (1997) that the matching of nonface objects
-- chairs in their study, blobs in ours -- is not disrupted when the detailed
spatial content is changed. Because the blobs differed only in the
magnitudes of their smoothly curved surfaces -- like faces -- we can eliminate
nonaccidental properties of edges and small parts as accounting for the
Biederman and Kalocsai results.
The inclusion of all frequencies in each member of a complementary
pair was an important feature of the complementary images in that face
recognition has been shown to be more sensitive to the middle range of
frequencies, approximately 8-13 cycles per face, than higher or lower
frequencies (Costen, Parker, & Craw, 1994; Costen, Parker, & Craw, 1996;
Nasanen, 1999). Although both members of a complementary pair had all
spatial frequencies, it is possible that particular combinations of orientation
and frequency are critical for face recognition and the removal of such a
combination from one member of a complementary pair disrupted
recognition, and thus priming, more than did removal from the other member. But
there was no evidence for this. In the behavioural task we compared
performance on all identical trials composed of one combination of
frequencies and orientations to the trials with the other combination of
frequencies and orientations. That is, as shown in Fig. 3.2a, we compared
performance with matching the images of faces in the upper circular
checkerboard to the images of faces in the lower circular checkerboard.
There was absolutely no difference in performance with the two sets of
images, with the mean error rates for the two checkerboards both equal to
32%.
As it has been shown that pigmentation might play an important role in
face recognition (Bruce & Langton, 1994), a question remains whether the
artifactual grid patterns in our complemented images, produced by the sharp
cutoff filters, disrupted the pigmentation of the faces more than that of the blobs,
which have no pigmentation. Such a disruption of pigmentation could be an
alternative explanation of our results. Several studies are relevant to this
issue. First, Collin et al. (2004) showed that, with smooth filters that
produce no grids on the images, subjects’ performance was still affected
more by complementation in face recognition than in object recognition in a
sequential same-different task. Second, Nederhouser, Yue, Mangini, &
Biederman (2007) reported that pigmented blobs did not show any of the costs
of mismatches in contrast polarity that are so evident in face recognition.
No blob expert showed a performance deficit in the complementary
condition on blob trials, suggesting that whatever the basis of expertise for
non-face discrimination, it does not produce the sensitivity to spatial
information that is evident when discriminating faces.
In conclusion, this study is consistent with the hypothesis that a
representation preserving spatial values may be mediating the recognition of
faces but not objects.
CHAPTER 4. AN FMRI STUDY
Acquired prosopagnosics are patients who have lost the ability to identify a
person from his or her face following unilateral or bilateral brain damage (Damasio,
Damasio, & Van Hoesen, 1992). That damage is mostly located in the
fusiform gyrus. Using fMRI, Kanwisher, McDermott, & Chun (1997)
convincingly showed in normal subjects that the bilateral middle fusiform gyrus
responds much more strongly to faces than to other object categories. They
termed the region the fusiform face area (FFA). There is now considerable
evidence that the FFA is involved in face recognition (Kanwisher & Yovel,
2006; McKone, Kanwisher, & Duchaine, 2007).
In this chapter, I describe an fMRI test of the neural basis for the spatial
filter hypothesis of the representation of faces and objects. While being
scanned, blob experts and novices performed a same-different matching task
of images of blobs and faces, responding same (by response button) to both
Identical and Complementary images of the same faces or blobs. The
sequential presentation allowed assessment of fMRI adaptation (Grill-
Spector & Malach, 2001) and the extent to which a change in identity and/or
the spatial content of the blob or face could produce a release from
adaptation. The inclusion of a condition in which the same person’s face is
shown but with complementary spatial frequency and orientation content is
important in determining the extent to which FFA is individuating faces as
opposed to detecting faces. Prior experiments (e.g., Henson, Shallice, &
Dolan, 2000) showing release from adaptation by presenting pictures of two
different people compared to the identical image of the same person,
confound a change of person with a change in low-level image properties
(Grill-Spector, Knouf and Kanwisher, 2004). The subjects were also
administered localizer runs to define FFA and the lateral occipital complex
(LOC), an area critical for object recognition (Fig. 4.1c).
4.1 Methods
4.1.1 Stimuli
The stimuli were the same as those used in the experiment described in
Chapter 3.
4.1.2 Expertise training
Six subjects (M = 26.3 years, SD = 6.9), three females and three males,
one left handed, were trained on discriminating a set of 64 blobs varying in
the size of their harmonics for 8,192 trials over eight hour-long sessions on
one of four blob configurations (circled in Fig. 2.1b) to make them “blob
experts.” Two out of six blob experts participated in the previous behavioural
experiment. They were subsequently tested on a different set of blobs (i.e.,
with a configuration from one of the other circled blobs in Fig. 2.1b) which
were identical to those presented to novices.
4.1.3 fMRI design
The main experiment used a rapid event-related (Burock et al., 1998),
2 X 2 design (same vs. different person/blob and same vs. different
frequency-orientation combinations). A total of 1,512 trials were run in 6
blocks, comprising 3 blocks for faces and 3 for blobs. The sequences of
blob and face blocks were counter-balanced across subjects. Subjects
performed a same-different matching task on a sequence of two stimuli
(faces or blobs), each presented for 200 ms with a 300 ms ISI, at the
beginning of a 2-s trial on a screen viewable from within the bore of the
magnet. In each trial, the images were presented at the center of the screen
but at different sizes (4º for the large size and 2º for the small size) to reduce
the possibility of using local features to perform the task. The order of the
sizes of the presented images was randomized within each trial. The
subjects were instructed to judge, by button press, whether the two images
were of the same person or blob, regardless of the differences in size and
spatial frequency content of the images. The subjects were the six blob
experts, described previously, and six blob novices (M = 27.7 years, SD =
3.08, four females, and two males, one left handed). An additional two blob
novices were run but excluded from the data analysis: one because of
excessive head movement, the other because the face localizer failed to
reveal an FFA (i.e., there was no region with stronger activation to faces than
objects). The study was approved by the USC Institutional Review Board. All
subjects provided informed consent.
4.1.3.1 Imaging parameters
Imaging was conducted in a 3T Siemens MAGNETOM Trio at
the University of Southern California’s Dana and David Dornsife Cognitive
Neuroscience Imaging Center. Functional scans used a gradient-echo EPI
PACE (prospective acquisition correction) sequence with 3D k-space motion
correction (TR = 1 s; TE = 30 ms; flip angle = 65º; 14 slices; 64 X 64 matrix
size with 3 X 3 X 3 mm resolution). The anatomical T1-weighted structural
scans used an MPRAGE sequence (TI = 1100 ms; TR = 2070 ms; TE = 4.1 ms;
flip angle = 12º; 192 sagittal slices; 256 X 256 matrix size; 1 X 1 X 1 mm voxels).
4.1.3.2 Region of Interest (ROI)
Two localizer runs were administered, one in the beginning and the
other at the end of the face matching trials. Each localizer run included five
conditions (Fig. 4.1a): intact faces, objects, blobs, scrambled faces and
textures. The faces were frontal views with neutral expressions, randomly
selected from the University of Stirling face database
(http://pics.psych.stir.ac.uk) with equal numbers of male and female faces.
The scrambled faces were created from the intact faces in such a way that
the individual features were kept intact, altering only the relations among the
features. The objects were grey-scale images photographed in our
laboratory, with the background excluded by Photoshop (Adobe Systems
Inc., San Jose, US). Textures were created by scrambling intact blobs with 8
X 8 patches of pixels so that no discernible blob structure was apparent.
Each 18-s block of a localizer was composed of 36 images, each shown for
500 ms. The five conditions were randomized in the first and last localizer
run. A region of interest (ROI) for face activation (FFA) (Fig. 4.1b) was
defined by a conjunction of two contrasts -- faces vs. scrambled faces and
faces vs. objects -- at uncorrected p < 10⁻⁴ (Fig. 4.3a). LOC (Fig. 4.1c) was
defined by a conjunction of two contrasts -- objects vs. textures and blobs vs.
textures -- at uncorrected p < 10⁻⁶.
Although blobs showed lower activation than the common objects in both
FFA and LOC (Figs. 4.1b and 4.1c), there was no region with a minimum
cluster size of 10 voxels showing statistically significant differential activation
for blobs vs. objects, even at a low threshold.
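The conjunction-based ROI definition can be sketched as follows, assuming voxelwise t-statistic maps for the two contrasts and the 10-voxel minimum cluster size mentioned above. A real pipeline (e.g., BrainVoyager's) handles this internally; the simple 6-connected cluster search here is only illustrative:

```python
import numpy as np

def conjunction_roi(tmap_a, tmap_b, t_thresh):
    """Voxels surviving both contrasts at the same uncorrected threshold
    (e.g., faces > objects AND faces > scrambled faces)."""
    return (tmap_a > t_thresh) & (tmap_b > t_thresh)

def cluster_filter(mask, min_size=10):
    """Keep only 6-connected clusters of at least min_size voxels,
    via a depth-first search over the boolean 3-D mask."""
    out = np.zeros_like(mask)
    seen = np.zeros_like(mask)
    for start in map(tuple, np.argwhere(mask)):
        if seen[start]:
            continue
        stack, cluster = [start], []
        seen[start] = True
        while stack:
            v = stack.pop()
            cluster.append(v)
            for d in ((1,0,0), (-1,0,0), (0,1,0), (0,-1,0), (0,0,1), (0,0,-1)):
                n = tuple(np.add(v, d))
                if all(0 <= n[k] < mask.shape[k] for k in range(3)) \
                        and mask[n] and not seen[n]:
                    seen[n] = True
                    stack.append(n)
        if len(cluster) >= min_size:
            for v in cluster:
                out[v] = True
    return out
```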
a
b
Figure 4.1. Localizer images and BOLD responses at the right FFA and
LOC. a) Examples of images used in the localizer runs. b) The left
panel shows one subject’s right FFA, defined by a conjunction of Faces minus
Objects and Faces minus Scrambled Faces, projected onto an inflated brain.
The right panel shows the event-related average BOLD response for the five
classes of images. The colours of the lines are indicated in a. c) The left
panel shows one subject’s right LOC, defined by a conjunction of objects
minus textures and blobs minus textures. The right panel shows the event-
related average BOLD response for the five classes of images.
c
Figure 4.1: Continued
4.1.3.3 fMRI data analysis
Brainvoyager QX (Brain Innovation BV, Maastricht, The Netherlands)
was used to analyze the fMRI data. All data from a scan were preprocessed
with 3D motion-correction, slice timing correction, linear trend removal and
temporal smoothing with a high pass filter set to 3 cycles over the run’s
length. No spatial smoothing was applied to the data. Each subject’s motion
corrected functional images were coregistered with their same-session high-
resolution anatomical scan. Then each subject’s anatomical scan was
transformed into Talairach coordinates. Finally, using the above
transformation parameters, the functional scans were transformed into
Talairach coordinates as well. All statistical tests reported were performed on
this transformed data.
For the rapid event-related runs, a deconvolution analysis was
performed on all voxels within each subject’s localizer-defined ROI’s to
estimate the time course of the BOLD response. Deconvolution was
computed using the BrainVoyager software by having fifteen 1-second
shifted versions of the indicator function for each stimulus type and response
(correct vs. incorrect) as the regressors in a general linear model.
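The deconvolution described above amounts to ordinary least squares on a finite-impulse-response (FIR) design matrix. A sketch with hypothetical onset times, assuming TR = 1 s and fifteen 1-second-shifted indicator columns per condition (the correct/incorrect split of regressors is omitted for brevity):

```python
import numpy as np

def fir_design_matrix(onsets_by_condition, n_scans, n_lags=15):
    """FIR (deconvolution) design matrix: one shifted indicator column
    per condition per 1-s lag, so the fitted betas trace out the
    event-related BOLD response without assuming its shape."""
    n_cond = len(onsets_by_condition)
    X = np.zeros((n_scans, n_cond * n_lags))
    for c, onsets in enumerate(onsets_by_condition):
        for lag in range(n_lags):
            for t in onsets:
                if t + lag < n_scans:
                    X[t + lag, c * n_lags + lag] = 1.0
    return X

def deconvolve(X, y):
    """Least-squares estimate of the response amplitude at each lag."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta
```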
To compare the deconvolved hemodynamic responses of the four
conditions statistically, the peak (the average of the percent signal
change at the 6-s and 7-s time points) for each correct
response was computed for each condition and each subject. The peak
location was found to be 6.75 s by fitting a double-gamma function
(Boynton & Finney, 2003) to each subject’s deconvolved hemodynamic
response. The statistical analyses were performed on these values.
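The peak-location estimate can be illustrated by fitting a double-gamma HRF whose time-to-peak is a free parameter. The parameterization below (gamma shapes, undershoot ratio, grid search over delay) is a generic sketch, not the Boynton & Finney (2003) procedure itself:

```python
import numpy as np
from math import gamma

def double_gamma(t, peak=6.0, under_peak=16.0, ratio=1.0 / 6.0,
                 a=6.0, b=12.0):
    """Double-gamma HRF: a positive gamma lobe whose mode is at `peak`
    seconds, minus a scaled undershoot lobe peaking near `under_peak`."""
    th1 = peak / (a - 1.0)
    th2 = under_peak / (b - 1.0)
    g1 = t ** (a - 1) * np.exp(-t / th1) / (gamma(a) * th1 ** a)
    g2 = t ** (b - 1) * np.exp(-t / th2) / (gamma(b) * th2 ** b)
    return g1 - ratio * g2

def fitted_peak(t_samples, response):
    """Grid-search the time-to-peak that best fits a deconvolved response
    (amplitude solved by least squares), then return the fitted curve's
    time of maximum evaluated on a fine grid."""
    best_err, best_p = np.inf, None
    for p in np.arange(4.0, 9.0001, 0.05):
        model = double_gamma(t_samples, peak=p)
        amp = (model @ response) / (model @ model)
        err = float(np.sum((response - amp * model) ** 2))
        if err < best_err:
            best_err, best_p = err, p
    fine_t = np.arange(0.0, float(t_samples[-1]), 0.01)
    return float(fine_t[np.argmax(double_gamma(fine_t, peak=best_p))])
```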
4.2 Results
4.2.1 Behavioural Results
Fig. 4.2 shows the error rates on the same-different tasks performed
in the scanner by Novices (a) and Experts (b), respectively. The expertise
training had its expected effect, with the experts showing a significant
advantage over novices when matching blobs; F(1,10) = 15.48, p < 0.01, but
not when matching faces; F(1,10) = 1.02, ns. The decline in error rates over
the eight sessions (see methods) for the experts trained on one set of blobs
transferred to the different set of blobs used in the scanner.
As shown in Figs. 4.2a and b, matching complementary-images of a
person’s face produced significantly higher error rates than when matching
identical-images of that person’s face; F(1,10) = 58.57, p < 0.001. This cost
of matching the complements was absent for blobs, where there was no
significant difference in the error rates for Identical and Complementary
matches; F(1,10) = 3.76, ns. These results held for both blob experts and
novices, and the interaction between stimulus type and expertise was not
significant; F(1,10) = 2.28, ns. The small cost of complementation on the
blobs was confined
to the novices, an effect opposite to what would be expected if expertise led
to a representation of blobs that resembled that of faces. The cost of
complementation when matching faces was not a consequence of a general
increase in difficulty for matching faces: When the matching faces were
identical to the samples, error rates were lower than when matching identical
blobs.
a
b
Figure 4.2. Same-different error rates for matching Identical and
Complementary faces and blobs for a) novices, b) experts, and c) the Ideal
Observer. Error bars are S.E.M. Because the main effect of identity (same
vs. different person/blob) was not significant, the data were combined across
identity in these figures.
c
Figure 4.2: Continued
4.2.2 Ideal Observer Analysis
The ideal observer analysis was carried out in the same manner as in
Chapter 3 (except for the possible combinations of images corresponding to
the “same” and “different” responses, which are intrinsic to the current task),
to ensure that the difference in subjects’ performance was not due to
differences between the two image categories. There was no difference in the ideal
observer’s performance between faces and blobs on the identical and
complementary conditions (Fig. 4.2c), indicating that the results with the
human observers were not a consequence of inherent differences in the
stimuli that made the matching of complements of faces more susceptible to
error than the matching of complements of blobs in this task.
4.2.3 fMRI Results
FFA. Activation of both the right and left FFA was markedly greater
when matching faces than blobs as shown in Figs. 4.3a and 4.3b; F(1,10) =
79.03, p < 0.001 for right FFA and F(1,10) = 78.30, p < 0.001 for left FFA.
The magnitude of this difference was the same for experts and novices (i.e.,
there was no interaction between Expertise and Stimulus Class); F(1,10) <
1.0. All the above effects were manifested equally by blob experts and
novices (F(1,10) < 1.00 for all comparisons) (Figs. 4.3a and b). There was
no reliable differential activation between experts and novices for matching
blobs at FFA as a consequence of changes in frequency and individuation
(both Fs(1,10) < 1.00). However, the three way interaction of Frequency-
Orientation Combination Change X Blob Change X Expertise was significant,
F(1,10) = 9.10, p < .05. The pattern shown by the experts (compared to the
novices) for this interaction did not resemble the pattern for faces, i.e.,
greater release from adaptation to a change in the spatial content or identity
of the blobs. Instead, the condition showing the smallest percent signal
change for the experts was when both blob and frequency content changed.
For the novices, this condition showed the greatest release from adaptation,
consistent with what was found for faces.
Figure 4.3. Hemodynamic response functions of blob experts and blob
novices to blobs and faces at FFA (left and right hemispheres combined). a)
Hemodynamic response functions of blob experts to faces (left) and blobs
(right). b) Hemodynamic response functions of blob novices to faces (left)
and blobs (right). Labels: Ps-FOCs: person same and frequency-orientation
combination same; Ps-FOCd: person same and frequency-orientation
combination different; Pd-FOCs: person different and frequency-orientation
combination same; Pd-FOCd: person different and frequency-orientation
combination different; Bs-FOCs: blob same and frequency-orientation
combination same; Bs-FOCd: blob same and frequency-orientation
combination different; Bd-FOCs: Blob different and frequency-orientation
combination same; Bd-FOCd: blob different and frequency-orientation
combination different.
Fig. 4.4 shows the combined response of blob experts and novices to
faces on correct trials in right FFA. A change in the spatial content of an
image of a face and/or a change in the person resulted in a significant
release from adaptation in right FFA, compared to the response to an
identical image (for frequency, F(1,10) = 42.91, p < 0.001; for individuation,
F(1,10) = 7.10, p < 0.05).
Figure 4.4. Hemodynamic response functions for correct trials when
matching faces in the right FFA, showing a release from adaptation for a
change in the person (identity) or a change in the frequency-orientation
combination.
Figure 4.5. Hemodynamic response functions in LOC for changes in Identity
and Frequency-Orientation Combinations when viewing faces. As there were
no differences between left and right LOC and blob experts and novices, the
data are shown collapsed over these variables. Neither person nor spatial
content produced a release from adaptation (in contrast to the pattern
produced in right FFA for faces shown in Fig. 4.4).
LOC. The release-from-adaptation results for faces in right FFA were
not witnessed in either the right or left lateral occipital complex (LOC) nor
was there a difference between experts and novices so the data are shown
in a combined figure (Fig. 4.5). That is, a change in identity and/or spatial
frequency and orientation content of a face did not result in a greater BOLD
response in LOC than repetition of the identical image of a face. However,
presentation of blobs to blob experts did produce a greater BOLD
response bilaterally in LOC compared to novices (F(1,22) = 7.18, p < 0.05),
as shown in Fig. 4.6. In addition, a change in the identity of a blob for the
blob experts produced a release from adaptation at right LOC, F(1,5) = 9.76,
p < 0.05, (Fig. 4.7). However, no significant release from adaptation was
found for blob experts when the second blob was a complement of the first in
either right or left LOC (right LOC: F(1,5) = 1.21, ns; left LOC: F(1,5) = 0.40,
ns) (Fig. 4.7). Despite the lack of significance, there was a small release
from adaptation to a change in the frequency-orientation combination of a
blob in right LOC for the experts. However, this non-significant effect was
only about one-third of that witnessed for faces in FFA: In terms of percent
signal change in the BOLD response for the experts, it was .036 for faces in
right FFA (t(5) = 3.44, p =.02) and .011 for blobs in right LOC. We also note
that BOLD responses from different cortical regions are not, strictly speaking,
directly comparable.
Figure 4.6. Hemodynamic response functions in LOC for blob experts and
novices. Blob experts (left) showed a larger response to blobs than blob
novices (right) in LOC in all conditions.
Figure 4.7. Hemodynamic response functions in the right LOC for blob
experts when matching blobs. There was a significant release from
adaptation with a change in blob identity, but the release with a change in
frequency-orientation content was non-significant.
4.3. Discussion
The results provide strong support for the hypothesis that the
representation of faces, but not objects, retains aspects of their original
spatial frequency and orientation coding, both behaviourally and in their
activation of FFA. What is disruptive about matching complements of images
of faces compared to nonface objects in the present experiment is not the
absence of particular frequencies or particular orientations but that the
combinations of frequencies and orientations differ between the members of a pair.
Expertise for making within-category discriminations of nonface
objects, i.e., the blobs, did not result in greater activation of the FFA, either
in the magnitude or in the manner of faces. By employing blobs that required
discriminating the metrics of smoothly varying surfaces, we attempted to
engage FFA, if such engagement depended on processing those kinds of
low-level features. But blob experts and novices produced the same
magnitude and pattern of FFA activation and the same pattern of release
from adaptation (or the lack thereof) when discriminating faces and blobs.
Although the experts performed substantially better than our novices, they
showed the same invariance to the spatial complementation of the blobs.
Insofar as sensitivity to the spatial composition may be an indicant of a
configural representation, our results are consistent with the recent results of
Robbins and McKone (2007), who failed to find that expertise produced face-
like (especially configural) processing of dogs in dog experts, thus failing to
replicate the results of Diamond and Carey (1986).
Using an event-related design, Xu (2005) showed larger BOLD
responses in the right FFA to cars than objects for car experts, a result
consistent with the expertise hypothesis. However, the conclusion from her
results is weakened by the fact that her car experts showed a larger BOLD
response in bilateral FFA to birds than to cars, which contradicts the expertise
hypothesis. Another problem in the interpretation of Xu’s experiment is that
her task -- judging the location of a stimulus relative to fixation -- would not
seem to engage FFA. It is not unreasonable to expect that, with such an easy
task, experts pay more attention to the stimulus in which they have greater
interest/expertise, which could induce larger BOLD responses (Wojciulik,
Kanwisher, & Driver, 1998). On the other hand, paying more attention to a
preferred stimulus might be one indication of expertise. A shortcoming of Xu’s
study is that it did not contrast FFA activity with LOC activity, as was done in
the current experiment. If LOC has a larger BOLD response to the stimulus
preferred by the experts, then the FFA response is more likely to be a
consequence of increased attention than of enhanced processing or of a shift
of the processing of objects of expertise to the FFA.
Where might expertise be expressed? A difference that did emerge
between experts and novices in the present investigation was in the right
LOC, where there was a greater response to the blobs for the experts than
for the novices (Fig. 4.6) and a significant release from adaptation to a
change in blob identity (Fig. 4.7). But in contrast to what was observed for
faces, there was only a small and non-significant release from adaptation
due to spatial complementation for the experts (Fig. 4.7). Previous studies
(Gauthier et al., 1999; Gauthier, Skudlarski, Gore, & Anderson, 2000)
suggesting FFA involvement in expertise did not report any differential
activity in LOC. Nonetheless, the lack of a reliable release from adaptation
with the presentation of a complementary image of a blob suggests that
whatever the representation experts developed for discriminating blobs, it did
not as strongly reflect the sensitivity to spatial frequency/orientation content
evident with the representation of faces in right FFA.
Our results differ from those of Rotshtein et al. (2005),
who reported that the right FFA was sensitive to identity but not to physical
image variations. Rotshtein et al. used famous faces in their experiment,
which might be processed differently from the unfamiliar faces used in the
present investigation (Malone, Morris, Kay, & Levin, 1982). The effects of
expertise that are under examination here are not those for particular faces but
for faces in general. Further research is needed to resolve the role of facial
familiarity as well as the neural bases of facial familiarity.
We confined our discussion of our fMRI results to FFA and LOC, excluding
the occipital face area (OFA) and the superior temporal sulcus (STS). Some
investigators have suggested that OFA might be engaged in encoding the
physical information of faces, and STS in the coding of gaze and
expression (Hooker et al., 2003). However, the face localizers (see Methods)
did not produce significant differential activation in either of these areas.
Some recordings from human patients (McCarthy, Puce, Belger, & Allison,
1999) have suggested that single facial features may be sufficient for FFA
activation. Our scrambled face localizer, which contained intact face features,
such as the eyes and nose, but with scrambled relations so that it was
impossible to achieve a coherent perception of a complete face, failed to
produce significant activation in either OFA or STS.
The results of the present study are consistent with FFA’s involvement
in individuation of faces in that a change in identity, despite its smaller
physical image change (as assessed by the Gabor-jet model) compared to the
change of spatial content, nonetheless produced as much release from
adaptation as a complete change in spatial content. What is
required to determine if FFA accomplishes individuation beyond that
produced by any facial image change is a study in which release from
adaptation produced by different images (say, poses) of the same face is
compared with pictures of different individuals, where the two conditions of
image change are equated for their physical differences.
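The Gabor-jet assessment of physical image change mentioned above can be sketched as follows (after the general scheme of Lades et al., 1993): each image is filtered by Gabor kernels at several scales and orientations, the response magnitudes at grid points are concatenated into a "jet" vector, and dissimilarity is taken as one minus the correlation between the two images' vectors. The kernel size, wavelengths, and sampling grid below are illustrative choices, not the parameters of the system actually used to scale the stimuli.

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma):
    """Complex Gabor kernel (a model V1 simple-cell pair in quadrature)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return envelope * np.exp(2j * np.pi * xr / wavelength)

def gabor_jet(img, wavelengths=(4, 8, 16), n_orient=4, grid_step=16):
    """Concatenate Gabor response magnitudes, sampled on a coarse grid,
    into a single 'jet' vector."""
    feats = []
    for lam in wavelengths:
        for k in range(n_orient):
            kern = gabor_kernel(size=31, wavelength=lam,
                                theta=k * np.pi / n_orient, sigma=lam / 2)
            # convolve via FFT (circular convolution; consistent across images)
            resp = np.fft.ifft2(np.fft.fft2(img) *
                                np.fft.fft2(kern, s=img.shape))
            feats.append(np.abs(resp[::grid_step, ::grid_step]).ravel())
    return np.concatenate(feats)

def jet_dissimilarity(img_a, img_b):
    """1 - Pearson correlation between the two jet vectors."""
    a, b = gabor_jet(img_a), gabor_jet(img_b)
    return 1.0 - np.corrcoef(a, b)[0, 1]
```

Identical images yield a dissimilarity of zero; image pairs can then be selected or scaled so that different conditions are equated on this measure.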
Our results provide strong evidence, both behavioural and neural, that
what makes the recognition of faces special vis-à-vis nonface objects is that
the representation of faces retains aspects of combinations of their spatial
frequency and orientation content extracted from earlier visual areas.
CHAPTER 5. CONTRIBUTIONS AND SPECULATIONS
The goal of this dissertation was to test the spatial-filter hypothesis of
face representation. The results, both behavioural and neural, indicate that
face, but not object, recognition is sensitive to specific combinations of
frequency and orientation information. Our work presents several
contributions, outlined below, that clarify issues in face and object
recognition and motivate new directions in research on face recognition.
Also, our work raises challenges that remain to be examined, which will be
part of my future research.
5.1 Contributions
It has long been debated whether faces are special (Liu & Chaudhuri,
2003; Gauthier & Logothetis, 2000; Kanwisher, 2000; Duchaine &
Nakayama, 2006; McKone, Kanwisher, & Duchaine, 2007). Much of the
debate has been focused on what information in faces causes the contrast
reversal, face inversion, and configural effects. However, there are very few
studies of the neural representation of faces that might actually be causing
these effects. In this dissertation, we demonstrated that preservation of
spatial frequency and orientation information is critical for representing faces,
but object representation is invariant to the details of such coding. This
distinction significantly advances our understanding of face and object
recognition because it clarifies the critical difference between object and face
recognition. Face-unique phenomena thus result from the neural face
representation, which renders faces vulnerable to changes in contrast polarity,
lighting direction, illumination, and rotation in the plane and in depth.
Since FFA was identified as a critical region for face recognition, its
actual functionality has been under debate. Several groups have argued that the FFA
is involved in face individuation (Grill-Spector, Knouf, & Kanwisher, 2004;
Rotshtein, et al., 2005). Grill-Spector and her colleagues (2004) showed that
larger BOLD activity was observed when subjects were able to correctly
recognize individuals presented briefly compared to when they were only
able to distinguish that a face was present but unable to identify the
individual.
Criticisms of these studies are that they used familiar faces, which might be
represented differently from unfamiliar faces (Hancock, Bruce, & Burton,
2000; Gobbini & Haxby, 2007), and that they did not control for physical
image similarity. For example, a larger BOLD response in an fMRI-A design
might be due to larger dissimilarity rather than the claimed independent
variable, i.e., a different person. By controlling for image similarity in our
experiment and using unfamiliar faces, we avoided these possible
confounds. Our data show that FFA is sensitive to spatial frequency and
orientation information of faces as predicted by the spatial-filter theory of face
recognition, but fails to respond, in addition, to changes in identity. This
failure of the FFA to respond to identity is revealed by the lack of a difference
in the release from adaptation between faces with different identities
and complementary faces with the same identity. The one caveat to this
conclusion is that it is possible that there was a ceiling effect in the
magnitude of the release from adaptation. The sensitivity of FFA to image
similarity rather than identity has also been found in a recent fMRI study
using morphed faces by Gilaie-Dotan and Malach (2007). They showed that
there was a release from adaptation in FFA only when a morphed face had
less than 35% of the identity of the adapting face. In addition, single cell
recordings (Leopold, Bondar, & Giese, 2006) also showed that face neurons
in IT fire more strongly to faces far from the average face (larger image
dissimilarity) than those close to the average face (larger image similarity).
To the best of our knowledge, our results are among the first to show an
effect of expertise in LOC with fMRI. Blob experts showed larger BOLD
responses at LOC than blob novices. Our results are consistent with a
recent high-resolution fMRI training study (Op de Beeck, Baker,
DiCarlo, & Kanwisher, 2006) where, after training, LOC showed larger
responses to trained stimuli than prior to training. However, the uniqueness
of our results is that the right LOC also shows release from adaptation for
blobs with different identities (but not to the complementary blobs with the
same identity). These results cannot be explained by blob experts paying
more attention to blobs than blob novices do, and suggest that training might
have formed representations for trained stimuli in LOC (in our case, the right
LOC).
5.2 Some additional speculations
Why do faces have a representation that retains aspects of the
original simple-cell coding, rendering their recognition so vulnerable to
contrast reversal (and inversion and variations in direction of lighting)?
Considering the human visual system from an information-theoretic
perspective, we never have more information about the light coming from our
world (through reflectance or emission) than what is captured by our retina.
Through successive stages of computation, the visual system estimates the
necessary information to appropriately interact with our world. Scenes can
be navigated, objects recognized and manipulated, speeds and trajectories
estimated, all with little or no consequence to the particular lighting conditions
in which they occur. The “early information” that accurately represents our
visual surroundings is quickly forsaken by these systems in order to provide
meaningful, robust interpretations for actions and build generalizable
memories. Presumably there is a biological cost for carrying the “early
information” forward to successive stages, akin to what is often called the
combinatorial explosion. However, face recognition might have unique
requirements: accurate storage of and discrimination among thousands of
highly similar exemplars, with the further requirement for processing minute
changes within individuals (to detect emotional states, aggression, even
deception). Here may be a case where early, highly accurate
representations outweigh the biological costs and lack of robustness (Murray,
et al., 2002).
This hypothesis does not necessitate a specialized face system.
Instead, a flexible and general system could access early spatial-filter coding
for any stimulus that must be differentiated from many highly similar
exemplars. In fact, along these lines Gauthier and Tarr (1997) argue that
visual expertise is likely to cause such coding in any population of stimuli.
That only faces appear to access such coding is not a theoretical proposition,
but an empirical finding. Here, blob experts showed robustness to
contrast reversal for the blobs but not for faces, even though the image
differences between the blobs were scaled to match those between the faces,
and the blob identities were discriminable only by smooth alterations in shape
with face-like pigmentation.
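The link between simple-cell-like coding and contrast-reversal vulnerability can be made concrete: reversing contrast flips the sign of every zero-mean linear (phase-sensitive) filter response, while leaving magnitude-only (complex-cell-like) responses untouched. A toy demonstration with arbitrary, hypothetical filters:

```python
import numpy as np

rng = np.random.default_rng(1)
img = rng.random((32, 32))
neg = 1.0 - img                       # contrast-reversed image

# A linear, phase-sensitive filter response (simple-cell-like):
filt = rng.standard_normal((32, 32))
filt -= filt.mean()                   # zero-mean, so uniform shifts cancel
lin_pos = np.sum(filt * img)
lin_neg = np.sum(filt * neg)
assert np.isclose(lin_neg, -lin_pos)  # the response sign flips under reversal

# A magnitude (complex-cell-like) response via the Fourier amplitude spectrum:
amp_pos = np.abs(np.fft.fft2(img))
amp_neg = np.abs(np.fft.fft2(neg))
# All non-DC amplitudes are identical: contrast reversal changes phase only.
assert np.allclose(amp_pos.flat[1:], amp_neg.flat[1:])
```

On this view, a representation that discards the phase of the early filter responses would be immune to contrast reversal, whereas one that retains it, as hypothesized for faces, would be maximally disrupted.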
The question that remains is how the original spatial
frequency and orientation information is delivered to the later face areas,
such as FFA. Along the conventional ventral visual pathway where
information is processed from V1, V2, and V4 to PIT and AIT (Thorpe & Fabre-
Thorpe, 2001), the receptive field of a neuron becomes larger and larger as it
responds to more complex features. For instance, IT neurons in monkeys often
tend to respond to simple parts of an object (Tsunoda, Yamane, Nishizaki,
& Tanifuji, 2001), with greater sensitivity to nonaccidental than to metric features
(Kayaert, Biederman, & Vogels, 2003). What this signifies is that neurons farther
down the visual pathway lose the initial spatial frequency and orientation
information that is critical for face recognition. One possible way to avoid this
loss of spatial frequency and orientation information is a bypass of the
next level of processing, such as the direct projection from V1 to V4 in the monkey
(Nakamura, Gattass, Desimone, & Ungerleider, 1993). As stated by these
authors:
“The significant bypass projections … One possibility is that they
provide a means for coarse-grained information to arrive rapidly in the
temporal lobe. This advance information about the current stimulus might aid
in constructing the initial representation of its overall (low-pass) shape and
color within area TE, with the fine-grained information arriving later to fill in
the important details.”
BIBLIOGRAPHY
Biederman, I. (1987). Recognition-by-components: a theory of human image
understanding. Psychological Review, 94, 115-147.
Biederman, I. & Bar, M. (1999). One-shot viewpoint invariance in matching
novel objects. Vision Research, 39, 2885-2889.
Biederman, I. & Kalocsai, P. (1997). Neurocomputational bases of objects
and face recognition. Philosophical Transactions of the Royal Society
of London: Biological Sciences, 352, 1203-1219.
Biederman, I., Subramaniam, S., Bar, M., Kalocsai, P., & Fiser, J. (1999).
Subordinate-level object classification reexamined. Psychological
Research, 62, 131-153.
Boynton, G. M., & Finney, E. M. (2003). Orientation-specific adaptation in
human visual cortex. Journal of Neuroscience, 23, 8781-8787.
Bruce, V. (1982). Changing faces: Visual and nonvisual coding in face
recognition. British Journal of Psychology, 73, 105-116.
Bruce, V., & Langton, S. (1994). The use of pigmentation and shading
information in recognising the sex and identities of faces. Perception,
23, 803–822.
Burock, M.A. et al. (1998). Randomized event-related experimental designs
allow for extremely rapid presentation rates using functional MRI.
Neuroreport, 9, 3735-3739.
Collin, C.A., Liu, C.H., Troje, N.F., McMullen, P.A., & Chaudhuri, A. (2004).
Face recognition is affected by similarity in spatial frequency range to
a greater degree than within-category object recognition. Journal of
Experimental Psychology: Human Perception and Performance, 30,
975-987.
Collishaw, S. M., & Hole, J. C. (2000). Featural and configurational
processes in the recognition of faces of different familiarity.
Perception, 29, 893-909.
Costen, N. P., Parker, D. M., & Craw, I. (1994). Spatial content and spatial
quantisation effects in face recognition. Perception, 23, 129-146.
Costen, N. P., Parker, D. M., & Craw, I. (1996). Effects of high-pass and low-
pass spatial filtering on face identification. Perception &
Psychophysics, 58, 602-612.
Damasio, A.R., Damasio, H., & Van Hoesen, G.E. (1992). Prosopagnosia:
anatomic basis and behavioural mechanisms. Neuropsychologia, 2,
237-246.
Duchaine, B.C., & Nakayama, K. (2006). Developmental prosopagnosia: a
window to content-specific face processing. Current Opinion in
Neurobiology, 16, 166-173.
DeValois, R. L. & DeValois, K. K. (1990) Spatial Vision. Oxford University
Press: New York.
Diamond, R., & Carey, S. (1986). Why faces are and are not special: an
effect of expertise. Journal of Experimental Psychology: General,
115, 107-117.
Freire, A., Lee, K., & Symons, L. A. (2000). The face-inversion effect as a
deficit in the encoding of configural information: Direct evidence.
Perception, 29, 159-170.
Galper, R. E. (1970). Recognition of faces in photographic negative.
Psychonomic Science, 19, 207-208.
Galper, R. E., & Hochberg, J. (1971). Recognition memory for photographs
of faces. American Journal of Psychology, 84, 351–354.
Gauthier, I., & Logothetis, N. (2000). Is face recognition not so unique after
all? Cognitive Neuropsychology, 125-142.
Gauthier, I., Skudlarski, P., Gore, J.C. & Anderson, A.W. (2000). Expertise
for cars and birds recruits brain areas involved in face recognition.
Nature Neuroscience, 3, 191-197.
Gauthier, I. & Tarr, M. J. (1997). Becoming a “Greeble” expert: Exploring the
face recognition mechanism. Vision Research, 37, 1673-1682.
Gauthier, I., Tarr, M.J., Anderson, A.W., Skudlarski, P., & Gore, J.C. (1999).
Activation of the middle fusiform ‘face area’ increases with expertise in
recognizing novel objects. Nature Neuroscience, 2, 568-573.
Gauthier, I., Williams, P., Tarr, M. J., & Tanaka, J. (1998). Training ‘greeble’
experts: a framework for studying expert object recognition processes.
Vision Research, 38 (15-16), 2401-2428.
Gilaie-Dotan, S., & Malach, R. (2007). Sub-exemplar shape tuning in
human face-related areas. Cerebral Cortex, 17, 325-338.
Gobbini, M.I., & Haxby, J.V. (2007). Neural systems for recognition of familiar
faces. Neuropsychologia, 45, 32-41.
Gold, J., Bennett, P. J., & Sekuler, A. B. (1999). Identification of band-pass
filtered letters and faces by human and ideal observers. Vision
Research, 39, 3537-3560.
Grill-Spector, K., Knouf, N., & Kanwisher, N. (2004). The fusiform face area
subserves face perception, not generic within-category identification.
Nature Neuroscience, 7, 555-562.
Grill-Spector, K., Kourtzi, Z., & Kanwisher, N. (2001). The lateral occipital
complex and its role in object recognition. Vision Research, 41, 1409-
1422.
Grill-Spector, K. & Malach, R. (2001). fMRI-adaptation: a tool for studying the
functional properties of human cortical neurons. Acta Psychologica,
107, 293-321.
Hancock, P.J., Bruce, V., & Burton, A.M. (1998). A comparison of two
computer-based face identification systems with human perceptions of
faces. Vision Research, 38, 2277-2288.
Hancock, P.J., Bruce, V., & Burton, A.M. (2000). Recognition of unfamiliar
faces. Trends in Cognitive Sciences, 4, 330-337.
Hayes, T., Morrone, M. C., & Burr, D. C. (1986). Recognition of positive and
negative bandpass-filtered images. Perception, 15, 595–602.
Henson, R., Shallice, T. & Dolan, R. (2000). Neuroimaging evidence for
dissociable forms of repetition priming. Science, 287, 1269-1272.
Hill, H. & Bruce, V. (1996). The effects of lighting on the perception of facial
surface. Journal of Experimental Psychology: Human Perception and
Performance, 22, 986-1004.
Hill, H., Schyns, P. G. & Akamatsu, S. (1997). Information and viewpoint
dependence in face recognition. Cognition, 62, 201-222.
Hole, G. (1994). Configurational factors in the perception of unfamiliar faces.
Perception, 23, 65-74.
Hooker, C. I. et al. (2003). Brain networks for analyzing eye gaze. Brain
Research: Cognitive Brain Research, 17, 406-418.
Johnston, A., Hill, H. & Carman, N. (1992). Recognising faces: effects of
lighting direction, inversion, and brightness reversal. Perception, 21,
365-375.
Jolicoeur, P. (1985). The time to name disoriented natural objects. Memory
& Cognition, 13, 289-303.
Kanwisher, N. (2000). Domain specificity in face perception. Nature
Neuroscience, 3, 759-763.
Kanwisher, N., McDermott, J. & Chun, M. M. (1997). The fusiform face area:
a module in human extrastriate cortex specialized for face perception.
Journal of Neuroscience, 17, 4302-4311.
Kanwisher, N., & Yovel, G. (2006). The fusiform face area: a cortical region
specialized for the perception of faces. Philosophical Transactions of
the Royal Society of London: Biological Sciences, 361, 2109-2128.
Kayaert, G., Biederman, I., & Vogels, R. (2003). Shape tuning in macaque
inferior temporal cortex. Journal of Neuroscience, 23, 3016-3027.
Kemp, R., McManus, I. C., & Pigott, T. (1990). Sensitivity to the displacement
of facial features in negative and inverted images. Perception, 19,
531–554.
Kemp, R., Pike, G., White, P., & Musselman, A. (1996). Perception and
recognition of normal and negative faces: The role of shape from
shading and pigmentation. Perception, 25, 37-52.
Kobatake, E. & Tanaka, K. (1994). Neuronal selectivities to complex object
features in the ventral visual pathway of the macaque cerebral cortex.
Journal of Neurophysiology, 71, 856-867.
Lades, M. et al. (1993). Distortion invariant object recognition in the dynamic
link architecture. IEEE Transactions on Computers, 42, 300-311.
Leder, H. & Bruce, V. (2000). When inverted faces are recognized: The role
of configural information in face recognition. The Quarterly Journal of
Experimental Psychology, 53, 513-536.
Leopold, D.A., Bondar, I.V., & Giese, M.A. (2006). Norm-based face
encoding by single neurons in the monkey inferotemporal cortex.
Nature, 442, 572-575.
Liu, C. H., & Chaudhuri, A. (1997). Face recognition with multi-tone and two-
tone photographic negatives. Perception, 26, 1289–1296.
Liu, C. H., & Chaudhuri, A. (1998). Are there qualitative differences in face
recognition between positive and negative? Perception, 27, 1107–
1122.
Liu, C.H., & Chaudhuri, A. (2003). What determines whether faces are
special? Visual Cognition, 10, 385-408.
Liu, C. H., Collin, C. A., Burton, M. & Chaudhuri, A. (1999). Lighting
direction affects recognition of untextured faces in photographic
positive and negative. Vision Research, 39, 4003-4009.
Luria, S. M., & Strauss, M. S. (1978). Comparison of eye movements over
faces in photographic positives and negatives. Perception, 7, 349–
358.
Malone, D.R., Morris, H.H., Kay, M. C. & Levin, H.S. (1982). Prosopagnosia:
A double dissociation between recognition of familiar and unfamiliar
faces. Journal of Neurology, Neurosurgery, and Psychiatry, 45, 820-
822.
Mangini, M. C., & Biederman, I. (2004). Making the ineffable explicit:
Estimating the information employed for face classification. Cognitive
Science, 28, 209-226.
McCarthy, G., Puce, A., Belger, A. & Allison, T. (1999). Electrophysiological
studies of human face perception II: response properties of face-
specific potentials generated in occipitotemporal cortex. Cerebral
Cortex, 9, 431-444.
McKone, E., Kanwisher, N., & Duchaine, B.C. (2007). Can generic expertise
explain special processing for faces? Trends in Cognitive Sciences, 11,
8-15.
McKone, E., & Robbins, R. (2007). The evidence rejects the expertise
hypothesis: Reply to Gauthier & Bukach. Cognition, 103, 331-336.
Murray, S.O., Kersten, D., Olshausen, B.A., Schrater, P., & Woods, D.L.
(2002). Shape perception reduces activity in human primary visual
cortex. Proceedings of the National Academy of Sciences of the United
States of America, 99, 15164-15169.
Nakamura, H., Gattass, R., Desimone, R., & Ungerleider, L.G. (1993). The
modular organization of projections from areas V1 and V2 to areas V4
and TEO in macaques. Journal of Neuroscience, 13, 3681-3691.
Näsänen, R. (1999). Spatial frequency bandwidth used in the recognition of
facial images. Vision Research, 39, 3824-3833.
Nederhouser, M., Mangini, M., & Biederman, I. (2002). The matching of
smooth, blobby objects – but not faces – is invariant to differences in
contrast polarity for both naïve and expert subjects. Journal of Vision,
2, 745a.
Nederhouser, M., Mangini, M., & Biederman, I. (2005). Recognition of non-
face objects, designed to require the same stimulus processing as that
for faces, shows only minimal effects of differences in contrast polarity
or orientation direction. Journal of Vision, 4, 439a.
Nederhouser, M., Yue, X., Mangini, M., & Biederman, I. (2007). The
deleterious effect of contrast reversal on recognition is unique to
faces, not objects. Vision Research, in press.
Olshausen, B.A. & Field, D.J. (1996). Emergence of simple-cell receptive
field properties by learning a sparse code for natural images. Nature,
382, 607-609.
Op de Beeck, H.P., Baker, C.I., DiCarlo, J.J., & Kanwisher, N.G. (2006).
Discrimination training alters object representations in human
extrastriate cortex. Journal of Neuroscience, 26, 13025-13036.
Pelli, D. G., Farell, B., & Moore, D. C. (2003). The remarkable inefficiency of
word recognition. Nature, 423, 752-756.
Pelli, D. G. (1990). The quantum efficiency of vision. In: C. Blakemore (Ed.)
Vision: coding and efficiency (pp. 3-24). Cambridge: Cambridge University Press.
Phillips, R. J. (1972). Why are faces hard to recognize in photographic
negative? Perception and Psychophysics, 12, 425–426.
Phillips, R. J. (1979). Some exploratory experiments on memory for
photographs of faces. Acta Psychologica, 43, 39–56.
Riesenhuber, M. & Poggio, T. (2000). Models of object recognition. Nature
Neuroscience, 3, 1199-1204.
Robbins, R., & McKone, E. (2007). No face-like processing for objects-of-
expertise in three behavioural tasks. Cognition, 103, 34-79.
Rock, I. (1974). The perception of disoriented figures. Scientific American,
230, 78-85.
Rotshtein, P., Henson, R.N., Treves, A., Driver, J., & Dolan, R.J. (2005).
Morphing Marilyn into Maggie dissociates physical and identity face
representations in the brain. Nature Neuroscience, 8, 107-113.
Schwaninger, A., Carbon, C.C., & Leder, H. (2003). Expert face processing:
Specialization and constraints. In G. Schwarzer & H. Leder (Eds),
Development of face processing (pp. 81-97), Göttingen: Hogrefe.
Shepard, R. N. & Cermak, G. W. (1973). Perceptual-cognitive explorations of
a toroidal set of free-form stimuli. Cognitive Psychology, 4, 351-377.
Subramaniam, S. & Biederman, I. (1997). Does contrast reversal affect
object identification? Investigative Ophthalmology & Visual Science.
38, 998.
Tanaka, J. W. & Farah, M. J. (1993). Parts and wholes in face recognition.
The Quarterly Journal of Experimental Psychology, 46A, 225-245.
Tarr, M. J. & Gauthier, I. (2000). FFA: a flexible fusiform area for
subordinate-level visual processing automatized by expertise. Nature
Neuroscience, 3, 764-769.
Tjan, B. S., Braje, W. L., Legge, G. E. & Kersten, D. J. (1995). Human
efficiency for recognizing 3-D objects in luminance noise. Vision
Research, 35, 3053-3069.
Thorpe, S.J., & Fabre-Thorpe, M. (2001). Seeking categories in the brain.
Science, 291, 260-263.
Tsunoda, K., Yamane, Y., Nishizaki, M., & Tanifuji, M. (2001). Complex
objects are represented in macaque inferotemporal cortex by the
combination of feature columns. Nature Neuroscience, 4, 832-838.
Vogels, R. & Biederman, I. (2002). Effects of illumination intensity and
direction on object coding in macaque inferior temporal cortex.
Cerebral Cortex, 12, 756-766.
Vuong, Q., Peissig, J., Harrison, M., & Tarr, M. (2005). The role of surface
pigmentation for recognition revealed by contrast reversal in faces and
Greebles. Vision Research, 45, 1213-1223.
Wojciulik, E., Kanwisher, N., & Driver, J. (1998). Covert visual attention
modulates face-specific activity in the human fusiform gyrus: fMRI
study. Journal of Neurophysiology, 79, 1574-1578.
Xu, Y. (2005). Revisiting the role of the fusiform face area in visual expertise.
Cerebral Cortex, 15, 1234-1242.
Yarmey, A. D. (1971). Recognition memory for familiar “public” faces: Effects
of orientation and delay. Psychonomic Science, 24, 286-288.
Yin, R. K. (1969). Looking at upside down faces. Journal of Experimental
Psychology, 81, 141-145.
Young, A.W., Hellawell, D., & Hay, D.C. (1987). Configurational information
in face perception. Perception, 16, 747-759.
Zhao, W., Chellappa, R., Phillips, P.J., & Rosenfeld, A. (2003). Face
recognition: A literature survey. ACM Computing Surveys, 35, 399-
458.
ABSTRACT

What is the nature of the representation of faces and objects that results in the striking behavioral differences in their recognition? Biederman & Kalocsai (1997) proposed that faces were represented in terms of their original spatial-filter excitation values, with allowance for scale and position invariance.