THE NEURAL CORRELATES OF FACE RECOGNITION
by
Xiaokun Xu
_____________________________________________________________
A Thesis Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(PSYCHOLOGY)
August 2013
Copyright 2013 Xiaokun Xu
Table of Contents

List of Tables
List of Figures
Abstract

Chapter 1: Introduction
  A specialized neural module for face recognition
  Neural representation of face attributes
  Case study of an acquired prosopagnosic, MJH
  A biologically plausible model of face representation

Chapter 2: Detection of face vs. non-face objects
  Abstract
  Introduction
  Methods and Results
    Experiment 1: Detection thresholds defined by phase spectrum coherency
    Experiment 2: Detection threshold with equal power spectra
  Discussion
    Age effect and individual variance
    Face detection thresholds
    Face sensitivity examined in other visual tasks
    Neural correlates of face detection
  Conclusion

Chapter 3: The configural processing of faces
  Abstract
  Introduction
  Methods and Results
    Experiment 1: Part-whole configural effect
    Experiment 2: Testing the configural effect in spatial frequency filtered images
    Experiment 3: Testing the configural effect in a prosopagnosic
  Discussion
  Conclusion

Chapter 4: Neural representation of face attributes
  Abstract
  Introduction
  Methods and Results
    Experiment 1: Sensitivity to identity and viewpoint
    Experiment 2: Sensitivity to viewpoint and expression
  Discussion
    Encoding expression
    Encoding viewpoint
    Encoding identity
  Conclusion

Chapter 5: Summary and speculation

References
List of Tables

Table 3.1: A summary of differences between face and object recognition.
Table 4.1: The location and size of face-selective ROIs.

List of Figures

Figure 1.1: Example faces from paintings of Arcimboldo.
Figure 1.2: Face processing models proposed in the literature.
Figure 1.3: Bottom view of the 3D reconstruction of MJH's brain lesion.
Figure 1.4: Localization of face-selective activation in MJH and controls.
Figure 1.5: The Gabor representation of faces.
Figure 2.1: Illustration of phase coherence modulation for faces.
Figure 2.2: Illustration of experimental procedure.
Figure 2.3: Detection thresholds measured as phase spectrum signal-to-noise ratio.
Figure 2.4: Illustration of phase coherence modulation with power spectrum equalization.
Figure 2.5: Regression of the detection threshold on age for controls in Exp. 1.
Figure 2.6: Regression of the detection thresholds on age for controls in Exp. 2.
Supplementary Figure 2.1: Original input images used in Exp. 1 under the phase-blending procedure.
Figure 3.1: Stimuli used in Tanaka & Farah's (1993) experiment.
Figure 3.2: The Gabor-jet model of face representation.
Figure 3.3: Overlapping receptive fields of large-scale Gabor kernels.
Figure 3.4: Face features and composite target faces.
Figure 3.5: Behavioral results and stimulus similarity analysis.
Figure 3.6: Better account of the configural effect by LowSF / LargeRF than HighSF / SmallRF components.
Figure 3.7: Spatial filtering of the part and whole face stimuli.
Figure 3.8: Configural effect examined in spatial frequency filtered faces.
Figure 3.9: Gabor-jet models with fiducial points.
Figure 3.10: The composite face effects.
Figure 4.1: The scaling of Gabor-jet similarity in Exp. 1.
Figure 4.2: Face-selective ROIs and BOLD signal for different conditions.
Figure 4.3: The scaling of all Gabor-jet similarity in Exp. 2.
Figure 4.4: BOLD signal in bilateral FFA, OFA, and STS as a function of adaptation condition.
Abstract
Humans are exceptionally good at face recognition. Within a fraction of a second, we not only detect the presence of a face, but also determine its identity, expression, orientation, age, sex, and attractiveness. This process involves the collaboration of specific regions in the cortex, but how the different face attributes are processed and represented has remained unclear. This thesis is directed toward understanding the neural correlates supporting the detection and configural coding of faces, as well as the representations of various face attributes, particularly face individuation, in both normal subjects and an acquired prosopagnosic, MJH.
In the first study, I examined face detection thresholds in MJH and controls. The results revealed a significant face detection deficit in MJH compared with controls, suggesting a contribution of ventral temporal cortex to face detection. Second, I provide a neurocomputational account of the configural effect in face perception, that is, the better discrimination of whole faces compared with isolated face parts. The computational simulation and psychophysical results suggest that it is the overlapping receptive fields of the Gabor kernels coding different parts of a face that give rise to the holistic representation of the face. Finally, previous research has shown that the Gabor-jet model predicts almost perfectly the psychophysical similarity of faces and other complex shapes. By using this model to scale image similarities among faces, I could compare the neural representation of high-level face attributes without low-level image similarity confounds. The results of two fMRI adaptation experiments revealed two cortical loci - the fusiform face area (FFA) and the occipital face area (OFA) - tuned to face identity in normal subjects. Correspondingly, MJH's lesions in these areas could have led to his complete loss of the ability to identify faces. Behavioral and imaging results indicate no functional plasticity that alleviated his deficits, even after the four decades subsequent to the lesions he suffered in early childhood. Taken together, these results highlight the important role that the FFA and OFA play in both the detection and the more abstract coding of human faces.
Chapter 1: Introduction
Among all entities encountered in human visual experience, the human face is of particular importance given its social and ecological value in human interactions. With a quick glance, we can not only detect the presence of a face, but also determine its identity, expression, orientation, age, sex, and attractiveness. In spite of a tremendous amount of research on the psychological and neurological basis of face recognition, it remains unclear how humans achieve their extraordinary performance in face recognition, which consistently surpasses the state of the art in computer vision (O'Toole, 2011; Sinha et al., 2006). Using psychophysical and neuroimaging methods, this dissertation investigated several critical aspects of human face recognition, in both normal subjects and an acquired prosopagnosic suffering from severe impairment in face recognition.
Chapter 1 provides a review of the literature in the field. Chapter 2 addresses the detection of faces compared with non-face objects. Chapter 3 explores the configural processing of faces and provides a biologically plausible computational account of the face-specific configural effect. Chapter 4 uses fMRI adaptation to probe the neural representation of various face attributes such as identity, viewpoint, and expression. A summary and speculation are provided in Chapter 5.
A specialized neural module for face recognition
Does face perception rely on specialized neural mechanisms, compared with the perception of non-face objects? Gauthier and Tarr (1997) proposed that the two share a common mechanism and differ only in within-category discrimination expertise. They argued that, through extensive training, humans could develop the same kind and level of expertise for novel objects that they exhibit for face discrimination. Using a computational model incorporating a hierarchy of increasingly complex visual features (Riesenhuber & Poggio, 2000), Jiang et al. (2006) suggested that a common model is sufficient to generate results similar to human performance for both face and object discrimination, without a "dedicated face module". However, there are also striking differences between the recognition of faces and objects by humans, even when the objects in a discrimination task are as highly similar to one another as faces are. Unlike objects, which show no effect of contrast reversal or lighting orientation, human perception of faces is greatly disrupted by contrast reversal and lighting orientation (Nederhouser et al., 2007). Faces, but not objects, show configural effects (Tanaka and Farah, 1993), although Gauthier & Tarr (2002) reported a weak configural effect after considerable training in distinguishing non-face objects. The difference between similar faces is often ineffable, whereas observers can typically readily describe the local features distinguishing similar objects (Biederman et al., 1999).
Evidence from neurophysiology, using single- and multi-unit recording, and from functional Magnetic Resonance Imaging (fMRI) reveals specialized brain areas for face processing (Kanwisher et al., 1997; Puce et al., 1996; Grill-Spector et al., 2004; but see Haxby et al., 2001, who argued for an intermixed and distributed activation pattern for faces and objects). Tsao et al. (2008) reported that in primate cortex, a few areas along the ventral visual pathway show highly localized tuning to faces relative to other objects. Most importantly, microstimulation of any one of these face areas evoked activation in other areas only within this network (Moeller et al., 2008).
Studies on patients with visual agnosia and prosopagnosia reveal a link
between specific lesion sites and corresponding deficits in recognition of specific
classes of objects. For example, lesions in the medial fusiform gyrus (Fusiform
Face Area, FFA) and the inferior occipital gyrus (Occipital Face Area, OFA)
typically lead to severe deficits in recognizing faces (Damasio et al., 1986;
Rossion et al., 2003). A striking complementary deficit was found in patient C.K.
(Moscovitch et al., 1997) who was normal at face recognition but extremely poor
at object recognition, perhaps due to thinning of the cortices in the bilateral occipital region (as there were no apparent focal lesions). Given the Arcimboldo faces (see Fig. 1.1), patient C.K. saw only the integrated faces but not the constituent objects (vegetables and books). This double dissociation in face and
object agnosia therefore suggests a specialized face recognition component.
Figure 1.1. Example faces from paintings of Arcimboldo (Moscovitch et al., 1997).
Neural representation of face attributes
Bruce and Young (1986) proposed a model with distinct pathways for processing facial identity, emotional expression, and eye gaze, respectively (Fig. 1.2a). A corresponding neural model (Fig. 1.2b) was proposed by Haxby et al. (2000) based on reviews of single-unit and fMRI studies in both humans and monkeys (Puce et al., 1996; Hoffman & Haxby, 2000). The model assumes that the changeable aspects of faces, such as facial expression, eye gaze, and mouth movement, are processed in the superior temporal sulcus (STS), whereas the invariant properties of faces, such as facial identity, are processed in the FFA. The Occipital Face Area (OFA) receives input from early visual stages and feeds its output to both the FFA and the STS. This model is further supported by temporal and anatomical dissociations in the processing of faces. Evidence from MEG and EEG studies revealed that the early (~90ms from stimulus onset) and late (~170ms) signatures in the time course of face processing arose from different locations, namely the inferior occipital and temporal cortices, respectively (Liu et al., 2002; Smith et al., 2009; Sugase et al., 1999). Electrophysiology in monkeys also suggested different tuning of neurons in different areas: neurons in the STS are tuned to facial expression and orientation, whereas those in the inferior temporal gyrus (ITG) are tuned to facial identity (Hasselmo et al., 1989; Eifuku et al., 2004).
Figure 1.2. Face processing models proposed by Bruce & Young (1986) and Haxby, Hoffman, & Gobbini (2000); figure adapted from Calder & Young (2005).
In fMRI-adaptation (fMRI-a), the repetition of identical stimuli leads to a decreased BOLD signal compared to non-repeated stimuli (Grill-Spector et al., 1999; Krekelberg et al., 2006). Using this method, Winston et al. (2004) reported
that a change of identity but not expression between two sequentially presented
faces produced a release from adaptation in FFA, whereas a change of
expression but not identity resulted in a release of adaptation in mid-STS,
suggesting a functional segregation of identity and expression. Some evidence
for independent coding of facial identity and expression has been reported in
behavioral studies of visual aftereffects, in which, for example, prolonged viewing
of the face of person A results in a biased perception of a face with ambiguous
identity (i.e., a morph between the faces of persons A and B) as person B. Fox
et al. (2008) found that the aftereffect of facial identity could be largely
transferred across different emotional expressions. Similarly, the aftereffect of face orientation was found to be largely independent of identity. That is, adaptation to a face oriented to the left makes a front-facing face appear to be oriented to the right, with only a small cost when the depicted individual changes (Fang, Ijichi, & He, 2007); the aftereffect does not, however, transfer across exemplars from different categories, such as faces and paperclips (Fang & He, 2005). Based on such results, Fang et al.
suggested that the processing of facial identity and viewpoint are largely
independent.
However, there are numerous challenges to the notion of independent
processing of changeable and invariant properties of faces (Calder & Young,
2005). In the identity-dependent expression aftereffect experiments, the shift in
the perception of emotion only partially transferred across faces of different
people (Ellamil et al., 2009; Fox et al., 2008; Campbell et al., 2009; Vida and
Mondloch, 2009). The existence of identity-dependent and identity-independent
emotion aftereffects suggests at least partially overlapped representations of
identity and expression. Single units in monkey STS are tuned to features such
as iris size, inter-eye distance, face aspect ratio and their interaction (Freiwald et
al., 2009). The combination of these features contributes not only to facial
expression, but also to the identification of faces (Hasselmo, Rolls, & Baylis,
1989). The extent to which the representations of face identity, expression, and view are separated is therefore further clarified in Chapter 4.
Case study of an acquired prosopagnosic MJH
Most cases of prosopagnosia are congenital in origin, with uncertainty as to the cortical loci of the effects. High-functioning acquired
prosopagnosia is rare, but affords an opportunity to illuminate the anatomical
basis of the condition. In 1972 (40 years prior to the time of testing) at age five,
as the result of a fall from an 8 ft. high ledge, MJH suffered extensive bilateral
lesions (greater in the right hemisphere) to his ventral occipito-temporal cortices,
with extensive lesions in areas that would normally encompass FFA and OFA
(Fig. 1.3). Anatomical inspection revealed no lesions in his superior temporal
sulci.
Figure 1.3. Bottom view of the 3D reconstruction of MJH's brain showing that the loci of the FFA and OFA, based on control subjects, are within the sites of MJH's cortical lesions. Image provided by Hanna Damasio and Jessica L.
Wisnowski from the Dornsife Imaging Center, USC based on MR data obtained
under P50NS019632.
Although there was a period of time immediately following his accident
when he reported being completely blind, he regained close-to-normal vision,
and currently exhibits normal contrast sensitivity as assessed with the Pelli-
Robson Contrast Chart (Pelli et al., 1988) although he (inconsistently, over a
number of years, in ophthalmic perimetry testing) sometimes presents some
lower visual field loss in the periphery, particularly in the right visual field. On the
Boston Naming Task (Kaplan et al., 1983) his performance is in the normal range
(actually slightly above average with 47/50 correct) in identifying objects
(Michelon & Biederman, 2003). He drives--in Los Angeles. To casual and non-rigorous examination, as well as subjective report, he is normal, or near normal,
in his detection of faces.
He can readily individuate a person on the basis of voice and shows
normal, if not superior, memory for names and biographical details of the people
he encounters. However, he shows pronounced impairment in identifying people
from visual input, could not recognize his own face in the mirror nor that of his
close family members. He is much worse than controls on both standard tests
such as the Benton Face Recognition Test: 35/54 (Benton et al., 1983), the
Cambridge Face Memory Test 42/58 (Duchaine et al., 2006), and a match-to-
sample test in which an identical matching face is paired with a distracter face
differing in identity (Yue et al., 2012). Mangini et al. (2004) reported that on a test
administered in 1999 he was at chance (controls were perfect) in selecting a
celebrity (e.g., Bill Clinton) from a non-celebrity in pairs of faces, all of whom
were highly familiar to him. In 2012, he was still at chance in individuating faces
of celebrities in a similar choice test (again, controls were perfect). He is in the normal range in discriminating expression and sex (Mangini et al., 2004) and
reports that he has mental imagery (i.e., image memory) of faces that he has
previously encountered (Michelon & Biederman, 2003). He is well aware of his
deficit in individuating faces, reporting that he does not recognize his own face in
a mirror nor those of close family members. However, he does not subjectively
complain about face detection, instead reporting that all faces (within broad
categories of age, sex, race, etc.) look the same.
MJH and eight control subjects were scanned when viewing pictures of
faces and objects in a series of block-designed localizer runs. The contrast of
BOLD signal when viewing blocks of static images of faces versus objects did not
reveal any differential activation in MJH, while the control subjects showed higher
activation in the typical face-selective areas, such as the fusiform face area (FFA)
and occipital face area (OFA) (Fig. 1.4, top panel). However, the contrast of
dynamic face and object blocks consisting of short video clips of changing facial
expressions and objects in motion did reveal higher activation to faces, but only
in MJH’s bilateral posterior superior temporal sulci (STS) (Fig. 1.4, bottom panel).
His lesions encompassed what, in control subjects, constitute the posterior face areas (FFA and OFA), so no differential faces-minus-objects activation could be observed there.
Figure 1.4. Localization of face-selective activation in MJH and controls, using
static images (top panel) and dynamic video clips (bottom panel) of faces and
objects.
A biologically plausible model of face representation
The multi-orientation, multi-scale tuning properties of neurons in the early visual cortex of the human brain are well captured by Gabor kernels (Hubel & Wiesel, 1968; De Valois & De Valois, 1988). In the Gabor-jet model proposed by von der Malsburg and colleagues (Lades et al., 1993), each stimulus is filtered by a 10 x 10 grid of jets. Each jet is composed of 40 Gabor kernels (each a convolution of a sinusoid with a 2D Gaussian envelope) at 8 equally spaced orientations (i.e., 22.5° differences in angle) and 5 spatial frequencies (ranging from 8 to 32 cycles/face in half-octave logarithmic steps), all centered on their jet's grid point (Fig. 1.5). The coefficients of the kernels (the magnitudes, corresponding to activation values of complex cells) within each jet are then concatenated into a 4000-element (100 jets x 40 kernels) vector G: [g1, g2, ..., g4000]. For any pair of pictures with corresponding jet coefficient vectors G and F, the similarity of the pair can be defined as the uncentered Pearson correlation (Equation 1) or the Euclidean distance (Equation 2) between the two vectors. Yue et al. (2012) reported an almost perfect correlation between the image difference - measured as the Euclidean distance between the target and distractor faces in Gabor space - and human discrimination performance in a match-to-sample task.
$$\mathrm{Sim}(G,F)=\frac{\sum_{i=1}^{4000} g_i\,f_i}{\sqrt{\sum_{i=1}^{4000} g_i^{2}}\;\sqrt{\sum_{i=1}^{4000} f_i^{2}}} \qquad (1)$$

$$\mathrm{Dist}(G,F)=\sqrt{\sum_{i=1}^{4000}\left(g_i-f_i\right)^{2}} \qquad (2)$$
Figure 1.5. The Gabor representation of face images. Each image was filtered with a 10x10 grid of Gabor jets, with each jet composed of 40 Gabor kernels of various orientations and spatial frequencies. The magnitudes of each kernel's convolution output were concatenated into a 4000-element feature vector for each input (face) image.
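For concreteness, the following is a minimal sketch of this computation in Python/NumPy. The grid placement, orientation spacing, and frequency spacing follow the description above; the Gaussian envelope width per scale (`sigma`) and the kernel truncation are illustrative assumptions rather than the exact parameters of Lades et al. (1993).

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(freq, theta, sigma):
    """Complex Gabor kernel: a 2D sinusoidal carrier under a Gaussian envelope."""
    half = int(3 * sigma)                        # truncate the envelope at 3 sigma
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    carrier_axis = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    return envelope * np.exp(2j * np.pi * freq * carrier_axis)

def gabor_jet_vector(image, grid=10, n_orient=8, n_scale=5):
    """Filter an image with 40 Gabor kernels, sample the magnitudes at a
    grid x grid lattice of jets, and concatenate into one feature vector
    (10 x 10 x 40 = 4000 elements with the defaults)."""
    h, w = image.shape
    rows = np.linspace(0, h - 1, grid + 2)[1:-1].astype(int)   # interior lattice
    cols = np.linspace(0, w - 1, grid + 2)[1:-1].astype(int)
    coeffs = []
    for k in range(n_scale):                     # 8..32 cycles/face, half-octave steps
        freq = (8.0 * 2 ** (k / 2.0)) / w        # convert cycles/face to cycles/pixel
        sigma = 0.5 / freq                       # assumed envelope width per scale
        for j in range(n_orient):                # 8 orientations, 22.5 deg apart
            kern = gabor_kernel(freq, j * np.pi / n_orient, sigma)
            resp = fftconvolve(image, kern, mode='same')
            coeffs.append(np.abs(resp)[np.ix_(rows, cols)].ravel())
    return np.concatenate(coeffs)

def jet_sim(G, F):
    """Uncentered Pearson correlation between two jet vectors (Equation 1)."""
    return np.dot(G, F) / (np.linalg.norm(G) * np.linalg.norm(F))

def jet_dist(G, F):
    """Euclidean distance between two jet vectors (Equation 2)."""
    return np.linalg.norm(G - F)
```

Given two same-sized grayscale face images `a` and `b`, `jet_dist(gabor_jet_vector(a), gabor_jet_vector(b))` yields the Gabor-space image difference used to scale stimulus similarity in Chapters 3 and 4.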
Chapter 2. Detection of face vs. non-face objects
Corresponding publication: Xu, X., & Biederman, I. (2013). Neural correlates
of face detection. Cerebral Cortex. doi: 10.1093/cercor/bht005. First published
online: January 30, 2013
Abstract
Although face detection likely played an essential adaptive role in our
evolutionary past as well as in contemporary social interactions, there have been
few rigorous studies investigating its neural correlates. MJH, a prosopagnosic
with bilateral lesions to the ventral temporal-occipital cortices encompassing the
posterior face areas (fusiform and occipital face areas), expresses no subjective
difficulty in face detection, suggesting that these posterior face areas do not
mediate face detection exclusively. Despite normal contrast sensitivity, the
present study nevertheless revealed significant face detection deficits in MJH.
Compared to controls, MJH showed lower tolerance to noise in the phase
spectrum for faces (vs. cars), reflected in his higher detection threshold for faces.
In a saccadic choice task, the involuntary and fast initial saccade to a face (vs.
vehicle) image was present in normal subjects (Crouzet et al., 2010) but absent
in MJH. MJH’s lesions in bilateral occipito-temporal cortices thus appear to have
produced a deficit not only in face individuation, but also in face detection.
Introduction
Twenty thousand years ago, in low light, you are making your way through
a dense forest inhabited by a hostile clan. Your likelihood of detecting a face
peering out of the foliage could well determine your chance of survival. Given this
evolutionary selective pressure for face detection, is there actually greater
sensitivity for the detection of faces compared to other stimuli?
There is ample anecdotal evidence for a bias to interpret ambiguous
stimuli as faces, as when we interpret clouds or rock formations as faces or the
“face on Mars” (NASA, 2001). Newborns spontaneously track a naturalistic or
schematic face with eye and head movements (Johnson et al., 1991). Despite
these qualitative observations, there has been little quantitative research comparing the detection of faces with that of non-face objects while controlling both decision bias (the tendency to interpret ambiguous images as faces) and low-level image statistics such as the power spectrum. In the present investigation, we present a methodology for such an assessment, with detection operationalized as the accuracy of selecting a degraded target against an image of pure noise in a Two-Alternative Forced Choice (2-AFC) task.
What might be the critical features that activate the representation of a
face? Dakin and Watt (2009) suggested key features derived from the horizontal
structure of the face. These tend to form horizontally elongated, vertically aligned
bands, mimicking (from the top) a three cycle/face barcode of dark (hair), light
(forehead), dark (eye sockets), light (cheeks), dark (mouth) and light (chin).
Sinha et al. (2006) also proposed a scheme that simply relies on the contrast
regularity among the forehead, eye socket and cheekbone. Similarly a template
consisting of a few dark and light rectangles has also been used for face
detection in the initial screening in a popular computer vision model (Viola &
16
Jones, 2001). Since the early visual areas are tuned to the position and
orientation of local contrast at multiple spatial frequencies (Hubel & Wiesel, 1968),
it is possible that face detection, i.e. distinguishing a face from random natural
noise, could be carried out solely from the information computed in this early
stage.
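As an illustration of how such coarse contrast templates can support detection, here is a minimal sketch assuming a simple vertical "barcode" of alternating dark and light horizontal bands in the spirit of Dakin and Watt (2009); the band weights and the correlation score are illustrative choices, not the published models.

```python
import numpy as np

def barcode_template(height, width):
    """Horizontal dark/light bands (top to bottom): hair, forehead,
    eye sockets, cheeks, mouth, chin. Dark = -1, light = +1."""
    bands = np.array([-1, +1, -1, +1, -1, +1], dtype=float)
    reps = int(np.ceil(height / bands.size))
    column = np.repeat(bands, reps)[:height]     # contiguous bands down the image
    return np.tile(column[:, None], (1, width))

def face_likeness(image):
    """Normalized correlation with the barcode template; higher scores
    indicate more face-like horizontal contrast structure."""
    template = barcode_template(*image.shape)
    img = image - image.mean()
    template -= template.mean()
    denom = np.linalg.norm(img) * np.linalg.norm(template)
    return float((img * template).sum() / denom)
```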
On the other hand, extensive neuroimaging research has implicated regions in the later stages of the ventral pathway that are specialized in the visual processing of faces. Specifically, the fusiform and occipital face areas (FFA and OFA, respectively) have been suggested as neural loci for the conscious
perception of face vs. non-face objects: higher activation in FFA has been found
not only in viewing faces versus non-faces (Kanwisher et al., 1997; Puce et al.,
1996), but also in the conscious awareness of a face in binocular rivalry (Tong et
al., 1998), in the figure/ground illusion (Hanson et al, 2001), as well as in face
imagery without external visual input (O’Craven & Kanwisher, 2000; Ishai et al,
2000). Grill-Spector et al. (2004) showed that the BOLD signal in FFA was
correlated not only with the successful identification of individual faces, but also
with the simple detection of faces, against non-face objects in brief masked
stimuli. Yue, Tjan, & Biederman (2006) reported a release from FFA adaptation
when discriminating faces of different (vs. same) individuals but not when
discriminating equally similar blobs, scaled by a measure of V1 similarity,
designed to mimic the low level features of faces. Andrews & Schluppeck (2004)
also reported higher activation in FFA associated with the awareness of
ambiguous Mooney faces compared to blobs. All these findings strongly suggest
that FFA is critically involved in the categorical and conscious perception of faces
against non-face objects. However, it remains unclear whether these areas
mediate face detection, i.e. discriminate faces against random noise.
Prosopagnosia, known as "face blindness", is a disorder of face perception in which individuals show severe deficits in individuating faces. Depending on its origin, prosopagnosia can be classified as congenital (also termed "familial" or "developmental") when it occurs in the absence of apparent brain lesions
(Duchaine et al., 2003). Numerous case studies of congenital prosopagnosia
have reported normal face detection, with the majority reporting normal brain
anatomy (de Gelder & Rouw, 2000; Duchaine et al., 2003, 2006; Le Grand et al., 2006; Garrido et al., 2008). Alternatively, prosopagnosia can be acquired as a
result of brain lesions, likely in the fusiform gyrus in the ventral temporal lobe
(Damasio et al., 1986; Rossion et al., 2003). Most investigations of
prosopagnosia have, understandably, focused on impairments in face
identification or individuation, the core complaint of prosopagnosics. Little
attention has been paid to their detection of faces.
To the best of our knowledge, there has not been a single report of
impaired face detection in acquired prosopagnosia. In some of these cases
where there was a report of normal face detection, no lesion after recovery from
closed head injury was observed (de Gelder & Rouw, 2000). In a few other cases
(also with normal face detection), the face areas were only partially lesioned,
such as in patient P.S., who suffered damage to the left FFA and right OFA but with sparing of the right FFA (Rossion et al., 2003; Schiltz et al., 2006). If the posterior face areas, i.e., the OFA and FFA, do mediate face detection, it may be the case that
some surviving tissue in those areas is sufficient to achieve normal face
detection. Would a more extensive lesion in those face areas impair face
detection?
A potential shortcoming in previous investigations of face detection in
prosopagnosia arises from their employment of designs in which detection of
faces was compared to grid-scrambled faces, or two-tone face contours
embedded in a scramble of face features (Garrido et al., 2008). These types of
stimuli do not readily permit principled parametric variation in the context of more
general accounts of shape coding.
In the present study, in two experiments we investigated face detection in
MJH, an acquired prosopagnosic with extensive bilateral lesions to areas that
would encompass FFA and OFA in normal individuals, using more rigorous and
theoretically motivated psychophysical tasks and metrics. Specifically, we
manipulated the coherency of images by adding variable proportions of noise to
the phase spectrum and measured the detection threshold—defined as the
proportion of phase noise that allowed 75% 2AFC accuracy at distinguishing an
instance of a target class (faces and cars) against a foil that was pure noise. In
both experiments, the power spectra of the target and foil were identical on each
trial. In Experiment 1, however, the power spectra of the faces and cars were
unmatched which allowed the possibility of an interaction between phase
coherency and power spectra to affect performance. In Experiment 2, the power
spectra were identical for the two stimulus classes.
Methods and Results
Subjects
Control subjects (details specified separately for the two experiments) comprised students, faculty, and staff recruited from USC's campus community. All controls reported normal or corrected-to-normal vision and no history of neurological disorders or injuries. A detailed description of the functional and neurological condition of MJH, an acquired prosopagnosic, can be found in the Chapter 1 introduction. The study was approved by USC's Institutional
Review Board.
Experiment 1: Detection Thresholds defined by phase spectrum coherency
Stimuli

Because shape identifiability is largely a function of spatial phase (Oppenheim & Lim, 1981), introducing external noise into the phase spectrum allows the detectability of instances of stimulus classes (faces and cars in the present investigation) to be varied parametrically without affecting the global luminance and contrast of the images (Sadr & Sinha, 2003; Dakin et al., 2002). Thirty-six individual Caucasian faces (half female) were created using FaceGen Modeller 3.2 (Singular Inversions, Toronto, Canada, http://facegen.com), with moderate variation in their pose, size, and expression. Thirty-six car images were downloaded from the internet, also with variation in their make, pose, and size (all 72 raw stimuli are shown in Supplementary Figure 2.1). All images were converted to 8-bit grayscale and then normalized in their global luminance histogram and Root Mean Square (RMS) contrast using the SHINE toolbox (Willenbockel et al., 2010). For each image, after a 2-D Fourier transformation, we combined its phase spectrum with that of a white noise image using complementary weights while maintaining its power spectrum. We adopted Dakin et al.'s (2002) phase-blending method to avoid the over-representation of near-zero-degree phase components. The new image was produced by combining the original power spectrum and the synthesized phase spectrum through the inverse Fourier transformation, as shown in Fig. 2.1a. A demo video showing a transition from 0% to 100% phase-spectrum integrity of a face can be downloaded at http://geon.usc.edu/~kun/FaceEmerge.avi. The phase spectrum signal-to-noise ratio (psSNR) was defined as the ratio between the proportions of phase contributed by the original image (signal) and by the noise. Given the log-linear relationship between psSNR and the detectability of the stimuli (Horner & Andrews, 2009), the QUEST method (Watson & Pelli, 1983) was used to probe the psSNR threshold for 75% detection accuracy for faces and cars, respectively.
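A minimal sketch of the phase-blending step in Python/NumPy is given below. It interpolates between the unit phasors of the signal and noise phases (one simple way to average angles) while keeping the original power spectrum; Dakin et al.'s (2002) exact weighting scheme for avoiding the over-representation of near-zero phases is not reproduced here, so treat this as an illustrative approximation.

```python
import numpy as np

def phase_blend(image, signal_weight, rng=None):
    """Return an image with the original power spectrum but a phase
    spectrum blended between the original (signal) and white noise.

    signal_weight: proportion of original phase (1.0 = intact image,
    0.0 = pure phase noise with the same power spectrum)."""
    if rng is None:
        rng = np.random.default_rng()
    F = np.fft.fft2(image)
    amplitude = np.abs(F)                        # power spectrum is kept intact
    noise_phase = np.angle(np.fft.fft2(rng.standard_normal(image.shape)))
    # Average the two phase spectra by interpolating between unit phasors
    blended = (signal_weight * np.exp(1j * np.angle(F))
               + (1 - signal_weight) * np.exp(1j * noise_phase))
    new_phase = np.angle(blended)
    return np.real(np.fft.ifft2(amplitude * np.exp(1j * new_phase)))
```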
Figure 2.1. Illustration of phase coherence modulation for faces. a) Example of
how the images were generated by the introduction of random noise into the
phase spectrum. b) Examples of resultant images (here shown with a face target)
with variation in the proportion of the original phase. The identical procedure was
used with car targets.
Procedure

Participants performed a 2-Alternative Forced Choice (2-AFC) task in which two images were presented side by side briefly, for 100ms or 200ms in separate blocks, and then covered by grid-scrambled masks. Each image subtended a visual angle of approximately 3º and was centered at 2º eccentricity, left and right, from central fixation. Nineteen control subjects (10 females, mean age = 32.4 years, SEM = 4.1) and MJH pressed the left or right arrow key to indicate the target image (face or car, in separate blocks, each comprised of 72 trials, as illustrated in Fig. 2.1b), which was modulated by different proportions of noise (variable psSNR) in its phase spectrum, against a distracter composed of 100% phase noise. Subjects were instructed, prior to each block, whether the target class would be faces or cars. The QUEST algorithm estimated the best psSNR for the target on the next trial using Bayesian statistics based on previous trials. The terminating asymptotic psSNR was taken as the detection threshold for each type of stimulus and exposure duration. Subjects heard a beep as error feedback, and were permitted to take a break between blocks.
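The following is a minimal sketch of a QUEST-style Bayesian staircase of the kind described above (Watson & Pelli, 1983), assuming a Weibull psychometric function in log-psSNR with a 50% guessing rate for the 2-AFC task; the slope, lapse rate, and flat prior are illustrative choices rather than the parameters used in the experiment.

```python
import numpy as np

def weibull_2afc(log_x, log_threshold, beta=3.5, gamma=0.5, lapse=0.01):
    """P(correct) for a 2-AFC Weibull psychometric function in log units."""
    return gamma + (1 - gamma - lapse) * (1 - np.exp(-10 ** (beta * (log_x - log_threshold))))

class Quest:
    """Bayesian staircase: a posterior over log-threshold, tested at its mean."""
    def __init__(self, grid=np.linspace(-2.5, 0.5, 301)):
        self.grid = grid                           # candidate log10(psSNR) thresholds
        self.log_posterior = np.zeros_like(grid)   # flat prior (illustrative)

    def next_intensity(self):
        p = np.exp(self.log_posterior - self.log_posterior.max())
        return float(np.sum(self.grid * p) / p.sum())   # posterior mean

    def update(self, log_x, correct):
        p = weibull_2afc(log_x, self.grid)
        self.log_posterior += np.log(p if correct else 1 - p)

# Usage: on each trial present the target at 10**q.next_intensity() psSNR,
# then call q.update(log_x, correct) with the observer's response.
```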
Figure 2.2. Illustration of the experimental procedure, in which the subject is to choose the image (left or right) containing the target category (in this case, a face) with variable SNR in the phase spectrum against the 100% phase noise image. The two images share identical power spectra. The QUEST staircase procedure (Watson & Pelli, 1983) was used to probe the threshold SNR--the ratio between original phase and random noise phase--needed to achieve 75% detection accuracy, separately for each stimulus category and exposure duration.
Results
For controls, the detection threshold, defined in terms of phase SNR, was lower for faces than for cars, and for the longer than for the shorter exposure duration
(Fig. 2.3a). A repeated measures ANOVA revealed significant main effects of
both Exposure Duration: F(1,18) = 16.9, p = 0.001, and Stimulus Category:
F(1,18) = 9.7, p = 0.006, as well as their interaction: F(1,18) = 5.3, p < 0.05.
Pairwise comparisons further revealed lower thresholds for detecting faces than
cars at both the 100ms exposure: t(18) = 4.3, p < 0.001, and the 200ms
exposure: t(12) = 3.1, p = 0.006; as well as lower thresholds for longer than
shorter exposure durations, for both faces, t(12) = 2.3, p = 0.04, and cars, t(12) =
2.7, p = 0.01. A further comparison of the exposure effects on the thresholds of
faces and cars showed that the interaction was driven by a larger drop of car
than face detection thresholds as a result of the longer presentation duration.
Compared to controls, MJH showed a markedly higher threshold in
detecting faces (dashed line, Fig. 2.3a) at both the 100ms exposure: t = 3.0, p <
0.01 and 200ms exposure durations: t = 1.7, p = 0.05, and to a lesser extent in
detecting cars at the 100ms exposure duration, t = 1.7, p = 0.06, but not at the
200 ms exposure duration: t < 1, p > 0.1. The modified t-test proposed by
Crawford and Howell (1998) was adopted to correct the bias introduced by the
relatively small sample (< 30) of controls. Although the absolute threshold
difference between MJH and controls, defined by Phase SNR, was larger for cars
than for faces (Fig. 2.3a, left panel), in units of the standard error of the face threshold in controls (shown by the minuscule error bars and calculated as a t-value), the deficit for MJH was greater for faces than for cars, particularly at the 100ms exposure duration.
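The Crawford and Howell (1998) test compares a single case against a small control sample by inflating the standard error for the control sample size. A minimal sketch:

```python
import numpy as np
from scipy import stats

def crawford_howell(case, controls):
    """Modified t-test for comparing a single case to a small control
    sample (Crawford & Howell, 1998). Returns (t, one-tailed p)."""
    controls = np.asarray(controls, dtype=float)
    n = controls.size
    se = controls.std(ddof=1) * np.sqrt(1 + 1.0 / n)   # inflated standard error
    t = (case - controls.mean()) / se
    p = stats.t.sf(abs(t), df=n - 1)                   # one-tailed p-value
    return t, p
```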
Figure 2.3. Detection thresholds measured as phase spectrum signal-to-noise
ratio (SNR), with the face vs. car power spectrum unmatched in Exp. 1 (panel a)
and equalized in Exp. 2. (panel b). Error bars indicate the standard error of the
controls. Right panels are the values of the t-scores of MJH vs. controls.
Experiment 2: Detection threshold with equal power spectra
Although we equalized the global luminance and RMS contrast among all
stimuli in Exp. 1, there were differences in the power spectra between faces and
cars that might have affected their detection thresholds. Therefore in Exp. 2, the
original power spectrum of each exemplar was replaced by the grand-mean of
the power spectra of all faces and cars in the stimulus set (Fig. 2.4a). Thus, in
Exp. 2, the variation among the modulated images would be solely a function of
their phase coherency. We analyzed the resulting detection thresholds in the same way as in Exp. 1. In addition, the controls' results in Exps. 1 and 2 were subjected to a mixed-design repeated-measures ANOVA, with Exposure Duration (100ms vs. 200ms) and Category (face vs. car) as within-subject factors, and image property (unmatched vs. matched power spectra between faces and cars) as the between-subject factor, given that different groups of controls were recruited in the two experiments. This analysis helps to clarify whether the categorical difference between faces and cars in their detection thresholds, defined by phase coherence, could be influenced indirectly by the power spectrum, even though the power spectra were always kept identical for the target and the distractor by design. Another set of 19 control subjects (10 female, mean age = 30.0 years, SEM = 3.5) participated in Exp. 2. The remaining procedures were identical to those of Exp. 1.
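A minimal sketch of the power-spectrum equalization used in Exp. 2, which replaces each image's amplitude spectrum with the grand mean over the whole stimulus set while preserving each image's own phase (an assumption-light reading of the text; the SHINE toolbox provides an equivalent routine):

```python
import numpy as np

def equalize_power(images):
    """Give every image the grand-mean amplitude (power) spectrum of the
    set, keeping each image's own phase spectrum."""
    spectra = [np.fft.fft2(im) for im in images]
    mean_amplitude = np.mean([np.abs(F) for F in spectra], axis=0)
    return [np.real(np.fft.ifft2(mean_amplitude * np.exp(1j * np.angle(F))))
            for F in spectra]
```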
Figure 2.4. Illustration of phase coherence modulation with power spectrum
equalization for faces. a) Example of how the images were generated by the
introduction of random noise into the phase spectrum. b) Examples of resultant
images (here shown with a face target) with variation in the proportion of the
original phase. The identical procedure was used with car targets.
Results
Again, a lower detection threshold in terms of phase spectrum SNR was
evident for faces compared to cars, and for longer compared to shorter
presentation durations in control subjects (solid line in Fig. 2.3b). As in Exp. 1, a
2 Stimulus category (face, cars) X 2 Exposure duration (100ms, 200ms) repeated
measures ANOVA in controls revealed significant main effects for Stimulus
Category: F(1,18) = 74.6, p < 0.001, and Exposure duration: F(1,18) = 14.1, p =
0.001, as well as their interaction: F(1,18) = 7.4, p < 0.05. Pairwise comparisons
revealed lower thresholds for detecting faces than cars at both 100ms exposure:
t(18) = 8.4, p < 0.001, and the 200ms exposure durations: t(12) = 7.1, p < 0.001;
as well as lower thresholds for longer than shorter exposures, for both faces, t(12)
= 3.1, p = 0.006, and cars, t(12) = 3.3, p = 0.004. Similar to the results of Exp. 1,
the interaction was driven by a relatively larger effect of exposure duration on the
detection of cars compared to faces.
The equalization of the image power spectra did not significantly increase the threshold for the detection of faces, but did so for cars. A mixed ANOVA was conducted with Exposure Duration and Stimulus Category as within-subject factors and Spectral Difference as a between-subject factor. There
was a significant effect of both Exposure Duration: F(1,36) = 23.7, p < 0.001, and
Stimulus Category: F(1,36) = 80.9, p < 0.001, and their interaction: F(1,36) = 12.6,
p = 0.001. More importantly, there was a significant interaction between Category
and Power-Spectrum Equality: F(1,36) = 9.9, p = 0.003. A further independent t-
test between the thresholds with and without equal power spectra was performed
for face and car stimuli, collapsed over exposure duration. This analysis showed
a significant effect of power spectrum equalization only for cars: t(74) = 3.9, p <
0.001, not for faces: t(74) = 1.2, p > 0.2, suggesting a minimal role of power
spectra on thresholds for face detection, and a moderate role for car detection.
Compared to controls, MJH still showed a markedly higher threshold in
detecting faces (dashed line in Fig. 2.3b) at the 100ms exposure duration: t = 5.8,
p < 0.01 and the 200ms exposure duration: t = 2.4, p < 0.02, but close-to-normal
thresholds at detecting cars at both exposure durations: t < 1, p > 0.2. In
summary, even under power spectra equalization, MJH’s deficit for face
detection remained, but not for car detection, as shown in the right panel of Fig.
2.3b.
Discussion
Age effect and individual variance
To test for a potential age effect, we split the controls into two groups: 13 young (9 females, mean age = 23.3 years, SEM = 4.1) and 6 older (2 females, mean age = 49.7 years, SEM = 2.3; approximately matched to MJH's age), and performed the same analyses as reported above for all subjects. The results for
both groups were highly similar to those when all controls were pooled: controls’
detection thresholds were lower for faces than for cars, and lower for longer
exposure durations than shorter ones. More importantly, MJH’s deficits in face
detection remained significant compared with either age group: MJH vs. the 13 young controls, ts > 2, ps < 0.02 for both exposures; MJH vs. the six age-matched controls, t = 2.7, p < 0.02 for the 100ms and t = 1.2, p = 0.13 for the 200ms exposure. We further conducted a regression of the controls' detection thresholds on age, for each combination of Exposure, Stimulus category, and Power spectrum manipulation. All regressions showed a slope of SNR as a function of age that was close to zero (~.01 increase in SNR units/year, ps > .2), except for the 200ms presentation of a car with a power spectrum unmatched (with respect to faces), where the slope was 0.02 and p = .05 (Figs. 2.5 and 2.6, for Exps. 1 and 2, respectively). MJH's face detection threshold was well above the regression prediction for his age, whereas his threshold for car detection was well within the range of the controls.
The larger variance in car compared to face thresholds could be attributed to differences both in the variability of the stimuli and in the features relied on by different subjects, particularly for the cars. To prevent subjects from using certain canonical templates for detection, we chose images of faces and cars of different identities/makes and various poses and sizes. However, there was still higher similarity among the faces than among the cars, owing to the fact that all human faces are highly similar to one another. In debriefing, all subjects mentioned using the eyes as the most salient feature for face detection, whereas for car detection different subjects relied on various features such as headlights, windows, and sharp angles in the contour, yielding considerable variability in their performance. In summary, the detection of faces seems more dependent on phase-spectrum integrity, which was hardly affected by age. In contrast, the detection of cars appeared to rely not only on phase integrity, but also on the correspondence between the phase and power spectra. The explanation for this difference awaits further research.
Figure 2.5. Regression of the detection threshold on age for controls in Exp. 1.
Figure 2.6. Regression of the detection thresholds on age for controls in Exp. 2
(equal power spectrum for faces and cars).
Face detection thresholds
By parametrically injecting noise in the phase domain, we degraded the
detectability of images, somewhat mimicking the “walking in the woods” scenario
where a target could be disguised by noise that resembles a phase scrambled
face or object. In control subjects, we found higher sensitivity to faces than cars,
as evidenced by the lower detection thresholds for faces. This is consistent with the previously reported saccade bias to faces (Crouzet et al., 2010), as well as the
attention capture by faces (Bindemann et al., 2005; Vuilleumier et al., 2000).
What is the underlying mechanism supporting our higher sensitivity to faces?
Torralba & Oliva (2003) suggested that the statistics of the power spectrum were
sufficient to predict human detection of animals in natural images. However,
Wichmann et al. (2010) reported that observers' rapid detection of animals in the Corel image database was essentially unchanged after power spectrum normalization across images of all categories. This result makes it highly
unlikely that human observers make use of the global power spectrum. In the
present study, could the lower threshold for faces vs. cars be a function of a
difference in the power spectra between the stimulus classes?
Strictly speaking, the power spectrum is not directly informative for
choosing a target against noise in the present 2-AFC tasks, because the two
always shared the identical power spectrum by design. However, the categorical
difference in the power spectrum between faces and cars could still contribute to
their detection threshold discrepancy by interacting with the phase spectrum
modulation. We therefore further eliminated this possible confound by using the
grand mean power spectrum of all stimuli in the phase-blending process, and
then repeated the staircase procedure to measure the detection thresholds in
Exp. 2. Again, controls showed lower thresholds for faces than cars. This result
suggests an intrinsic sensitivity to faces compared to non-face objects by human
observers. Furthermore, this sensitivity is primarily driven by the local contours
and features determined by the phase spectrum, whereas the authenticity of the
power spectrum plays only a minimal role. On the other hand, the detection of cars relied on both phase-spectrum coherency and its interaction with the power spectrum structure, even when the power spectrum by itself was not
directly informative.
Face sensitivity examined in other visual tasks
Hershler and Hochstein (2005) first reported “pop-out” in visual search for
face targets among non-face objects such as vehicles and animals. Specifically,
the cost of search, measured as the ratio of reaction time to search set size, was much higher for non-face targets than for face targets; that is, search for faces was far more efficient. VanRullen
(2006) reexamined this phenomenon by replacing the power spectrum of the
target face and all distractors with that of various cars, and found decreased
search efficiency as a result of the mismatched power and phase spectrum of
face images. Therefore VanRullen concluded that face pop-out is not driven by
high-level visual information such as contour localization determined by phase
spectrum structure, but rather by low-level properties such as the power
spectrum. This result is in partial agreement with the present study: when the
original power spectrum of the image was replaced by the grand-mean power
spectrum of 36 faces and 36 cars, control subjects’ detection thresholds
measured as phase spectrum coherency were raised for car targets and, to a lesser and nonsignificant degree, for face targets as well. The discrepancy
between the VanRullen et al. and the present studies might be a function of the
following factors: a) the power spectrum was kept identical between the target
and the distractors in the present study by design, therefore the discriminative
information in the power spectrum could only be utilized via interaction with the
modulation in the phase spectrum, whereas in the previous visual search
experiments the power spectrum was directly informative; b) Eccentricity of the
face target. It has been shown that peripheral vision suffers from crowding and
more distortion in the phase spectrum (Greenwood et al., 2009), compared with
the relatively unlocalized power spectrum. Therefore, the alteration of the power
spectrum could have been more disruptive when faces were presented in the periphery, as when they were randomly positioned in a large 8-by-8 search array
in the visual search studies (Hershler & Hochstein, 2005; VanRullen 2006);
whereas in the 2-AFC paradigm in the present experiment, the target and the
distractor were always presented close to foveal vision.
Honey, Kirchner, and VanRullen (2008) used a saccadic choice task to
measure a bias toward faces. In every trial, one of the two images had a contrast
value of 40% (the reference) and the other (the probe) a randomly determined
level between 16% and 100%. They measured probe choice probability as a
function of probe contrast, separately for faces and means of transport as the
probe category, and fitted the two resulting curves with cumulative normal
functions. They found that equal probability of choosing either category was reached at a lower contrast level when the face was the probe, and at a higher contrast when the vehicle was the probe. This bias toward faces held even when the phase spectrum of the images was completely wiped out by random noise, such that subjects could not recognize the objects at all. The mean luminance and
RMS contrast were matched between the faces and vehicles. However, when the
two images had different contrast (in 80% of the trials over the whole experiment),
the power spectrum was modified as a consequence as well; although the effect
of contrast modulation did not necessarily eliminate the categorical information in
the power spectrum, it remained unclear whether the persistent bias toward
faces could be completely attributed to the categorical difference in the power
spectrum structure between faces and cars.
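For reference, fitting a cumulative normal to choice probabilities of this kind is straightforward; the following sketch uses hypothetical contrast levels and choice rates solely to illustrate the fit that Honey et al. (2008) describe.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def cum_normal(contrast, mu, sigma):
    """Cumulative normal psychometric function for probe-choice probability."""
    return norm.cdf(contrast, loc=mu, scale=sigma)

# Hypothetical data: probe contrasts and proportion of trials the probe was chosen
contrast = np.array([0.16, 0.25, 0.40, 0.63, 1.00])
p_choose_probe = np.array([0.18, 0.34, 0.55, 0.78, 0.93])

(mu, sigma), _ = curve_fit(cum_normal, contrast, p_choose_probe, p0=[0.4, 0.2])
# mu is the contrast at which probe and reference are chosen equally often;
# a lower mu for face probes than for vehicle probes indicates a bias toward faces.
print(f"point of equal choice = {mu:.3f}, slope parameter = {sigma:.3f}")
```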
Crouzet et al. (2010) reported an ultra-fast (~100ms) initial saccadic
preference toward a face image when a face and a vehicle were simultaneously
presented in the periphery, even when the subjects were instructed to fixate the
vehicle. Crouzet and Thorpe (2011) further tested this saccadic bias toward faces
when the power spectra were normalized among the faces and cars, just as we
did in Exp. 2. They found a decreased but still significant tendency in saccading
toward a face compared to a car, regardless of which one was instructed as the
target. An even stronger test of the effect of power spectra has been made by
swapping the power spectra between faces and cars. Again, subjects’ initial
saccades were biased to the hybrid of face-phase and car-power spectrum,
indicating the dominance of phase-spectrum over power-spectrum information.
Furthermore, the detrimental effect of power spectrum swapping was comparable
for face and car targets, reflected in the comparable magnitude of increases in
both reaction times and error rates for the two types of targets.
Importantly, the phase spectrum of the cars and faces remained intact in
that study. In comparison, the present study was primarily concerned with the
integrity of the phase spectrum, which turned out to dominate the detectability of
both faces and cars: higher signal-to-noise ratio in the phase spectrum
composition led to higher detection accuracy. In addition, the contrast between
Exps. 1 and 2 also revealed a minor role for power spectra, but only for cars, not
for faces.
In an unpublished study, we replicated Crouzet et al.’s (2010) finding of a
significant bias in saccades to faces against vehicles (with their power spectra
unmodulated) in a group of control subjects. Unfortunately, MJH's saccadic response accuracy to either face or car targets was at chance. Considering that the two images were presented simultaneously in the periphery (10º eccentricity from central fixation) by design of the saccade paradigm, MJH's chance-level performance was likely caused by his deficits in peripheral vision, confirmed by a recent perimetry test (performed by his ophthalmologist), although his foveal vision appears to be intact as assessed by the visual acuity and contrast sensitivity tests that we administered.
In summary, the preferential saccade to a face is driven principally by its
phase spectrum. However, when the coding of phase-coherency is inefficient,
either through experimental manipulation (Honey et al., 2008) or due to a cortical
lesion (MJH), the power spectrum signature of faces can be utilized to
accomplish face detection.
Neural correlates of face detection
Could face detection be achieved in early visual areas without the
involvement of posterior face-selective areas, such as FFA/OFA? We
investigated this question in our study of MJH, who suffered lesions to those face
areas but with sparing of early visual areas. When the power spectra for cars and
faces were unmatched (Exp. 1), MJH’s threshold was higher than controls for
both cars and faces, with his deficit more pronounced when detecting faces than
cars. However, when the stimuli were equalized in their power spectra (Exp. 2),
MJH’s deficits in face detection remained significant, whereas his car detection
threshold was close to that of the normal controls. This dissociation in the effect
of power spectrum equalization is in line with the neural tuning properties along
the visual hierarchy: the FFA/OFA have been shown to respond well to modulation of the phase but not the power spectrum of face images (Horner & Andrews, 2004).
In fact, those areas can be reliably localized by contrasting fMRI responses to
images of faces versus their phase-scrambled versions (Grill-Spector et al.,
1998). However, the early visual areas are more sensitive to modulation in the
power spectrum rather than in the phase spectrum (Olman et al., 2004). It is
likely that MJH’s lesions in higher-order face areas impaired his sensitivity to
phase coherency but only minimally affected his sensitivity to power spectra. It is
therefore possible that when trying to detect a face, he was relying more on the
power spectra supported by his spared early visual areas. When the power
spectrum was made non-informative (identical for target and distractor) in both
Exp. 1 and 2, MJH showed persistent deficits regardless of the normalization of
the images’ power spectrum. In contrast, as shown in the threshold results for
control subjects in Exp. 2, although car detection was primarily modulated by the
integrity of phase, it was also affected by the modification of the power spectrum,
such that the car’s threshold was further increased by normalization of the power
spectrum. The power spectrum normalization in Exp. 2 thus significantly increased thresholds in controls, and to a lesser degree in MJH, thereby diminishing the advantage of the controls over MJH.
Because we only investigated one non-face category, cars, our finding of
a greater reliance on phase-integrity for the detection of faces means that we
cannot exclude the possible importance of phase-integrity for other non-face
categories, such as human bodies, animals, tools, and scenes. Additional
research is required to assess the relative importance of phase integrity for these
other stimulus classes.
Conclusion
Prosopagnosics have been reported not to have difficulty in face detection,
despite their pronounced deficits in face individuation. The present case study of
prosopagnosic MJH nonetheless revealed a significant deficit in his detection of
faces when assessed with rigorous psychophysical testing. The detection
threshold for faces by controls was largely invariant with variations in the power-
spectrum but heavily affected by the integrity of the phase spectrum. MJH’s
heightened face detection threshold, together with his profound impairment in
face identification, are likely consequences of his bilateral occipito-temporal
lesions that encompass extrastriate, posterior face-selective areas. His largely
spared early visual areas alone are insufficient to support normal face detection.
Supplementary figure
Supplementary Figure 2.1. Original input images (36 faces and 36 cars) used in Exp. 1 in the phase-blending procedure.
Chapter 3. The Neurocomputational Basis of Configural Effects
Abstract
The representation of faces is said to be configural. But what could
“configural” mean in neurocomputational terms? A candidate hypothesis is the
overlapping receptive fields (RFs) of neurons coding for different face features.
We used the von der Malsburg Gabor-jet system (Lades et al., 1993), which
captures the multi-scale, multi-orientation tuning of neurons in V1 hypercolumns,
to model the paradigmatic configural effect in face recognition (Tanaka & Farah,
1993): after learning a set of composite faces, recognition of a whole face,
against a distractor differing only in the shape of a single face part (e.g., the
nose), was more accurate than recognition of that part in isolation. The
Gabor-jet model yielded the same ordering of image similarity, producing greater
Euclidean distances between whole faces than between the single parts
distinguishing those faces. Although the low spatial frequency, large receptive
field components of the model provided a stronger account of the configural
effect than the high spatial frequency, small receptive field components, a
study in which spatial frequency and receptive field size were manipulated
independently showed that virtually all of the configural effect was a function
of receptive field size, with no differential contribution of low vs. high
spatial frequency.
Introduction
People tend to generate a collection of descriptors of individual face
features, such as those for the nose, mouth, eyes, chin, etc., when they are asked
to describe a face. Modifiers are also typically restricted to separate parts, such
as “round face”, “large eyes”, “high cheekbones” and “wide mouth”. Therefore, it
is natural to speculate that the individuation of faces is also accomplished
component-by-component, as “piecemeal” processing (Carey & Diamond, 1977).
However, other research has suggested that faces are processed holistically,
that the features are integrated and perceived as a gestalt. We are not only
sensitive to the variation (shape, color, shading, etc.) of those individual
features, but also to the spatial relationships among them. Specifically,
recognition of an individual face feature (e.g., Obama’s mouth) is harder when
the feature is presented in isolation than when it is placed in the context of
the whole face, as demonstrated by Tanaka & Farah (1993) in a part-whole
recognition task. Their subjects learned the names of six Mac-a-Mug faces in
which differently shaped eyes, noses, and mouths could be swapped within an
identical face outline to produce a composite face. They performed comparable
operations with houses and house parts (Fig. 3.1). In the recognition test, the
subjects had to distinguish Larry, say, from a composite foil that they had
never seen in the learning phase and that differed from the target face by a
single part, the nose in this example. On other trials, they had to select the
particular face part that distinguished the original face from the foil face,
with the two parts presented in isolation. Identification performance was better
for the composite faces than for the parts in isolation, although in both cases
the target differed from the foil by a single part. However, this contextual
benefit was absent when the faces were flipped upside-down or spatially
scrambled. The authors therefore argued that the processing of face features is
integrated/holistic rather than isolated and serial.
Figure 3.1. Stimuli used in Tanaka & Farah’s experiment (1993). Notice that the
target face and foil face differed only in the nose, shown in isolation in the
upper part of the figure. Similarly, the pair of houses differed only in the door.
A neurocomputational account of this phenomenon can be derived from the
Gabor-jet model proposed by Lades et al. (1993). Each jet in the model is
composed of a series of Gabor kernels, each the product of a sinusoidal grating
of a particular spatial frequency and orientation with a Gaussian envelope of a
specific window width, mimicking the multi-scale, multi-orientation tuning
properties of V1 neurons (De Valois & De Valois, 1988). Given a pair of input
images, their similarity can be measured as the Euclidean distance between the
Gabor-jet coefficient vectors of the two images. Specifically, each pair of
stimuli was filtered by a 10 x 10 grid of jets (Fig. 3.2a). Each jet was composed
of 80 Gabor kernels of 8 equally spaced orientations (i.e., 22.5° differences in
angle), 5 spatial frequencies (ranging from 8 to 32 cycles/face in half-octave
steps), and 2 phases with a 90° shift (sine and cosine), each centered on the
jet’s grid point. The coefficients of the kernels (the magnitudes and phases
corresponding to an activation value for a V1 neuron) within each jet were then
concatenated into an 8000-element (100 jets x 40 kernels x 2 phases) vector G:
[g1, g2, ..., g8000]. For any pair of pictures with corresponding jet coefficient
vectors G and F, the dissimilarity of the pair is defined as the Euclidean
distance between the two vectors:
$\mathrm{Dist}(G, F) = \sqrt{\sum_{i=1}^{8000} (g_i - f_i)^2}$
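A minimal sketch of this computation in Python (NumPy/SciPy) is given below. The kernel bandwidth (sigma), kernel size, normalization, and grid placement are our own illustrative assumptions; the original Lades et al. (1993) implementation differs in such details.

    import numpy as np
    from scipy.signal import fftconvolve

    def gabor_kernel(size, freq, theta, phase, sigma):
        # One Gabor kernel: a sinusoid windowed by a 2D Gaussian envelope.
        half = size // 2
        y, x = np.mgrid[-half:half + 1, -half:half + 1]
        xr = x * np.cos(theta) + y * np.sin(theta)
        return np.cos(2 * np.pi * freq * xr + phase) * \
            np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))

    def gabor_jet_vector(img, grid=10, n_orient=8,
                         freqs_cpf=(8, 11.3, 16, 22.6, 32)):
        # Responses of 5 SFs x 8 orientations x 2 phases, sampled at a
        # grid x grid lattice of jets and concatenated into one vector
        # (100 jets x 40 kernels x 2 phases = 8000 elements).
        h, w = img.shape
        rows = np.linspace(0, h - 1, grid).astype(int)
        cols = np.linspace(0, w - 1, grid).astype(int)
        coeffs = []
        for f_cpf in freqs_cpf:                    # lowest SF first
            freq = f_cpf / w                       # cycles/face -> cycles/pixel
            sigma = 0.56 / freq                    # bandwidth: an assumption
            size = int(6 * sigma) | 1              # odd size covering envelope
            for k in range(n_orient):
                for phase in (0.0, np.pi / 2):     # cosine and sine parts
                    kern = gabor_kernel(size, freq, k * np.pi / n_orient,
                                        phase, sigma)
                    resp = fftconvolve(img, kern, mode="same")
                    coeffs.append(resp[np.ix_(rows, cols)].ravel())
        return np.concatenate(coeffs)

    def gabor_distance(img1, img2):
        # Dist(G, F): Euclidean distance between the coefficient vectors.
        return float(np.linalg.norm(gabor_jet_vector(img1)
                                    - gabor_jet_vector(img2)))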
Previous studies have revealed that Gabor-jet distance/similarity is a
good predictor of human discrimination of complex shapes, such as faces or
convex blobs. Yue et al. (2012) reported a near-perfect correlation between the
Gabor-jet distances of two faces and subjects’ discrimination accuracy in a
match-to-sample task, with one face serving as the target and the other as the
foil. In other fMRI adaptation experiments, the adaptation magnitude of the
BOLD signal in FFA was also shown to be proportional to the Gabor-jet distance
between two face images shown in succession (Xu et al., 2009).
Figure 3.2. Gabor-jet model. a) The image was filtered with a 10x10 grid of
Gabor jets, with each jet composed of 40 Gabor kernels of 8 orientations and 5
spatial frequencies. The magnitudes of each kernel’s output at 0 and 90˚ phase
were concatenated and vectorized into a 4000-element feature vector for each
input (face) image. b) Face stimuli in a match-to-sample task, where the
distractor differed from the target in feature spacing. c) Strong negative
correlation between subjects’ discrimination accuracy and Gabor similarity (Yue
et al., 2012).
Given sufficient sampling density over the visual input, there is a
considerable degree of overlap in the receptive fields of different kernels, such as
the two centered on the tip of the nose and right eye of Tom Cruise’s face (Fig.
3.3). Any one kernel is activated by contrast variation over a large area of the
face, and contrast variation in any one area of the face modulates kernels
throughout the lattice. Therefore, the interaction between local features, such
as those arising from the eyes and nose, and the contextual face background
creates additional visual features, which are picked up by the kernels covering
those areas, especially those with larger receptive fields.
Figure 3.3. Overlapping receptive fields of neurons with Gabor-like tuning
properties, superimposed on the face image.
Could the part-whole configural effect reported by Tanaka & Farah (1993)
also be explained by the Gabor-like representation of human faces? We tested
this hypothesis by creating a similar stimulus set, replicating the experiment, and
correlating the behavioral performance with the Gabor-jet representation of those
faces and parts.
Methods and Results
Experiment 1. Replication of the Tanaka-Farah Configural Effect
Subjects
Eleven students from the University of Southern California participated (mean
age = 24.3 years, 3 female). All subjects reported normal or corrected-to-normal vision,
and normal face recognition ability.
Stimuli
The design followed that in the original Tanaka and Farah (1993)
experiment. Three exemplars of each of three face features, eyes, nose and
mouth, were chosen to create 27 composite faces, with the face features
maintained in the same spatial locations and embedded in a common head
contour background, using the Morphases Editor (Morphases, Kajaani, Finland).
Six among the 27 faces were each given a name (Fig. 3.4) while the others
served as foils. The association between each of the six target faces and its
name were acquired during a self-paced learning session, during which each
face-name pair was presented five times in randomized order. Pairs of the six
target faces shared one feature and differed in the other two, e.g., Larry and
Bob had identical eyes, Mike and Derek identical mouths, and Bob and Mike
identical noses, such that no one part exemplar was unique to a particular face.
In each trial of the test session, subjects were presented with either a pair
of isolated face features or a pair of composite faces, and performed an
identification task. For example, given a pair of composite faces, say Larry’s face
and a foil that differed from Larry (who had eyes1) by only one feature (e.g. foil
had eyes2), subjects pressed either the left or right arrow key to indicate which
one was Larry. Similarly, when presented with eyes1 and eyes2 in isolation,
subjects were asked to identify which one depicted Larry’s eyes. Importantly, the
foils were chosen outside of the six learned identities, and were not seen during
the learning session.
Stimuli were presented in grayscale on a CRT monitor viewed from a distance of
approximately 57 cm. Each image subtended a visual angle of approximately 6º
and was centered at 4º eccentricity, left or right of central fixation. Subjects
had to respond within 5 s of the onset of the target images.
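For concreteness, one test trial of this 2AFC procedure might be scripted as follows. This is a hypothetical sketch using the PsychoPy library; the presentation software actually used is not specified here, and the file names and window parameters are assumptions.

    from psychopy import visual, event

    # Illustrative window; 'deg' units require a calibrated monitor profile.
    win = visual.Window(size=(1024, 768), color='gray', units='deg',
                        monitor='testMonitor')

    def run_trial(left_image, right_image):
        # Show two stimuli at +/-4 deg eccentricity; collect a left/right
        # key press within 5 s of stimulus onset.
        left = visual.ImageStim(win, image=left_image, pos=(-4, 0), size=6)
        right = visual.ImageStim(win, image=right_image, pos=(4, 0), size=6)
        left.draw(); right.draw()
        win.flip()
        keys = event.waitKeys(maxWait=5.0, keyList=['left', 'right'])
        win.flip()                         # clear the screen
        return keys[0] if keys else None   # None = no response within 5 s

    choice = run_trial('larry_whole.png', 'foil_whole.png')  # hypothetical files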
Figure 3.4. Face features and composite target faces created for the replication
of the part-whole identification experiment in Tanaka & Farah (1993).
Results
On average, subjects’ identification accuracy was markedly worse for the
isolated parts (mean accuracy 55%), compared to the 73% accuracy for
identifying parts embedded in a face configuration: t(10) = 4.2, p = 0.002, even
though the face backgrounds were identical for the target and foil. Fig. 3.5a
shows identification performance for each of the three features (eyes, nose and
mouth), and for each target-foil combination of the exemplars, such as eyes1 vs.
eyes2 (indicated by the bars e1e2), with the feature either in isolation, or in the
identical face context, respectively. All nine combinations showed an advantage
for identification of face parts in context over parts in isolation, confirmed by
a paired t-test: t(8) = 4.7, p < 0.002. Was this perceptual effect (Fig. 3.5a)
reflected in the image-based similarity analysis (Fig. 3.5b)? Yes. The Euclidean
distance between the Gabor feature vectors of the target and foil was also larger
for the part in face context than for the part in isolation, for every feature
and exemplar: t(8) = 4.3, p < 0.003.
Figure 3.5. Behavioral results and stimulus similarity analysis in Exp.1. a) The
accuracy in identification of face features was higher when they were placed in
face configurations than when they were presented in isolation. Each pair of bars
indicates the specific exemplar pairing for eyes, nose, and mouth; e.g., n1n3
indicates discrimination of nose 1 from nose 3, with and without a contextual
face background, respectively. b) Image dissimilarity in the Gabor Euclidean
distance metric, reflecting higher similarity between face features when
presented in isolation than when placed in a contextual face background.
Since the composite faces were made by placing different feature
exemplars in the same spatial configuration, a pixel-intensity based
representation would yield exactly the same distances between parts as between
whole faces. Why, then, did the Euclidean distance of the Gabor representation
yield the advantage of the composite over the isolated parts? We suggest that
the advantage of the composite arises from the overlap of the receptive fields
of face neurons, as illustrated in Fig. 3.3. The interaction between local
features, such as those arising from the eyes and nose, and the contextual face
background creates additional visual features, which are picked up by the
kernels covering those areas, especially those with larger receptive fields. To
test this hypothesis,
we conducted an image-similarity analysis using the highest-SF (32 cycles/face),
smallest-RF components and the lowest-SF (8 cycles/face), largest-RF components
of the Gabor features of each face, separately. The results, as illustrated in
Fig. 3.6, showed that the mean distances between the isolated parts were
significantly lower than those between composite faces for both components (both
t(8) > 1, p < 0.01). However, the mean distance difference between the part and
the whole (e.g., the distance between isolated mouth 1 and mouth 2, minus the
distance between composite faces containing mouth 1 and mouth 2) was 19 for the
high-SF, small-RF components and 56 for the low-SF, large-RF components, a sharp
contrast confirmed by a post-hoc t-test: t(8) = 3.6, p < 0.01. Identification
accuracy was better correlated with the distance measure in the low-SF, large-RF
band (r = 0.66, p < 0.003) than in the high-SF, small-RF band (r = 0.51,
p < 0.03). This result suggests that the configural effect could be mediated to
a larger extent by the neural encoding of low spatial frequencies, or larger
receptive field sizes, or a combination of the two factors.
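The band-wise analysis can be sketched by restricting the Euclidean distance to one spatial-frequency band of the jet coefficient vector, reusing the hypothetical gabor_jet_vector() sketched earlier; the coefficient ordering and band indexing below are assumptions of that sketch.

    import numpy as np
    from scipy.stats import pearsonr

    def band_distance(img_a, img_b, band, n_bands=5, n_orient=8,
                      n_phase=2, n_jets=100):
        # Euclidean distance restricted to one SF band. band = 0 selects
        # the lowest-SF / largest-RF kernels; band = n_bands - 1 the
        # highest-SF / smallest-RF kernels (frequency-major ordering).
        g, f = gabor_jet_vector(img_a), gabor_jet_vector(img_b)
        per_band = n_orient * n_phase * n_jets
        sl = slice(band * per_band, (band + 1) * per_band)
        return float(np.linalg.norm(g[sl] - f[sl]))

    # Per target-foil pair, correlate behavioral accuracy with band-limited
    # distances, e.g. (hypothetical 'pairs' and 'accuracies' arrays):
    # r_low, p_low = pearsonr(accuracies,
    #                         [band_distance(t, f, 0) for t, f in pairs])
    # r_high, p_high = pearsonr(accuracies,
    #                           [band_distance(t, f, 4) for t, f in pairs])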
Figure 3.6. The low frequency + large RF components provide a better account of
the configural effect than the high frequency + small RF components in the
Gabor-jet representation of faces.
Experiment 2. Testing the configural effect with low-pass and high-pass
filtered images
Although a V1-type representation necessarily links RF size and SF, with
larger RFs associated with lower SFs, subsequent face- (and object-) selective
areas need not manifest this linkage. In fact, the general observation is that
later ventral-pathway areas are associated with large RFs that encompass all
SFs. Our analyses of the configural effect in Exp. 1 showed that it was
primarily accounted for by low SF coupled with large RFs rather than high SF
coupled with small RFs. By varying SF independently of RF size, Exp. 2 was
designed to investigate whether the configural effect is a consequence of larger
RFs, independent of SF.
Subjects
Seven students from the University of Southern California participated (mean
age = 25.3 ± 4.8 years, 1 female). All subjects reported normal or corrected-to-
normal vision, and normal face recognition ability.
Stimuli
The same set of stimuli as in Exp. 1 was filtered by two 2D Gaussian filters.
The cutoff frequency was set at 8 cycles per face (cpf) for the low-pass filter
and 32 cpf for the high-pass filter. The two filters therefore had negligible
overlap in the frequency domain, as shown in Fig. 3.7a. The original images were
Fourier transformed into the frequency domain, multiplied by the corresponding
filter, and inverse transformed back into the image domain. Finally, the pixel
intensities of each filtered image were standardized to match the mean luminance
and RMS contrast of the corresponding original image, as shown in Fig. 3.7b. The
experimental procedure was identical to that in Exp. 1: subjects learned 6
individual faces and performed the 2AFC task given a pair of composite faces or
isolated face parts, high-passed and low-passed in separate runs. The order of
high-pass and low-pass runs was counterbalanced across subjects.
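A sketch of this filtering pipeline in Python (NumPy) follows. Whether the stated cutoff corresponds to the Gaussian's standard deviation or to some other criterion is not specified above, so the parameterization below is an assumption.

    import numpy as np

    def gaussian_filter_2d(shape, cutoff_cpf, highpass=False):
        # 2D Gaussian filter in the frequency domain; 'cutoff' is taken
        # here as the Gaussian's standard deviation in cycles/image.
        h, w = shape
        fy = np.fft.fftfreq(h)[:, None] * h    # vertical freq, cycles/image
        fx = np.fft.fftfreq(w)[None, :] * w    # horizontal freq
        lowpass = np.exp(-(fx ** 2 + fy ** 2) / (2.0 * cutoff_cpf ** 2))
        return 1.0 - lowpass if highpass else lowpass

    def filter_and_standardize(img, cutoff_cpf, highpass=False):
        # Filter in the frequency domain, then restore the original
        # image's mean luminance and RMS contrast.
        filt = gaussian_filter_2d(img.shape, cutoff_cpf, highpass)
        out = np.real(np.fft.ifft2(np.fft.fft2(img) * filt))
        out = (out - out.mean()) / out.std()   # zero mean, unit RMS
        return out * img.std() + img.mean()    # match original statistics

    # low  = filter_and_standardize(face, 8.0)                  # 8 cpf low-pass
    # high = filter_and_standardize(face, 32.0, highpass=True)  # 32 cpf high-pass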
Figure 3.7. Spatial filtering of the part and whole face stimuli. a) The low pass
and high-pass 2D Gaussian filters in the frequency domain, with cutoffs of 8 cpf
and 32 cpf, respectively. The 1D profiles of the high-pass (blue) and low-pass
(green) filters, and their product (red), are shown in the rightmost plot,
indicating minimal overlap between the two frequency channels. b) Exemplar face
part and whole composite face before and after spatial filtering.
Results
The configural effect, defined as the performance difference between
identifying composite face and isolated face parts, was evident in both high and
low passed components, as shown in Fig. 3.8. For the high passed stimuli, the
mean identification accuracy for parts was 61.5% and for the composite faces
was 75.0%. For the low passed stimuli, the mean identification accuracy for
parts was 63.5% and for the composite faces was 75.0%. A repeated-measures
two-way ANOVA (Spatial Filtering: high-pass vs. low-pass X Stimulus Type:
composite face vs. isolated part) revealed a significant main effect of Stimulus
Type, F(1,6) = 10.4, p = 0.018, with mean identification accuracies of 62.5% for
parts vs. 75% for whole faces. However, the main effect of spatial frequency
filtering was not significant, F(1,6) < 1, with mean accuracies of 68.3% for
high-pass vs. 69.1% for low-pass filtered images. Most important, the configural
effect was not modulated by spatial frequency, as revealed by the lack of a
significant interaction between stimulus type and spatial frequency, F(1,6) < 1.
Figure 3.8. Configural effect examined in spatial frequency filtered faces.
The results of Exp. 1 showed that the configural effect was better
accounted for by the image components of low SF coupled with large RFs than by
those of high SF coupled with small RFs. To further tease apart the
contributions of SF and RF size to the configural effect, we manipulated SF
independently of RF size. The results showed that the advantage in identifying
the composite whole face over isolated face parts was independent of the spatial
frequency spectrum. Therefore, the configural effect is a function of receptive
field size rather than spatial frequency band.
Goffaux et al. (2005) reported that the configural effect relied more on the
low-spatial frequency component by testing subjects in a 2AFC match-to-sample
task while using spatial filtering procedures similar to those in our Exp. 2.
However, in their report the configural effect was qualitatively defined as the
difference between “featural” and “configural” processing, where the target and
foil differed in the shape of individual features or in the distances between
features, respectively. Importantly, the face images were further smoothed after
the spatial filtering, which rendered the difference between target and foil
extremely subtle when the differential cue was “featural” compared with the
“configural” condition. In our Exp. 2, the contrast of identifying learned
individuals based on parts or on the whole face provided a more direct test of
the configural effect. Taking the results of Exps. 1 and 2 together, we propose
that the configural effect is largely a function of the overlapping encoding of
multiple face features, rather than of the information carried in low spatial
frequency components.
Experiment 3. Testing the configural effect in prosopagnosic MJH
In 1972 (40 years prior to the time of testing) at age five, as the result of a
fall from an 8 ft. high ledge, MJH suffered extensive bilateral lesions (greater in
the right hemisphere) to his ventral occipito-temporal cortices, with extensive
lesions in areas that would normally encompass the Fusiform Face Area (FFA,
Kanwisher, McDermott, & Chun, 1997) and the occipital face area (OFA,
Gauthier et al., 2000). Anatomical inspection revealed no lesions in his superior
temporal sulci.
Although there was a period of time immediately following his accident
when he reported being completely blind, he regained close-to-normal vision,
and currently exhibits normal contrast sensitivity as assessed with the Pelli-
Robson Contrast Chart (Pelli et al., 1988) although he (inconsistently, over a
number of years, in ophthalmic perimetry testing) sometimes presents some
lower visual field loss in the periphery, particularly in the right visual field. On the
Boston Naming Task (Kaplan et al., 1983) his performance is in the normal range
(actually slightly above average with 47/50 correct) in identifying objects
(Michelon & Biederman, 2003). To casual and non-rigorous examination, as well
as subjective report, he is normal, or near normal, in his detection of faces.
He can readily individuate a person on the basis of voice and shows
normal, if not superior, memory for names and biographical details of the people
he encounters. However, he shows pronounced impairment in identifying people
from visual input; he can recognize neither his own face in the mirror nor those
of his close family members. He is much worse than controls on standard tests such as
the Benton Face Recognition Test: 35/54 (Benton et al., 1983), the Cambridge
Face Memory Test 42/58 (Duchaine et al., 2006), and a match-to-sample test in
which an identical matching face is paired with a distracter face differing in
identity (Yue et al., 2012). Mangini et al. (2004) reported that on a test
administered in 1999 he was at chance (controls were perfect) in selecting a
celebrity (e.g., Bill Clinton) from a non-celebrity in pairs of faces, all of whom
were highly familiar to him. In 2012, he was still at chance in individuating faces
of celebrities in a similar choice test (again, controls were perfect). He is in the
normal range in discriminating expression and sex (Mangini et al., 2004) and
reported normal mental imagery of faces that he has previously encountered
(Michelon & Biederman, 2003). However, he does complain that all faces (within
broad categories of age, sex, race, etc.) look the same. An fMRI scan using
blocked presentation of faces, objects, and scrambled pictures did not reveal any
activation in the face versus non-face contrast, even with a liberal threshold
(details reported in Xu & Biederman, 2013).
Given his severe deficits in face identification, one might speculate that he
would not manifest a configural effect when identifying faces. Ideally, we would
test him with the paradigm of experiment 1. However, despite extensive training
and effort, MJH’s deficits in face identification prevented him from learning a set
of target faces and associating each with its name. We adopted a variation of the
task invented by Farah (1995), in which a single face was presented for study,
followed by a blank interval, followed by a second presentation of a pair of faces,
one target and one foil that differed only in one feature. The subject's task was to
choose the target face by key press. There were two different conditions for the
presentation of the first face: Either a) presented intact or b) decomposed into
four separate frames containing the head, eyes, nose and mouth (in their proper
relative spatial position within each frame). Subjects controlled the
presentation of the frames by pressing the space bar. The faces in the second presentation were
always intact, so that the two conditions can be called 'parts-to-whole' and
'whole-to-whole'.
Eight controls (Mean age = 24.6, 2 female) were recruited from the
University of Southern California community, all with normal or
corrected-to-normal vision and self-reported normal face recognition ability.
Results
Controls were better at matching the target face when the sample was an
intact face (mean accuracy 87%) than when the sample was composed of a serial
presentation of individual features (head, eyes, nose, and mouth) in their
normal spatial locations (mean accuracy 79%). The difference was significant in
a paired t-test, t(7) = 3.4, p = 0.01, suggesting that the “whole is more than
the sum of its parts” in normal face recognition and face memory. In contrast,
MJH’s accuracy was close to chance in both the whole-to-whole condition (56%
accuracy) and the part-to-whole condition (54% accuracy). If we quantify the
configural effect as the accuracy difference between the former and latter
conditions, the 95% confidence interval for control subjects was [2.3%, 13.3%],
while the difference for MJH was 2.1%, indicating a greatly diminished
configural effect in his perception of faces, if any configural effect is
present at all.
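The interval reported above can be computed as sketched below (Python/SciPy); the function and its inputs are illustrative, with per-subject accuracies supplied as arrays.

    import numpy as np
    from scipy import stats

    def configural_effect_ci(whole_acc, parts_acc, conf=0.95):
        # Confidence interval for the mean configural effect
        # (whole-to-whole minus parts-to-whole accuracy) across control
        # subjects, using the t distribution on per-subject differences.
        diff = np.asarray(whole_acc, float) - np.asarray(parts_acc, float)
        n = diff.size
        sem = diff.std(ddof=1) / np.sqrt(n)
        margin = stats.t.ppf(0.5 + conf / 2.0, df=n - 1) * sem
        return diff.mean() - margin, diff.mean() + margin

    # A single case whose difference score (e.g., MJH's 2.1%) falls below
    # the controls' interval suggests a diminished configural effect; a
    # modified single-case t-test would give a more formal comparison.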
Discussion
We replicated the part-whole configural effect reported by Tanaka and
Farah (1993): subjects identified face features better when they were placed in a
whole face context than in isolation. Critically, the contextual face background
(other than the distinguishing part) was identical for the target and foil composite
face, and thus not informative by itself for identification. Therefore, the contextual
benefit had to arise from the “holistic” processing of all face features; more
specifically, it had to be attributed to the interaction of the parts with the whole
face. To the best of our knowledge, no previous research has addressed or
proposed a computational account of this phenomenon. This advantage of the
whole over the parts could be derived from the image similarity analysis using the
Gabor feature output and Euclidean distance metric. The distances between
isolated face features were smaller than the distances between the whole face
composites where each feature was embedded in the identical configuration
context. Therefore, it is perceptually harder to discriminate isolated features
given their proximity in the Gabor feature space relative to the composite whole
face.
The current account of face configuration effects as arising from
overlapping receptive fields in larger spatial kernels challenges the
characterization of face 2D inversion costs as a “configural effect.” Given our
extensive visual experience of upright, intact faces lit from above during
development, the prototype representation of faces (e.g., the norm-based face
representation proposed by Leopold et al., 2006) could also be constructed from
a Gabor representation that conforms to those regularities. When input face
images are incompatible with the prototype, there will be a cost in recognition.
For example, when the luminance contrast is reversed, as with faces depicted in
photographic negatives, or when the lighting direction is abnormal, recognition
suffers (Nederhouser et al., 2007). Similarly, inverting a face greatly degrades
its identifiability (Yin, 1969). These operations produce mismatches in 2D
coordinate space between stored representations and, for example, an inverted
image. But such costs should not be characterized as “configural.” In general,
the recognition of non-face objects is based on parts defined by distinctive
nonaccidental properties (NAPs) and on a structural description of the relations
among the parts (Biederman, 1987). In contrast, the individuation of a face is
based on a collection of its entire surface properties, characterized by
overlapping RFs with Gabor-kernel-type tuning, which give rise to the ineffable
and configural representation of faces (Table 3.1).
Table 3.1. A summary of behavioral differences between face and object
recognition, adapted and modified from Biederman & Kalocsai (1997).
Sensitivity               Face                        Object
Contrast Polarity         Yes                         No
Illumination Direction    Yes                         No
Rotation in Depth         Yes                         No, within parts aspect (~60˚)
Rotation in Plane         Yes                         Slightly
Configural Effect         Yes                         No
Effability                No                          Yes
Basis of Expertise        Configural representation   Distinctive NAPs and structural description
The rectangular grid array of jets can be employed when faces are
normalized with respect to position and size, but typically they are not. Also,
there was no face knowledge in that version of the Gabor-jet model. Fig. 3.9
depicts an advanced version of the Gabor-jet model, proposed by Wiskott et al.
(1997) to address these problems. In the training stage, they hand-positioned 70
individual jets over a number of face landmarks for 70 people and stored the
activation vectors for each jet at each landmark (or fiducial point) in a bank.
When a new face is encountered, the jets automatically position themselves onto
their corresponding fiducial points. Essentially, each jet searches for a good
match for its fiducial point among the 70 hand-positioned jets at that point.
The matching is constrained by an “elastic bunch graph” in which an acceptable
jet must be consistent with its neighbors. Remarkably, with only 70 faces in the
calibration sample, an input face could be almost perfectly reconstructed using
just the best-fitting jet (pattern) for each fiducial point in the Gabor-jet
bank. Face recognition can be achieved by comparing and matching the Gabor-jet
activation pattern on the graph of fiducial points of the input face with that
of a known face (Müller et al., 2012). The fiducial point model also allows the
recognition of familiar faces undergoing partial occlusion, such as occurs with
eye patches and hair, or moderate rotation in depth when part of the face
becomes occluded. It is possible that if the face image is distorted or
partially occluded, the corresponding Gabor representation of the corrupted part
of the face can be suppressed or discarded, essentially allowing a selective
attention operation over a face template.
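A greatly simplified sketch of the matching step is given below. It captures only the two ingredients named above, jet similarity and an elastic penalty on graph distortion, and omits the phase-based displacement estimation and bunch structure of the full Wiskott et al. (1997) algorithm; all names and the cost weighting are illustrative.

    import numpy as np

    def jet_similarity(g, f):
        # Normalized correlation between two jet coefficient vectors.
        return float(np.dot(g, f) / (np.linalg.norm(g) * np.linalg.norm(f)))

    def best_bank_jet(input_jet, bank_jets):
        # For one fiducial point, return the index and similarity of the
        # stored jet (over all training faces) that best matches the input.
        sims = [jet_similarity(input_jet, b) for b in bank_jets]
        best = int(np.argmax(sims))
        return best, sims[best]

    def graph_match_cost(jet_sims, edges, positions, model_positions, lam=0.5):
        # Total matching cost: summed jet similarity minus an elastic
        # penalty for graph edges whose lengths deviate from the model.
        elastic = sum(
            (np.linalg.norm(positions[i] - positions[j])
             - np.linalg.norm(model_positions[i] - model_positions[j])) ** 2
            for i, j in edges)
        return sum(jet_sims) - lam * elastic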
Figure 3.9. Gabor-jet models with fiducial points. a) Each input training face
received a manual placement of Gabor-jets in the graph of fiducial points. b)
Activation pattern of each jet at each fiducial point constituted a bank, based on
70 individual faces. c) A new face could be reconstructed using the best fitting
jets in the bank for each corresponding fiducial point. Figure adapted from
Wiskott et al. (1997).
Another line of research on configural effects has focused on the effect
of alignment of face parts. It was first reported by Young, Hellawell, & Hay
(1987) as the “composite face effect”: identification of the upper half of a
composite face, e.g., the upper half of Marilyn Monroe’s face with the bottom
half of Margaret Thatcher’s face, was much harder when the two halves were
aligned. When they were misaligned, so that the bottom half was offset from the
upper half, the identification of each half was greatly facilitated (Fig.
3.10a). A similar effect has been demonstrated in a discrimination task (Hole,
1994): subjects were slower, and made more errors, in recognizing that the two
top halves of two faces were identical when those halves were aligned with the
bottom halves of different persons. In comparison, the contextual disruption was
largely eliminated when there was a large gap between the upper and bottom
halves, or when the two halves were vertically misaligned, as shown in Fig.
3.10b (Le Grand et al., 2006).
Figure 3.10. Composite face effects. a) The identities of the halves of a
composite face are harder to identify when the top and bottom halves are aligned
than when misaligned. b) The identical upper half of a face is perceived as
belonging to a different person when fused with the lower half of another
person. From Le Grand et al. (2006).
Both of these composite phenomena can be explained by the fiducial points
model: When the upper and lower halves of faces of different people are fused
together with good alignment, in conformity with the spatial configuration of a
face, the new face creates a totally new Gabor-feature representation, which
seamlessly blends the upper and lower halves. Critically, there is nothing in
the representation that signals that any of the jets are in an inappropriate
position relative to the other jets, rendering it difficult to suppress or
discount the jets from either half. Essentially, without a mismatch from the
bunch graph, there is nothing to cue a selective attention operation that would
suppress either half. When the two halves are misaligned, the spatial
constraints in the bunch graph from each half can signal that the other half is
in an inappropriate position, allowing suppression of each half. It is important
to note that the inability to apply selective attention over the bunch graph
arises from the acceptability of the face representation itself (when the halves
are aligned) in terms of its array of jets, rather than from a failure of a
general selective attention mechanism, e.g., a “spotlight,” to be applied over
the image.
Another example of when selective attention effects can or cannot be
witnessed in face individuation stems from Cooper & Wojan’s (2000) report that
raising both eyes in the head renders celebrity recognition more difficult than
raising only one eye, despite the more grotesque appearance of the latter. Just
as in the face composite effect, the asymmetry induced by raising only one eye
provides a basis for the jets consistent with the eye remaining in its original
position to suppress the contribution of the out-of-position jets. When both
eyes are raised, all the jets are in acceptable positions (as they are with
aligned composite faces), so there is no basis to cue a jet suppression
mechanism.
The Gabor representation is also more parsimonious and computationally
economical than the alternatives. For example, researchers have proposed coding
schemes such as shape descriptions of individual features, or coding of
second-order relations, i.e., the spatial distances among features (Leder et
al., 2001; Barton et al., 2001). However, these two dimensions are not strictly
independent of each other: a change of facial features would inevitably affect
the spacing between features, and vice versa. The Gabor representation, in
contrast, captures changes in the face, such as individual feature shapes,
feature distances, and many more ineffable aspects of the face, perhaps with a
certain degree of redundancy.
Finally, MJH’s test results indicated a diminished configural effect in
his face perception, consistent with previous case studies of acquired
prosopagnosics with unilateral or bilateral ventral temporal lobe lesions that
encompass FFA and/or OFA (Farah, 1996; Busigny et al., 2010; Ramon, Busigny, &
Rossion, 2010). Recent fMRI studies have also revealed a high degree of
functional connectivity, reflected in the correlation of the BOLD activation
time courses between FFA and early visual areas (Yue et al., 2011). Furthermore,
the strength of the correlation between FFA and OFA was positively correlated
with face recognition performance, including matching of part or whole faces
(Zhu et al., 2012). Therefore, it is possible that these face-selective areas
are critical in receiving the Gabor representation of the face input, presumably
from early visual areas, and integrating it into a holistic representation.
Conclusion
The configural processing of faces, but not of objects, has been well
replicated and documented in previous research. For the first time, we provide a
computational account based on a biologically plausible model of the multiscale,
multiorientation tuning characteristic of neurons in early visual areas.
Specifically, the relative ease of discriminating face features in their
configural context rather than in isolation can be explained by the larger
distance between whole composite faces, compared with that between isolated face
parts, in Gabor feature space. This integrative processing is diminished in the
prosopagnosic MJH, whose lesions in the ventral temporal cortex may have
impaired the Gabor-type representation of faces, thus leading to his severe
deficits in face recognition. Most importantly, it is the overlapping receptive
fields of large kernels, rather than low spatial frequency content per se, that
contribute to the integrative and holistic coding of faces.
Chapter 4. Neural representation of face attributes
Corresponding publication:
Xu, X., Yue, X., Lescroart, M.D., Biederman, I., & Kim, J.G. (2009). Adaptation in
the fusiform face area (FFA): image or person. Vision Research, 49, 2800-2807.
doi:10.1016/j.visres.2009.08.021
Xu, X., & Biederman, I. (2010). Loci of the release from fMRI adaptation for
changes in facial expression, identity and viewpoint. Journal of Vision, 10(14):36,
1-13. doi:10.1167/10.14.36.
Abstract
Face recognition involves collaboration of a distributed network of neural
correlates. However, how different attributes of faces are represented has
remained unclear. We used functional magnetic resonance imaging-adaptation
(fMRIa) to investigate the representation of viewpoint, expression, and identity of
faces in the fusiform face area (FFA) and the occipital face area (OFA). In an
event-related experiment, subjects viewed sequences of two faces and judged
whether they depicted the same person. The images could vary in viewpoint,
expression and/or identity. Critically, the physical similarities between
view-changed and between expression-changed faces of the same person were
matched by the Gabor-jet metric, a measure that predicts almost perfectly the
effects of image similarity on face discrimination performance. In FFA, changes
of identity produced the largest release from adaptation followed by changes of
expression; but the release caused by changes of viewpoint was smaller and not
reliable. OFA was sensitive only to changes in identity, even when image
changes produced by identity variations were matched to those of expression
and orientation. These results suggest that FFA is involved in the perception of
both identity and expression of faces, a result contrary to the hypothesis of
independent processing of changeable and invariant attributes of faces in the
face-processing network.
Introduction
A difficulty in interpreting prior research on the neural representation
of face attributes (especially research with an adaptation design) is that the
physical similarities between stimuli for the different classes of image change
are often unspecified. How does one render the physical change produced by, say,
a change in expression equal to that produced by a change in person? Without
such scaling, one does not know whether the release of an adapted BOLD response
is an effect of high-level properties, such as identity or expressed emotion, or
merely of the physical changes, per se, that would be produced by any change in
the image. Fox et al. (2008) used the contrast thresholds of an ideal observer
to scale stimulus similarity for different face attributes (i.e., different
identity vs. different expression). Although ideal observer scaling does offer
some control over low-level stimulus factors, it assumes a pixel-based
representation that might be more characteristic of retinal and lateral
geniculate bases of similarity than of early cortical stages. The Gabor-jet
model (Lades et al., 1993; Biederman & Kalocsai, 1997) goes beyond a pixel
representation in that it captures the essential characteristics of the
multiscale, multiorientation nature of V1 filtering. This scaling
model is justified by its extremely high prediction of the psychophysical similarity
of face discrimination (Yue et al, 2010a). We used the Gabor-jet model to equate
the extent of the image changes produced by changes in expression,
individuation, and rotation in depth. The stimulus scaling allowed us to
investigate the loci sensitive to different aspects of faces, namely, of viewpoint
and identity, without possible confounds of low-level image similarity in an event-
related fMRI adaptation paradigm.
Methods and Results
Experiment 1: Sensitivity of face-selective areas to Identity and Viewpoint
of faces
To test the sensitivity of each face-selective area in the human brain to
facial identity and viewpoint, we generated artificial faces varying in both
local features (the shapes of the eyes, nose, and mouth) and their spatial
configuration. Each face pair could differ in identity (same, different),
viewpoint (same, different), or both. Critically, the within-pair similarities
were scaled using the Gabor-jet model (Lades et al., 1993).
In this model, each pair of stimuli was filtered by a 10 x 10 grid of jets.
Each jet was composed of 40 Gabor kernels (each the product of a sinusoid and a
2D Gaussian envelope) of 8 equally spaced orientations (i.e., 22.5° differences
in angle) and 5 spatial scales, each centered on its jet’s grid point. The
coefficients of the kernels (the magnitude corresponding to an activation value
for a complex cell) within each jet were then concatenated into a 4000-element
(100 jets x 40 kernels) vector G: [g1, g2, ..., g4000]. For any pair of pictures
with corresponding jet coefficient vectors G and F, the similarity of the pair
was defined as:
$\mathrm{Sim}(G, F) = \dfrac{\sum_{i=1}^{4000} g_i f_i}{\sqrt{\sum_{i=1}^{4000} g_i^2 \sum_{i=1}^{4000} f_i^2}}$
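In code, this similarity is simply the normalized inner product (cosine) of the two coefficient vectors; the following one-function sketch (NumPy) implements the formula above.

    import numpy as np

    def gabor_similarity(g, f):
        # Sim(G, F): normalized inner product between the two 4000-element
        # jet coefficient vectors, as defined above.
        g = np.asarray(g, float).ravel()
        f = np.asarray(f, float).ravel()
        return float(np.dot(g, f) / np.sqrt(np.dot(g, g) * np.dot(f, f)))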
Figure 4.1. The scaling of stimuli in the between-face similarity experiment. a)
The kernels included in each jet, b) an illustration of how similarity was computed
by correlating the corresponding kernels (scale and orientation) in corresponding
jets (at each node in the grid), and c) the mean similarity between pairs of faces
in each of the four conditions of Identity (same-different) and View (same-
different).
Subjects were instructed to judge whether the images on a given trial were
the same or different in size. By having the task independent of the variables of
interest (pose and person), we reduced the likelihood that attentional strategies
tuned to individuation or pose would modulate the BOLD response.
We analyzed the BOLD signal in a series of independently defined face- and
object-selective regions of interest (ROIs), namely the right fusiform face area
(rFFA), the right superior temporal sulcus (rSTS), and, bilaterally, the lateral
occipital area (LOC). (We report results only from the right STS and FFA because
these were consistently localized in all subjects; bilateral LOC activation was
present in everyone.) The localizer for these areas used static images of faces
presented in blocks of 12 s at a rate of 2 Hz. A deconvolution analysis of the
data revealed the time course of the ROI responses to the different conditions:
faces that were identical, or that differed in Identity, Viewpoint, or both.
Results
A three-way ANOVA of percent BOLD signal change in response to
Identity (same-different) X View (same-different) X Size (same-different),
performed separately for each of the three ROIs (rFFA, LOC, and rSTS),
revealed neither a main effect of Size, nor any interaction of Size with Identity
and/or Viewpoint, all Fs <1. We therefore collapsed the size-variation across the
Identity/Viewpoint conditions and ran a repeated measures 2 (Identity) X 2 (View)
ANOVA for each ROI.
In the right FFA of every subject, a change in either identity or
viewpoint alone produced a greater BOLD response than the response to identical
faces, yielding significant main effects of both Identity, F(1,16) = 7.6, p =
0.01, and Viewpoint, F(1,16) = 4.2, p = 0.05, but no reliable interaction between the two
factors F(1,16) = 1.4, p > 0.25. A post hoc paired t-test showed that the BOLD
response in the Identical-Faces condition was significantly smaller (p < .05) than
each of the other three conditions. However, the release of adaptation for the
three non-identical conditions did not differ from each other (all paired t-tests t<1).
In particular, a change of person did not produce a greater release from
adaptation than a change in viewpoint.
In the right superior temporal sulcus (rSTS) of the subjects in whom we
localized this ROI, a change in identity or viewpoint alone elicited a magnitude
of activation equivalent to that of the identical-image condition, but the
double-change condition showed a larger BOLD release than the other three
conditions. For these subjects there was no significant main effect of either
Identity or Viewpoint, Fs(1,9) < 1, and the interaction fell short of
significance, F(1,9) = 1.6, p > 0.2.
In LOC, which is not face-selective and served as a control region, a
similar ANOVA revealed no significant effect of either Identity or Viewpoint,
F(1,16) < 1 and F(1,16) = 1.9, p > 0.2, respectively, nor of their interaction,
F(1,16) < 1, in both hemispheres. The locations of the ROIs and the mean BOLD
time courses for the different conditions are shown in Fig. 4.2.
Implication
What might be the role of FFA in the face-processing system, given our
result that sensitivity to identity changes in this area is not any greater than that
for pose? It is possible that FFA is primarily serving as a face vs. non-face gate,
passing on the image information relevant to individuation to a later area where
individuation is made explicit. This information might be the spatial frequency and
orientation content as suggested by Yue, Tjan, & Biederman (2006) or it could be
the fragments suggested by Nestor, Vettel and Tarr (2008). Given that so much
of the image variation required for individuation of faces is subtle, it may be best
to restrict the inputs to these later face areas so that the only inputs that affect
connection weights of these face individuation networks are faces. The area
where individuation would actually be accomplished might be expected to be
closer to associative cortex (Kriegeskorte et al, 2007, Nestor et al., 2011), where
units coding perceptual individuation could be linked to associative knowledge
about the person, such as his or her profession, nationality, and name.
Figure 4.2. Face-selective regions of interest and BOLD signals for the
different conditions.
Experiment 2: Sensitivity of face-selective areas to expression and
viewpoint of faces
Given the multiplicity of attributes elicited by a face, do they engage
common cortical loci? Do, for example, variations in facial expression produce
the same pattern of activation as variations in pose or identity? We addressed
this question using fMRI adaptation, in a design similar to that employed in the
previous experiment. Again, we generated all stimulus faces with the FaceGen
software, with face pairs of the same person that were identical, or that varied
in expression (ΔE), viewpoint (ΔV), or both (ΔE+ΔV). In addition, a
triple-change condition was designed to include a change in identity as well
(ΔE+ΔV+ΔI) that could be compared to the ΔE+ΔV condition. The stimuli in each
condition were varied such that the mean similarity, as calculated by the
Gabor-jet model, for expression changes was comparable to that for viewpoint
changes. We also split the triple-change condition into two halves, such that
the more-similar half of ΔE+ΔV+ΔI, where there was an identity change, was
comparable to the double-change condition, ΔE+ΔV, as shown in Fig. 4.3.
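The split can be implemented as a median split on the per-pair Gabor-jet similarities, as in the following sketch (illustrative names; similarities computed with the Sim measure above).

    import numpy as np

    def split_by_similarity(pair_indices, similarities):
        # Median-split trials (e.g., the dE+dV+dI condition) by Gabor-jet
        # similarity; the more-similar half can then be compared against
        # the dE+dV condition at matched image similarity.
        order = np.argsort(similarities)        # ascending similarity
        half = len(order) // 2
        less_similar = [pair_indices[i] for i in order[:half]]
        more_similar = [pair_indices[i] for i in order[half:]]
        return less_similar, more_similar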
Figure 4.3. Image similarity scaling of all conditions in Exp. 2. The mean
similarity for expression changes (ΔE) was comparable to that for viewpoint
changes (ΔV). The triple-change condition was split into two halves, such that
the more-similar half of ΔE+ΔV+ΔI, where there was an identity change, was
comparable to the double-change condition ΔE+ΔV.
The traditional face-selective localizer using static images of faces and
objects did not consistently localize the STS and OFA areas across subjects. We
instead employed video clips of faces and objects, which have been shown to
produce more reliable localization of these areas (Fox, Iaria, & Barton, 2008;
Schultz & Pilz, 2009; Trautmann, Fehr, & Herrmann, 2009). The face videos
depicted changes in natural human facial expressions from neutral to happy or
sad, while the corresponding dynamic object videos showed changes of state, such
as a spraying fountain or a spinning wheel (for details see Fox et al., 2009a).
The regions of interest localized using the dynamic localizer are summarized in
Table 4.1. In the adaptation phase of the experiment, the subjects judged by key
press whether the two faces presented sequentially in each trial depicted the
same or different persons.
ROI      X         Y         Z         t of peak voxel ± SEM   Mean no. of voxels ± SEM   No. of subjects (n = 9 max)
rFFA     35 ± 1    -51 ± 3   -16 ± 1   11.4 ± 1.11             766 ± 122                  8
lFFA    -40 ± 2    -52 ± 3   -16 ± 1   10.4 ± 1.13             499 ± 110                  8
rOFA     30 ± 2    -83 ± 2    -7 ± 2    8.5 ± 1.15             1190 ± 400                 7
lOFA    -36 ± 3    -86 ± 2    -7 ± 4    6.5 ± 0.80             250 ± 84                   7
rSTS     48 ± 2    -42 ± 3     6 ± 2   11.2 ± 1.13             2340 ± 240                 9
lSTS    -55 ± 3    -47 ± 4     9 ± 1    8.1 ± 1.15             1135 ± 223                 9
Table 4.1. The average Talairach coordinates (mean ± standard error of the mean,
SEM) for each of the designated areas, the t-value of the peak voxel in each
ROI, the mean number of voxels in each ROI defined by the dynamic localizer, and
the number of subjects in whom each ROI could be localized. Each voxel extended
1 mm³.
Results
A three-way ANOVA of percent BOLD signal change in response to
Viewpoint (same-different) × Expression (same-different) × Hemisphere (left-right)
performed separately for each of the three ROIs (FFA, OFA, and STS), did not
reveal a reliable main effect of Hemisphere. There was a trend for higher
activation of the right hemisphere in FFA (consistent with previous findings), but
this effect fell short of significance, F(1,7) = 3.25, p = 0.12. None of the
hemispheric differences interacted with viewpoint or expression in any of the
ROIs, all Fs < 1. We therefore collapsed the data from both hemispheres across
the Viewpoint/Expression/Identity conditions and ran a repeated measures
2 (Viewpoint) X 2 (Expression) ANOVA for each ROI.
In FFA, a change of expression produced a significant release from
adaptation relative to identical faces, F(1,7) = 16.35, p < 0.01, while there
was no release from adaptation from a change in viewpoint. In OFA, neither the change of
expression nor the change of viewpoint produced a release from adaptation
compared to identical faces (both Fs < 1). In both OFA and FFA, a change of
identity in the ΔE+ΔV+ΔI condition resulted in an additional release of
adaptation compared to the ΔE+ΔV condition, in which expression and viewpoint changed
without a change in identity. In STS, the deconvolution analysis produced a
generally flat curve (Fig. 4.4), with no differentiation among the conditions.
Figure 4.4. BOLD signal in bilateral FFA, OFA, and STS as function of adaptation
condition.
Discussion
Bruce & Young (1986) and Haxby et al. (2000) proposed that facial
identity and expression are functionally independent and neuronally represented
in distinct brain areas. Our results challenge the idea of (complete) anatomical
separation of the neural representation of facial expression and identity (Calder
et al., 2005). Rather, the fMRI adaptation pattern in the present investigation
suggests that FFA is sensitive to both facial identity and expression (Ganel et al.,
2005), while OFA is sensitive only to an identity change. The lack of sensitivity of
OFA to changes in viewpoint or expression is also not in agreement with
expectations from Haxby et al.’s (2000) claims that OFA would be sensitive to
any shape change of a face. Taken together, our findings suggest a more complex
picture in which the processing of changeable and invariant aspects of faces
might not be clearly separated. Specifically, a representation of face identity
invariant to changes in viewpoint and expression is not made explicit in FFA, as
reflected by the mean activation pattern in that area.
Encoding expression
A fast event-related fMRI adaptation experiment revealed that a change in
facial expression of the same person produced a release of adaptation of the
BOLD signal in FFA when compared to that for identical faces beyond what
would be expected from image dissimilarity alone, as gauged by the much
smaller response to orientation changes. This result was consistent with previous
reports of expression sensitivity in FFA (Fox et al., 2009b; Kadosh et al., 2010),
but was not consistent with Winston et al.’s (2004) finding of a lack of a release
from adaptation in FFA when expression was changed between faces. Instead of
modeling the similarity between faces as covariates of no interest, as did Winston
et al., we matched the similarity of view-changed and expression-changed faces
through the Gabor-jet model (Lades et al, 1993). The Gabor-jet model provides
a principled framework with which to scale image similarity. The justification for
this scaling derives from a match-to-sample task in which the model predicts
error rates and reaction times from the similarity between the target face and the
distracter with exceedingly high accuracy (both rs > .95) (Yue et al., 2012). With
this control for physical similarity, the sensitivity to facial expression was shown
to be much greater than that for a face’s orientation, as reflected in both
behavioral performances and fMRI adaptation in FFA. In a task requiring
detection of a change in identity, as in the present experiment, Kadosh et al.
(2010) and Ganel et al. (2005) both reported that a change of expression
resulted in higher activation in FFA in a block-design fMRI adaptation paradigm,
although image similarity was not controlled in these studies. The present results
clearly showed that FFA is sensitive to both expression and identity, as revealed
by the significant releases from adaptation from changes in these variables
compared to orientation changes, when these three variables had been matched
in similarities.
OFA was speculated to be an early stage in the face-processing hierarchy,
feeding both FFA and STS, which were postulated to represent invariant and
changeable aspects of faces, respectively (Haxby et al., 2000). Consistent with this
theory was the result of fMRI adaptation experiments in which the adaptation in
OFA was released when either expression (Fox et al., 2009b) or identity
(Rotshtein et al., 2005; Fox et al., 2009b) was changed between two faces. Our
experiment, however, did not detect a release of adaptation in OFA as a result of
an expression change. The absence of an expression effect could possibly be
attributed to the individuation task performed by our subjects, which required
attention to face identity rather than expression. A role of attentional modulation
is supported by the finding that greater activation in OFA was found in a task
where subjects made expression judgments, rather than a gender-identification
task (Gorno-Tempini et al., 2001). More research is needed to clarify the
sensitivity to expression in OFA, perhaps by using different tasks with scaling of
stimulus similarity.
Encoding viewpoint
Whereas the sensitivity to expression was evident in FFA, a change of
viewpoint of the same physical magnitude did not produce a reliable release from
adaptation. Previous studies using the fMRI adaptation paradigm have
demonstrated that FFA is sensitive to rotations in depth of unfamiliar faces
as small as 20° (Andrews & Ewbank, 2004; Fang et al., 2007; Ewbank &
Andrews, 2008; Xu et al., 2009; Yue et al., 2010b). It is possible that the lack of
release of adaptation from viewpoint change in the present study was due to the
relatively small rotation angle of 13°, necessitated by the need to equate the
image similarity values for orientation and expression. As shown in Fig. 4.3,
the mean Gabor-jet similarity between same-identity, different-viewpoint faces
was about 0.9, while in our previous study (Xu et al., 2009) it was about 0.8
(as a result of a 20° rotation in depth). Consequently, it is possible that the magnitude of
release from adaptation is nonlinearly related to the degree of rotation (Fang,
Murray, & He, 2007b), with 13º being below the threshold for an effect. A
nonlinearity was evident at the upper end of rotation in the results of Fang et al.
(2007b), in which the increase to 90 ° did not produce a greater BOLD signal in
FFA than 60 °. A second possibility is that there were task differences. In the prior
study (Xu et al., 2009) subjects detected a change of the size of a face; here they
judged identity. It is possible that the identification task biased the attention away
from viewpoint.
We can only speculate as to why we found a flat hemodynamic function
across all conditions in bilateral posterior STS. It is possible that the voxels
selected by our localizer, which contrasted the fMRI responses to dynamic images
of faces with those to dynamic images of objects, are less sensitive to the
discrete static facial stimuli employed in the event-related adaptation runs
(Trautmann et al., 2009; Fox et al., 2009a). Indeed, the face patch discovered
in monkeys’ STS (O’Toole, Roark, & Abdi, 2002) is also sensitive to biological
motion, such as that produced by body, hand, and mouth movements. Our stimulus
presentation procedure, in which the second presentation of a face was always
translated by 0.5º in all conditions, might have obscured any implied motion
effect generated by the rotation of faces. Winston et al. (2004) and Fang
et al. (2007b) did manage to obtain adaptation effects in STS with photographs
so their faces were more realistic than those in the present experiment. However,
Fox et al. (2009b) also used photographs and reported, nonetheless, STS
adaptation of relatively small amplitude.
Encoding identity
The present study found that a change in identity produced an enhanced
BOLD response over that produced by changes in expression and viewpoint in
both FFA and OFA. This response cannot be attributed to physical changes in the
image, since the effect was present even when the Gabor similarity of the
ΔE+ΔV+ΔI high-similarity condition was equivalent to that of the ΔE+ΔV
condition; it was, therefore, an effect attributable to a change in category,
rather than to a physical
image change. In line with this result are prior fMRI adaptation studies that
showed a release of BOLD adaptation in both OFA and FFA when subjects
perceived a change in the identity between two faces (Gilaie-Dotan & Malach,
2007), for example, different degrees of morph between Marilyn Monroe and
Margaret Thatcher (Rotshtein et al., 2005).
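To make the adaptation logic concrete, the following is a minimal Python sketch of how a release-from-adaptation index could be computed per ROI: the mean peak BOLD response to pairs containing a change is compared against the response to identical (repeated) faces. The data layout, condition names, and numbers are hypothetical illustrations, not the actual analysis pipeline.

```python
import numpy as np

def release_from_adaptation(bold_by_condition, baseline='identical'):
    """Release index per condition: mean peak BOLD response for each
    change condition minus the response to identical (repeated) faces.
    A positive index indicates a release from adaptation.

    bold_by_condition: dict mapping condition name -> array of
                       trial-wise peak percent-signal-change values.
    """
    base = np.mean(bold_by_condition[baseline])
    return {cond: np.mean(vals) - base
            for cond, vals in bold_by_condition.items()
            if cond != baseline}

# Hypothetical FFA data (percent signal change per trial):
ffa = {'identical':         np.array([0.42, 0.39, 0.45]),
       'expression_change': np.array([0.55, 0.60, 0.57]),
       'viewpoint_change':  np.array([0.44, 0.41, 0.46]),
       'identity_change':   np.array([0.68, 0.71, 0.66])}
print(release_from_adaptation(ffa))
```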
Patients with lesions in FFA and/or OFA often exhibit deficits in face
individuation. That a lesion to OFA can be sufficient to produce prosopagnosia is
documented with P.S. (Rossion et al., 2003; Schiltz et al., 2006), a patient with
an ablated OFA and an intact FFA, who shows normal performance in detecting
faces but cannot individuate them. Unlike control subjects (or the subjects in the
present study), for this individual an identity change did not produce a release
from adaptation in FFA relative to identical faces. Transcranial magnetic
stimulation (TMS), which temporarily interrupts local neural activity, interferes
with the discrimination of individual faces when applied to OFA (Pitcher et al.,
2007, 2009). Together, these results suggest that an intact network that includes
both OFA and FFA is necessary for normal face individuation (Rossion et al.,
2003).
Conclusion
Bruce & Young (1986) and Haxby et al. (2000) proposed that facial
identity and expression are functionally independent and neuronally represented
in distinct brain areas. Our results challenge the idea of a (complete) anatomical
separation of the neural representations of facial expression and identity (Calder
& Young, 2005). Rather, the fMRI adaptation pattern in the present investigation
suggests that FFA is sensitive to both facial identity and expression, and that
OFA is sensitive only to an identity change. The lack of sensitivity of OFA to
changes in viewpoint or expression is also not in agreement with Haxby et al.’s
(2000) claim that OFA would be sensitive to any shape change of a face. Taken
together, our findings suggest a more complex picture in which the processing of
changeable and invariant aspects of faces might not be clearly separated.
Chapter 5. Summary and speculation
Motivated by humans’ extraordinary efficiency in face recognition, this
dissertation explored in detail the neural correlates of several critical aspects of
face processing and representation. Previous research has established that
regions in the ventral temporal and occipital cortex are selectively activated in
observing and processing face images. Lesions in these areas (as observed in
MJH) typically lead to varying degrees of deficits in face recognition, especially in
identifying faces. However, it remained unclear whether these regions also
contribute to face detection, and how they represent different face attributes,
such as viewpoint, expression, and identity.
Chapter 2 presented two psychophysical experiments in which the
detection thresholds for faces and cars were quantified as the phase-spectrum
signal-to-noise ratio in the frequency domain of the images. In normal subjects,
the threshold for faces was significantly lower than that for cars, consistent with
the high sensitivity to the presence of faces in human visual experience.
Compared to controls, MJH showed a lower tolerance to noise in the phase
spectrum for faces (vs. cars), reflected in his markedly elevated detection
threshold for faces. In a saccadic choice task, the involuntary and fast initial
saccade to a face (vs. vehicle) image was present in normal subjects (Crouzet et
al., 2010) but absent in MJH. MJH’s lesions in bilateral occipito-temporal cortices
thus appear to have produced a deficit not only in face individuation, but also in
face detection.
The representation of faces is said to be configural; that is, facial features
are better recognized and identified when placed in an appropriate whole-face
context (Tanaka & Farah, 1993). But what does “configural” mean in
neurocomputational terms? Chapter 3 presented a biologically plausible account
of the phenomenon based on the Gabor-jet model. The better behavioral
discrimination of two facial part exemplars, e.g., two noses, within a whole-face
context relative to when they are presented alone can be explained by the larger
distance between the two stimuli in the former than in the latter display in a
multi-dimensional Gabor space. In another experiment, the lack of the configural
effect in MJH suggested that FFA and OFA are critically involved in representing
faces in the Gabor-jet format.
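The following is a minimal sketch of a Gabor-jet similarity computation in the spirit of Lades et al. (1993): a lattice of “jets” is placed over the image, each jet collecting the magnitudes of Gabor filter responses at several scales and orientations, and two images are compared by correlating their concatenated jet vectors. The grid size, filter parameters, and the use of magnitude correlation are illustrative assumptions, not the exact values used in the experiments.

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(size, wavelength, theta, sigma):
    """A single complex Gabor kernel (sinusoidal carrier x Gaussian envelope)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    carrier = np.exp(1j * 2 * np.pi * xr / wavelength)
    return envelope * carrier

def gabor_jets(image, grid=10, scales=(4, 8, 16, 32), n_orient=8):
    """Gabor response magnitudes sampled at a grid x grid lattice of jets."""
    h, w = image.shape
    ys = np.linspace(h * 0.1, h * 0.9, grid).astype(int)
    xs = np.linspace(w * 0.1, w * 0.9, grid).astype(int)
    jets = []
    for wavelength in scales:
        for k in range(n_orient):
            kern = gabor_kernel(size=int(4 * wavelength) | 1,
                                wavelength=wavelength,
                                theta=k * np.pi / n_orient,
                                sigma=0.6 * wavelength)
            # Magnitude of the complex response is invariant to local phase.
            response = np.abs(fftconvolve(image, kern, mode='same'))
            jets.append(response[np.ix_(ys, xs)].ravel())
    return np.concatenate(jets)

def jet_similarity(img_a, img_b):
    """Pearson correlation of the two jet vectors (1.0 = identical)."""
    a, b = gabor_jets(img_a), gabor_jets(img_b)
    return float(np.corrcoef(a, b)[0, 1])
```

Because large-scale kernels centered at neighboring lattice points have overlapping receptive fields, a change to one facial part alters many jet coefficients at once, which is one way of cashing out “configural” coding in this representation.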
Finally, we probed the neural representation of facial identity, expression,
and viewpoint in the face-selective cortical ROIs, using an fMRI adaptation
paradigm. Critically, we removed the low-level confound in image similarities by
scaling the image changes across the different attributes with the Gabor-jet
metric. Contrary to theories proposing functionally and neurally separate coding
of facial identity and expression (Bruce & Young, 1986; Haxby et al., 2000), we
found that FFA is sensitive to changes in both attributes. We also found that
OFA is sensitive to a change in identity but not to changes in the other attributes.
This is consistent with the deficits in MJH’s face identification ability, given that
his lesions cover these areas.
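As an illustration of the similarity-scaling step, the sketch below selects, from pools of candidate image pairs, one pair per attribute change such that all selected pairs have approximately the same Gabor-jet similarity. It reuses the jet_similarity function sketched above; the target value, tolerance, and selection rule are assumptions for illustration only.

```python
def select_matched_pairs(pairs_by_attribute, target_similarity=0.90,
                         tolerance=0.02):
    """Pick, for each attribute change (e.g., 'expression', 'viewpoint',
    'identity'), the candidate image pair whose Gabor-jet similarity is
    closest to a common target, rejecting attributes with no pair inside
    the tolerance band.

    pairs_by_attribute: dict mapping attribute name -> list of
                        (image_a, image_b) candidate pairs.
    """
    matched = {}
    for attribute, pairs in pairs_by_attribute.items():
        scored = [(abs(jet_similarity(a, b) - target_similarity), (a, b))
                  for a, b in pairs]
        error, best = min(scored, key=lambda s: s[0])
        if error <= tolerance:
            matched[attribute] = best
    return matched
```

With pairs equated in this way, any differential release from adaptation across conditions cannot be attributed to low-level image dissimilarity.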
To summarize, FFA and OFA in the ventral temporal and occipital lobes
are critically involved in the detection and encoding of faces, and contribute
further to the representation of various face attributes. However, a completely
invariant representation of face identity, theorized as the Person Identity Nodes
(PINs; Bruce & Young, 1986), has to be achieved at a higher stage of the visual
hierarchy beyond FFA, an area showing sensitivity to face variations not only in
identity but also in other attributes. In contrast, recent studies have suggested
the anterior inferior temporal cortex (AIT) as a potential candidate for
representing the abstraction of identity. Using multivariate pattern analysis,
above-chance classification of face identity appears to be invariant to changes in
facial expression (Nestor et al., 2011) and to rotation in depth (Anzellotti, Fairhall,
& Caramazza, 2013). Furthermore, reduced anatomical connectivity between
FFA and the anterior temporal lobe (ATL), measured by diffusion tensor imaging
(DTI), has been found in congenital prosopagnosics (Thomas et al., 2009).
Therefore, we speculate that FFA is situated at a middle stage of face
processing: it receives input from early visual areas, maintains the preliminary
visual information in faces as characterized by the Gabor-jet representation, and
eventually feeds the processed information to AIT for the abstraction and
multi-modal representation of a person’s identity.
References
Adolphs, R., Tranel, D., Damasio, H. & Damasio, A. (1994). Impaired
recognition of emotion in facial expressions following bilateral damage to
the human amygdala. Nature, 372, 669–672.
Andresen, D. R., Vinberg, J., & Grill-Spector, K. (2009). The representation of
object viewpoint in human visual cortex. NeuroImage, 45, 522-536.
Andrews, T. J., & Ewbank, M. P. (2004). Distinct representations for facial
identity and changeable aspects of faces in the human temporal lobe.
NeuroImage, 23, 905-913.
Anzellotti, S., Fairhall, S. L., & Caramazza, A. (Epub ahead of print March 5,
2013). Decoding representations of face identity that are tolerant to rotation.
Cerebral Cortex. doi: 10.1093/cercor/bht046.
Barton, J. S., Keenan, J. P., & Bass, T. (2001). Discrimination of spatial
relations and features in faces: effects of inversion and viewing duration.
British Journal of Psychology, 92, 527-549.
Baudouin, J., Martin, F., Tiberghien, G., Verlut, I., & Franck, N. (2002).
Selective attention to facial emotion and identity in schizophrenia.
Neuropsychologia, 40, 503-511.
Behrmann, M., Avidan, G., Marotta, J. J., & Kimchi, R. (2005). Detailed
exploration of face-related processing in congenital prosopagnosia: 1.
Behavioral findings. Journal of Cognitive Neuroscience, 17, 1130-1149.
Benton, A. L., Sivan, A. B., Hamsher, K. De S., Varney, N. R., & Spreen, O.
(1983). Contributions to Neuropsychological Assessment. New York: Oxford
University Press.
Biederman, I. (1987). Recognition-by-components: a theory of human image
understanding. Psychological Review, 94, 115-147.
Biederman, I., & Kalocsai, P. (1997). Neurocomputational bases of object and
face recognition. Philosophical Transactions of the Royal Society London:
Biological Sciences, 352, 1203-1219.
Bindemann, M., Burton, A. M., Hooge, I. T., Jenkins, R., & de Haan, E. H.
(2005). Faces retain attention. Psychonomic Bulletin & Review, 12, 1048–
1053.
Busigny, T., Joubert, S., Felician, O., Ceccaldi, M., & Rossion, B. (2010).
Holistic perception of the individual face is specific and necessary: Evidence
from an extensive case study of acquired prosopagnosia. Neuropsychologia,
48, 4057–4092.
Bruce, V., & Young, A. (1986). Understanding face recognition. British Journal
of Psychology, 77, 305-327.
Calder, A.J., & Young, A. W. (2005). Understanding the recognition of facial
identity and facial expression. Nature Reviews Neuroscience, 6, 641-651.
Campbell, J., & Burke, D. (2009). Evidence that identity-dependent and identity-
independent neural populations are recruited in the perception of five basic
emotional facial expressions. Vision Research, 49, 1532-1540.
Carey, S., & Diamond, R. (1994). Are faces perceived as configurations more
by adults than by children? Visual Cognition, 1, 253–274.
Crawford, J. R., & Howell, D. C. (1998). Comparing an individual’s test score
against norms derived from small samples. The Clinical Neuropsychologist, 12,
482–486.
Crouzet, S. M., & Thorpe, S. J. (2011). Low-level cues and ultra-fast face
detection. Frontiers in Psychology, 2, 342.
Crouzet, S. M., Kirchner, H., & Thorpe, S. J. (2010). Fast saccades toward
faces: face detection in just 100 ms. Journal of Vision, 10, 16.1-17.
Dakin, S. C., & Watt, R. J. (2009). Biological “bar codes” in human faces.
Journal of Vision, 9, 1-10.
Dakin, S. C., Hess, R. F., Ledgeway, T., & Achtman, R. L. (2002). What causes
non-monotonic tuning of fMRI response to noisy images? Current Biology,
12, 476-477.
Damasio, A.R., Tranel, D., & Damasio, H. (1990). Face agnosia and the neural
substrates of memory. Annual Review of Neuroscience, 13, 89-109.
Davies-Thompson J., Gouws, A., & Andrews, T.J. (2009). An image-dependent
representation of familiar and unfamiliar faces in the human ventral stream.
Neuropsychologia, 47, 1627-1635.
De Gelder, B., & Rouw, R. (2000). Configural face processes in acquired and
developmental prosopagnosia: Evidence for two separate face systems?
Neuroreport, 11, 3145–3150.
De Valois, R. L., & De Valois, K. K. (1988). Spatial Vision. Oxford: Oxford
University Press.
Dricot, L., Sorger, B., Schiltz, C., Goebel, R., & Rossion, B. (2008). The roles of
"face" and "non-face" areas during individual face perception: evidence by
fMRI adaptation in a brain-damaged prosopagnosic patient. NeuroImage,
40, 318-332.
Duchaine, B. C., Nieminen-von Wendt, T., New, J., & Kulomaki, T. (2003).
Dissociations of visual recognition in a developmental agnosic: Evidence for
separate developmental processes. Neurocase, 9, 380–389.
Duchaine, B. C., Yovel, G., Butterworth, B., & Nakayama, K. (2006).
Prosopagnosia as an impairment to face-specific mechanisms: Elimination
of the alternative hypotheses in a developmental case. Cognitive
Neuropsychology, 23, 714–747.
Duchaine, B., & Nakayama, K. (2006). The Cambridge Face Memory Test:
results for neurologically intact individuals and an investigation of its validity
using inverted face stimuli and prosopagnosic participants.
Neuropsychologia, 44, 576-585.
Eger, E., Schyns, P.G., & Kleinschmidt, A. (2004). Scale invariant adaptation in
fusiform face-responsive regions. NeuroImage, 22, 232-242.
Eifuku, S., De Souza, W. C., Tamura, R., Nishijo, H., & Ono, T. (2004).
Neuronal correlates of face identification in the monkey anterior temporal
cortical areas. Journal of Neurophysiology, 91, 358-371.
Ellamil, M., Susskind, J. M., & Anderson, A. M. (2008). Examinations of identity
invariance in facial expression adaptation. Cognitive, Affective, & Behavioral
Neuroscience, 8, 273-281.
Ewbank, M. P., & Andrews, T. J. (2008). Differential sensitivity for viewpoint
between familiar and unfamiliar faces in human visual cortex. NeuroImage,
40, 1857-1870.
Fang, F., & He, S. (2005). Viewer-centered object representation in the human
visual system revealed by viewpoint aftereffects. Neuron, 45, 793–800.
Fang, F., Ijichi, K., & He, S. (2007). Transfer of the face viewpoint aftereffect
from adaptation to different and inverted faces. Journal of Vision, 7, 1–9.
Fang, F., Murray, S.O., & He, S. (2007). Duration-dependent FMRI adaptation
and distributed viewer-centered face representation in human visual cortex.
Cerebral Cortex, 17, 1402-1411.
Fiser, J., Biederman, I., & Cooper, E. E. (1996). To what extent can matching
algorithms based on direct outputs of spatial filters account for human
shape recognition? Spatial Vision, 10, 237-271.
Fox, C. J., & Barton, J. J. (2007). What is adapted in face adaptation? The
neural representations of expression in the human visual system. Brain
Research, 1127, 80-89.
Fox, C. J., & Barton, J. J. (2008). It doesn’t matter how you feel. The facial
identity aftereffect is invariant to changes in facial expression. Journal of
Vision, 8, 1-13.
Fox, C. J., Iaria, G., & Barton, J.S. (2009). Defining the face-processing network:
optimization of functional localizer in fMRI. Human Brain Mapping, 30,
1637-1651.
Fox, C. J., Moon, S. Y., Iaria, G., & Barton, J. J. (2009). The correlates of
subjective perception of identity and expression in the face network: an
fMRI adaptation study. NeuroImage, 44, 569-580.
Freiwald, W. A., Tsao, D. Y., & Livingstone, M. S. (2009). A face feature space
in the macaque temporal lobe. Nature Neuroscience, 12, 1187-1196.
Ganel, T., Valyear, K. F., Goshen-Gottstein, Y., & Goodale, M. A. (2005). The
involvement of the "fusiform face area" in processing facial expression.
Neuropsychologia, 43, 1645-1654.
Garrido, L., Duchaine, B. C., & Nakayama, K. (2006). Face detection in normal
and prosopagnosic individuals. Journal of Neuropsychology, 2, 119-140.
Gilaie-Dotan, S., & Malach, R. (2007). Sub-exemplar shape tuning in human
face-related areas. Cerebral Cortex, 17, 325-338.
Gobbini, M. I., & Haxby, J. V. (2007). Neural systems for recognition of familiar
faces. Neuropsychologia, 45, 32-41.
Goffaux, V., Hault, B., Michel, C., Vuong, Q. C., & Rossion, B. (2005). The
respective role of low and high spatial frequencies in supporting configural
and featural processing of faces. Perception, 34, 77-86.
Gorno-Tempini, M.L., Pradelli, S., Serafini, M., Pagnoni, G., Baraldi, P., Porro,
C., Nicoletti, R., Umita, C., Nichelli, P. (2001). Explicit and incidental facial
expression processing: an fMRI study. NeuroImage, 14, 465–473.
Greenwood, J. A., Bex, P. J., & Dakin, S. C. (2009). Positional averaging
explains crowding with letter-like stimuli. Proceedings of the National
Academy of Sciences USA, 106, 13130-13135.
Grill-Spector, K., Henson, R., & Martin, A. (2006). Repetition and the brain:
neural models of stimulus-specific effects. Trends in Cognitive Sciences, 10,
14-23.
Grill-Spector, K., Knouf, N., & Kanwisher, N. (2004). The fusiform face area
subserves face perception, not generic within-category identification. Nature
Neuroscience, 7, 555-562.
Grill-Spector, K., Kushnir, T., Edelman, S., Avidan, G., Itzchak, Y., & Malach, R.
(1999). Differential processing of objects under various viewing conditions
in the human lateral occipital complex. Neuron, 24, 187-203.
Grill-Spector, K., Kushnir, T., Hendler, T., Edelman, S., Itzchak, Y., & Malach, R.
(1998). A sequence of object-processing stages revealed by fMRI in the
human occipital lobe. Human Brain Mapping, 6, 316-328.
Grill-Spector, K., Sayres, R., & Ress, D. (2006). High-resolution imaging
reveals highly selective nonface clusters in the fusiform face area. Nature
Neuroscience, 9, 1177-1185.
Hancock, P. J. B., Bruce, V., & Burton, A. M. (2000). Recognition of unfamiliar
faces. Trends in Cognitive Sciences, 4, 330-337.
Hasselmo, M. E., Rolls, E. T., & Baylis, G. C. (1989). The role of expression
and identity in the face-selective responses of neurons in the temporal visual
cortex of the monkey. Behavioural Brain Research, 32, 203-218.
Hasson, U., Hendler, T., Ben Bashat, D. & Malach, R. (2001). Vase or face? A
neural correlate of shape-selective grouping processes in the human brain.
Journal of Cognitive Neuroscience, 13, 744–753.
Haxby, J. V., Hoffman, E. A., & Gobbini, M. I. (2000). The distributed human
neural system for face perception. Trends in Cognitive Sciences, 4, 223-233.
Haxby, J., Gobbini, M., Furey, M., Ishai, A., Schouten, J., & Pietrini, P. (2001).
Distributed and overlapping representations of faces and objects in ventral
temporal cortex. Science, 293, 2425-2429.
Hershler, O., & Hochstein, S. (2005). At first sight: A high-level pop out effect
for faces. Vision Research, 45, 1707-1724.
Hoffman, E. A., & Haxby, J. V. (2000). Distinct representations of eye gaze and
identity in the distributed human neural system for face perception. Nature
Neuroscience, 3, 80-84.
Honey, C., Kirchner, H., & VanRullen, R. (2008). Faces in the cloud: Fourier
power spectrum biases ultrarapid face detection. Journal of Vision, 8(12),
9.1-13.
Horner, A. J., & Andrews, T. J. (2009). Linearity of the fMRI response in
category-selective regions of human visual cortex. Human Brain Mapping,
30, 2628-2640.
Hubel, D. H., & Wiesel, T. N. (1968). Receptive fields and functional
architecture of monkey striate cortex. Journal of Physiology, 195, 215–243.
Ishai, A., Ungerleider, L.G. & Haxby, J.V. (2000). Distributed neural systems for
the generation of visual images. Neuron, 28, 979–990.
Johnson, M. H., Dziurawiec, S., Ellis, H., & Morton, J. (1991). Newborns’
preferential tracking of face-like stimuli and its subsequent decline.
Cognition, 40, 1-19.
Kadosh, K. C., Henson, R., Kadosh, R. C., Johnson, M.H., & Dick, F. (2010).
Task-dependent activation of face-sensitive cortex: an fMRI adaptation
study. Journal of Cognitive Neuroscience, 22, 903-917.
Kanwisher, N., McDermott, J., & Chun, M.M. (1997). The fusiform face area: a
module in human extrastriate cortex specialized for face perception. Journal
of Neuroscience, 17, 4302-4311.
Kaplan, E., Goodglass, H., & Weintraub, S. (1983). The Boston naming test.
Philadelphia: Lea & Febiger.
Kourtzi, Z., & Kanwisher, N. (2000). Cortical regions involved in perceiving
object shape. Journal of Neuroscience, 20, 3310-3318.
Krekelberg, B., Boynton, G. M., & van Wezel, R. J. (2006). Adaptation: from
single cells to BOLD signals. Trends in Neurosciences, 29, 250-256.
Kriegeskorte, N., & Bandettini, P. (2007). Analyzing for information, not
activation, to exploit high-resolution fMRI. NeuroImage, 38, 663-665.
Kriegeskorte, N., Formisano, E., Sorger, B., & Goebel, R. (2007). Individual
faces elicit distinct response patterns in human anterior temporal cortex.
Proceedings of the National Academy of Sciences, 104, 20600-20605.
Lades, M., Vorbrüggen, J. C., Buhmann, J., Lange, J., von der Malsburg, C.,
Würtz, R. P., & Konen, W. (1993). Distortion invariant object recognition in
the dynamic link architecture. IEEE Transactions on Computers, 42,
300-311.
Le Grand, R., Cooper, P. A., Mondloch, C. J., Lewis, T. L., Sagiv, N., de Gelder,
B., et al. (2006). What aspects of face processing are impaired in
developmental prosopagnosia? Brain and Cognition, 61, 139–158.
Leder, H., & Bruce, V. (1998). Local and relational aspects of face
distinctiveness. Quarterly Journal of Experimental Psychology, 53A, 513-536.
Leopold, D., Bondar, I., & Giese, M. (2006). Norm-based face encoding by
single neurons in the monkey inferotemporal cortex. Nature, 442, 572-575.
Liu, J., Harris, A., & Kanwisher, N. (2002). Stages of processing in face
perception: an MEG study. Nature Neuroscience, 5, 910-916.
Loffler, G., Yourganov, G., Wilkinson, F., & Wilson, H.R. (2005). fMRI evidence
for the neural representation of faces. Nature Neuroscience, 8, 1386-1390.
Mangini, M. C., & Biederman, I. (2004). Making the ineffable explicit: Estimating
the information employed for face classification. Cognitive Science, 28, 209-
226.
Michelon, P., & Biederman, I. (2003). Less impairment in face imagery than
face perception in prosopagnosia. Neuropsychologia, 41, 421-441.
Moeller, S., Freiwald, W.A., & Tsao, D.Y. (2008). Patches with links: a unified
system for processing faces in the macaque temporal lobe. Science, 320,
1355-1359.
NASA. (2001). "Unmasking the face on Mars." Retrieved from
http://science.nasa.gov/science-news/science-at-nasa/2001/ast24may_1/
Nestor, A., Vettel, J., & Tarr, M. (2008). Task-specific codes for face
recognition: how they shape the neural representation of features for
detection and individuation. PLoS ONE, 3, e3978.
Nestor, A., Plaut, D. C., & Behrmann, M. (2011). Unraveling the distributed
neural code of facial identity through spatiotemporal pattern analysis.
Proceedings of the National Academy of Sciences USA, 108, 9998-10003.
O’Craven, K.M. & Kanwisher, N. (2000). Mental imagery of faces and places
activates corresponding stimulus-specific brain regions. Journal of Cognitive
Neuroscience, 12, 1013–1023.
O’Toole, A. J. (2011). A comparative examination of face recognition by
humans and machines. In S. Li & A. Jain (Eds.), Handbook of Face
Recognition (2nd Edition). Springer-Verlag, London Ltd., UK.
O’Toole, A. J., Roark, D. A. & Abdi, H. (2002). Recognizing moving faces: a
psychological and neural synthesis. Trends in Cognitive Sciences, 6, 261-
266.
Okada, K., Steffens, J., Maurer, T., Hong, H., Elagin, E., Neven, H., et al.
(1998). The Bochum/USC face recognition system and how it fared in the
FERET phase III test. In H. Wechsler, P. J. Phillips, V. Bruce, F. F. Soulie,
& T. Huang (Eds.), Face recognition: From theory to applications (NATO
ASI Series F). Berlin: Springer.
Olman, C., Ugurbil, K., Schrater, P., & Kersten, D. (2004). BOLD fMRI and
psychophysical measurements of contrast response to broadband images.
Vision Research, 44, 669-683.
Oppenheim, A. V., & Lim, J. S. (1981). The importance of phase in signals.
Proceedings of the IEEE, 69, 529-541.
Palermo, R., Willis, M. L., Rivolta, D., McKone, E., Wilson, C. E., & Calder, A. J.
(2011). Impaired holistic coding of facial expression and facial identity in
congenital prosopagnosia. Neuropsychologia, 49, 1226–1235.
Pelli, D. G., Robson, J. G., & Wilkins, A. J. (1988). The design of a new letter
chart for measuring contrast sensitivity. Clinical Vision Science, 2, 187-199.
Pitcher, D., Charles, L., Devlin, J. T., Walsh, V., & Duchaine, B. C. (2009).
Triple dissociation of faces, bodies, and objects in extrastriate cortex.
Current Biology, 19, 319-324.
Pitcher, D., Walsh, V., Yovel, G., & Duchaine, B. C. (2007). TMS evidence for
the involvement of the right occipital face area in early face processing.
Current Biology, 17, 1568-1573.
Pomerantz, J. R., & Garner, W. R. (1973). Stimulus configuration in selective
attention tasks. Perception & Psychophysics, 14, 565-569.
Pourtois, G., Schwartz, S., Seghier, M.L., Lazeyras, F., & Vuilleumier, P. (2005).
Portraits or people? Distinct representations of face identity in the human
visual cortex. Journal of Cognitive Neuroscience, 17, 1043-1057.
Puce, A., Allison, T., Asgari, M., Gore, J.C., & McCarthy, G. (1996). Differential
sensitivity of human visual cortex to faces, letterstrings, and textures: a
functional magnetic resonance imaging study. Journal of Neuroscience, 16,
5205-5215.
Ramon, M., Busigny, T., & Rossion, B. (2010). Impaired holistic processing of
unfamiliar individual faces in acquired prosopagnosia. Neuropsychologia,
48, 933–944.
Reddy, L., Reddy, L., & Koch, C. (2006). Face identification in the near-
absence of focal attention. Vision Research, 46, 2336-2343.
Rhodes, G., & Jeffery, L. (2006). Adaptive norm-based coding of facial identity.
Vision Research, 46, 2977-2987.
Rossion, B., Caldara, R., Seghier, M., Schuller, A. M., Lazeyras, F., & Mayer,
E. (2003). A network of occipito-temporal face sensitive areas besides the
right middle fusiform gyrus is necessary for normal face processing. Brain,
126, 1-15.
Rotshtein, P., Henson, R.N., Treves, A., Driver, J., & Dolan, R.J. (2005).
Morphing Marilyn into Maggie dissociates physical and identity face
representations in the brain. Nature Neuroscience, 8, 107-113.
Sadr, J., & Sinha, P. (2004). Object recognition and Random Image Structure
Evolution. Cognitive Science, 28, 259-287.
Schiltz, C., Sorger, B., Caldara, R., Ahmed, F., Mayer, E., Goebel, R., &
Rossion, B. (2006). Impaired face discrimination in acquired prosopagnosia
is associated with abnormal response to individual faces in the right middle
fusiform gyrus. Cerebral Cortex, 16, 574-586.
Schmalzl, L., Palermo, R., & Coltheart, M. (2008). Cognitive heterogeneity in
genetically based prosopagnosia: A family study. Journal of
Neuropsychology, 2, 99–117.
Schultz, J., & Pilz, K. S. (2009). Natural facial motion enhances cortical
responses to faces. Experimental Brain Research, 194, 465-475.
Schweinberger, S. R., & Soukup, G. R. (1998). Asymmetric relationships
among perceptions of facial identity, emotion, and facial speech. Journal of
Experimental Psychology: Human Perception and Performance, 24,
1748-1765.
Schweinberger, S. R., Burton, A. M., & Kelly, S. W. (1999). Asymmetric
dependencies in perceiving identity and emotion: experiments with morphed
faces. Perception & Psychophysics, 61, 1102-1115.
Sinha, P., Balas, B., Ostrovsky, Y., & Russell, R. (2006). Face recognition by
humans: nineteen results all computer vision researchers should know
about. Proceedings of IEEE. 94, 1948-1962.
Smith, M. L., Fries, P., Gosselin, F., Goebel, R., & Schyns, P. G. (2009).
Inverse mapping the neuronal substrates of face categorizations. Cerebral
Cortex, 19, 2428-2438.
Sugase, Y., Yamane, S., Ueno, S., & Kawano, K. (1999). Global and fine
information coded by single neurons in the temporal visual cortex. Nature,
400, 869-873.
Summerfield, C., Trittschuh, H.E., Monti, M.J., Mesulam, M-M., & Egner, T.
(2008). Neural repetition suppression reflects fulfilled perceptual
expectations. Nature Neuroscience, 11, 1004-1006.
Tanaka, J. W., & Farah, M. J. (1993). Parts and wholes in face recognition.
Quarterly Journal of Experimental Psychology A, 46, 225–245.
Thomas, C., Avidan, G., Humphreys, K., Jung, K. J., Gao, F., & Behrmann, M.
(2009). Reduced structural connectivity in ventral visual cortex in congenital
prosopagnosia. Nature Neuroscience, 12, 29-31.
Tong, F., Nakayama, K., Vaughan, J. T. & Kanwisher, N. (1998). Binocular
rivalry and visual awareness in human extrastriate cortex. Neuron, 21, 753-
759.
Torralba, A., & Oliva, A. (2003). Statistics of natural image categories. Network:
Computation in Neural Systems, 14, 391–412.
Trautmann, S. A., Fehr, T., & Herrmann, M. (2009). Emotion in motion: dynamic
compared to static facial expressions of disgust and happiness reveal more
widespread emotion-specific activations. Brain Research, 1284, 100-115.
VanRullen, R. (2006). On second glance: Still no high level pop-out effect for
faces. Vision Research, 46, 3017–3027.
Vida, M. D., & Mondloch, C. J. (2009). Children's representations of facial
expression and identity: identity-contingent expression aftereffects. Journal
of Experimental Child Psychology, 104, 326-345.
Vuilleumier, P. (2000). Faces call for attention: Evidence from patients with
visual extinction. Neuropsychologia, 38, 693–700.
Watson, A. B., & Pelli, D. G. (1983). QUEST: a Bayesian adaptive
psychometric method. Perception & Psychophysics, 33, 113-120.
Wichmann, F. A., & Gegenfurtner, K. R. (2010). Animal detection in natural
scenes: Critical features revisited. Journal of Vision, 10, 1-27.
Willenbockel, V., Sadr, J., Fiset, D., Horne, G. O., Gosselin, F., & Tanaka, J. W.
(2010). Controlling low-level image properties: the SHINE toolbox. Behavior
Research Methods, 42, 671-684.
Winston, J. S., Henson, R. N., Fine-Goulden, M. R., & Dolan, R. J. (2004).
fMRI-adaptation reveals dissociable neural representations of identity and
expression in face perception. Journal of Neurophysiology, 92, 1830-1839.
Wiskott, L., Fellous, J.-M., Krüger, N., & von der Malsburg, C. (1997). Face
recognition by elastic bunch graph matching. IEEE Transactions on Pattern
Analysis and Machine Intelligence, 19, 775–779.
Xu, X., Yue, X., Lescroart, M.D., Biederman, I., & Kim, J.G. (2009). Adaptation
in the fusiform face area (FFA): image or person. Vision Research, 49,
2800-2807.
Xu, X., & Biederman, I. (2010). Loci of the release from fMRI adaptation for
changes in facial expression, identity and viewpoint. Journal of Vision,
10(14):36, 1-13.
Xu, X., & Biederman, I. (Epub ahead of print January 30, 2013). Neural
correlates of face detection. Cerebral Cortex. doi: 10.1093/cercor/bht005
Yue, X., Biederman, I., Mangini, M. C., von der Malsburg, C., & Amir, O. (2012).
Predicting the psychophysical similarity of faces and non-face complex
shapes by image-based measures. Vision Research, 55, 41-46.
Yue, X., Tjan, B., & Biederman, I. (2006). What makes faces special? Vision
Research, 46, 3802-3811.
Yin, R. K. (1969). Looking at upside-down faces. Journal of Experimental
Psychology, 81, 141–145.
Young, A. W., Hellawell, D., & Hay, D. C. (1987). Configurational information in
face perception. Perception, 16, 747–759.
Zhu, Q., Zhang, J., Luo, Y. L. L., Dilks, D. D., & Liu, J. (2011). Resting-state
neural activity across face-selective cortical regions is behaviorally relevant.
Journal of Neuroscience, 31, 10323-10330.
Abstract
Humans are exceptionally good at face recognition. Within a fraction of a second, we not only detect the presence of a face, but also determine its identity, expression, orientation, age, sex, and attractiveness. This process involves the collaboration of specific regions in the cortex, but how the different face attributes are processed and represented has remained unclear. This thesis is directed toward understanding the neural correlates supporting the detection and configural coding of faces, as well as the representation of various face attributes, particularly face individuation, in both normal subjects and an acquired prosopagnosic, MJH.

In the first study, I examined the detection threshold for faces in MJH and controls. The results revealed a significant face detection deficit in MJH compared with controls, suggesting a contribution of the ventral temporal cortex to face detection. Second, I provided a neurocomputational account of the configural effect in face perception, that is, the better discrimination of whole faces compared with isolated face parts. The computational simulations and psychophysical results suggested that it is the overlapping receptive fields of the Gabor kernels coding different parts of a face that give rise to its holistic representation. Finally, previous research has shown that the Gabor-jet model predicts almost perfectly the psychophysical similarity of faces and other complex shapes. By using this model to scale image similarities among faces, I could compare the neural representation of high-level face attributes without low-level image similarity confounds. The results of two fMRI adaptation experiments revealed two cortical loci, the fusiform face area (FFA) and the occipital face area (OFA), tuned to face identity in normal subjects. MJH’s lesions in these areas, on the other hand, could have led to his complete loss of the ability to identify faces. Behavioral and imaging results indicate no functional plasticity that might have alleviated his deficits, even four decades after the lesions he suffered in early childhood. Taken together, these results highlight the important role that FFA and OFA play in both the detection and the more abstract coding of human faces.