A Framework for High-Resolution, High-Fidelity, Inexpensive Facial
Scanning
by
Paul R. Graham
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements of the Degree
DOCTOR OF PHILOSOPHY
(Computer Science)
August 2014
Copyright 2014 Paul R. Graham
Table of Contents
Abstract  vi
Acknowledgments  viii
1 Introduction  1
2 Background  7
2.1 Camera Calibration  7
2.2 Geometry and Reflectance Acquisition  9
2.3 Texture Synthesis and Super Resolution  11
3 Camera Calibration  14
3.1 Designing a Calibration Object  14
3.2 Algorithm  16
3.3 Discussion and Results  18
4 Mesostructure  21
4.1 Hardware Setup and Capture Process  23
4.2 Deriving Geometry and Reflectance  30
4.3 Discussion and Results  39
5 Microstructure  48
5.1 Recording Skin Microstructure  50
5.2 Facial Microstructure Synthesis  54
5.3 Discussion and Results  59
5.4 Microgeometry Database  69
6 Future Work  72
7 Conclusion  75
BIBLIOGRAPHY  76
List of Figures
1.1 Uncanny valley depicted with computer generated faces representing different points along the curve  2
1.2 Examples of passive and active capture systems  4
1.3 Resolution of state of the art capture systems  5
1.4 Framework for high-resolution, high-fidelity, inexpensive facial scanning  6
2.1 Typical calibration objects  8
3.1 Camera calibration data flow diagram  15
3.2 Recovering focal length  16
3.3 Calibration object from four views  17
3.4 Detected corners and reprojection errors  19
3.5 Verification of calibration accuracy  20
3.6 Cylinder calibration trouble areas  20
4.1 Mesostructure reconstruction data flow diagram  22
4.2 Facial capture setup  23
4.3 Face surface normal distributions  24
4.4 Flash configuration  26
4.5 Interleaved cameras and highlights  27
4.6 Highlights from 24 images shown on shiny blue ball  28
4.7 Mosaic of images shot from cameras under lighting conditions  29
4.8 Light going through a polarizing beamsplitter and specifications  30
4.9 Parallel and cross polarized flash images  31
4.10 Geometry refinement from base mesh to final mesh  33
4.11 Diffuse-specular separation  35
4.12 Optimizing geometry from differing specular highlights  37
4.13 Recovered reflectance maps  40
4.14 Renderings of female subject from recovered geometry and reflectance maps from different viewpoints  41
4.15 Renderings of recovered geometry and reflectance maps of a female subject under multiple lighting conditions  42
4.16 Renderings of recovered geometry and reflectance maps of a male subject  43
4.17 Renderings of female subject from recovered geometry and reflectance maps  43
4.18 Renderings of recovered geometry of a female subject and validation against a photograph  44
4.19 Graceful degradation of normal maps  45
4.20 Mesostructure comparison to laser scan  46
4.21 Mesostructure comparison to light stage scans  47
5.1 Camera calibration data flow diagram  50
5.2 Acquisition setups for skin microgeometry  51
5.3 Measured skin patches from different facial regions of two subjects  53
5.4 Validation renderings for BRDF estimation of micro patches  54
5.5 Validation renderings for BRDF estimation of micro patches  54
5.6 Face segmentation map  55
5.7 Microstructural detail added to a scanned facial region  56
5.8 Melting artifact that can occur during synthesis if edge cases are not handled correctly  58
5.9 Comparison of mesoscale, microscale and photograph for cheek region of female subject  60
5.10 Comparison of mesoscale, microscale and photograph for forehead region of male subject  61
5.11 Microstructure synthesis with different exemplars and constraints  62
5.12 Comparison of BRDF at different scales  65
5.13 Subject one rendered from 16K displacement maps with synthesized microstructure  66
5.14 Subject two rendered from 16K displacement maps with synthesized microstructure  67
5.15 Subject three rendered from 16K displacement maps with synthesized microstructure  68
5.16 Forehead patch under neutral and raised eyebrows conditions  69
5.17 Rendering of cheek from male subject with sinusoidal microstructure synthesis compared to rendering without microstructure  69
5.18 Microstructure database examples  70
List of Tables
4.1 Configuration costs table  31
5.1 Microscale two-lobe Beckmann distribution parameters obtained for the different skin patches across two subjects of Fig. 3  64
5.2 Mesoscale two-lobe Beckmann distribution parameters obtained for different facial regions across two subjects  64
5.3 Cross validation of microscale two-lobe distribution done at mesoscale resolution  64
5.4 Microgeometry database entries  71
Abstract
We present a framework for high-resolution, high-fidelity, inexpensive facial scanning. The
framework combines the speed and cost of passive lighting scanning systems with the fidelity
of active lighting systems. The subject is first scanned at the mesoscale, the scale of pores and
fine wrinkles. The process is a near-instant method for acquiring facial geometry and reflectance
with 24 DSLR cameras and ten flashes. The flashes are fired in rapid succession with subsets of
the cameras, which are specially arranged to produce an even distribution of specular highlights
on the face. The total capture time is shorter than the duration of the eyelid's mechanical movement in the
human blink reflex. We use this set of acquired images to estimate diffuse color, specular intensity,
and surface orientation at each point on the face. With a single photo per camera, we optimize
the facial geometry to maximize the consistency of diffuse reflection and minimize the variance
of specular highlights using an energy-minimization message-passing technique. This allows the
final sub-millimeter surface detail to be obtained via shape-from-specularity, even though every
photo is from a different viewpoint. The final system uses commodity components and produces
models suitable for generating high-quality digital human characters. The mesostructure is en-
hanced to include microgeometry through the scanning of skin patches around the face. We dig-
itize the exemplar patches with a polarization-based computational illumination technique which
considers specular reflection and single scattering. The recorded microstructure patches can be
used to synthesize full-facial microstructure detail for either the same subject or a different subject
with similar skin types. We show that the technique allows for greater realism in facial renderings
including a more accurate reproduction of skin’s specular reflection effects. A microstructure
database is provided for easy cross-subject synthesis during the enhancement stage. Additionally,
a multi-view camera calibration technique is introduced. This new technique can be accomplished
with a single view from each camera of a cylinder wrapped in a checkerboard pattern. It is fast
and resolves extrinsic and intrinsic camera parameters to a sub-pixel re-projection error.
Acknowledgments
This work has been made possible through the mentorship, assistance, and encouragement of
many people. It takes a village of support to finish a dissertation and I’d like to express my thanks
here.
First, I would like to thank my advisor Dr. Paul Debevec and committee members,
Dr. Abhijeet Ghosh, Dr. Gerard Medioni, Dr. Michelle Povinelli, and Dr. Hao Li. They have been a
wellspring of inspiration, guidance, and support. I appreciate my collaborators Borom Tunwat-
tanpong and Graham Fyffe for their expert advice in reflection properties and lighting models and
Katheleen Haase for keeping order during the chaos of paper deadlines and project tracking. I
would like to thank the rest of the members of the ICT Graphics Lab: Jay Busch, Xueming Yu,
Andrew Jones, Valerie Dauphin, and Oleg Alexander. It has been a wonderful experience full of
scholarship and camaraderie. Additionally, I would like to thank USC’s department of computer
science PhD student advisor, Lizsl DeLeon, for keeping me on target.
I would also like to thank the leadership in the Computer Science Department at the U.S.
Air Force Academy for giving me this opportunity. Specifically, Col David Gibson, Lt Col Jeff
Boleng, Dr. Martin Carlisle, and Dr. Dennis Schwietzer for their support, encouragement and
faith in me. Their leadership has always been an inspiration to me, and without them I would
never have entertained a PhD possibility.
My parents, Bill and Nancy Graham for their love and stability. They provided a solid
appreciation for academia and encouragement to achieve anything.
Finally, I’d like to thank my wife, Carolyn Graham, for her unwavering confidence and
love. She has supported me more than anyone could imagine through this entire adventure -
putting up with late nights and long weekends, baking treats for deadlines, all the while keeping
me sane. I couldn’t have made it without her.
The views expressed in this article are those of the author and do not reflect the official policy or
position of the United States Air Force, Department of Defense, or the U.S. Government.
Chapter 1
Introduction
A major goal of computer graphics research over the past couple of decades has been to create
photo-realistic digital human faces. The increased number of digital doubles seen in blockbuster
movies (Avatar, The Avengers, Oblivion, etc.) and the realism seen in modern game engines (Call
of Duty: Black Ops II, The Elder Scrolls V: Skyrim, Crysis 3, etc.) emphasize the drive to achieve
this goal. In addition, digital actors have started appearing in augmented and virtual reality [25]
and mental health care [49] applications. As opportunities to observe and interact with digital
characters appear more frequently, it becomes desirable to maximize the photo-realism of such
characters, while reducing the effort and resources required to do so.
A major concern of creating high-quality digital content is avoiding the phenomenon known as
”the uncanny valley” [44]. The uncanny valley describes a human observer’s reaction as the digital
character's fidelity increases. Succinctly, as a digital actor becomes more realistic, observers
interacting with it feel more at ease. The level of comfort increases with the level of realism
until the actor appears ”close to human.” At that point, the digital actor tends to create a ”creepy”
feeling with human observers and the comfort level drops significantly (see Fig. 1.1).
Figure 1.1: Uncanny valley depicted with computer generated faces representing different points
along the curve (a) Computer animated animals (b) Cartoon characters (c)(d) Digital humans in
the valley (e)(f) Digital humans climbing out of the valley (g) Photograph
Capture Systems
As a part of the surge to create photo-realistic digital faces, a pattern for acquiring data has
emerged. First, scan the subject with a camera array. Then send the captured images through
a multi-view stereo algorithm that recovers geometry and a set of reflectance properties. The
camera arrays can be classified into one of two different groups: passive and active lighting sys-
tems (see Fig. 1.2).
Passive Lighting Systems Passive lighting systems use a collection of cameras that capture the
subject under constant lighting. This allows the subject to be captured in a single instant of time.
The cameras are synchronized to the same exposure length and trigger signal. A big advantage to
passive lighting systems is the reduction of artifacts introduced due to subject movement caused
by longer capturing sessions. Additionally, if the subject is a human being, he or she is not
required to hold intricate facial expressions for extended lengths of time. However, these systems
can only capture skin surface detail with all layers blended together or, if the lighting is
cross-polarized, ignore the specular component altogether.
that is missing mesoscale features (such as pores and fine wrinkles). Typically these approaches
hallucinate the missing fine detailed features through specialized filtering methods [5]. However,
hallucination techniques reduce the fidelity of the final geometry and renderings.
Active Lighting Systems Active lighting systems also use a collection of cameras and include
controllable lighting systems. With the introduction of controllable lighting, these systems cap-
ture multiple images per scan, one per lighting condition. Similar to passive lighting systems, the
cameras are synchronized in exposure and trigger times. However, by controlling the polarization
and orientation of the lighting, it is possible to separate the sub-surface layers from the top layer
of the skin [38, 64]. This produces the highest-fidelity geometry and specular reflectance measure-
ments possible with photogrammetry to date. Active lighting systems are not without drawbacks;
the subject being captured is required to remain still for multiple seconds, the data acquired can be
tedious to manage, controllable lighting rigs are expensive, and the physical setup of such devices
makes easy transportation impractical.
Additionally, all current techniques, active and passive, are limited to the resolution of the capturing
cameras. Typically this allows each pixel to cover approximately 70 mm² of surface area (see
Fig. 1.3). With the introduction of high-definition television and 4K resolution films, this scale
may not be adequate to provide convincing, realistic close-ups. To produce compelling
renderings of digital actors in high-density formats, the resolution of today's scans needs to increase.
Figure 1.2: Examples of passive and active capture systems. (a) Camera array used in passive-lighting systems [5]. (b) Light Stage used to control lighting in active-lighting systems [3].
High-Resolution, High-Fidelity Facial Scanning Framework
Modern DSLR cameras are capable of exposure speeds faster than 5ms and modern flashes are
completely discharged in 1ms or less. By rapidly triggering the cameras sequentially, the array
can capture a series of photographs in an instant of time under different flash lighting conditions.
By triggering commercial off-the-shelf cameras and flashes with a simple circuit, mesoscale ge-
ometry and reflectance properties are acquired (Chapter 4). Combining the mesoscale geometry
acquisition and reconstruction with a microgeometry enhancement (Chapter 5) and the new cam-
era calibration methods (Chapter 3), this dissertation establishes a framework (Fig. 1.4) for fast,
inexpensive, high-quality facial scans that meet the demands of today’s high-density formats.
Contribution
This dissertation presents a framework for capturing faces at the mesoscale level, patches of faces
at the microscale level, and provides algorithms for reconstructing very high-resolution photo-
realistic relightable geometry and texture maps.
The contributions are:
Figure 1.3: Ruler captured at the same distance as a typical facial scan with a Canon EOS-1D X
Digital SLR Camera, showing the surface area covered by each pixel
1. A framework for high-resolution, high-fidelity, inexpensive facial scanning by incorporat-
ing items two through four.
2. One shot cylinder calibration method for multi-view scanning systems.
3. Mesoscale facial capture process and algorithms which are comparable to the cost and
speed of passive lighting methods and quality of active lighting environment methods.
(a) A system to capture facial images using active lighting with speeds and costs compa-
rable to passive lighting.
(b) Algorithms for recovering geometry and relightable reflectance properties from flash
images.
4. A synthesis approach for increasing the resolution of mesostructure level facial scans using
surface microstructure digitized from skin samples about the face.
(a) A system for recording skin reflectance properties on the scale of 10 microns.
(b) An algorithm to increase the resolution of a mesostructure scan to microstructure
scale.
(c) An analysis of same-subject, cross-subject, and unconstrained microgeometry appli-
cation to mesostructure scans.
(d) A microgeometry database consisting of subjects of varying race, age, and gender.
Figure 1.4: Framework for high-resolution, high-fidelity, inexpensive facial scanning
Chapter 2
Background
2.1 Camera Calibration
When addressing any problem related to photometric scanning, the issue of camera calibration
is a necessary topic. This section will discuss some of the current techniques for recovering the
extrinsic and intrinsic camera parameters; specifically planar and volume calibration methods.
Planar calibration requires photographing a planar calibration object, typically a checkerboard
grid (see Fig. 2.1 (a)). V olumetric calibrations involve capturing a known object inside a volume
(see Fig. 2.1 (b)). A third category, Self-calibration, finds feature points that are common in
multiple views of the intended reconstruction object and uses them to solve a scaled set of camera
parameters [51]. Self-calibration methods will not be discussed because they do not provide
accurate scale information about the scene.
Planar Calibration
Roger Y. Tsai [55] describes a two-stage methodology that reduces the dimensionality of the
search space by categorizing camera parameters into two groups. The first group uses the radial
alignment constraint to find the rotation and translation of the camera. The second stage fixes
group one's parameters and uses the projection equations and a non-linear solver to find focal
length, scale, and distortion parameters.

Figure 2.1: Typical calibration objects. (a) Checkerboard used for planar calibration [69]. (b) Sphere used in volumetric calibration [5].
Zhengyou Zhang [69] uses different (at least two) orientations of a planar pattern to solve for the
extrinsic and intrinsic camera calibration parameters. This algorithm has three steps: initialize
parameters using a closed-form solution, estimate radial distortion coefficients using linear least
squares, and refinement based on a maximum likelihood criterion.
Volumetric Calibration
Svoboda et al. [52] employ a laser pointer which is waved around in the calibration volume to
create a set of virtual 3D points in a virtual environment. Projective structures are computed,
followed by a Euclidean stratification based on geometric constraints. Non-linear distortion
coefficients are estimated and the process is iterated until reprojection errors fall below a user-
defined threshold.
Beeler et al. [5] have built a method of calibration using a small sphere that has been augmented
with a number of fiducial markings. This system first uses the known size of the sphere and the
markings to find correspondences between cameras in a Euclidean coordinate frame. Then the
Euclidean frame is discarded and the camera correspondences are provided as inputs into Svoboda
et al.'s algorithm as described above.
2.2 Geometry and Reflectance Acquisition
There are many ways to recover geometry and reflectance properties: photogrammetry, laser
scanning, sculpting, etc. This dissertation focuses on photogrammetry processes to recover the
desired facial attributes. In order to achieve ear-to-ear coverage, these techniques all involve
a multi-camera array. This section separates systems into two categories: passive and active
lighting systems.
Passive Lighting Systems
Furukawa et al. [16] implement multi-view stereopsis as a match-expand-filter procedure that
produces a dense patch reconstruction from a set of calibrated images. The procedure reconstructs
a set of sparse correspondences and expands to find matches at nearby pixels. This process repeats
until a patch has been reconstructed in every tile of the image set. With a large number of images
this technique can solve for very fine features, but in practice it only recovers low and medium
frequency details.
Beeler et al. [5] reconstruct high-resolution geometry by exploiting the skin texture captured
under constant illumination. This technique recovers geometry that is smooth across fine details,
such as pores and wrinkles. To compensate, an additional mesoscopic augmentation is applied to
hallucinate fine detailed geometry at that scale. However, this produces geometry that does not
have the desired fidelity and specular reflectance maps cannot be recovered.
Valgaerts et al. [58] present a passive facial capture system which achieves high-quality facial ge-
ometry reconstruction under arbitrary uncontrolled illumination. They reconstruct base geometry
from stereo correspondence and incorporate high-frequency surface detail using shape from shad-
ing and incident illumination estimation as in Wu et al. [66]. The technique achieves impressive
results for uncontrolled lighting, but does not take full advantage of specular surface reflections
to estimate detailed facial geometry and reflectance.
The GelSight system [30] coats the sample with a silver powder and uses photometric stereo to
record surface microgeometry at the level of a few microns. However, while this does recover
geometry, the silver coating removes the possibility of recovering the diffuse and specular color
components of the subject.
Active Lighting Systems
Typically specular highlights would shift across the subject as the location of the light and camera
changes. Zickler et al. [73] exploit Helmholtz reciprocity to overcome this limitation. By arrang-
ing cameras and lights such that the light transport is equivalent between views, it is possible to
ensure that the ratio of emitted light to incident light remains constant. This allows the capture
of subjects using direct flashes as a lighting condition and removes the shifting of highlights. In
addition, it allows for simple outlier detection due to self-shadowing.
Significant prior research has been accomplished with photometric stereo. R. J. Woodham [65]
derived surface orientations of Lambertian surfaces by capturing images of the same subject from
three point-light directions. Rushmeier et al. [50] devised bump maps from photometric stereo to
increase the detail on rendered surfaces. Lim et al. [34] transferred the concept of photometric
stereo to a moving subject under changing and distant illumination. Higo et al. [28] accomplished
photometric stereo using hand-held cameras with an attached light source and dropping the far
light assumption.
For semi-translucent materials such as skin, Jessica Ramella-Roman [48] showed that subsurface
scattering blurs the surface detail recoverable from traditional photometric stereo significantly.
Recent research looked at compensating for the smoothness caused by subsurface scattering. De-
bevec et al., Weyrich et al., Chen et al., and Jessica Ramella-Roman [9, 10, 48, 62] showed that
analysis of the specular layer can be done to obtain more accurate surface orientations.
Both polarization and color space analysis can be used in separating diffuse and specular reflec-
tions [45]. Mallick et al. [40] use a linear transform from RGB color space to an SUV color space
where S corresponds to the intensity of monochromatic specular reflectance and UV correspond
to the orthogonal chroma of the diffuse reflectance.
Ma et al. [38] use polarization difference imaging to isolate the specular reflection under gradient
lighting conditions, allowing specular surface detail to be recorded in a small number of images.
Ghosh et al. [20] show that polarization imaging can be used to create a layered model of fa-
cial reflectance properties consisting of specular reflection, single scattering, shallow subsurface
scattering, and deep subsurface scattering.
2.3 Texture Synthesis and Super Resolution
Once mesoscale geometry, albedo maps, and normal maps are recovered, it is desirable to increase
their resolution to achieve the quality necessary for current and future display systems. Texture
synthesis and super resolution are two techniques that can be used to accomplish this goal. This
section describes some of the current research in those areas.
Texture Synthesis
Wei et al. [60] compiled a recent review of example-based texture synthesis. To summarize a
few key results, standard 2D texture synthesis algorithms, such as Heeger et al. [26] and Efros et
al. [14], have been extended to arbitrary manifold surfaces by several authors [33, 61, 68]. Ying
et al. [68] demonstrated that displacement maps can be synthesized with such techniques. Addi-
tionally, Tong et al. [53] and Tsumura et al. [56] extended the methods to measured reflectance
properties and skin color respectively. Some approaches [13, 27] also permit texture transfer,
transforming a complete image so that it has textural detail of a given sample.
2.3.1 Super Resolution
Hertzmann et al., Freeman et al., and Lefebvre et al. [15, 27, 32] describe procedures for super-
resolution, which add detail to low-resolution images based on examples. Constrained texture
synthesis techniques presented by Wang et al. [59] and Ramanarayanan et al. [47] add plausible
high-resolution detail to a low-resolution image without changing its low-frequency content.
Techniques have been proposed to increase the detail present in facial images and models, al-
though at a significantly coarser resolution than desired for the work proposed here. Liu et al. [35]
presented a face hallucination procedure that creates recognizable facial images through con-
strained facial texture synthesis onto a parametric face model. He later applied image processing
techniques to transfer facial detail from one subject to another, allowing image-based aging effects
.‘[?]Liu:2004:IBS. Golovinskiy et al. [21] synthesized skin mesostructure (wrinkles, creases, and
larger pores) from high-quality facial scans onto otherwise smooth faces by matching per region
frequency statistics, which is not designed to match existing mesostructure.
12
Additionally, there is a body of previous research on sampling microgeometry in order to better
predict appearances at normally observable larger scales. Marschner et al. [42] made observations
of a cross-section of wood from an electron micrograph which motivated reflection models that
better predicted anisotropic reflection effects. Zhao et al. [70] used Micro CT imaging of fabric
samples to model the volumetric scattering of complete pieces of cloth.
Chapter 3
Camera Calibration
The goal for designing a new calibration object is to provide a practical and quick method for
estimating the following parameters for a multi-view camera array: Rotation ($R$), Translation ($T$),
Principal Point ($p_x$, $p_y$), Focal Length ($f_x$, $f_y$), and the first two radial distortion
coefficients ($k_1$, $k_2$). While previous techniques are capable of recovering these parameters, they require each
camera to take multiple images of their respective objects in different orientations [52, 55, 69].
This can be cumbersome for the number of cameras required to achieve ear-to-ear coverage. In
this chapter we present the first part of our framework (see Fig. 3.1). It describes a new calibration
object and associated algorithms. Additionally, we discuss the design choices made, provide
discussion, and show results.
3.1 Designing a Calibration Object
An ideal shape for a single-shot multi-view calibration object has several properties. First, it
should be viewable from many surrounding points at the same time. This quality is necessary
Sections of this chapter have been published as a part of Acquiring Reflectance and Shape From Continuous
Spherical Harmonic Illumination [57].
Figure 3.1: Camera calibration data flow diagram
to ensure that all cameras exist in the same virtual space. This is guaranteed if every camera’s
parameters can be recovered at the same time. Second, the object should contain some detail
about the scale of the images. Without providing information on the size of the scene, calibration
methods can only recover the camera location up to a scale factor. Additionally, the object should
solve for $f_x$ and $f_y$ using only one image. This constraint forces the object to exhibit several depths
relative to the imaging plane (see Fig. 3.2 (a)). The object should also have machine-recognizable
features that easily correspond between the different cameras (in order to recover every camera’s
parameters at once) and the 3D coordinates associated with those points.
Checkerboards have traditionally provided an acceptable starting point for creating calibration
objects. They provide many easily detectable points, as well as a known scale factor to calibration
algorithms. Unfortunately, they do not allow for oblique imaging without rotation or translation.
However, wrapping a printed checkerboard around a cylindrical object affords a more ideal sit-
uation for multi-view calibration. Additionally, since a cylinder bends away from the imaging
plane with horizontal movement, $f_x$ and $f_y$ can be recovered from a single picture (see Fig. 3.2 (b)).

Figure 3.2: Recovering focal length: (a) Calibration planes allow for multiple depths when the imaging device is in front of the object and the imaging plane is not parallel to the calibration plane. In this figure, Camera B cannot be calibrated without rotating the board. (b) A cylindrical object provides multiple depths from any perspective on the equator. Here Camera B can be calibrated at the same time as Camera A and Camera C.

Adding color markers to specific checkers allows consistent identification of the cylinder's
orientation in each view (see Fig. 3.3). Furthermore, a cylinder has a mathematical formula that
allows the algorithm to determine a virtual 3D coordinate for each corner in the checkerboard that
has been wrapped around the object. Finally, a properly sized cylinder fills the same volume as
a human head, which allows for quick and accurate placement of the object into the scene and
accurate calibrations in the space occupied by the head.
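To make the corner-to-3D correspondence concrete, the following sketch (not from the dissertation; the grid layout, function, and variable names are illustrative assumptions) maps checker-corner indices on the wrapped pattern to points on an ideal cylinder of known radius.

```python
import numpy as np

def cylinder_corner_3d(col, row, checker_size, radius):
    """Map a checker-corner grid index (col, row) on a pattern wrapped
    around a cylinder to a 3D point on the ideal cylinder surface.

    col increases around the circumference, row increases along the axis;
    checker_size and radius are in the same units (e.g. centimeters)."""
    # Arc length travelled around the cylinder determines the angle.
    theta = (col * checker_size) / radius
    x = radius * np.cos(theta)
    y = radius * np.sin(theta)
    z = row * checker_size          # height along the cylinder axis
    return np.array([x, y, z])

# Example: a 2 cm checker pattern on a cylinder of 47.88 cm circumference.
radius = 47.88 / (2.0 * np.pi)
print(cylinder_corner_3d(col=3, row=5, checker_size=2.0, radius=radius))
```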
3.2 Algorithm
The calibration algorithm first locates the color markers to determine which section of the cylinder
is visible in the view that was imaged. Then the checkerboard corners are detected using a stan-
dard corner detector [24] and refined to sub-pixel coordinates.

Figure 3.3: Calibration object from four views

Using the known location of the colored markers, the pixel locations of the detected corners are corresponded to the appropriate
3D coordinates of an ideal cylinder with the same dimensions as the imaged cylinder. Addition-
ally, the 3D correspondences allow for easy matching of the corners across camera views. This
information is then loaded into a least-squares minimization [43] of the distance between the de-
tected corner points and their 3D re-projections in their respective cameras. Equations 3.1-3.4 are
used to project the 3D points into the image plane and recover the camera parameters and first
two radial distortion coefficients.
\lambda \begin{bmatrix} x_u \\ y_u \\ 1 \end{bmatrix} =
\begin{bmatrix} f_x & 0 & p_x \\ 0 & f_y & p_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} I_3 & 0 \end{bmatrix}
\begin{bmatrix} R & T \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}    (3.1)

x_d = x_u (1 + k_1 r^2 + k_2 r^4)    (3.2)

y_d = y_u (1 + k_1 r^2 + k_2 r^4)    (3.3)

r = \sqrt{x_u^2 + y_u^2}    (3.4)

where $X, Y, Z$ are the 3D coordinates of a point in space, $\lambda$ is a scale factor determined by $Z$, $(x_u, y_u)$ are undistorted image points, $(x_d, y_d)$ are distorted image points, and $k_n$ is the $n$-th distortion parameter.
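For illustration, the sketch below is a minimal version of this pinhole-plus-radial-distortion model, not the dissertation's code: it assumes zero skew and applies distortion to normalized coordinates (the common convention), and all function and variable names are assumptions.

```python
import numpy as np

def project_point(X, R, T, fx, fy, px, py, k1, k2):
    """Project a 3D point X into distorted pixel coordinates using the
    pinhole model of Eq. 3.1 and the radial distortion of Eqs. 3.2-3.4.
    Skew is assumed zero; distortion is applied to normalized coordinates."""
    Xc = R @ X + T                          # world -> camera frame
    xu, yu = Xc[0] / Xc[2], Xc[1] / Xc[2]   # perspective division (lambda = Zc)
    r2 = xu * xu + yu * yu                  # squared radius (Eq. 3.4)
    d = 1.0 + k1 * r2 + k2 * r2 * r2        # distortion factor (Eqs. 3.2-3.3)
    xd, yd = xu * d, yu * d
    return np.array([fx * xd + px, fy * yd + py])

def reprojection_residuals(corners_2d, corners_3d, R, T, intrinsics):
    """Per-corner reprojection errors, the quantity minimized in Section 3.2."""
    fx, fy, px, py, k1, k2 = intrinsics
    proj = np.array([project_point(X, R, T, fx, fy, px, py, k1, k2)
                     for X in corners_3d])
    return np.linalg.norm(proj - corners_2d, axis=1)
```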
The minimization occurs in three stages. During the first stage, the intrinsic parameters ($f_x$, $f_y$, $p_x$, $p_y$) and distortion coefficients are initialized to the camera and lens manufacturer's specifications
and held constant while the algorithm solves for the external parameters (R and T ). The second
stage releases the internal parameters and distortion coefficients and re-minimizes the projection
error for each point. To compensate for manufacturing errors in the cylinder and printed checker-
board, each of the corners is allowed to ”float” in 3D space. Finally, mis-identified corners are
detected as outliers and removed from the set of points. We identify outliers by removing points
that have a reprojection error greater than the median reprojection error scaled by a user supplied
constant. The minimization routine is rerun until outliers are no longer present. Optionally, if
more than one image of the cylinder is taken per camera, it is possible to exploit the rigid body
motion of the cylinder and incorporate that data to help tighten the constraints on the focal length.
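The outlier-rejection loop described above can be summarized in a short sketch. This is illustrative only: run_bundle_adjustment stands in for the least-squares minimization stage, and the scale argument is the user-supplied multiplier on the median reprojection error.

```python
import numpy as np

def calibrate_with_outlier_rejection(corners_2d, corners_3d, cameras,
                                     run_bundle_adjustment, scale=3.0):
    """Repeat the least-squares minimization, dropping corners whose
    reprojection error exceeds the median error times a user constant,
    until no outliers remain.

    run_bundle_adjustment(corners_2d, corners_3d, cameras) is assumed to
    return (cameras, per_corner_errors) as numpy arrays."""
    keep = np.ones(len(corners_2d), dtype=bool)
    while True:
        cameras, errors = run_bundle_adjustment(
            corners_2d[keep], corners_3d[keep], cameras)
        threshold = scale * np.median(errors)
        outliers = errors > threshold
        if not np.any(outliers):
            return cameras
        # Remove the newly detected outliers and re-run the minimization.
        idx = np.flatnonzero(keep)
        keep[idx[outliers]] = False
```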
3.3 Discussion and Results
This method will work with any size cylinder, where the dimensions are known a priori. Addition-
ally, this technique works with a wide variety of cameras. It has successfully calibrated machine
vision, consumer and professional DSLRs, as well as home and production grade video cameras.
The results provided below were captured using four Canon EOS-1D X Digital SLR cameras, each
using a Nikon 135 mm E series lens. The cylinder is 47.88 cm in circumference, wrapped in a 2 cm
checker pattern. Fig. 3.4 shows the results from the corner detector and the re-projection errors for
each stage of the calibration algorithm. Typical reprojection error is less than 0.5 pixels.
To ensure the accuracy of the calibrations in relation to one another, the images can be rectified to
align epipolar lines. Fig. 3.5 shows the images of the cylinders that have been rectified such that
epipolar lines are aligned along scan lines.
Figure 3.4: Detected corners and reprojection errors (a) Initial cylinder image, with region of interest highlighted (b) Corners detected in region of interest. The white X inside the thicker red X shows the exact pixel location. (c) Final re-projection error. The black X inside the thick blue X is the exact pixel re-projection. Please zoom in electronically to view the black and white X's.
When calibrating cameras that surround an object a full 360 degrees, there are some necessary
constraints that should be enforced. The circumference of the cylinder should be an integer multiple
of the checker size. Without that property, there will be a partial checker on the back side
that can corrupt the calibration (see Fig. 3.6 (a)). Additionally, cameras located near the polar
positions relative to the cylinder will not calibrate accurately due to the flat surface or hole at
those locations (see Fig. 3.6 (b)).
Figure 3.5: Verification of calibration accuracy, here epipolar lines have been aligned along scan
lines for two views from cylinder calibration
Figure 3.6: (a) Back of a cylinder with checkers that were non-integer factors of the circumfer-
ence. (b) Polar view of cylinder with hole
Chapter 4
Mesostructure
Modeling realistic human characters is frequently done using 3D reconstructions of the shape and
appearance of real people across a set of different facial expressions [3, 46] to build blendshape
facial models. Believable characters which cross ”the uncanny valley” require high-quality ge-
ometry, texture maps, reflectance properties, and surface detail at the level of skin pores and fine
wrinkles. Unfortunately, there has not yet been a technique for recording such datasets which is
near-instantaneous and relatively low-cost.
While some facial capture techniques are instantaneous and inexpensive [5, 8], these do not
generally provide lighting-independent texture maps, specular reflectance information, or high-
resolution surface normal detail for relighting. In contrast, techniques which use multiple pho-
tographs from spherical lighting setups [19, 62] do capture such reflectance properties, but this
comes at the expense of longer capture times and complicated custom equipment.
In this chapter, we present a near-instant facial capture technique which records high-quality facial
geometry and reflectance using commodity hardware. We use a 24-camera DSLR photogramme-
try setup similar to common commercial systems
and use ten ring flash units to light the face.
DSLR facial capture photogrammetry setups can be found at The Capture Lab (http://www.capturelab.com/),
Autodesk (used in [37]), Ten24 (http://www.ten24.info/), and Infinite Realities (http://ir-ltd.net/)
However, instead of the usual process of firing all the flashes and cameras at once, each flash is
fired sequentially with a subset of the cameras, with the exposures packed milliseconds apart for a
total capture time of 78 ms, which is faster than the blink reflex [7]. This arrangement produces 24
independent specular reflection angles evenly distributed across the face, allowing a shape-from-
specularity approach to obtain high-frequency surface detail. Unlike other shape-from-specularity
techniques, our images are not taken from the same viewpoint, so we must rely on precise 3D
geometry to derive surface orientations from the specular reflections. We thus refine an initial
estimate of the facial geometry until its derived reflectance best matches the specular appearance
by performing energy minimization through a cost volume. The resulting system produces accu-
rate, pore level facial geometry and reflectance with near-instant capture in a relatively low-cost
setup that fills the capture and mesostructure elements of the high-resolution high-fidelity facial
scanning framework (see Fig. 4.1).
Figure 4.1: Mesostructure reconstruction data flow diagram
4.1 Hardware Setup and Capture Process
Our capture setup is designed to record accurate 3D geometry with both diffuse and specular
reflectance information per pixel while minimizing cost and complexity and maximizing the speed
of capture. In all, we use 24 entry-level DSLR cameras and a set of ten ring flashes arranged on a
gantry as seen in Fig. 4.2.
Figure 4.2: Facial capture setup, consisting of 24 entry-level DSLR cameras and with ten diffused
ring flashes, all one meter from the face. A set of images taken with this arrangement can be seen
in Fig. 4.7.
Camera and Flash Arrangement The capture rig consists of 24 Canon EOS 600D entry-level
consumer DSLR cameras, which record RAW mode digital images at 5202 × 3565 pixel resolution.
Using consumer cameras instead of machine vision video cameras dramatically reduces
cost, as machine vision cameras of this resolution are very expensive and require high-bandwidth
connections to dedicated capture computers. But to keep the capture near-instantaneous, we limit
the number of images each camera records to one image per capture sequence, as these entry-level
cameras require at least 0.25 seconds to reset between continuous photographs.
Since our processing algorithm determines fine-scale surface detail from specular reflections, we
wish to observe a specular highlight from the majority of the surface orientations of the face.
First, we needed to determine which angles were most important to observe. To measure the
range of normals present in a human subject’s face, we transformed the surface orientations from
several facial models (scanned in an active lighting system [19]) into polar coordinates and built
2D histograms over the range of surface orientations present (see Fig. 4.3). Not surprisingly, over
90% of the orientations fell between90
horizontally and45
vertically of straight forward.
We took this information into account when designing the flash arrangements needed to create
specular highlights for an even distribution over the range of normal directions within this space.
Fig. 4.6 shows the distribution for one of the explored arrangements as reflected by a shiny-plastic
blue ball.
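As an illustration of how such an orientation histogram could be computed (a sketch under assumed names, not the dissertation's code), the normals can be converted to azimuth and elevation angles and binned in 2D:

```python
import numpy as np

def orientation_histogram(normals, bins=90):
    """Build a 2D histogram of unit surface normals in polar coordinates.

    normals: (N, 3) array of unit vectors with +z pointing straight forward
    out of the face.  Returns the histogram, the bin edges over azimuth
    (horizontal angle) and elevation (vertical angle) in degrees, and the
    fraction of normals inside +/-90 deg by +/-45 deg."""
    azimuth = np.degrees(np.arctan2(normals[:, 0], normals[:, 2]))
    elevation = np.degrees(np.arcsin(np.clip(normals[:, 1], -1.0, 1.0)))
    hist, az_edges, el_edges = np.histogram2d(
        azimuth, elevation, bins=bins, range=[[-180, 180], [-90, 90]])
    inside = (np.abs(azimuth) <= 90) & (np.abs(elevation) <= 45)
    return hist, az_edges, el_edges, inside.mean()
```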
Figure 4.3: Surface normal distributions for four faces, covering ears, forehead, and the front of
the neck. The extents of the dotted rectangles are ±90° horizontally by ±45° vertically, each
containing more than 90% of the normals.
One way to achieve this distribution would be to place a ring flash on the lens of every camera
and position the cameras over the ideal distribution of angles. Then, if each camera fires with
its own ring flash, a specular highlight will be observed back in the direction of each camera.
However, this requires shooting each camera with its own flash in succession, lengthening the
capture process and requiring many flash units.
Instead, we leverage the specular reflection property, that the position of a specular highlight de-
pends not just on the lighting direction but also on the viewing direction, so that multiple cameras
fired simultaneously with a single flash will see different specular highlights (depending on the
half-angle between the flash and the camera). To exploit this property of specular reflections, we
arrange the 24 cameras and ten diffused Sigma EM-140 ring flashes as in Fig. 4.4 to observe 24
specular highlights evenly distributed across the face. The colors indicate which cameras (solid
circles) fire with which of the ten flashes (dotted circles) to create observations of the specular
highlights on surfaces (solid discs). For example, five cameras on the subject’s left shoot with the
”red” flash, four cameras shoot with the ”green” flash, and a single camera fires when the purple
flash fires. In this arrangement, most of the cameras are not immediately adjacent to the flash they
fire with, but they create specular reflections along a half-angle which points toward a camera that
is adjacent to the flash as shown in Fig. 4.5. The pattern of specular reflection angles observed
can be seen on a blue plastic ball in Fig. 4.6.
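The half-angle reasoning can be made concrete with a small sketch (illustrative only; the positions and names are assumptions): the surface orientation that produces a mirror-like highlight for a given flash and camera is the normalized half-vector between the light and view directions, so one flash shared by several cameras yields several distinct highlight orientations.

```python
import numpy as np

def highlight_normal(surface_point, flash_pos, camera_pos):
    """Return the surface normal that would produce a mirror-like specular
    highlight at surface_point for the given flash and camera positions:
    the normalized half-vector between the light and view directions."""
    to_light = flash_pos - surface_point
    to_camera = camera_pos - surface_point
    h = to_light / np.linalg.norm(to_light) + to_camera / np.linalg.norm(to_camera)
    return h / np.linalg.norm(h)

# One flash shared by several cameras yields several distinct highlight normals.
flash = np.array([0.3, 0.2, 1.0])
cameras = [np.array([-0.4, 0.0, 1.0]), np.array([0.0, 0.0, 1.0]),
           np.array([0.4, 0.1, 1.0])]
point = np.zeros(3)
for cam in cameras:
    print(highlight_normal(point, flash, cam))
```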
While the flashes fully discharge their light in less than 1 ms, the camera shutters can only syn-
chronize to 1/200th of a second (5 ms). When multiple cameras are fired along with a flash, we
allow for a time window of 15 ms to account for variability in the time it takes each camera to
fully open the shutter after receiving the trigger signal. In all, with the ten flashes, four of which
fire with multiple cameras, a total recording time of 78 ms (a little more than 1/15th of a sec) is
achieved. Fig. 4.4(b) shows the sequencing used to achieve this timing. By design, this is faster
than the required interval for the mechanical movement of the eyelid in the human blink reflex [7].
Additionally, the flashes have been sequenced such that those located closer to the line of vision
fire later than those along the periphery in order to delay the blink reflex as long as possible.
A faster capture time can be achieved by making the cyan and lavender self-flash cameras part
of the blue and red flash groups, respectively, allowing them to observe each other’s specular
Figure 4.4: (a) Location of the flashes (dotted circles), cameras (solid circles), and associated
specular highlight half-angles (filled dots). The grey plus sign in the center indicates the forward
sight line of the subject. The colors are for illustration; all flashes are the same white color. (b)
Firing sequence for the flashes (dotted lines) and camera exposures (solid strips).
half-angles. However, this puts half of the face in shadow and reduces the amount of information
available to the passive stereo reconstruction.
Implementation Details The one custom component in our system is a USB-programmable 80
MHz Microchip PIC32 micro-controller for triggering the cameras via the remote shutter release
input. The flashes are set to manual mode, full power, and are triggered by their corresponding
cameras via ”hot shoe”. The camera centers lie on a 1m radius sphere, framing the face using
inexpensive Canon EF 50 mm f/1.8 II lenses. We use the calibration object described in Chapter 3
to focus the cameras as well as recover the cameras’ extrinsic, intrinsic, and distortion parameters.
We also photograph an X-Rite ColorChecker Passport to convert the images to the sRGB color
space. With the flash illumination, we can achieve a large depth of field at an aperture of f/16 with
the camera at its minimal gain of ISO 100 to provide well-focused images with minimal noise.
While the cameras have built-in flashes, these were significantly less bright; additionally, they
should not be used because their Electronic Through-The-Lens (ETTL) metering process fires
short bursts of light before the main flash. The ring flashes we chose are brighter, and their
locations are easily derived from the camera calibrations. By design, there is no flash in the
subject's line of sight, and subjects reported no discomfort from the capture process. A mosaic of
the set of images produced by the system can be seen in Fig. 4.7.

Figure 4.5: Interleaved cameras and highlights: a subset of four images taken with the apparatus. The first and third cameras fire with the ”red” flash, producing specular highlights at surface normals pointing toward the first and second cameras. Likewise, the second and fourth cameras fire with the ”green” flash, producing highlights toward the third and fourth cameras. Left-to-right, the highlights progress across the face.
Cost One of the specific goals of this project was to keep cost as low as possible. It was a major
factor in several key engineering decisions, including the use of as many off-the-shelf components
as possible. The total material cost for the system can be seen in Table 4.1. Another major goal
of this project was to produce a system that is highly mobile.
Figure 4.6: (a) Twenty-four images shot with the apparatus of a shiny blue plastic ball. (b) All
24 images added together after being reprojected onto the ball’s spherical shape as seen from the
front, showing 24 evenly-spaced specular reflections from the ten flash lighting conditions. The
colored lines indicate which images correspond to which flash.
Alternate Designs Other design elements considered for the system included camera/flash ar-
rangements exploiting Helmholtz reciprocity for stereo correspondence [73], cross- and parallel-
polarized flashes, and polarizing beamsplitters.
A hardware setup that uses Helmholtz reciprocity relies on the light transport between the two
paired cameras being the same. This property forces the half-angles between the sets of lights
and cameras to be the same and keeps the specular highlights from shifting locations on the face
between the paired images. This is ideal for generating passive stereo results, as there would be
fewer outliers distorting the reconstructed geometry. However, it reduces the total number of
specular highlights visible to a set number of cameras, since both cameras in a pair observe the
same specular highlight. We found that greater coverage of specular highlights was desirable for creating accurate
reconstructions at the pore level. Costs for the Helmholtz system are comparable to the costs
associated with the proposed setup (see Table 4.1).
Light polarization provides a simple and efficient way to acquire a diffuse albedo. We explored
two setups using polarized light. First, we tested a 50 mm polarizing beamsplitter for visible
Figure 4.7: Multi-view images shot under rapidly varying flash directions. Results can be seen in
Fig. 4.18.
light (see Fig. 4.8). This configuration would have required two cameras at each flash position ar-
ranged such that one would record the parallel polarized light and the other would record the cross
polarized light. This would have reduced diffuse-specular separation to a series of image rectifi-
cations and image subtractions. However, the cost increase required (see Table 4.1) outweighed
the benefit of the simplified diffuse-specular separation algorithm.
We then explored surrounding a single camera configured to record parallel-polarized light
with three cameras configured to record cross-polarized light (see Fig. 4.9). In this configuration,
diffuse-specular separation is reduced to a reprojection and image subtraction. Even though the
cost was drastically reduced (see Table 4.1), the specular coverage was not complete enough to
yield high-quality specular albedo or normals.
Dimensions (mm): 50
Dimensional Tolerance (mm): +0.0 / -0.1
Clear Aperture (%): 90
Surface Accuracy (λ): 1/8
Surface Quality: 40-20
Beam Deviation (arcminutes): 3
Substrate: N-SF11
Design Wavelength DWL (nm): 420-680
Extinction Ratio: > 500 : 1
P-Polarization Transmission (%): > 90
S-Polarization Reflection (%): > 99

Figure 4.8: (a) Light going through a polarizing beamsplitter (b) Beamsplitter specifications (c) Visible light wavelength transmissions
4.2 Deriving Geometry and Reflectance
Our technique to process the photographs into an accurate 3D model plus maps of diffuse and
specular reflectance proceeds as follows. We first leverage passive stereo reconstruction to build
an approximate geometric mesh of the face from the photographs. We then separate the diffuse
and specular components of the photographs, and use photometric stereo to estimate diffuse pho-
tometric normals and albedo in each image. We use these images to refine the geometric mesh of
the face using a cost volume, which evaluates the consistency of the facial reflectance properties
as projected onto a series of slightly perturbed meshes. The volume gives low costs to points with
consistent diffuse color in the 24 views, and high costs to points with inconsistent specular reflection,
essentially trying to keep the specular highlights from the various half-vectors from falling
on top of each other. We then solve for the lowest-cost facial geometry using Tree-Reweighted
Message Passing [31]. From this refined geometry, we compute the final diffuse and specular
reflectance maps for the face.

Figure 4.9: (a)(b)(d) Cross-polarized ring flash images showing specular cancellation. (c) Parallel-polarized ring flash image showing enhanced specular detail
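The cost volume itself is described further in the Cost Volume Construction section; the sketch below is only schematic and is not the dissertation's implementation. The helpers project and sample_color are assumed, and per-vertex color variance across views stands in for the full diffuse-plus-specular consistency measure.

```python
import numpy as np

def build_cost_volume(base_vertices, vertex_normals, images, cameras,
                      project, sample_color, step=0.05, extent=2.5):
    """Sweep the base mesh along its normals in small increments and score
    each displaced layer by multi-view color consistency.

    project(camera, points) -> (N, 2) pixel coordinates and
    sample_color(image, pixels) -> (N, 3) colors are assumed helpers.
    step and extent are in millimeters."""
    offsets = np.arange(-extent, extent + 1e-9, step)
    cost = np.zeros((len(offsets), len(base_vertices)))
    for layer, d in enumerate(offsets):
        displaced = base_vertices + d * vertex_normals
        # Gather the color each view observes at the displaced positions.
        samples = np.stack([sample_color(img, project(cam, displaced))
                            for img, cam in zip(images, cameras)])
        # Low variance across views = photo-consistent = low cost.
        cost[layer] = samples.var(axis=0).sum(axis=-1)
    return offsets, cost
```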
Item                  Helmholtz      Beamsplitter    Filtered       Unpolarized
                      Qty  Cost      Qty  Cost       Qty  Cost      Qty  Cost
Cart ($1000)           1   $1000      1   $1000       1   $1000      1   $1000
Cameras ($599)        26   $15574    48   $28752     12   $7188     26   $15574
Lenses ($99)          26   $2574     48   $4752      12   $1188     26   $2574
Circuitry ($59)        1   $59        1   $59         1   $59        1   $59
Flashes ($379)        26   $9854     24   $9096      12   $4548     10   $3790
Beamsplitter ($850)    0   $0        24   $20400      0   $0         0   $0
Filters ($67)          0   $0        48   $3216      12   $804       0   $0
Totals                     $29061         $67275          $14787         $22997

Table 4.1: Configuration costs table
Constructing the base mesh
We begin by building a base mesh using patch-based multi-view stereo algorithms [17]†, using the
camera calibration from our calibration object and the 24 flash-lit photographs. Our images do not
all have the same lighting and contain specular reflections and shadows, neither of which is ideal
for passive stereo reconstruction. However, the system is designed to provide coverage for the
majority of normals from an average face and thus provides enough vantage points with similar
lighting conditions to build a point cloud consisting of hundreds of thousands of points. We then
employ a Poisson surface reconstruction with an oct-tree depth of nine from the point cloud that
is accurate to within a few millimeters (Fig. 4.10(a)); we call this the base mesh. We smooth
the base mesh, manually trim away extraneous surfaces (Fig. 4.10(b)), and create a minimally-distorted
4,096 × 4,096 pixel (u,v) texture map space using the commercial software UnFold3D‡.
Diffuse-Specular Separation
We next separate each of the images of the face into its diffuse and specular reflectance compo-
nents. The diffuse components will be used to provide refined matches between the views and
build a lighting-independent diffuse texture, and the specular component will be used to further
refine the geometric surface through variance minimization and to derive surface orientation de-
tail. From our color calibration and since skin is dielectric, we can assume that the RGB specular
color $\vec{s}$ in all images is (1,1,1). If we could know the diffuse color $\vec{d}$, i.e., the RGB color of
the subsurface-scattered light at a given pixel, then it would be trivial to decompose the pixel's RGB
color into its diffuse and specular components. However, due to different amounts of melanin
and hemoglobin in the face, the diffuse color varies across the face. While Mallick et al. [39]
use neighboring pixel areas to infer the diffuse color, we can, like Debevec et al. and Weyrich et
al. [10, 62], leverage the other images in our dataset.

† Similar base mesh results are obtainable with Autodesk's 123D Catch http://www.123dapp.com/catch or AGISoft's PhotoScan http://www.agisoft.ru/products/photoscan/.
‡ http://www.polygonal-design.fr/

Figure 4.10: (a) The initial base mesh from PMVS2 (b) After manual trimming (c) The refined mesh from the reflectance analysis, which has facial texture detail.
Assume we are examining a point on a surface that projects into the different views in our dataset to pixel values $\vec{p}_i = [p_i^r \; p_i^g \; p_i^b]^\top$, $i \in (1 \ldots k)$. Following Zickler et al. [71], we rotate the RGB colors into the so-called suv color space via a simple matrix transform such that the s component aligns with $\vec{s}$, yielding $[p_i^s \; p_i^u \; p_i^v]^\top$. Then the chroma intensities $p_i^{uv} = \sqrt{(p_i^u)^2 + (p_i^v)^2}$ may be employed to compute a chroma normal $\vec{n}_{uv}$ using Lambertian photometric stereo (detailed under Photometric Stereo below), as the u and v channels contain no specular highlight. As chroma information comes from light that has scattered deeply into the skin, the chroma normal map of a face has an extremely soft quality to it and is unsuitable for constructing detailed surface geometry. We therefore desire a normal map constructed from specular reflection information, motivating us to separate the s channel into diffuse and specular components.

As our dataset contains multiple illumination directions, we might use the most saturated pixel to establish a ratio of diffuse s to uv, allowing all $p_i^s$ to be separated. However, this leaves a significant amount of single-scattering reflection in the specular component, which would confound our specular analysis. Thus, we instead compute the s : uv ratio based on a blend of all the pixel values weighted by Equation 4.1, which is empirically designed to suppress specular highlights. With this ratio and the chroma surface normal, it is trivial to remove the diffuse component from all pixel values, leaving only specular highlights for each image in the set. Additionally, we can apply the inverse of the lighting equation to unlight each $\vec{p}_i$ and establish a diffuse albedo according to Eqs. 4.2 and 4.3. Fig. 4.11 shows separation results for a subsection of a facial image.

$$w_i = \left(1 - (\vec{n}_{uv} \cdot \vec{h}_i)^{10}\right)^2 \quad (4.1)$$

$$\vec{u}_i = \frac{\vec{p}_i}{\vec{n}_{uv} \cdot \vec{h}_i} \quad (4.2)$$

$$\rho_d = \frac{\sum_i \vec{u}_i w_i}{\sum_i w_i} \quad (4.3)$$

where $\vec{h}_i$ is the halfway vector between the view vector and lighting direction for $\vec{p}_i$.
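A minimal numpy sketch of this separation logic is given below, assuming the per-view pixel samples $\vec{p}_i$, the chroma normal $\vec{n}_{uv}$, and the halfway vectors $\vec{h}_i$ for one surface point are already available. The function and variable names are illustrative, not taken from the actual implementation.

```python
# Minimal sketch (illustrative, not the production code) of the suv rotation and the
# weighted diffuse estimate of Eqs. 4.1-4.3 for a single surface point.
import numpy as np

def suv_rotation():
    """Orthonormal matrix whose first row is the white specular direction (1,1,1)/sqrt(3);
    rotating RGB by this matrix gives the s, u, v channels used for the chroma normal."""
    s = np.array([1.0, 1.0, 1.0]) / np.sqrt(3.0)
    u = np.cross(s, [0.0, 0.0, 1.0]); u /= np.linalg.norm(u)
    v = np.cross(s, u)
    return np.vstack([s, u, v])

def weighted_diffuse_albedo(p, n_uv, h):
    """p: (k,3) RGB samples, n_uv: (3,) chroma normal, h: (k,3) halfway vectors."""
    n_dot_h = np.clip(h @ n_uv, 1e-4, None)       # shading term per view
    w = (1.0 - n_dot_h**10) ** 2                  # Eq. 4.1: down-weight near-specular views
    u_i = p / n_dot_h[:, None]                    # Eq. 4.2: unlight each sample
    return (w[:, None] * u_i).sum(0) / w.sum()    # Eq. 4.3: weighted diffuse albedo
```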
We then employ Blinn-Phong photometric stereo to extract detailed specular surface normals from
the specular highlight intensities.
Photometric Stereo
Given multiple observed pixel values $p_i$ of a surface point under differing illumination directions $\vec{l}_i$, it is possible to recover the surface normal $\vec{n}$ and albedo $\rho$ by leveraging certain assumptions about the reflectance properties of the surface. This process is known as photometric stereo [65]. The photometric stereo equations are presented with a distant-light assumption and light intensity $\pi$. If the actual distances $r_i$ to the light sources are known, and the intensities $I_i$ are known, then the pixel values can be adjusted to conform to these assumptions by multiplying them by $\pi r_i^2 / I_i$ before proceeding with photometric stereo.

Figure 4.11: Diffuse-specular separation. (a) Detail of original image (b) Diffuse component (c) Specular component (brightened 2×)
We review the photometric stereo equations for exposition. In the Lambertian case, the lighting equation is $L\vec{\beta} = P$, where $L = [\vec{l}_1 \; \vec{l}_2 \; \cdots \; \vec{l}_k]^\top$, $\vec{\beta} = \rho\vec{n}$, and $P = [p_1 \; p_2 \; \cdots \; p_k]^\top$. Importantly, any $i$ with $p_i = 0$ are omitted, as the lighting equation does not hold for them. The solution via pseudoinverse is:

$$\vec{\beta} = (L^\top L)^{-1} L^\top P. \quad (4.4)$$

In the Blinn-Phong case, the lighting equation is expressed in terms of halfway vectors $\vec{h}_i$ instead of lighting directions, and is more complicated: the dot product carries an exponent $\alpha$ and an associated normalization factor to conserve energy, with $\vec{v}_i$ the direction towards the viewer and $\alpha$ the Blinn-Phong exponent. The solution via pseudoinverse nonetheless has the same form:

$$\vec{\gamma} = (H^\top H)^{-1} H^\top Q. \quad (4.5)$$

For more details on how the specular photometric stereo was accomplished, please see Borom Tunwattanapong's dissertation, "Spherical Harmonic and Point Illumination Basis for Reflectometry and Relighting."
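As a concrete illustration of Eq. 4.4, the sketch below solves the Lambertian case for one surface point with a standard least-squares call. It assumes calibrated, distant, unit-intensity lights and is not the production solver.

```python
# A minimal sketch of Lambertian photometric stereo (Eq. 4.4). L is k x 3 (one light
# direction per row) and P is the k-vector of observed pixel values for one surface point;
# shadowed samples (p_i = 0) are dropped, as the lighting equation does not hold for them.
import numpy as np

def lambertian_photometric_stereo(L, P):
    valid = P > 0
    L, P = L[valid], P[valid]
    beta, *_ = np.linalg.lstsq(L, P, rcond=None)   # solves (L^T L)^-1 L^T P
    albedo = np.linalg.norm(beta)                  # rho = |beta|
    normal = beta / max(albedo, 1e-8)              # n = beta / rho
    return normal, albedo
```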
Cost Volume Construction
We build a cost volume measuring diffuse and specular photo-consistency using a face sweep algorithm on the GPU. We displace the base geometry in 50 µm (0.05 mm) increments from -2.5 mm to +2.5 mm; at each increment we calculate the reflectance properties at those locations and use them to build one layer of the cost volume.

While the majority of the cost volume is built on diffuse cost, we also add a measure of specular reflection consistency, as we want the specular reflections to plausibly belong to the same surface (see Fig. 4.12). For more specific details on the cost volume construction, please see Borom Tunwattanapong's dissertation, "Spherical Harmonic and Point Illumination Basis for Reflectometry and Relighting."
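As a rough illustration only (the actual cost terms are detailed in Tunwattanapong's dissertation), the sketch below builds a cost volume from a generic photo-consistency measure, the variance of diffuse samples across views. The caller-supplied sample_diffuse callback is hypothetical; it is assumed to project 3D points into a view and sample its diffuse (specular-removed) image.

```python
# Illustrative cost-volume sketch: one layer per displacement step along the base normals,
# scored by variance of the sampled diffuse colors across views (lower = more consistent).
import numpy as np

def build_cost_volume(verts, normals, views, sample_diffuse, step=0.05, half_range=2.5):
    offsets = np.arange(-half_range, half_range + 1e-9, step)       # displacements in mm
    cost = np.zeros((len(offsets), len(verts)))
    for layer, d in enumerate(offsets):
        pts = verts + d * normals                                    # displace base geometry
        samples = np.stack([sample_diffuse(v, pts) for v in views])  # (n_views, n_verts, 3)
        cost[layer] = samples.var(axis=0).mean(axis=-1)              # one layer of the volume
    return offsets, cost
```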
Solving the Refined Mesh with Energy Minimization
With the cost volume constructed, we next solve for a final face mesh which is most consistent with the diffuse and specular photometric observations of the face. Since we represent our refined mesh as a displacement from the base mesh, we formulate the refinement as minimizing an energy function over the displacement values $d_v$ at each vertex $v$:

$$E = \sum_v C_v(d_v) + \sum_{(v_1,v_2)} S_{v_1 v_2}(d_{v_1}, d_{v_2}), \quad (4.6)$$

Figure 4.12: Optimizing geometry from differing specular highlights. (a-c) Three adjacent specular highlights on a forehead, color-coded, illuminating different sets of surface normals. (d) The sum of the specular highlights projected onto the optimized model, fitting the highlights together like a puzzle to minimize the specular variance per pixel.
We use Kolmogorov's Convergent Tree-Reweighted Sequential Message Passing algorithm [31] with quadratic potentials to efficiently obtain a result free of discretization artifacts. As the energy in Eq. 4.6 is not quadratic, we employ an iterative sliding weighted window scheme, at each iteration fitting the following quadratic approximation:

$$E \approx \sum_v \mathrm{Qfit}_{d_v}\!\left(C_v(d_v) + \lambda(d_v - \hat{d}_v)^2\right) + \sum_{(v_1,v_2)} s_{v_1 v_2}\left(d_{v_2} - d_{v_1} - d_{v_1 v_2}\right)^2, \quad (4.7)$$

where $\lambda$ is the strength of the sliding weighted window centered around the previous iteration's result $\hat{d}_v$ (initialized to 0), $s_{v_1 v_2}$ and $d_{v_1 v_2}$ parameterize the smoothing term (see below), and $\mathrm{Qfit}_x(f(x))$ is the weighted least-squares quadratic fit to the function $f(x)$, weighted by $\exp(-f(x))$, which we found to provide suitable approximations for energy minimization. We initialize the window weight $\lambda$ to 1, doubling it after each iteration, so that the quadratic fit tightens around a minimum. We iterate the outer quadratic fitting loop 10 times, and the inner message passing loop 100 times. The smoothing term parameters are designed to penalize deviation from the photometric surface normals:
$$d_{v_1 v_2} = \frac{(\hat{x}_{v_1 v_2} - \vec{x}_{v_2})\cdot\vec{n}_{v_1 v_2}}{\vec{n}^{\,b}_{v_2}\cdot\vec{n}_{v_1 v_2}} - \frac{(\hat{x}_{v_1 v_2} - \vec{x}_{v_1})\cdot\vec{n}_{v_1 v_2}}{\vec{n}^{\,b}_{v_1}\cdot\vec{n}_{v_1 v_2}}, \quad (4.8)$$

$$\hat{x}_{v_1 v_2} = \left(\vec{x}_{v_1} + \hat{d}_{v_1}\vec{n}^{\,b}_{v_1} + \vec{x}_{v_2} + \hat{d}_{v_2}\vec{n}^{\,b}_{v_2}\right)/2, \quad (4.9)$$

$$\vec{n}_{v_1 v_2} = \vec{n}_{v_1} + \vec{n}_{v_2}, \quad (4.10)$$

$$s_{v_1 v_2} = 1 \,/\, \left|\vec{x}_{v_1} + \hat{d}_{v_1}\vec{n}^{\,b}_{v_1} - \vec{x}_{v_2} - \hat{d}_{v_2}\vec{n}^{\,b}_{v_2}\right|^2, \quad (4.11)$$

where $\vec{x}_v$ is the position of the base mesh vertex $v$ with base surface normal $\vec{n}^{\,b}_v$, and $\vec{n}_v$ is the photometric surface normal associated with the displacement having the least windowed cost $C_v(d_v) + \lambda(d_v - \hat{d}_v)^2$. The mesh refinement result respects both the cost volume and the specular surface normals, producing an accurate mesh with fine surface details.
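The Qfit operator used in Eq. 4.7 can be illustrated with a small weighted polynomial fit, sketched below under the assumption that the cost has already been sampled at the discrete displacement values of the cost volume. Note that numpy's polyfit convention applies the weights to the residuals, which is close to, but not identical to, the weighting described above.

```python
# Sketch of the Qfit operator: a weighted least-squares quadratic fit to sampled cost
# values f(d), with weights exp(-f(d)) so the fit tightens around low-cost displacements.
import numpy as np

def qfit(d, f):
    w = np.exp(-f)                          # emphasize low-energy samples
    a, b, c = np.polyfit(d, f, deg=2, w=w)  # coefficients of a*d^2 + b*d + c
    return a, b, c
```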
4.3 Discussion and Results
We used our system to acquire a variety of subjects with differing facial expressions. Figs. 4.14, 4.15, 4.16, 4.17, and 4.18 show the high-resolution geometry and several renderings under novel viewpoint and lighting conditions using our algorithm. The recovered reflectance maps used to create the faces in Fig. 4.17 are shown in Fig. 4.13. While not entirely without artifacts (which could be fixed with modest manual effort), our algorithm produces geometric quality which is competitive with more complex systems, along with reflectance maps not available from single-shot methods.
Additionally, Fig. 4.18 shows a validation rendering under a novel pose and illumination, compared against a photograph shot with an additional camera not used in solving for the facial model. Although there is some difference in the diffuse BRDF and subsurface scattering, the skin detail and specular reflections show similar placement and texture.
Limitations
Our scanning system does have some limitations. Currently, it does not perform well when hair (longer than facial stubble) is present on the face; to work around this, subjects were asked to wear a cap that pulled hair above their forehead and away from their ears (as seen on the subjects in Figs. 4.2 and 4.5). It also depends on the subject's ability to remain stationary: while the capture time is only 78 ms (quicker than the blink reflex), it is not quick enough to prevent motion blur if the subject is moving. Finally, our system accomplishes diffuse-specular separation by assuming that the light is white; any material whose albedo coincides with the white vector (i.e., gray-scale materials) fails to separate properly into diffuse and specular layers.
Figure 4.13: Recovered reflectance maps: (a) diffuse albedo (b) diffuse normals (c) specular albedo (d) specular normals
Figure 4.14: Renderings of recovered geometry and reflectance maps of a female subject. (a) The reconstructed geometry. (b), (c), and (d) Renderings with different rotations of the face lit from a static light.
Figure 4.15: Renderings of recovered geometry and reflectance maps of a female subject. (a) The reconstructed geometry. (b), (c), and (d) Renderings of the geometry and maps under different lighting conditions.
Figure 4.16: Renderings of recovered geometry and reflectance maps of a male subject.
Figure 4.17: Renderings of recovered geometry and reflectance maps of a female subject. (a) The reconstructed geometry. The maps used to create (b) can be seen in Fig. 4.13.
Figure 4.18: (a) The reconstructed geometry. (b) A rendering in a novel configuration. (c) Validation photograph shot from an additional camera at the same time as the 24-camera capture. (d) Validation rendering under similar viewpoint and lighting.
Additionally, some extreme parts of the face may not be visible to enough lights for the photometric stereo solver to resolve. The algorithms are designed to degrade gracefully when this occurs; the degradation path flows from specular to diffuse to geometric normals (see Fig. 4.19).

Figure 4.19: Normal maps degrade from (a) specular to (b) diffuse to (c) geometric normals in the absence of enough lighting conditions.
Comparison to Other Techniques
Laser Scanning Systems   A laser scanning system produces results at the resolution shown in Fig. 4.20. While these systems are able to produce very high-quality results, their cost ($75K) is several times that of the proposed system. Laser scanning systems are also not ideal for sites or projects that require mobility. Capture times for laser scanners are in the range of 15 seconds, which may be impractical for expressions that are difficult for a subject to hold. Additionally, laser scanning systems are unable to recover reflectance properties without the addition of a camera array that captures the subject at the same time, and, as with Beeler et al. [5], the reflectance provided by cameras configured to work with laser systems amounts only to diffuse-specular mixed maps.

Passive Systems   Fig. 4.10 shows a comparison of our technique to results obtained from standard passive stereo techniques [17]. It is clear that our technique produces geometry with the resolution of pores and fine wrinkles, while standard techniques that use the same number of cameras and the same framing are only able to provide low-to-medium frequency shapes and are unable to reconstruct any of the reflectance properties required for relighting.
Figure 4.20: (a) Geometry from Steinbichler Comet L9D 5m with 75 mm lens (b) Geometry produced from flash system
As stated earlier, the method of Beeler et al. [5] hallucinates high-frequency features onto the geometry by looking at the brightness values in the albedo and applying a "dark is deep" heuristic. Both techniques provide the pore and wrinkle structures absent from standard passive stereo techniques. However, Beeler's technique pushes geometry inward wherever the albedo is dark, which may not be correct for bumps and the crests of wrinkles. Additionally, Beeler's method cannot produce a complete set of reflectance maps for relighting purposes; their system produces a flatly lit diffuse-specular mixed albedo.

Active Systems   Light stage systems provide the highest quality photometric scanning capabilities to date. Fig. 4.21 compares our results to those published by Ghosh et al. [19]. Both methods do a comparable job recovering the required reflectance properties and geometry. The major difference is that light stage systems are able to recover a per-channel diffuse normal while our flash system is limited to a single diffuse chroma normal.
Figure 4.21: (a) Geometry from light stage (b) Geometry produced from flash system, (c) light stage diffuse albedo (d) light stage diffuse normals (e) light stage specular albedo (f) light stage specular normals (g) flash system diffuse albedo (h) flash system diffuse normals (i) flash system specular albedo (j) flash system specular normals
Chapter 5
Microstructure
The way skin reflects light is influenced by age, genetics, health, emotion, sun exposure, substance
use, and skin treatments. The importance of skin reflectance is underscored by the worldwide
cosmetics industry which sells a myriad of products to achieve specific skin appearance results.
As virtual human characters become increasingly prevalent in linear and interactive storytelling,
the need for measuring, modeling, and rendering the subtleties of light reflection from skin also
becomes increasingly important.
In addition to the work presented in Chapter 4, Hanrahan et al., Jensen et al., Donner et al., and
D’Eon et al. have made great strides in simulating the scattering of light beneath skin [11, 12, 23,
29]. Somewhat less attention has been given to surface reflection. While the shine of our skin
— the specular reflection from the epidermal cells of the stratum corneum — is a small fraction
of the incident light, its lack of scattering provides a clear indication of skin’s surface shape and
condition. And under concentrated light sources, the narrower specular lobe produces highlights
which can dominate appearance, especially for smoother, oilier, or darker skin.
Sections of this chapter have been published as a part of Measurement-Based Synthesis of Facial Microgeometry
[22].
Current face scanning techniques [5, 38, 62] as well as high-resolution scanning of facial casts
(e.g. [67], [1]) provide sub-millimeter precision recording of facial mesostructure. Nonetheless,
the effect of surface roughness continues to shape specular reflection at the level of the microstruc-
ture [54] — surface texture at the scale of microns. The current absence of such microstructure
may be a reason why digital humans can still appear uncanny in close-ups, which are the shots
most responsible for conveying the thought and emotion of a character.
This chapter presents a synthesis approach for increasing the resolution of mesostructure-level
facial scans using surface microstructure digitized from skin samples about the face. We digitize
the skin patches using macro photography and polarized gradient illumination [38] at approxi-
mately 10 micron precision. Additionally, we make point-source reflectance measurements to
characterize the specular reflectance lobes at this smaller scale and analyze facial reflectance
variation at both the mesostructure and microstructure scales. We then employ a constrained
texture synthesis algorithm based on Image Analogies [27] to synthesize appropriate surface mi-
crostructure per-region, blending the regions to cover the entire face. We show that renderings
made with microstructure-level geometry and reflectance models preserve the original scanned
mesostructure and exhibit surface reflection which is significantly more consistent with real pho-
tographs of the face. Finally, we describe a microstructure database that provides high-resolution
microstructure patches suitable for increasing the resolution of a mesostructure scan in a cross-
subject fashion. The algorithms described in this chapter, along with the microstructure database, are the building blocks for the microstructure portion of the high-resolution, high-fidelity facial scanning framework (see Fig. 5.1).
Figure 5.1: Microstructure synthesis data flow diagram
5.1 Recording Skin Microstructure
Acquisition
We record the microstructure of skin patches using one of two systems to create polarized gradient illumination. For both, we stabilize the skin patch relative to the camera by having the subject place their skin against a 24 mm × 16 mm aperture in a thin metal plate. In addition, to ensure that the skin moves as little as possible, we adhere the skin to the metal plate using double-sided tape. This plate is firmly secured 30 cm in front of the sensor of a Canon 1D Mark III camera with a Canon 100 mm macro lens stopped down to f/16, just enough depth of field to image the sample when properly focused. The lens achieves 1:1 macro magnification, so each pixel of the 28 mm × 19 mm sensor images about seven square microns (7 µm²) of skin.
Our small capture system (Fig. 5.2(a)) is a 12-light dome (half of a deltoidal icositetrahedron)
similar to those used for acquiring Polynomial Texture Maps [41], with the addition that each
light can produce either of two linear polarization conditions. The polarization pattern is similar
to that of Ghosh et al. [19], except that the polarization orientation of each light is specifically optimized for a single camera viewpoint. The difference between images acquired under parallel- and cross-polarized states records the polarization-preserving reflectance of the sample, attenuating subsurface reflectance. In approximately two seconds, we acquire polarized gradient illumination conditions to record surface normals. At this scale, motion can be caused by a variety of factors including, but not limited to, breathing and blood flow. We compensate for any subject motion using the joint photometric alignment technique of Wilson et al. [63]. For BRDF fitting, we additionally capture a single-light image in both parallel- and cross-polarized lighting conditions.

Figure 5.2: Acquisition setups for skin microgeometry. (a) 12-light hemispherical dome (at the end of the camera lens) capturing a cheek patch. (b) LED sphere capturing the tip of the nose, with the camera inside the sphere.
For especially smooth or oily skin patches, the 12 light positions can produce separated specular
highlights, which can bias surface normal measurement. To address this, for some subjects, we
placed the macro photography camera and metal aperture frame inside the same 2.5m-diameter
polarized LED sphere used for facial scanning (Fig. 5.2(b)). While the camera occludes some light
directions from reaching the sample, the hemispherical coverage of the incident light directions
is denser and more complete than the 12-light setup, allowing the gradient illumination to yield
well-conditioned surface normal measurements for all patches we tested. Since a single LED light
source was not bright enough to illuminate the sample for specular BRDF observation, a pair of
horizontally and vertically polarized camera flashes (Canon Speedlite 580EX II) were fired to
record the point-light condition from close to the normal direction. The camera mounts in both
setups were reinforced to mechanically eliminate vibrations and flexing which would blur the
captured imagery.
Surface Normal and BRDF Estimation
For each patch we compute a per-pixel surface orientation map from the polarized gradients, as
well as specular and subsurface albedo maps. Fig. 5.3 shows the geometry of five skin patches
digitized for two subjects, including regions for the forehead, temple, cheek, nose, and chin. Due
to the flat nature of the skin patches, we only visualize the x and y components of the surface
normals with yellow and cyan colors respectively. Note that the skin microstructure is highly
differentiated across both individuals and facial regions.
Using the polarization difference point-lit image, we also tabulate a specular lobe shape and sin-
gle scattering model parameters [20]. With light pressure, the skin protrudes slightly through the
metal aperture frame, providing a slightly convex surface which exhibits a useful range of surface
normals for BRDF estimation. See Table 5.1 for the measured microscale BRDFs and Table 5.2 for the mesoscale BRDFs. This technique also estimates the single-scattering albedo as the difference between the observed polarization-preserving reflectance and the average hemispherical specular reflectance of a dielectric surface with index of refraction η = 1.33, which is about 0.063.
Figs. 5.4 and 5.5 show skin patch samples and validation renderings made using the estimated subsurface albedo, specular albedo, specular normals, and specular BRDF, showing close visual matches of the model to the photographs. At this scale, where much of the surface roughness variation is evident geometrically, we found that a single two-lobe specular BRDF estimate was sufficient over each sample, and that variation in the reflectance parameter fits was quite modest (see Table 5.1) compared to the differences observed at the mesostructure scale (see Table 5.2). (The BRDF fitting was accomplished by Borom Tunwattanapong and is included for completeness.)

Figure 5.3: Measured skin patches from different facial regions of two subjects: (a) forehead, (b) temple, (c) cheek, (d) nose, (e) chin. (Top two rows) Caucasian male subject (Subject 1). (Bottom two rows) Asian female subject (Subject 2). (Rows one and three) Surface normals. (Rows two and four) Displacements.

Figure 5.4: (a) Parallel-polarized photograph of a forehead skin patch of a male subject, lit slightly from above. (b) Validation rendering of the patch under similar lighting using the surface normals, specular albedo, diffuse albedo, and single scattering maps estimated from the 12-light dome, showing visual similarity. (c,d) Similar images from a different light source direction.

Figure 5.5: (a) Parallel-polarized photograph of a cheek patch for a female subject, lit slightly from above. (b) Validation rendering of the cheek patch lit from a similar direction using reflectance maps estimated from the LED sphere, showing a close match. (c,d) Corresponding images from the subject's nose patch, also showing a close match.
5.2 Facial Microstructure Synthesis
From the skin microstructure samples, we employ constrained texture synthesis to generate skin
microstructure for an entire face. To do this, we use the surface mesostructure evident in a full
facial scan to guide the texture synthesis process for each facial region, and then merge the syn-
thesized facial regions into a full map of the microstructure.
Figure 5.6: Facial UV map segmented into 8 regions: Red-Forehead, Yellow-Temple, Cyan-
Cheek, Orange-Chin, Green-Nose, Magenta-Eyes, Black-Eyebrows and Lips.
We begin with full facial scans recorded using a multi-view polarized gradient illumination technique [19], which produces an ear-to-ear polygon mesh of approximately five million polygons, 4K (4096 × 4096 pixel) diffuse and specular albedo maps, and a world-space normal map. We believe our technique could also work with other high-resolution facial capture techniques [5, 67]. We create the texture coordinate space for the facial scan using the commercial product Unfold3D in a way which best preserves surface area and orientation with respect to the original scan. This allows us to assume that the relative scale and orientation of the patches is constant with respect to the texture space; if this were not the case, the constrained texture synthesis could be adapted to an anisometric texture synthesis technique (such as that of Lefebvre et al. [33]).
We transform the normal map to tangent space by calculating the rotation matrix that rotates the geometric normal to the $\vec{Z}$ vector and applying it to the world-space normal. An artist segments this map into forehead, temple, nose, cheek, and chin regions (Fig. 5.6), ensuring that enough overlap exists between regions so they can be blended together using linear interpolation. We then use multi-resolution normal integration to construct, for each region, the displacement map which best agrees with the normal map.
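The dissertation uses a multi-resolution normal integration; as a simple stand-in for readers, the sketch below shows the classic single-scale Fourier-domain (Frankot-Chellappa) integration of a tangent-space normal map into a displacement map. Heights come out in pixel units under this assumption.

```python
# Hedged sketch: Frankot-Chellappa integration of a tangent-space normal map (H x W x 3,
# z pointing out of the surface) into a displacement map; not the multi-resolution
# integrator used in this work.
import numpy as np

def integrate_normals(n):
    nz = np.clip(n[..., 2], 1e-3, None)
    p, q = -n[..., 0] / nz, -n[..., 1] / nz          # surface gradients dz/dx, dz/dy
    h, w = p.shape
    u, v = np.meshgrid(np.fft.fftfreq(w) * 2 * np.pi,
                       np.fft.fftfreq(h) * 2 * np.pi)
    P, Q = np.fft.fft2(p), np.fft.fft2(q)
    denom = u**2 + v**2
    denom[0, 0] = 1.0                                # avoid divide-by-zero at DC
    Z = (-1j * u * P - 1j * v * Q) / denom
    Z[0, 0] = 0.0                                    # zero-mean displacement
    return np.real(np.fft.ifft2(Z))
```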
Figure 5.7: Microstructural detail added to a scanned facial region, B, using the analogous relationship between an exemplar microstructure map, A′, and a blurred version of it, A, which matches the mesostructural detail of B. The synthesized result with both mesostructure and microstructure is B′. These displacement maps are from the temple region of a female subject.
To synthesize appropriate skin microstructure over the mesostructure present in our facial scans, we employ constrained texture synthesis in the framework of Image Analogies [27], which synthesizes an image B′ from an image B following the relationship of a pair of example images A and A′. In our case, B is the mesostructure-level displacement map for a region of the face, such as the forehead or cheek (see Fig. 5.7). Our goal is to synthesize B′, a higher-resolution version of this region exhibiting appropriate microstructure. We form the A to A′ analogy by taking an exemplar displacement map of the microstructure of a related skin patch to be A′. The exemplar patch A′ typically covers less than a square centimeter of skin surface, but at about ten times the resolution of the mesostructure detail. To form A, we blur the exemplar A′ with a Gaussian filter to suppress the high frequencies not present in the input mesostructure map B; A thus represents the mesostructure of the exemplar. The synthesis process produces the output surface shape B′, with both mesostructure and microstructure, given the input mesostructure B and the pair of exemplars relating mesostructure A to its microstructure A′, by searching for per-pixel matches starting in the upper left-hand corner of B and proceeding left to right.
As values from A′ replace values in B, it is possible for the low-to-medium frequency mesostructure of B to be unintentionally replaced with mesostructure from A′. To ensure preservation of mesostructure details in the scan data, we run a high-pass filter on both the input and the exemplar displacement maps. This separates the low-to-medium frequency features from the high-frequency features, allowing the medium-frequency features present in the high pass of B to guide the synthesis of the high-frequency features from A′. We then recombine the frequencies to obtain the final result.
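A minimal sketch of this frequency separation and recombination is shown below; the Gaussian sigma is an assumed placeholder, not the value used in this work.

```python
# Minimal sketch of the frequency separation used to protect the scanned mesostructure:
# high-pass both the input and exemplar displacement maps before matching, then add the
# synthesized high frequencies back onto the original low/medium-frequency base.
import numpy as np
from scipy.ndimage import gaussian_filter

def high_pass(disp, sigma=8.0):
    low = gaussian_filter(disp, sigma)
    return disp - low, low               # (high-frequency detail, low/medium base)

def recombine(base_low, synthesized_high):
    return base_low + synthesized_high   # final map keeps the original mesostructure
```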
Our synthesis process employs a BestApproximateMatch based on an approximate-nearest-neighbor (ANN) search and a BestCoherenceMatch based on Ashikhmin [4]. The two match types are required to ensure that the output produced is consistent with the mesostructure (enforced by the BestApproximateMatch), yet provides a coherent surface (enforced by the BestCoherenceMatch). We introduce a weighting parameter α to control the relative importance of a pixel neighborhood in A′ matching B′ compared to the importance (1 − α) of the match between the corresponding pixel neighborhood in B and A. We found this to be a useful control parameter because certain visually important details that exist in A′ can be entirely absent in the lower-frequency mesostructure maps A and B. We found that an α value around 0.5 produced good results.
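The sketch below illustrates how such an α-weighted neighborhood distance could be scored; the function and argument names are illustrative, not the actual implementation.

```python
# Sketch of the alpha-weighted neighborhood distance: a candidate exemplar pixel is scored
# partly by how well A' matches the already-synthesized B' neighborhood, and partly by how
# well the corresponding A neighborhood matches B.
import numpy as np

def match_cost(nbr_A, nbr_Ap, nbr_B, nbr_Bp, alpha=0.5):
    fine_term   = np.sum((nbr_Ap - nbr_Bp) ** 2)   # microstructure agreement (A' vs B')
    coarse_term = np.sum((nbr_A  - nbr_B ) ** 2)   # mesostructure agreement  (A  vs B )
    return alpha * fine_term + (1.0 - alpha) * coarse_term
```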
We carry out the texture synthesis in a multi-resolution fashion, increasing the window size for the neighborhood matching from a 5 × 5 pixel window at the lowest level (4K resolution) to a 13 × 13 pixel window at the highest level (16K resolution) to match the increase in size of features at each level of the synthesis. We also apply principal component analysis (PCA) to speed up the synthesis, using it to reduce the dimensionality of the search space to n, where n × n is the original pixel window. PCA reduced the synthesis time by a factor of three without any qualitative decrease in the result. We employ equal weighting of BestApproximateMatch and BestCoherenceMatch by setting the coherence parameter k = 1 in the synthesis process. It is important to take sufficient care with edge and corner cases during the synthesis, or melting artifacts can occur (see Fig. 5.8); to ensure a smooth synthesis, we left an n/2 + 1 pixel unsynthesized border around the top and sides. Since the specular albedo map is highly correlated with the surface mesostructure and microstructure, we also synthesize a 16K specular albedo map as a by-product of the microstructure displacement synthesis process by borrowing the corresponding pixels from the specular albedo exemplar.

Figure 5.8: Melting artifact (highlighted in red) that can occur during synthesis if edge cases are not handled correctly.
We also explored some additional methods for the BestApproximateMatch. Initially, we synthesized using tangent normal maps instead of displacement maps. While this technique showed some promise, it proved to be a much slower solution since the dimensionality of the search space increased threefold. Additionally, we performed an exhaustive search at the lowest resolution using a 5 × 5 matching window; then, at higher resolutions, we searched for matches in a 25 × 25 window around the previous resolution's match. This technique produced acceptable results at the lower resolutions, but quality drastically deteriorated as the resolution increased. Finally, an exhaustive search was attempted at all levels of the resolution pyramid, but it also proved to be computationally intractable for the size of images necessary to produce super-high-resolution textures for close-up renderings.
This process can benefit from computational speedup through multiprocessing. However, since each pixel depends on the previously synthesized pixel, to achieve an increase in speed we split each region into square tiles. We found that 1.1K × 1.1K tiles provide enough area to produce consistent microstructure in a reasonable amount of time. Additionally, each tile was given a 100-pixel overlap to merge it into the facial regions and account for the unsynthesized regions around the border. However, since half of the matching process (BestCoherenceMatch) requires results from an upper-left window of pixels, this algorithm does not lend itself to a simple GPU implementation.
5.3 Discussion and Results
Creating Renderings
Once we have synthesized microstructure-level displacement maps for a face, we can create renderings from the diffuse albedo, specular albedo, and single-scattering coefficients using any standard rendering technique. To generate the renderings in this section, we use a local specular reflection model with two lobes per skin region, estimated as described in Section 5.1. For efficiency, the subsurface reflection is simulated using a hybrid normals rendering technique [38] from the gradient illumination data of the full facial scan, though in practice a true scattering simulation would be preferable. Single scattering, estimated from the exemplars, is also rendered [20]. We upscaled the original scanned data to fill in the regions where we did not synthesize microstructure (lips, neck, eyebrows, etc.). For the upper eyelids, we synthesized microstructure using the measured forehead microstructure exemplar.
Figure 5.9: (a) Rendering with scanned mesostructure (4K displacement map). (b) Rendering with synthesized microstructure (16K displacement map). (c) Photograph under flash illumination. The digitized skin patches used for microstructure synthesis can be seen in Fig. 5.3, Subject 2.
Image Sizes and Re-sampling
In the rendering process, the subsurface albedo and subsurface normal maps remain at the original 4K resolution of the facial scan, as does the polygon geometry. The synthesized 16K microstructure displacement map is converted to a normal map for rendering and used in conjunction with the 16K synthesized specular albedo map. To avoid artifacts from normal map re-sampling or aliasing, full-face renderings are created using an OpenGL GPU shader into a large 16K (16384 × 16384 pixel) half-float frame buffer, and then resized to 4K using radiometrically linear pixel blending, requiring approximately 1 GB of GPU memory.
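For completeness, a small sketch of converting a displacement map to a tangent-space normal map is shown below; the texel size is an assumed parameter in the same units as the displacement values.

```python
# Sketch: convert a synthesized displacement map into a tangent-space normal map for
# rendering, assuming square texels of the given size.
import numpy as np

def displacement_to_normals(disp, texel_size=1.0):
    dzdy, dzdx = np.gradient(disp, texel_size)
    n = np.dstack([-dzdx, -dzdy, np.ones_like(disp)])
    n /= np.linalg.norm(n, axis=-1, keepdims=True)
    return n   # per-pixel tangent-space normals, z pointing out of the surface
```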
Fig. 5.9 shows a high-resolution point-light rendering of a female subject using a synthesized 16K microstructure displacement map (b) compared to using just the 4K mesostructure displacement map from the original scan (a), as well as a reference photograph under flash illumination (c). The 16K rendering includes more high-frequency specular detail, and better exhibits skin's characteristic effect of isolated "glints" far away from the center of the specular highlight. A similar result is shown in Fig. 5.10, where a point-light rendering of a male subject's forehead using synthesized microstructure is a better match to a validation photograph than the rendering of the original scan with mesostructure detail.

Figure 5.10: Renderings of original facial scan with mesostructure detail (a), and with synthesized microstructure (b), compared to a photograph under flash illumination (c). See Fig. 5.3, Subject 1 for the skin patches used in microstructure synthesis.
Fig. 5.11 shows displacement maps (top row), normal maps (x and y components only, middle row), and point-light renderings (bottom row) of a male forehead region generated with different synthesis processes. Fig. 5.11(a) shows a region from an original mesostructure-only scan, with no synthesis to add microstructure detail. The specular reflection, rendered with the corresponding mesoscale BRDF fit, is quite smooth as a result, and the skin reflection is not especially realistic. Fig. 5.11(b) shows the result of our microstructure synthesis process using an exemplar skin patch measurement from the same subject (the forehead of Subject 1 in Fig. 5.3(a)). The specular reflection, rendered with a microscale BRDF fit, is broken up and shows greater surface detail, while the mesostructure of the forehead crease is preserved. Fig. 5.11(c) shows the result of using a forehead patch from a different male subject as the exemplar for adding microstructure. Although the fine skin texture is different, the synthesized geometry and rendering are still very plausible, suggesting cross-subject microstructure synthesis to be a viable option.
Figure 5.11: Microstructure synthesis with different exemplars and constraints. The top row shows displacement maps, the middle row shows normal maps, and the bottom row shows point-light renderings using the maps. (a) Original mesostructure. (b) With microstructure synthesized from the same subject. (c) With microstructure synthesized from a different subject. (d) Without constraining the synthesis to match the underlying mesostructure.
Fig. 5.11(d) tests the importance of the mesostructure constraints during texture synthesis. This column was generated by setting the α parameter to 1.0, ignoring mesostructure matching constraints in the matching process, and then blindly embossing the synthesized detail onto the mesostructure of the original scan. Fig. 5.11(b), however, synthesizes detail in a way which tries to follow the mesostructure, so pores and creases in the scan will tend to draw upon similar areas in the microstructure exemplar for their detail. As a result, the constrained synthesis, Fig. 5.11(b), produces a more plausible result which better reinforces the scanned mesostructure than the unconstrained synthesis (Fig. 5.11(d)).

Table 5.1 shows specular BRDF lobe fits for different skin patches across two subjects measured using our microstructure skin patch measurement setups. Table 5.2 presents comparison Beckmann distribution fits for similar facial regions obtained at the mesostructure scale from a face scan. Table 5.3 shows the results of a cross-verification done by low-pass filtering the skin patches and measuring the BRDF fits at a "mesostructure scale." The parameter w is the weight of the convex combination of the two lobes m1 and m2. This work was done using the methods outlined in Ghosh et al. [20] and is included for completeness.
As can be seen in Tables 5.1 and 5.2, the BRDF lobes estimated at the microstructure scale exhibit reduced specular roughness compared to the mesostructure-scale BRDF estimates, as well as significantly less variation across skin patches. This agrees with the theory that, at sufficiently high resolution, surface microstructure variation is responsible for the appearance of specular roughness. Table 5.3 confirms that low-pass filtering the microstructure also results in BRDF fits with wider roughness, similar to the mesostructure-scale BRDF fits.
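The two-lobe model reported in the tables can be evaluated as a convex combination of two Beckmann lobes, as in the sketch below; note that the full specular BRDF also includes Fresnel and geometry terms not shown here.

```python
# Sketch: evaluate the two-lobe Beckmann distribution from Tables 5.1-5.3, where w blends
# a lobe of roughness m1 with a lobe of roughness m2 (parameter names follow the tables).
import numpy as np

def beckmann(cos_h, m):
    c2, m2 = cos_h**2, m**2
    return np.exp((c2 - 1.0) / (c2 * m2)) / (np.pi * m2 * c2**2)

def two_lobe_beckmann(cos_h, m1, m2, w):
    return w * beckmann(cos_h, m1) + (1.0 - w) * beckmann(cos_h, m2)

# Example: Subject 1 forehead fit at the microscale (Table 5.1), 10 degrees off-specular
d = two_lobe_beckmann(np.cos(np.radians(10.0)), m1=0.150, m2=0.050, w=0.88)
```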
Fig. 5.12 shows comparison renderings of the small patch of forehead shown in Fig. 5.11 at different scales of modeling. The original scanned data with mesostructure detail and mesoscale BRDF fit results in a broad specular reflection that misses the sharp "glints" (a). Rendering the
Description Subject 1 Subject 2
forehead m1=0.150, m2=0.050, w=0.88 m1=0.150, m2=0.050, w=0.60
temple m1=0.150, m2=0.075, w=0.55 m1=0.175, m2=0.050, w=0.80
cheek m1=0.150, m2=0.125, w=0.60 m1=0.100, m2=0.075, w=0.50
nose m1=0.100, m2=0.075, w=0.80 m1=0.100, m2=0.050, w=0.50
chin m1=0.125, m2=0.100, w=0.90 m1=0.150, m2=0.050, w=0.75
Table 5.1: Microscale two-lobe Beckmann distribution parameters obtained for the different skin patches across the two subjects of Fig. 5.3
Description Subject 1 Subject 2
forehead m1=0.250, m2=0.125, w=0.85 m1=0.250, m2=0.125, w=0.80
temple m1=0.225, m2=0.125, w=0.80 m1=0.225, m2=0.150, w=0.70
cheek m1=0.275, m2=0.200, w=0.60 m1=0.225, m2=0.150, w=0.50
nose m1=0.175, m2=0.100, w=0.65 m1=0.150, m2=0.075, w=0.80
chin m1=0.250, m2=0.150, w=0.35 m1=0.300, m2=0.225, w=0.15
Table 5.2: Mesoscale two-lobe Beckmann distribution parameters obtained for different facial regions across two subjects
Description Subject 1 Subject 2
forehead m1=0.175, m2=0.150, w=0.60 m1=0.225, m2=0.100, w=0.60
temple m1=0.200, m2=0.125, w=0.95 m1=0.225, m2=0.200, w=0.30
cheek m1=0.225, m2=0.200, w=0.60 m1=0.275, m2=0.125, w=0.45
nose m1=0.175, m2=0.125, w=0.95 m1=0.125, m2=0.050, w=0.85
chin m1=0.225, m2=0.200, w=0.75 m1=0.325, m2=0.150, w=0.70
Table 5.3: Cross validation of microscale two-lobe distribution done at mesoscale resolution
scanned mesoscale surface detail with a microscale BRDF fit results in a qualitative improvement
in the result at this scale of visualization (b). However, the specular reflection still appears a bit
too smooth to be realistic. Rendering with synthesized microstructure in conjunction with the
microscale BRDF fit achieves the best qualitative rendering result (c).
Figs. 5.13, 5.14, and 5.15 provide renderings of faces with synthesized microstructure at 16K
resolution. The renderings were created at 16K and filtered down to 4K for presentation. Here,
we simulate a "parallel polarized" point lighting condition in order to accentuate the observed
specular highlights. Additionally, there are insets of different zoomed regions to emphasize the
added microstructure detail present in the renderings.
Figure 5.12: Rendering of original scan data (geometry + mesoscale BRDF fit) (a), compared to rendering the scanned geometry with a microscale BRDF fit (b), and rendering with synthesized microstructure and a microscale BRDF fit (c).
Since faces are often scanned in a variety of expressions, this technique may not synthesize con-
sistently across expressions. The difference in microstructure can be observed as skin stretches
and contracts (Fig. 5.16) in the course of expression formation. If proper care is not taken to
ensure temporal and spatial consistency across expressions in an animation, the microstructure will
Figure 5.13: Subject one rendered from 16K displacement maps with synthesized microstructure
Figure 5.14: Subject two rendered from 16K displacement maps with synthesized microstructure
Figure 5.15: Subject three rendered from 16K displacement maps with synthesized microstructure
cause the high-frequency glinting to flicker from frame to frame. One solution to this is to sacri-
fice some fidelity of the synthesis and use a procedural microstructure generation approach (see
Fig. 5.17) [2].
Figure 5.16: (a) 15 mm × 10 mm forehead patch from a neutral expression, with marked reference point. (b) The same forehead patch during a raised-eyebrows expression, exhibiting strongly anisotropic microstructure, with submillimeter furrows.

Figure 5.17: Rendering of a cheek from a male subject with sinusoidal (procedural) microstructure synthesis compared to rendering without microstructure.
5.4 Microgeometry Database
To increase the quality of mesostructure scans of subjects who are unable to sit for a microstructure scan, a microstructure database has been captured and cataloged. This database contains a set of microstructure samples from subjects across a variety of ages, genders, and skin types (see Table 5.4). The dataset was captured using a Ximea xiQ USB 3.0 camera (model MQ042MG-CM) and a Nikon AF Micro 105mm 1:2.8 D macro lens inside the larger sphere of LEDs. The lens aperture was set to f/16 to ensure proper depth of field. The Ximea uses a monochrome sensor, which provides higher effective resolution than a color sensor, as it removes the need to debayer RGB pixels; however, it does prevent us from preparing a color diffuse sample for the subjects. The large capture device was used for all subjects to account for any subjects with highly specular skin and to maintain consistency across the database. When used in conjunction with the cross-subject transfer described in Section 5.3, detail from any mesostructure scan can be increased to the desired quality.

Each entry in the database contains maps of the specular albedo and photometric specular normals for the major regions of the face (forehead, temple, nose, cheek, and chin; see Fig. 5.18). A full-face, flatly-lit photograph is included for matching skin types to future subjects. The database can be downloaded upon request from the University of Southern California Institute for Creative Technologies Graphics Lab.
Figure 5.18: Microstructure database examples for one facial region. (a) Flatly lit full-face photograph (b) Specular albedo (c) Specular normals
Table 5.4: Microgeometry database entries
Chapter 6
Future Work
The work presented in this dissertation answers many questions concerning the creation of high-resolution, high-fidelity facial models. However, the results from these techniques raise some interesting new questions in calibration and in super-high-resolution facial scanning and animation. This chapter highlights future research efforts that would increase realism or remove assumptions currently being made.
Mesostructure
Currently, the surfaces of the eyes do not reconstruct well, due in part to the geometric disparity between the diffuse reflection of the iris and the specular reflection of the cornea; detecting eyes from the sharp specularities and modeling them specifically would be of interest. Modeling facial hair [6] would expand the utility of the system. It would also be useful to provide "roots" from which to grow hair from the scalp as a way of adding hair styles to the model, and this may allow for scanning the back of a subject's head. Our data also contains small amounts of inter-reflection, most apparent around the edges of the nose; solving for and correctly compensating for these inter-reflections would further improve the realism of our system.
We are not currently exploiting all of the reflectance cues present in our data. We do not yet
solve for a specular roughness term; however, the high-resolution surface detail allows much of
the spatially-varying skin BRDF to be exhibited directly from the geometry. Using reflectance
sharing [72], it could be possible to derive improved diffuse and specular BRDFs of the skin.
Also, the shadow transitions seen in the data could be analyzed to solve for subsurface scattering
parameters for certain areas of the face.
Since the number of lighting conditions is small, the technique could, in principle, be applied
to dynamic facial performances. It would require using optical flow to bring the streams into
temporal alignment with video cameras synchronized to alternating light sources. Such a system
would require custom LED lighting and machine vision cameras transitioning between lighting conditions at 60 frames per second, which would significantly increase the cost.
We would also like to determine if there were a single lighting condition composed of multiple
flashes that could be separated into individual flash components. This would increase the speed
at which our system is able to record subjects.
If the commercial off-the-shelf requirement is dropped, there are two additional hardware configurations that would be interesting to explore.

The first would consist of a series of linear light sources that illuminate the face, building upon the work of Gardner et al. [18]. By arranging lights in a grid around the face and then placing cross- and parallel-polarized cameras such that the specular highlights shift in a manner parallel to the light source, it might be possible to recover a larger portion of the face in less time.

The second hardware configuration would build out a semi-spherical lighting dome, similar to a light stage device. With custom lighting, it would be possible to capture the spherical gradient illumination necessary to compute photometric stereo. However, we would continue to enforce the one-shot-per-camera constraint to keep the speed of the capture below 70 ms. This could be accomplished with a segmented camera array, one segment for each lighting condition.
Microstructure
Our work in microgeometry enhancement provides a sample-based approach for increasing the resolution of a mesostructure scan; the current procedural alternative is based on sinusoidal functions, which produce visible patterns upon close inspection. It would be of interest to devise a purely statistical method for creating results of the same quality.
Additionally, building a statistical model for how the microgeometry behaves under different
stresses would greatly improve photo-realism for facial animations. By measuring dynamic mi-
crofacet distributions of skin to analyze the changes in surface BRDF under stretching and com-
pression, we could then model dynamic surface reflectance by modifying the roughness parame-
ters of a microfacet distribution model in accordance with facial deformations in an animation.
Chapter 7
Conclusion
This dissertation presents a framework for high-resolution, high-fidelity, inexpensive facial scan-
ning. The framework combines the speed and cost of passive lighting scanning systems with
the fidelity of active lighting systems and includes the use of a new cylindrical calibration object
which allows for smooth capture sessions and permits a large volume of subjects to be captured
sequentially. First, we capture the subject in milliseconds using modern cameras that provide
microsecond exposure times and flash photography. This data set is processed to produce a base
mesh from passive stereo. We then use the base mesh to build a cost volume of 0.05 mm displace-
ments. The cost volume is cut using a message passing technique to produce a mesostructure
scale geometry and reflectance maps consistent with the input photographs. We show that the
mesostructure resolution can be increased to a micron scale using constrained texture synthesis
and sample skin patches from about the face. Finally, a microgeometry skin patch database is
provided for cross subject transfer when samples from the intended subject are not available.
BIBLIOGRAPHY
[1] Gino Acevedo, Sergei Nevshupov, Jess Cowely, and Kevin Norris. An accurate method
for acquiring high resolution skin displacement maps. In ACM SIGGRAPH 2010 Talks,
SIGGRAPH ’10, pages 4:1–4:1, New York, NY , USA, 2010. ACM.
[2] Oleg Alexander, Graham Fyffe, Jay Busch, Xueming Yu, Ryosuke Ichikari, Paul Graham,
Koki Nagano, Andrew Jones, Paul Debevec, Joe Alter, Jorge Jimenez, Etienne Danvoye,
Bernardo Antionazzi, Mike Eheler, Zybnek Kysela, Xian-Chun Wu, and Javier von der
Pahlen. Digital Ira: High-resolution facial performance playback. In ACM SIGGRAPH
2013 Computer Animation Festival, SIGGRAPH ’13, pages 1–1, New York, NY , USA,
2013. ACM.
[3] Oleg Alexander, Mike Rogers, William Lambeth, Jen-Yuan Chiang, Wan-Chun Ma, Chuan-
Chang Wang, and Paul Debevec. The Digital Emily Project: Achieving a photoreal digital
actor. IEEE Computer Graphics and Applications, 30:20–31, July 2010.
[4] Michael Ashikhmin. Synthesizing natural textures. In Proceedings of the 2001 symposium
on Interactive 3D graphics, I3D ’01, pages 217–226, New York, NY , USA, 2001. ACM.
[5] Thabo Beeler, Bernd Bickel, Paul Beardsley, Bob Sumner, and Markus Gross. High-quality
single-shot capture of facial geometry. ACM Trans. Graph., 29:40:1–40:9, July 2010.
[6] Thabo Beeler, Bernd Bickel, Gioacchino Noris, Paul Beardsley, Steve Marschner, Robert W.
Sumner, and Markus Gross. Coupled 3d reconstruction of sparse facial hair and skin. ACM
Trans. Graph., 31(4):117:1–117:10, July 2012.
[7] Edward O. Bixler, Neil R. Bartlett, and Robert W. Lansing. Latency of the blink reflex and
stimulus intensity. Perception & Psychophysics, 2(11):559–560, 1967.
[8] Derek Bradley, Wolfgang Heidrich, Tiberiu Popa, and Alla Sheffer. High resolution passive
facial performance capture. ACM Trans. Graph., 29:41:1–41:10, July 2010.
[9] T.B. Chen, M. Goesele, and H. P. Seidel. Mesostructure from specularities. In CVPR, pages
1825–1832, 2006.
[10] Paul Debevec, Tim Hawkins, Chris Tchou, Haarm-Pieter Duiker, Westley Sarokin, and Mark
Sagar. Acquiring the reflectance field of a human face. In Proceedings of the 27th An-
nual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH ’00, pages
145–156, New York, NY , USA, 2000. ACM Press/Addison-Wesley Publishing Co.
[11] Eugene D’Eon and Geoffrey Irving. A quantized-diffusion model for rendering translucent
materials. In ACM SIGGRAPH 2011 papers, SIGGRAPH ’11, pages 56:1–56:14, New York,
NY , USA, 2011. ACM.
[12] Craig Donner and Henrik Wann Jensen. Light diffusion in multi-layered translucent materi-
als. ACM TOG, 24(3):1032–1039, 2005.
[13] Alexei A. Efros and William T. Freeman. Image quilting for texture synthesis and trans-
fer. In Proceedings of the 28th annual conference on Computer graphics and interactive
techniques, SIGGRAPH ’01, pages 341–346, New York, NY , USA, 2001. ACM.
[14] Alexei A. Efros and Thomas K. Leung. Texture synthesis by non-parametric sampling.
In Proceedings of the International Conference on Computer Vision-Volume 2 - Volume 2,
ICCV ’99, pages 1033–, Washington, DC, USA, 1999. IEEE Computer Society.
[15] W.T. Freeman, T.R. Jones, and E.C. Pasztor. Example-based super-resolution. Computer
Graphics and Applications, IEEE, 22(2):56 –65, mar/apr 2002.
[16] Yasutaka Furukawa and Jean Ponce. Dense 3D motion capture for human faces. In Proc. of
CVPR 09, 2009.
[17] Yasutaka Furukawa and Jean Ponce. Accurate, dense, and robust multiview stereopsis. IEEE
Trans. Pattern Anal. Mach. Intell., 32(8):1362–1376, August 2010.
[18] Andrew Gardner, Chris Tchou, Tim Hawkins, and Paul Debevec. Linear light source reflec-
tometry. In ACM TOG, pages 749–758, 2003.
[19] Abhijeet Ghosh, Graham Fyffe, Borom Tunwattanapong, Jay Busch, Xueming Yu, and Paul
Debevec. Multiview face capture using polarized spherical gradient illumination. ACM
Trans. Graphics (Proc. SIGGRAPH Asia), 30(6), 2011.
[20] Abhijeet Ghosh, Tim Hawkins, Pieter Peers, Sune Frederiksen, and Paul Debevec. Prac-
tical modeling and acquisition of layered facial reflectance. ACM Trans. Graphics (Proc.
SIGGRAPH Asia), 27(5):139:1–139:10, December 2008.
[21] Aleksey Golovinskiy, Wojciech Matusik, Hanspeter Pfister, Szymon Rusinkiewicz, and
Thomas Funkhouser. A statistical model for synthesis of detailed facial geometry. ACM
TOG, 25(3):1025–1034, 2006.
[22] Paul Graham, Borom Tunwattanapong, Jay Busch, Xueming Yu, Andrew Jones, Paul De-
bevec, and Abhijeet Ghosh. Measurement-based synthesis of facial microgeometry. Com-
puter Graphics Forum, 32(2pt3):335–344, 2013.
[23] Pat Hanrahan and Wolfgang Krueger. Reflection from layered surfaces due to subsurface
scattering. In Proceedings of SIGGRAPH 93, pages 165–174, 1993.
[24] Chris Harris and Mike Stephens. A combined corner and edge detector. In Proc. of Fourth
Alvey Vision Conference, pages 147–151, 1988.
[25] Arno Hartholt, Jonathan Gratch, Anton Leuski, Louis-Philippe Morency, Stacy C. Marsella,
Matt Liewer, Prathibha Doraiswamy, Lori Weiss, Kim LeMasters, Edward Fast, Ramy
Sadek, Andrew Marshall, Jina Lee, Marcus Thiebaux, and Andreas Tsiartas. At the virtual
frontier: Introducing gunslinger, a multi- character, mixed-reality, story-driven experience.
In Proceedings of the 9th International Conference on Intelligent Virtual Agents (IVA), Am-
sterdam, The Netherlands, 2009.
[26] David J. Heeger and James R. Bergen. Pyramid-based texture analysis/synthesis. In Pro-
ceedings of the 22nd annual conference on Computer graphics and interactive techniques,
SIGGRAPH ’95, pages 229–238, New York, NY , USA, 1995. ACM.
[27] Aaron Hertzmann, Charles E. Jacobs, Nuria Oliver, Brian Curless, and David H. Salesin.
Image analogies. In Proceedings of the 28th annual conference on Computer graphics and
interactive techniques, SIGGRAPH ’01, pages 327–340, New York, NY , USA, 2001. ACM.
[28] Tomoaki Higo, Yasuyuki Matsushita, Neel Joshi, and Katsushi Ikeuchi. K.: A hand-held
photometric stereo camera for 3-d modeling. In In: Proc. ICCV ., 2009.
[29] Henrik Wann Jensen, Stephen R. Marschner, Marc Levoy, and Pat Hanrahan. A practical
model for subsurface light transport. In Proceedings of ACM SIGGRAPH 2001, pages 511–
518, 2001.
[30] Micah K. Johnson, Forrester Cole, Alvin Raj, and Edward H. Adelson. Microgeometry
capture using an elastomeric sensor. ACM Trans. Graph., 30:46:1–46:8, August 2011.
[31] Vladimir Kolmogorov. Convergent tree-reweighted message passing for energy minimiza-
tion. IEEE Trans. Pattern Anal. Mach. Intell., 28:1568–1583, October 2006.
[32] Sylvain Lefebvre and Hugues Hoppe. Parallel controllable texture synthesis. In ACM SIG-
GRAPH 2005 Papers, SIGGRAPH ’05, pages 777–786, New York, NY , USA, 2005. ACM.
[33] Sylvain Lefebvre and Hugues Hoppe. Appearance-space texture synthesis. ACM Trans.
Graph., 25:541–548, July 2006.
[34] Jongwoo Lim, J. Ho, Ming-Hsuan Yang, and D. Kriegman. Passive photometric stereo from
motion. In Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on,
volume 2, pages 1635–1642 Vol. 2, 2005.
[35] Ce Liu, Heung-Yeung Shum, and Chang-Shui Zhang. A two-step approach to hallucinating
faces: global parametric model and local nonparametric model. In Computer Vision and
Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society
Conference on, volume 1, pages I–192 – I–198 vol.1, 2001.
[36] Z. Liu, Z. Zhang, and Y . Shan. Image-based surface detail transfer. Computer Graphics and
Applications, IEEE, 24(3):30 –35, may-june 2004.
[37] Linjie Luo, Hao Li, and Szymon Rusinkiewicz. Structure-aware hair capture. ACM Trans.
Graph., 32(4), July 2013.
[38] Wan-Chun Ma, Tim Hawkins, Pieter Peers, Charles-Felix Chabert, Malte Weiss, and Paul
Debevec. Rapid acquisition of specular and diffuse normal maps from polarized spherical
gradient illumination. In Rendering Techniques, pages 183–194, 2007.
[39] Satya P. Mallick, Todd Zickler, Peter N. Belhumeur, and David J. Kriegman. Specularity
removal in images and videos: A pde approach. In ECCV, 2006.
[40] Satya P. Mallick, Todd E. Zickler, David J. Kriegman, and Peter N. Belhumeur. Beyond
lambert: Reconstructing specular surfaces using color. In CVPR, 2005.
[41] Tom Malzbender, Dan Gelb, and Hans Wolters. Polynomial texture maps. In Proceedings of
the 28th annual conference on Computer graphics and interactive techniques, SIGGRAPH
’01, pages 519–528, New York, NY , USA, 2001. ACM.
[42] Stephen R. Marschner, Stephen H. Westin, Adam Arbree, and Jonathan T. Moon. Measuring
and modeling the appearance of finished wood. ACM Trans. Graph., 24:727–734, July 2005.
[43] J. J. Moré, D. C. Sorensen, K. E. Hillstrom, and B. S. Garbow. The MINPACK project. In
Sources and Development of Mathematical Software, pages 88–111, 1984.
[44] M. Mori, K.F. MacDorman, and N. Kageki. The uncanny valley [from the field]. Robotics
Automation Magazine, IEEE, 19(2):98–100, 2012.
[45] S. Nayar, X. Fang, and T. Boult. Separation of reflection components using color and
polarization. IJCV, 21(3):163–186, 1997.
[46] Frédéric Pighin, Jamie Hecker, Dani Lischinski, Richard Szeliski, and David H. Salesin.
Synthesizing realistic facial expressions from photographs. In Proceedings of the 25th Annual
Conference on Computer Graphics and Interactive Techniques, SIGGRAPH ’98, pages 75–84,
New York, NY, USA, 1998. ACM.
[47] Ganesh Ramanarayanan and Kavita Bala. Constrained texture synthesis via energy
minimization. IEEE Transactions on Visualization and Computer Graphics, 13:167–178,
January 2007.
[48] Jessica C. Ramella-Roman. Out of plane polarimetric imaging of skin: Surface and
subsurface effect. In Wojtek J. Bock, Israel Gannot, and Stoyan Tanev, editors, Optical
Waveguide Sensing and Imaging, NATO Science for Peace and Security Series B: Physics
and Biophysics, pages 259–269. Springer Netherlands, 2008. doi:10.1007/978-1-4020-6952-9_12.
[49] Albert Rizzo, Eric Forbell, Belinda Lange, John Galen Buckwalter, Josh Williams, Kenji
Sagae, and David Traum. SimCoach: an online intelligent virtual agent system for
breaking down barriers to care for service members and veterans. In Healing War Trauma: A
Handbook of Creative Approaches, pages 238–250. Routledge, November 2012.
[50] Holly Rushmeier, Gabriel Taubin, and André Guéziec. Applying shape from lighting
variation to bump map capture. In Rendering Techniques, pages 35–44, 1997.
[51] Noah Snavely, Steven M. Seitz, and Richard Szeliski. Photo tourism: exploring photo
collections in 3D. In ACM SIGGRAPH 2006 Papers, SIGGRAPH ’06, pages 835–846. ACM,
2006.
[52] Tomás Svoboda, Daniel Martinec, and Tomás Pajdla. A convenient multicamera
self-calibration for virtual environments. Presence: Teleoper. Virtual Environ., 14(4):407–422,
August 2005.
[53] Xin Tong, Jingdan Zhang, Ligang Liu, Xi Wang, Baining Guo, and Heung-Yeung Shum.
Synthesis of bidirectional texture functions on arbitrary surfaces. ACM Trans. Graph.,
21:665–672, July 2002.
[54] K. E. Torrance and E. M. Sparrow. Theory of off-specular reflection from roughened
surfaces. J. Opt. Soc. Am., 57:1104–1114, 1967.
[55] Roger Tsai. An efficient and accurate camera calibration technique for 3d machine vision.
In Proceedings CVPR, pages 364–374. IEEE, 1986.
[56] Norimichi Tsumura, Nobutoshi Ojima, Kayoko Sato, Mitsuhiro Shiraishi, Hideto Shimizu,
Hirohide Nabeshima, Syuuichi Akazaki, Kimihiko Hori, and Yoichi Miyake. Image-based
skin color and texture analysis/synthesis by extracting hemoglobin and melanin information
in the skin. ACM TOG, 22(3):770–779, 2003.
[57] Borom Tunwattanapong, Graham Fyffe, Paul Graham, Jay Busch, Xueming Yu, Abhijeet
Ghosh, and Paul Debevec. Acquiring reflectance and shape from continuous spherical
harmonic illumination. ACM Trans. Graph., 32(4):109:1–109:12, July 2013.
[58] Levi Valgaerts, Chenglei Wu, Andrés Bruhn, Hans-Peter Seidel, and Christian Theobalt.
Lightweight binocular facial performance capture under uncontrolled lighting. ACM
Transactions on Graphics (Proceedings of SIGGRAPH Asia 2012), 31(6):187:1–187:11,
November 2012.
[59] Lujin Wang and Klaus Mueller. Generating sub-resolution detail in images and volumes
using constrained texture synthesis. In Proceedings of the conference on Visualization ’04,
VIS ’04, pages 75–82, Washington, DC, USA, 2004. IEEE Computer Society.
[60] Li-Yi Wei, Sylvain Lefebvre, Vivek Kwatra, and Greg Turk. State of the art in example-
based texture synthesis. In Eurographics 2009, State of the Art Report, EG-STAR.
Eurographics Association, 2009.
[61] Li-Yi Wei and Marc Levoy. Texture synthesis over arbitrary manifold surfaces. In
Proceedings of the 28th annual conference on Computer graphics and interactive techniques,
SIGGRAPH ’01, pages 355–360, New York, NY, USA, 2001. ACM.
[62] Tim Weyrich, Wojciech Matusik, Hanspeter Pfister, Bernd Bickel, Craig Donner, Chien Tu,
Janet McAndless, Jinho Lee, Addy Ngan, Henrik Wann Jensen, and Markus Gross. Analysis
of human faces using a measurement-based skin reflectance model. ACM TOG, 25(3):1013–
1024, 2006.
[63] Cyrus A. Wilson, Abhijeet Ghosh, Pieter Peers, Jen-Yuan Chiang, Jay Busch, and Paul
Debevec. Temporal upsampling of performance geometry using photometric alignment.
ACM Trans. Graph., 29:17:1–17:11, April 2010.
[64] L.B. Wolff. Using polarization to separate reflection components. In Computer Vision and
Pattern Recognition, 1989. Proceedings CVPR ’89., IEEE Computer Society Conference on,
pages 363–369, 1989.
[65] R. J. Woodham. Photometric stereo: A reflectance map technique for determining surface
orientation from image intensity. In Proc. SPIE’s 22nd Annual Technical Symposium,
volume 155, 1978.
[66] Chenglei Wu, Kiran Varanasi, Yebin Liu, Hans-Peter Seidel, and Christian Theobalt.
Shading-based dynamic shape refinement from multi-view video under general illumination.
In Proceedings of the 2011 International Conference on Computer Vision, ICCV ’11,
pages 1108–1115, 2011.
[67] XYZRGB. 3D laser scanning - XYZ RGB Inc. http://www.xyzrgb.com/.
[68] Lexing Ying, Aaron Hertzmann, Henning Biermann, and Denis Zorin. Texture and shape
synthesis on surfaces. In Proceedings of the 12th Eurographics Workshop on Rendering
Techniques, pages 301–312, London, UK, 2001. Springer-Verlag.
[69] Zhengyou Zhang. A flexible new technique for camera calibration. PAMI, 22(11):1330–
1334, 2000.
[70] Shuang Zhao, Wenzel Jakob, Steve Marschner, and Kavita Bala. Building volumetric
appearance models of fabric using micro CT imaging. ACM Trans. Graph., 30:44:1–44:10,
August 2011.
[71] Todd Zickler, Satya P. Mallick, David J. Kriegman, and Peter N. Belhumeur. Color
subspaces as photometric invariants. Int. J. Comput. Vision, 79(1):13–30, August 2008.
[72] Todd Zickler, Ravi Ramamoorthi, Sebastian Enrique, and Peter N. Belhumeur. Reflectance
sharing: Predicting appearance from a sparse set of images of a known shape. PAMI,
28(8):1287–1302, 2006.
[73] Todd E. Zickler, Peter N. Belhumeur, and David J. Kriegman. Helmholtz stereopsis:
Exploiting reciprocity for surface reconstruction. Int. J. Comput. Vision, 49(2-3):215–227,
2002.