Data-Driven 3D Hair Digitization
Liwen Hu
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(Department of Computer Science)
August 2019
To My Family
Acknowledgements
I would like to thank Prof. Hao Li for being my advisor throughout my six years at USC. His
guidance and inspiration have been invaluable to my Ph.D. work, as well as to my future career.
I would like to thank my dissertation committee members, Prof. Aiichiro Nakano and Prof. Jeff
Watson, as well as Prof. Andrew Nealen, for all their help and discussions.
Furthermore, I would like to thank Chongyang Ma, who guided the beginning of my research
career and influenced me throughout my Ph.D. years. I regard him as my second advisor.
I would like to thank Linjie Luo, who introduced me to my research area.
Over the summers and past years, I have had the fortune to intern with many researchers: Thabo
Beeler, Derek Bradley, Duygu Ceylan, Byungmoon Kim, Zhili Chen, and Jens Fursund. I have
been fortunate to work with and learn from these successful researchers.
Special thanks to my roommate and collaborator Cosimo Wei for his friendship and for being a free
Uber driver.
Thanks to my close collaborators, who made this thesis possible.
Last, I would like to dedicate this thesis to my parents for their support and encouragement.
Abstract
Human hair presents highly convoluted structures and spans an extraordinarily wide range of
hairstyles. It is essential for the digitization of compelling virtual avatars, but it is also one of
the most challenging components to create. This dissertation proposes several data-driven methods to
model high-quality 3D hairstyles.
First, we introduce a data-driven hair capture framework based on example strands generated
through hair simulation. Our method can robustly reconstruct faithful 3D hair models from
unprocessed input point clouds with large amounts of outliers. Current state-of-the-art techniques
use geometrically inspired heuristics to derive global hair strand structures, which can yield
implausible hair strands for hairstyles involving large occlusions, multiple layers, or wisps of
varying lengths. We address this problem using a voting-based fitting algorithm to discover
structurally plausible configurations among the locally grown hair segments from a database
of simulated examples. To generate these examples, we exhaustively sample the simulation
configurations within the feasible parameter space constrained by the current input hairstyle.
The number of necessary simulations can be further reduced by leveraging symmetry and
constrained initial conditions. The final hairstyle can then be structurally represented by a limited
number of examples. To handle constrained hairstyles, such as a ponytail, for which realistic
simulation is more difficult, we allow the user to sketch a few strokes to generate strand
examples through an intuitive interface. Our approach focuses on robustness and generality.
Since our results are structurally plausible by construction, our method provides improved control
during hair digitization and avoids implausible hair synthesis for a wide range of hairstyles.
Second, we propose a data-driven method to automatically reconstruct braided hairstyles
from input data obtained from a single consumer RGB-D camera. Our approach covers
the large variation of repetitive braid structures using a family of compact procedural braid
models. From these models, we produce a database of braid patches and use a robust random
sampling approach for data fitting. We then recover the input braid structures using a
multi-label optimization algorithm and synthesize the intertwining hair strands of the braids. We
demonstrate that minimal capture equipment is sufficient to effectively capture a wide range
of complex braids with distinct shapes and structures.
We then introduce a novel data-driven framework that can digitize complete and highly complex
3D hairstyles from a single-view photograph. We first construct a large database of manually
crafted hair models from several online repositories. Given a reference photo of the target
hairstyle and a few user strokes as guidance, we automatically search for multiple best matching
examples from the database and combine them consistently into a single hairstyle to form the
large-scale structure of the hair model. We then synthesize the final hair strands by jointly
optimizing for the projected 2D similarity to the reference photo, the physical plausibility
of each strand, as well as the local orientation coherency between neighboring strands. We
demonstrate the effectiveness and robustness of our method on a variety of hairstyles and
challenging images, and compare our system with state-of-the-art hair modeling algorithms.
Later, we present a fully automatic framework that digitizes a complete 3D head with hair from
a single unconstrained image. Our system offers a practical and consumer-friendly end-to-end
solution for avatar personalization in gaming and social VR applications. The reconstructed
models include secondary components (eyes, teeth, tongue, and gums) and provide animation-
friendly blendshapes and joint-based rigs. While the generated face is a high-quality textured
mesh, we propose a versatile and efficient polygonal strips (polystrips) representation for the
hair. Polystrips are suitable for an extremely wide range of hairstyles and textures and are
compatible with existing game engines for real-time rendering. In addition to integrating
state-of-the-art advances in facial shape modeling and appearance inference, we propose a
novel single-view hair generation pipeline, based on 3D-model and texture retrieval, shape
refinement, and polystrip patching optimization. The performance of our hairstyle retrieval is
enhanced using a deep convolutional neural network for semantic hair attribute classification.
Our generated models are visually comparable to state-of-the-art game characters designed
by professional artists. For real-time settings, we demonstrate the flexibility of polystrips in
handling hairstyle variations, as opposed to conventional strand-based representations.
Finally, we represent the manifold of 3D hairstyles implicitly through a compact latent space of a
volumetric variational autoencoder (VAE). This deep neural network is trained with volumetric
orientation field representations of 3D hair models and can synthesize new hairstyles from a
compressed code. To enable end-to-end 3D hair inference, we train an additional regression
network to predict the codes in the VAE latent space from any input image. Strand-level
hairstyles can then be generated from the predicted volumetric representation. Our fully
automatic framework does not require any ad-hoc face fitting, intermediate classification and
segmentation, or hairstyle database retrieval. Our hair synthesis approach is significantly more
robust and can handle a much wider variety of hairstyles than state-of-the-art data-driven
hair modeling techniques on challenging inputs, including photos that are low-resolution,
noisy, or contain extreme head poses. The storage requirements are minimal, and a 3D hair
model can be produced from an image within a second.
Table of contents
List of figures
List of tables
1 Introduction
1.1 3D Hair Digitization in Computer Graphics
1.2 3D Hair Representations
1.2.1 Strands
1.2.2 Strips
1.2.3 Volume
1.3 Thesis Overview
2 Related Work
2.1 3D Hair Modeling
2.2 Multi-View Hair Capture
2.3 Single-View Hair Capture
2.4 Hair Simulation
2.5 3D Deep Learning
3 Robust Hair Capture Using Simulated Examples
3.1 Background
3.2 Overview of Our Approach
3.3 Strand Simulation
3.4 Strand Fitting
3.5 Strand Synthesis
3.6 Results
3.6.1 Comparison
3.6.2 Simulation
4 Capturing Braided Hairstyles
4.1 Background
4.2 Overview of Our Approach
4.3 Procedural Model
4.4 Patch Fitting
4.5 Structure Analysis
4.6 Strand Synthesis
4.7 Results
5 Single-View Hair Modeling Using A Hairstyle Database
5.1 Overview of Our Approach
5.2 Database Construction
5.3 Modeling Large-Scale Structure
5.3.1 Example Retrieval
5.3.2 Hairstyle Combination
5.3.3 Structure Editing
5.4 Modeling Strand Shapes
5.4.1 2D Orientations
5.4.2 Depth Estimation
5.4.3 3D Shapes
5.5 Results
5.5.1 Hairstyles
5.5.2 Comparisons
5.5.3 Evaluations
6 Avatar Digitization From a Single Image For Real-Time Rendering
6.1 Overview of Our Approach
6.1.1 Image Pre-Processing
6.1.2 Face Digitization
6.1.3 Hair Digitization
6.1.4 Rigging and Animation
6.2 Face Digitization
6.2.1 3D Head Modeling
6.2.2 Face Texture Reconstruction
6.2.3 Secondary Components
6.3 Hair Digitization
6.3.1 Hairstyle Database
6.3.2 Hair Attribute Classification
6.3.3 Hairstyle Matching
6.3.4 Hair Mesh Fitting
6.3.5 Polystrip Patching Optimization
6.3.6 Hair Rendering and Texturing
6.4 Results
6.4.1 Evaluation
6.4.2 Comparison
7 3D Hair Synthesis Using Volumetric Variational Autoencoders
7.1 Overview of Our Approach
7.2 Hair Data Representation
7.3 Volumetric Variational Autoencoder
7.4 Hair Regression Network
7.5 Networks Training Data
7.6 Post-Processing
7.7 Results
8 Conclusion and Future Work
References
List of figures
1.1 Hairstyles and identity
1.2 3D hair representations
1.3 2D volumetric representation
2.1 3D hair modeling
2.2 Multi-view hair capture
2.3 Single-view hair capture
2.4 Hair simulation
3.1 Initial point cloud
3.2 Teaser
3.3 The overview of our system
3.4 Preprocessing steps
3.5 The simulation setup
3.6 Illustration of strand fitting
3.7 Ribbon connection and strand synthesis
3.8 Capture setup
3.9 Reconstruction results using examples from user strokes
3.10 Comparison
3.11 Simulation
3.12 Our reconstruction results
4.1 Making braids
4.2 Teaser
4.3 Overview of our system pipeline
4.4 Database preparation
4.5 Comparison between fitting with rigid ICP and non-rigid ICP
4.6 Illustration of our structure analysis algorithm
4.7 Adding fuzziness [20] over the final output strands
4.8 Our capture setups
4.9 Comparisons with state-of-the-art hair capture methods
4.10 Comparisons with different example patches
4.11 Comparisons with different estimated scales for the example patch in the database
4.12 Reconstruction results of different braided hairstyles
5.1 Teaser
5.2 Preprocessing of an example hairstyle
5.3 Retrieving and combining multiple hairstyles
5.4 Editing the structure of a hairstyle with a cutting tool
5.5 Illustration of 2D deformation
5.6 Fit with piecewise helix curves
5.7 Linear blending skinning for a cluster of hair strands
5.8 Comparison with state-of-the-art sketch-based hair modeling method
5.9 Comparison with 3D hair capture
5.10 Modeling results for the same reference photo using different sets of user strokes
5.11 Single-view modeling results of various hairstyles from reference photographs
6.1 Teaser
6.2 Overview
6.3 Hair segmentation
6.4 Facial modeling pipeline
6.5 Texture completion with visibility
6.6 Hair mesh digitization pipeline
6.7 Hair mesh fitting pipeline
6.8 Polystrip Patching Optimization
6.9 Polystrip textures
6.10 Robustness
6.11 Importance of our deep learning-based hair attribute classification
6.12 Real-time hair simulation
6.13 Comparison with several state-of-the-art avatar creation systems
6.14 Comparison with AutoHair
6.15 Results
7.1 Teaser
7.2 2D volumetric representation
7.3 Volumetric hairstyle representation
7.4 Our pipeline overview
7.5 Some training samples in our dataset for the hair regression network
7.6 Some additional training samples in our dataset for the VAE network
7.7 Modeling results of 3D hairstyle from single input image
7.8 Comparisons between our method with AutoHair [16] using the same input images
7.9 Comparisons between our method with the state-of-the-art avatar digitization method [43] using the same input images
7.10 Comparison between strands interpolation [101] result and our latent space interpolation result
List of tables
6.1 Accuracies of hair classification network
7.1 Our volumetric VAE architecture
Chapter 1
Introduction
Hairstyle is an important aspect of a person's identity. One of the easiest ways to
change your appearance is to change your hairstyle (Figure 1.1). Just like in real life, hairstyles
are essential elements for digital characters, reflecting personality, fashion, as well as one’s
cultural and social background. The generation of compelling 3D hair models in film or game
production usually takes several weeks of manual work by a digital artist, involving the use
of sophisticated hair modeling tools and procedural scripting. The task becomes even more
tedious when capturing digital doubles, since the hair has to accurately match real references
from photographs.
Unlike the 3D acquisition of objects with smooth surfaces (e.g., faces and bodies), which can
usually take advantage of generic shape priors to regularize the reconstruction, hair acquisition
lacks such a prior: it is generally very difficult to find a generic shape prior for hair because of
the wide range of possible hairstyles. Specialized hair capture techniques are required because
hair consists of extremely fine, intricate, and self-occluding structures.
1.1 3D Hair Digitization in Computer Graphics
Recently, the 3D acquisition of human hair has become an active research area in computer
graphics in order to make the creation of digital humans more efficient, automated, and
cost-effective. While high-end hair capture techniques based on specialized hardware [3, 29, 39,
50, 65, 74, 104] can already produce high-quality 3D hair models, they can only operate in
well-controlled studio environments. More consumer-friendly techniques, such as those that
only require a single input image [15, 16, 41, 43, 85], are becoming increasingly popular
and important as they can facilitate the mass adoption of new 3D avatar-driven applications,
including personalized gaming, communication in VR [60, 72, 93], and social media apps
[31, 48, 69, 78]. Cutting-edge single-view hair modeling methods all rely on a large database of
hundreds of 3D hairstyles, which is used as a shape prior for further composition and refinement,
in order to handle the complex structure and variations of possible hairstyles.
Figure 1.1 Hairstyle is an essential element of personality. Original image courtesy of Mashable.
1.2 3D Hair Representations
In computer graphics, 3D hair representations can be divided into three categories,
as shown in Figure 1.2.
Figure 1.2 Different hair representations: strands, strips, and volume.
1.2.1 Strands
The most common way to represent hair is with strands, which mimic real-world hair fibers.
In practice, a 3D hairstyle model is usually represented by 100k–200k strands. Each strand
is a sequence of 3D vertices ordered from root to tip. We use strands to represent hair in
Chapter 3 and Chapter 5.
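As a concrete illustration of this representation, the sketch below stores a strand as an (N, 3) NumPy array ordered from root to tip and resamples it uniformly by arc length, a step that is useful whenever strands must share a fixed vertex count. It is a minimal sketch under our own assumptions (NumPy, and the hypothetical helpers `strand_length` and `resample_strand`), not code from this thesis.

```python
import numpy as np


def strand_length(strand: np.ndarray) -> float:
    """Total arc length of a strand stored as an (N, 3) array ordered root to tip."""
    return float(np.linalg.norm(np.diff(strand, axis=0), axis=1).sum())


def resample_strand(strand: np.ndarray, num_vertices: int) -> np.ndarray:
    """Resample a strand to `num_vertices` points spaced uniformly by arc length.

    The root (first vertex) and the tip (last vertex) are preserved. Assumes the
    strand has no zero-length segments, so cumulative arc length is increasing.
    """
    seg = np.linalg.norm(np.diff(strand, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])  # arc length at each input vertex
    t = np.linspace(0.0, s[-1], num_vertices)     # target arc-length positions
    # Interpolate x, y, z independently as functions of arc length.
    return np.stack([np.interp(t, s, strand[:, k]) for k in range(3)], axis=1)


# A hairstyle is then a list of such arrays (100k-200k strands in practice).
strand = np.array([[0.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, -1.0, 2.0]])
print(strand_length(strand))  # 3.0
print(resample_strand(strand, 4))
```

Keeping each strand as a plain array makes per-strand operations vectorized; a full hairstyle could equally be packed into one (S, N, 3) array once all strands are resampled to the same N.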
1.2.2 Strips
Another representation for hair is the strip, which is particularly suitable for real-time rendering and
integration with existing game engines. Formally, a strip is a 2D grid of 3D vertices $R_i^j$, where
$i = 0, 1, \ldots, L$ and $j = 0, 1, \ldots, W$; $L$ is the length of the strip and $W$ its width. The isocurves
$R^j$ along the length define the orientation of the strip, and the isocurve $R^{W/2}$ is defined as the center
isocurve of the strip. Combined with an appropriate texture, a strip can simulate a cluster of hair
strands. We use strips in Chapter 6.
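The indexing above can be made concrete with a small sketch. It assumes NumPy, stores $R_i^j$ as an array `R[i, j]` of shape (L+1, W+1, 3), and builds a strip by offsetting a center curve along a fixed side vector. `make_strip` and the other helpers are hypothetical; they only illustrate the definitions, not the polystrip construction used in Chapter 6.

```python
import numpy as np


def make_strip(center: np.ndarray, side: np.ndarray, width: float, W: int) -> np.ndarray:
    """Build a strip grid R[i, j] of shape (L + 1, W + 1, 3) around a center curve.

    Hypothetical construction: isocurve R^j is the (L + 1, 3) center curve shifted
    by a constant offset along the unit `side` vector, so R^{W/2} is the center curve.
    """
    side = side / np.linalg.norm(side)
    offsets = np.linspace(-0.5 * width, 0.5 * width, W + 1)  # one offset per j
    return center[:, None, :] + offsets[None, :, None] * side[None, None, :]


def isocurve(R: np.ndarray, j: int) -> np.ndarray:
    """Isocurve R^j: vertices R_i^j for i = 0..L, running along the strip length."""
    return R[:, j, :]


def center_isocurve(R: np.ndarray) -> np.ndarray:
    """Center isocurve R^{W/2}; requires an even W so that W/2 is a grid index."""
    W = R.shape[1] - 1
    if W % 2 != 0:
        raise ValueError("W must be even for R^{W/2} to lie on the grid")
    return isocurve(R, W // 2)


def tangents(curve: np.ndarray) -> np.ndarray:
    """Unit tangents between consecutive vertices, i.e. the strip's local orientation."""
    d = np.diff(curve, axis=0)
    return d / np.linalg.norm(d, axis=1, keepdims=True)


# A straight strip hanging along -y, L = 4 segments long and W = 4 cells wide.
center = np.stack([np.zeros(5), -np.arange(5.0), np.zeros(5)], axis=1)
R = make_strip(center, np.array([1.0, 0.0, 0.0]), width=1.0, W=4)
print(R.shape)  # (5, 5, 3)
print(tangents(center_isocurve(R))[0])
```

Storing the strip as a dense grid keeps isocurve extraction a simple slice; a real polystrip would additionally twist and bend the side direction along the length.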
Figure 1.3 From left to right: the input strands, the flow volume constructed from a single strand,
and the strands regrown from the flow.
1.2.3 Volume
Volume is usually used as an intermediate representation of a 3D hair model; we also call it a 3D
orientation field. Specifically, given a hairstyle of 3D strands, we generate an occupancy field
O using the outer surface extraction method proposed in Hu et al. [43]. Each grid cell of O has
a value of 1 if the cell center is inside the hair volume and 0 otherwise. We also
generate a 3D flow field F from the 3D hair strands. We first compute the local 3D orientation
for the grid cells inside the hair volume by averaging the orientations of nearby strands [96].
Then we smoothly diffuse the flow field into the entire volume as proposed by Paris et al. [73].
Conversely, given an occupancy field O and the corresponding flow field F, we can easily
regenerate 3D strands by growing them from hair roots on a fixed scalp. The hair strands are grown
following the local orientation of the flow field F until they hit the surface boundary defined by
the volume O. This representation is used in Chapter 7.
We show a 2D example of the volumetric representation in Figure 1.3.
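The round trip between strands and volume can be sketched in 2D, matching the scenario of Figure 1.3. This is a simplified sketch under stated assumptions: occupancy marks every cell containing a strand vertex (rather than using the outer surface extraction of Hu et al. [43]), the flow in each cell is the normalized average of the local strand tangents, no diffusion into empty cells is performed, and growth uses nearest-cell lookup with a fixed step. All function names are hypothetical.

```python
import numpy as np


def strands_to_volume(strands, grid_size: int, cell: float):
    """Rasterize 2D strands (each an (N, 2) array, root to tip, N >= 2) into an
    occupancy grid O[y, x] and a unit flow grid F[y, x].

    Simplifications: a cell is occupied if it contains a strand vertex, and its flow
    is the normalized average of the tangents of the vertices inside it. Empty cells
    keep zero flow (no diffusion step). Assumes no zero-length segments.
    """
    O = np.zeros((grid_size, grid_size), dtype=np.uint8)
    F = np.zeros((grid_size, grid_size, 2))
    for s in strands:
        s = np.asarray(s, dtype=float)
        t = np.diff(s, axis=0)
        t /= np.linalg.norm(t, axis=1, keepdims=True)
        t = np.vstack([t, t[-1:]])  # the tip reuses the last segment's tangent
        for p, d in zip(s, t):
            ix, iy = np.floor(p / cell).astype(int)
            if 0 <= ix < grid_size and 0 <= iy < grid_size:
                O[iy, ix] = 1
                F[iy, ix] += d
    norms = np.linalg.norm(F, axis=2, keepdims=True)
    F = np.divide(F, norms, out=np.zeros_like(F), where=norms > 0)
    return O, F


def inside(p, O, cell: float) -> bool:
    """True if point p lies in an occupied cell of O."""
    ix, iy = np.floor(p / cell).astype(int)
    return bool(0 <= ix < O.shape[1] and 0 <= iy < O.shape[0] and O[iy, ix] == 1)


def grow_strand(root, O, F, cell: float, step: float, max_steps: int = 1000):
    """Grow a strand from `root` along F until the next step would leave O."""
    p = np.asarray(root, dtype=float)
    points = [p]
    if not inside(p, O, cell):
        return np.array(points)
    for _ in range(max_steps):
        ix, iy = np.floor(p / cell).astype(int)
        d = F[iy, ix]
        q = p + step * d
        if not d.any() or not inside(q, O, cell):
            break  # zero flow, or the next point would cross the hair-volume boundary
        points.append(q)
        p = q
    return np.array(points)


# Round trip: one straight strand -> (O, F) -> regrown strand.
strand = np.array([[0.5, y] for y in (0.5, 1.5, 2.5, 3.5, 4.5)])
O, F = strands_to_volume([strand], grid_size=8, cell=1.0)
regrown = grow_strand(strand[0], O, F, cell=1.0, step=1.0)
print(O.sum(), len(regrown))  # 5 5
```

A practical version would interpolate F trilinearly, use a smaller step than the cell size, and diffuse F outward so that growth is well defined everywhere inside O.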
1.3 Thesis Overview
The rest of this thesis is organized as follows. In Chapter 2, we survey related work on hair
modeling, multi-view hair capture, single-view hair reconstruction, dynamic hair simulation,
and related 3D deep learning techniques. Chapter 3 introduces a robust and general hair
reconstruction framework that guarantees structural plausibility by construction using a database
of strand examples obtained through simulation. Chapter 4 further introduces a method for capturing
braids using a handheld Kinect. Chapter 5 presents a system that creates a high-quality 3D hair
model from a single reference photo using a database of full 3D hairstyles and a few user strokes.
In Chapter 6, we present a fully automatic framework that digitizes a complete 3D head with hair
from a single unconstrained image. In Chapter 7, we represent the manifold of 3D hairstyles
implicitly through a compact latent space learned with deep neural networks. Inspired by previous
hair reconstruction methods and the latest 3D deep learning techniques, we present an end-to-end
system to reconstruct 3D hair from a single unconstrained image.
Chapter 2
Related Work
The creation of high-quality 3D hair models is one of the most time-consuming tasks when
modeling CG characters. Despite the availability of sophisticated design tools and commercial
solutions such as XGen, Ornatrix, and HairFarm, production of a single 3D hair model for a
hero character can take weeks or months even for senior character artists.
2.1 3D Hair Modeling
A general survey on existing hair modeling techniques can be found in [98]. One class of hair
modeling methods uses physical simulation to generate hair strands, such as [6]. To better control
the final shape of the hair, researchers also investigated other physically inspired methods,
e.g., vector fields [36, 105] and statistical wisp models [21]. Sketch-based methods provide
intuitive and convenient tools for hair modeling. Fu et al. [33] demonstrated a sketch-based
interface to build a vector field that generates the final hair strands. Wither et al. [102] proposed
a method to estimate hair simulation parameters from user strokes. Another category of hair
modeling techniques provides direct editing tools on hair geometry. These geometry-based
methods use various structured models to facilitate controllable hair modeling. Kim and
Neumann [55] employed a generalized cylinder hierarchy for multi-resolution hair editing.
Ward et al. [99] introduced a level-of-detail hair representation to ease hair modeling on
different scales. Yuksel et al. [106] demonstrated hair meshes to model complex hairstyles
through topological operations on the polygonal meshes that drive the hairstyle. Wang et
al. [97] proposed a method to synthesize new hairstyles from input examples, inspired by
texture synthesis methods. A well-designed hair modeling system can help artists create high-quality
hairstyles and facial hair in less time (Figure 2.1).
Figure 2.1 3D hair modeling.
2.2 Multi-View Hair Capture
Hair digitization techniques have been introduced in attempts to reduce or eliminate the
laborious manual effort of 3D hair modeling. Most high-end 3D hair capture systems [3,
29, 39, 50, 65, 73, 74, 104] maximize the coverage of hair during acquisition and operate
under controlled lighting conditions. The multi-view stereo technique of Luo et al. [65] showed,
for the first time, that highly complex real-world hairstyles can be convincingly reconstructed
in 3D by discovering locally coherent wisp structures. Hu et al. [40] later proposed a
data-driven variant using pre-simulated hair strands, which eliminates the generation of physically
implausible hair strands. Their follow-up work [42] solves the problem of capturing constrained
hairstyles such as braids using procedurally generated braid structures. In this work, they used
an RGB-D sensor (Kinect) that is swept around the subject instead of a collection of calibrated
cameras. More recently, [109] proposed a generalized four-view image-based hair modeling method
that does not require all views to be from the same hairstyle, which allows the creation of new
hair models. A concrete multi-view capture example is shown in Figure 2.2. These multi-view
capture systems are not easily accessible to end users, as they often require expensive hardware
equipment, controlled capture settings, and professional manual clean-up.
Figure 2.2 Multi-view hair capture.
2.3 Single-View Hair Capture
With the availability of internet pictures and the ease of taking selfies, single-view hair modeling
solutions are becoming increasingly important within the context of consumer-friendly 3D
avatar digitization. We show an application scenario in Figure 2.3. Single-view hair modeling
techniques were first introduced by Chai et al. [17, 18] for portrait manipulation purposes.
These early geometric optimization methods are designed for reconstructing front-facing
subjects and have difficulty approximating the geometry of the back of the hair. Hu et al. [41]
proposed a data-driven method to produce entire hairstyles from a single input photograph
and some user interactions. Their method assembles different hairstyles from a 3D hairstyle
database built in advance for shape reconstruction. Chai et al. [16] later
presented a fully automated variant using an augmented 3D hairstyle database and a deep
convolutional neural network to segment hair regions. Hu et al. [43] further improved the retrieval
performance by introducing a deep learning-based hair attribute classifier, which increases the
robustness for challenging input images where local orientation fields are difficult to extract.
However, these data-driven methods rely on the quality and diversity of the database, as well as
on successful pre-processing and analysis of the input image. In particular, if a 3D hair model
with identifiable likeness is not available in the database, the reconstruction is likely
to fail. Furthermore, handcrafted descriptors become difficult to optimize as the diversity or
number of hair models increases.
Figure 2.3 Single-view hair capture.
2.4 Hair Simulation
A large category of hair simulation methods models hair as an aggregated medium, such as a
fluid continuum [37], to reduce computational complexity. Many mechanical strand models have
been investigated in the past. Selle et al. [88] proposed a simple mass-spring model for efficient
and robust hair simulation. The discrete elastic rods model [4] provides an efficient simulation
framework based on an explicit centerline representation with reduced coordinates. Super-Helices [5]
use a piecewise helical discretization of the Cosserat rod model and allow efficient simulation with
very few elements. State-of-the-art collision response algorithms [53] and solvers that capture
non-smooth friction in large hair assemblies [24] ensure realistic dynamic hair behavior. A robust
hair simulator can successfully capture real-world hair dynamics (Figure 2.4).
Figure 2.4 Hair simulation. (Panels show individual frames of simulated sequences.)
2.5 3D Deep Learning
The recent success of deep neural networks for tasks such as classification and regression can
be explained in part by their effectiveness in converting data into a high-dimensional feature
representation. Because convolutional neural networks are designed to process images, 3D shapes are often converted into regular grid representations to enable convolutions. Multi-View CNNs [80, 90] render 3D point clouds or meshes into depth maps and then apply 2D convolutions to them. Volumetric CNNs [67, 80, 103, 108] apply 3D convolutions directly to voxels converted from a 3D mesh or point cloud. On the other hand, PointNet [79, 81] presents a unified architecture that directly takes point clouds as input. Brock et al. [10] apply 3D CNNs within a variational autoencoder [56] to embed 3D volumetric objects into a compact subspace. These volumetric methods are limited to very low resolutions (e.g., 32 × 32 × 32) and focus on man-made shapes. Recently, Jackson et al. [49] proposed a single-view face inference method that represents face geometry as a 3D volume.
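Volumetric CNNs need the input on a regular grid. As a minimal sketch, assuming binary occupancy inside the point cloud's axis-aligned bounding box (a generic encoding, not the one used by any specific cited network), a point cloud can be voxelized into a 32 × 32 × 32 grid as follows:

```python
import numpy as np

def voxelize(points: np.ndarray, resolution: int = 32) -> np.ndarray:
    """Convert an (N, 3) point cloud into a binary occupancy grid.

    The grid spans the bounding box of the points; a voxel is set to 1
    if at least one point falls inside it.
    """
    lo = points.min(axis=0)
    hi = points.max(axis=0)
    # Use the largest extent on all axes so that voxels stay cubic.
    extent = float((hi - lo).max())
    if extent == 0.0:
        extent = 1.0
    # Map coordinates to [0, resolution); points on the max boundary go to the last cell.
    idx = np.floor((points - lo) / extent * resolution).astype(int)
    idx = np.clip(idx, 0, resolution - 1)
    grid = np.zeros((resolution,) * 3, dtype=np.uint8)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1
    return grid

pts = np.random.default_rng(0).uniform(-1.0, 1.0, size=(5000, 3))
grid = voxelize(pts)
print(grid.shape, int(grid.sum()))
```

The resolution of 32 mirrors the example size quoted above; memory grows cubically with it, which is one reason such methods stay at low resolutions.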
Chapter 3
Robust Hair Capture Using Simulated Examples
3.1 Background
Just like in real life, hairstyles are essential elements for any digital character, reflecting
personality, fashion, as well as one’s cultural and social background. The generation of
compelling 3D hair models in film or game production usually takes several weeks of manual
work by a digital artist, involving the use of sophisticated hair modeling tools and procedural
scripting. The task becomes even more tedious when capturing digital doubles, since the hair
has to accurately match real references from photographs.
Unlike the 3D acquisition of objects with smooth surfaces (e.g., faces and bodies) which can
usually take advantage of generic shape priors to regularize the reconstruction, it is generally
very difficult to find a generic shape prior for hair because of the wide range of possible
hairstyles. Specialized hair capture techniques are required due to extremely fine, intricate, and
self-occluding structures.
Recently, graphics researchers have explored a variety of input sources for high-fidelity hair digitization, ranging from multi-view stereo [3], thermal imaging [39], and depth-of-field [50] to structured light [74]. Most of these approaches focus on matching the visual appearance through direct strand growing in a diffused orientation field, and work well for sufficiently short and simple hair. For highly convoluted hairstyles, Luo et al. [65] recently demonstrated the importance of incorporating structural priors during the reconstruction process. Despite its effectiveness and accuracy, their method depends on a good initial point cloud from multi-view stereo and uses a bottom-up strategy to connect local ribbons into wisps through purely geometry-inspired heuristics. Consequently, implausible hair strands that travel across different nearby wisps can still appear. Moreover, the approach relies on a careful manual clean-up procedure, involving tedious per-image hair segmentation and outlier removal after the initial point cloud reconstruction (see Figure 3.1). On average, segmenting the hair takes around 5 minutes per photograph, even with the assistance of GrabCut [83], resulting in hours of work for a single hair reconstruction. Furthermore, Patch-based Multi-View Stereo (PMVS) [34] still produces a large amount of outliers due to ambiguous feature correspondences between the input images, and manually removing the 3D outlier points takes up to 30 minutes per hair example. A fully automatic end-to-end solution, from the capture session to the generation of a hair model suitable for simulation, has therefore not yet been proposed.
Figure 3.1 Initial point cloud reconstructed from multi-view stereo with outliers.
3.2 Overview of Our Approach
In this chapter, we introduce a robust and general hair reconstruction framework that guarantees structural plausibility by construction, using a database of strand examples obtained through simulation. While our method is designed for unconstrained hairstyles, we also allow user-sketched hair strands as database examples for some simple constrained styles (e.g., ponytails), which involve boundary constraints in addition to non-trivial inter-strand collisions and friction. During the reconstruction, we use the example strands in our database as structural references to ensure structural plausibility in the presence of a large amount of outliers and occlusions in the input (Figure 3.2). For example, our database constrains the range of possible strand lengths in the reconstruction, a property that is inherently difficult to estimate and enforce, especially for layered hairstyles with many crossing wisps.
Figure 3.2 Our system takes as input a few images (a) and employs a database of simulated example strands (b) to discover structurally plausible configurations from the reconstructed cover strands (c) for final strand synthesis (d). Our method robustly fits example strands to the cover strands, which are computed from unprocessed, outlier-affected input data (e), to generate compelling reconstruction results (f). In contrast, the state-of-the-art method of [65] fails in the presence of strong outliers (g).
For unconstrained hair, we generate structurally plausible strand examples through hair simulation based on the Super-Helices model introduced by Bertails et al. [5]. To keep the database at a manageable size, we reduce the simulation configuration space using a symmetry principle and constraints on the initial conditions, and we select the natural parameters (e.g., stiffness and curliness) through the subsequent strand fitting step.
The strand fitting step discovers structurally plausible configurations of locally grown strand segments using a robust voting scheme that selects matched examples while constraining the roots of the examples to lie close to the scalp. The best configurations are then extracted through mean-shift clustering. We then use these structural configurations to connect local strand segments into wisps in a top-down fashion for final hair strand synthesis.
Notice that the examples in our database do not need to match the input exactly, since we only use them as structural references in subsequent steps. As a result, one key insight of our research is that plausible structural configurations can be encoded using a limited number of examples in a database (three orders of magnitude fewer than the total number of synthesized strands) to describe complex hairstyles that contain tens of thousands of strands with varying lengths and shapes.
Figure 3.3 The overview of our system (stages: input images, point cloud, 3D orientation field, simulated examples, cover strands, clustered strands, synthesized strands). From a set of input images, we first reconstruct a point cloud as well as a 3D orientation field, followed by a local growing step that generates the cover strands. We then generate a database of example strands through hair simulation constrained by the input hairstyle. We use this database to discover clustered strands with structurally plausible configurations from the cover strands via strand fitting, and finally we synthesize realistic strands from these clustered strands.
While the fitting accuracy of our technique is on par with the state-of-the-art in 3D hair capture,
our proposed method focuses on robustness and generality. Our system robustly generates
structurally plausible hair from unprocessed point cloud input and handles a wide range of
hairstyles. Due to the limited level of manual work required (if any, up to a few minutes for
a novice user), we argue that our data-driven hair capture framework is suitable for efficient
replication of complex hairstyles from the real world and can also be interesting for applications
beyond digital production, such as virtual web content creation for cosmetic brands.
Figure 3.4 Preprocessing steps. From a set of input images (a), our system first reconstructs a point cloud with a 3D orientation field (b). Cover strands are then grown to cover the entire point cloud based on the 3D orientation field (c).
Our full pipeline is shown in Figure 3.3. Similar to [3, 65], we use a multi-view stereo capture system for the input data. From the input photographs, we compute a rough initial 3D point cloud of the hair via the PMVS algorithm [34]. For each point, we compute the local 3D hair orientation by maximizing the consistency of its projections with the 2D orientation maps of the input images. We use the 3D orientation field to generate a set of cover strands that are grown bidirectionally from each point, as in Luo et al. [65]. The initial cover strands are mostly short and disconnected due to occlusions and missing data (Figure 3.4).
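The bidirectional growing step can be sketched as follows. This is a minimal illustration, not the procedure of Luo et al. [65]: it assumes the orientation field is available as a callable returning a unit direction at any 3D position (`field` below is a hypothetical constant field) and uses fixed-step Euler integration. Because hair orientation is only defined up to sign, each step flips the sampled direction to agree with the previous one.

```python
import numpy as np

def grow_strand(seed, field, step=1.0, n_steps=20):
    """Grow a polyline from `seed` in both directions through an orientation field."""
    seed = np.asarray(seed, dtype=float)

    def trace(direction_sign):
        pts = []
        p = seed.copy()
        prev = direction_sign * field(p)
        for _ in range(n_steps):
            d = field(p)
            # Orientations are sign-ambiguous: keep the direction consistent with the last step.
            if np.dot(d, prev) < 0:
                d = -d
            p = p + step * d
            pts.append(p)
            prev = d
        return pts

    forward = trace(+1.0)
    backward = trace(-1.0)
    # Backward samples (reversed so the strand is ordered), then the seed, then forward samples.
    return np.array(backward[::-1] + [seed] + forward)

def field(p):
    # Hypothetical field: unit vectors along the x-axis everywhere.
    return np.array([1.0, 0.0, 0.0])

strand = grow_strand([0.0, 0.0, 0.0], field, step=0.5, n_steps=4)
print(strand[:, 0])  # x-coordinates from -2.0 to 2.0 in steps of 0.5
```

A practical version would also stop growing when a strand leaves the point cloud or turns too sharply; those stopping rules are omitted here.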
The example strands in the database are then generated through physical simulation based on Super-Helices [6]. We simulate static hair strands under gravity and different boundary constraints (Section 3.3). Note that we do not aim to exactly reproduce the input hairstyle
through simulation but rather use the example strands as structural references to discover
plausible structural configurations among the cover strands. We exhaustively sample all possible
simulation configurations within a pre-defined feasible parameter space constrained by the
current input hairstyle. The final number of necessary simulations can be significantly reduced
by leveraging symmetry principles and constraining the initial conditions. An optimal set of
examples is then selected through an iterative scheme based on their fitting errors to the cover
strands.
Next, we present a strand fitting scheme inspired by the Hough transform to simultaneously
cluster and connect the cover strands into structurally plausible wisps based on the simulated
examples (Section 3.4), instead of using separate steps of clustering and wisps connection as in
Luo et al. [65]. We formulate a transformation space and ask each cover strand to cast votes for its matched example strands, whose roots are constrained to the scalp model. The scalp model itself is obtained by fitting a 3D head model to the input point cloud [62]. The matched examples are revealed as clustered modes in the transformation space, and we use mean-shift clustering to extract these modes in an iterative fashion. The matched examples are then used to effectively discover the structural connections of fragmented cover strands, from which we construct wisps to synthesize the final strands (Section 3.5).
3.3 Strand Simulation
Our goal in this section is to generate a database of strand examples $\{S_e\}$ for the subsequent strand fitting step (Section 3.4). We use the Super-Helices model [6] to simulate static hair strands constrained by the scalp under gravity. With Super-Helices, a strand is modeled as a Cosserat rod, which is characterized by the following kinematics equation:
$$\frac{\partial \mathbf{e}_i}{\partial s} = \boldsymbol{\Omega} \times \mathbf{e}_i, \qquad (3.1)$$
where $\{\mathbf{e}_i\}$ are the axes of the material frame $\mathbf{m}_1(s)$, $\mathbf{m}_2(s)$, and $\mathbf{t}(s)$ for $i = 1, 2, 3$ respectively, parametrized by the arc length $s$; $\boldsymbol{\Omega} = \kappa_1(s)\,\mathbf{m}_1(s) + \kappa_2(s)\,\mathbf{m}_2(s) + \tau(s)\,\mathbf{t}(s)$ is the Darboux vector determined by the curvatures $\kappa_1(s)$, $\kappa_2(s)$ and the torsion $\tau(s)$. Given the initial conditions, the curvatures and torsion uniquely determine the centerline of the rod $\mathbf{r}(s)$ through Equation 3.1, and $\mathbf{r}(s)$ can be integrated using the technique described in Bertails et al. [5].
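Within one Super-Helix element the curvatures and torsion are constant, so the Darboux vector is fixed in world space and Equation 3.1 reduces to a rigid rotation of the frame about $\boldsymbol{\Omega}$ at rate $\|\boldsymbol{\Omega}\|$. The sketch below is a plain numerical integration for illustration, not the closed-form integration of Bertails et al. [5]. It assumes constant $\kappa_1$, $\kappa_2$, $\tau$, updates the frame with Rodrigues' rotation formula, and accumulates $\mathbf{r}(s)$ from $d\mathbf{r}/ds = \mathbf{t}$ with the trapezoidal rule.

```python
import numpy as np

def rotate(v, axis, angle):
    """Rodrigues rotation of vector v about the unit vector `axis` by `angle` radians."""
    return (v * np.cos(angle)
            + np.cross(axis, v) * np.sin(angle)
            + axis * np.dot(axis, v) * (1.0 - np.cos(angle)))

def integrate_rod(kappa1, kappa2, tau, length, n_steps=2000, frame=None, r0=None):
    """Integrate de_i/ds = Omega x e_i and dr/ds = t for constant kappa1, kappa2, tau.

    `frame` holds the rows m1(0), m2(0), t(0); it defaults to the identity.
    """
    if frame is None:
        frame = np.eye(3)
    m1, m2, t = (np.array(e, dtype=float) for e in frame)
    r = np.zeros(3) if r0 is None else np.array(r0, dtype=float)
    ds = length / n_steps
    # World-space Darboux vector; rotating the frame about it leaves it unchanged.
    omega = kappa1 * m1 + kappa2 * m2 + tau * t
    rate = np.linalg.norm(omega)
    axis = omega / rate if rate > 0 else None
    points = [r.copy()]
    for _ in range(n_steps):
        t_old = t
        if axis is not None:
            # Exact frame update for constant Omega: rotate each axis by |Omega| * ds.
            m1, m2, t = [rotate(e, axis, rate * ds) for e in (m1, m2, t)]
        r = r + 0.5 * ds * (t_old + t)  # trapezoidal rule for dr/ds = t
        points.append(r.copy())
    return np.array(points)

# Pure bending (kappa1 only) traces a circle of radius 1 / kappa1.
circle = integrate_rod(kappa1=0.1, kappa2=0.0, tau=0.0, length=2 * np.pi / 0.1)
print(np.linalg.norm(circle[-1] - circle[0]))  # close to 0 after one full loop
```

Non-zero `tau` together with a curvature yields a helix, which is the building block that Super-Helices chain together piecewise.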
A static rod configuration can be obtained by minimizing its potential energy $E$, given by:
$$E = E_e + E_g + E_c, \qquad (3.2)$$
where $E_e$ is the internal elastic energy, $E_g$ the gravitational potential energy, and $E_c$ the collision energy.
To formulate $E_e$, we first approximate the cross sections of hair strands as circular, even though the cross section of real hair varies among ethnic groups with up to 20% change in eccentricity. With this approximation, $E_e$ can be written as:
$$E_e = k\int_0^L \Big[\big(\kappa_1(s)-\kappa_1^0\big)^2 + \big(\kappa_2(s)-\kappa_2^0\big)^2 + A\,\big(\tau(s)-\tau^0\big)^2\Big]\,ds,$$
where $L$ is the strand length; $\kappa_1^0$, $\kappa_2^0$, and $\tau^0$ are the natural curvatures and torsion of a hairstyle; $k$ is the stiffness; and we choose $A = 0.6$ since the Poisson ratio is constant across human hair.
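We can directly rephrase $E_g$ as follows:
$$E_g = -B\int_0^L \mathbf{g}\cdot\big(\mathbf{r}(s)-\mathbf{r}(0)\big)\,ds = -B\int_0^L (L-s)\,\mathbf{g}\cdot\mathbf{t}(s)\,ds,$$
where $B = 4.1\times10^{-5}$ and $\mathbf{g}$ is the unit direction of gravity. The second form follows from $\mathbf{r}(s)-\mathbf{r}(0)=\int_0^s \mathbf{t}(u)\,du$ by exchanging the order of integration.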
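The equality of the two expressions for $E_g$ can be checked numerically. This sketch evaluates both integrals with the trapezoidal rule on a hypothetical unit-speed helix (a test curve, not a simulated strand). The constant $B$ comes from the text, while the tilted gravity direction is an arbitrary unit vector chosen so that both the horizontal and vertical parts of the curve contribute; the identity holds for any constant direction.

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoidal rule."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

B = 4.1e-5                                    # constant from the text
g = np.array([1.0, 0.0, -1.0]) / np.sqrt(2)   # arbitrary constant unit direction

L = 200.0                                     # hypothetical strand length (mm)
s = np.linspace(0.0, L, 20001)
a, c = 10.0, 0.8                              # hypothetical helix radius and vertical speed
w = np.sqrt(1.0 - c ** 2) / a                 # unit speed: a^2 w^2 + c^2 = 1
r = np.stack([a * np.cos(w * s), a * np.sin(w * s), -c * s], axis=1)
t = np.stack([-a * w * np.sin(w * s), a * w * np.cos(w * s), np.full_like(s, -c)], axis=1)

eg_position = -B * trapezoid((r - r[0]) @ g, s)    # -B * int g.(r(s) - r(0)) ds
eg_tangent = -B * trapezoid((L - s) * (t @ g), s)  # -B * int (L - s) g.t(s) ds
print(eg_position, eg_tangent)
```

The tangent form is convenient for Super-Helices because $\mathbf{t}(s)$ is available directly from the material frame, without integrating positions first.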
Finally, the collision energy $E_c$ can be modeled by an elastic repelling force:
$$E_c = D\,\Delta\ell^2,$$
where $D$ is set to 0.1 and $\Delta\ell$ is the amount of penetration of the rod from the colliding point on the scalp. Note that all the constants ($A$, $B$, $D$) above are derived from real physical quantities (see Appendix B in [5]).
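To make Equation 3.2 concrete, the sketch below evaluates the three energy terms on a discretized strand. It is an illustration under several assumptions: per-sample curvatures and torsion are given, the integrals use the trapezoidal rule, the scalp is a sphere with a hypothetical center and radius (in line with the spherical approximation described later in this section), and the collision term is summed over all penetrating samples, since the text defines $\Delta\ell$ only for a single colliding point.

```python
import numpy as np

A, B, D = 0.6, 4.1e-5, 0.1  # constants from the text

def trapezoid(y, x):
    """Composite trapezoidal rule."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def potential_energy(s, r, t, kappa1, kappa2, tau, k,
                     natural=(0.0, 0.0, 0.0), g=(0.0, 0.0, -1.0),
                     scalp_center=(0.0, 0.0, 0.0), scalp_radius=90.0):
    """Return (E_e, E_g, E_c) for samples at arc lengths s with positions r and tangents t."""
    k1_0, k2_0, tau_0 = natural
    g = np.asarray(g, dtype=float)
    L = s[-1] - s[0]
    # Elastic energy: k times the integral of squared deviations from the natural shape.
    density = (kappa1 - k1_0) ** 2 + (kappa2 - k2_0) ** 2 + A * (tau - tau_0) ** 2
    E_e = k * trapezoid(density, s)
    # Gravitational energy in tangent form: -B * integral of (L - s) g.t(s) ds.
    E_g = -B * trapezoid((L - (s - s[0])) * (t @ g), s)
    # Collision energy: D * penetration^2 for samples inside the (hypothetical) spherical scalp.
    dist = np.linalg.norm(r - np.asarray(scalp_center, dtype=float), axis=1)
    penetration = np.clip(scalp_radius - dist, 0.0, None)
    E_c = D * float(np.sum(penetration ** 2))
    return E_e, E_g, E_c

# A straight 100 mm strand hanging outside the scalp: only gravity contributes.
s = np.linspace(0.0, 100.0, 101)
zeros = np.zeros_like(s)
r = np.stack([np.full_like(s, 100.0), zeros, -s], axis=1)
t = np.tile([0.0, 0.0, -1.0], (len(s), 1))
print(potential_energy(s, r, t, zeros, zeros, zeros, k=1.5))
```

A static solver would minimize the sum of these terms over the curvature and torsion parameters of each helical element.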
Therefore, given the boundary constraints, the simulation is determined by the initial conditions $\{\mathbf{m}_1(0), \mathbf{m}_2(0), \mathbf{t}(0), \mathbf{r}(0)\}$, the strand length $L$, the natural curvatures and torsion $\kappa_1^0$, $\kappa_2^0$, $\tau^0$, and the strand stiffness $k$. Since an exhaustive enumeration of the full simulation configuration space for a given hairstyle is intractable, one of our key insights is that we can leverage symmetry and constrained initial conditions to significantly reduce the number of simulation configurations.
These simulated examples are used as rough structural references to ensure plausible hair
structures on the cover strands for the final strand synthesis (Section 3.5).
To reduce the number of simulation configurations, we first simplify the boundary constraints.
We approximate the scalp as a sphere so that simulations are equivalent on the same latitude as
shown in Figure 3.5. However, for long hairstyles, other body parts (e.g., shoulders, neck, back)
will affect the boundary constraints. Thus, we extrude the sphere model downwards to form a
cylindrical shape to approximate these cases.
We further simplify our simulation setup by fixing $\mathbf{r}(0)$ at the top of the boundary model (Figure 3.5). In particular, simulating a strand from the top also covers the configurations of strands that start at any intermediate point and run to the tip, using the material frame at that starting point as the initial condition. We also fix the angle $\theta$ between the normal at the top point and $\mathbf{t}(0)$ to 80 degrees, since hair strands do not grow exactly parallel to the scalp (Figure 3.5).
Figure 3.5 The simulation setup. We use Super-Helices to simulate a natural hair strand, which consists of a few piecewise helical segments. We use cylindrical (a) and spherical (b) boundary constraints for realistic simulation of strands with different lengths. A simulated database is shown in (c).
With $\mathbf{r}(0)$ and $\mathbf{t}(0)$ fixed, we enumerate the last degree of freedom in the initial conditions by rotating the material frame around $\mathbf{t}(0)$ by $\varphi \in [0, 2\pi)$ (Figure 3.5). For the strand length $L$, we uniformly sample $L$ within the range of 100 to 400 mm for all our datasets. Since the initial material frame rotates around $\mathbf{t}(0)$, only one of $\kappa_1^0$ and $\kappa_2^0$ is independent, and we set $\kappa_2^0 = 0$. $\kappa_1^0$ is sampled within 0 to 0.1 mm$^{-1}$. Natural twist $\tau^0$ occurs for African hair and other artificially permed hairstyles. For these hairstyles, we find it useful to set $\tau^0$ within $-0.03$ to $0.03$ mm$^{-1}$. For most natural hairstyles, we set $\tau^0 = 0$ to enforce zero natural twist, following [5].
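The reduced initial conditions can be written down directly. The sketch below places $\mathbf{r}(0)$ at the top of a spherical scalp with a hypothetical radius, tilts $\mathbf{t}(0)$ by $\theta = 80^\circ$ from the outward normal, and rotates $\mathbf{m}_1(0)$, $\mathbf{m}_2(0)$ about $\mathbf{t}(0)$ by $\varphi$. Only the angles come from the text; the tilt plane (x–z), the reference axis for $\mathbf{m}_1$, and the number of $\varphi$ samples are our own assumptions.

```python
import numpy as np

def initial_frame(phi, theta_deg=80.0, radius=90.0):
    """Return r(0), m1(0), m2(0), t(0) for a strand rooted at the top of a sphere."""
    theta = np.radians(theta_deg)
    normal = np.array([0.0, 0.0, 1.0])          # outward normal at the top point
    r0 = radius * normal
    # Tangent tilted by theta from the normal, within the x-z plane (assumed).
    t0 = np.array([np.sin(theta), 0.0, np.cos(theta)])
    # Reference material axes orthogonal to t0 before the phi rotation.
    m1_ref = np.array([0.0, 1.0, 0.0])
    m2_ref = np.cross(t0, m1_ref)
    # Rotating about t0 by phi keeps (m1, m2, t0) orthonormal and right-handed.
    m1 = np.cos(phi) * m1_ref + np.sin(phi) * m2_ref
    m2 = np.cross(t0, m1)
    return r0, m1, m2, t0

phis = np.linspace(0.0, 2.0 * np.pi, 12, endpoint=False)  # hypothetical sampling density
frames = [initial_frame(phi) for phi in phis]
```

Each entry of `frames` is one initial condition; combined with a sampled $L$ and the natural parameters, it defines one simulation.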
We generate the final hair databases by varying the natural curvature $\kappa_1^0$ within the range $[0.01, 0.06]$ in steps of 0.01, and the stiffness $k$ within the range $[1.0, 2.0]$ in steps of 0.5, based on the Young's modulus of natural hair [5], resulting in a total of $6 \times 3 = 18$ databases. Each database contains multiple strands obtained by sampling $\varphi$ and $L$ within their feasible ranges. While in the worst case we would iterate over each database to determine its fitness (Section 3.4), most databases can be excluded because they differ significantly from the target hairstyle, resulting in much lower actual running times. In practice, we only need to select 1 to 5 databases for each input hairstyle. Another possible approach to parameter selection is to develop a scheme similar to [102], which estimates natural curvatures and stiffness from 2D user sketches.
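The database layout can be enumerated explicitly. The $\kappa_1^0$ and $k$ grids below follow the text and produce the 18 databases; the numbers of $\varphi$ and $L$ samples per database are not given in this chapter, so the counts used here (12 and 7) are hypothetical placeholders.

```python
import itertools
import numpy as np

kappa1_values = np.linspace(0.01, 0.06, 6)   # natural curvature (mm^-1), step 0.01
stiffness_values = np.linspace(1.0, 2.0, 3)  # stiffness k: 1.0, 1.5, 2.0

databases = list(itertools.product(kappa1_values, stiffness_values))
print(len(databases))  # 18

# Per-database strand configurations; the sampling densities are hypothetical.
phis = np.linspace(0.0, 2.0 * np.pi, 12, endpoint=False)
lengths = np.linspace(100.0, 400.0, 7)       # 100 to 400 mm
k1, k = databases[0]                         # first database only, for brevity
configs = [dict(kappa1_0=k1, kappa2_0=0.0, tau_0=0.0, k=k, phi=phi, L=L)
           for phi, L in itertools.product(phis, lengths)]
print(len(configs))  # 84
```

Because each database is independent, databases that fit the cover strands poorly can be discarded early, as described above.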
3.4 Strand Fitting
Given a set of cover strands $\{S_c\}$ grown from the captured point cloud and orientation field (Figure 3.4), we first collect candidate rigid transformations $\{T\}$ obtained by fitting those cover strands with example strands from the database $\{S_e\}$. In our case, a transformation has 7 components $T = \{t_x, t_y, t_z, r_x, r_y, r_z, I\}$, where $(t_x, t_y, t_z)^\top$ is the translation, $r_x, r_y, r_z$ the Euler angles, and $I$ the index of the example strand in the database. The Euler angles are measured in radians and multiplied by a factor of 10 to balance the weight of the rotation vector against the translation vector. We incorporate the index values into the transformation space for a unified treatment of extracting the matched example strands, which are the representatives of a cluster. We scale the index values by a factor of 100, a sufficiently large value that votes from different example strands do not affect each other and are clustered separately.
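A vote can therefore be stored as a plain 7-D vector. The helper below is a hypothetical illustration of the scaling described above (Euler angles multiplied by 10, example index by 100), with translations in millimeters; the Euler-angle convention is left to the caller.

```python
import numpy as np

ROTATION_SCALE = 10.0   # balances radians against millimeters
INDEX_SCALE = 100.0     # keeps votes for different examples apart

def make_vote(translation, euler_angles, example_index):
    """Pack (tx, ty, tz, rx, ry, rz, I) into one scaled 7-D point."""
    tx, ty, tz = translation
    rx, ry, rz = euler_angles
    return np.array([tx, ty, tz,
                     ROTATION_SCALE * rx, ROTATION_SCALE * ry, ROTATION_SCALE * rz,
                     INDEX_SCALE * example_index])

v0 = make_vote((1.0, 2.0, 3.0), (0.1, 0.0, 0.0), example_index=0)
v1 = make_vote((1.0, 2.0, 3.0), (0.1, 0.0, 0.0), example_index=1)
print(np.linalg.norm(v1 - v0))  # 100.0, far beyond a kernel radius of 5
```

Identical transformations for different examples end up 100 units apart, so a kernel of radius 5 never mixes their votes.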
We consider each pair of strands $\{S_c, S_e\}$ and compute the optimal transformation to align $S_e$ with $S_c$. Specifically, we minimize the matching cost via the Iterative Closest Point (ICP) algorithm with point-to-point constraints [7]:
$$E\big(S_c, T(S_e)\big) = \sum_i \big\| \mathbf{p}(s_{c,i}) - T\,\mathbf{p}(s_{e,i}) \big\|^2, \qquad (3.3)$$
where $\{\mathbf{p}(s_{c,i})\}$ are the positions of all the samples in $S_c$, $s_{e,i}$ is the closest sample of $s_{c,i}$ in $S_e$, and $T\,\mathbf{p}(s_{e,i})$ is the position of $s_{e,i}$ under the rigid translation and rotation specified by $T$. We consider a transformation $T$ a valid vote to be collected if and only if the matching cost defined by Equation 3.3 satisfies the following criterion:
$$E\big(S_c, T(S_e)\big) \le T_E \cdot n(S_c), \qquad (3.4)$$
where $T_E$ is a user-specified threshold and $n(S_c)$ the number of samples in the strand $S_c$. The threshold $T_E$ controls the tolerance of the average matching error between two strands, which essentially determines how many strands can be grouped into a single cluster. We fix this parameter to 10 mm for all of our datasets.
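A minimal point-to-point ICP sketch for Equation 3.3 follows; it is an illustration, not the thesis implementation. Each cover sample is paired by brute force with its closest sample on the transformed example strand, and the rigid update is solved in closed form with the SVD-based Kabsch method. The acceptance test follows Equation 3.4 with $T_E = 10$; the test curve in the usage lines is hypothetical.

```python
import numpy as np

def best_rigid(src, dst):
    """Rotation R and translation t minimizing sum ||dst - (R src + t)||^2 (Kabsch)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cd - R @ cs

def closest(cover, moved):
    """Index of the closest moved-example sample for every cover sample."""
    return np.argmin(((cover[:, None, :] - moved[None, :, :]) ** 2).sum(-1), axis=1)

def icp(cover, example, iterations=30):
    """Align `example` (M, 3) to `cover` (N, 3); return R, t and the cost of Eq. 3.3."""
    R, t = np.eye(3), np.zeros(3)
    for _ in range(iterations):
        nearest = closest(cover, example @ R.T + t)
        R, t = best_rigid(example[nearest], cover)
    moved = example @ R.T + t
    cost = float(((cover - moved[closest(cover, moved)]) ** 2).sum())
    return R, t, cost

def is_valid_vote(cost, n_cover_samples, threshold=10.0):
    """Equation 3.4: accept T if E(S_c, T(S_e)) <= T_E * n(S_c)."""
    return cost <= threshold * n_cover_samples

s = np.arange(0.0, 21.0)
example = np.stack([s, 0.02 * s ** 2, 5.0 * np.sin(0.2 * s)], axis=1)  # hypothetical strand
cover = example[5:15] + np.array([0.2, 0.0, 0.0])                       # shifted fragment
R, t, cost = icp(cover, example)
print(is_valid_vote(cost, len(cover)))  # True
```

The brute-force nearest-neighbor search is quadratic in the number of samples; a k-d tree would be the usual replacement for long strands.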
Figure 3.6 With a database of two examples (a), two groups of cover strands (b) exhibit different behaviors in the votes (c), depending on how well each example fits those strands: the votes for well-fitting examples are concentrated in clusters, while the votes for unmatched examples are scattered. For each example, the votes are projected onto the plane of their first two principal components. The green dots (indicated by the black arrows) show the cluster centers found by mean-shift.
We estimate the hair root position using a RANSAC approach [32]. More concretely, for each cover strand we first find the nearest point on the scalp, and then randomly pick 5 nearby points within a radius of 20 mm on the scalp as potential roots. Each potential root is attached to the cover strand as an additional sample, and we run ICP for 30 iterations with each example strand to collect candidate transformations.
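A sketch of the root-candidate sampling, assuming the scalp is available as a dense set of sample points (a hypothetical stand-in for the fitted head model [62]). It finds the scalp point nearest to the cover strand and draws 5 random scalp points within 20 mm of it. Attaching the root at the strand end closer to it is our assumption; the text only says the root is added as an extra sample.

```python
import numpy as np

def root_candidates(cover, scalp, n_candidates=5, radius=20.0, rng=None):
    """Up to `n_candidates` scalp points within `radius` of the scalp point nearest to `cover`."""
    rng = np.random.default_rng() if rng is None else rng
    d = np.linalg.norm(cover[:, None, :] - scalp[None, :, :], axis=-1)  # (N, M) distances
    nearest = scalp[np.argmin(d.min(axis=0))]   # scalp point closest to any strand sample
    nearby = scalp[np.linalg.norm(scalp - nearest, axis=1) <= radius]
    pick = rng.choice(len(nearby), size=min(n_candidates, len(nearby)), replace=False)
    return nearby[pick]

def attach_root(cover, root):
    """Add the root as an extra sample at the strand end closer to it."""
    if np.linalg.norm(cover[0] - root) <= np.linalg.norm(cover[-1] - root):
        return np.vstack([root, cover])
    return np.vstack([cover, root])

# Hypothetical flat "scalp" patch and a vertical cover strand above it.
scalp = np.array([(x, y, 0.0) for x in range(-50, 51, 5) for y in range(-50, 51, 5)], dtype=float)
cover = np.stack([np.zeros(5), np.zeros(5), np.linspace(10.0, 50.0, 5)], axis=1)
roots = root_candidates(cover, scalp, rng=np.random.default_rng(0))
augmented = [attach_root(cover, root) for root in roots]  # one ICP input per candidate
```

Each augmented strand would then be aligned against every example strand, producing one candidate vote per (root, example) pair that passes Equation 3.4.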
The collected transformations lie in a 7-dimensional parameter space. Intuitively, under the same or similar transformation, parallel cover strands that are close to each other can be aligned to the same part of a matched example strand. On the other hand, consistently oriented strands that can be connected or merged into a single strand would be aligned to different parts of the same example strand. As a result, the votes between cover strands and a matched example form a cluster in parameter space, while the votes between cover strands and non-matching examples are scattered (see Figure 3.6). We then perform mean-shift clustering to extract the most dominant clusters of candidate transformations in parameter space. While a similar approach has been proposed for symmetry detection of 3D shapes [68], we instead group parallel strands together and simultaneously connect consistent strands. We compute the local density $\rho$ of each transformation $T$ as a scaled sum of kernel functions $K$ centered at $T$:
$$\rho(T) = C(T)\cdot\sum_i K\big(\|T - T_i\|/\sigma\big), \qquad (3.5)$$
where $\sigma$ is the kernel radius measured in the parameter space (we set $\sigma = 5$ for all datasets), and the scaling factor $C(T)$ is the number of samples on the example strand that have been matched to samples on the cover strands. We choose the following kernel function, as suggested by Comaniciu et al. [23]:
$$K(x) = \begin{cases} 1 - x^2, & \text{if } 0 \le x < 1,\\ 0, & \text{otherwise.}\end{cases} \qquad (3.6)$$
The mean-shift clustering process is performed iteratively. Each time, we extract the most dominant cluster mode, i.e., the transformation $T$ that maximizes the density in Equation 3.5. We apply the rigid translation and rotation specified by $T$ to the example strand and identify all the cover strands $\{S_c\}$ that satisfy the matching criterion defined by Equation 3.4. All those cover strands belong to the same cluster, and the structure information is provided by the transformed example strand. We then remove all the votes that originate from those cover strands, and continue the clustering process until all the votes have been removed or a sufficient number of clusters has been found.
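The clustering loop can be sketched as follows. This is a simplified illustration, not the thesis code: the density of Equation 3.5 uses the kernel of Equation 3.6 with $\sigma = 5$, the scaling factor $C(T)$ is passed in as a per-vote weight, and a cover strand joins a cluster when one of its votes lies within $\sigma$ of the mode (a stand-in for re-checking Equation 3.4). With this kernel, each mean-shift step moves the estimate to the plain mean of the votes within radius $\sigma$ [23].

```python
import numpy as np

SIGMA = 5.0

def kernel(x):
    """Equation 3.6: K(x) = 1 - x^2 for 0 <= x < 1, else 0."""
    return np.where((x >= 0) & (x < 1), 1.0 - x ** 2, 0.0)

def density(votes, weights, sigma=SIGMA):
    """Equation 3.5 at every vote: rho_j = C_j * sum_i K(|T_j - T_i| / sigma)."""
    dist = np.linalg.norm(votes[:, None, :] - votes[None, :, :], axis=-1)
    return weights * kernel(dist / sigma).sum(axis=1)

def mean_shift_mode(start, votes, sigma=SIGMA, tol=1e-6, max_iter=100):
    """Repeatedly move to the mean of the votes within sigma until convergence."""
    x = start.copy()
    for _ in range(max_iter):
        inside = np.linalg.norm(votes - x, axis=1) < sigma
        if not inside.any():
            return x
        new_x = votes[inside].mean(axis=0)
        if np.linalg.norm(new_x - x) < tol:
            return new_x
        x = new_x
    return x

def extract_clusters(votes, weights, strand_ids, max_clusters=10, sigma=SIGMA):
    """Greedy mode extraction; returns a list of (mode, set of cover-strand ids)."""
    votes = np.asarray(votes, dtype=float)
    weights = np.asarray(weights, dtype=float)
    strand_ids = np.asarray(strand_ids)
    clusters = []
    while len(votes) > 0 and len(clusters) < max_clusters:
        best = int(np.argmax(density(votes, weights, sigma)))
        mode = mean_shift_mode(votes[best], votes, sigma)
        near = np.linalg.norm(votes - mode, axis=1) < sigma
        members = set(strand_ids[near].tolist()) | {strand_ids[best].item()}
        clusters.append((mode, members))
        # Remove every vote cast by the cover strands assigned to this cluster.
        keep = ~np.isin(strand_ids, list(members))
        votes, weights, strand_ids = votes[keep], weights[keep], strand_ids[keep]
    return clusters
```

Always including the strand of the starting vote guarantees that each iteration removes at least one strand, so the loop terminates.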
3.5 Strand Synthesis
We synthesize final strands by independently considering each cluster extracted in the strand fitting stage described in Section 3.4. Within each cluster, the corresponding cover strands $\{S_c\}$ serve as captured local features of the underlying hairstyle, while the matched example strand $S_e$ provides the structural reference.
Similar to the synthesis algorithm described in [65], we first group the cover strands $\{S_c\}$ within the same cluster into multiple ribbons, where each ribbon is a flat surface fragment composed of close and nearly parallel strands. Then we connect the ribbons to form wisps and synthesize output strands from those wisps. The major difference between Luo et al. [65] and our strand synthesis approach is that, instead of using heuristics such as local circle fitting, which cannot ensure global structural plausibility, we use the structure information provided by the example strand to connect the ribbons.
Figure 3.7 Ribbon connection and strand synthesis. (a) The matched example strand is highlighted in red, and cover strands in the same cluster are highlighted in blue. (b) Cover strands are grouped into multiple ribbons. (c) We connect two ribbons based on the correspondence between the example strand and the ribbons; the segment of samples in the example strand that are not matched to any samples of the ribbon centers is highlighted in red. (d) Final output strands.
We first build the correspondence between the example strand and the center strands of all the ribbons within a single cluster. For each sample on a center strand, we find the closest sample on the example strand. A ribbon is considered to be matched to the segment $[i_{\min}, i_{\max}]$ of the example strand, where $i_{\min}$ and $i_{\max}$ are the minimum and maximum indices of these closest samples. We then divide the ribbons into several subsets while ensuring that (1) each ribbon belongs to at least one subset; (2) the matched segments within the same subset do not overlap; and (3) the total length of the matched segments within each subset is maximized by including as many ribbons as possible. We resample the ribbons in each subset by linearly interpolating the original sample positions, so that each has a ribbon width $w$ equal to the maximum width before resampling and a ribbon length $l = i_{\max} - i_{\min} + 1$.
Finally, we sort the ribbons in the same subset by the indices of their matched segments, and connect them one by one to obtain a wisp of width $w$. The wisp length $l$ is the same as that of the example strand (see Figure 3.7).
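A sketch of the ribbon-to-example correspondence and grouping follows. The matched segment follows the text. For the subset partition we substitute a simple greedy interval-partitioning heuristic (ribbons sorted by start index, each placed in the first subset where it does not overlap), which satisfies conditions (1) and (2) but only approximates the maximization in (3). Resampling uniformly in arc length is our assumption about how the linear interpolation is parametrized.

```python
import numpy as np

def matched_segment(center, example):
    """[i_min, i_max]: range of example indices closest to the ribbon-center samples."""
    d = np.linalg.norm(center[:, None, :] - example[None, :, :], axis=-1)
    idx = np.argmin(d, axis=1)
    return int(idx.min()), int(idx.max())

def partition_ribbons(segments):
    """Greedy interval partitioning into subsets of pairwise non-overlapping segments."""
    order = sorted(range(len(segments)), key=lambda r: segments[r][0])
    subsets, last_end = [], []
    for r in order:
        start, end = segments[r]
        for k, e in enumerate(last_end):
            if start > e:                 # no overlap with the last segment of subset k
                subsets[k].append(r)      # ribbons stay sorted by start index
                last_end[k] = end
                break
        else:
            subsets.append([r])
            last_end.append(end)
    return subsets

def resample(center, n):
    """Linearly resample a polyline to n samples, uniformly in arc length."""
    seg = np.linalg.norm(np.diff(center, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])
    target = np.linspace(0.0, s[-1], n)
    return np.stack([np.interp(target, s, center[:, j]) for j in range(3)], axis=1)

example = np.stack([np.arange(20.0), np.zeros(20), np.zeros(20)], axis=1)
center = np.stack([np.linspace(0.1, 5.1, 6), np.zeros(6), np.zeros(6)], axis=1)
i_min, i_max = matched_segment(center, example)
ribbon = resample(center, i_max - i_min + 1)   # ribbon length l = i_max - i_min + 1
```

Within a subset, consecutive ribbons would then be joined along the example strand, with the gaps filled as described next.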
For the samples $\{s\}$ of the example strand that are matched to samples on the ribbon centers, we use the resampled ribbon positions for the wisp. For each segment of $N$ successive samples $\{s_i\}$ of the example strand that are not matched to any samples of the ribbon centers, we determine their new positions $\{\mathbf{p}(s_i)\}$ in the wisp by minimizing the following energy:
$$E\big(\{\mathbf{p}(s_i)\}\big) = \sum_{i=1}^{N-1} \Big\| \big(\mathbf{p}(s_i) - \mathbf{p}(s_{i+1})\big) - \big(\tilde{\mathbf{p}}(s_i) - \tilde{\mathbf{p}}(s_{i+1})\big) \Big\|^2 + \alpha \sum_{i=1}^{N} \big\| \mathbf{p}(s_{i-1}) - 2\,\mathbf{p}(s_i) + \mathbf{p}(s_{i+1}) \big\|^2, \qquad (3.7)$$
where $\tilde{\mathbf{p}}(s_i)$ is the original position of sample $s_i$ in the example strand and $\alpha$ is a user-specified weight (we use $\alpha = 5$ for all results in this chapter). For the second term in Equation 3.7, we extend the segment by setting $\mathbf{p}(s_0)$ and $\mathbf{p}(s_{N+1})$ to the positions of the two ribbon samples adjacent to this segment, if either of the samples $s_0$ and $s_{N+1}$ exists. This term ensures that the unmatched segments of the example strand can be smoothly deformed into new segments that connect the ribbons. The final output strands are synthesized from the wisps using the method of Luo et al. [65].
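Because both terms of Equation 3.7 are quadratic in the unknown positions, the minimizer solves a linear least-squares problem shared by the x, y, and z coordinates. The sketch below is our own formulation using `numpy.linalg.lstsq`, with one row per residual. Here `p_prev` and `p_next` stand for $\mathbf{p}(s_0)$ and $\mathbf{p}(s_{N+1})$ and may be `None` when the corresponding ribbon sample does not exist; in that case the smoothness terms that need them are dropped, which is our assumption.

```python
import numpy as np

def smooth_segment(p_tilde, p_prev=None, p_next=None, alpha=5.0):
    """Minimize Eq. 3.7 for one unmatched segment; returns the new positions (N, 3)."""
    p_tilde = np.asarray(p_tilde, dtype=float)
    n = len(p_tilde)
    rows, rhs = [], []
    # Edge term: each new edge vector should match the original edge vector.
    for i in range(n - 1):
        a = np.zeros(n)
        a[i], a[i + 1] = 1.0, -1.0
        rows.append(a)
        rhs.append(p_tilde[i] - p_tilde[i + 1])
    # Smoothness term: second differences, weighted by sqrt(alpha).
    w = np.sqrt(alpha)
    for i in range(n):
        a, b = np.zeros(n), np.zeros(3)
        usable = True
        for j, c in ((i - 1, 1.0), (i, -2.0), (i + 1, 1.0)):
            if 0 <= j < n:
                a[j] += c
            else:
                fixed = p_prev if j < 0 else p_next
                if fixed is None:
                    usable = False
                    break
                # A known boundary position moves to the right-hand side.
                b -= c * np.asarray(fixed, dtype=float)
        if usable:
            rows.append(w * a)
            rhs.append(w * b)
    solution, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return solution

# A straight unmatched segment between two ribbon samples stays where it is.
segment = np.array([[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [3.0, 0.0, 0.0], [4.0, 0.0, 0.0]])
print(smooth_segment(segment, p_prev=[0.0, 0.0, 0.0], p_next=[5.0, 0.0, 0.0]))
```

When the adjacent ribbon samples move away from the original strand, the smoothness term spreads that displacement along the segment instead of creating a kink at the junction.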
3.6 Results
We test the performance of our hair capture framework on different data sets which have
different multi-view stereo configurations:
• Set A is shown in Figures 3.2 and 3.9 and was captured using 66 synchronized Canon EOS Rebel T3i cameras (864 × 1296 pixels, focal length of 30 mm). The capture setup is shown in Figure 3.8, and all subjects were real people.
• Set B is shown in Figure 3.12 (top three rows) and used 45 Canon EOS 550D cameras (864 × 1296 pixels, focal length of 50 mm) to capture static wigs.
• Set C, illustrated in Figure 3.12 (bottom), contains the wig from the motorized gantry setting of [65], where 50 shots around the hair were taken (936 × 1404 pixels and 50 mm lens).
Figure 3.8 Our capture setup.
As demonstrated in Figure 3.2, our method effectively reconstructs complex and curly hairstyles (Set A) from fully unprocessed input point clouds that contain many outliers as well as surface geometry from the face and body, while the method of Luo et al. [65] fails.
The first two rows of Figure 3.12 show examples of wigs that contain many crossing wisps and hair strands of varying lengths. The multiple layers of occluding hair also make it harder to estimate connections. We show that a database of simulated strands is effective for discovering the connections of fragmented hair wisps and producing structurally plausible hair configurations during hair synthesis. The third row in Figure 3.12 shows an example of straight hair, where a database with a minimal number of example strands is sufficient to encode global properties of the hair structure.
As shown on the challenging example of Set C (Figure 3.12, bottom), our method produces
results that are visually as compelling as the current state-of-the-art with faithfully reconstructed
ringlet structures.
To further demonstrate the generality of our solution, we show in Figure 3.9 that simple constrained hairstyles (e.g., ponytails) can be digitized when we replace the simulated database with examples from user sketches. The user only needs to draw a few strokes on the input photo to depict a plausible structure of the underlying hairstyle, which can be done in minutes. We then re-project those 2D strokes back onto the surface of the captured point cloud and smooth the strands using Equation 3.7 with an additional positional term to obtain a database of 3D examples (see Figure 3.9). Notice that strands from user sketches, simulation, or other sources can be combined to form a more powerful prior, although this was not required in any of our examples.
Figure 3.9 From an input photo (a) of a constrained hairstyle, the user can draw strokes (b) for a set of plausible strands. Our system creates 3D strand examples (c) from these strokes to fit the input cover strands, shown in (d) and (g) without and with outliers, respectively. The strand fitting step discovers clustered strands (e) / (h), which we use to synthesize final strands (f) / (i).
3.6.1 Comparison.
We compare our method with the state-of-the-art hair capture technique of [65]. As illustrated
in Figure 3.10, our method is often better at discovering correct inter-wisp connections than the
bottom-up approach of Luo et al. [65].
Figure 3.10 Comparison between [65] and our reconstruction result in terms of structural plausibility (columns: reference photo, [65], our result).
Figure 3.11 Results of dynamic simulation using our output strands.
3.6.2 Simulation.
To simulate our resulting hairstyles, individual fibers were first converted into piecewise helices
(with 8 helical arcs each) using the floating tangents algorithm [28], which served to initialize
the rest shape of an assembly of super-helices [5]. The resulting physical hair model was then
animated under wind forces using Daviet et al.’s simulation solver [24] for computing hair-hair
and hair-body frictional contacts.
Figure 3.12 For each dataset, we show reference photo, input point cloud with RGB color-
encoded 3D orientation field, database of simulated examples, top 10 clustered strands, our
final synthesized result and comparative result from [65].
Chapter 4
Capturing Braided Hairstyles
4.1 Background
Whether as a fashion statement or an indicator of social status, braided hairstyles for both men
and women have remained prevalent in many cultures for centuries. It is no surprise that digital
storytellers frequently use characters with braids, ranging from casual ones to meticulously
styled updos. In both film and game production, the hairstyles of digital humans need to be
carefully designed as they often reflect a unique personality of the character. Typically, reference
photographs or 3D scans of real actors are used as a starting point for the digitization. While
sophisticated hair modeling tools [20, 106] have been introduced to improve the workflow of
digital artists, significant effort is still required to create digital hair models that accurately
match the input references. In feature films, modeling the hairstyle of a single character can
take up to several weeks for an artist.
To facilitate the digitization of complex hair models, important advancements in 3D hair
capture [40, 65, 74] have recently emerged to further reduce the manual effort of digital artists.
Even though sophisticated hardware is necessary, these techniques can capture a wide spectrum
of real world hairstyles using geometric and physics-driven priors for hair structural analysis.
However, these structural analyses become problematic for constrained hairstyles such as braids
since their priors cannot properly model the intertwined topologies of braids.
Figure 4.1 Making braids. Top: a three-strand basic braid; bottom: a five-strand Dutch braid for Figure 4.2. Individual steps of making each braid style are shown from left to right.
A great variety of complex braids can be generated by repeatedly applying several basic rules [22]. For instance, the most common basic braids with 3 strands (Figure 4.1, top) are generated by repeatedly crossing the current middle strand under the outside one (left or right) in an interleaved fashion. Other braid styles extend the basic one by varying the number of strands, crossing over or under, or merging extra hair strands after each crossing. More complex braids include the Princess Anne style (Figure 4.12, bottom). Although in theory an infinite number of braids exist in the braid group $B_n$, braided hairstyles in daily life are typically limited to a couple of styles where each strand is symmetric to another, for simplicity and aesthetics.
In this chapter, we develop the first 3D hair reconstruction framework that enables the acquisition of a wide range of braided hairstyles. In addition to greatly increasing the space of capturable hair variation, our system requires only a commodity depth sensor as input and produces results comparable to existing high-end hair capture techniques. Inspired by the simple generative rules of braids in both theory and practice, we use a family of compact procedural models based on sinusoidal functions to represent common braided hairstyles. These procedural models can be used to fit the braid structures to the input data and to resolve the structural ambiguities caused by occlusion. We show that our system can faithfully recover complex intertwined patterns and generate structurally plausible braided hair models. We adopt a patch-based fitting algorithm based on random sampling and perform a structure analysis to connect the fitted patches into consistent braids. Finally, we synthesize output strands in the braid structures and combine them with the remaining hair by diffusing the 3D orientation fields. Since our system only requires a consumer-level depth sensor such as Microsoft's Kinect, it greatly reduces the cost of the acquisition hardware compared to previous multi-view stereo systems (e.g., [40, 65]).
Figure 4.2 Teaser.
We capture the braided hairstyle (a) using a Kinect sensor and obtain an input mesh with a local 3D orientation for each vertex (b). Based on the information provided by the example patches in a database, we extract the centerlines (c) of the braid structure to synthesize the final output strands (d). (e) and (f) show the input reference photo and our output strands from another viewpoint.
4.2 Overview of Our Approach
The overview of our pipeline is shown in Figure 4.3. Our input is a point cloud of the captured hairstyle with a normal and a local hair orientation at each point. The point cloud can be derived from the vertices of a mesh reconstructed by a consumer depth camera [70] or obtained directly from multi-view stereo [34]. The local 3D orientation at each point (Figure 4.3d) is computed by maximizing the consistency of its projected directions across a set of 2D orientation maps (Figure 4.3c), as in [65]. We manually remove the non-hair parts from the point cloud (Figure 4.3e); this is the only mandatory manual step in our entire pipeline.
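The consistency maximization borrowed from [65] can be illustrated with a small closed-form sketch. This is a simplified stand-in, not the procedure of [65]: it assumes orthographic cameras described by hypothetical rotation matrices (rows: image x-axis, image y-axis, viewing direction) and a 2D orientation angle already read from each map. Each angle confines the 3D direction to a plane containing the viewing ray, so the most consistent unit direction is the eigenvector with the smallest eigenvalue of the summed outer products of the plane normals.

```python
import numpy as np

# Hypothetical helper names; a least-squares sketch, not the method of [65].


def orientation_from_views(rotations, angles):
    """Estimate an (unsigned) unit 3D hair direction from per-view 2D angles.

    rotations: 3x3 camera rotations; rows are the image x-axis, the image
               y-axis and the viewing direction (orthographic cameras assumed).
    angles:    2D orientation angle in radians observed in each view.
    """
    A = np.zeros((3, 3))
    for R, theta in zip(rotations, angles):
        # 3D vector in the image plane pointing along the observed 2D orientation.
        u = np.cos(theta) * R[0] + np.sin(theta) * R[1]
        # The unknown direction must lie in the plane spanned by u and the viewing ray.
        n = np.cross(u, R[2])
        n /= np.linalg.norm(n)
        A += np.outer(n, n)
    # Minimize sum_k (n_k . d)^2 subject to |d| = 1.
    _, eigvecs = np.linalg.eigh(A)
    return eigvecs[:, 0]  # eigh returns eigenvalues in ascending order


def rotation_about_y(phi):
    """Camera rotation for a view orbiting the subject about the vertical axis."""
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, 0.0, -s], [0.0, 1.0, 0.0], [s, 0.0, c]])


# Usage: synthesize angles from a known direction and recover it (up to sign).
d_true = np.array([0.3, 0.8, 0.5]) / np.linalg.norm([0.3, 0.8, 0.5])
views = [rotation_about_y(phi) for phi in (0.0, 0.7, 1.4, 2.1)]
observed = [np.arctan2(R[1] @ d_true, R[0] @ d_true) for R in views]
d_est = orientation_from_views(views, observed)
```

Hair orientation is treated as unsigned here, so the estimate is only defined up to sign; at least two views with distinct viewing directions are needed for a unique answer.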
Figure 4.3 Overview of our system pipeline.
(a) input photos; (b) input mesh; (c) 2D orientation maps; (d) 3D orientation field; (e) cleaned mesh; (f) database; (g) fitting result; (h) labeling result; (i) extracted structure; (j) output strands. More detailed descriptions are in Section 4.2.
We generate a database of example patches via procedural braid models (Figure 4.3f, Section 4.3). Each patch is a group of surface tubes together with their centerlines. We then adopt a fitting method based on random sampling to align the input point cloud with the patches in the database and derive a set of candidate examples matching the braided part (Figure 4.3g, Section 4.4). To speed up computation, users can optionally provide two types of manual input: (1) choosing a braid example from the database based on visual observation, and (2) providing a rough initial scale that narrows the search range.
Next, we compute the optimal subset of the candidate patches that disjointly covers the input braid via multi-label optimization (Figure 4.3h), and extract the structure of the braided part as the connected centerlines of the covering braid patches (Figure 4.3i, Section 4.5). Finally, we combine the recovered braid structure with the remaining unbraided hair region and synthesize the output strands guided by a diffused 3D orientation field (Figure 4.3j, Section 4.6).
4.3 Procedural Model
Figure 4.4 Database preparation.
(a) shows the centerlines of a three-strand braid (left), two intermediate results of the patch expanding process (middle two), and the final expanded patch (right). (b), (c) and (d) show the expanded patches for a four-strand braid, a five-strand Dutch braid and a fishtail braid, respectively.
Inspired by the hair braiding process and braid theory, we use a few intertwining centerlines to represent the braid structures. These centerlines are symmetric to each other and have repeating patterns that can be characterized by periodic sinusoidal functions. A basic braid can be described by three centerlines $\{L_i, i = 0, 1, 2\}$ in its natural frame as follows:
$$
\begin{aligned}
L_0 &: \; x = a\sin(t), && y = t, && z = b\sin(2t) \\
L_1 &: \; x = a\sin(t + 2\pi/3), && y = t, && z = b\sin\big(2(t + 2\pi/3)\big) \\
L_2 &: \; x = a\sin(t + 4\pi/3), && y = t, && z = b\sin\big(2(t + 4\pi/3)\big)
\end{aligned}
\tag{4.1}
$$
where $y$ is the braiding direction, and $a$ and $b$ are two constants that determine the shape of the braid. Other braids can be modeled similarly according to how their strands are woven. Note that these procedural forms are by no means exhaustive: users can provide additional parametric forms either procedurally (as above) or manually (e.g., by sketching a few curves in 3D authoring tools).
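Equation 4.1 translates directly into a sampler. The sketch below is an illustration only: the function name, the uniform sampling of $t$, and returning plain tuples are our assumptions, and other braid types would use different phases and frequencies.

```python
import math

# Hypothetical helper; samples the three-strand model of Equation (4.1).


def three_strand_centerlines(a, b, t_max, samples):
    """Sample the centerlines L0, L1, L2 of Equation (4.1) in the braid's natural frame.

    y is the braiding direction; a and b set the braid's width and thickness.
    Returns three lists of (x, y, z) tuples sampled uniformly in t (samples >= 2).
    """
    centerlines = []
    for k in range(3):
        phase = 2.0 * math.pi * k / 3.0  # 0, 2*pi/3, 4*pi/3
        points = []
        for n in range(samples):
            t = t_max * n / (samples - 1)
            points.append((a * math.sin(t + phase), t, b * math.sin(2.0 * (t + phase))))
        centerlines.append(points)
    return centerlines


lines = three_strand_centerlines(a=1.0, b=0.5, t_max=4.0 * math.pi, samples=9)
```

Because the three phases are spaced by $2\pi/3$, the $x$ offsets (and likewise the $z$ offsets) of the three strands cancel at every $t$, which is one way to see the symmetry between strands mentioned above.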
To model the strand thickness of real braids, we augment the procedural model by expanding the centerlines into tubes. To determine a proper radius for these tubes, we start from a small initial radius and increase it until inter-tube penetrations occur.
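A minimal sketch of the radius expansion follows. It assumes that all tubes share one radius, that penetration is tested on sampled centerline points rather than on tube meshes, and that the starting radius `r0` and increment `step` (hypothetical parameters) are small relative to the strand spacing.

```python
import math

# Hypothetical helpers; a coarse point-sampled version of the radius expansion.


def tubes_penetrate(centerlines, radius):
    """True if tubes of the given radius around different centerlines overlap.

    Two tubes of radius r intersect when their axes come closer than 2r; the
    check is done on the sampled centerline points only (a coarse approximation).
    """
    for i in range(len(centerlines)):
        for j in range(i + 1, len(centerlines)):
            for p in centerlines[i]:
                for q in centerlines[j]:
                    if math.dist(p, q) < 2.0 * radius:
                        return True
    return False


def expand_tube_radius(centerlines, r0=0.01, step=0.01):
    """Grow a shared radius from r0 until the next step would cause penetration."""
    if len(centerlines) < 2:
        raise ValueError("need at least two centerlines to detect penetration")
    radius = r0
    while not tubes_penetrate(centerlines, radius + step):
        radius += step
    return radius
```

For sampled axes this is equivalent to taking half the minimum inter-centerline distance; the incremental form mirrors the description above and would carry over to a mesh-based intersection test.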
We use each procedural model to generate a segment of the target braid as an example patch $P_e$ and collect them into a single database $DB$ (Figure 4.4). Each patch in the database is a set of helical tubes with a local orientation defined at each vertex $v_i \in P_e$. The vertex orientation is defined to be along the direction of the centerline.
4.4 Patch Fitting
Given the input geometry of the captured hair, our goal is then to identify the local braid
structure of the hairstyle by fitting braid patches from the database to the point cloud. There
are two major challenges we need to address here. First, almost all braided hairstyles in real
life exhibit rich variations in terms of type, orientation, and size. For example, braids can have
different numbers of strands, and the orientation and the size of each knot can change along the
braids. Second, the input geometry is usually very noisy and largely occluded due to hair and
body contacts.
We first align the input point cloud $C$ with each braid patch in the database $DB$ to collect a few candidate matches, using a fitting approach based on random sampling similar to [9, 40]. Specifically, for each example patch $P_e \in DB$, we apply the fitting algorithm $N$ times, each time starting from a random initial position, orientation, and scale for the patch. The initial position of the example patch is determined by randomly sampling a point on the captured surface, and the initial orientation is computed from a random unit quaternion. The initial scale is obtained by scaling an estimated scale by a random factor between 0.9 and 1.1; the estimated scale itself is obtained by manually scaling and matching the braid patches to the braids in the input point cloud. For this purpose, we use a simple interactive tool in which the user draws three lines on the mesh surface to indicate the $x$, $y$, and $z$ scales. Since our strategy is based on random sampling, the fitting algorithm only needs a rough scale estimate, and this estimate has to be provided only once for a given example patch. We found that $N = 100$ gives satisfactory results in all our examples.
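The random initialization of one fitting run can be sketched as follows. This is an illustration under stated assumptions: a single random factor scales all three estimated axis scales, the rotation comes from normalizing a 4D Gaussian sample (one standard way to draw a uniformly random unit quaternion), and the helper names are hypothetical.

```python
import math
import random

# Hypothetical helpers for seeding one of the N fitting runs.


def random_unit_quaternion(rng):
    """Uniformly random rotation: normalize a 4D Gaussian sample (w, x, y, z)."""
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(c * c for c in q))
    return [c / norm for c in q]


def quaternion_to_matrix(q):
    """3x3 rotation matrix (list of rows) for a unit quaternion (w, x, y, z)."""
    w, x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def sample_initial_pose(surface_points, estimated_scale, rng):
    """Random (position, rotation, scale) used to start one fitting run."""
    position = rng.choice(surface_points)  # random point on the captured surface
    rotation = quaternion_to_matrix(random_unit_quaternion(rng))
    factor = rng.uniform(0.9, 1.1)  # assumed: one factor shared by all axes
    scale = tuple(factor * s for s in estimated_scale)  # estimated (x, y, z) scales
    return position, rotation, scale


rng = random.Random(0)
surface = [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0)]
poses = [sample_initial_pose(surface, (10.0, 20.0, 5.0), rng) for _ in range(100)]  # N = 100
```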
Next, we compute the optimal transformation $T$ that aligns the scaled patch $P_e$ with $C$ via the Iterative Closest Point (ICP) method [7]. To account for the variations of braids in the input hairstyle and to achieve a better fit, we deform $P_e$ using both rigid and non-rigid ICP [59] (see Figure 4.5 for an example comparison). The fitting error is computed as:
$$E(C, P_f) = \sum_i \Big( \left| p(v_{c,i}) - p(\hat{v}_{e,i}) \right|^2 + \alpha \left| o(v_{c,i}) - o(\hat{v}_{e,i}) \right|^2 \Big) \tag{4.2}$$
where $p$ and $o$ denote the position and orientation of the corresponding point; $P_f = T(P_e)$ is the fitted patch; $\hat{v}_{e,i}$ is the transformed position of a vertex in $P_f$ whose closest point is $v_{c,i} \in C$; and $\alpha$ is a constant weight, fixed to 100 for all of our results. Empirically, we found that good results are obtained with 30 iterations of rigid ICP followed by 30 iterations of non-rigid ICP.
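The rigid stage of the fitting and the error of Equation 4.2 can be sketched with NumPy. Assumptions: closest points are found by brute force (a k-d tree would be used at scale), orientations are unit vectors with consistent signs, the rigid update is the standard Kabsch/SVD solution, and the non-rigid ICP of [59] is not shown. Function names are hypothetical.

```python
import numpy as np

# Hypothetical helpers; rigid point-to-point ICP plus the error of Equation (4.2).


def closest_indices(src, dst):
    """Index of the nearest dst point for every src point (brute force)."""
    d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def fitting_error(cloud_pos, cloud_ori, patch_pos, patch_ori, alpha=100.0):
    """Equation (4.2): squared position plus alpha-weighted squared orientation
    differences between each fitted-patch vertex and its closest cloud point."""
    idx = closest_indices(patch_pos, cloud_pos)
    dp = ((cloud_pos[idx] - patch_pos) ** 2).sum(axis=1)
    do = ((cloud_ori[idx] - patch_ori) ** 2).sum(axis=1)
    return float((dp + alpha * do).sum())


def rigid_icp(patch_pos, cloud_pos, iterations=30):
    """Point-to-point rigid ICP; returns (R, t) such that patch @ R.T + t is aligned."""
    R, t = np.eye(3), np.zeros(3)
    for _ in range(iterations):
        moved = patch_pos @ R.T + t
        target = cloud_pos[closest_indices(moved, cloud_pos)]
        mu_m, mu_t = moved.mean(axis=0), target.mean(axis=0)
        H = (moved - mu_m).T @ (target - mu_t)  # cross-covariance
        U, _, Vt = np.linalg.svd(H)
        # Flip the last axis if needed so the update is a proper rotation.
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R_step = Vt.T @ D @ U.T
        t_step = mu_t - R_step @ mu_m
        R, t = R_step @ R, R_step @ t + t_step  # compose with the previous estimate
    return R, t
```

The default of 30 iterations matches the rigid stage reported above; in the full pipeline the result would then seed the non-rigid stage.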
Note that braid tails are usually tapered and can cause large errors when fitting the braid patches. To faithfully fit the tail of a braid, we also introduce a tail patch for each example patch in the database. Each tail patch is created automatically by linearly tapering the example patch along the $y$-axis, so that the width and thickness of the tubes at the end are half of those at the beginning. See Figure 4.3f for a concrete example.
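The tail patch construction can be sketched as a per-vertex scaling. This assumes the patch is expressed in its natural frame with the braid axis along $y$ through $x = z = 0$, and reads "halving the tube width and thickness" as halving the $x$ and $z$ extents; the function name is hypothetical.

```python
# Hypothetical helper; linear taper of a patch toward its tail end.


def taper_tail(vertices, y_start, y_end, end_ratio=0.5):
    """Linearly shrink x (width) and z (thickness) along +y so that they reach
    end_ratio of their original size at y_end; y is left unchanged."""
    tapered = []
    for x, y, z in vertices:
        u = min(max((y - y_start) / (y_end - y_start), 0.0), 1.0)  # 0 at start, 1 at end
        s = 1.0 + (end_ratio - 1.0) * u
        tapered.append((s * x, y, s * z))
    return tapered


tail = taper_tail([(2.0, 0.0, 1.0), (2.0, 5.0, 1.0), (2.0, 10.0, 1.0)], 0.0, 10.0)
```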
4.5 Structure Analysis
Our goal in this section is to select the optimal set of fitted patches from the candidate set
and extract the coherent braid structure of the input hairstyle. The process is visualized in
Figure 4.6.
Figure 4.5 Comparison between fitting with rigid ICP and non-rigid ICP.
Here we show (a) input mesh and the fitted patches with rigid ICP and non-rigid ICP in (b) and
(c) respectively.
The patch fitting step (Section 4.4) produces a set of fitted patches $\{P_f\}$. A fitted patch covers a point $v_c$ in the input point cloud $C$ if $v_c$ is within a distance threshold of some vertex $\hat{v}_e \in P_f$.
We collect all the points $\{v_c\}$ covered by at least one fitted patch $P_f$ and consider them as the braided part $C_b$ of $C$, whose structure can be analyzed by the procedural braid models in the database. Since the patches in $\{P_f\}$ may be redundant and overlap with each other, one point $v_c$ can be covered by multiple fitted patches (see Figure 4.6b). We therefore need to select a subset of the fitted patches that can be connected together to form a complete and clean structure for the braided part.
We formulate the structure analysis task as a multi-label problem. Specifically, for each point $v_i \in C_b$, we choose a single label $l_i$, which corresponds to a fitted patch $P_{f,i}$, by minimizing the following energy function:
$$E(C_b, \{l_i\}) = \sum_i E_d(v_i, l_i) + \sum_{i,\, j \in N(i)} E_s(v_i, v_j, l_i, l_j) \tag{4.3}$$
The first term in Equation 4.3 is a data term to ensure that each point $v_i$ is assigned to a label such that it is covered by the corresponding fitted patch $P_{f,i}$:
$$E_d(v_i, l_i) = \begin{cases} E(v_i, P_{f,i}) & \text{if } P_{f,i} \text{ covers } v_i \\ 100 & \text{otherwise} \end{cases} \tag{4.4}$$
where $E(v_i, P_{f,i})$ is the fitting error computed based on Equation 4.2. The constant 100 serves as a sufficiently large penalty for points that the patch does not cover. The second term in Equation 4.3 is a smoothness term that encourages assigning the same label to neighboring vertices $j \in N(i)$, i.e., vertices that are within a distance threshold of $v_i$ and have similar local orientations:
$$E_s(v_i, v_j, l_i, l_j) = \begin{cases} 0 & \text{if } l_i = l_j \\ 100 & \text{otherwise} \end{cases} \tag{4.5}$$
The energy function in Equation 4.3 can be efficiently minimized using a graph-cuts algorithm
[25]. See Figure 4.6c for a color visualization of the labeling result on the input mesh.
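To make Equations 4.3 to 4.5 concrete, the sketch below evaluates the energy and minimizes it with iterated conditional modes (ICM). ICM is a deliberately simpler substitute for the graph-cuts solver [25] and only reaches a local minimum. Assumptions: `data_cost[i][l]` already holds Equation 4.4 for point $i$ and label $l$, neighbor lists are symmetric, and each unordered neighbor pair therefore appears twice in the sum of Equation 4.3. All names are hypothetical.

```python
# Hypothetical helpers; ICM stands in for graph cuts purely for illustration.


def labeling_energy(labels, data_cost, neighbors, penalty=100.0):
    """Equation (4.3) with the Potts smoothness term of Equation (4.5)."""
    energy = sum(data_cost[i][l] for i, l in enumerate(labels))
    for i, nbrs in enumerate(neighbors):
        energy += sum(penalty for j in nbrs if labels[i] != labels[j])
    return energy


def local_cost(i, label, labels, data_cost, neighbors, penalty):
    """Energy terms that depend on point i's label (pairs counted in both directions)."""
    mismatches = sum(1 for j in neighbors[i] if labels[j] != label)
    return data_cost[i][label] + 2.0 * penalty * mismatches


def icm_labeling(data_cost, neighbors, penalty=100.0, max_sweeps=10):
    """Greedy coordinate descent on Equation (4.3), starting from the best data labels."""
    labels = [min(range(len(c)), key=c.__getitem__) for c in data_cost]
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(labels)):
            best = min(range(len(data_cost[i])),
                       key=lambda l: local_cost(i, l, labels, data_cost, neighbors, penalty))
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels


# Point 2 slightly prefers patch 1, but its neighbors pull it back to patch 0.
costs = [[0, 100], [0, 100], [3, 0], [0, 100]]
nbrs = [[1], [0, 2], [1, 3], [2]]
result = icm_labeling(costs, nbrs)
```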
Based on the labeling result, we select the subset of fitted patches $\{P_f\}$ whose labels have been assigned to points in $C_b$ (Figure 4.6d). We then need to connect the sets of centerlines of adjacent patches to generate a complete braid structure. We pose this task as an assignment problem. Specifically, we consider all possible matching combinations of the centerlines for every two adjacent patches. For each combination, we compute an assignment cost as the average distance between corresponding points in the overlapping regions of all the matched pairs of centerlines, and we keep the combination with the lowest cost.
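The centerline assignment can be sketched by brute force over permutations, which is cheap for the three to five strands of common braids. Assumptions: both patches have the same number of centerlines, and the overlapping region is given as two hypothetical index lists of corresponding samples. Because the cost is a sum over matched pairs, a Hungarian solver such as SciPy's `linear_sum_assignment` would give the same answer for larger strand counts.

```python
import itertools
import math

# Hypothetical helper; exhaustive matching of centerlines between adjacent patches.


def match_centerlines(lines_a, lines_b, overlap_a, overlap_b):
    """Choose how the centerlines of patch A continue into patch B.

    lines_a, lines_b: lists of centerlines (lists of 3D points), same length.
    overlap_a, overlap_b: corresponding sample indices in the overlapping region.
    Returns (perm, cost) where lines_a[k] connects to lines_b[perm[k]].
    """
    best_perm, best_cost = None, math.inf
    for perm in itertools.permutations(range(len(lines_b))):
        dists = [math.dist(lines_a[k][ia], lines_b[perm[k]][ib])
                 for k in range(len(lines_a))
                 for ia, ib in zip(overlap_a, overlap_b)]
        cost = sum(dists) / len(dists)  # average distance over all matched pairs
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm, best_cost


a_lines = [[(float(k), float(y), 0.0) for y in range(4)] for k in range(3)]
b_lines = [a_lines[2], a_lines[0], a_lines[1]]  # same strands, stored in another order
perm, cost = match_centerlines(a_lines, b_lines, [0, 1, 2], [0, 1, 2])
```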
Finally, we expand the connected centerlines into volumetric tubes using the same method as for the database (Section 4.3), obtaining the surface of the braid (Figure 4.6g). To further improve the fit to the input hairstyle, we use the non-rigid ICP algorithm to align the expanded surface with the captured geometry. The deformed surface represents the reconstructed volumetric tubes for the braided part of the input hair (Figure 4.6h).
Discussion An alternative way to obtain the structure of the braided part is to connect the selected surface patches $\{P_f\}$ directly using a mesh composition approach [45]. We found that connecting the centerlines first and then expanding them into a surface refined with non-rigid ICP is easier to implement and produces good results in all our cases.
4.6 Strand Synthesis
After extracting the braid structure of the input hair, we synthesize full hair strands in both the braided and unbraided parts. The unbraided part consists of the points in $C$ that are not covered by any of the fitted braid patches obtained in the fitting step (Section 4.4).
We first synthesize the strands on the unbraided part by growing bidirectionally from the
uncovered points based on the local orientations as described in [65].
Next, we determine the root positions by randomly sampling a user-specified region on a manually prepared scalp. Finally, we grow the output strands from the roots following the 3D orientation field. Once a strand grows into the braided part, i.e., into one of the volumetric tubes obtained in Section 4.5, we compute the barycentric coordinates of the entry point with respect to the cross-section of the tube, and interpolate all subsequent growing positions using the same barycentric coordinates along the tube.
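The barycentric transfer can be sketched as follows. Assumptions: each tube cross-section is a ring of vertices with a known center, triangulated as a fan around that center; the entry point lies in the plane of the first cross-section; and consecutive cross-sections have the same vertex count and ordering. Names are hypothetical.

```python
import numpy as np

# Hypothetical helpers; transfers fixed barycentric coordinates along a tube.


def fan_barycentric(point, center, ring, eps=1e-9):
    """Find the fan triangle (center, ring[m], ring[m+1]) containing point.

    Returns (m, w1, w2) such that point = center + w1*(ring[m] - center)
    + w2*(ring[m+1] - center); the weight of the center is 1 - w1 - w2.
    """
    count = len(ring)
    for m in range(count):
        edges = np.stack([ring[m] - center, ring[(m + 1) % count] - center], axis=1)  # 3x2
        w, *_ = np.linalg.lstsq(edges, point - center, rcond=None)
        if w[0] >= -eps and w[1] >= -eps and w[0] + w[1] <= 1.0 + eps:
            return m, float(w[0]), float(w[1])
    raise ValueError("point lies outside the cross-section")


def follow_tube(entry_point, centers, rings):
    """Reuse the entry point's barycentric coordinates at every cross-section."""
    centers, rings = np.asarray(centers, dtype=float), np.asarray(rings, dtype=float)
    m, w1, w2 = fan_barycentric(np.asarray(entry_point, dtype=float), centers[0], rings[0])
    count = rings.shape[1]
    return np.array([c + w1 * (r[m] - c) + w2 * (r[(m + 1) % count] - c)
                     for c, r in zip(centers, rings)])


# A straight tube along y whose 8-vertex cross-sections twist by 10 degrees per step.
K, M, phi = 5, 8, np.deg2rad(10.0)
centers = np.array([[0.0, float(k), 0.0] for k in range(K)])
rings = np.array([[[np.cos(2 * np.pi * j / M + k * phi), float(k), np.sin(2 * np.pi * j / M + k * phi)]
                   for j in range(M)] for k in range(K)])
path = follow_tube([0.3, 0.0, 0.2], centers, rings)
```

In this example the strand inherits the tube's twist: each sample is the entry offset rotated by the same angle as its cross-section.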
To further improve the visual realism of the reconstruction results, we introduce some fuzziness into the sample positions of the final strands using the method of [20] (see Figure 4.7).
4.7 Results
Capture setup Our algorithm supports different capture systems and does not require precise control of the environment. Except for the basic braid case in Figure 4.12, which is captured using a multi-view stereo system (Figure 4.8b), all other results are obtained by slowly moving a hand-held Kinect camera around the subject (Figure 4.8a) for real-time geometry reconstruction using KinectFusion [70].
Hairstyles To show the effectiveness of our method, we have tested our framework on a variety of braided hairstyles. As illustrated in Figures 4.2 and 4.12, our method can successfully reconstruct complex braided hairstyles with different numbers of strands per braid, different numbers of braids, different braid geometry (varying width and thickness with global twist and bending), and different topology (e.g., the merged style at the bottom of Figure 4.12).
Comparisons In Figure 4.9, we compare our method with two state-of-the-art hair capture techniques [40, 65] using the same input point cloud and orientation field. Luo et al. [65] connect hair threads via geometric heuristics. Such heuristics often fail when there are significant occlusions in braids, causing broken hair threads in the final reconstruction. Hu et al. [40] use a database of example strands from pre-simulation and user sketches to produce better customized hair connections. However, the strand groups can still yield incorrect topological configurations due to potential ambiguity in the orientation fields, e.g., at adjacent segments coming from different braids.
In contrast, our approach produces structurally correct strands for braided hairstyles by leveraging our procedurally generated patches.
Evaluations To evaluate the robustness of our algorithm (see Figure 4.10), we reconstruct the same input hairstyle of a fishtail braid (Figure 4.10a) using different example patches. Specifically, we restrict the number and type of example patches in the database and first check whether our method produces smaller fitting errors for more suitable patches. We also check whether our technique generates plausible results even with examples that differ from the captured input.
As demonstrated, our method can successfully distinguish between different example patches based on the fitting errors. Moreover, plausible reconstructions can be obtained even without structurally correct examples in the database. Figure 4.11 further demonstrates the robustness of our method with respect to the estimated scale of the example patches. Specifically, we uniformly scale the example patches in Figure 4.3f by different factors $s$ before running our patch fitting and structure analysis algorithm. As shown in Figure 4.11, our method produces good results over a wide range of estimated scales.
Implementation The input hairstyles shown in this chapter are manually woven and styled, taking several minutes per hairstyle for an inexperienced person. The most time-consuming stage in our pipeline is input pre-processing: computing a 3D orientation field from a set of 2D images takes about 40 minutes, while manual clean-up of the mesh takes about half an hour. The patch fitting step requires five minutes of computation per patch using 100 different initial poses (parallelized on four cores and eight threads). In our implementation, all length-related measurements are in millimeters and all angles are in radians. The distance in Section 4.5 is computed as in Section 4.4, and we set its threshold to 100 in all our results except Figures 4.10c and 4.10d, where it is 200. The grid size of the orientation field in Section 4.6 is set to 1 mm, and building the vector field takes about five minutes. The number of final output strands ranges from 30K to 50K. All other computations take only seconds (on a machine with a 2.6 GHz Intel Core i7 and 16 GB of RAM).
Figure 4.6 Illustration of our structure analysis algorithm.
Here we show (a) input mesh; (b) candidate patches collected during the fitting stage; (c)
colorized visualization of labeling result on input mesh; (d) patches selected by the labeling
step; (e) centerlines of the selected patches; (f) connected centerlines; (g) surface expanded
from the connected centerlines; (h) fitted surface after non-rigid ICP.
Figure 4.7 Adding fuzziness [20] over the final output strands.
Figure 4.8 Our capture setups.
Figure 4.9 Comparisons with state-of-the-art hair capture methods.
Figure 4.10 Comparisons with different example patches.
We fit a fishtail braid (a) using different example patches in our database (b) to (e); each panel shows the example patch(es), the extracted structure, the output strands, and the fitting error $E$ computed as in Section 4.4.
Figure 4.11 Comparisons with different estimated scales for the example patch in the database.
Figure 4.12 Reconstruction results of different braided hairstyles.
From left to right: original photo, 2D orientation map, input mesh to our core algorithm, labeling
result, extracted structure, and rendering of final strands from two different viewing points.
Chapter 5
Single-View Hair Modeling Using a Hairstyle Database
5.1 Overview of Our Approach
In the previous chapter, we introduced a multi-view hair capture framework based on example strands generated through hair simulation. However, that system relies on complex capture settings and requires a costly multi-view stereo system. In this chapter, we present a system that creates a high-quality 3D hair model from a single input reference photo using a database of full 3D hairstyles and a few user strokes. Our hairstyle database is obtained by collecting more than 300 manually created hairstyles that are publicly available in online game communities [30, 71]. We propose a coarse-to-fine modeling strategy and adopt a hierarchical two-level representation to accurately model the target hairstyle. As shown in Figure 5.1, at the coarse scale, we first retrieve the example hairstyles from the database that best match the 2D user strokes. These strokes outline the representative structures of the target hairstyle, which are necessary for revealing the full hair connectivity and topology that are often occluded in single-view images. We then combine the retrieved examples into a 3D orientation field by considering both the user strokes and local orientation consistency in a Markov Random Field (MRF) framework. We use the combined 3D orientation field to produce the structure of the target hairstyle by growing strands from the scalp of a fitted 3D head model. At the fine scale, we first deform the combined hair strands to match the 2D orientations of the target hairstyle. To produce physically plausible hair strands during the refinement, one possibility is to simulate the strands. However, while inferring simulation parameters has been demonstrated on 3D data [27], those parameters are extremely difficult to estimate from 2D input. Since hair strands can be represented by piecewise helices [5], we instead adopt the method proposed by Cherin et al. [19] to fit piecewise 3D helix curves to the 2D projections of the target hair strands. Finally, we refine the output strands by jointly optimizing their similarity to these fitted helices and the orientation consistency between neighboring strands.
Figure 5.1 Our system takes as input a reference photo (a), a few user strokes (b) and a database of example hairstyles (c) to model the 3D target hairstyle (d). The retrieved examples that best match the user strokes are highlighted with the corresponding colors in (c) and are combined consistently as shown in (d). Original image courtesy of Yung-Yuan Kao.
5.2 Database Construction
We construct a database of hairstyles spanning a wide range of overall shapes and local details by collecting 343 examples from several repositories available online [30, 71]. Most of these hairstyles were created manually by various artists and gamers. The original data of each hairstyle is specified as a set of triangle meshes, where each mesh represents a wisp of hair strands with consistent orientation and length.
Since many of the original hairstyle models are created in different poses, we manually transform the global pose of each hairstyle (by translating, rotating and/or scaling) to roughly align it with a fixed standard head model. To convert the meshes into hair strands for later computation, we first uniformly sample all the meshes in the hairstyle following the local orientation and obtain a set of hair strands, where each strand is represented as a set of equally spaced sample points. Next, we build a 3D orientation field from the hair strands using the method of Wang et al. [97] and smoothly diffuse the field into the entire 3D volume as proposed by Paris et al. [74]. Guided by the diffused 3D orientation field, we grow hair strands from 10,000 uniformly distributed roots on the scalp and consider these strands as an example hairstyle $E$ in the database $DB$. See Figure 5.2 for a concrete example. Finally, we augment the database by flipping each example with respect to the plane of reflection symmetry of the head model, obtaining a database of 686 examples in total.
Figure 5.2 From left to right: (a) the original mesh (downloaded from online repositories and roughly aligned with a given head model); (b) hair strands obtained by uniformly sampling the original mesh; (c) the final example obtained by growing hair strands from the scalp.
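Two small steps of the database construction can be sketched in plain Python: resampling a strand into equally spaced points and mirroring it for augmentation. Assumptions: strands are polylines with at least two points, resampling uses linear interpolation along arc length, and the head's symmetry plane is $x = 0$ after alignment. Both function names are hypothetical.

```python
import math

# Hypothetical helpers for strand resampling and mirror augmentation.


def resample_polyline(points, spacing):
    """Resample a 3D polyline at (approximately) equal arc-length spacing."""
    if len(points) < 2:
        return list(points)
    seg = [math.dist(points[i], points[i + 1]) for i in range(len(points) - 1)]
    total = sum(seg)
    n = max(int(round(total / spacing)), 1)
    out, i, acc = [], 0, 0.0  # acc: arc length at the start of segment i
    for k in range(n + 1):
        s = total * k / n  # target arc length
        while i < len(seg) - 1 and acc + seg[i] < s:
            acc += seg[i]
            i += 1
        u = 0.0 if seg[i] == 0 else min((s - acc) / seg[i], 1.0)
        a, b = points[i], points[i + 1]
        out.append(tuple(a[d] + u * (b[d] - a[d]) for d in range(3)))
    return out


def mirror_strand(strand):
    """Reflect a strand across the head's symmetry plane, assumed to be x = 0."""
    return [(-x, y, z) for x, y, z in strand]


res = resample_polyline([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 2.0, 0.0)], 0.5)
```

Mirroring every strand of an example this way is what doubles 343 examples into 686.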
5.3 Modeling Large-Scale Structure
Given a reference photograph of the target hairstyle, we first determine a transformation matrix $T$ that aligns the standard head model with the head pose in the photo. For photographs of frontal faces, the transformation matrix can be computed automatically from a set of detected 2D facial feature points [1]. For other photos, we manually align the head based on visual observation to approximate the transformation matrix.
Next, we let the user draw a few 2D strokes over the reference photo to guide the modeling of the large-scale structure. For each stroke, we search for the best matching example hairstyle in the database (Section 5.3.1). We then combine all the retrieved example hairstyles to form a consistent large-scale structure of the 3D target hairstyle (Section 5.3.2). In addition, we provide some optional editing operations to refine the modeled large-scale structure (Section 5.3.3).
Figure 5.3 From left to right: (a) user strokes, with the latest one shown in red; (b) the best matching example hairstyle retrieved from the database based on the red stroke; (c) the combined hairstyle at the current step, with each hair sample colorized according to the label of the grid containing the sample point.
5.3.1 Example Retrieval
For many reference photos, it is difficult or nearly impossible to automatically extract the complete structure of hair strands due to the complexity of the hairstyles, self-occlusions between hair strands, and the conditions under which the photos were taken. We therefore ask the user to draw several 2D strokes $\{U\}$ based on their observation of the reference photograph. Each stroke must be drawn from root to tip, following the orientation of the hair strands in the photograph. We consider these user strokes as the essential structures of the target hairstyle projected onto the image plane, which are necessary to reveal the full hair connectivity and topology.
To measure the difference between a 2D user stroke $U$ and a 3D hair strand $S$, we project the strand onto the image plane using the transformation matrix $T$ mentioned above. For each sample $s_i$ on the user stroke $U$, we find the closest sample $s_j$ on the projected strand, and compute the difference as:
$$D(U, S) = \sum_{s_i \in U} \min_{s_j \in S} \left| p(s_i) - p(s_j) \right| \tag{5.1}$$
where $|p(s_i) - p(s_j)|$ is the distance between the positions of $s_i$ and $s_j$ on the image plane.
We define the difference between a user stroke $U$ and an example hairstyle $E$ in the database by simply searching for the hair strand $S$ of the hairstyle that minimizes the stroke-strand difference as defined in Equation 5.1:
$$D(U, E) = D(U, \{S_i\}) = \min_{S_i \in E} D(U, S_i) \tag{5.2}$$
For each user stroke $U_i$, we search for the best matching example hairstyle $E_i$ from the database $DB$ which minimizes the stroke-hairstyle difference defined in Equation 5.2. We also store the best matching hair strand $S_i \in E_i$, which minimizes the stroke-strand difference among all the strands in $E_i$, for the computation in Section 5.3.2.
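Equations 5.1 and 5.2 and the per-stroke search can be sketched in plain Python. Assumptions: $T$ is treated as a hypothetical 3x4 projection matrix applied in homogeneous coordinates, strokes are lists of 2D image samples, and the search is exhaustive with no acceleration structure. Function names are hypothetical.

```python
import math

# Hypothetical helpers implementing Equations (5.1) and (5.2) by exhaustive search.


def project(T, point):
    """Project a 3D point with a 3x4 matrix T in homogeneous coordinates."""
    x, y, z = point
    u, v, w = (T[r][0] * x + T[r][1] * y + T[r][2] * z + T[r][3] for r in range(3))
    return (u / w, v / w)


def stroke_strand_difference(stroke, strand, T):
    """Equation (5.1): sum over stroke samples of the distance to the closest
    projected strand sample."""
    projected = [project(T, s) for s in strand]
    return sum(min(math.dist(si, sj) for sj in projected) for si in stroke)


def retrieve(stroke, database, T):
    """Equation (5.2) for every example, then the best example in the database.

    database: list of hairstyles, each a list of strands (lists of 3D points).
    Returns (example index, best strand index, difference).
    """
    best = (None, None, math.inf)
    for e, hairstyle in enumerate(database):
        for s, strand in enumerate(hairstyle):
            d = stroke_strand_difference(stroke, strand, T)
            if d < best[2]:
                best = (e, s, d)
    return best


T_ortho = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]]  # drops depth, keeps (x, y)
db = [
    [[(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)], [(5.0, 0.0, 0.0), (5.0, 1.0, 0.0)]],
    [[(1.0, 0.0, 0.0), (1.0, 1.0, 0.0)], [(0.1, 0.0, 2.0), (0.1, 1.0, 2.0)]],
]
match = retrieve([(0.1, 0.0), (0.1, 1.0)], db, T_ortho)
```

Returning the strand index alongside the example index mirrors the text, which keeps the best matching strand $S_i$ for the combination step.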
Discussion. We have tried several different metrics for retrieving the best matching example, including an alternative based on local curvatures. We found the simple form in Equation 5.1 to be highly effective and sufficiently robust, since we focus on large-scale structures rather than local shape details during this retrieval step.
Figure 5.4 From left to right: (a) reference photo; (b) manually extracted binary mask; (c) the
hair structure before editing; (d) the hair structure after editing. Original image courtesy of Eu
Hairdresser.
5.3.2 Hairstyle Combination
After the example retrieval step described above, our next goal is to combine the retrieved hairstyles into a consistent large-scale structure representing the target hairstyle. The combined hairstyle should follow the guidance of the user strokes $\{U_i\}$ and maintain local consistency, even though the retrieved example hairstyles $\{E_i\}$ may be significantly different from each other. Due to this diversity, it is usually difficult to directly combine or blend different hair strands. Instead, we build a 3D orientation field $F_i$ for each retrieved hairstyle $E_i$ [97] and perform the combination over the grids of these fields.
We formulate the task as a multi-label assignment problem. Specifically, we consider each retrieved example hairstyle $E_i$, together with the best matching hair strand $S_i \in E_i$, as a label $l_i$, and try to assign the optimal label to each grid $g$ in 3D space by minimizing the following energy function:
$$E(\{g_i\}, \{l_i\}) = \sum_i E_d(g_i, l_i) + \sum_{i,\, g_j \in N(g_i)} E_s(g_i, g_j, l_i, l_j) \tag{5.3}$$
where the first term is a fitness term to ensure that the combined orientation field follows the guidance of the user strokes, and is computed as the minimum distance between the grid center and the best matching strand $S_{l_i}$ corresponding to the label $l_i$:
$$E_d(g_i, l_i) = \min_{s_j \in S_{l_i}} \left| p(g_i) - p(s_j) \right| \tag{5.4}$$
where $p(g_i)$ and $p(s_j)$ are the positions of the grid center $g_i$ and the sample $s_j$, respectively. The second term in Equation 5.3 is a smoothness term ensuring that every pair of adjacent grids (i.e., $g_j$ belongs to the neighborhood $N(g_i)$ of $g_i$) is assigned labels with consistent local orientations:
$$E_s(g_i, g_j, l_i, l_j) = \begin{cases} 0 & \text{if } l_i = l_j \\ 5 & \text{else if } l_i, l_j \text{ are compatible} \\ 100 & \text{otherwise} \end{cases} \tag{5.5}$$
In Equation 5.5, we consider two different labels $l_i$ and $l_j$ to be compatible for two adjacent grids $g_i$ and $g_j$ if and only if the corresponding orientation fields have similar local orientations $o_i = F_i(g_i)$ and $o_j = F_j(g_j)$, i.e., the dot product between $o_i$ and $o_j$ is larger than a threshold $\tau = 0.7$.
The energy function defined in Equation 5.3 can be minimized via the graph-cuts algorithm described by Delong et al. [25]. After computing the optimal label for each grid, we obtain a combined orientation field $F_c$, in which the local orientation of each grid is taken from the field of its assigned label. We then grow hair strands from the combined orientation field $F_c$ to obtain the combined hairstyle $H_c$ as the large-scale structure of the target hairstyle. See Figure 5.3 for an illustration of combining multiple hairstyles.
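The two terms of Equation 5.3 and the assembly of $F_c$ from a finished labeling can be sketched as below. Assumptions: grids are keyed by integer index tuples, each orientation field is a dictionary from grid key to unit orientation, and the caller supplies $o_i = F_i(g_i)$ and $o_j = F_j(g_j)$; all names are hypothetical. The labeling itself would come from a multi-label solver such as the one cited above.

```python
import math

# Hypothetical helpers for Equations (5.4), (5.5) and the combined field F_c.


def grid_data_cost(grid_center, best_strand):
    """Equation (5.4): distance from a grid center to the label's best matching strand."""
    return min(math.dist(grid_center, s) for s in best_strand)


def grid_smoothness_cost(label_i, label_j, o_i, o_j, tau=0.7):
    """Equation (5.5); o_i and o_j are the unit orientations F_i(g_i) and F_j(g_j)."""
    if label_i == label_j:
        return 0.0
    dot = sum(a * b for a, b in zip(o_i, o_j))
    return 5.0 if dot > tau else 100.0  # compatible labels get the small penalty


def combine_fields(labels, fields):
    """Combined field F_c: each grid takes its orientation from its label's field."""
    return {g: fields[l][g] for g, l in labels.items()}


fields = [{(0, 0, 0): (1.0, 0.0, 0.0), (1, 0, 0): (1.0, 0.0, 0.0)},
          {(0, 0, 0): (0.0, 1.0, 0.0), (1, 0, 0): (0.0, 1.0, 0.0)}]
F_c = combine_fields({(0, 0, 0): 0, (1, 0, 0): 1}, fields)
```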
5.3.3 Structure Editing
Once we obtain the combined hairstyle $H_c$, we can optionally perform editing operations on the large-scale structure. In particular, we have implemented a cutting tool to manipulate the contour of $H_c$ and introduce some random variations to the length of hair strands in $H_c$, following the method introduced by Chai et al. [17]. See Figure 5.4 for an example of editing the contour with a 2D mask which is manually prepared from the reference photo.
Figure 5.5 From left to right: (a) reference photo; (b) 2D orientation map; (c) four hair
strands before deformation; (d) the corresponding hair strands after deformation (100 iterations).
Original image courtesy of Gokhan Altintas.
5.4 Modeling Strand Shapes
After obtaining the large-scale structure $H_c$ of the target hairstyle, we refine the shape of each individual strand in $H_c$ such that (1) the projected 2D local details of the modeling result are as close as possible to the reference photo (Section 5.4.1); (2) the shapes of the 3D strands are physically plausible (Section 5.4.2); and (3) the global structure of the hairstyle is preserved while maintaining the local coherency between neighboring strands (Section 5.4.3).
5.4.1 2D Orientations
We compute a 2D orientation map $M$ from the reference photo using the method described by Luo et al. [65]. We then deform each hair strand $S \in H_c$ based on the 2D orientation map as well as the visibility of each hair strand in the view of the reference photo. We consider a hair strand visible if and only if its root is visible from the current viewpoint; we found this simple heuristic to work well in practice for all of our examples. Our key idea for enforcing the similarity of 2D orientations is to gradually deform each visible hair strand over 100 iterations so that, for each sample $s_i$ on the strand $S$, the projected local orientation $o(s_i) = p(s_{i+1}) - p(s_i)$ is as close as possible to the local orientation $M(s_i)$ on the 2D map.
Figure 5.6 From left to right: (a) hair strand before fitting; (b) the fitted piecewise helix curve
with each segment visualized using a different color; (c) & (d) another view of (a) & (b).
In each iteration, every hair strand $S$ is projected onto the image plane based on the viewpoint of the photograph. We then deform the strand $S$ according to the 2D orientation map by minimizing the following energy:
$$E\big(\{p(s_i)\}\big) = \sum_i \Big( \|p(s_i) - \bar{p}(s_i)\|^2 + \alpha_1 \|o(s_i) - M(s_i)\,\delta\|^2 + \alpha_2 \|p(s_{i-1}) - 2p(s_i) + p(s_{i+1})\|^2 \Big) \qquad (5.6)$$
where $p(s_i)$ is the projected 2D position of sample $s_i$, $\bar{p}(s_i)$ the initial position of sample $s_i$ before the deformation, $s_{i-1}$ and $s_{i+1}$ the predecessor and successor of $s_i$ on the same strand, and $\delta$ the average length between two adjacent samples on the strand. The constants $\alpha_1$ and $\alpha_2$ control the relative weights of the orientation and curvature terms with respect to the first (position) term. We set $\alpha_1 = 10$ and $\alpha_2 = 10$ for all the results in our paper. We solve the resulting linear system for all the sample positions $\{p(s_i)\}$ to minimize Equation 5.6 and obtain the deformed shape of strand $S$. See Figure 5.5 for an illustration of the deformation.
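As a rough illustration of how Equation 5.6 can be minimized, the Python sketch below deforms a single projected strand with NumPy. Each term becomes a weighted row of a linear least-squares system (row weights $\sqrt{\alpha_1}$ and $\sqrt{\alpha_2}$), and the x and y coordinates are solved together. Several details are assumptions rather than statements from the text: `orient_fn` is a hypothetical callable that samples the orientation map $M$ at a 2D point, $M$ is sampled at the positions from the start of each iteration, $\bar{p}$ is taken as those same positions, and each sampled orientation is flipped to agree with the current segment because orientation maps are usually undirected.

```python
import numpy as np


def deform_strand_2d(P, orient_fn, alpha1=10.0, alpha2=10.0, iters=100):
    """Iteratively deform one projected strand following Eq. 5.6 (sketch).

    P: (n, 2) projected sample positions p(s_i), ordered from root to tip.
    orient_fn: hypothetical callable returning a unit 2D direction M at a 2D point.
    """
    P = np.asarray(P, dtype=float).copy()
    n = len(P)
    w1, w2 = np.sqrt(alpha1), np.sqrt(alpha2)
    for _ in range(iters):
        P_bar = P.copy()  # assumption: anchor to the positions at the start of this iteration
        delta = np.mean(np.linalg.norm(np.diff(P_bar, axis=0), axis=1))
        rows, rhs = [], []
        # Position term: |p(s_i) - p_bar(s_i)|^2
        for i in range(n):
            r = np.zeros(n)
            r[i] = 1.0
            rows.append(r)
            rhs.append(P_bar[i])
        # Orientation term: alpha1 * |p(s_{i+1}) - p(s_i) - M(s_i) * delta|^2
        for i in range(n - 1):
            m = np.asarray(orient_fn(P_bar[i]), dtype=float)
            if np.dot(m, P_bar[i + 1] - P_bar[i]) < 0:
                m = -m  # undirected orientation: align it with the current segment
            r = np.zeros(n)
            r[i], r[i + 1] = -w1, w1
            rows.append(r)
            rhs.append(w1 * delta * m)
        # Curvature term: alpha2 * |p(s_{i-1}) - 2 p(s_i) + p(s_{i+1})|^2
        for i in range(1, n - 1):
            r = np.zeros(n)
            r[i - 1], r[i], r[i + 1] = w2, -2.0 * w2, w2
            rows.append(r)
            rhs.append(np.zeros(2))
        A, b = np.vstack(rows), np.vstack(rhs)
        # The x and y columns share one system matrix, so a single solve handles both.
        P = np.linalg.lstsq(A, b, rcond=None)[0]
    return P
```

For example, a horizontal strand placed in a uniform 45° orientation field gradually tilts upward: `deform_strand_2d(straight, lambda q: np.array([1.0, 1.0]) / np.sqrt(2.0))`. Because the system matrix only depends on the strand length, a production version could factor it once per strand instead of calling `lstsq` in every iteration.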
5.4.2 Depth Estimation
The reference photo does not provide any depth information for the hair strands, and the 3D strand shapes after the deformation step described above may not be physically plausible. In theory, in the absence of external forces such as gravity or friction, the 3D shape of each hair strand is a helix with constant curvature and torsion. In reality, however, most hair strands are influenced by gravity and hair-hair interactions. Furthermore, the curvature of a helix is not invariant under 2D projection. As a result, it is impossible to accurately recover the curvature and torsion values solely from the 2D projection of a hair strand.
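The two facts used above, that a free helix has constant curvature and torsion while its 2D projection does not have constant curvature, can be checked numerically. The sketch below evaluates the standard formulas $\kappa = \|r' \times r''\| / \|r'\|^3$ and $\tau = (r' \times r'') \cdot r''' / \|r' \times r''\|^2$ for a circular helix $r(t) = (a\cos t, a\sin t, ct)$, which should match the closed forms $\kappa = a/(a^2+c^2)$ and $\tau = c/(a^2+c^2)$, and then computes the curvature of the same helix after an orthographic projection onto the x-z plane. The helix parameters and the projection plane are purely illustrative choices.

```python
import numpy as np


def helix_derivatives(a, c, t):
    """First three derivatives of the helix r(t) = (a cos t, a sin t, c t)."""
    zero, one = np.zeros_like(t), np.ones_like(t)
    d1 = np.stack([-a * np.sin(t), a * np.cos(t), c * one], axis=-1)
    d2 = np.stack([-a * np.cos(t), -a * np.sin(t), zero], axis=-1)
    d3 = np.stack([a * np.sin(t), -a * np.cos(t), zero], axis=-1)
    return d1, d2, d3


def helix_curvature_torsion(a, c, t):
    """Curvature and torsion of the 3D helix at parameters t."""
    d1, d2, d3 = helix_derivatives(a, c, t)
    cross = np.cross(d1, d2)
    cross_norm = np.linalg.norm(cross, axis=-1)
    kappa = cross_norm / np.linalg.norm(d1, axis=-1) ** 3
    tau = np.sum(cross * d3, axis=-1) / cross_norm ** 2
    return kappa, tau


def projected_curvature_xz(a, c, t):
    """Curvature of the orthographic projection (x(t), z(t)) = (a cos t, c t)."""
    d1, d2, _ = helix_derivatives(a, c, t)
    x1, z1 = d1[..., 0], d1[..., 2]
    x2, z2 = d2[..., 0], d2[..., 2]
    return np.abs(x1 * z2 - z1 * x2) / (x1 ** 2 + z1 ** 2) ** 1.5


t = np.linspace(0.0, 4.0 * np.pi, 200)
kappa, tau = helix_curvature_torsion(1.0, 0.5, t)
k2d = projected_curvature_xz(1.0, 0.5, t)
```

With $a = 1$ and $c = 0.5$, the 3D curvature is 0.8 and the torsion 0.4 at every sample, whereas the projected curvature oscillates between roughly 0 and 4 along the same curve.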
Inspired by the fact that a hair strand can be modeled as piecewise helices [5], we estimate the depth information of a given strand $S$ by fitting its projected 2D shape $S_p$ on the image plane with a piecewise helix curve $S_f$, using the method proposed by Cherin et al. [19] (see Figure 5.6).
To determine the position of the fitted strand shape in the global coordinate system, we translate the piecewise helix curve $S_f$ so that its first sample is aligned with the original position before the fitting.
5.4.3 3D Shapes
To apply the estimated depth information to the entire hairstyle, we first group all the hair strands into 300 clusters via k-means clustering based on the root positions and strand shapes as in Wang et al. [97]. For each cluster, we apply the helix fitting algorithm described above to the center strand $S_c$ and get the fitted piecewise helix curve $S_f$. To combine the local details of $S_f$ with the overall structure of $S_c$, we align $S_f$ with $S_c$ using 40 iterations of non-rigid ICP [59] and obtain a deformed strand $S_d$.
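A minimal NumPy sketch of the clustering step follows. The feature vector of a strand is its polyline resampled to a fixed number of points by arc length and flattened, so that both the root position and the shape contribute to the distance. This feature, the deterministic farthest-point initialization, and picking the member closest to the centroid as the center strand $S_c$ are assumptions; the exact descriptor of Wang et al. [97] may differ. The text uses 300 clusters, while the usage below uses 2.

```python
import numpy as np


def resample(strand, m=16):
    """Resample a polyline of shape (k, 3) to m points spaced uniformly by arc length."""
    strand = np.asarray(strand, dtype=float)
    seg = np.linalg.norm(np.diff(strand, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])
    u = np.linspace(0.0, s[-1], m)
    return np.stack([np.interp(u, s, strand[:, d]) for d in range(3)], axis=1)


def cluster_strands(strands, k, m=16, iters=50):
    """Lloyd's k-means on resampled strands; returns labels and one center strand per cluster."""
    X = np.stack([resample(s, m).ravel() for s in strands])
    # Farthest-point initialization keeps the sketch deterministic.
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    C = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                C[j] = X[labels == j].mean(axis=0)
    # Center strand S_c of each cluster: the member closest to the centroid (-1 if empty).
    center_idx = []
    for j in range(k):
        members = np.flatnonzero(labels == j)
        if members.size == 0:
            center_idx.append(-1)
            continue
        d = ((X[members] - C[j]) ** 2).sum(axis=1)
        center_idx.append(int(members[np.argmin(d)]))
    return labels, center_idx
```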
Next, we deform the other strands in the same cluster as $S_c$ consistently. One possible way is to directly transfer the per-sample offset between $S_c$ and $S_d$ to each strand in the cluster. However, this naïve approach cannot generate a coherent deformation, since it does not take into account the global pose of the strand cluster. Consequently, we adopt a linear blend skinning approach [58] to transfer the deformation between $S_c$ and $S_d$ to the other strands. Specifically, we consider $S_c$ as the rest pose shape and $S_d$ as the deformed shape. We use each segment between two adjacent samples on both $S_c$ and $S_d$ as a bone. For each bone $B_c$ in $S_c$, we compute the twist-free material frame [4], and measure the transformation between $B_c$ and the corresponding bone $B_d$ in $S_d$ within the material frame. For each sample $s$ on the other strands in the same cluster, we search for the four closest bones of $S_c$ and compute the skinning weights based on the distances. Then we blend the transformations of those bones linearly according to the weights and apply the blended transformation to get the deformed position of $s$. See Figure 5.7 for a comparison of deformation results between a naïve offset transfer method and the linear blend skinning technique.

Figure 5.7 From left to right: (a) the deformed center strand $S_d$; (b) strands in the same cluster before deformation with the center strand $S_c$ highlighted in blue; (c) the naïve deformation result by directly transferring the per-sample offset; (d) our deformation result via linear blend skinning.
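The linear blend skinning transfer described above can be sketched as follows. To keep the example short, each bone's transformation is the minimal rotation that maps the rest bone direction on $S_c$ onto the corresponding bone direction on $S_d$, applied about the bone's first sample; this is a simplification of the twist-free material frame [4] used in the text. The inverse-distance skinning weights over the four closest bones are also an assumption, since the text only states that the weights are based on the distances.

```python
import numpy as np


def align_rotation(a, b):
    """Minimal rotation matrix taking direction a onto direction b (Rodrigues formula)."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    v, c = np.cross(a, b), np.dot(a, b)
    s = np.linalg.norm(v)
    if s < 1e-12:
        if c > 0:
            return np.eye(3)
        # Opposite directions: rotate by pi about any axis perpendicular to a.
        axis = np.cross(a, [1.0, 0.0, 0.0])
        if np.linalg.norm(axis) < 1e-6:
            axis = np.cross(a, [0.0, 1.0, 0.0])
        axis /= np.linalg.norm(axis)
        return 2.0 * np.outer(axis, axis) - np.eye(3)
    K = np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])
    return np.eye(3) + K + K @ K * ((1.0 - c) / s ** 2)


def point_segment_distance(x, a, b):
    """Distance from point x to the segment [a, b]."""
    ab = b - a
    t = np.clip(np.dot(x - a, ab) / np.dot(ab, ab), 0.0, 1.0)
    return np.linalg.norm(x - (a + t * ab))


def lbs_transfer(Sc, Sd, strand, n_bones=4, eps=1e-8):
    """Deform `strand` by blending the bone transforms that map Sc (rest) to Sd (deformed)."""
    Sc, Sd = np.asarray(Sc, dtype=float), np.asarray(Sd, dtype=float)
    n_seg = len(Sc) - 1
    # Bone transform T_i(x) = R_i (x - Sc[i]) + Sd[i].
    bones = [(align_rotation(Sc[i + 1] - Sc[i], Sd[i + 1] - Sd[i]), Sc[i], Sd[i])
             for i in range(n_seg)]
    out = []
    for x in np.asarray(strand, dtype=float):
        d = np.array([point_segment_distance(x, Sc[i], Sc[i + 1]) for i in range(n_seg)])
        nearest = np.argsort(d)[:n_bones]
        w = 1.0 / (d[nearest] + eps)
        w /= w.sum()  # normalized inverse-distance weights (assumption)
        out.append(sum(wk * (bones[k][0] @ (x - bones[k][1]) + bones[k][2])
                       for wk, k in zip(w, nearest)))
    return np.array(out)
```

As a sanity check of the design, a rigid motion of the center strand (a pure translation, or a rotation of a planar strand about an axis perpendicular to its plane) is reproduced exactly on neighboring strands, which is the property that the naïve per-sample offset transfer lacks for rotations.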
5.5 Results
5.5.1 Hairstyles.
To demonstrate the generality and effectiveness of our method, we experiment with a variety of examples from single-view photographs, as shown in Figures 5.1 and 5.11. Specifically, we can handle different overall shapes (including both long and short hair strands), different curliness (ranging from straight strands to very curly ones), and different head poses (including frontal, profile, and back views). Thanks to the hairstyle database, we can generate modeling results with complete hair structures even when the input photos are cropped and largely incomplete (Figure 5.1, the second row in Figure 5.11). The user only needs to provide a few 2D strokes using the input photo as reference. Our method then automatically generates plausible 3D hair strands of the target hairstyle.

Figure 5.8 From left to right: (a) reference photos; (b) results by [17]; (c) our results. From top to bottom, original images courtesy of Chris Zerbes and Georg Sander.

Figure 5.9 From left to right: (a) reference photo; (b) the reconstruction result of [40] based on capturing the target hairstyle from 50 different views; (c) our result from the single view of (a).
5.5.2 Comparisons.
We compare our approach with the state-of-the-art sketch-based single-view hair modeling method [17]. Their method relies on information provided by the input photo and some heuristics for depth estimation that only work well for images taken from the frontal view. As a result, their method cannot handle challenging cases where the hair or head is partially missing (e.g., Figure 5.1) or the photos are not taken from the frontal view (e.g., the last three rows in Figure 5.11). By leveraging the prior knowledge of our 3D hairstyle database, we are able to generate a reasonable and complete structure of the target hairstyle even when the reference photo is incomplete. Furthermore, our strand synthesis algorithm based on piecewise helix fitting and linear blend skinning can produce more natural and faithful 3D strand shapes than existing techniques, as shown in Figure 5.8. We also compare our approach with a state-of-the-art multi-view 3D hair capture method [40]. As shown in Figure 5.9, the overall quality of our output is comparable to theirs, even though our approach uses much less information as input (a single reference photo as in Figure 5.9 versus a dense 3D point cloud obtained from 50 images). The user strokes for Figures 5.8 and 5.9 are shown in the accompanying video.
5.5.3 Evaluations.
To assess the robustness of our method, we produce several modeling results using different sets
of user strokes on the same reference photo. Intuitively, the modeling results become more faithful as
more strokes are sketched. In Figure 5.10, we collected different sets of strokes from three users
to guide the modeling process from the same reference photo of a profile view. The large-scale
structures of the modeling results from different user strokes can be quite different, because the
retrieved example hairstyles can vary significantly. However, all three results in Figure 5.10 are
visually reasonable solutions and closely match the local details of the reference photo.
Figure 5.10 From left to right: (a) the user strokes; (b) combined hairstyles visualized from two
different views; (c) final hair strands visualized from two different views.
Figure 5.11 From left to right: the reference photo; user strokes; colored visualization of the
hairstyle combination result; combined hairstyle from two different views; final hair strands
from two different views. From top to bottom, original images courtesy of Gokhan Altintas,
Kris Krüg, Rivodoza, Maria Morri, Wallace Lan, and Eu Hairdresser.
Chapter 6
Avatar Digitization From a Single Image
For Real-Time Rendering
6.1 Overview of Our Approach
In this chapter, we present a fully automatic framework that digitizes a complete 3D head with
hair from a single unconstrained image as shown in Figure 6.1. Our system offers a practical
and consumer-friendly end-to-end solution for avatar personalization in gaming and social VR
applications. The reconstructed models include secondary components (eyes, teeth, tongue, and
gums) and provide animation-friendly blendshapes and joint-based rigs. While the generated
face is a high-quality textured mesh, we propose a versatile and efficient polygonal strips
(polystrips) representation for the hair. Polystrips are suitable for an extremely wide range of
hairstyles and textures and are compatible with existing game engines for real-time rendering.
In addition to integrating state-of-the-art advances in facial shape modeling and appearance
inference, we propose a novel single-view hair generation pipeline, based on 3D-model and
texture retrieval, shape refinement, and polystrip patching optimization. The performance of
our hairstyle retrieval is enhanced using a deep convolutional neural network for semantic
hair attribute classification. Our generated models are visually comparable to state-of-the-art
game characters designed by professional artists. For real-time settings, we demonstrate the
flexibility of polystrips in handling hairstyle variations, as opposed to conventional strand-based representations. We further show the effectiveness of our approach on a large number of images taken in the wild, and how compelling avatars can be easily created by anyone.

[Figure 6.1 panels, shown for three subjects: input image; face mesh and hair polystrips; 3D avatar]
Figure 6.1 We introduce an end-to-end framework for modeling a complete 3D avatar from a single input image for real-time rendering. We infer fully rigged textured face models and polygonal strips for hair. Our flexible and efficient mesh-based hair representation is suitable for a wide range of hairstyles and can be readily integrated into existing real-time game engines. All of the illustrations are rendered in real time in Unity. President Trump’s picture is obtained from whitehouse.gov and Kim Jong-un’s photograph was published in the Rodong Sinmun. The other celebrity pictures are used with permission from Getty Images.

[Figure 6.2 pipeline stages: input image; faces and hair segmentation; face modeling; face texture reconstruction; facial rigging (blendshape, joint-based, secondary components); hairstyle digitization; hair appearance matching (shader, texture, alpha mask, bump map, color); real-time 3D avatar rendering]
Figure 6.2 Our single-view avatar creation framework is based on a pipeline that combines complete face digitization and hair polystrip digitization, capturing both geometry and appearance. Original image courtesy of Getty Images.
Our end-to-end pipeline for face and hair digitization is illustrated in Figure 6.2. An initial pre-processing step computes pixel-level segmentation of the face and hair regions. We then produce a fully rigged avatar based on textured meshes and hair polystrips from this image. We decouple the digitization of face and hair since they span entirely different spaces for shape, appearance, and deformation. While the full head topology of the face is anatomically consistent between subjects and expressions, the mesh of the hair model is unique for each person.
6.1.1 Image Pre-Processing.
Segmenting the face and hair regions of an input image improves the accuracy of the 3D model fitting process, as only relevant pixels are used as constraints. It also identifies occluded areas that need to be completed during texture reconstruction, especially when the face is covered by hair. For the hair modeling step, the silhouette of the segmented hair region provides important matching cues.
We adopt the real-time and automatic semantic segmentation technique of [86], which uses a
two-stream deconvolution network to predict face and hair regions. This technique produces
accurate and robust pixel-level segmentations for unconstrained photographs. While the original
implementation is designed to process face regions, we repurpose the same convolutional neural
network to segment hair. In contrast to the image pre-processing step of [13], ours is fully
automatic.
To train our convolutional neural network, we collected 9269 images from the public LFW face dataset [44] and produced the corresponding binary segmentation masks for both faces and hair via Amazon Mechanical Turk (AMT), as illustrated in Figure 6.3. We detect the face in each image using the popular Viola-Jones face detector [95] and normalize its position and scale to a 128 × 128 image. To avoid overfitting, we augment the training dataset with random Gaussian-distributed transformation perturbations and produce 83421 images in total. The standard deviations are 10° for rotations, 5 pixels for translations, and 0.1 for scale, and the means are 0, 0, and 1.0, respectively. We further use a learning rate of 0.1, a momentum of 0.9, and a weight decay of 0.0005 for the training. The optimization uses 50,000 stochastic gradient descent (SGD) iterations, which take roughly 10 hours on a machine with 16 GB RAM and an NVIDIA GTX Titan X GPU. We refer to the work of [86] for implementation details.
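The augmentation parameters stated above can be sampled as in the sketch below. It only builds the 2×3 affine matrix of one random similarity transform about the center of the 128 × 128 crop; applying the same matrix to the image and its segmentation mask (for example with an image-warping routine) is not shown, and composing rotation, scale, and translation in this particular order is an assumption.

```python
import numpy as np


def sample_augmentation(rng, size=128):
    """Return a random 2x3 similarity transform for a size x size crop and its parameters."""
    angle_deg = rng.normal(0.0, 10.0)        # rotation: mean 0, std 10 degrees
    tx, ty = rng.normal(0.0, 5.0, size=2)    # translation: mean 0, std 5 pixels
    scale = rng.normal(1.0, 0.1)             # scale: mean 1.0, std 0.1
    theta = np.deg2rad(angle_deg)
    cs, sn = scale * np.cos(theta), scale * np.sin(theta)
    center = np.array([(size - 1) / 2.0] * 2)
    linear = np.array([[cs, -sn], [sn, cs]])
    # Rotate and scale about the crop center, then translate.
    offset = center - linear @ center + np.array([tx, ty])
    A = np.hstack([linear, offset[:, None]])
    return A, {"angle_deg": angle_deg, "tx": tx, "ty": ty, "scale": scale}


rng = np.random.default_rng(0)
A, params = sample_augmentation(rng)
```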
Once trained, the network outputs a multi-class probability map (for face and hair) from an
arbitrary input image. A post-hoc inference algorithm based on a dense conditional random field
(CRF) [57] is then used to extract the resulting binary mask. Successful results and failure cases
are presented in Figure 6.3.
Figure 6.3 Hair segmentation training data, successful results, and failure cases.
6.1.2 Face Digitization.
We first fit a PCA-based linear face model for shape and appearance to the segmented face region. Next, a variant of the efficient pixel-level analysis-by-synthesis optimization method of [92] is adopted to solve for the PCA coefficients of the 3D face model and an initial low-frequency albedo map. We use our own artist-created head topology (front and back of the head) with identity shapes transferred from [8] and expressions from [12]. A visibility constraint is incorporated into the model fitting process to improve the handling of occlusions and non-visible regions. A PCA-based appearance model is constructed for the textures of the full head, using artist-painted skin textures in the missing regions of the original data samples. We then infer high-frequency details for the frontal face regions, even if they are not visible in the capture, using a feature correlation analysis approach based on deep neural networks [87]. Finally, we eliminate the expression coefficients of our linear face model to neutralize the face. The resulting model is then translated and scaled to fit the eyeballs, using an average adult pupillary distance of 66 mm. We then translate and scale the teeth and gums to fit pre-selected vertices of the mouth region. We ensure that these secondary components do not intersect the face using a penetration test for all the FACS expressions of our custom animation rig.
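The eye-based scale normalization amounts to a uniform similarity transform. In the sketch below, the two pupil vertex indices are hypothetical pre-selected vertices, and moving the midpoint between the eyes to the origin (where a standardized pair of eyeball templates would sit) is an assumption; only the 66 mm distance comes from the text.

```python
import numpy as np

AVG_PUPILLARY_DISTANCE_MM = 66.0


def normalize_head_scale(vertices, left_pupil, right_pupil, ipd=AVG_PUPILLARY_DISTANCE_MM):
    """Scale and translate a head mesh so that its pupils are `ipd` apart (sketch).

    vertices: (n, 3) array; left_pupil / right_pupil: hypothetical vertex indices.
    Returns vertices in millimeters with the midpoint between the eyes at the origin.
    """
    V = np.asarray(vertices, dtype=float)
    l, r = V[left_pupil], V[right_pupil]
    s = ipd / np.linalg.norm(r - l)   # uniform scale factor
    mid = 0.5 * (l + r)
    return (V - mid) * s
```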
6.1.3 Hair Digitization.
Our hair digitization pipeline produces a hair mesh model and infers appearance properties for the hair shader. We first use a state-of-the-art deep convolutional neural network based on residual learning [38] to extract semantic hair attributes such as hair length, level of baldness, and the existence of hairlines and fringes. These hair attributes are compared with those of a large hairstyle database containing artist-created hair polystrip models, and we form a reduced hairstyle dataset that only contains relevant models with compatible hair attributes. We then search for the closest hairstyle to our input image based on the silhouette of its segmentation and the orientation field of the hair strands. As the retrieved hairstyle may not match the input exactly, we further perform a mesh fitting step to deform the retrieved hairstyle to the input image using the silhouette and the input orientation field. We incorporate collision handling between the deformed hair and the personalized face model to avoid hair meshes intersecting the face mesh. The classification network for hair attribute classification also identifies hair appearance properties for proper rendering, such as hair color, texture and alpha maps, and various shader parameters. Polystrip duplication is necessary, since the use of alpha masks for the hair texture can cause a loss of scalp coverage during rendering. Consequently, we iteratively identify the incomplete hair regions using a multi-view visibility map and patch them with interpolated hair strips. The hair polystrips are alpha blended using an efficient rendering algorithm based on order-independent transparency with depth peeling [2].
6.1.4 Rigging and Animation.
Since our linear face model is expressed by a combination of identity and expression coefficients [87], we can easily obtain the neutral pose. Using an example-based approach, we can compute the face input’s corresponding FACS-based expressions (including high-level controls) via transfer from a generic face model [61]. Our generic face is also equipped with skeleton joints based on linear blend skinning (LBS) [75]. The face and secondary components (eyes, teeth, tongue, and gums) also possess blendshapes. Eye colors (black, brown, blue, and green) are detected using the same deep convolutional neural network used for hair attribute classification [38], and the appropriate texture is used. Our model consists of 71 blendshapes and 16 joints in total. Our face rig also abstracts the low-level deformation parameters with a smaller and more intuitive set of high-level controls as well as manipulation handles. We implemented our rig in both the animation tool Autodesk Maya and the real-time game engine Unity. We can rig our hair model directly with the skeleton joints of the head in order to add a minimal amount of dynamics for simple head rotations. For more complex hair dynamics, we also demonstrate a simple real-time physical simulation of our polystrip hair representation using mass-spring models with rigid body chains and hair-head collisions [88].
6.2 Face Digitization
We first build a fully textured head model using a multi-linear PCA face model. Given a single unconstrained image and the corresponding segmentation mask, we compute a shape $V$, a low-frequency facial albedo map $I$, a rigid head pose $(R, t)$, a perspective transformation $\Pi_P(V)$ with the camera intrinsic matrix $P$, and illumination $L$, together with high-frequency textures from the visible skin region. Since the extracted high-frequency texture is incomplete from a single view, we infer the complete texture map using a facial appearance inference method based on deep neural networks [87].
6.2.1 3D Head Modeling.
[Figure 6.4 panels: input image; without visibility constraints; without visibility constraints (uv map); with visibility constraints; with visibility constraints (uv map)]
Figure 6.4 Our facial modeling pipeline with visibility constraints produces plausible facial
textures when there are occlusions such as hair.
To obtain the unknown parameters $\chi = \{V, I, R, t, P, L\}$, we adopt the pipeline of [92], which is based on morphable face models [8] extended with a PCA-based facial expression model and an efficient optimization based on pixel color constraints. We further incorporate pixel-level visibility constraints using our segmentation mask obtained with the method of [86].
We use a multi-linear PCA model to represent the low-frequency facial albedo $I$ and the facial geometry $V$ with $n = 10{,}822$ vertices and 21,510 faces:
$$V(\alpha_{id}, \alpha_{exp}) = \bar{V} + A_{id}\,\alpha_{id} + A_{exp}\,\alpha_{exp}, \qquad I(\alpha_{al}) = \bar{I} + A_{al}\,\alpha_{al}.$$
Here $A_{id} \in \mathbb{R}^{3n \times 40}$, $A_{exp} \in \mathbb{R}^{3n \times 40}$, and $A_{al} \in \mathbb{R}^{3n \times 40}$ are the bases of a multivariate normal distribution for identity, expression, and albedo, with the corresponding means $\bar{V} = \bar{V}_{id} + \bar{V}_{exp} \in \mathbb{R}^{3n}$ and $\bar{I} \in \mathbb{R}^{3n}$, and the corresponding standard deviations $\sigma_{id} \in \mathbb{R}^{40}$, $\sigma_{exp} \in \mathbb{R}^{40}$, and $\sigma_{al} \in \mathbb{R}^{40}$. $A_{id}$, $A_{al}$, $\bar{V}$, and $\bar{I}$ are based on the Basel Face Model database [76], and $A_{exp}$ is obtained from FaceWarehouse [12]. We assume Lambertian surface reflectance and approximate the illumination using second-order Spherical Harmonics (SH).
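As a concrete reading of the model above, the sketch below evaluates $V(\alpha_{id}, \alpha_{exp})$ and $I(\alpha_{al})$ with NumPy. The flattened $(x_0, y_0, z_0, x_1, \ldots)$ vertex layout and the random toy bases are assumptions for illustration only; the real bases come from the Basel Face Model and FaceWarehouse. Setting $\alpha_{exp} = 0$ gives the neutralized face mentioned in the overview.

```python
import numpy as np


def face_geometry(V_mean, A_id, A_exp, a_id, a_exp):
    """V = V_mean + A_id a_id + A_exp a_exp, returned as an (n, 3) vertex array."""
    return (V_mean + A_id @ a_id + A_exp @ a_exp).reshape(-1, 3)


def face_albedo(I_mean, A_al, a_al):
    """I = I_mean + A_al a_al, returned as per-vertex RGB values of shape (n, 3)."""
    return (I_mean + A_al @ a_al).reshape(-1, 3)


def sample_coefficients(rng, sigma):
    """Draw alpha_i ~ N(0, sigma_i^2) independently for every basis vector."""
    return rng.normal(0.0, 1.0, size=sigma.shape) * sigma


rng = np.random.default_rng(0)
n, d = 5, 40                       # toy vertex count; the text uses n = 10,822
V_mean = rng.normal(size=3 * n)
A_id, A_exp = rng.normal(size=(3 * n, d)), rng.normal(size=(3 * n, d))
a_id = sample_coefficients(rng, np.full(d, 0.5))
a_exp = sample_coefficients(rng, np.full(d, 0.5))
V = face_geometry(V_mean, A_id, A_exp, a_id, a_exp)
V_neutral = face_geometry(V_mean, A_id, A_exp, a_id, np.zeros(d))  # expression removed
```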
First, we detect 2D facial landmarks $f_i \in F$ using the method of Kazemi et al. [54] in order to initialize the face fitting by minimizing the following energy:
$$E_{lan}(\chi) = \frac{1}{|F|} \sum_{f_i \in F} \left\| f_i - \Pi_P(R V_i + t) \right\|_2^2.$$
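The landmark energy can be sketched as below, assuming that $\Pi_P$ is a standard pinhole projection with a 3×3 intrinsic matrix $P$ (multiply, then divide by depth) and that the model vertices $V_i$ corresponding to the detected landmarks have already been selected; that correspondence step is not shown.

```python
import numpy as np


def project(P, X):
    """Pinhole projection of camera-space points X (m, 3) with intrinsics P (3, 3)."""
    uvw = X @ P.T
    return uvw[:, :2] / uvw[:, 2:3]


def landmark_energy(f, V_lm, R, t, P):
    """E_lan = 1/|F| * sum_i || f_i - Pi_P(R V_i + t) ||_2^2.

    f: (m, 2) detected 2D landmarks; V_lm: (m, 3) corresponding model vertices.
    """
    proj = project(P, V_lm @ R.T + t)
    return float(np.mean(np.sum((f - proj) ** 2, axis=1)))
```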
We further refine the shape and optimize the low-frequency albedo, as well as the illumination, by minimizing the photometric difference between the input image and a synthetic face rendering. The objective function is defined as:
$$E(\chi) = w_c E_c(\chi) + w_{lan} E_{lan}(\chi) + w_{reg} E_{reg}(\chi), \qquad (6.1)$$
with energy term weights $w_c = 1$, $w_{lan} = 10$, and $w_{reg} = 2.5 \times 10^{-5}$ for the photo-consistency term $E_c$, the landmark term $E_{lan}$, and the regularization term $E_{reg}$. Following [87], we also ensure that the photo-consistency term $E_c$ is only evaluated for visible face regions:
$$E_c(\chi) = \frac{1}{|M|} \sum_{p \in M} \left\| C_{input}(p) - C_{synth}(p) \right\|^2,$$
where $C_{input}$ is the input image, $C_{synth}$ the rendered image, and $p \in M$ a visible pixel given by the facial segmentation mask. The regularization term $E_{reg}$ is defined as:
$$E_{reg}(\chi) = \sum_{i=1}^{40} \left[ \left(\frac{\alpha_{id,i}}{\sigma_{id,i}}\right)^2 + \left(\frac{\alpha_{al,i}}{\sigma_{al,i}}\right)^2 \right] + \sum_{i=1}^{40} \left(\frac{\alpha_{exp,i}}{\sigma_{exp,i}}\right)^2.$$
This term encourages the coefficients of the multi-linear model to conform to a normal distribution and reduces the chance of converging to a local minimum. We use an iteratively reweighted Gauss-Newton method to minimize the objective function (6.1) using three levels of image pyramids. In our experiments, 30, 10, and 3 Gauss-Newton steps were sufficient for convergence from the coarsest level to the finest one. After this optimization, a high-frequency albedo texture is obtained by factoring out the shading component, consisting of the illumination $L$ and the surface normal, from the input image. The resulting texture is stored as a uv texture map and used for the high-fidelity texture inference.
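Putting the terms of Equation 6.1 together, the sketch below evaluates the masked photo-consistency term, the regularizer, and their weighted sum with the weights given above. It is only an energy evaluation: the rendering that produces $C_{synth}$ and the iteratively reweighted Gauss-Newton solver are not shown, and the array shapes (H × W × 3 images, an H × W boolean mask) are assumptions.

```python
import numpy as np


def photo_term(C_input, C_synth, mask):
    """E_c: mean squared RGB difference over the visible face pixels in `mask`."""
    diff = (C_input - C_synth)[mask]        # (|M|, 3) residuals of the visible pixels
    return float(np.sum(diff ** 2) / max(int(mask.sum()), 1))


def reg_term(a_id, s_id, a_al, s_al, a_exp, s_exp):
    """E_reg: squared coefficients normalized by their standard deviations."""
    return float(np.sum((a_id / s_id) ** 2) + np.sum((a_al / s_al) ** 2)
                 + np.sum((a_exp / s_exp) ** 2))


def total_energy(E_c, E_lan, E_reg, w_c=1.0, w_lan=10.0, w_reg=2.5e-5):
    """Weighted objective of Eq. 6.1."""
    return w_c * E_c + w_lan * E_lan + w_reg * E_reg
```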
6.2.2 Face Texture Reconstruction.
After obtaining the low-frequency albedo map and a partially visible fine-scale texture, we can infer a complete high-frequency texture map, as shown in Figure 6.5, using a deep learning-based transfer technique and a high-resolution face database [66]. The technique has been recently introduced in [87] and is based on the concept of feature correlation analysis using convolutional neural networks [35]. Given an input image $I$ and a filter response $F^l(I)$ on the layer $l$ of a convolutional neural network, the feature correlation can be represented by a normalized Gramian matrix $G^l(I)$:
$$G^l(I) = \frac{1}{M_l} F^l(I) \left(F^l(I)\right)^T,$$
where $M_l$ is the number of spatial positions in the feature map of layer $l$.
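For readers less familiar with this notation, the Gramian of a layer response can be computed as below, treating $F^l(I)$ as a $C \times H \times W$ feature map reshaped to $C \times M_l$ with $M_l = HW$. The masked variant $G_M$ that appears in Equation 6.2 is sketched by keeping only the visible spatial positions and normalizing by their count; that normalization is an assumption. Extracting the feature maps from VGG-19 is not shown.

```python
import numpy as np


def gram(F):
    """Normalized Gramian (1/M) F F^T of a feature map F with shape (C, H, W)."""
    Fm = F.reshape(F.shape[0], -1)          # (C, M) with M = H * W
    return Fm @ Fm.T / Fm.shape[1]


def masked_gram(F, mask):
    """Gramian computed only from the spatial positions where mask (H, W) is True."""
    Fm = F[:, mask]                         # (C, number of visible positions)
    return Fm @ Fm.T / max(Fm.shape[1], 1)
```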
Saito et al. [87] have found that high-quality facial details (e.g., pores, moles, etc.) can be captured and synthesized effectively using Gramian matrices. Let $I_0$ be the low-frequency texture map and $I_h$ be the high-frequency albedo map with the corresponding visibility mask $M_h$. We aim to represent the desired feature correlation $G_h$ as a convex combination of $G(I_k)$, where $I_1, \ldots, I_K$ are the high-resolution images in the texture database:
$$G_h^l = \sum_{k=1}^{K} w_k\, G^l(I_k), \quad \forall l \quad \text{s.t.} \quad \sum_{k=1}^{K} w_k = 1.$$
We compute optimal blending weights $\{w_k\}$ by minimizing the difference between the feature correlation of the partial high-frequency texture $I_h$ and the convex combination of the feature correlations in the database under the same visibility. This is formulated as the following problem:
$$\min_{w} \sum_l \left\| \sum_k w_k\, G_M^l(I_k, M_h) - G_M^l(I_h, M_h) \right\|_F \quad \text{s.t.} \quad \sum_{k=1}^{K} w_k = 1, \quad w_k \ge 0 \;\; \forall k \in \{1, \ldots, K\}, \qquad (6.2)$$
where $G_M(I, M)$ is the Gramian matrix computed from only the masked region $M$. This allows us to transfer multi-scale features of partially visible skin details to the complete texture. We refer to [87] for more details.
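One simple way to solve a problem of the form of Equation 6.2 is projected gradient descent onto the probability simplex. The sketch below does this with NumPy, with two stated deviations from the text: it minimizes the squared Frobenius norm (which turns the objective into a quadratic in $w$), and it takes precomputed masked Gramians as input. It is not necessarily the solver used in [87]; the simplex projection is the standard sort-based Euclidean projection.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    j = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - (css - 1.0) / j > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)


def blend_weights(G_db, G_target, iters=500):
    """Convex weights minimizing sum_l || sum_k w_k G_db[l][k] - G_target[l] ||_F^2.

    G_db: list over layers of arrays with shape (K, C_l, C_l).
    G_target: list over layers of arrays with shape (C_l, C_l).
    """
    K = G_db[0].shape[0]
    # Objective = w^T Q w - 2 b^T w + const, with Frobenius inner products summed over layers.
    Q = sum(np.einsum("jab,kab->jk", G, G) for G in G_db)
    b = sum(np.einsum("kab,ab->k", G, T) for G, T in zip(G_db, G_target))
    step = 1.0 / (2.0 * np.linalg.eigvalsh(Q).max() + 1e-12)  # 1 / Lipschitz constant
    w = np.full(K, 1.0 / K)
    for _ in range(iters):
        w = project_to_simplex(w - step * 2.0 * (Q @ w - b))
    return w
```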
Once the desired $G_h$ is computed, we update the albedo map $I$ so that the resulting correlation $G(I)$ is similar to $G_h$, while preserving the low-frequency spatial information $F^l(I_0)$ (i.e., the positions of the eyebrows, mouth, nose, and eyes):
$$\min_I \sum_{l \in L_F} \left\| F^l(I) - F^l(I_0) \right\|_F^2 + \alpha \sum_{l \in L_G} \left\| G^l(I) - G_h^l \right\|_F^2, \qquad (6.3)$$
where $L_G$ is a set of high-frequency preserving layers and $L_F$ a set of low-frequency preserving layers in VGG-19 [89]. A weight $\alpha$ balances the influence of high and low frequencies, and $\alpha = 2000$ is used for all our experiments. Following Gatys et al. [35], we solve Equation 6.3 using an L-BFGS solver. Since only frontal faces are available in the database, we can only enhance frontal face regions. To obtain a complete texture, we combine the results with the PCA-based low-frequency textures of the back of the head using Poisson blending [77].
6.2.3 Secondary Components.
To enhance the realism of the reconstructed avatar, we insert template models for eyes, teeth, gums, and tongue into the reconstructed head model. The reconstructed face model is rescaled and translated to fit a standardized pair of eyeballs so that each avatar is aligned, which avoids scale ambiguity during the single-view reconstruction. The mouth-related template models are aligned based on pre-selected vertices on the facial template model. After the initial alignment, we test for intersections between the face and the secondary components for each activated blendshape expression. The secondary models for the mouth region are then translated by the minimal offset where no intersection is present. The eye color texture (black, brown, green, or blue) is selected using a convolutional neural network for semantic attribute inference similar to the one used for hair color classification. The input to this network is a cropped image of the face region based on the bounding box around the 2D landmarks from [54], where non-face regions are set to black and the image is centered between the two eyes.

[Figure 6.5 panels: input image; input uv map; inferred uv map; inferred textured face model]
Figure 6.5 We produce a complete and high-fidelity texture map from a partially visible and low-resolution subject using a deep learning-based inference technique. Original image courtesy of Getty Images.
6.3 Hair Digitization
6.3.1 Hairstyle Database.
Starting from the USC-HairSalon database for 3D hairstyles, introduced in [41], and 89 additional artist-created models, we align all the hairstyle samples to the PCA mean head model $\bar{V}$ used in Section 6.2. Inspired by [16], we also increase the number of samples in our database using a combinatorial process, which is necessary to span a sufficiently large variation of hairstyles. While the online model generation approach of [41] is less memory-consuming, it requires some level of user interaction.
To extend the number of models, we first group each sample of the USC-HairSalon database into 5 clusters via k-means clustering using the root positions and the strand shapes as in [97]. Next, for every pair of hairstyles, we randomly pick a pair of strands among the cluster centroids and construct a new hairstyle using these two sampled strands as a guide, using the volumetric combination method introduced in [41]. We further augment our database by flipping each hairstyle w.r.t. the x-axis plane, forming a total of 100,000 hairstyles.

[Figure 6.6 panel labels: input image; hair attribute extraction; hair attributes (e.g., short, curly, not spiky); hair category matching; hairstyle database; reduced dataset; segmentation and orientation; hairstyle retrieval; closest hairstyle; hair mesh fitting; fitted hairstyle; polystrip patching optimization; reconstructed hair; alpha mask]
Figure 6.6 Our hair mesh digitization pipeline. Original image courtesy of Getty Images.
For each hair model, we consider each hair strand as a chain of particles, so that the set of all particles forms the outer surface of the entire hair. This surface can be constructed using a signed distance field obtained from volumetric point samples [110]. Using the surface normals of this mesh, we group close and nearly parallel hair strands into a hair polystrip, which is a parametric piecewise linear patch. This thin surface structure can carry realistic-looking textures that provide additional variations of hair, such as curls, crossings, or thinner tips. Additionally, the transparency of the texture allows us to see through overlapping polystrips and provides an efficient way to achieve volumetric hair renderings.
Luo et al. [65] proposed a method to group short hair segments into a ribbon structure. Adopting a similar approach, we start from the longest hair strand in the hairstyle as the center strand of the polystrip. By associating the normal of each vertex on the strand with the closest point on the hair surface, we can expand the center strand on both sides, along the binormal and its opposite direction. We compute the coverage of all hair strands by the current polystrip, and continue to expand the polystrip until no more strands are covered. Once a polystrip is generated, we remove all the covered strands from the hairstyle and restart the process from the longest strand in the remaining subset. Finally, we obtain a complete hair polystrip model once all the hair strands are removed from the hairstyle. We refer to [65] for more details.
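The greedy structure of this grouping can be sketched as follows. This is a deliberate simplification: instead of expanding a ribbon along the binormal using the hair-surface normals, a strand counts as covered when every one of its samples lies within a fixed, hypothetical `half_width` of some sample of the center strand. Only the loop structure (longest remaining strand first, remove the covered strands, repeat until none remain) follows the description above.

```python
import numpy as np


def strand_length(S):
    """Arc length of a polyline S with shape (k, 3)."""
    return float(np.sum(np.linalg.norm(np.diff(S, axis=0), axis=1)))


def max_distance_to(S, C):
    """Largest distance from a sample of S to its nearest sample of C."""
    d = np.linalg.norm(S[:, None, :] - C[None, :, :], axis=-1)
    return float(d.min(axis=1).max())


def group_into_strips(strands, half_width):
    """Greedily group strands; returns (center index, covered indices) per strip."""
    remaining = list(range(len(strands)))
    strips = []
    while remaining:
        center = max(remaining, key=lambda i: strand_length(strands[i]))
        covered = [i for i in remaining
                   if i == center or max_distance_to(strands[i], strands[center]) <= half_width]
        strips.append((center, covered))
        remaining = [i for i in remaining if i not in covered]
    return strips
```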
6.3.2 Hair Attribute Classification.
We use 40K images from the CelebA dataset [63] with various hairstyles and collect their hair
attributes using AMT (see Table 6.1 for the list of hair attributes). Similarly, we manually label
all the hair models in our database using high level semantic attributes. We also actively ensure
that we have roughly the same quantity of images for each attribute by resampling the training
data.
These annotations are then fed into a state-of-the-art classification network, ResNet [38], to train multiple classifiers that predict each hair attribute given an input image. We use the 50-layer ResNet pre-trained on ImageNet [26] and fine-tune it on our training data with a learning rate of $10^{-4}$, a weight decay of $10^{-4}$, a momentum of 0.9, a batch size of 32, and 90 epochs using stochastic gradient descent. The images are augmented for training based on the perturbations suggested by He et al. [38] (variations in cropping, brightness, contrast, and saturation). At test time, input images are resized so that the maximum width or height is 256, center-cropped to 224 × 224, and fed into the trained classifiers. Each classifier returns a normalized $n$-dimensional vector, where $n = 2$ for binary attributes and $n = m$ for $m$-class attributes. The predictions of all classifiers are then concatenated into a multi-dimensional descriptor. Nearest neighbor search is then performed to find the $k$ closest matching hairstyles with the smallest Euclidean distance in the descriptor space. If the classifier detects a bald head, the subsequent hairstyle matching process is skipped.
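A minimal sketch of this descriptor matching follows: each classifier's output is normalized and concatenated, and the database hairstyles are ranked by Euclidean distance. Encoding each database model's manual labels as one-hot vectors in the same layout is an assumption, and the two toy attributes in the usage are hypothetical.

```python
import numpy as np


def attribute_descriptor(predictions):
    """Concatenate per-classifier probability vectors, each normalized to sum to 1."""
    return np.concatenate([np.asarray(p, dtype=float) / np.sum(p) for p in predictions])


def k_nearest(query, database, k):
    """Indices of the k database descriptors closest to `query` in Euclidean distance."""
    d = np.linalg.norm(np.asarray(database, dtype=float) - query[None, :], axis=1)
    return np.argsort(d, kind="stable")[:k]


# Hypothetical binary attribute and 3-class attribute for one input image.
q = attribute_descriptor([[0.9, 0.1], [0.2, 0.7, 0.1]])
db = np.array([[1, 0, 0, 1, 0], [0, 1, 1, 0, 0], [1, 0, 1, 0, 0]])  # one-hot labels
```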
6.3.3 Hairstyle Matching.
After obtaining a reduced hair model subset based on the semantic attributes, we compare the
segmentation mask and hair orientations at the pixel level using pre-rendered thumbnails to
retrieve the most similar hairstyle [16]. Following Chai et al. [16], we organize our database
as thumbnails and adopt the binary edge-based descriptor from [111] to increase matching
efficiency. For each hairstyle in the database, we pre-render the mask and the orientation map as thumbnails from 35 different views: 7 yaw angles uniformly sampled in $[-\pi/4, \pi/4]$ and 5 pitch angles uniformly sampled in $[-\pi/4, \pi/4]$. If the hair segmentation mask has multiple connected
components due to occlusion or if the hair is partially cropped, then the segmentation descriptor
may not be reliable; in this case, we find the most similar hairstyle using the classifiers.
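The 35 thumbnail viewpoints form a 7 × 5 grid of yaw and pitch angles. A small sketch that enumerates them, assuming the interval endpoints are included in the uniform sampling (the text does not say), is:

```python
import math


def thumbnail_views(n_yaw=7, n_pitch=5, limit=math.pi / 4):
    """Return (yaw, pitch) pairs on a uniform grid over [-limit, limit]^2.

    Endpoints are included; this is an assumption, since the text only says
    the angles are "uniformly sampled" in [-pi/4, pi/4].
    """
    def uniform(n):
        return [-limit + 2.0 * limit * i / (n - 1) for i in range(n)]

    return [(yaw, pitch) for yaw in uniform(n_yaw) for pitch in uniform(n_pitch)]


views = thumbnail_views()
```

Under this assumption, neighboring yaw angles are π/12 apart and neighboring pitch angles are π/8 apart.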
6.3.4 Hair Mesh Fitting.
To match the retrieved model with the silhouette and orientation of the input, we extend the hair fitting algorithm for strands [16, 41] to polystrip meshes. First, we perform spatial deformation to fit the hair model to the personalized head model, using an as-rigid-as-possible graph-based deformation model [59].

[Figure 6.7 panels: input image, retrieved hairstyle, deformed hairstyle, collision handling]
Figure 6.7 Our hair mesh fitting pipeline.

We represent the displacement of each vertex on the hair mesh as a linear combination of the displacements of its k-nearest vertices $\mathcal{N}_i$ on the head mesh, using the following inversely weighted Gaussian approximation:

$$dp_i = \sum_{j \in \mathcal{N}_i} \left(1 + \|p_i - q_j\|_2 + \|p_i - q_j\|_2^2\right)^{-1} dq_j,$$
where $p$ and $q$ are vertices on the hair mesh and the mean head mesh, respectively. This allows the hair model to follow the head deformation without causing intersections. Once the scalp and the hair mesh are aligned, we compute a smooth warping function $W(\cdot)$ that maps vertices on the 3D model's silhouette to the closest points on the input's 2D silhouette from the camera angle, and deform each polystrip according to the as-rigid-as-possible warping function presented in [59]. Then, we deform each polystrip to follow the input 2D orientation map as described in [16, 41]. Possible intersections between the head and the hair model caused by this deformation are resolved using simple collision handling via force repulsion [65].
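The head-to-hair displacement transfer can be illustrated with a short NumPy sketch. It implements the weighting formula literally as written above (the weights are not renormalized). The value of k and the function name are hypothetical, and a brute-force distance matrix stands in for whatever spatial search structure an actual implementation would use.

```python
import numpy as np


def propagate_head_displacement(hair_pts, head_pts, head_disp, k=8):
    """dp_i = sum_{j in N_i} (1 + d_ij + d_ij^2)^(-1) dq_j, with d_ij = ||p_i - q_j||_2.

    hair_pts: (n, 3) hair vertices p_i; head_pts: (m, 3) head vertices q_j;
    head_disp: (m, 3) head displacements dq_j. k is a hypothetical default.
    """
    hair_pts = np.asarray(hair_pts, float)
    head_pts = np.asarray(head_pts, float)
    head_disp = np.asarray(head_disp, float)
    # Brute-force pairwise distances, shape (n, m).
    d = np.linalg.norm(hair_pts[:, None, :] - head_pts[None, :, :], axis=2)
    nn = np.argsort(d, axis=1)[:, :k]          # indices of the k nearest head vertices
    d_nn = np.take_along_axis(d, nn, axis=1)   # their distances, shape (n, k)
    w = 1.0 / (1.0 + d_nn + d_nn ** 2)         # inverse weights from the formula
    # Weighted sum of the neighbors' displacements, shape (n, 3).
    return np.einsum("ik,ikc->ic", w, head_disp[nn])
```

Nearby head vertices dominate the sum because the weight decays quadratically with distance, so each hair vertex mostly inherits the motion of the scalp region beneath it.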
6.3.5 Polystrip Patching Optimization.
With the benefit of having a low computational overhead, a polystrip-based rendering with a
bump map and an alpha mask produces locally plausible hair appearance for a wide range of
hairstyles. However, such rendering is prone to a lack of scalp coverage, especially for short
hairstyles. We propose an iterative optimization method to ensure scalp coverage via patching
with minimum increase in the number of triangles.
[Figure 6.8 panels: input hair model, multi-view scalp visibility map, iteration 1, final result]
Figure 6.8 Our iterative optimization algorithm for polystrip patching.
We measure the coverage by computing the absolute difference between the alpha maps rendered in model view space with and without hair transparency from multiple viewpoints (see Figure 6.8). Regions with high error expose the scalp surface and need to be covered by additional hair meshes. Without transparency, all polystrips are rendered with an alpha value of 1.0. When a hair alpha mask is assigned by the hairstyle classification, the polystrips are rendered via order-independent transparency (OIT), resulting in alpha values in the range $[0, 1]$. First, we convert the error map into a binary map by thresholding the error at 0.5, and apply blob detection to the binary map. A new polystrip is then placed to cover the blob with the highest error.
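The thresholding and blob-detection step can be sketched for a single view as below. This is an illustration under stated assumptions: connected components use 4-connectivity, and a blob's "error" is taken to be the sum of its per-pixel errors, since the text does not specify either choice. The function name is hypothetical.

```python
from collections import deque

import numpy as np


def worst_uncovered_blob(alpha_opaque, alpha_oit, threshold=0.5):
    """Threshold |alpha_opaque - alpha_oit| and return the 4-connected blob
    with the largest summed error as (list of (row, col) pixels, summed error).

    Returns (None, 0.0) if no pixel exceeds the threshold.
    """
    error = np.abs(np.asarray(alpha_opaque, float) - np.asarray(alpha_oit, float))
    mask = error > threshold                  # binary map of exposed-scalp pixels
    labels = np.zeros(mask.shape, dtype=int)  # 0 means "not yet assigned to a blob"
    rows, cols = mask.shape
    best, best_score, n_blobs = None, 0.0, 0
    for r0, c0 in zip(*np.nonzero(mask)):
        if labels[r0, c0]:
            continue  # pixel already belongs to a blob
        n_blobs += 1
        labels[r0, c0] = n_blobs
        queue, pixels = deque([(int(r0), int(c0))]), []
        # Breadth-first flood fill over 4-connected neighbors.
        while queue:
            r, c = queue.popleft()
            pixels.append((r, c))
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols and mask[rr, cc] and not labels[rr, cc]:
                    labels[rr, cc] = n_blobs
                    queue.append((rr, cc))
        score = float(sum(error[p] for p in pixels))
        if score > best_score:
            best, best_score = pixels, score
    return best, best_score
```

In the multi-view setting described above, this would be run per viewpoint and the highest-scoring blob across views would be patched first.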
We find the k closest polystrips to the region with the highest error and select two polystrips within this set whose average produces a new strip that covers this region. We use k = 6 for all our examples. The two polystrips are resampled to the same number of vertices for linear blending. Because each new strip is an average of existing strips, it lies inside the convex hull of the hair region; thus, adding new strips does not violate the overall hair silhouette. We iterate this process until the highest error falls below a given threshold or no scalp region remains visible.
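The resample-and-average step can be illustrated by treating each polystrip as its centerline polyline. This is a simplifying assumption, since real polystrips are triangle meshes. The sketch resamples two centerlines uniformly by arc length and averages them. The rule for choosing the pair (the blended centroid closest to the target blob position) is a hypothetical stand-in, because the text does not state how the two strips are selected.

```python
from itertools import combinations

import numpy as np


def resample_polyline(points, n):
    """Resample a polyline to n vertices spaced uniformly by arc length."""
    pts = np.asarray(points, dtype=float)
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])  # cumulative arc length
    targets = np.linspace(0.0, s[-1], n)
    return np.stack([np.interp(targets, s, pts[:, d]) for d in range(pts.shape[1])], axis=1)


def blend_polystrips(a, b, n=None):
    """Average two strip centerlines after resampling both to n vertices."""
    if n is None:
        n = max(len(a), len(b))
    return 0.5 * (resample_polyline(a, n) + resample_polyline(b, n))


def patch_strip(candidates, target, n=16):
    """Among all pairs of candidate strips (e.g. the k = 6 nearest), return the
    blended strip whose centroid lies closest to `target` (hypothetical rule)."""
    best_dist, best_strip = np.inf, None
    for a, b in combinations(candidates, 2):
        strip = blend_polystrips(a, b, n)
        dist = np.linalg.norm(strip.mean(axis=0) - np.asarray(target, dtype=float))
        if dist < best_dist:
            best_dist, best_strip = dist, strip
    return best_strip
```

Since every blended vertex is the midpoint of two existing vertices, the new strip cannot leave the convex hull of the candidate strips, which matches the silhouette argument above.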
6.3.6 Hair Rendering and Texturing.
We render the resulting polystrips using a variant of [84]. The hair tangents are obtained directly from the directions of the mesh's UV parameterization. We use our classification network to determine the semantic shader parameters, such as the width and the intensity of the primary and secondary highlights. To approximate the multiple scattering components, we add the diffuse term from Kajiya and Kay [51]. We perform alpha blending between the polystrips using an order-independent transparency (OIT) algorithm based on depth peeling.

[Figure 6.9 panels: straight, dreadlock, wavy]
Figure 6.9 Example polystrip textures for characterizing the high-frequency structures of different hair types. Each texture atlas contains a nine-region UV map for polystrips of different sizes.
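For reference, the Kajiya-Kay diffuse term for a hair fiber with unit tangent T and unit light direction L is proportional to sin(T, L) = sqrt(1 - (T·L)^2). In other words, a fiber is brightest when lit perpendicular to its tangent. The plain-Python sketch below shows only this diffuse term. It is not the full shader of [84], and the coefficient `kd` is a hypothetical parameter.

```python
import math


def kajiya_kay_diffuse(tangent, light_dir, kd=1.0):
    """Kajiya-Kay diffuse term kd * sin(T, L) for a hair fiber.

    tangent: fiber tangent T (e.g. taken from the polystrip's UV direction);
    light_dir: direction toward the light L. Both are normalized here.
    """
    def normalize(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]

    t, l = normalize(tangent), normalize(light_dir)
    cos_tl = sum(a * b for a, b in zip(t, l))
    # Clamp to guard against tiny negative values from rounding.
    return kd * math.sqrt(max(0.0, 1.0 - cos_tl * cos_tl))
```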
Our classification network also specifies, for each input image, the most similar local hairstyle texture. As illustrated in Figure 6.9, we classify a hairstyle's local high-frequency structure into different categories. These textures are manually designed by an artist based on pre-categorized images that are also used for training. As demonstrated in many games, these types of hair textures can represent a wide range of hair appearances. Since different hair types are associated with custom shaders, some styles may also be associated with a bump map, which is likewise prepared by the artist.
For the texture lookups, we use a hierarchical UV atlas that depends on the world-space dimensions of the individual polystrips after the deformation step. The polystrip textures are grouped into nine size categories in a single map. Using multiple texture sizes for the hair patches reduces stretching and compression artifacts in both the U and V directions, and also increases texture variation.
6.4 Results
We created fully rigged 3D avatars with challenging hairstyles and secondary components for a diverse set of inputs from a wide range of image sets. Even though the input resolutions vary, there is no a priori knowledge about the scene illumination or intrinsic camera parameters, and the subjects may have tilted or partially covered heads with different expressions, our system still produces digitized outputs automatically. We also processed short and long hairstyles with different local structures, including straight, wavy, and dreadlock styles. As illustrated in Figure 6.15, our framework successfully digitizes textured face models and reproduces the volumetric appearance of hair, shown from both the front and the back. Facial details are faithfully digitized in unseen regions, and hair polystrips with full scalp coverage can be reconstructed using our iterative patching optimization algorithm. Our accompanying video shows several animations produced by a professional animator using the controls provided with our avatar. We also demonstrate avatar animation using a real-time facial performance capture system, as well as simulated hair motion of our hair polystrip models using a mass-spring system based on rigid body chains and hair-head collision (see Figure 6.12).
6.4.1 Evaluation.
We evaluate the robustness of our system and consistency of the reconstruction using a variety
of input examples of the same subject as shown in Figure 6.10. Our combined facial segmenta-
tion [86], texture inference [87] and PCA-based shape, appearance, and lighting estimation [92]
framework is robust to severe lighting conditions. We observe that the visual difference between the reconstructed albedo maps of the same person, captured under contrasting illuminations, is minimal. We also demonstrate that our linear face model can distinguish between a person's identity and their expression to some degree. Our visualization shows the resulting avatar in the neutral pose. While some slightly noticeable dissimilarity in the face and hair digitization remains, both outputs are plausible. For large smiles in the input image, the optimized neutral pose can still contain an amused expression.
While traditional hair database retrieval techniques [16, 41] are effective for strand-based output, our polystrip modeling approach relies on clean mesh structures and topologies, which are largely preserved until the end of the pipeline. As shown in Figure 6.11, the deep learning-based hair attribute classification step is therefore critical for preventing wrong hair types from being used during retrieval. Table 6.1 lists a few annotated hair attributes, as well as their prediction accuracies from the trained network. Although the predictions are sometimes inaccurate due to the lack of training data, we can still retrieve similar hairstyles, which are further optimized by subsequent steps in the pipeline.

[Figure 6.10 panels: input and output pairs for a reference capture and for captures with different lighting, different expression, and different pose]
Figure 6.10 We evaluate the robustness of our framework by validating the consistency of the output under different capture conditions.

[Figure 6.11 panels: input image; with hair classification (5 attributes); without hair classification (0 attributes); with hair classification (10 attributes)]
Figure 6.11 We assess the importance of our deep learning-based hair attribute classification. Original image courtesy of Getty Images.

Figure 6.12 Real-time hair simulation using a mass-spring system.

| attribute | possible values | accuracy (%) |
|---|---|---|
| hair_length | long/short/bald | 72.5 |
| hair_curve | straight/wavy/curly/kinky | 76.5 |
| hairline | left/right/middle | 87.8 |
| fringe | full/left/right | 91.8 |
| hair_bun | 1 bun/2 buns/... | 91.4 |
| ponytail | 1 tail/2 tails/... | 79.2 |
| spiky_hair | spiky/not spiky | 91.2 |
| shaved_hair | fully/partially shaved | 81.4 |
| baldness | fully bald/receded hair | 79.6 |

Table 6.1 We train a network to classify the above attributes of hairstyles, achieving accuracies between 72.5% and 91.8%.
6.4.2 Comparison.
We compare our method against several state-of-the-art facial modeling techniques and avatar creation systems in Figure 6.13. Our deep learning-based framework [87] can infer facial textures with more details than linear morphable face models [8, 92]. In addition to producing high-quality hair models, our generated face meshes and textures are visually comparable to those of the video-based reconstruction system of Ichim et al. [46]. We can also reproduce similarly compelling avatars as in [13], but using only one of their many input images. While their approach still requires some manual labor, our system is fully automatic. Additionally, we compare against two existing commercial solutions. In particular, we notice that the system of Loom.ai [64] fails to retrieve the correct hairstyle, while itSeez3D's Avatar SDK [48] neither produces hair models automatically nor allows the avatar to be animated.

[Figure 6.13 rows: [Thies et al. 2016], [Ichim et al. 2015], [Cao et al. 2016], Loom.ai, and itSeez3D; each row shows the input image, the competing result, our method, and our method (side)]
Figure 6.13 We compare our method with several state-of-the-art avatar creation systems. Original image (row 4) courtesy of Getty Images.

[Figure 6.14 panels: input image, [Chai et al. 2016], our method, our method (textured)]
Figure 6.14 We compare our method with the latest single-view hair modeling technique, AutoHair [16]. Original images (row 1, 2) courtesy of Getty Images.
We further compare our polystrip-based results with the state-of-the-art single-view hair modeling technique of Chai et al. [16], as shown in Figure 6.14. Their method is constrained to strand-based hairstyles and captures local features less effectively than our polystrip method. While strand-based renderings are typically more realistic, we argue that our representation is more versatile (especially for very short hair) and better suited for efficient character rendering in highly complex virtual scenes. In particular, a single polystrip patch can approximate a large number of strands using a single texture with an alpha mask, which can significantly increase rendering performance.

[Figure 6.15 panels: input image, face and hair mesh, 3D avatar (side), 3D avatar (animated), 3D avatar, face and hair mesh (side)]
Figure 6.15 Our proposed framework successfully generates high-quality and fully rigged avatars from a single input image in the wild. We demonstrate the effectiveness on a wide range of subjects with different hairstyles. We visualize the face meshes and hair polystrips, as well as their textured renderings. Original images courtesy of Getty Images.
Chapter 7
3D Hair Synthesis Using Volumetric Variational Autoencoders
7.1 Overview of Our Approach
In the previous chapters, we proposed two single-view hair digitization methods [41, 43]. However, these single-view data-driven methods have some fundamental limitations: (1) the large storage footprint of the hair model database prevents deployment on resource-constrained platforms, such as mobile devices; (2) the search steps are usually slow and difficult to scale as the database grows to handle an increasing variety of hairstyles; (3) these techniques also rely on well-conditioned input photographs and are susceptible to the slightest failure during the image pre-processing and analysis steps, such as bad hair segmentation or incorrect face fitting.
To address the above challenges, we propose an end-to-end single-view 3D hair synthesis approach that uses a deep generative model to represent the continuous space of hairstyles. We model this space implicitly with a compact generative model, so that plausible hairstyles can be effectively sampled and interpolated, which eliminates the need for a comprehensive hair database. We also enable end-to-end training and 3D hairstyle inference from a single input image by learning deep features from a large set of unconstrained images.
Figure 7.1 Teaser.
Our method automatically generates 3D hair strands from a variety of single-view input.
Each panel from left to right: input image, volumetric representation with color-coded local
orientations predicted by our method, the final synthesized hair strands rendered from two
viewing points.
7.2 Hair Data Representation
In practice, a 3D hairstyle is usually represented as a collection of strands grown from a specific scalp. However, such a representation is hard to encode with a neural network. The key goal of our data representation design is to convert the existing strand-based hairstyle dataset [41] into a format that neural networks can easily handle, with as little information loss as possible. To achieve this, we adopt an intermediate conversion similar to the one used in previous hair capture systems [73, 74, 96, 100] and use a 3D occupancy field and a corresponding hair growing field to describe a 3D hairstyle.
Specifically, given a strand-based hairstyle, we first extract the outer surface using the method proposed by Hu et al. [43]. The occupancy field O is then generated by assigning 1 to a grid cell if its center is inside the surface and 0 otherwise. Meanwhile, a corresponding 3D flow field F is generated using the method proposed by Hu et al. [41]: we first compute the local 3D orientation of the grid cells inside the hair volume by averaging the orientations of nearby strands [96], and then smoothly diffuse the flow field into the entire volume as proposed by Paris et al. [73]. Conversely, given an occupancy field O and the corresponding flow field F, we can easily regenerate an arbitrary number of 3D strands by growing them from a pre-defined scalp. Each hair strand starts from a root on the scalp and grows along the local orientation of the flow field F until it reaches a grid cell whose value in the occupancy field O is 0. In our implementation, we sample each 3D field at a resolution of 128 × 192 × 128; the larger resolution along the vertical direction better accommodates longer hairstyles. We show a 2D illustration of the volumetric representation in Figure 7.2 and some concrete examples in Figure 7.3.

Figure 7.2 From left to right: the strands; the flow volume constructed from the single strand; strands regrown from the flow.
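The strand regrowing rule can be sketched as a simple forward-Euler trace through the two fields. This is a minimal illustration under stated assumptions: positions are expressed in voxel units, the flow is looked up at the voxel containing the current point (no trilinear interpolation), and the step size and step limit are hypothetical.

```python
import numpy as np


def grow_strand(root, occupancy, flow, step=0.5, max_steps=1000):
    """Trace one strand from `root` along the flow field.

    occupancy: (X, Y, Z) array of 0/1 values; flow: (X, Y, Z, 3) array.
    Positions are in voxel units; the field is read at the voxel containing
    the current point (nearest-voxel lookup, no interpolation).
    Growth stops before entering a voxel with occupancy 0 or leaving the grid.
    """
    shape = np.array(occupancy.shape)

    def voxel_of(p):
        idx = np.floor(p).astype(int)
        if np.any(idx < 0) or np.any(idx >= shape):
            return None  # outside the grid
        return tuple(idx)

    p = np.asarray(root, dtype=float)
    idx = voxel_of(p)
    if idx is None or occupancy[idx] == 0:
        return p[None, :]  # root is not inside the hair volume
    strand = [p]
    for _ in range(max_steps):
        d = flow[idx]
        norm = np.linalg.norm(d)
        if norm == 0:
            break  # no growth direction defined here
        q = p + step * d / norm
        q_idx = voxel_of(q)
        if q_idx is None or occupancy[q_idx] == 0:
            break  # the next point would leave the hair volume
        p, idx = q, q_idx
        strand.append(p)
    return np.stack(strand)
```

Calling this once per scalp root yields as many strands as there are roots, which is why the representation can regenerate an arbitrary number of strands.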
7.3 Volumetric Variational Autoencoder
Variational Autoencoder First, we aim to encode our volumetric representation into a low-dimensional space. We propose an approach based on the variational autoencoder (VAE), which has emerged as one of the most popular approaches to unsupervised learning of complicated distributions [56, 82, 91]. A typical VAE consists of an encoder $E_\theta(x)$ and a decoder $D_\phi(z)$. The encoder $E_\theta$ takes an input $x$ and converts it into a smaller, denser latent code $z$, which the decoder $D_\phi$ can use to generate an output $x'$ similar to the original input. The parameters $\theta$ and $\phi$ of the encoder and the decoder are jointly trained to minimize the reconstruction error between $x$ and $x'$. A variational autoencoder (VAE) [56] approximates $E_\theta(x)$ as a posterior distribution $q(z \mid x)$. Because the latent space is continuous, it allows easy random sampling and interpolation to generate new data. We train the encoding and decoding parameters $\theta$ and $\phi$ using the stochastic gradient variational Bayes (SGVB) algorithm [56] as follows:

$$\theta^*, \phi^* = \arg\min_{\theta, \phi} \; \mathbb{E}_{z \sim E_\theta(x)}\big[-\log p(x \mid z)\big] + D_{kl}\big(E_\theta(x) \,\|\, p(z)\big),$$
where $D_{kl}$ denotes the Kullback-Leibler divergence. Assuming a multivariate Gaussian distribution $E_\theta(x) \sim \mathcal{N}(z_\mu, \mathrm{diag}(z_\sigma^2))$ as the posterior and a standard isotropic Gaussian prior $p(z) \sim \mathcal{N}(0, I)$, the Kullback-Leibler divergence is

$$D_{kl}\big(E_\theta(x) \,\|\, \mathcal{N}(0, I)\big) = -\frac{1}{2} \sum_i \left(1 + 2\log z_{\sigma,i} - z_{\mu,i}^2 - z_{\sigma,i}^2\right),$$

where $z_\mu$ and $z_\sigma$ are the outputs of $E_\theta(x)$, representing the mean and the standard deviation, respectively. To make all operations differentiable for backpropagation, the random variable $z$ is sampled from the distribution $E_\theta(x)$ via the reparameterization trick [56]:

$$z = z_\mu + \varepsilon \odot z_\sigma, \quad \varepsilon \sim \mathcal{N}(0, I),$$

where $\odot$ denotes element-wise multiplication.

[Figure 7.3 panels: strands, volume+flow, regrown strands]
Figure 7.3 Volumetric hairstyle representation. From left to right: the original 3D hairstyle represented as strands; our representation using occupancy and flow fields defined on regular grids, visualizing the occupancy field boundary as a mesh surface and encoding the local flow value as surface color; strands regrown from our representation.

[Figure 7.4 diagram: the orientation and occupancy fields pass through a volumetric encoder to μ and σ; z, sampled with ε ~ N(0, I), feeds an occupancy field decoder and an orientation field decoder; an input image passes through an image encoder (ResNet-50) and a regressor to hair coefficients y, which are mapped back through PCA⁻¹ to produce hair strands]
Figure 7.4 Our pipeline overview. Our volumetric VAE consists of an encoder and two decoders (blue blocks), with blue arrows representing the related dataflow. Our hair regression network (orange blocks) follows the red arrows to synthesize hair strands from an input image.
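The sampling step and the closed-form KL term described above can be written in a few lines of NumPy. This sketch follows the notation of this section and takes the standard deviation $z_\sigma$ directly. Many VAE implementations instead have the encoder predict log σ or log σ² for numerical stability, which is a design choice the text does not specify.

```python
import numpy as np


def reparameterize(z_mu, z_sigma, rng):
    """Sample z = z_mu + eps * z_sigma with eps ~ N(0, I) (element-wise)."""
    eps = rng.standard_normal(np.shape(z_mu))
    return z_mu + eps * z_sigma


def kl_to_standard_normal(z_mu, z_sigma):
    """KL( N(z_mu, diag(z_sigma^2)) || N(0, I) ), summed over all latent entries."""
    z_mu = np.asarray(z_mu, dtype=float)
    z_sigma = np.asarray(z_sigma, dtype=float)
    return -0.5 * np.sum(1.0 + 2.0 * np.log(z_sigma) - z_mu ** 2 - z_sigma ** 2)
```

The KL term is zero exactly when the posterior equals the prior (zero mean, unit standard deviation) and grows as the encoder's outputs move away from it.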
VAE Architecture From the volumetric representation of the collected 3D hairstyle dataset, we train an encoder-decoder network to obtain a compact model for the space of 3D hairstyles. The architecture of our VAE model is shown in Table 7.1. The encoder concatenates the occupancy field O and the flow field F into a volumetric input of resolution 128 × 192 × 128 and encodes it into a volumetric latent space $z_\mu$ and $z_\sigma$ of resolution 4 × 6 × 4, where each voxel has a 64-dimensional feature vector. We then sample a latent code $z \in \mathbb{R}^{4 \times 6 \times 4 \times 64}$ from $z_\mu$ and $z_\sigma$ using the reparameterization trick [56]. The latent code $z$ is used as the input for the two decoders: one decoder generates the occupancy field and the other generates the flow field.

| Net | Type | Kernel | Stride | Output |
|---|---|---|---|---|
| enc. | conv. | 4×4 | 2×2 | 64×96×64×4 |
| enc. | conv. | 4×4 | 2×2 | 32×48×32×8 |
| enc. | conv. | 4×4 | 2×2 | 16×24×16×16 |
| enc. | conv. | 4×4 | 2×2 | 8×12×8×32 |
| enc. | conv. | 4×4 | 2×2 | 4×6×4×64 |
| dec. | transconv. | 4×4 | 2×2 | 8×12×8×32 |
| dec. | transconv. | 4×4 | 2×2 | 16×24×16×16 |
| dec. | transconv. | 4×4 | 2×2 | 32×48×32×8 |
| dec. | transconv. | 4×4 | 2×2 | 64×96×64×4 |
| dec. | transconv. | 4×4 | 2×2 | 128×192×128×{1, 3} |

Table 7.1 Our volumetric VAE architecture. The last convolution layer in the encoder is duplicated for μ and σ for the reparameterization trick. The decoders for the occupancy field and the orientation field have the same architecture except for the last channel size (1 and 3, respectively). The weights of the decoders are not shared. All convolutional layers are followed by batch normalization and ReLU activation, except the last layer in both the encoder and the decoder.
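The output sizes in Table 7.1 can be checked with the standard convolution and transposed-convolution size formulas. The sketch below assumes cubic kernels of size 4 with stride 2 and padding 1 along every spatial axis. The padding is not stated in the text; padding 1 is simply the value that halves or doubles each axis exactly. The input has 4 channels (1 occupancy + 3 flow).

```python
def conv_out(n, kernel=4, stride=2, padding=1):
    """Output length of a strided convolution along one axis."""
    return (n + 2 * padding - kernel) // stride + 1


def transconv_out(n, kernel=4, stride=2, padding=1):
    """Output length of a transposed convolution along one axis."""
    return (n - 1) * stride - 2 * padding + kernel


def layer_shapes(spatial, channels, out_len):
    """Apply `out_len` to every spatial axis per layer and append the channel count."""
    shapes = []
    for c in channels:
        spatial = tuple(out_len(n) for n in spatial)
        shapes.append(spatial + (c,))
    return shapes


encoder = layer_shapes((128, 192, 128), (4, 8, 16, 32, 64), conv_out)
# Flow-field decoder (last channel size 3); the occupancy decoder ends with 1.
decoder = layer_shapes(encoder[-1][:3], (32, 16, 8, 4, 3), transconv_out)
```

The resulting latent code has 4 × 6 × 4 × 64 = 6144 entries, which is the space later compressed by PCA.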
Loss Function Our loss function for training the network weights consists of reconstruction errors for the occupancy field and the flow field, as well as a KL divergence loss [56]. We use a binary cross-entropy (BCE) loss for the reconstruction of the occupancy field:

$$\mathcal{L}_{vol} = -\frac{1}{|V|} \sum_{i \in V} \Big[ O_i \log \hat{O}_i + (1 - O_i) \log\big(1 - \hat{O}_i\big) \Big],$$

where $V$ denotes the set of uniformly sampled grid cells, $|V|$ is the total number of grid cells, $O_i \in \{0, 1\}$ is the ground-truth occupancy value at voxel $v_i$, and $\hat{O}_i \in [0, 1]$ is the value predicted by the network.
For the 3D orientation (flow) field, we use an L1 loss because an L2 loss is known to produce over-smoothed prediction results [47]:

$$\mathcal{L}_{flow} = \frac{\sum_{i \in V} O_i \, \|f_i - \hat{f}_i\|_1}{\sum_{i \in V} O_i}, \tag{7.1}$$

where $f_i$ and $\hat{f}_i$ are the ground-truth and predicted flow vectors at voxel $v_i$, respectively. Our KL-divergence loss is defined as

$$\mathcal{L}_{kl} = D_{kl}\big(q(z \mid O, f) \,\|\, \mathcal{N}(0, I)\big), \tag{7.2}$$

where $q$ is the Gaussian posterior $E_\theta(O, f)$. Our total loss then becomes

$$\mathcal{L} = \mathcal{L}_{vol} + w_{flow} \, \mathcal{L}_{flow} + w_{kl} \, \mathcal{L}_{kl}, \tag{7.3}$$

where $w_{flow}$ and $w_{kl}$ are the relative weights of the 3D flow field loss and the KL divergence loss, respectively.
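The three reconstruction terms above can be sketched in NumPy as follows. Note how the flow loss is masked by the ground-truth occupancy, so voxels outside the hair volume do not contribute. The clipping constant `eps` is an assumption added for numerical safety, and the values of $w_{flow}$ and $w_{kl}$ are not given in the text, so they are left as arguments.

```python
import numpy as np


def occupancy_bce(o_true, o_pred, eps=1e-7):
    """L_vol: mean binary cross-entropy over all sampled voxels."""
    o_true = np.asarray(o_true, float)
    o_pred = np.clip(np.asarray(o_pred, float), eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(o_true * np.log(o_pred) + (1.0 - o_true) * np.log(1.0 - o_pred))


def masked_flow_l1(o_true, f_true, f_pred):
    """L_flow (Eq. 7.1): L1 flow error averaged over occupied voxels only."""
    o_true = np.asarray(o_true, float)
    per_voxel = np.abs(np.asarray(f_true, float) - np.asarray(f_pred, float)).sum(axis=-1)
    return float((o_true * per_voxel).sum() / o_true.sum())


def total_loss(l_vol, l_flow, l_kl, w_flow, w_kl):
    """L (Eq. 7.3) with caller-supplied relative weights."""
    return l_vol + w_flow * l_flow + w_kl * l_kl
```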
7.4 Hair Regression Network
To achieve end-to-end single-view 3D hair synthesis, we train a regression network to predict
the hair latent code z in the latent space from input images by using the collected dataset of
portrait photos and the corresponding 3D hairstyles as training data (see Section 7.5).
Since our training data is limited, it is desirable to reduce the number of unknowns to be
predicted for more robust training of the regression. We assume the latent space of 3D hairstyles
can be well-approximated in a low-rank linear space. Based on this assumption, we compute
the PCA embedding of the volumetric latent space and use 512-dimensional PCA coefficients y
as a compact feature representation of the feasible space of 3D hairstyles. Then the goal of the
regression task is to match the predicted hair coefficients $\hat{y}$ to the ground-truth coefficients $y$ by minimizing the following L2 loss:

$$\mathcal{L}_y = \|y - \hat{y}\|_2. \tag{7.4}$$
Note that we use $z_\mu$ instead of a stochastically sampled latent code $z \sim \mathcal{N}(z_\mu, z_\sigma)$ for the PCA embedding to eliminate randomness in the regression process. Our hair regression pipeline is shown in Figure 7.4 (bottom part).
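The PCA embedding of the latent space and the coefficient loss can be sketched with NumPy's SVD. In this setting, each training hairstyle's $z_\mu$ (4 × 6 × 4 × 64 = 6144 values) is flattened to a row, and the first 512 principal directions define the coefficients $y$. The function names are hypothetical, and the tiny synthetic example in the test only checks the mechanics.

```python
import numpy as np


def fit_pca(latents, n_components):
    """Fit PCA to a stack of latent codes (one z_mu per hairstyle).

    Returns the mean vector and an (n_components, D) orthonormal basis.
    """
    X = np.asarray(latents, float).reshape(len(latents), -1)
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def encode(latent, mean, basis):
    """Project one latent code to its PCA coefficients y."""
    return basis @ (np.asarray(latent, float).reshape(-1) - mean)


def decode(coeffs, mean, basis, shape):
    """Map PCA coefficients back to a latent code of the given shape."""
    return (mean + coeffs @ basis).reshape(shape)


def coeff_loss(y, y_hat):
    """L_y = ||y - y_hat||_2 (Eq. 7.4)."""
    return float(np.linalg.norm(np.asarray(y, float) - np.asarray(y_hat, float)))
```

With 2164 training hairstyles, the centered data matrix has rank at most 2163, so 512 components are always available.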
Architecture We use a pre-trained ResNet-50 [38] as initialization and fine-tune it as an image encoder. We apply average pooling to the output of the last convolutional layer and take the resulting vector as an image feature vector $I \in \mathbb{R}^{2048}$. The hair regression network consists of two 1024-dimensional fully connected layers with ReLU and dropout layers in between, followed by an output layer with 512 neurons. We adopt Iterative Error Feedback (IEF) [14, 52] to train the hair regression network.
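To make the IEF idea concrete, the sketch below shows the inference loop: starting from an initial estimate (for example, the mean coefficients), the regressor repeatedly receives the image feature concatenated with the current estimate and predicts a correction. The concatenated input follows the original IEF formulation. The number of iterations is a hypothetical choice, and the regressor is passed in as a function because the actual network is not reproduced here.

```python
import numpy as np


def ief_predict(image_feature, regressor, y_init, n_iters=3):
    """Iterative Error Feedback inference.

    regressor: callable mapping concat([image_feature, y]) to a correction
    with the same shape as y (in this chapter, a 512-D update).
    n_iters is a hypothetical choice; the text does not state it.
    """
    y = np.array(y_init, dtype=float)
    for _ in range(n_iters):
        # Feed back the current estimate and add the predicted correction.
        y = y + regressor(np.concatenate([np.asarray(image_feature, float), y]))
    return y
```

During training, each iteration's correction is supervised toward the ground-truth coefficients, so the network learns to refine its own earlier predictions.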
7.5 Networks Training Data
To train the hair regression network, we first collect 816 portrait images of various hairstyles
and different head poses. Then we use a state-of-the-art single-view hair modeling method [41]
to reconstruct 3D hair strands for each image. The hair geometry is normalized and aligned
with a mean head model and then converted to our volumetric representation. Our portrait
dataset consists of examples modeled from normal headshot photographs. No extreme cases are
used during training (e.g. stylized illustrations, poorly illuminated images, or pictures of dogs).
None of our test images are included in the training set. Some training samples are shown in
Figure 7.5.
To train the encoding and decoding parameters of the VAE, we further enlarge the dataset by adding the 343 hairstyles from the USC-HairSalon dataset¹. In total, we have collected 1159 different 3D hairstyles. Some additional examples are shown in Figure 7.6. We further augment the data by mirroring each hairstyle across the x = 0 plane, obtaining a dataset of 2318 different hairstyles. We randomly split the entire dataset into a training set of 2164 hairstyles and a test set of 154 hairstyles.
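The mirroring augmentation can be applied directly to the volumetric representation. The sketch assumes that grid axis 0 is the x axis, that the grid is centered on the x = 0 plane, and that flow component 0 is the x component. Under these assumptions, mirroring reverses the volume along that axis and negates the x component of every flow vector.

```python
import numpy as np


def mirror_hairstyle(occupancy, flow, axis=0):
    """Mirror an (occupancy, flow) pair across the plane normal to `axis`.

    Assumes the grid is centered on that plane and that flow component
    `axis` is the coordinate perpendicular to it.
    """
    occ_m = np.flip(occupancy, axis=axis).copy()
    flow_m = np.flip(flow, axis=axis).astype(float)  # astype returns a copy
    flow_m[..., axis] *= -1.0  # reflected directions flip their x component
    return occ_m, flow_m
```

Applying the mirror twice returns the original hairstyle, and doubling the 1159 collected hairstyles gives the 2318 samples reported above.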
¹ http://www-scf.usc.edu/~liwenhu/SHM/database.html
[Figure 7.5 panels: short, spiky, mid, long, curly, occluded, partial, shaved, profile, back]
Figure 7.5 Some training samples in our dataset for the hair regression network.
7.6 Post-Processing
After we predict the hair volume with local orientations using our hair regression network, we can synthesize hair strands by growing them from the scalp along the orientations inside the hair volume. Since we represent the 3D hair geometry in a normalized model space, the synthesized strands may not align with the head pose in the input image. If the head pose is available (e.g., via manual fitting or face landmark detection) and the segmentation/orientation can be estimated reliably from the input image, we can optionally apply several post-processing steps to further improve the modeling results, following prior methods with similar data representations [16, 41, 43]. Starting from the input image, we first segment the pixel-level hair mask and digitize the head model [43]. Then we run a spatial deformation step as proposed by Hu et al. [43] to fit our hairstyle to the personalized head model. Next, we apply the mask-based deformation method [16] to improve alignment with the hair segmentation mask. Finally, we use the 2D orientation deformation and the depth estimation method introduced by Hu et al. [41] to match the local details of the synthesized strands to the 2D orientation map computed from the input image.

Figure 7.6 Some additional training samples in our dataset for the VAE network.
7.7 Results
Single-View Hair Modeling We show single-view 3D hairstyle modeling results from a
variety of input images in Figures 7.1 and 7.7. For each image, we show the predicted occupancy
field with color-coded local orientation, as well as synthesized strands with manually specified color. Note that none of these test images are used for training our hair regression network. Our method is end-to-end and does not require any user interaction, such as manually fitting a head model or drawing guide strokes. Moreover, several input images in Figure 7.7 are particularly challenging because they are over-exposed (the third row), have low contrast between the hair and the background (the fourth row), have very low resolution (the fifth and sixth rows), or are even illustrated in a cartoon style (the last two rows). Although our training dataset for the hair regression network only consists of examples modeled from normal headshot photographs without any extreme cases (e.g., poorly illuminated images or pictures of dogs), our method generalizes well, which we attribute to the robustness of deep image features.
In Figure 7.8, we compare our method with a state-of-the-art automatic single-view hair modeling technique [16] using three challenging input images. Even though reliable face detection/fitting is available in these three cases, their modeling results are less faithful than ours, because their method relies on input images of decent quality to compute low-level features such as 2D local orientations or hair segmentation.
We also compare our method with another recent automatic avatar digitization method [43] in
Figure 7.9. Their hair attribute classifier can successfully identify the long hairstyle for the first
image, but fails to retrieve a proper hairstyle from the database because the hair segmentation
is not accurate enough. For the second input image in Figure 7.9, their method generates a
non-ideal result because the classifier cannot correctly identify the target hairstyle as “with
fringe”.
Figure 7.7 Modeling results of 3D hairstyles from a single input image. From left to right, we show the input image, the occupancy field with color-coded local orientations predicted by our hair regression pipeline, and the synthesized output strands. None of these input images has been used for training our regression network.

[Figure 7.8 panels: [Chai et al. 2016], ours, input image]
Figure 7.8 Comparison between our method and AutoHair [16] using the same input images.

[Figure 7.9 panels: [Hu et al. 2017], ours, input image]
Figure 7.9 Comparison between our method and the state-of-the-art avatar digitization method [43] using the same input images.
Hair Interpolation Our compact representation of the latent space for 3D hairstyles can easily be applied to hair interpolation. Given multiple input hairstyles, we first compute the corresponding hair coefficients in the latent space for each one. Then we use normalized weights to interpolate the latent vectors and generate the interpolated hairstyle via the decoder network. We compare our hairstyle interpolation results with a state-of-the-art method [101], which interpolates hairstyles directly at the strand level. As shown in Figure 7.10, our compact representation of the hair latent space produces much more plausible interpolation results.
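The latent interpolation step amounts to a normalized weighted average of latent codes. The sketch assumes the codes are the mean outputs $z_\mu$ of the encoder. It only performs the averaging: the decoding step is omitted because the trained decoder is not reproduced here, and the function name is hypothetical.

```python
import numpy as np


def interpolate_latents(latents, weights):
    """Blend latent codes with weights normalized to sum to one.

    latents: sequence of equally shaped arrays (e.g. 4 x 6 x 4 x 64 z_mu codes);
    weights: one non-negative weight per latent code.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so the result stays an affine combination
    stacked = np.stack([np.asarray(z, dtype=float) for z in latents])
    return np.tensordot(w, stacked, axes=1)
```

The blended code would then be passed through the occupancy and flow decoders, and strands regrown from the decoded fields.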
[Figure 7.10 panels: hairstyle A, hairstyle B, and interpolation results for [Wen et al. 2013] and ours]
Figure 7.10 Comparison between the strand interpolation result of [101] and our latent space interpolation result.
Chapter 8
Conclusion and Future Work
In conclusion, we have introduced a few data-driven approaches for hair digitization. Below,
we discuss several potential future directions, building on our current hair modeling algorithms.
High quality hairstyle database generation While we introduced the USC-HairSalon dataset [41], its models only describe global shapes and have limited local variations and details. There are two ways to create a high-quality hairstyle database. First, we can capture a wide variety of plausible hairstyles from many people with different hairstyles using the methods described in Chapter 3 and Chapter 4. Second, we can use the database in Chapter 5 as a base database and capture local variations and details from the real world; an algorithm is then needed to combine the base model and the real-world details into a plausible high-quality hairstyle. Such a high-quality hair collection can be useful for further exploring a possible general database that covers the space of all possible hairstyles, potentially improving the performance of data-driven hair capture techniques and impacting research on hairstyle learning and classification.
Hair semantic training While the latent space of our volumetric VAE can compactly describe the space of possible hairstyles, there is no semantic meaning associated with each sample. For
many graphics applications, it would be advantageous to provide high-level controls to a user
for intuitive analysis and manipulation. We plan to explore ways to provide semantic controls
for intuitive hairstyle modeling, such as learning a manifold of hairstyle space [11, 94].
High-resolution hair synthesis As with any volumetric representation, the demand for GPU
memory imposes a considerable limitation on hair geometry resolution. Since our hair synthesis
algorithm is currently limited to strand-based hairstyles, it would be worth exploring more
general representations that can also handle very short hair, Afro hairstyles, or even more
rendering-efficient polystrip models [43, 107]. One interesting direction is to decompose a
hairstyle into a low-frequency component and high-frequency details, learn the two separately,
and compose them afterwards.
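To make the memory limitation concrete: the storage of a dense volume grows with the product of its three dimensions, so doubling the resolution along every axis multiplies memory by eight. The sketch below computes the footprint of a single input sample; the 128×192×128 grid, the four channels (one occupancy value plus a 3D flow vector), and float32 storage are illustrative assumptions, and a network's intermediate activations typically require considerably more memory than the input alone.

```python
def grid_bytes(depth: int, height: int, width: int,
               channels: int = 4, bytes_per_value: int = 4) -> int:
    """Bytes needed to store one dense volumetric sample.

    channels=4 assumes one occupancy value plus a 3D flow vector per voxel;
    bytes_per_value=4 assumes float32 storage. Both are illustrative choices.
    """
    return depth * height * width * channels * bytes_per_value


def mib(num_bytes: int) -> float:
    """Convert bytes to mebibytes (2**20 bytes)."""
    return num_bytes / 2**20


if __name__ == "__main__":
    base = grid_bytes(128, 192, 128)     # illustrative resolution
    doubled = grid_bytes(256, 384, 256)  # every axis doubled
    print(f"base:    {mib(base):.0f} MiB per sample")     # 48 MiB
    print(f"doubled: {mib(doubled):.0f} MiB per sample")  # 384 MiB
    print(f"ratio:   {doubled // base}x")                 # 8x
```

The cubic growth is why a decomposition that stores only a low-frequency component in the volume, and handles fine detail separately, is attractive.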
Hair appearance capturing While our current effort focuses on the geometry of hairstyles,
i.e., the 3D structure and shapes of hair strands, we are interested in exploring techniques for
estimating the appearance of hair strands, especially from a single reference photograph.
References
[1] Baltrusaitis, T., Robinson, P., and Morency, L.-P. (2013). Constrained local neural fields for
robust facial landmark detection in the wild. In IEEE ICCVW, pages 354–361.
[2] Bavoil, L. and Myers, K. (2008). Order independent transparency with dual depth peeling.
[3] Beeler, T., Bickel, B., Noris, G., Beardsley, P., Marschner, S., Sumner, R. W., and Gross,
M. (2012). Coupled 3D reconstruction of sparse facial hair and skin. ACM Trans. Graph.,
31(4):117:1–117:10.
[4] Bergou, M., Wardetzky, M., Robinson, S., Audoly, B., and Grinspun, E. (2008). Discrete
elastic rods. ACM Trans. Graph., 27(3):63:1–63:12.
[5] Bertails, F., Audoly, B., Cani, M.-P., Querleux, B., Leroy, F., and Lévêque, J.-L. (2006).
Super-helices for predicting the dynamics of natural hair. ACM Trans. Graph., 25(3):1180–
1187.
[6] Bertails, F., Audoly, B., Querleux, B., Leroy, F., Lévêque, J.-L., and Cani, M.-P. (2005).
Predicting natural hair shapes by solving the statics of flexible rods. In Eurographics (short
papers), pages 81–84.
[7] Besl, P. and McKay, N. D. (1992). A method for registration of 3-D shapes. IEEE Trans. on
PAMI, 14(2):239–256.
[8] Blanz, V. and Vetter, T. (1999). A morphable model for the synthesis of 3D faces. In
Proceedings of the 26th annual conference on Computer graphics and interactive techniques,
pages 187–194. ACM Press/Addison-Wesley Publishing Co.
[9] Bradley, D., Nowrouzezahrai, D., and Beardsley, P. (2013). Image-based reconstruction
and synthesis of dense foliage. ACM Trans. Graph., 32(4):74:1–74:10.
[10] Brock, A., Lim, T., Ritchie, J. M., and Weston, N. (2016). Generative and discriminative
voxel modeling with convolutional neural networks. In Advances in neural information
processing systems.
[11] Campbell, N. D. F. and Kautz, J. (2014). Learning a manifold of fonts. ACM Trans.
Graph., 33(4):91:1–91:11.
[12] Cao, C., Weng, Y., Zhou, S., Tong, Y., and Zhou, K. (2014). FaceWarehouse: A 3D facial
expression database for visual computing. IEEE TVCG, 20(3).
[13] Cao, C., Wu, H., Weng, Y., Shao, T., and Zhou, K. (2016). Real-time facial animation
with image-based dynamic avatars. ACM Trans. Graph., 35(4).
[14] Carreira, J., Agrawal, P., Fragkiadaki, K., and Malik, J. (2016). Human pose estimation
with iterative error feedback. In Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition, pages 4733–4742.
[15] Chai, M., Luo, L., Sunkavalli, K., Carr, N., Hadap, S., and Zhou, K. (2015). High-
quality hair modeling from a single portrait photo. ACM Transactions on Graphics (TOG),
34(6):204.
[16] Chai, M., Shao, T., Wu, H., Weng, Y., and Zhou, K. (2016). AutoHair: Fully automatic
hair modeling from a single image. ACM Transactions on Graphics (TOG), 35(4):116.
[17] Chai, M., Wang, L., Weng, Y., Jin, X., and Zhou, K. (2013). Dynamic hair manipulation
in images and videos. ACM Transactions on Graphics (TOG), 32(4):75.
[18] Chai, M., Wang, L., Weng, Y., Yu, Y., Guo, B., and Zhou, K. (2012). Single-view hair
modeling for portrait manipulation. ACM Transactions on Graphics (TOG), 31(4):116.
[19] Cherin, N., Cordier, F., and Melkemi, M. (2014). Modeling piecewise helix curves from
2D sketches. Computer-Aided Design, 46:258–262.
[20] Choe, B. and Ko, H.-S. (2005a). A statistical wisp model and pseudophysical approaches
for interactive hairstyle generation. IEEE Transactions on Visualization and Computer
Graphics, 11(2):160–170.
[21] Choe, B. and Ko, H.-S. (2005b). A statistical wisp model and pseudophysical approaches
for interactive hairstyle generation. TVCG, 11(2):160–170.
[22] Coefield, S. (2013). DIY Braids: From Crowns to Fishtails, Easy, Step-by-Step Hair
Braiding Instructions. Adams Media.
[23] Comaniciu, D. and Meer, P. (2002). Mean shift: a robust approach toward feature space
analysis. IEEE Trans. on PAMI, 24(5):603–619.
[24] Daviet, G., Bertails-Descoubes, F., and Boissieux, L. (2011). A hybrid iterative solver for
robustly capturing Coulomb friction in hair dynamics. In Proc. SIGGRAPH Asia.
[25] Delong, A., Osokin, A., Isack, H. N., and Boykov, Y. (2012). Fast approximate energy
minimization with label costs. International Journal of Computer Vision, 96(1):1–27.
[26] Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. (2009). ImageNet: A
Large-Scale Hierarchical Image Database. In CVPR09.
[27] Derouet-Jourdan, A., Bertails-Descoubes, F., Daviet, G., and Thollot, J. (2013a). Inverse
dynamic hair modeling with frictional contact. ACM Trans. Graph., 32(6):159:1–159:10.
[28] Derouet-Jourdan, A., Bertails-Descoubes, F., and Thollot, J. (2013b). Floating tangents
for approximating spatial curves with $G^1$ piecewise helices. Computer Aided Geometric
Design, 30(5).
[29] Echevarria, J. I., Bradley, D., Gutierrez, D., and Beeler, T. (2014). Capturing and stylizing
hair for 3D fabrication. ACM Transactions on Graphics (TOG), 33(4):125.
[30] Electronic Arts (2014). The Sims Resource. http://www.thesimsresource.com/.
[31] FaceUnity (2017). http://www.faceunity.com/p2a-demo.mp4.
[32] Fischler, M. A. and Bolles, R. C. (1981). Random sample consensus: a paradigm for
model fitting with applications to image analysis and automated cartography. Commun.
ACM, 24(6):381–395.
[33] Fu, H., Wei, Y., Tai, C.-L., and Quan, L. (2007). Sketching hairstyles. In SBIM ’07, pages
31–36.
[34] Furukawa, Y. and Ponce, J. (2010). Accurate, dense, and robust multiview stereopsis.
IEEE Trans. PAMI, 32:1362–1376.
[35] Gatys, L. A., Ecker, A. S., and Bethge, M. (2016). Image style transfer using convolutional
neural networks. In IEEE CVPR.
[36] Hadap, S. and Magnenat-Thalmann, N. (2000). Interactive hair styler based on fluid
flow. In Eurographics Workshop on Computer Animation and Simulation 2000, pages 87–99.
Springer.
[37] Hadap, S. and Magnenat-Thalmann, N. (2001). Modeling dynamic hair as a continuum.
Computer Graphics Forum, 20(3):329–338.
[38] He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recogni-
tion. In Proceedings of the IEEE conference on computer vision and pattern recognition,
pages 770–778.
[39] Herrera, T. L., Zinke, A., and Weber, A. (2012). Lighting hair from the inside: A thermal
approach to hair reconstruction. ACM Transactions on Graphics (TOG), 31(6):146.
[40] Hu, L., Ma, C., Luo, L., and Li, H. (2014a). Robust hair capture using simulated examples.
ACM Transactions on Graphics (TOG), 33(4):126.
[41] Hu, L., Ma, C., Luo, L., and Li, H. (2015). Single-view hair modeling using a hairstyle
database. ACM Transactions on Graphics (TOG), 34(4):125.
[42] Hu, L., Ma, C., Luo, L., Wei, L.-Y., and Li, H. (2014b). Capturing braided hairstyles.
ACM Transactions on Graphics (TOG), 33(6):225.
[43] Hu, L., Saito, S., Wei, L., Nagano, K., Seo, J., Fursund, J., Sadeghi, I., Sun, C., Chen,
Y.-C., and Li, H. (2017). Avatar digitization from a single image for real-time rendering.
ACM Trans. Graph., 36(6):195:1–195:14.
[44] Huang, G. B., Ramesh, M., Berg, T., and Learned-Miller, E. (2007). Labeled faces in the
wild: A database for studying face recognition in unconstrained environments. Technical
Report 07-49, University of Massachusetts, Amherst.
[45] Huang, H., Gong, M., Cohen-Or, D., Ouyang, Y., Tan, F., and Zhang, H. (2012).
Field-guided registration for feature-conforming shape composition. ACM Trans. Graph.,
31(6):179:1–179:11.
[46] Ichim, A. E., Bouaziz, S., and Pauly, M. (2015). Dynamic 3D avatar creation from
hand-held video input. ACM Trans. Graph., 34(4).
[47] Isola, P., Zhu, J.-Y., Zhou, T., and Efros, A. A. (2017). Image-to-image translation
with conditional adversarial networks. IEEE Conference on Computer Vision and Pattern
Recognition.
[48] itSeez3D: Avatar SDK (2017). https://avatarsdk.com.
[49] Jackson, A. S., Bulat, A., Argyriou, V., and Tzimiropoulos, G. (2017). Large pose 3D
face reconstruction from a single image via direct volumetric CNN regression. International
Conference on Computer Vision.
[50] Jakob, W., Moon, J. T., and Marschner, S. (2009). Capturing hair assemblies fiber by fiber.
In ACM Transactions on Graphics (TOG), volume 28, page 164. ACM.
[51] Kajiya, J. T. and Kay, T. L. (1989). Rendering fur with three dimensional textures.
In Proceedings of the 16th Annual Conference on Computer Graphics and Interactive
Techniques, SIGGRAPH ’89. ACM.
[52] Kanazawa, A., Black, M. J., Jacobs, D. W., and Malik, J. (2018). End-to-end recovery of
human shape and pose. In Computer Vision and Pattern Recognition (CVPR).
[53] Kaufman, D. M., Tamstorf, R., Smith, B., Aubry, J.-M., and Grinspun, E. (2014). Adaptive
nonlinearity for collisions in complex rod assemblies. ACM Trans. Graph., 33(4):123:1–
123:12.
[54] Kazemi, V. and Sullivan, J. (2014). One millisecond face alignment with an ensemble of
regression trees. In IEEE CVPR.
[55] Kim, T.-Y. and Neumann, U. (2002). Interactive multiresolution hair modeling and editing.
ACM Trans. Graph., 21(3):620–629.
[56] Kingma, D. P. and Welling, M. (2013). Auto-encoding variational Bayes. arXiv
preprint:1312.6114.
[57] Krähenbühl, P. and Koltun, V. (2011). Efficient inference in fully connected CRFs with
Gaussian edge potentials. In Advances in Neural Information Processing Systems.
[58] Lewis, J. P., Cordner, M., and Fong, N. (2000). Pose space deformation: A unified
approach to shape interpolation and skeleton-driven deformation. In SIGGRAPH ’00, pages
165–172.
[59] Li, H., Adams, B., Guibas, L. J., and Pauly, M. (2009). Robust single-view geometry and
motion reconstruction. ACM Trans. Graph., 28(5):175:1–175:10.
[60] Li, H., Trutoiu, L., Olszewski, K., Wei, L., Trutna, T., Hsieh, P.-L., Nicholls, A., and
Ma, C. (2015). Facial performance sensing head-mounted display. ACM Transactions on
Graphics (Proceedings SIGGRAPH 2015), 34(4).
[61] Li, H., Weise, T., and Pauly, M. (2010). Example-based facial rigging. ACM Trans. Graph.
(Proceedings SIGGRAPH 2010), 29(3).
[62] Li, H., Yu, J., Ye, Y ., and Bregler, C. (2013). Realtime facial animation with on-the-fly
correctives. ACM Trans. Graph., 32(4):42:1–42:10.
[63] Liu, Z., Luo, P., Wang, X., and Tang, X. (2015). Deep learning face attributes in the wild.
In Proceedings of International Conference on Computer Vision (ICCV).
[64] Loom.ai (2017). http://www.loom.ai.
[65] Luo, L., Li, H., and Rusinkiewicz, S. (2013). Structure-aware hair capture. ACM
Transactions on Graphics (TOG), 32(4):76.
[66] Ma, D. S., Correll, J., and Wittenbrink, B. (2015). The Chicago Face Database: A free
stimulus set of faces and norming data. Behavior Research Methods, 47(4).
[67] Maturana, D. and Scherer, S. (2015). VoxNet: A 3D convolutional neural network for
real-time object recognition. In 2015 IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS).
[68] Mitra, N. J., Guibas, L. J., and Pauly, M. (2006). Partial and approximate symmetry
detection for 3D geometry. ACM Trans. Graph., 25(3):560–568.
[69] Myidol (2017). http://en.faceii.com/.
[70] Newcombe, R. A., Izadi, S., Hilliges, O., Molyneaux, D., Kim, D., Davison, A. J., Kohli,
P., Shotton, J., Hodges, S., and Fitzgibbon, A. (2011). KinectFusion: Real-time dense surface
mapping and tracking. In IEEE ISMAR. IEEE.
[71] Newsea (2014). Newsea SIMS. http://www.newseasims.com/.
[72] Olszewski, K., Lim, J. J., Saito, S., and Li, H. (2016). High-fidelity facial and speech
animation for VR HMDs. ACM Transactions on Graphics (Proceedings SIGGRAPH Asia
2016), 35(6).
[73] Paris, S., Briceño, H. M., and Sillion, F. X. (2004). Capture of hair geometry from multiple
images. In ACM Transactions on Graphics (TOG), volume 23, pages 712–719. ACM.
[74] Paris, S., Chang, W., Kozhushnyan, O. I., Jarosz, W., Matusik, W., Zwicker, M., and
Durand, F. (2008). Hair photobooth: geometric and photometric acquisition of real hairstyles.
In ACM Transactions on Graphics (TOG), volume 27, page 30. ACM.
[75] Parke, F. I. and Waters, K. (2008). Computer Facial Animation. AK Peters Ltd, second
edition.
[76] Paysan, P., Knothe, R., Amberg, B., Romdhani, S., and Vetter, T. (2009). A 3D face model
for pose and illumination invariant face recognition. In Advanced video and signal based
surveillance, 2009. AVSS’09. Sixth IEEE International Conference on. IEEE.
[77] Pérez, P., Gangnet, M., and Blake, A. (2003). Poisson image editing. In ACM Trans.
Graph., volume 22. ACM.
[78] Pinscreen (2017). http://www.pinscreen.com.
[79] Qi, C. R., Su, H., Mo, K., and Guibas, L. J. (2016a). PointNet: Deep learning on point sets
for 3D classification and segmentation. arXiv preprint:1612.00593.
[80] Qi, C. R., Su, H., Niessner, M., Dai, A., Yan, M., and Guibas, L. J. (2016b). Volumetric
and multi-view CNNs for object classification on 3D data. In 2016 IEEE Conference on
Computer Vision and Pattern Recognition (CVPR).
[81] Qi, C. R., Yi, L., Su, H., and Guibas, L. J. (2017). PointNet++: Deep hierarchical feature
learning on point sets in a metric space. arXiv preprint:1706.02413.
[82] Rezende, D. J., Mohamed, S., and Wierstra, D. (2014). Stochastic backpropagation and
approximate inference in deep generative models. arXiv preprint:1401.4082.
[83] Rother, C., Kolmogorov, V., and Blake, A. (2004). GrabCut: Interactive foreground
extraction using iterated graph cuts. ACM Trans. Graph., 23(3):309–314.
[84] Sadeghi, I., Pritchett, H., Jensen, H. W., and Tamstorf, R. (2010). An artist friendly hair
shading system. In ACM SIGGRAPH 2010 Papers, volume 29 of SIGGRAPH ’10. ACM.
[85] Saito, S., Hu, L., Ma, C., Ibayashi, H., Luo, L., and Li, H. (2018). 3D hair synthesis using
volumetric variational autoencoders. ACM Transactions on Graphics (Proc. SIGGRAPH
Asia), 37(6).
[86] Saito, S., Li, T., and Li, H. (2016). Real-time facial segmentation and performance capture
from RGB input. In Proceedings of the European Conference on Computer Vision (ECCV).
[87] Saito, S., Wei, L., Hu, L., Nagano, K., and Li, H. (2017). Photorealistic facial texture
inference using deep neural networks. In IEEE CVPR.
[88] Selle, A., Lentine, M., and Fedkiw, R. (2008). A mass spring model for hair simulation.
ACM Trans. Graph., 27(3):64:1–64:11.
[89] Simonyan, K. and Zisserman, A. (2014). Very deep convolutional networks for large-scale
image recognition. CoRR, abs/1409.1556.
[90] Su, H., Maji, S., Kalogerakis, E., and Learned-Miller, E. (2015). Multi-view convolutional
neural networks for 3D shape recognition. In Proceedings of the 2015 IEEE International
Conference on Computer Vision (ICCV), ICCV ’15, pages 945–953.
[91] Tan, Q., Gao, L., Lai, Y.-K., and Xia, S. (2018). Variational autoencoders for deforming
3D mesh models. In 2018 IEEE Conference on Computer Vision and Pattern Recognition
(CVPR).
[92] Thies, J., Zollhöfer, M., Stamminger, M., Theobalt, C., and Nießner, M. (2016a).
Face2Face: Real-time face capture and reenactment of RGB videos. In IEEE CVPR.
[93] Thies, J., Zollhöfer, M., Stamminger, M., Theobalt, C., and Nießner, M. (2016b).
FaceVR: Real-time facial reenactment and eye gaze control in virtual reality. arXiv preprint
arXiv:1610.03151.
[94] Umetani, N. (2017). Exploring generative 3D shapes using autoencoder networks. In
SIGGRAPH Asia 2017 Technical Briefs, SA ’17, pages 24:1–24:4.
[95] Viola, P. and Jones, M. (2001). Rapid object detection using a boosted cascade of simple
features. In IEEE CVPR, volume 1. IEEE.
[96] Wang, L., Yu, Y., Zhou, K., and Guo, B. (2009a). Example-based hair geometry synthesis.
In ACM Transactions on Graphics (TOG), volume 28, page 56. ACM.
[97] Wang, L., Yu, Y., Zhou, K., and Guo, B. (2009b). Example-based hair geometry synthesis.
ACM Trans. Graph., 28(3):56:1–56:9.
[98] Ward, K., Bertails, F., Kim, T.-Y., Marschner, S. R., Cani, M.-P., and Lin, M. C. (2007).
A survey on hair modeling: Styling, simulation, and rendering. IEEE Transactions on
Visualization and Computer Graphics, 13(2).
[99] Ward, K., Lin, M. C., Lee, J., Fisher, S., and Macri, D. (2003). Modeling hair using
level-of-detail representations. In Proc. CASA, pages 41–47.
[100] Wei, Y., Ofek, E., Quan, L., and Shum, H.-Y. (2005). Modeling hair from multiple views.
In ACM Transactions on Graphics (TOG), volume 24, pages 816–820. ACM.
[101] Weng, Y., Wang, L., Li, X., Chai, M., and Zhou, K. (2013). Hair interpolation for portrait
morphing. In Computer Graphics Forum, volume 32, pages 79–84. Wiley Online Library.
[102] Wither, J., Bertails, F., and Cani, M.-P. (2007). Realistic hair from a sketch. In SMI ’07,
pages 33–42.
[103] Wu, Z., Song, S., Khosla, A., Yu, F., Zhang, L., Tang, X., and Xiao, J. (2015). 3D
ShapeNets: A deep representation for volumetric shapes. In 2015 IEEE Conference on
Computer Vision and Pattern Recognition (CVPR), pages 1912–1920.
[104] Xu, Z., Wu, H.-T., Wang, L., Zheng, C., Tong, X., and Qi, Y. (2014). Dynamic hair
capture using spacetime optimization. ACM Trans. Graph., 33(6):224:1–224:11.
[105] Yu, Y. (2001). Modeling realistic virtual hairstyles. In Pacific Graphics ’01, pages
295–304.
[106] Yuksel, C., Schaefer, S., and Keyser, J. (2009a). Hair meshes. ACM Trans. Graph.,
28(5):166:1–166:7.
[107] Yuksel, C., Schaefer, S., and Keyser, J. (2009b). Hair meshes. In ACM Transactions on
Graphics (TOG), volume 28, page 166. ACM.
[108] Yumer, M. E. and Mitra, N. J. (2016). Learning semantic deformation flows with 3D
convolutional networks. In European Conference on Computer Vision (ECCV 2016), pages
294–311. Springer.
[109] Zhang, M., Chai, M., Wu, H., Yang, H., and Zhou, K. (2017). A data-driven approach to
four-view image-based hair modeling. ACM Transactions on Graphics (TOG), 36(4):156.
[110] Zhu, Y. and Bridson, R. (2005). Animating sand as a fluid. ACM Trans. Graph., 24(3).
[111] Zitnick, C. L. (2010). Binary coherent edge descriptors. In Proceedings of the 11th
European Conference on Computer Vision: Part II, ECCV’10.
Conceptually similar
3D deep learning for perception and modeling
Complete human digitization for sparse inputs
3D inference and registration with application to retinal and facial image analysis
Effective data representations for deep human digitization
Deep representations for shapes, structures and motion
Landmark-free 3D face modeling for facial analysis and synthesis
3D object detection in industrial site point clouds
Human appearance analysis and synthesis using deep learning
Face recognition and 3D face modeling from images in the wild
Multi-scale dynamic capture for high quality digital humans
Feature-preserving simplification and sketch-based creation of 3D models
Autostereoscopic 3D diplay rendering from stereo sequences
3D face surface and texture synthesis from 2D landmarks of a single face sketch
3D modeling of eukaryotic genomes
Object detection and recognition from 3D point clouds
Interactive rapid part-based 3d modeling from a single image and its applications
Accurate 3D model acquisition from imagery data
Plant substructuring and real-time simulation using model reduction
Scalable dynamic digital humans
Data-driven methods for increasing real-time observability in smart distribution grids
Asset Metadata
Creator: Hu, Liwen (author)
Core Title: Data-driven 3D hair digitization
School: Viterbi School of Engineering
Degree: Doctor of Philosophy
Degree Program: Computer Science
Publication Date: 07/25/2019
Defense Date: 11/15/2018
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: 3D capture, 3D modeling, data-driven, hair, OAI-PMH Harvest
Format: application/pdf (imt)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Li, Hao (committee chair), Nakano, Aiichiro (committee member), Watson, Jeff (committee member)
Creator Email: huliwenkidkid@gmail.com, liwenhu@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c89-194306
Unique identifier: UC11663147
Identifier: etd-HuLiwen-7629.pdf (filename), usctheses-c89-194306 (legacy record id)
Legacy Identifier: etd-HuLiwen-7629.pdf
Dmrecord: 194306
Document Type: Dissertation
Rights: Hu, Liwen
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA