3D FACE SURFACE AND TEXTURE SYNTHESIS FROM 2D LANDMARKS OF A SINGLE FACE SKETCH

by

Tanasai Sucontphunt

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)

December 2012

Acknowledgements

I would like to thank my advisor, Prof. Ulrich Neumann, for his valuable and insightful guidance during my PhD study. I would like to thank Prof. Zhigang Deng for his great support of my research. I would like to thank Prof. C.-C. Jay Kuo and Prof. Aiichiro Nakano for their valuable suggestions and their time serving as my thesis defense committee members. I would like to thank Prof. Jernej Barbic and Prof. Shang-Hua Teng for their valuable suggestions during my qualifying exam. I would like to thank my fellow doctoral students in the Computer Graphics and Immersive Technologies lab for their support and friendship. I would like to thank the University of Southern California for its world-class facilities. I had a really great time during my five years of study. Finally, I would like to thank my family for their love and encouragement.

Contents

Acknowledgements
List of Figures
List of Tables
1 Introduction
2 Related Work
  2.1 3D Object Modeling From a Drawing
  2.2 3D Face Modeling From a Drawing
3 System Framework
  3.1 Dimension Reduction
  3.2 Expression vs. Identity
4 Facial Expression Modeling
  4.1 Offline Data Processing
    4.1.1 2D and 3D Subspaces
    4.1.2 Portrait Component Library
  4.2 2D Portrait Sketching Interface
  4.3 3D Face Construction
  4.4 Results
  4.5 Conclusion
5 Facial Identity Modeling
  5.1 Offline Data Processing
  5.2 2D Portrait Sketching Interface
    5.2.1 Landmark Detection
    5.2.2 Portrait Rendering and Correction
  5.3 3D Face Construction
  5.4 3D Landmark Estimation
  5.5 3D Face Surface Synthesis
    5.5.1 Surface Generation
    5.5.2 Identity Engineering Algorithm
  5.6 Texture Synthesis
    5.6.1 Human Texture Generation
    5.6.2 Artistic Texture Generation
  5.7 Results
    5.7.1 Human Face Evaluation
    5.7.2 Artistic Face Evaluation
    5.7.3 2D Portrait Sketching Interface Evaluation
  5.8 Conclusion
6 Discussion
Bibliography

List of Figures

1. Major 3D facial modeling types are facial expression and facial identity modeling.
2. 3D face construction can be categorized by acquisition method. The focus of this work is to construct a 3D face from an artistic face drawing.
3. Live capturing methods of 3D face modeling. A: camera-projector capturing system [64], B: passive stereo system [1], C: passive stereo system supporting single-shot capture [8], D: light dome capturing system [20].
4. Artistic capturing method of 3D face modeling. Blanz et al. [6] pioneered 3D face modeling from a single or multiple photograph(s) using a 3D Morphable Model constructed from a 3D face dataset.
5. FiberMesh system [44]: a user can sketch a 2D silhouette to create a simple 3D model. Also, the user can further pull the 2D silhouette to edit the 3D shape.
6. From the input photograph and landmarks, Blanz et al. [5] develop an application to fit a 3D surface to the target landmarks.
7. System framework. From the face dataset, each entry is projected to a reduced subspace to reduce its dimensions. These concise entries are then used as internal representations. The input to the system can be either a drawn image or user strokes. In the case of user stroke input, an interactive sketching interface is provided with an automatic portrait rendering and correction feature. After the landmarks are extracted, they and supplemental information are used to construct a 3D face model. Each component in this framework is developed separately to fit the type of facial modeling, i.e., expression or identity.
8. Schematic overview of this facial expression posing system.
9. A snapshot of the running facial expression posing system.
10. Facial expression MoCap compared to portrait landmarks.
11. MoCap markers transformed to portrait landmarks.
12. Illustration of the KD-tree structure in this work.
13. The Portrait Component Library (PCL).
14. Illustration of the portrait hierarchy for motion propagation.
15. An example of motion propagation for portrait control points.
16. The examples of our usability study.
17. Facial expressions on various simultaneously sculpted 3D face models.
18. Six edited portraits and corresponding generated 3D facial expressions.
19. Internal representations. (A): 3D landmarks are represented by vertex locations. Each vertex location contains x, y, and z coordinates.
(B): Surface is represented by affine transformation matrices. Each affine transformation matrix contains a 3-by-3 matrix of each triangle's transformation from an average face to the new face. (C): Texture is represented by color pixels in RGB channels.
20. A: Facial landmarks on the face image. B: When all landmarks are connected together with lines, the face portrait can be automatically generated. C: Anchor (A) and pivot (P) points used in creating a face portrait sketch by artists.
21. An affine transformation matrix between two shapes of a triangle can be calculated from their vertices and normals. Here, V4 is the vertex added to the triangle at V1 along the normal direction to create a new edge.
22. Comparison of different surface representations in blending: (A) the original face model, (B) vertex location blending, and (C) DG blending. The blending is performed by scaling some parts of the original face.
23. The prototype model, from left to right, is the frontal-view mesh, the side-view mesh, the mesh with its texture, and the texture with its UV coordinates.
24. A snapshot of the sketching interface.
25. Interactive sketching process.
26. Schematic overview of the 3D face construction. It consists of the following three main components, namely, 3D facial landmark estimation, 3D face surface synthesis, and face texture synthesis.
27. The depth of each 2D landmark is estimated by using the distortion of the perspective projection as a hint. The 2D landmarks are projected back and forth to 3D space with the estimated depth (Z), which is initialized with an average depth. In 3D space, the depth is estimated again with PPCA to maximize its likelihood. This loop is stopped when the projected 2D landmarks are closest to the incoming 2D landmarks. The 3D landmarks (x, y, z) are the result.
28. 3D reconstructed results comparing the parallel and perspective projection assumptions in estimating the depths. The input sketch is shown on the left. A: the 3D ground-truth model of the sketch. B: the reconstructed 3D face from the parallel projection. When the 2D sketch is parallel-projected back to 3D space, the face is leaner than the original face since the perspective factor is not taken into account. On average, the surface reconstruction error of the parallel assumption is about 9 mm. C: the reconstructed 3D face from the perspective projection. On average, the reconstruction error of the projective assumption is about 1 mm. All the 3D results are shown in the perspective projection of the 3D models.
29. The 3D landmark estimation errors. The errors are the differences between each estimated 3D landmark and each of the ground truth's 3D landmarks of the thirty face examples, in millimeters. (A): the error distribution of the 83 3D landmarks over the face. (B): the errors comparing our 3D landmark estimation to the morphable model technique [5]. On average, our approach produces 1 mm lower error for each landmark.
30. By adding Gaussian noise to the 2D landmarks, our approach produces lower errors than the morphable model technique for the noisy inputs.
31. By adding Gaussian noise to the focal length in the depth estimation process, our approach produces more errors as the difference from the focal length assumption grows.
32. Ten constructed eigen-surfaces ordered by their eigenvalues in descending order from left to right (i.e., the top row is the first five highest-eigenvalue shapes). These surfaces are exaggerated by a scaling factor (= 30) to visualize their deformation directions from the average surface shown in the left-most box.
33. By adding Gaussian noise to the 2D landmarks, our approach produces lower errors than the other approaches for the noisy inputs.
34. The left-column model is the AS-face and the top row shows the input sketch images. The surface synthesis with choice (1) is shown in the middle row: using 3D landmarks to morph the AS-face directly. The surfaces are very similar to each other. On the other hand, choice (2), shown in the last row, which blends the human surfaces together, yields higher surface variation and more natural surfaces.
35. Comparison of different blending schemes. (Left): shape transfer via deformation gradients [53], (Middle): half interpolation between their corresponding PCA feature vectors, (Right): our IE approach.
36. The schematic overview of the IE algorithm.
37. Different results obtained by varying a slide-bar. By sliding the bar to the right, the subtle identity from the HI-face gradually disappears.
38. Example geometric results by our approach.
39. The runtime user interface of our approach. (Left): an input human face (HI-face), (Middle): the resultant identity-embodied artistic face with a slide-bar to control the level of the HI-face over the AS-face in the resultant face, (Right): an input artistic face (AS-face).
40. The resultant synthesized texture is shown in the middle. The texture candidates are combined into the base texture (the outline region). Each texture candidate is represented by the unmarked texture area.
41. The comparison of synthesized textures composited from the five texture candidates in Figure 40, showing from left to right: copy-and-paste of RGB, copy-and-paste of only the V channel (from HSV), and our approach. The copy-and-paste is performed by filling the base texture with the other regional textures. Since the facial texture differs mainly in tone, transferring only the V channel of HSV is also conducted. The artifacts of the copy-and-paste still exist in both cases.
42. The comparison of the artistic texture synthesis process on different color spaces. To emphasize the difference, the gradients are scaled up by 5. The RGB blending produces a texture with a human-skin base color in the result, while the HSV blending (maintaining hue) produces a texture with the pure artistic-style base color.
43. The pipeline of the evaluation process. The ground-truth 3D models are first rendered with fixed parameters to create colored face images.
The artists draw the sketches to match the colored face images, and the drawn sketches are used to generate the 3D models by our system.
44. The 3D landmark estimation errors of the sketches' landmarks. The errors are the differences between each estimated 3D landmark and each of the ground truth's 3D landmarks of the thirty face examples, in millimeters. (A): the error distribution of the 83 3D landmarks over the face. (B): the errors comparing our 3D landmark estimation to the morphable model technique [5]. On average, our approach produces 1 mm lower error for each landmark.
45. Example results from the system. The left-most column shows input sketches drawn by an artist from the rendered images of the ground-truth models. The middle column shows the ground-truth models (surface and textured face, respectively). The right-most column shows the 3D human results generated by our system.
46. More example results from the system. The left-most column shows input sketches drawn by an artist from the rendered images of the ground-truth models. The middle column shows the ground-truth models (surface and textured face, respectively). The right-most column shows the 3D human results generated by our system.
47. Reconstructed surface comparison between RBF, Laplacian, and our approach.
48. Reconstructed surface comparison between RBF, Laplacian, and our approach.
49. Reconstructed surface comparison between RBF, Laplacian, and our approach.
50. (A) The distribution of the surface reconstruction errors (in millimeters) over the face. The errors are the Euclidean distances between each reconstructed vertex and its ground-truth vertex (averaged over thirty face examples). (B) Reconstruction error comparison between our approach, the RBF, and the Laplacian-based technique. Both RBF and Laplacian-based techniques are employed to deform the average face surface to the 3D landmarks set as constraints. On average, our approach produces less than 1.6 mm of error over all the surfaces.
51. The pipeline of the artistic evaluation process. The ground-truth 3D models are first generated by our IE algorithm and rendered with fixed parameters to create colored artistic face images. The artists draw the sketches to match the colored artistic face images, and the drawn sketches are used to generate the artistic 3D models by our system.
52. Examples of artistic face results constructed from the input face sketches.
53. Examples of artistic face results constructed from the input face sketches.
54. The 3D landmark estimation errors of the artistic sketches' landmarks. The errors are the differences between each estimated 3D landmark and each of the ground truth's 3D landmarks of the thirty face examples, in millimeters. (A): the error distribution of the 83 3D landmarks over the face. (B): the errors comparing our 3D landmark estimation to the morphable model technique [5]. On average, our approach produces 7 mm lower error for each landmark.
55. Examples of artistic face results constructed from the input artistic sketches.
56. Examples of artistic face results constructed from the input artistic sketches.
57. Reconstructed artistic-style surface comparison between RBF, Laplacian, and our approach.
58. (A) The distribution of the surface reconstruction errors (in millimeters) over the face. The errors are the Euclidean distances between each reconstructed vertex and its ground-truth vertex (averaged over twelve face examples).
(B) Reconstruction error comparison between our approach, the RBF, and the Laplacian-based technique. Both RBF and Laplacian-based techniques are employed to deform the example 3D artistic surfaces to the 3D landmarks set as constraints. On average, our approach produces less than 7 mm of error over all the surfaces.
59. 2D Portrait Sketching evaluation.
60. Examples of generated faces based on the users' sketches and partial identity information.
61. An example of the empirical validation study.

List of Tables

1. Comparison of properties in creating a facial expression and a facial identity in our framework.

Abstract

Synthesizing a 3D human face surface and texture from a drawing is a challenging problem. The problem involves inferring a 3D model and its color information from 2D information. In contrast, human beings have the natural ability to reconstruct 3D models from a drawing in their mind effortlessly. This skill is built up from years of experience in mapping the perceived 2D information to the 3D model in the actual scene. By imitating this mapping process, this work illustrates an approach to reconstruct a 3D human face from just the 2D facial landmarks gathered from a sketch image. The 2D facial landmarks of the sketch image contain enough information for this process because they semantically represent a facial structure that is recognizable as a human face. The approach also exploits the perspective distortion of the sketch image as guidance to infer the depth information from the 2D landmarks. Various artistic styles can also be applied to the generated face, similar to how an artist would apply their own artistic style to a drawing. The controlled-environment evaluations show that the reconstructed 3D faces are highly similar to the ground-truth examples. This approach can be used in many face modeling applications such as 3D avatar creation, artistic face modeling, and police investigations.

1 Introduction

3D face modeling is the process of crafting 3D meshes toward a target 3D face. The process is used in various industries including motion pictures, video games, medical imaging, social networks, and law enforcement. The primary goals of facial modeling fall into two categories: to create either a facial expression or a facial identity (Fig. 1). The definitions of these facial modelings are:

Facial expression: a facial expression results from the motions of the muscles of an individual face. The motion patterns vary with the physical mechanism of the facial structure. In this modeling, both temporal and spatial facial movements must be carefully taken into account. This type of modeling is mainly used in animation.

Facial identity: a facial identity is the facial appearance of a person. Facial identity encodes both the shapes and colors of the face, which are normally defined from a neutral facial expression. Each facial identity differs mainly by the ethnicity, gender, and skin tone of the person. In this modeling, facial proportions, surfaces, and texture must be uniquely recognizable as an individual person. The 3D face model can later be used in facial expression modeling with a specific design of 3D topology.

Figure 1: Major 3D facial modeling types are facial expression and facial identity modeling.

Traditionally, a 3D face model is crafted manually using a 3D modeling tool such as Maya or 3ds Max. However, modeling a 3D face from scratch is a time-consuming process.
The current research goal of 3D face modeling is to make the process as automated as possible. An automated system can be achieved by capturing the face in the real world and reconstructing it in 3D. The face capturing techniques can be categorized into live and artistic capturing (Fig. 2).

Figure 2: 3D face construction can be categorized by acquisition method. The focus of this work is to construct a 3D face from an artistic face drawing.

• Live capturing: this technique requires a person to be present in front of capturing equipment. The capturing equipment (Figure 3) can be a laser scanner (e.g., Cyberware), a camera-projector system [64], a photometric stereo system [1, 2, 8, 15], or a light dome [20]. In many scenarios, the person and the equipment are difficult to acquire. Moreover, creating a 3D face model from a single photograph or drawing is often needed in fictional character face modeling and in law enforcement (e.g., finding a suspect or a missing person). This problem leads to the development of the next capturing category.

• Artistic capturing: in this type of capturing, the input is any existing face image. These images are captured by a typical camera or produced as an artistic drawing. Normally, it is a single frontal view of the face.

Figure 3: Live capturing methods of 3D face modeling. A: camera-projector capturing system [64], B: passive stereo system [1], C: passive stereo system supporting single-shot capture [8], D: light dome capturing system [20].

From a single photograph, a 3D face model can be constructed by a 3D morphable face model [6, 10, 34, 46] (Figure 4). The 3D morphable model is constructed from a 3D face collection captured with live capturing. Many 2D face sketch tools for facial composites, including FACES [3], SpotIt [26], and Identi-Kit [24], have been developed to aid a novice user in creating a 2D face sketch. Although the 2D drawing is often limited to the frontal view without colors, it provides enough information for people to recognize the output as a target person. However, only a few research projects develop 3D face construction from a 2D drawing.

Figure 4: Artistic capturing method of 3D face modeling. Blanz et al. [6] pioneered 3D face modeling from a single or multiple photograph(s) using a 3D Morphable Model constructed from a 3D face dataset.

Under the above contexts, we look into the following research problem: can we create a 3D face model based on a 2D face sketch input? To date, few previous methods have been reported to tackle this research problem. Indeed, it is challenging since it is an ill-posed and under-constrained problem. The face sketch is the perfect input for creating a face because only the important information is presented, and the irrelevant information is discarded by the filtering mechanism of the human brain. The major difficulties of using a 2D face sketch to construct a 3D face model can be summed up as follows:

• The sketch data is very sparse compared to a photograph. Often, it contains only salient facial feature lines that are just enough for a human to recognize.
• Unlike a photograph, the sketch contains unpredictable texture and colors and cannot be directly used to produce the facial texture.
• It contains an exaggeration of the face in the form of the artist's style.

Normally, the face sketch image contains salient facial feature contours, with or without colors and textures. These contours are the only information that can be used unambiguously to reconstruct the face. They are the main landmarks of the face.
In this system, in order to model the 3D facial expression, the sketching interface is used to collect facial landmark movements and to reproduce a natural 3D facial expression with the guidance of the MoCap facial expression dataset. To achieve facial identity modeling, the system takes the 2D landmarks extracted from a 2D portrait sketch image and estimates their corresponding 3D landmarks using statistical estimation. Then, the 3D surface and its texture are synthesized to fit the 3D landmarks and the facial descriptions using an example-based surface and texture synthesis technique. The 3D surface and its texture can also be further altered by the surface and texture blending technique, which is particularly useful in the artistic face modeling process.

2 Related Work

In the past, researchers developed several 2D-to-3D estimation techniques, including Analysis-By-Synthesis (ABS) [10] and Shape From Shading (SFS) [52]. These techniques require a 2D shaded or colored image input to be able to estimate the surface using a rendering equation. However, compared with a 2D color image, the 2D face sketch often contains only conceptual lines of the facial structure without a color or grayscale texture. Thus, this section focuses mainly on work over the decades on 3D modeling from only 2D lines.

2.1 3D Object Modeling From a Drawing

From a 2D sketching interface, simple 3D objects can be constructed from the drawing lines [7, 17, 21, 25, 29, 30, 38, 44]. These techniques can be considered an inverse process of Non-Photorealistic Rendering (NPR). For example, on top of the pioneering Teddy work [25], the FiberMesh system [44] employs a Laplacian deformation technique to inflate 2D silhouette strokes into 3D objects. The 3D object can then be edited by pulling its silhouette lines (Figure 5), and the smoothness of the surface is maintained according to the Laplacian operator. Bourguignon et al. [7] introduce a method to create a 3D object by interpreting its depth cues from a sketch over a 3D model. Kalnins et al. [28] present an effective way to draw strokes directly on 3D models, where an interactive user interface is provided for stroke controls. Karpenko et al. [30] present an approach that uses additional lines to construct an occluded area of the 3D object. Gingold et al. [21] develop an intuitive user interface to collect shapes and annotations over a sketch image to produce a 3D object. These approaches have demonstrated their success in crafting simple 3D shapes from lines. However, without domain-specific knowledge, it is difficult to produce a realistic model.

2.2 3D Face Modeling From a Drawing

For 3D facial identity modeling, the current research direction aims at utilizing the patterns of the human face to create a realistic face model. Decarlo et al. [16] develop a data-driven technique to generate 3D face models by imposing facial anthropometry statistics as constraints. However, the 3D faces generated by their approach are not sufficiently realistic since the anthropometric constraints used are too sparse and general. Blanz and colleagues proposed various methods to create 3D faces from vague information [4] or from a sparse set of facial landmarks [5] (Figure 6), based on their well-known morphable face model [6]. Their works perform well in creating realistic human faces from just a set of example shapes or 2D landmarks, but they require user interactions to properly produce the 3D face.

Figure 5: FiberMesh system [44]: a user can sketch a 2D silhouette to create a simple 3D model. Also, the user can further pull the 2D silhouette to edit the 3D shape.
For 3D facial expression modeling, various methods have been developed to help a user efficiently edit 3D face poses. Many smart sketching interfaces [12, 22, 33] have been developed to utilize a facial motion dataset to ease the facial expression posing task for a user. For example, Lau et al. [33] develop an interface that allows a user to draw lines over a 3D face and uses the lines as a prior to synthesize the face pose. However, this work strictly limits the plausible poses to the range of the dataset.

Apart from the sketching interfaces, the blendshape approach (or shape interpolation) offers simple tools for sculpting 3D facial expressions. Some tools attempt to improve the efficiency of producing muscle-actuation-based blendshape animations [14, 50]. However, these approaches generally require considerable manual effort to create a set of blendshape targets, even for skilled animators. Essentially, these approaches are designed to simultaneously move and edit a group of relevant vertices. However, different facial regions are correlated with each other, and the above approaches typically operate on a local facial region or the global shape at one time. An animator needs to switch editing operations between different facial regions in order to sculpt realistic 3D faces with fine details, which creates a large amount of additional work. In addition, even for skilled animators, it is difficult to judge which facial pose (configuration) is closer to a real human face. A number of statistical learning techniques [11, 27, 35] have been proposed to address this issue. For example, based on a blendshape representation for 3D face models, Joshi et al. [27] present an interactive tool to edit 3D face geometry by learning the optimal way to perform a physically-motivated face segmentation. A rendering algorithm for preserving visual realism in this editing is also presented in their approach. However, the tool's interface limits the controls over the facial shapes and is unintuitive for novice users. Recently, many intuitive tools [36, 48, 55] offer interfaces that can automatically adjust the correlated facial regions when posing a face. Meyer and Anderson [40] produce a plausible 3D facial expression by using sparse landmarks on the 3D face to interpolate the possible 3D expressions in the dataset. In addition, puppetry methods have been developed to directly control face poses based on the live performance of an actor or actress [36, 59, 61, 62].

Figure 6: From the input photograph and landmarks, Blanz et al. [5] develop an application to fit a 3D surface to the target landmarks.

Besides 3D face modeling, 2D face synthesis techniques have also been developed separately. As the earliest documented effort on computer-generated caricature, Brennan [9] generates a 2D caricature sketch by exaggerating the whole face drawing with respect to the average human face. Koshimizu et al. [31] present the PICASSO system to produce a 2D image-based caricature. Hsu and Jain [23] use an interactive snakes algorithm to detect facial components and then generate a 2D caricature sketch by scaling its difference from the average human face. Example-based caricature generation techniques have been proposed recently by extracting the drawing style from drawings using learning algorithms such as partial least-squares based learning [37] and eigen-space mapping [39]. Wang et al. [60] map characteristics of face sketches between artist drawings and their corresponding photographs to synthesize a novel photograph for an input face sketch.

None of the above techniques constructs a 3D face model from a single 2D face drawing, partially due to the fact that reconstructing a 3D model from a 2D drawing is a well-known, under-constrained problem.
In contrast, 2D face detection has been highly successful in image processing research. A 2D face sketch can also be detected robustly by many available image detection techniques such as the Active Shape Model (ASM) [58] or FaceTracker [47]. These techniques mainly produce facial landmarks over the face sketch as an output. Thus, these landmarks are the main material used to synthesize a plausible 3D surface and its texture in this work.

3 System Framework

The framework illustrates the interaction between the components of the system. There are three main components in this framework: offline data processing, landmark extraction, and 3D face construction, as shown in Fig. 7.

3.1 Dimension Reduction

PCA-based dimension reduction is used to digest the raw data into meaningful representations in this framework. For each of the n data examples, we concatenate the data into a D-dimensional row vector {X_i} and center its mean at zero. These n zero-mean vectors {X_{1..n}} are then stacked to form a matrix of size n-by-D. Singular Value Decomposition is then employed to generate three new matrices U, S, and V, as shown in Eq. 1, where U_i is the i-th eigen-vector, s_i is the square root of the i-th eigen-value, and V is U^{-1}.

\[
\begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{bmatrix}
= U \cdot S \cdot V
= \begin{bmatrix} U_1 & U_2 & \cdots & U_m \end{bmatrix}^T
\times
\begin{bmatrix}
s_1 & 0 & \cdots & 0 \\
0 & s_2 & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & \cdots & 0 & s_m
\end{bmatrix}
\times V \qquad (1)
\]

The result of this process is that one high-dimensional vector X_i is transformed into a reduced feature vector C_i, as shown in Eq. 2.

\[
C_i = \mathrm{EigMX}^{T} \cdot (X_i - \mu) \qquad (2)
\]

\[
\mathrm{EigMX} = \begin{bmatrix} U_1 & U_2 & U_3 & \cdots & U_k \end{bmatrix} \qquad (3)
\]

Here, C is the reduced vector (PCA coefficient), μ represents the mean vector of the dataset, and EigMX is the retained eigen-vector matrix containing the k most significant eigen-vectors, where each column is a retained eigen-vector, as shown in Eq. 3.

Figure 7: System framework. From the face dataset, each entry is projected to a reduced subspace to reduce its dimensions. These concise entries are then used as internal representations. The input to the system can be either a drawn image or user strokes. In the case of user stroke input, an interactive sketching interface is provided with an automatic portrait rendering and correction feature. After the landmarks are extracted, they and supplemental information are used to construct a 3D face model. Each component in this framework is developed separately to fit the type of facial modeling, i.e., expression or identity.
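To make the projection of Eqs. 1-3 concrete, here is a minimal numpy sketch of the reduction step. It is a generic PCA-via-SVD illustration rather than the thesis code; the random stand-in data matrix and the 95% variance-retention threshold used elsewhere in this framework are assumptions.

```python
import numpy as np

def build_subspace(X, retain=0.95):
    """Eqs. 1 and 3: SVD of the zero-mean data matrix and truncation of the
    eigen-vector matrix EigMX to the k most significant eigen-vectors."""
    mu = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
    energy = np.cumsum(s**2) / np.sum(s**2)
    k = int(np.searchsorted(energy, retain)) + 1
    EigMX = Vt[:k].T                 # D x k matrix, columns are retained eigen-vectors
    return EigMX, mu

def reduce_vector(X_i, EigMX, mu):
    """Eq. 2: C_i = EigMX^T (X_i - mu)."""
    return EigMX.T @ (X_i - mu)

# Toy usage with a random stand-in for n concatenated face vectors of dimension D
n, D = 100, 270
X = np.random.randn(n, D)
EigMX, mu = build_subspace(X)
C = reduce_vector(X[0], EigMX, mu)
print(EigMX.shape, C.shape)
```

Reconstructing an approximation of X_i from C follows the same pattern in reverse, as used later for portrait filtering.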
3.2 Expression vs. Identity

The facial expression and identity modeling differ mainly in how the dataset is used. The techniques used in each component are designed for the characteristics of the dataset. Table 1 compares the facial expression modeling and the facial identity modeling within this framework.

Table 1: Comparison of properties in creating a facial expression and a facial identity in our framework.

Properties                 Expression                 Identity
Dataset (entries)          MoCap (100K)               Scanned 3D faces (100)
Internal representations   Facial landmarks           Facial landmarks, facial surface, facial texture
Input                      Sketch                     Sketch, texture information
3D face construction       Search engine (KD-tree)    Statistical estimation (least squares)

4 Facial Expression Modeling

This section describes a system for 3D facial expression modeling from a 2D sketch drawing. A manipulated 2D portrait sketch serves as a metaphor for automatically inferring its corresponding 3D facial expression with fine details. While 2D portraits may not show certain face details, e.g., wrinkles, this system automatically fills in these details of 3D facial expressions in a data-driven manner: based on an edited 2D portrait, this system searches for the most closely matched 3D facial motion configurations in a pre-constructed facial motion database. As a proof of concept, movable control points over a 2D portrait are provided at the facial landmarks for the user to edit the portrait's pose. When the user moves one or a group of 2D control points on the portrait sketch, the reshaping process automatically adjusts other control points to maximally maintain the facial structure of the edited sketch. Finally, to map a 2D portrait to its corresponding 3D facial expression, the portrait is used as a query input to search for the most closely matched 3D facial configurations from a pre-constructed facial motion database. Figure 8 shows the schematic details of this 3D facial expression posing system.

Figure 9 shows a snapshot of the running system. As shown in Fig. 9, its interface is composed of three panels: the portrait component library (left), a 2D portrait window (middle), and a generated 3D face window (right). Users can drag desired portrait components from the library (left) to the portrait window, and then edit the assembled or initial portrait. Meanwhile, the 3D face view window (right) is interactively updated based on the 2D portrait (middle) that is being edited.

Figure 8: Schematic overview of this facial expression posing system.

Figure 9: A snapshot of the running facial expression posing system.

4.1 Offline Data Processing

4.1.1 2D and 3D Subspaces

Live human facial expressions are captured as high-fidelity 3D facial motions using a VICON Motion Capture (MoCap) system (the left panel of Fig. 10). An actress with markers on her face was directed to speak a delicately designed corpus four times, and each repetition was spoken with a different facial expression. In this data capture, a total of four basic facial expressions were recorded: neutral, happiness, anger, and sadness, and the data recording rate was 120 Hz. More than 105,000 frames of facial motion data were collected. A total of ninety facial markers (the middle panel of Fig. 10) was used in this work. After data capture, the 3D facial motion data is normalized by removing head motion. The 3D facial expression data consists of the recorded ninety markers. However, the 2D facial expression data, which is represented by 2D facial landmarks, is not exactly enclosed in the ninety facial markers (the red points in the middle panel of Fig. 10). Hence, we need to generate corresponding 2D facial landmarks for any 3D facial motion capture frame (the ninety 3D facial markers). As shown in Fig. 11, first, based on a motion capture frame and the specified correspondences between markers and vertices, the feature-point-based deformation technique [32] was used to deform a static 3D face model. Then, given specified mappings between 2D facial landmarks and the vertices of the 3D face geometry, the 3D positions of these specified vertices on the deformed 3D face were transformed and projected to a 2D plane, which output the corresponding 2D facial landmarks. In this way, a pair between a 3D facial motion capture frame and its corresponding 2D facial landmark frame was created. These pairs are used in the 3D face construction component.

Considering the large size of the collected facial motion capture data, a KD-tree [42] is used as the data structure for the above search due to its efficiency. The KD-tree scheme consists of an off-line preprocessing step and an on-line search. In the off-line preprocessing stage, the dimensionality of the concatenated 2D control points is reduced to a few dimensions in its truncated PCA space, while retaining more than 95% of the variation. Figure 12 illustrates the constructed KD-tree in this system. In this work, the Simple KD-tree library [43] was used.
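The following sketch illustrates the offline indexing just described together with the runtime K-nearest-neighbor lookup and distance-weighted blending described later in Section 4.3. It is only an illustration: it uses scipy's cKDTree rather than the Simple KD-tree library of [43], the stand-in data dimensions are invented, and the inverse-distance weighting is an assumption consistent with, but not spelled out by, the text.

```python
import numpy as np
from scipy.spatial import cKDTree

def build_index(portrait_frames, EigMX, mu):
    """Offline: project every 2D control-point frame into the truncated PCA
    space and index the resulting coefficients with a KD-tree."""
    coeffs = (portrait_frames - mu) @ EigMX          # n x k PCA coefficients
    return cKDTree(coeffs)

def query_expression(p_query, tree, mocap_frames, EigMX, mu, k=3):
    """Runtime: find the K nearest portrait frames and blend their paired
    3D motion-capture frames with inverse-distance weights (K = 3 in the text)."""
    c_query = (p_query - mu) @ EigMX
    dists, idx = tree.query(c_query, k=k)
    weights = 1.0 / (dists + 1e-8)
    weights /= weights.sum()
    return weights @ mocap_frames[idx]               # blended 3D MoCap frame

# Stand-in data: 10,000 portrait frames paired with MoCap frames (dimensions assumed)
portraits = np.random.randn(10000, 166)
mocap = np.random.randn(10000, 270)
mu = portraits.mean(axis=0)
_, _, Vt = np.linalg.svd(portraits - mu, full_matrices=False)
EigMX = Vt[:20].T
tree = build_index(portraits, EigMX, mu)
frame = query_expression(portraits[0], tree, mocap, EigMX, mu)
print(frame.shape)
```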
Figure 10: Facial expression MoCap compared to portrait landmarks.

Figure 11: MoCap markers transformed to portrait landmarks.

Figure 12: Illustration of the KD-tree structure in this work.

4.1.2 Portrait Component Library

Since facial expressions contain basic poses that are used frequently, a Portrait Component Library (PCL) is provided to users in order to speed up the sculpting process. The purpose of the PCL is to provide a rapid prototyping tool for assembling an initial portrait by simply clicking. For a prominent part of a human face, such as the left eye area (including the eyebrow), users can simply browse and pick one from a set of pre-defined left eye portrait components. These selected portrait components (one from each category) are assembled into a new 2D portrait that is typically used as an initial portrait for further editing. Figure 13 illustrates the portrait component library.

To construct the PCL, the 2D portrait landmarks that lie in a component area (e.g., the mouth area) are concatenated to form a vector. Then the K-means clustering algorithm is applied to these vectors to find their cluster centers, which are regarded as representatives of this category of portrait component. Finally, these representative portrait points are rendered using the 2D portrait sketch rendering algorithm. In this work, we experimentally divide a portrait into three components (left eye area, right eye area, and the mouth area) and set K (the number of clusters) to 10.

Figure 13: The Portrait Component Library (PCL).

4.2 2D Portrait Sketching Interface

For the initial portrait sketch, users can either rapidly assemble a face sketch from the PCL or use a default neutral-expression portrait. Then, users are allowed to edit the portrait sketch by moving one or multiple portrait landmarks, which are now called control points (the right panel of Fig. 13). When portrait control points are edited, the motion propagation algorithm [65] is adapted to automatically adjust the edited portrait to maximally maintain its naturalness and faceness. A hierarchical principal component analysis [65] is employed for this motion propagation. Essentially, the face is divided into region nodes, and PCA is then computed separately for each node. Next, a tree over these nodes is constructed in order to propagate the motion projection from a leaf node to the whole tree at run-time. Initially, the portrait is divided into seven leaf nodes (left eye, left eyebrow, right eye, right eyebrow, nose, upper lip, and lower lip), and then two intermediate nodes (the upper face and the lower face) are constructed. Finally, the whole portrait is regarded as the root of this hierarchy. Fig. 14 shows this portrait hierarchical structure.

Figure 14: Illustration of the portrait hierarchy for motion propagation.

The rules for the above motion propagation procedure are: 1) it chooses to move upward prior to downward in the hierarchy, and 2) it visits each node only once. The propagation works as follows: first, when the user moves portrait control points, the propagation starts at the lowest level of the hierarchy, which is one of the seven leaf nodes. Then, it propagates upward to the middle of the hierarchy (the upper or lower face node). Then, it moves upward to the root node, which is the entire face. After that, it moves downward again to the middle node that it has not visited yet, and it keeps going upward and downward until all nodes are visited. For each node it visits, it projects the control points contained in the node onto the subspace spanned by the principal components of the node. In other words, the projection is the best approximation of the propagated motions in the PCA subspace of the node. We pre-computed an eigen-vector matrix EigMx and a mean vector MEAN from the 2D portrait landmarks of each hierarchical node (Fig. 14).
As such, each node in the hierarchy holds its own version of EigMx and MEAN. In our experiment, to cover at least 90% of the variation, we keep the largest 20 eigenvectors for the entire-face node, 10 for the upper- and lower-face nodes, and 3 for each of the seven leaf nodes.

The basic steps of the motion propagation algorithm [65] are shown in Algorithm 1. In the following algorithm description, F represents any node in the hierarchy, δV represents the displacement vector of all control points, and Proj(δV, F*) denotes the projection of the F* part of δV onto the truncated PCA space of the node F*.

Algorithm 1 Portrait Motion Propagation
Input: F*, the selected node in the hierarchy.
1: set h to the hierarchy level of F*
2: if hasBeenProcessed(F*) then
3:     return
4: end if
5: compute Proj(δV, F*)
6: update δV with Proj(δV, F*)
7: set hasBeenProcessed(F*) to true
8: for all F such that level(F) = h − 1 and F ∩ F* is non-empty do
9:     PortraitMotionPropagation(F)
10: end for
11: for all F such that level(F) = h + 1 and F ∩ F* is non-empty do
12:     PortraitMotionPropagation(F)
13: end for

Figure 15 shows an example of portrait editing when a user moves a control point on the left eyebrow. As we can see from this figure, the other control points are adjusted accordingly. In particular, the adjustments of the other control points in the left eyebrow area are noticeable.

Figure 15: An example of motion propagation for portrait control points.
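Below is a minimal, runnable sketch of the per-node projection step and the recursive visiting order of Algorithm 1. It is an illustration under stated assumptions: the FaceNode container, the landmark index bookkeeping, and the choice to project the node's portion of the displacement vector (rather than absolute landmark positions) are stand-ins, and each node's EigMx and MEAN are assumed to have been pre-computed as described above.

```python
import numpy as np

class FaceNode:
    """Hypothetical hierarchy node holding its own EigMx, MEAN, and landmark indices."""
    def __init__(self, name, level, indices, eig_mx, mean):
        self.name, self.level, self.indices = name, level, list(indices)
        self.eig_mx, self.mean = eig_mx, mean      # (d x r) eigenvectors, (d,) mean
        self.processed = False

    def overlaps(self, other):
        return bool(set(self.indices) & set(other.indices))

def project(delta_v, node):
    """Proj(dV, F): best approximation of the node's control-point displacements
    in the node's truncated PCA subspace (positions vs. displacements is assumed)."""
    local = delta_v[node.indices]                      # pick the F part of dV
    coeff = node.eig_mx.T @ (local - node.mean)        # Eq. 2 style projection
    return node.eig_mx @ coeff + node.mean             # reconstruct filtered values

def propagate(delta_v, f_star, all_nodes):
    """Algorithm 1: visit each node once, moving upward before downward."""
    if f_star.processed:
        return
    delta_v[f_star.indices] = project(delta_v, f_star)  # update dV with Proj(dV, F*)
    f_star.processed = True
    for level in (f_star.level - 1, f_star.level + 1):   # upward first, then downward
        for f in all_nodes:
            if f.level == level and f.overlaps(f_star):
                propagate(delta_v, f, all_nodes)
```

Starting the recursion at the edited leaf node and letting the overlap test steer it through the intermediate and root nodes reproduces the up-then-down visiting order described in the text.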
4.3 3D Face Construction

In this system, the 3D facial expression construction problem is transformed into an optimal search problem. From the 2D facial abstraction represented by the user-edited 2D portrait, the problem is to search for the corresponding 3D facial abstraction represented by 3D motion capture markers. Given that a 2D portrait is essentially determined by the portrait control points (landmarks), and that all pairs of portrait control point frames and 3D motion capture frames are pre-computed in the offline data processing (Section 4.1), this search problem is formalized as follows: given a portrait control point frame P_query (the query input), we search for the optimal 3D facial motion capture frame FRM* from all pairs {<P_i, FRM_i>} in the database, such that distance(P_query, P*) is the minimum among all possible distances {distance(P_i, P_query)}.

In order to search for P*, P_query is reduced to the same-dimensional PCA_query by projecting P_query onto the same truncated PCA space prepared in the offline data processing step (Figure 12). At run-time, PCA_query is used as a query input to search through the constructed KD-tree to obtain the K nearest neighbors. The retrieved K 3D motion capture frames are interpolated to generate a new 3D motion capture frame that is used to deform a static 3D face model. The interpolation weights are computed based on the Euclidean distances between the retained PCA coefficients of the query input and the retrieved K nearest neighbors. In the experiment, K is set to 3. Finally, given a 3D facial motion capture frame, the feature-point-based deformation technique [32] is used to deform a static 3D face model accordingly.

4.4 Results

A comparative usability study of this system was conducted. A total of ten human subjects were asked to use both the Maya software and this system for the task of sculpting target 3D facial expressions. Since the target user group is non-artist users who are nevertheless able to operate a computer well, all the participants were computer science undergraduate or graduate students. The participants had an intermediate level of skill with the basic tools of Maya, and were also trained to use this new system for one minute before the official start of this user study. Also, the participants were allowed to explore the new system's tools for about 2 minutes before starting the experiment. The scenario of this study was designed as follows:

1. Each participant is assigned four target facial expressions (2D images).
2. Each participant needs to sculpt 3D facial expressions that are sufficiently close to the given target facial expression images, using this system and the Maya software respectively.
3. A professional animator judges whether the 3D facial models sculpted with the two tools (this system and the Maya software) are acceptable (i.e., close enough).
4. The time that each participant spends using the two specified tools is recorded.

Figure 16: The examples of our usability study.

The average time using this system is less than 2 minutes for each 2D face image, and the average time using Maya is about 27 minutes for each 2D face image. The experimental result revealed that, compared with traditional 3D tools such as Maya, this 3D facial expression posing/prototyping system can significantly reduce users' effort in the task of sculpting 3D facial expressions. Figure 16 shows two examples of facial expressions sculpted by the participants in this study. In this figure, the top left is the target facial expression image. The top right is the 3D face expression sculpted using Maya by a participant. The bottom is the 3D facial expression posed using this system by the same participant. Furthermore, at the end of this user study, feedback was collected from the participants' short comments regarding the usability of this system. We summarize it as follows: (1) by using this system, they spend less time dealing with the 3D navigation gadgets (e.g., zooming and rotation) used to manipulate 3D vertices and meshes, which helps them focus on the 3D shapes of the face components they are working on. (2) In the course of sculpting, the intermediate facial expression results produced by this system appear more natural than those from Maya. This spares them significant repetitive effort in re-adjusting 3D face models. (3) The Portrait Component Library (PCL) is useful for getting a good start and saves them time.

Also, numerous 3D facial expression results were generated using this system. Figure 18 shows the resulting 2D portraits (top row) and corresponding 3D facial expressions on different 3D face models (other rows). In addition to a single model, multiple 3D face models can be edited simultaneously in this system (Figure 17). After 3D facial expressions are generated, users can view and manipulate them from different angles. One advantage of this new approach is that certain face details are automatically sculpted on the 3D facial expressions, although they cannot be explicitly specified in the 2D portraits. Figure 17 (right) shows a specific example.

Figure 17: Facial expressions on various simultaneously sculpted 3D face models.

4.5 Conclusion

In this framework, an interactive data-driven 3D facial expression posing system through 2D portrait manipulation is presented. This system is built on top of a pre-recorded facial motion capture database. It allows users to intuitively edit 2D portraits and then automatically generates corresponding 3D facial expressions with fine details. The 2D portrait allows the user to focus on the face features, and the 2D controls give the user more intuitive manipulation than a 3D control system. Through a comparative user study, this system shows that it can be effectively used as a rapid prototyping tool for generating 3D facial expressions.

Certain limitations exist in the current system.
First, since a pre-recorded facial motion dataset is used, it is hard to predict how much data we need to collect to guarantee the generation of realistic 3D facial expressions for arbitrary 2D portrait input; some user controls that fall outside the vocabulary are filtered out. However, this can be alleviated by acquiring more data if needed. Second, the number of available portrait components and the number of portrait control points are still quite limited. Adding more portrait details to the system is one of the future directions that can be pursued. To improve the 2D portrait editing procedure, there are numerous directions that can be followed. For example, users could be allowed to edit portraits from different angles, rather than being limited to the front view as in the current system. Another area to develop is a free-hand 2D portrait drawing tool that would make the system more intuitive and friendly to users, especially novice users. In addition, the current system can only interpolate new facial expressions from the existing facial motion database and cannot deal with exaggerated portraits. A portrait exaggeration function could be developed to seamlessly convert exaggerated portraits to corresponding 3D facial expressions using a hybrid of machine learning and geometric deformation algorithms.

Figure 18: Six edited portraits and corresponding generated 3D facial expressions.

This system utilizes the facial expression data (from MoCap) to craft a 3D facial model using a search-engine scheme. However, for facial identity modeling, the facial identity dataset is far sparser (about 100 faces compared to 100K frames in the MoCap) than the facial expression dataset. Thus, the approach to utilizing the dataset must be adapted for facial identity, along with an approach to handle texture synthesis.

5 Facial Identity Modeling

Unlike facial expressions, facial identities are distinguished not only by the facial landmarks but also by the subtle surface between landmarks as well as the texture. The identity of a face is represented in this system by three representations: landmarks, surface, and texture. Each representation defines facial similarity differently. Figure 19 shows the inputs of each representation. The goal of this system is to connect each representation to synthesize a proper face.

Figure 19: Internal representations. (A): 3D landmarks are represented by vertex locations. Each vertex location contains x, y, and z coordinates. (B): Surface is represented by affine transformation matrices. Each affine transformation matrix contains a 3-by-3 matrix of each triangle's transformation from an average face to the new face. (C): Texture is represented by color pixels in RGB channels.

Figure 20: A: Facial landmarks on the face image. B: When all landmarks are connected together with lines, the face portrait can be automatically generated. C: Anchor (A) and pivot (P) points used in creating a face portrait sketch by artists.

Facial Landmarks: Figure 19 (A) shows the landmarks used in this system. The 83 facial landmarks shown in Fig. 20 (a) are used as the main input and as the connection to the other representations in the system. These landmarks can be obtained by an image detection technique or by manual labeling via a user interface. They are located at the corners of the contours of the prominent facial features [19], which are normally used in the drawing practice of a traditional portrait sketch artist. When these landmarks are connected and rendered to an image, a human can recognize the image as an individual face [41].
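For illustration, the snippet below shows one way 2D facial landmarks of this kind might be obtained automatically. It is only a sketch: the thesis uses an 83-point scheme obtained with ASM/FaceTracker-style detectors or manual labeling, whereas this example uses dlib's off-the-shelf 68-point shape predictor (the model file path is an assumption, and its landmark count and layout differ from the system described here; its behavior on line sketches rather than photographs is also not guaranteed).

```python
import dlib
import numpy as np

# Assumption: the 68-point model file has been downloaded separately; the thesis
# itself uses an 83-landmark scheme, so this is only an analogous illustration.
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def detect_landmarks(image):
    """Return an (N, 2) array of 2D landmark positions for the first detected face."""
    faces = detector(image, 1)          # upsample once to find smaller faces
    if not faces:
        return None
    shape = predictor(image, faces[0])
    return np.array([[p.x, p.y] for p in shape.parts()], dtype=float)

landmarks = detect_landmarks(dlib.load_rgb_image("face_sketch.png"))
```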
Facial Surfaces: The Deformation Gradient (DG) [53], which is a 3-by-3 affine transformation matrix, is the surface representation in this system. This matrix is composed of rotation and shearing components. No translation is encoded; however, there is no global translation in the surface modeling. This matrix represents how each triangle is rotated and sheared relative to the corresponding triangle of the average human face, as shown in Figure 19 (B). Figure 21 illustrates how the matrix can be constructed from the vertices and normals of the triangles. The reason that the DG is selected as the surface representation instead of vertex locations is that the DG produces a smoother surface when blending multiple faces together. Figure 22 compares the results of different types of surface representation in the surface blending experiment. Eq. 5 shows the construction of the surface matrix for PCA, where X, Y, and Z are the first, second, and third columns, respectively, of the 3-by-3 affine transformation matrix of each triangle, t is the total number of triangles, and m is the total number of faces in the dataset.

Figure 21: An affine transformation matrix between two shapes of a triangle can be calculated from their vertices and normals. Here, V4 is the vertex added to the triangle at V1 along the normal direction to create a new edge.

Figure 22: Comparison of different surface representations in blending: (A) the original face model, (B) vertex location blending, and (C) DG blending. The blending is performed by scaling some parts of the original face.

Facial Texture: The texture for each face is represented by 256-by-256 pixels. Each pixel contains 3 channels of RGB values. To create a skin tone for the face, we crop part of the forehead region of the facial texture, as shown in Figure 19 (C).

To create a facial identity, all three representations must be synthesized in their own manners. The landmark representation is used to connect the other representations and synchronize the facial similarities.
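As a concrete illustration of the deformation gradient described under "Facial Surfaces" above, the sketch below computes the 3-by-3 affine transformation between a source and a deformed triangle by adding a fourth vertex along each triangle's normal (Figure 21), in the spirit of [53]. The function names and the toy triangles are illustrative only.

```python
import numpy as np

def fourth_vertex(v1, v2, v3):
    """Add a vertex offset from v1 along the triangle normal to make the frame full rank."""
    n = np.cross(v2 - v1, v3 - v1)
    n /= np.linalg.norm(n)
    return v1 + n

def deformation_gradient(src_tri, dst_tri):
    """3x3 affine transformation (rotation + shear/scale, no translation) mapping
    the source triangle's local edge frame to the deformed triangle's frame."""
    s1, s2, s3 = src_tri
    d1, d2, d3 = dst_tri
    s4 = fourth_vertex(s1, s2, s3)
    d4 = fourth_vertex(d1, d2, d3)
    V = np.column_stack([s2 - s1, s3 - s1, s4 - s1])   # source frame
    W = np.column_stack([d2 - d1, d3 - d1, d4 - d1])   # deformed frame
    return W @ np.linalg.inv(V)

# Toy check: a pure rotation of the triangle recovers the rotation matrix itself
R = np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 1.]])   # 90 degrees about z
src = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.]])
dst = src @ R.T
print(np.round(deformation_gradient(src, dst), 3))
```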
5.1 Offline Data Processing

Our approach utilizes the human face structure from a 3D facial identity dataset [63] to create the facial representation subspaces. The 3D facial identity dataset consists of 100 subjects (56 females and 44 males), ranging from 18 to 70 years old with diversified ethnicities. Each entry in the dataset contains a 3D face mesh, a texture, and 83 landmarks. The face dataset is processed in the offline stage to construct prior knowledge about human face structure.

Prototype face model: A prototype face model is used as the main 3D topology for creating a full correspondence of landmarks, surface, and texture for all the faces. We selected a topology of 2043 vertices, 6042 edges, 4000 faces, and 12000 UV coordinates from an internal face shape representation of FaceGen [18]. The vertices of this topology are placed evenly over the surface and allow good deformation movement. This topology might not be suited for facial animation; if animation is required, the model can later be morphed to an animation topology. Figure 23 shows the prototype face model.

Figure 23: The prototype model, from left to right, is the frontal-view mesh, the side-view mesh, the mesh with its texture, and the texture with its UV coordinates.

Face alignment: To build the correspondences for all the 3D face models in the dataset, the prototype face model is morphed to match each model in the 3D face dataset using an iterated closest point algorithm [53] (the 83 landmarks were used for the alignment). The prototype face model served as the source, and each face model in the 3D face dataset was used as the target in this morphing process. In this way, 100 aligned 3D face models are created with the same topology (i.e., the same number of vertices, connectivity structure, and texture coordinates). Each facial texture is aligned to the prototype face model using its texture mapping coordinates (UVs) to create a full correspondence among all faces in the dataset.

Eigen-vectors of 3D facial landmarks, α_LM: Eigen-vectors of 3D facial landmarks are constructed from the X, Y, and Z coordinates of the 83 facial landmarks of all the faces in the dataset. The eigen-vectors are calculated by the Principal Component Analysis (PCA) technique. In this work, the PCA subspace spanned by the retained eigen-vectors contains more than 95% of the variation. The retained eigen-vectors are later used to estimate the depths of 2D landmarks in the 3D facial landmark estimation step. Eq. 4 shows the construction of α_LM by PCA, where <x, y, z> are the 3D coordinates of the 3D landmarks, m is the total number of landmarks, and n is the total number of faces in the dataset. Also, we divide the 3D landmarks into different regions (α_LM_Regional), namely the eyes, eyebrows, nose, mouth, and outline, which are used later during the face texture synthesis step.

\[
\begin{bmatrix}
x_{11} & \cdots & x_{1m} & y_{11} & \cdots & y_{1m} & z_{11} & \cdots & z_{1m} \\
\vdots & & & & \ddots & & & & \vdots \\
x_{n1} & \cdots & x_{nm} & y_{n1} & \cdots & y_{nm} & z_{n1} & \cdots & z_{nm}
\end{bmatrix}
= \alpha_{LM} \cdot S \cdot V \qquad (4)
\]

Eigen-vectors of deformation gradients (DG), α_DG: The surface of each 3D face is encoded as DGs [53], which are 3D affine transformation matrices between the surface and an average human face. Then, similarly to the construction of α_LM, PCA is used to construct eigen-vectors from the DGs of all the faces in the dataset. One important issue is that affine transformation matrices cannot be linearly interpolated, while coefficients in a PCA space are expected to be linearly interpolatable. Thus, we first decompose each affine transformation matrix (A) into a rotation component, R, and a scaling component, S, using polar decomposition [49] (i.e., A = exp(log(R)) × S) and construct eigen-vectors for each component. Eq. 5 shows the construction of α_DG by PCA, where A^1, A^2, and A^3 are the first, second, and third column vectors of the affine transformation matrix (A), t is the total number of triangles, and n is the total number of faces in the dataset. The α_DG is also generated separately for each gender and ethnicity. These extracted eigen-vectors are used later during the face surface synthesis step.

\[
\begin{bmatrix}
A^1_{11} & \cdots & A^1_{1t} & A^2_{11} & \cdots & A^2_{1t} & A^3_{11} & \cdots & A^3_{1t} \\
\vdots & & & & \ddots & & & & \vdots \\
A^1_{n1} & \cdots & A^1_{nt} & A^2_{n1} & \cdots & A^2_{nt} & A^3_{n1} & \cdots & A^3_{nt}
\end{bmatrix}
= \alpha_{DG} \cdot S \cdot V \qquad (5)
\]

Skin tone likelihood: Gaussian Mixture Models (GMMs) are used to calculate the skin tone likelihood by first clustering similar skin tones in the dataset into 10 multivariate Gaussian distribution groups. These groups' centroids are also used as the skin-tone bases for users to select, as shown in Fig. 26. Finally, a skin tone likelihood table (ψ_Texture) is constructed by calculating the likelihood of each face texture based on its distance to the centroids of all the skin-tone bases (Eq. 6). The likelihood table is later used in the face texture synthesis step.

\[
P(T \mid I) = \frac{1}{(2\pi)^{M/2}} \, |C_k|^{-\frac{1}{2}} \exp\!\left\{ -\frac{1}{2} (I - \mu_k)^t C_k^{-1} (I - \mu_k) \right\} \qquad (6)
\]

Here, I is the facial texture identity, k is the group that the input T belongs to, C_k is an M x M covariance matrix for the k-th group, and μ_k is the mean of the k-th group.
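To make the decomposition step for α_DG concrete, here is a minimal sketch (an assumption-laden illustration, not the thesis implementation) of splitting each per-triangle affine matrix into a rotation and a symmetric scale/shear part with scipy's polar decomposition, and mapping the rotation through the matrix logarithm so that both parts can be combined linearly (for blending or PCA) and then recomposed.

```python
import numpy as np
from scipy.linalg import polar, logm, expm

def decompose_dg(A):
    """Split a 3x3 deformation gradient A = R S into a log-rotation (skew-symmetric
    3x3) and a symmetric scale/shear matrix, both safe to combine linearly."""
    R, S = polar(A)            # R orthogonal, S symmetric positive semidefinite
    return logm(R).real, S

def recompose_dg(log_R, S):
    """Inverse of decompose_dg: A = exp(log(R)) S."""
    return expm(log_R) @ S

def blend_dgs(A_list, weights):
    """Linearly blend deformation gradients in the decomposed space."""
    logs, scales = zip(*(decompose_dg(A) for A in A_list))
    log_R = sum(w * L for w, L in zip(weights, logs))
    S = sum(w * M for w, M in zip(weights, scales))
    return recompose_dg(log_R, S)

# Toy check: blending a 90-degree rotation with the identity gives a 45-degree rotation
theta = np.pi / 2
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.],
               [np.sin(theta),  np.cos(theta), 0.],
               [0., 0., 1.]])
print(np.round(blend_dgs([np.eye(3), Rz], [0.5, 0.5]), 3))
```

Blending the raw matrices directly would instead shrink the rotated part toward zero, which is exactly the artifact the decomposition avoids.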
Anoverlayuserinterfaceisprovidedforauser todrawthelinesoverthe2Dsketchimage. Ausercanfreelydrawcontoursofafacecomponentshapeinthe sketching area as in the traditional pencil sketching. To smooth the drawn contours, we employ a Laplacian curvedeformation[44]toremovenoisefromtheuserstrokes. Oncethecontourisdrawn,astrokeanalyzeris usedtodetectedwhichfacecomponent,i.e. eyebrows,eyes,nose,mouth,andfaceboundary,isbeingdrawn. Thisfacecomponentdetectionisdonebyatemplatematchingtechnique. Basically,thetemplateofeachface componentisrepresentedbythefacecomponentlandmarksfromanaveragehumanface. Thedrawnstrokes arethencomparedtoeachfacecomponent’stemplatetofindthebestmatch. 5.2.1 LandmarkDetection Thesystemextractsthe83landmarks(Fig.20(a))fromthesecontoursinatwo-stepprotocol. First,thedrawn contoursarealignedtoadrawingtemplatewhichisusedtoidentifyanchorlandmarksandpivotlandmarksof eachfacecomponent(refertoFig.20(c)). Theanchorlandmarkisthestartingorendingpointofthecontour, locatedatoneofthecorners,andthepivotlandmarksareothercornerpointsonthecontour. Theselandmarks aredetectedbyourheuristicrulesincludingrelativepositionsandacornerdetectionalgorithm. Forexample, the corner point is detected if it is the vertical critical point along the contour. Second, from the anchor and pivotlandmarks,thesketchcontouristhenre-sampleduniformlyfortherestofthelandmarksthataredefined 31 Figure24: Asnapshotofthesketchinginterface. differentlyforeachfacialcomponent. Theanchorandpivotlandmarksforeachfacecomponentaredetected inthefollowingway: • Eyebrows and eyes: Anchor and pivot landmarks are identified as the first and the other corner posi- tionsofthestroke,respectively. • Nose: Two anchor landmarks are identified as the first and the last positions of the stroke. Two pivot landmarksaredetectedastheleftmostandtherightmostcorners. • Upperlip: Twoanchorlandmarksareidentifiedasthefirstandtheothercornerpositionsofthestroke. Threepivotlandmarksatthemiddleoftheupperliparedetectedasthestationarypointsalongthetop contour. • Lowerlip: Twoanchorlandmarksareidentifiedastwocornerpositionsofthestroke. • Faceboundary: Twoanchorlandmarksareidentifiedasthefirstandthelastpositionsofthestroke. In addition, if users need to change the position of any landmarks, the users can click and drag the particularpointtoadjustthecontours. 32 5.2.2 PortraitRenderingandCorrection Therefinedportraitaregeneratedbyaportraitrenderingtechniquetogiveaninstantfeedbackontheportrait being drawn. It starts with an artistic-style face portrait as initial pose in the background in order to give the users rough guidelines about the portrait structure and locations (Figure 24). Then, based on the 83 facial landmarks extracted from the user’ strokes, a non-photorealistic rendering technique is used to repaint the portrait in an artistic style. In this work, a model-based approach [41] is employed as the main renderer for prominent facial components. This approach enables the generation of different line drawings (a range of stylization) for different face parts while keeping the facial landmarks staying in place. Fig. 20 (b) shows the rendered result by this technique. Based on the facial landmarks, each facial component is rendered differentlyinthefollowingrules: Eyebrows: Each eyebrow contains 10 landmarks. To render the eyebrow, a bounding tube is firstly createdfromtheselandmarks. Then,thepencil-stylehatchingwithadefinedincliningdegreeisgeneratedin theboundingtubetomimictheeyebrowappearance. Eyes: Each eye contains 8 landmarks. 
Kochanek-Bartels spline is used to draw the top and bottom contours to create an eye shape. Then, an iris with the size of about a half of the eye space is generated by a hatchingfunction. Finally,aneyelidisdrawnjustabovethetopoftheeyeshapeusingaB-splinefunction. Nose: The nose contains 12 landmarks. The U-shape nose is generated by a B-spline curve at the left, middleandrightsidesofthenose. Mouth: Themouthcontains20landmarks(10fortheupperlipand10forthelowerlip). First,theupper andlowerlipsaredrawnbyaB-splinecurve. Then,ahatchingfunctionisusedtofillineachliptocreateits naturalappearance. Face boundary: Face boundary contains 15 landmarks. A B-spline curve is drawn to connect these pointstogethertoformasmoothprofileshape. Duringthesketchingprocess,toensurethequalityofthesketchcontours,thedrawncontoursarefiltered toaproperfacialshapeusinga2Dmorphablefacemodelwhichisa2Dversionofα LM . Atrun-time,oncea user finishes the drawing of each face component, its extracted landmarks are projected to the retainedα LM , whichgeneratesthefilteredlandmarks. Specifically,givenadrawingvectorS i ,S i isprojectedtotheretained α LM to yield its reduced representation C i (Eq. 7), and then C i is projected back to reconstruct its filtered drawingvector ˆ S i (Eq. 8). 33 Figure25: Interactivesketchingprocess. 34 C i =α T LM .(S i −μ) (7) ˆ S i =μ+α LM ∗C i (8) Here,μ isthemean2Dlandmarksofallthefacesinthedataset. Fig.25illustratesthisfilteredprocess. In thisfigure,left: theuserstrokesafterlandmarks(shownasreddots)areextracted. Inthemiddle,thedrawing contoursarefilteredbasedonth2Dmorphablefacemodeltomaintainthenaturalnessofthecontour. Onthe right: therenderedportraits. Eachrowshowsanexampleofeachfacecomponent. 5.3 3DFaceConstruction AsschematicallyillustratedinFig.26,the3Dfaceconstructionconsistsofthefollowingthreemaincompo- nents,namely, 3Dfaciallandmarkestimation,3Dfacesurfacesynthesis,andfacetexturesynthesis. 3Dfaciallandmarkestimation: From the detected 2D facial landmarks, corresponding 3D facial land- marks are estimated by a statistical inference method, inspired by the work of [5]. The main distinctions betweenourapproachand [5]are,inourapproach: • The perspective distortions of the 2D landmarks are used as a clue to estimate their depths while [5] basesonanorthogonalprojectionassumption. • ThesurfaceisrepresentedbyDGsinsteadofvertexlocationstomaintainthesmoothnormaltransition withouttheneedofaheuristicregularizationasin [5]. • The 3D landmark estimation and 3D face surface synthesis are decoupled to prevent an over-fitting problem where the surface tries too hard to fit to the landmarks causing non-smooth surface. This separationalsoallowsthesurfacetobealteredlater. 3D face surface synthesis: From the obtained 3D facial landmarks, a 3D face surface is synthesized by anexample-basednon-linearinterpolationtechnique. Facetexturesynthesis: Tosynthesizethetexture,wefollowhowanartistcolorsa2Dfacesketch. Basi- cally,anartisttypicallypaintsa2Dportrait/sketchbasedontheimplicitassumptionofitsfacialdescriptions, 35 i.e. gender, ethnicity, and skin tone in his/her mind. As such, in this work, we formulate it as a search-and- synthesis problem by first searching for most proper textures in a library and then synthesizing a novel face texture. To alter the surface and texture, the system can synthesize surface and texture based on a given example input from a user. This is particularly useful for a artistic face modeling. 
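To make the three-component construction just outlined concrete, here is a high-level orchestration skeleton. Every function name in it (estimate_3d_landmarks, synthesize_surface, synthesize_texture) is hypothetical and stands in for the corresponding component described in Sections 5.4-5.6; the stubs only return placeholder arrays so the sketch runs.

import numpy as np

# Hypothetical stubs for the three components of Fig. 26; the real system
# replaces these with the landmark, surface, and texture engines.
def estimate_3d_landmarks(lm2d):      return np.zeros((83, 3))
def synthesize_surface(lm3d, desc):   return np.zeros((2043, 3))
def synthesize_texture(lm3d, desc):   return np.zeros((256, 256, 3))

def construct_3d_face(landmarks_2d, desc):
    """Top-level flow: 2D landmarks plus a facial description
    (gender, ethnicity, skin tone) in, textured 3D face out."""
    lm3d = estimate_3d_landmarks(landmarks_2d)   # 3D facial landmark estimation
    surface = synthesize_surface(lm3d, desc)     # 3D face surface synthesis
    texture = synthesize_texture(lm3d, desc)     # face texture synthesis
    return surface, texture

surface, texture = construct_3d_face(
    np.zeros((83, 2)),
    {"gender": "female", "ethnicity": "Asian", "skin_tone": 0})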
In this artistic face modeling, gradient blending techniques are employed to alter the 3D face surface and texture towards any user-provided artistic face example. Specifically, the gradient blending techniques transfer and merge the reconstructed textured 3D face and the user-provided artistic face together to generate a new identity-embedded 3D artistic face.

5.4 3D Landmark Estimation

Each detected 2D facial landmark consists of X and Y coordinates in an image space. To estimate the 3D landmarks, the Z coordinate (depth) of each 2D landmark needs to be reconstructed. If we directly use the X and Y coordinates of the 2D landmarks as the X and Y components of the 3D landmarks and simply fill in a Z value, the 3D landmarks will be visually distorted and produce a distorted 3D model (Fig. 28). This is because the 2D landmarks in an image space are already perspectively distorted by the rendering process. This perspective projection property is highly beneficial in inferring the depth information. Therefore, to properly estimate the 3D landmarks, the perspective projection needs to be taken into consideration when estimating the depths of the landmarks. Fig. 27 illustrates the depth estimation technique.

We assume the artist's vision is a perfect camera and that the center of projection is the middle of the 2D face sketch. In this way, the perspective projection of the 3D landmarks, û and v̂, can be derived by Eq. 9:

\hat{u} = \frac{x \cdot d}{z}, \qquad \hat{v} = \frac{y \cdot d}{z} \qquad (9)

Here, x, y, and z are the 3D landmark coordinates in the model space, and d is the focal length. In this work, we set d to 22 millimeters according to the average focal length of the human eye. Assuming the detected 2D landmarks from the sketch image are u and v, we can formulate the 3D facial landmark estimation problem as the following energy optimization problem:

\arg\min_{\hat{u},\hat{v}} \Big\{ \sum_i \big( \|u_i - \hat{u}_i\|^2 + \|v_i - \hat{v}_i\|^2 \big) \Big\}
\quad \text{or} \quad
\arg\min_{x,y,z} \Big\{ \sum_i \Big( \Big\|u_i - \frac{x_i \cdot d}{z_i}\Big\|^2 + \Big\|v_i - \frac{y_i \cdot d}{z_i}\Big\|^2 \Big) \Big\} \qquad (10)

Figure 26: Schematic overview of the 3D face construction. It consists of the following three main components, namely, 3D facial landmark estimation, 3D face surface synthesis, and face texture synthesis.

Figure 27: The depth of each 2D landmark is estimated by using the distortion of the perspective projection as a hint. The 2D landmarks are projected back and forth to 3D space with the estimated depth (Z), which is initialized with an average depth. In 3D space, the depth is estimated again with PPCA to maximize its likelihood. The loop stops when the projected 2D landmarks are closest to the incoming 2D landmarks. The 3D landmarks (x, y, z) are the result.

Figure 28: 3D reconstructed results comparing the parallel and perspective projection assumptions in estimating the depths. The input sketch is shown on the left. A: the 3D ground-truth model of the sketch. B: the reconstructed 3D face from the parallel projection. When the 2D sketch is projected back to 3D space in parallel, the face becomes leaner than the original face since the perspective factor is not taken into account. On average, the surface reconstruction error of the parallel assumption is about 9 m.m. C: the reconstructed 3D face from the perspective projection. On average, the reconstruction error of the perspective assumption is about 1 m.m. All the 3D results are shown in the perspective projection of the 3D models.

To solve Eq. 10, x, y, and z are initialized with the average 3D landmarks calculated from the dataset.
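A minimal NumPy sketch of this projection model (Eq. 9) and of the objective being minimized (Eq. 10), with the 22 m.m. focal length assumed above; the depths used in the toy check are hypothetical values.

import numpy as np

D_FOCAL = 22.0   # focal length d in millimetres, as assumed in the text

def project(x, y, z, d=D_FOCAL):
    """Perspective projection of 3D landmarks onto the sketch plane (Eq. 9)."""
    return x * d / z, y * d / z

def reprojection_energy(u, v, x, y, z, d=D_FOCAL):
    """Objective of Eq. 10: squared distance between the detected 2D landmarks
    (u, v) and the projections of the current 3D estimate (x, y, z)."""
    u_hat, v_hat = project(x, y, z, d)
    return np.sum((u - u_hat) ** 2 + (v - v_hat) ** 2)

# toy check: projecting the ground-truth landmarks gives zero energy
x, y, z = np.ones(83), np.ones(83), np.full(83, 600.0)
u, v = project(x, y, z)
assert np.isclose(reprojection_energy(u, v, x, y, z), 0.0)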
Then, at each iteration, x and y are updated by: x i →x i − α xi .z i d and y i →y i − α yi .z i d , whereα x andα y aretheresiduesfromthefirstandthesecondcomponentsofEq.10,respectively. Toupdatethez,basedonthe observationthatziscorrelatedwithxandyasencodedbyPCAofthe3Dlandmarks,PPCAalgorithm[56]is employedtoestimatezfromthexandybyusingtheeigen-vectorsof3Dlandmarks(α LM ). Basically,PPCA estimationisachievedbyassumingtheGaussiandistributionoverthePCAsubspaceandusingthelikelihood of the 3D landmarks in the subspace to optimize the objective function by EM algorithm. In this work, the objectivefunctionisalower-boundofthelog-likelihoodasshowninEq.11. D 2 (1+logσ 2 )+ 1 2 (Tr{Σ}−log|Σ|)+ 1 2 kfk 2 − D h 2 logσ 2 prev (11) Here D is the number of landmarks multiplied by 3 (for x, y and z), f is the coefficients of the 3D 39 landmarks in PCA space constructed from the current x, y, and treating z as a missing value by initialized it with0,σ 2 isthevarianceofreconstructedlandmarksfrom f,σ 2 prev isthevariancefromthepreviousiteration, Σ is the covariance matrix of f (i.e., calculated fromα LM andσ 2 prev ), and D h is the number of landmarks. In each iteration, the E-step is performed on f which updating the z value, and the M-step is performed onσ 2 . The optimizations of both Eq. 10 and Eq. 11 are repeated until they converge (i.e., the energy difference is lessthanaspecifiedthreshold). Figure29visualizesthedifferencesbetweentheground-truthandtheestimatedz-valuesbyourapproach. Theinput2Dlandmarksaregeneratedfromthe3Dmodelsoftheground-truthmodelswiththefixedrender- ing parameters. Also, it shows a reconstruction error comparison in millimeters between our technique and themorphablemodeltechnique[5]. Inthismorphablemodeltechnique,thexandyofthe3Dlandmarksare firstestimatedbytheuandvofthe2Dlandmarksusingthedepthsoftheaverage3Dlandmarkstocompen- sate the perspective distortion. Then, the estimation is performed by pseudo-inverse the basis matrix. The basismatrixisthemultiplicationsofthemappingmatrix(from2Dto3D),theeigen-vectors,andthestandard derivation diagonal matrix. Since our approach utilizes the perspective projection property in estimating the depth, our reconstruction errors are lower than the morphing technique where only the PCA coefficients are considered. Also, the depths are calculated by the plausible-fit manner according to the 2D landmarks in the 3Dspace instead ofderiving purely fromthePCAcoefficients asin [5]. Thisproduces higher flexibility for anout-of-human-rangefaceshapeswithouttheneedofregularization. Since the estimation errors are subjected to the input 2D landmarks, the effects of noises over input 2D landmarksarealsoevaluated. Inthisexperiment,theGaussiannoisesofstandarddeviationbetween1pixels to20pixelsareaddedtoeach uand vofeach2Dlandmark. Figure30showsthelandmarkestimationerrors effected from added noises for each 3D landmark estimation technique. To make the morphable model gen- eratethe3Dlandmarksclosertotheground-truthin3Dspace,theimagespacelandmarksarefirsttransform tothe3Dspacebyusingtheaverage3Dlandmarks’depths. To evaluate the effect of the wrong assumption on the focal length (d), the Gaussian noises of standard deviation between 1 to 20 m.m. are added to the d of the depth estimation process. Figure 31 shows the effectstothe3Dlandmarkestimationerrorsofourapproachcomparingtothemorphablemodelwiththethe parallel projection and our morphable model where the the average landmarks’ depths are used to transform theimagespacelandmarkstothe3Dspacelandmarks. 
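For reference, the z-estimation step can be approximated by a single least-squares fill-in over the retained eigen-vectors α_LM. The sketch below is a simplified, non-iterative stand-in for the PPCA/EM estimation described above; it assumes the landmark vector ordering of Eq. 4 (all x values, then y, then z), and the basis in the usage example is random.

import numpy as np

def fill_depths(xy_obs, mu, alpha_LM, n_landmarks=83):
    """Estimate the missing z-coordinates of the landmarks from their observed
    x and y values using the 3D landmark subspace (alpha_LM).

    xy_obs   : (2*n,) observed x then y values in model space
    mu       : (3*n,) mean landmark vector ordered [x..., y..., z...]
    alpha_LM : (3*n, k) retained eigen-vectors
    """
    obs = np.arange(2 * n_landmarks)                     # rows holding x and y
    mis = np.arange(2 * n_landmarks, 3 * n_landmarks)    # rows holding z
    # coefficients f that best explain the observed x,y deviations from the mean
    f, *_ = np.linalg.lstsq(alpha_LM[obs], xy_obs - mu[obs], rcond=None)
    return mu[mis] + alpha_LM[mis] @ f                   # reconstructed depths

# hypothetical usage with a random orthonormal basis of 20 components
rng = np.random.default_rng(0)
mu = rng.normal(size=3 * 83)
alpha = np.linalg.qr(rng.normal(size=(3 * 83, 20)))[0]
z_est = fill_depths(rng.normal(size=2 * 83) + mu[:2 * 83], mu, alpha)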
Oncethenoiseisaddedtothelevelofabouttwicethe focal length assumption (16 m.m.), our approach produces the errors worst than the morphable model. This 40 Figure 29: The 3D landmark estimation errors. The errors are the differences between each estimated 3D landmark and each of the ground-truth’s 3D landmarks of the thirty face examples in millimeters. (A): the errordistributionofthe833Dlandmarksovertheface. (B):theerrorscomparingour3Dlandmarkestimation to the morphable model technique [5]. On average, our approach produces 1 m.m. lower in error for each landmark. isbecausethefocallengthassumptionisverycrucialinestimatingthedepthinourapproach. 5.5 3DFaceSurfaceSynthesis Toconstructa3Dfacesurface,asetofDGsisusedtocraftthesurfacetofittothe3Dlandmarks. Besidethe humansurface,ourapproachcansynthesizeanartisticstylesurfaceusinginaartisticfacemodeling. 5.5.1 SurfaceGeneration Fromtheestimated3Dlandmarks,aplausiblehumanfacesurfacecanbemoldedfromasetofDGexamples. Specifically,MeshIK[54]isemployedtointerpolatetheDGsofhumanfaceexamplesconstrainingbythe3D landmarks. Instead of using DGs directly from the human surface dataset, the new set of DGs is constructed from the major principle components of α DG (constructed from Section 5.1) to cover a proper variation of humansurfaces. Basically,thesenewDGexamplesareconstructedfromeacheigen-vectorintheα DG which represents a major variance across the whole DG dataset. Each eigen-vector is also orthogonal to each other providingaproperspanfortheinterpolation. TheseDGexamplesarecalledeigen-surfacesinthiswork. An eigen-surface is constructed by adding the mean DG to a retained eigen-vectorα DG . In other words, EigenSurface i =α DG (i)+μ DG . Note that the two components (log(R) and S) are processed separately. 41 Figure 30: By adding Gaussian noises to the 2D landmarks, our approach produces lower errors than the morphablemodeltechniqueforthenoisyinputs. 42 Figure 31: By adding Gaussian noises to the focal length in the depth estimation process, our approach producesmoreerrorsforthelargeramountofthedifferencefromthefocallengthassumption. 43 Finally, these DGs are then used in the MeshIK framework. Figure 32 shows ten constructed eigen-surfaces (inadescendingorderbytheireigen-values). Figure 32: Ten constructed eigen-surfaces ordered by their eigen-values in a descending order from left to right(i.e. thetoprowisthefirstfivehighesteigen-valueshapes). Thesesurfacesareexaggeratedbyascaling factor(=30)tovisualizetheirdeformationdirectionsfromtheaveragesurfaceshowingintheleft-mostbox. The MeshIK isa least-square problem as shown in Eq. 12 and itcan be solved by a sparse matrix solver. ThesetofDGexamples,whichareaffinetransformationmatrices,aretransferredaccordingtotheirweighted sumbytakingthe3Dlandmarksasconstraints. argmin x kTx−E−Sk (12) Here, T is a linear operator constructed from an average face surface, x is the vector containing the resultant target vertex locations, S is the 3D landmarks (as constraints), and E is the set of DG examples (i.e., the collection of the obtained eigen-surfaces ). Since each DG is the multiplication of R and S, the interpolation required to be solved non-linearly. This non-linear interpolation can be effectively solved by Gauss-Newton iterative algorithm. For the artistic surface, the T is replaced by the artistic-style surface example. This results in the 3D surface based on the artistic-style surface example with a variation of the humanidentities. 
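A small sketch of the two ingredients just described: assembling eigen-surface examples from α_DG, and the sparse linear least-squares solve at the core of each Gauss-Newton step of Eq. 12. The operator T and the right-hand side below are random stand-ins, not actual face data.

import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import lsqr

def eigen_surfaces(alpha_DG, mu_DG, k=10):
    """Example DGs used by MeshIK: EigenSurface_i = alpha_DG[:, i] + mu_DG
    (the log(R) and S parts are handled separately in the real system)."""
    return [mu_DG + alpha_DG[:, i] for i in range(k)]

# toy eigen-surface set from a random basis and mean
rng = np.random.default_rng(0)
examples = eigen_surfaces(rng.normal(size=(100, 10)), rng.normal(size=100))

# toy linearised step of Eq. 12, argmin_x ||T x - b||, solved sparsely; in the
# real system T encodes the average-face operator and b collects the blended
# DG examples E plus the landmark constraints S.
T = sparse_random(3000, 2043 * 3, density=0.002, random_state=0, format="csr")
b = rng.normal(size=3000)
x = lsqr(T, b)[0]          # resultant vertex coordinate vector (toy)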
Figure 33 shows the surface reconstruction errors effected by adding noises to each technique of 30 examples similar to the Figure 30’s experiment. The surface errors generated from our approach are lower 44 than the other approaches because more surface examples are provided to create higher flexibility in surface fitting process. All the surfaces for each technique are generated from the same 3D landmarks from our 3D landmarkestimationtechnique. Figure 33: By adding Gaussian noises to the 2D landmarks, our approach produces lower errors than the otherapproachesforthenoisyinputs. 5.5.2 IdentityEngineeringAlgorithm Incertainapplications,the3Dhumanfacesurfaceand3Dartistic-stylesurfaceneededtobeblendeddirectly. To avoid confusion, the human surface is abbreviated as the HI-face (human identity face) and the artistic surfacefaceisabbreviatedastheAS-face(artisticstyleface). There are 2 options to craft a artistic face surface: (1) Using only the provided AS-face in Eq. 12 to morph to the human’s 3D landmarks. (2) Using both the provided AS-face surface and the HI-face surface 45 andcarefullycombiningthem. Thefirstchoiceisstraightforwardbutthesurfacevariationsarelimitedtoone AS-face as shown in Figure 34. On the other hand, the second choice produces higher surface variations by imitatingthevariationofthehumansurfacesasshowninFigure34. TocombinetheHI-faceandAS-facetogether,asurfaceblendingtechniqueisrequired. Thisproblemcan besolveddirectlyintheMeshIKframeworkbythefollowingleast-squareequation(Eq.13). arg x mink(T h +T c )x−A h −A c −Sk (13) Here,T h andT c arelinearoperatorsconstructedfromtheHI-faceandAS-facerespectively,A h andA c are the affine transformations of the HI-face and AS-face respectively; and x is the vector containing the vertex positions of the resultant identity-embodied artistic face; again, S is the 3D landmarks setting as constraints. This blending technique basically compensates between fitting the surface to HI-face and adding AS-face whilemaintainingthesurfacesmoothness. However,theresultantfacesfromthisblendingarerandomlymixedbetweentheAS-faceandtheHI-face. Tobeabletocontroltheresultantface,weintroduceanIdentityEngineering(IE)algorithmthatcapturesthe uniqueness of the HI-face and transfers it to the AS-face automatically. These techniques include the ability tocontrolthelevelofidentityuniquenessoftheHI-face. To be able to work on the facial identity, the PCA-based facial recognition technique [57] is exploited. Basically, the facial identity measurement is performed in the PCA space. In this work, we refer to the projected PCA coefficients of a face in the eigen subspace as its feature vector. In this subspace, we can interpolate the two feature vectors as a way to control the blending result. However, the combined face will loosetheuniquenessesofbothofthefacesasshowninFig.35becauseeachdimensioninthefeaturevectoris interpolatedequally. IntheIEalgorithm,eachdimensioninthefeaturevectoristreateddifferentlydepending onitsimportance. TheoverviewofIEalgorithmisschematicallyillustratedinFigure36. The IE algorithm consists of face registration, identity selection, and identity transfer steps. At the face registration step, we first register the provided AS-face with the reference face model through deformation transferring[53]asintheofflinedataprocessing(Section5.1). Then,theaffinetransformationsofallthetri- anglesoftheregisteredAS-faceareextracted. 
Finally,thelog(R)componentofthefaceistransformedtoits PCA feature vector by projecting it to the pre-constructed PCA subspace,α DG . Similar PCA transformation 46 Figure34: TheleftcolumnmodelistheAS-faceandthetoprowshowstheinputsketchimages. Thesurface synthesiswithchoice(1)isshowninthemiddlerow: using3DlandmarkstomorphtheAS-facedirectly.The surfaces are very similarto each other. On the other hand, choice (2) shown in the lastrow which blends the humansurfacestogetheryieldshighersurfacevariationsandmorenaturalsurfaces. 47 Figure 35: Comparison of different blending schemes. (Left): shape transferring via deformation gradi- ent [53], (Middle): half interpolation between their corresponding PCA feature vectors, (Right): our IE approach. Figure36: TheschematicoverviewoftheIEalgorithm. 48 isappliedtotheScomponent. Forconvenience,weshortlyuseV h andV c torefertotheobtainedPCAfeature vectoroftheHI-faceandtheAS-facerespectively. At the selection step, to identify the identity uniqueness, the likelihood of the V h showing in Eq. 14 is considered. L human = 1 (2π) M 2 exp{− 1 2 M ∑ i=1 V 2 h i v i } (14) Here, M is the number of the eigen-vectors, v i is the i th eigen-value, and V h i is i th coefficient of the feature vectorV h . L human is the likelihood ofV h which defines a similarity between the face and the average human face in which the lower the likelihood, the more unique of the face. From the Eq. 14, the uniqueness score of coefficient i can be calculated by V 2 h i v i . These uniqueness scores are the main criteria to select which coefficients to be transferred toV c . To make the scores comparable, the scores are normalized to be 0-1 and putinonevectorwiththesamedimensionasV h whichwecallitascorevectorS h . From the score vector S h , the coefficients ofV h which have a uniqueness score higher than the threshold setting from the slide-bar value (ranging from 0-1) are selected. Then, these selected coefficients are trans- ferred to V c to create the new vector e V c at the transfer step as described in Algorithm 2. Thus, when users movetheslide-barfromlefttoright,theresultantfacewillbegraduallychangedfromhighdefinitiontolow definition of the HI-face while the resultant face always contains the main identity of the HI-face. Figure 37 shows the different results obtained by varying the slide bar values. Figure 38 shows the 3D geometries of twoartistic-stylefaceresultswithhalfslide-barsetting. Next,wetransform e V c backtotheiraffinetransformationmatrixandtexturerepresentationtoreconstruct the3Dfacemodels. ThegeometryreconstructionissolvedbyEq.15. Algorithm2IdentityEngineering Input: V h ,V c ,S h ,threshold 1: fori=1tosizeof(V c )do 2: ifS h (i)≥threshold then 3: e V c (i)=V h (i); 4: else 5: e V c (i)=V c (i); 6: endif 7: endfor 49 Figure 37: Different results by varying a slide-bar. By sliding a bar to the right, the subtle identity from HI-facewillgraduallydisappeared. Figure38: Examplegeometricresultsbyourapproach. 50 arg x minkTx− e A c k (15) HereT isthelinearoperatoroftheaveragefacesurfaceand e A c isthereconstructedaffinetransformations (from e V c ), and x is the vector containing the vertex positions of the resultant identity-embodied artistic face. ThissparsematrixoptimizationcanbealsosolvedbyCholeskyfactorization. 5.5.2.1 User Interface and Control One primary design goal of our approach is to allow users to in- tuitively control the embodied levels of the human identity on the target artistic style. 
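The selection-and-transfer core behind this control (the uniqueness score of Eq. 14 followed by Algorithm 2) reduces to a few lines. A minimal NumPy sketch, in which threshold plays the role of the slide-bar value and the feature vectors in the usage example are random stand-ins:

import numpy as np

def identity_engineering(V_h, V_c, eigenvalues, threshold):
    """Copy into the AS-face feature vector the HI-face coefficients whose
    normalised uniqueness score V_h[i]^2 / eigenvalue[i] reaches the threshold
    (Eq. 14 and Algorithm 2)."""
    score = V_h ** 2 / eigenvalues
    score = score / score.max()                  # normalise scores to [0, 1]
    return np.where(score >= threshold, V_h, V_c)

# toy usage: a lower threshold keeps more of the HI-face identity
rng = np.random.default_rng(0)
V_h, V_c = rng.normal(size=30), rng.normal(size=30)
eigenvalues = rng.uniform(0.1, 1.0, 30)
V_tilde = identity_engineering(V_h, V_c, eigenvalues, threshold=0.3)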
Accordingly, our approachprovidesaslidebartousers(refertoFig39)toadjustthevaluefrom0to1. Figure39: Theruntimeuserinterfaceofourapproach. (Left): aninputhumanface(HI-face),(Middle): the resultant identity-embodied artistic face with a slide-bar to control a level of HI-face over the AS-face in the resultantface,(Right): aninputartisticface(AS-face). 51 5.6 TextureSynthesis To synthesize a facial texture, the gradient transfer technique similar to the one used in the surface synthesis (Section5.5)isemployedbutinthecolorspaceinsteadofthe3Dspace. 5.6.1 HumanTextureGeneration Firstly,atexturesearchengineschemeisemployedtosearchforappropriatetexturesinthedataset. Theesti- mated3Dlandmarksandtheselectedskintoneareusedasthesearchcriterionwithanassumptionthatsimilar textures share a similar landmark structure and skin tone. To properly search for the candidate textures, the searching is performed only within a specified group of genders (male or female) and ethnicities (European, Black, and Asian) upon the user’s inputs. Finally, a joint probability combines these two searching criterion togetherasshowninEq.16. P(LM,T|I)= P(LM|I)×P(T|I) (16) WhereI isthefacecandidateinthedataset(accordingtotheselectedgenderandethnicity)toberanked, P(LM|I) is the similarity score between the estimated 3D landmarks and I’s 3D landmarks measured by their Euclidean distance inα LM space, and P(T|I) is the likelihood score inψ Texture table from the selected skin tone to theskin tone of I. The highest score texture isthen used as thefacial texture candidate. Also, to providemorefacialtexturevariations,thesearchisperformedseparatelyoneachofthevariousfacialregions whichincludetheeyes,eyebrows,nose,mouth,andfacialoutline. Thus,eachfacialregionwillhaveitsown texturecandidate. Secondly, to be able to smoothly combine all regional textures together without producing artifacts (Fig- ure 41), the Poisson Image Editing [45] gradient transfer technique is employed as shown in Eq. 17. To preserve a global continuity between regions, the texture of the outline region is picked as the base texture and the other regional textures will be transferred to it. Figure 40 shows an example of a texture synthesis resultingfromthecombiningoffivefacialregionsfromfivetexturecandidates. arg F min ∑ allpixels kF−(F base −H R )k (17) Where F istheimagegradientofthecomposited texture, F base istheimagegradientofthebasedtexture 52 whichistheoutlinetexture,andH R istheimagegradientofthetexturecandidateoftheregionR. Figure 40: The resultant synthesized texture is shown in the middle. From the texture candidates, they are combined to the base texture (the outline region). Each texture candidate is represented by the unmarked texturearea. 5.6.2 ArtisticTextureGeneration Similar to the surface synthesis part, the texture also can be created in an artistic style from the provided artistic texture example. In this work, the Poisson image editing technique is then modified to capture the texture gradient between the HI-face (from Section 5.6.1) and an average human texture and to transfer the gradient to the AS-face texture (the base texture) as shown in Eq. 18. Note that, we transfer not only the texture gradient within the HI-face texture but also the gradient between the HI-face texture and an average humanfacetexture;thus,thevectorfieldrepresentationisexpandedaccordingly. 
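Looking back at the search step of Section 5.6.1, the joint ranking of Eq. 16 can be sketched as follows. Converting the landmark distance in α_LM space into a similarity with exp(-d) is an assumption made for this sketch, and the candidate records and likelihood table are hypothetical.

import numpy as np

def best_texture_candidate(lm_query, candidates, skin_likelihood):
    """Rank dataset faces (already filtered by gender and ethnicity) with the
    joint score of Eq. 16 and return the id of the best texture candidate."""
    best_id, best_score = None, -np.inf
    for cand in candidates:
        p_lm = np.exp(-np.linalg.norm(lm_query - cand["lm_coeffs"]))  # landmark similarity
        p_t = skin_likelihood[cand["id"]]                             # psi_Texture entry
        if p_lm * p_t > best_score:
            best_id, best_score = cand["id"], p_lm * p_t
    return best_id

# hypothetical usage for one facial region (e.g. the nose)
rng = np.random.default_rng(0)
cands = [{"id": i, "lm_coeffs": rng.normal(size=12)} for i in range(5)]
likes = {i: rng.uniform(0.01, 1.0) for i in range(5)}
print(best_texture_candidate(rng.normal(size=12), cands, likes))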
53 Figure 41: The comparison of synthesized textures compositing of the five texture candidates in Figure 40 showing from left to right: copy-and-paste of RGB, copy-and-paste of only V channel (from HSV), and our approach. Thecopy-and-pasteisperformedbyfillingthebasetexturewiththeotherregionaltextures. Since the facial texture is mainly differed in tone, transferring only V channel of HSV is also conducted. The artifactsofthecopy-and-pastearestillexistinbothcases. arg F min ∑ allpixels kF−(C−(H−A))k (18) WhereF istheimagegradientoftheresultanttexture,C istheimagegradientoftheAS-face,andH−A is the image gradient between the image gradient of HI-face (H) and the image gradient of an average face (A). Since the AS-face texture can be in any arbitrary color and not only limited to the reddish-base-color of the human skin texture, there is an alternative way to blend the human texture to the artistic texture without reveal the human reddish colors. In this alternative, HSV color space is used instead of the RGB color space to maintain the AS-face base-color. The tone channels (S and V) of HSV are the only channels to be transferredtotheartistictexturewhilethevaluesintheHuechannelarekeptintact. ThisisbecausetheHue channel represents the base-color of the AS-face while the S and V channels represents only the variation of the base-color. This will produce the texture that contains an appearance of the AS-face while only the tonevariationsaremodifiedaccordingtotheHI-face. Figures42illustratestheartisticfacesynthesisprocess comparingthetwoalternativestoblendthetextures. 54 Figure 42: The comparison of the artistic texture synthesis process on different color spaces. To emphasis the difference, the gradients are scaling up by 5. The RGB blending produces the texture with human- skin-base-color on the result while the HSV blending (maintaining Hue) produces the texture with the pure artistic-style-base-color. 55 5.7 Results Weconductedavarietyofexperimentstoevaluatethesynthesizedhumanfacesandtoillustratetheusefulness oftheartisticfacemodeling. 5.7.1 HumanFaceEvaluation Thecontrolledenvironmentexperimentisfirstconductedtoprovetheconceptofthehumanfacereconstruc- tion. Thisexperimentisachievedbyasetofassumptionstotheinputsketch. Theassumptionsare: • Theinputsketchispurelyinafrontalpositionwithoutminorrotationortranslation. • Thefocallength(d)forthesketchimageisaconstantvalueof22m.m. To set up the environment, the thirty ground-truth human face models are generated from FaceGen [51] tocreatethenovel3Dfacesoutsidethedataset. Fromtheseground-truthmodels,colorimagesofthefrontal facemodelsarerenderedwiththefixedfocallengthof22m.m. Asketchimageofeachmodelisthendrawn by an artist from its rendered image to create an input to the system. Figure 43 illustrates the evaluation processpipeline. Inthesystem,firstly,anASMbaseddetection[58]isusedtoroughlyextract2Dfaciallandmarks. Then, the landmarks are manually adjusted to fit to the sketch image via a simple user interface. Also, the facial descriptions(i.e. genderandethnicity)ofthefacesaremanuallylabeledtoreflecttheirdescriptions. Figure44 showsthe3Dlandmarkestimationerrorsfromthe2Dlandmarksofthesketches. Figure45andFigure46shows12examplesof30examplesgeneratedfromoursystemcomparingtotheir facesketchimages,theground-truthmodels. Figure47,Figure48,andFigure49showthereconstructedsur- face comparison between the ground-truths that are used to create the sketch inputs, Radial Basis Functions (RBF), Laplacian-based technique [44], and our approach. 
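The reconstruction errors referred to throughout these comparisons are per-vertex Euclidean distances to the ground-truth mesh, averaged over the test examples. A small NumPy sketch of that metric (the array shapes and the random data are assumptions):

import numpy as np

def surface_error(reconstructed, ground_truth):
    """Per-vertex Euclidean distance (in mesh units, here millimetres) between
    a reconstructed face and its ground-truth mesh; both arrays are (n_vertices, 3)."""
    return np.linalg.norm(reconstructed - ground_truth, axis=1)

def average_error_map(recon_set, gt_set):
    """Average the per-vertex errors over all test examples, giving the kind of
    error distribution visualised over the face in Figure 50 (A)."""
    return np.mean([surface_error(r, g) for r, g in zip(recon_set, gt_set)], axis=0)

# toy usage with thirty random face pairs of 2043 vertices
rng = np.random.default_rng(0)
recon = rng.normal(size=(30, 2043, 3))
gt = recon + rng.normal(scale=1.6, size=recon.shape)
print(average_error_map(recon, gt).mean())   # overall mean error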
Figure 50-(A) visualizes the distribution of the average surface reconstruction errors (in millimeters) over the face areas. Figure 50-(B) compares the re- construction errors of our approach with the RBF and the Laplacian technique [44]. Our approach produces higher accuracy than the RBF and Laplacian based techniques since more variety of face surface examples areutilizedtogeneratethesurface. 56 Figure 43: The pipeline of the evaluation process. The ground-truth 3D models are first rendered with fixed parameterstocreatecoloredfaceimages. Theartistsdrawthesketchestomatchthecoloredfaceimagesand thedrawnsketchesareusedtogeneratethe3Dmodelbyoursystem. Figure 44: The 3D landmark estimation errors of the sketches’ landmarks. The errors are the differences betweeneachestimated3Dlandmarkandeachoftheground-truth’s3Dlandmarksofthethirtyfaceexamples inmillimeters. (A):theerrordistributionofthe833Dlandmarksovertheface. (B):theerrorscomparingour 3D landmark estimation to the morphable model technique [5]. On average, our approach produces 1 m.m. lowerinerrorforeachlandmark 57 Figure 45: Example results from the system. The left-most column shows input sketches drawn by an artist from the rendered images of the ground-truth models. The middle column shows the ground-truth models (surface and textured face respectively). The right-most column shows the 3D human results generated by oursystem. 58 Figure 46: More example results from the system. The left-most column shows input sketches drawn by an artistfromtherenderedimagesoftheground-truthmodels. Themiddlecolumnshowstheground-truthmod- els (surface and textured face respectively). The right-most column shows the 3D human results generated byoursystem. 59 Figure47: ReconstructedsurfacecomparisonbetweenRBF,Laplacian,andourapproach. 60 Figure48: ReconstructedsurfacecomparisonbetweenRBF,Laplacian,andourapproach. 61 Figure49: ReconstructedsurfacecomparisonbetweenRBF,Laplacian,andourapproach. 62 Figure50: (A)Thedistributionofthesurfacereconstructionerrors(inmillimeters)overtheface. Theerrors aretheEuclideandistancesbetweeneachreconstructedvertextoitsground-truthvertex(averagedoverthirty face examples). (B) Reconstruction error comparison between our approach, the RBF, and Laplacian based technique. BothRBFandLaplacianbasedtechniquesareemployedtodeformtheaveragefacesurfacetothe 3Dlandmarkssettingasconstraints. Onaverage,ourapproachproduceslessthan1.6m.m. inerrorsoverall thesurfaces. 63 5.7.2 ArtisticFaceEvaluation To evaluate the artistic face modeling, the ground-truth of the artistic face models are required to be con- structed first. In this experiment, we employed IE algorithm (Section 5.5.2) and the artistic texture synthesis (Section 5.6.2) to produce the 3D artistic models as a ground-truth. The available human face sketches from Section 5.7.1 are then used to synthesize the the HI-faces to the system. Figure 52 and Figure 53 shows the artisticfacemodelsconstructedfromthismethod. Tokeepartisticstyleontheface,thethesketchconstraints areusedonlyoncraftingthehumanidentitybutarenotusedintheartisticfacecraftingprocess. Thethresh- olds from the slide-bar using in these experiments are between 0.1-0.5 to reflect higher human identity over theartistic. Figure51illustratestheartisticevaluationprocesspipeline. Figure51: Thepipelineoftheartisticevaluationprocess. Theground-truth3Dmodelsarefirstgeneratedby our IE algorithm and rendered with fixed parameters to create colored artistic face images. 
The artists draw thesketchestomatchthecoloredartisticfaceimagesandthedrawnsketchesareusedtogeneratetheartistic 3Dmodelbyoursystem. 64 Figure52: Examplesofartisticfaceresultsconstructedfromtheinputfacesketches. 65 Figure53: Examplesofartisticfaceresultsconstructedfromtheinputfacesketches. 66 SimilartotheSection5.7.1, toevaluate the3Dartistic-stylefacemodelreconstructionofinputsketches, theartistsdrawthefacesketchesfromtherenderedimagesofthepreparedartisticmodelsfromFigure52and Figure 53. The 2D landmarks are then extracted from the sketches by ASM based detection [58]. However, unlike the human face sketches, the artistic sketches are far away from a human shape, the landmarks are required to be manually adjusted to fit to the sketch image via our user interface. To synthesize the surface and its texture of the 3D models, the example cartoon of each model is also provided to the system. The 3D cartoon model is used to replace the average 3D human face in the components of depth estimation and surfacesynthesis. Thecartoontextureisusedinthetexturesynthesis. For3Dlandmarkestimation,sincethedepthoftheartistic-stylefacecanbearbitrary,theartisticexample isrequiredinestimatingitsdepth. Todoso,byreplacingthehumanaverage’s3Dlandmarkswiththeartistic example’s 3D landmarks in PPCA, the estimation is shifted from the human face to the artistic example face while keeping the variances the same. This results in the 3D landmarks with the artistic face depth in the variances of possible human identities. Figure 54 shows the depth estimation errors from using this technique. Figure 54: The 3D landmark estimation errors of the artistic sketches’ landmarks. The errors are the differ- ences between each estimated 3D landmark and each of the ground-truth’s 3D landmarks of the thirty face examples in millimeters. (A): the error distribution of the 83 3D landmarks over the face. (B): the errors comparing our 3D landmark estimation to the morphable model technique [5]. On average, our approach produces7m.m. lowerinerrorforeachlandmark Figure 55 and Figure 56 shows the reconstructed results by our approach. Figure 57 shows the recon- structedartistic-stylesurfacecomparisonbetweentheground-truthsthatareusedtocreatethesketchinputs, 67 Radial Basis Functions (RBF), Laplacian-based technique [44], and our approach. For RBF and Lapla- cian techniques, the original artistic model is used as the base model to be deformed according to the 3D landmarks. Figure58showsthereconstructederrorsbyourapproachcomparingtoRBFandLaplaciantech- niques. For the artistic-style face, the laplacian is performed poorly in visual contest because the technique doesnotawareofthehumanfacestructurebutonlytryingtokeeptheneighborverticesinthesimilarcurves oftheoriginalmodel. 68 Figure55: Examplesofartisticfaceresultsconstructedfromtheinputartisticsketches. 69 Figure56: Examplesofartisticfaceresultsconstructedfromtheinputartisticsketches. 70 Figure57: Reconstructedartistic-stylesurfacecomparisonbetweenRBF,Laplacian,andourapproach. 71 Figure 58: (A) The distribution of the surface reconstruction errors (in millimeters) over the face. The errorsaretheEuclideandistancesbetweeneachreconstructedvertextoitsground-truthvertex(averagedover twelve face examples). (B) Reconstruction error comparison between our approach, the RBF, and Laplacian basedtechnique. BothRBFandLaplacianbasedtechniquesareemployedtodeformtheexample3Dartistic surfaces to the 3D landmarks setting as constraints. On average, our approach produces less than 7 m.m. 
in errorsoverallthesurfaces. 72 5.7.3 2DPortraitSketchingInterfaceEvaluation To evaluate the usability of the 2D portrait sketching interface, a preliminary user study is conducted. Six novice users were invited to participate in the study. As a familiarization step, all the participants are in- structed how to use the system for about 5 minutes. In the study, the participants were asked to draw a portrait. Then, given additional information of gender, ethnicity, and skin tone, our system was used to gen- erateitsfinal3Dfacemodelwithitstexture. Theparticipantswereaskedtodrawtargetfacesbasedongiven photo examples. Each participant was required to use our system to draw target faces as quickly as possible while maintaining the quality as much as they can. The used crafting time was recorded, and the produced final 3D face models were also retained. It is found that using the system, on average the participants spent less than three minutes to sketch a face portrait and to generate its corresponding 3D face model. Fig. 59 shows several examples of the final face models and corresponding crafting time used by the participants in the study. In this figure, the additional information are shown in the parenthesis in the format of (gender, ethnicity,skintone). Figure59: 2DPortraitSketchingevaluation. 73 To evaluate how our system can be used for general-purpose face drawing applications, the participants were also asked to perform a second experiment on generating faces based on their imagination with addi- tional identity information. All of the participants finished this experiment within 40 seconds; and some of the results are shown in Fig. 60. In this figure, the additional information are shown in the parenthesis in the formatof(gender,ethnicity,skintone)where“-”meansN/A(notavailable). Notethattheparticipantsspent significantly less time in the second user study comparing to the first one, because they were not required to carefully draw face shapes to match any specific photo examples. In addition, an empirical validation study isperformedtoanalyzehowtheresultingfacemodeldependsonstep-by-stepuserinputs(Fig.61showsone example). AsshowninFig.61,whendifferentuserinputswereprovided,theresultingfacewasprogressively refined. In this figure, the user’s inputs are shown in the parenthesis in the form of (gender, ethnicity, skin tone) where “-” represents N/A (not available). In the last column, by changing only the eyes on the sketch (showingonthetop),theresultingfacetextureisalsochangedaccordingly. 74 Figure60: Examplesofgeneratedfacesbasedontheusers’sketchesandpartialidentityinformation. 75 Figure61: Anexampleoftheempiricalvalidationstudy. 76 5.8 Conclusion This section presents a technique to construct a 3D face model from a sparse set of facial landmarks of a portraitsketchimage. Fromasketchimage,thefaciallandmarkdepthsareestimatedintheperspectiveview. Then, 3D meshes as well as a plausible texture are synthesized to fit the landmarks automatically. To extend thesurfaceandtexturegeneration,thesurfaceandtextureblendingtechniquesaredevelopedforaartisticface modelingaswell. Theevaluationshowsthatoursystemperformswellevenwhenusingonly83landmarksto reconstruct the 3D face. This work can be used in a number of applications in various industries such in the entertainmentindustryandinlawenforcement. Oneofthelimitationsofthisworkisthatitcanonlysynthesizeaneutralfacialexpression. Toovercome this issue, the dataset can be extended to cover the other 3D facial expressions. 
Also, the number of face models (100 faces) in the dataset is relatively small, which may not be sufficient to fit a probabilistic model (the Gaussian distribution assumption). The resolution of the prototype face (i.e. the number of vertices and texture size) is also can be a lot higher to produce more realistic face. Besides, this work only focuses on a frontalportraitinput;thus,auserinterfacecanbedevelopedforadjustingthesurfacefrommanyotherviews. Moreover,manyportraitsketchesnormallycontainmorehatchingandshadinginformationsuchaswrinkles, ridges, and valleys, than those comprising of salient facial feature lines. This information, though not as important as the salient facial feature lines, can produce a more realistic result. So, the inverse-NPR can be exploredtomakeuseofthisinformationinthefuture. Fortheartisticfacemodeling,weplantoexploreother intuitive user interfaces such as a suggestive user interface [13] that can suggests plausible artistic models to auserfromanexampleartisticset. Finally,ourworkcanbefurtherextended toother3Dmodelingdomains wheretheir3Dmodelscontainspecificpatterns. 77 6 Discussion A 3D face modeling is one of the most difficult 3D modeling job since the face structure is very sensitive for fault pattern detection by human eyes. This work focuses on producing the 3D face model automatically fromaverysparsebutmeaningfulinformationofaportraitsketch. Thedifficultyofthisworkliesinfinding anapproachtoproperlydigestthedatasetintosemanticrepresentationsandtorecomposethembacktotheir original forms as natural as possible. This approach is robust in creating a face model but it makes a lot of assumptions. The problems from wrong assumptions can lead to fail cases such as if the sketch image is not drawn fromtheangle directto theface. Inthiscase, minor camera rotation and translation need to be taking into account as well. The other views of sketch are also an another problem. Side-view and 3/4 view (tilted the face from z-axis by 45 degree) are popular sketch images drawn by artists. The approach to deal with theseviewsarerequiredtobeaddedtothesystem. Apartfromthe3Dfacemodeling,thisapproachcouldbealsousedintheotherdomainswherethepattern ofshapesandtexturesoftheobjectscouldbedefined. Tobeabletoextendthisframeworktotheotherobject domain, first, the landmarks that can be connected to create a recognizable sketch of the object must be identified. Then, the collection of 3D models of the object required to be digested to create the internal representations of the object. Finally, the surface and texture synthesis engines are developed to serve the specific object properties. The smart sketching interface and intuitive editor can also be developed to help a user crafts the 3D model in more details. For example, a human body modeling can be easily adapted to use thisapproach. 78 References [1] T. Beeler, B. Bickel, P. Beardsley, B. Sumner, and M. Gross. High-quality single-shot capture of facial geometry. ACMTrans.onGraphics(Proc.SIGGRAPH),29(3):40:1–40:9,2010. [2] T.Beeler,F.Hahn,D.Bradley,B.Bickel,P.Beardsley,C.Gotsman,R.W.Sumner,andM.Gross. High- quality passive facial performance capture using anchor frames. ACM Trans. Graph., 30:75:1–75:10, August2011. [3] I.Biometrix. Faces,http://www.iqbiometrix.com/,2007. [4] V. Blanz, I. Albrecht, J. Haber, and H.-P. Seidel. Creating face models from vague mental images. Comput.Graph.Forum,25(3):645–654,2006. [5] V.Blanz,A.Mehl,T.Vetter,andH.-P.Seidel. Astatisticalmethodforrobust3dsurfacereconstruction from sparse data. 
In Proceedings of the 3D Data Processing, Visualization, and Transmission, 2nd International Symposium, 3DPVT ’04, pages 293–300, Washington, DC, USA, 2004. IEEE Computer Society. [6] V. Blanz and T. Vetter. A morphable model for the synthesis of 3D faces. In Proc. of SIGGRAPH’99, pages187–194,1999. [7] D.Bourguignon,R.Chaine,M.-P.Cani,andG.Drettakis. Relief: Amodelingbydrawingtool. InProc. ofEurographics WorkshoponSketch-based InterfacesandModeling,pages151–160,2004. [8] D. Bradley, W. Heidrich, T. Popa, and A. Sheffer. High resolution passive facial performance capture. ACMTrans.Graph.,29:41:1–41:10,July2010. [9] S.E.Brennan. Thecaricaturegenerator: Leonardo. volume18,pages170–178.ACM,1985. [10] P. Breuer, K. in Kim, W. Kienzle, B. Schlkopf, and V. Blanz. Automatic 3d face reconstruction from singleimagesorvideo. InAutomaticFace&GestureRecognition, 2008.,2008. [11] Y.Cao,P.Faloutsos,andF.Pighin. Unsupervisedlearningforspeechmotionediting. InSCA’03: Proc. ofthe2003ACMSIGGRAPH/Eurographics SymposiumonComputerAnimation,2003. [12] E. Chang and O. Jenkins. Sketching articulation and pose for facial animation. In ACM SIG- GRAPH/Eurographics SymposiumonComputer Animation,2006. [13] S.Chaudhuri,E.Kalogerakis,L.Guibas,andV.Koltun. Probabilisticreasoningforassembly-based3D modeling. ACMTransactions onGraphics(Proc.SIGGRAPH),30(4),2011. 79 [14] B. W. Choe and H. S. Ko. Analysis and synthesis of facial expressions with hand-generated muscle actuationbasis. InProc.ofIEEEComputer Animation,pages12–19,2001. [15] E.deAguiar,C.Theobalt,C.Stoll,andH.-P.Seidel. Marker-lessdeformablemeshtrackingforhuman shapeandmotioncapture. InProc.ofIEEECVPR’07,June2007. [16] D.DeCarlo,D.Metaxas,andM.Stone. Ananthropometricfacemodelusingvariationaltechniques. In Proc.ofSIGGRAPH’98,pages67–74,1998. [17] G. M. Draper and P. K. Egbert. A gestural interface to free-form deformation. In GI’03: Proc. of GraphicsInterface’03,pages113–120,2003. [18] FaceGen. http://www.facegen.com,2009. [19] L.G.Farkas. AnthropometryoftheHeadandFaces. RavenPress(USA),secondeditionedition,1994. [20] A.Ghosh,G.Fyffe,B.Tunwattanapong,J.Busch,X.Yu,andP.Debevec. Multiviewfacecaptureusing polarizedsphericalgradientillumination. InProceedingsofthe2011SIGGRAPHAsiaConference,SA ’11,pages129:1–129:10,NewYork,NY,USA,2011.ACM. [21] Y.Gingold,T.Igarashi,andD.Zorin.Structuredannotationsfor2D-to-3Dmodeling.ACMTransactions onGraphics (TOG),28(5):148,2009. [22] O. Gunnarsson and S. Maddock. Sketching faces. In Proc. Eurographics Workshop on Sketch-Based InterfacesandModeling,2008. [23] R.-L. Hsu and A. K. Jain. Generating discriminating cartoon faces using interacting snakes. IEEE Trans.PatternAnal.Mach.Intell.,25(11):1388–1398,2003. [24] Identi-It. Identi-itsoftware,http://www.identikit.net/,2011. [25] T.Igarashi,S.Matsuoka,andH.Tanaka. Teddy: asketchinginterfacefor3Dfreeformdesign. InProc. ofSIGGRAPH’99,pages409–416,1999. [26] ITC-irst. Spotit,http://spotit.itc.it/,2007. [27] P. Joshi, W. Tien, M. Desbrun, and F. Pighin. Learning controls for blend shape based realistic facial animation. In SCA’03: Proc. of the 2003 ACM SIGGRAPH/Eurographics Symposium on Computer Animation,pages35–42,2003. 80 [28] R. Kalnins, L. Markosian, B. Meier, M. Kowalski, J. Lee, P. Davidson, M. Webb, J. Hughes, and A. Finkelstein. WYSIWYG NPR: drawing strokes directly on 3D models. In Proc. SIGGRAPH 2002, pages755–762,2002. [29] O. Karpenko, J. Hughes, and R. Raskar. Free-form sketching with variational implicit surfaces. Com- puterGraphicsForum,21(3):585–594,2002. [30] O. A. 
Karpenko and J. F. Hughes. Smoothsketch: 3D free-form shapes from complex sketches. ACM Trans.Graph.,25(3):589–598,2006. [31] H. Koshimizu, M. Tominaga, T. Fujiwara, and K. Murakami. On kansei facial image processing for computerized facial caricaturing system picasso. In Systems, Man, and Cybernetics, 1999. IEEE SMC ’99ConferenceProceedings.1999IEEEInternationalConferenceon,volume6,pages294–299,1999. [32] S. Kshirsagar, S. Garchery, and N. M. Thalmann. Feature point based mesh deformation applied to mpeg-4facialanimation. InProc.Deform’2000,WorkshoponVirtualHumansbyIFIPWorkingGroup 5.10,pages23–34,November2000. [33] M. Lau, J. Chai, Y.-Q. Xu, and H.-Y. Shum. Face poser: Interactive modeling of 3d facial expressions usingfacialpriors. ACMTrans.Graph.,29(1):3:1–3:17,Dec.2009. [34] J. Lee, J. Lee, B. Moghaddam, B. Moghaddam, H. Pfister, H. Pfister, R. Machiraju, and R. Machiraju. Abilinearilluminationmodelforrobustfacerecognition. InICCV05,pages1177–1184,2005. [35] J. Lewis, J. Mooser, Z. Deng, and U. Neumann. Reducing blendshape interference by selected motion attenuation. In I3DG’05: Proc. of the 2005 ACM SIGGRAPH Symposium on Interactive 3D Graphics andGames,pages25–29,2005. [36] H. Li, T. Weise, and M. Pauly. Example-based facial rigging. ACM Transactions on Graphics (Pro- ceedings SIGGRAPH2010),29(3),July2010. [37] L.Liang,H.Chen,Y.-Q.Xu,andH.-Y.Shum. Example-basedcaricaturegenerationwithexaggeration. In PG ’02: Proceedings of the 10th Pacific Conference on Computer Graphics and Applications, page 386.IEEEComputerSociety,2002. [38] H.Lipson. Creatinggeometryfromsketch-basedinput. InACMSIGGRAPH2007courses,2007. [39] J. Liu, Y. Chen, and W. Gao. Mapping learning in eigenspace for harmonious caricature generation. In MULTIMEDIA ’06: Proceedings of the 14th annual ACM international conference on Multimedia, pages683–686.ACM,2006. 81 [40] M. Meyer and J. Anderson. Key point subspace acceleration and soft caching. In ACM SIGGRAPH 2007papers,SIGGRAPH’07,NewYork,NY,USA,2007.ACM. [41] Z. Mo, J. P. Lewis, and U. Neumann. Improved automatic caricature by feature normalization and exaggeration. InACMSIGGRAPH2004Sketches,page57.ACM,2004. [42] A. Moore. An intoductory tutorial on KD-trees. PhD Thesis: Efficient Memory based Learning for RobotControl,PhDThesisTechnical ReportNo.209,1990. UniversityofCambridge. [43] A.MooreandJ.Ostlund. Simplekd-treelibrary. http://www.autonlab.org/,2007. [44] A. Nealen, T. Igarashi, O. Sorkine, and M. Alexa. Fibermesh: designing freeform surfaces with 3D curves. ACMTrans.Graph.,26(3),2007. [45] P.P´ erez,M.Gangnet,andA.Blake. Poissonimageediting. ACMTrans.Graph.,22(3):313–318,2003. [46] S. Romdhani and T. Vetter. Estimating 3d shape and texture using pixel intensity, edges, specular highlights, texture constraints and a prior. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05) - Volume 2 - Volume 02, CVPR ’05, pages 986–993, Washington, DC,USA,2005.IEEEComputerSociety. [47] J.M.Saragih, S.Lucey, andJ.F.Cohn. Deformable modelfittingbyregularized landmark mean-shift. Int.J.Comput.Vision,91:200–215,January2011. [48] Y.Seol,J.Lewis,J.Seo,B.Choi,K.Anjyo,andJ.Noh. Spacetimeexpressioncloningforblendshapes. ACMTrans.Graph.,31(2):14:1–14:12,Apr.2012. [49] K. Shoemake and T. Duff. Matrix animation and polar decomposition. In Proceedings of the confer- ence on Graphics interface ’92, pages 258–264, San Francisco, CA, USA, 1992. Morgan Kaufmann PublishersInc. [50] E.Sifakis,I.Neverov,andR.Fedkiw. 
Automaticdeterminationoffacialmuscleactivationsfromsparse motioncapturemarkerdata. ACMTrans.Graph.,24(3):417–425,2005. [51] Singular-Inversions. Facegen,http://www.facegen.com/,2007. [52] W. A. Smith and E. R. Hancock. Facial shape-from-shading and recognition using principal geodesic analysisandrobuststatistics. Int.J.Comput. Vision,76:71–91,January2008. [53] R.W.SumnerandJ.Popovi´ c.Deformationtransferfortrianglemeshes.ACMTrans.Graph.,23(3):399– 405,2004. 82 [54] R.W.Sumner,M.Zwicker,C.Gotsman,andJ.Popovi´ c.Mesh-basedinversekinematics.InSIGGRAPH ’05: ACMSIGGRAPH2005Papers,pages488–495.ACM,2005. [55] J. R. Tena, F. De la Torre, and I. Matthews. Interactive region-based linear 3d face models. In ACM SIGGRAPH2011papers,pages76:1–76:10,NewYork,NY,USA,2011.ACM. [56] M. E. Tipping and C. M. Bishop. Mixtures of probabilistic principal component analyzers. Neural Comput.,11(2):443–482,1999. [57] M.A.TurkandA.P.Pentland. Facerecognitionusingeigenfaces. InCVPR’91,pages586–591,1991. [58] B. van Ginneken, A. F. Frangi, J. J. Staal, B. M. T. H. Romeny, and M. A. Viergever. Active shape modelsegmentationwithoptimalfeatures. IEEETransactionsonMedicalImaging,21:924–933,2002. [59] D. Vlasic, M. Brand, H. Pfister, and J. Popovi´ c. Face transfer with multilinear models. ACM Trans. Graph.,24(3):426–433,2005. [60] X. Wang and X. Tang. Face photo-sketch synthesis and recognition. IEEE Transactions on Pattern AnalysisandMachineIntelligence,31(11):1955–1967,2009. [61] T. Weise, S. Bouaziz, H. Li, and M. Pauly. Realtime performance-based facial animation. In ACM SIGGRAPH2011papers,SIGGRAPH’11,pages77:1–77:10,NewYork,NY,USA,2011.ACM. [62] T. Weise, H. Li, L. V. Gool, and M. Pauly. Face/off: Live facial puppetry. In Proceedings of the 2009 ACMSIGGRAPH/EurographicsSymposiumonComputeranimation(Proc.SCA’09).EurographicsAs- sociation,August2009. [63] L.Yin, X.Wei, Y. Sun, J.Wang, and M.J.Rosato. A 3D facial expression database for facial behavior research. InFGR’06,pages211–216,2006. [64] L.Zhang,N.Snavely,B.Curless,andS.M.Seitz. Spacetimefaces: High-resolutioncaptureformodel- ingandanimation. InACMAnnual Conference onComputerGraphics,pages548–558,August2004. [65] Q.Zhang,Z.Liu,B.Guo,andH.Shum. Geometry-drivenphotorealisticfacialexpressionsynthesis. In SCA’03: Proc.ofthe2003ACMSIGGRAPH/EurographicsSymposiumonComputerAnimation,pages 177–186,2003. 83