Facial age grouping and estimation via ensemble learning
(USC Thesis Other)
FACIAL AGE GROUPING AND ESTIMATION VIA ENSEMBLE LEARNING

by Kuan-Hsien Liu

A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (ELECTRICAL ENGINEERING)

August 2014

Copyright 2014 Kuan-Hsien Liu

To my family and especially to my parents.

Acknowledgments

I would like to thank my advisor Prof. C.-C. Jay Kuo for his perseverant cultivation and guidance. I would like to thank both Prof. C.-C. Jay Kuo and Prof. Shuicheng Yan for research discussion and paper revision. I would also like to thank all my colleagues who have assisted me.

I would like to thank Prof. Alexander Sawchuk, Prof. David D'Argenio, Prof. Panayiotis Georgiou, and Prof. Suya You for their useful suggestions at my Qualifying Exam. I would also like to express my gratitude to Prof. Panayiotis Georgiou, Prof. Aiichiro Nakano, and Prof. Suya You for their valuable comments at my Ph.D. Defense.

I am grateful to have my brother, Tsung-Jung Liu, stay with me the whole time during my Ph.D. study. He encourages and helps me on both research discussion and general life.

Table of Contents

Dedication
Acknowledgments
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
1.1 Significance of the Research
1.2 Review of Previous Work
1.3 Contributions of the Research
1.4 Organization of the Dissertation
Chapter 2: Research Background
2.1 Soft Biometrics
2.1.1 Gender Recognition
2.1.2 Race Classification
2.1.3 Age Classification and Estimation
2.2 Classification and Regression Using a Machine Learning Method
2.2.1 Classification
2.2.2 Regression
Chapter 3: Age Group Classification via Structured Fusion of Uncertainty-driven Shape Features and Selected Surface Features
3.1 Introduction
3.2 Related Work
3.3 Structured Fusion Framework for Age Group Classification
3.3.1 Shape-based Classifier
3.3.2 Surface-based Classifier
3.3.3 Two-stage Structured Fusion Framework
3.4 Experimental Results
3.4.1 Results of Shape-based Classifier (1st Stage)
3.4.2 Results of Surface-based Classifier (2nd Stage)
3.4.3 Results of the Structured Fusion System
3.5 Conclusion and Future Work
Chapter 4: Age Estimation via Multistage Learning: from Age Grouping to Decision Fusion
4.1 Introduction
4.2 Related Work
4.2.1 Review of Age Grouping
4.2.2 Review of Age Estimation
4.3 Overview of Proposed GEF System
4.4 Age Grouping
4.4.1 Feature Extraction
4.4.2 Feature Selection
4.4.3 Age Classification
4.5 Age Estimation within Age Groups
4.5.1 Facial Components Detection
4.5.2 Feature Extraction in Facial Components
4.5.3 Age Estimators Learning
4.6 Fusion of Decisions
4.6.1 Diversity Analysis
4.6.2 Intra-system Fusion (AF)
4.6.3 Inter-system Fusion (EF)
4.6.4 Intra-inter Fusion (AEF)
4.6.5 Inter-intra Fusion (EAF)
4.6.6 Maximum Diversity Fusion (MDF)
4.6.7 Composite Fusion (CF)
4.7 Experiments
4.7.1 Database
4.7.2 Results of Age Grouping
4.7.3 Results of Age Estimation within Age Groups
4.7.4 Results of Fusion of Decisions
4.7.5 Complexity Comparison
4.7.6 Performance Comparison
4.8 Conclusion and Future Work
Chapter 5: Facial Age Estimation via Multistage Learning and Deep Fusion: from Gender Grouping to Outlier Prediction and Error Compensation
5.1 Introduction
5.2 Related Work
5.2.1 Review of Gender Grouping
5.2.2 Review of Age Grouping
5.2.3 Review of Age Estimation
5.3 Gender Grouping (1st Stage)
5.3.1 Feature Extraction
5.3.2 Feature Selection by Fisher Score
5.3.3 Classification
5.4 Age Grouping (2nd Stage)
5.4.1 Feature Extraction
5.4.2 Feature Selection by ANOVA
5.4.3 Classification
5.5 Age Estimation with Age Groups (3rd Stage)
5.5.1 Feature Extraction
5.5.2 Estimation within Groups
5.6 Fusion of Decisions (4th Stage)
5.6.1 Diversity Analysis
5.6.2 Intra-system Fusion (AF)
5.6.3 Inter-system Fusion (EF)
5.6.4 Intra-inter Fusion (AEF)
5.6.5 Inter-intra Fusion (EAF)
5.6.6 Composite Fusion (CF)
5.7 Outlier Prediction and Error Compensation (5th Stage)
5.7.1 Definition of Outlier
5.7.2 Feature Representation and Selection
5.7.3 Outlier Prediction
5.7.4 Error Compensation on Outlier
5.8 Experimental Results
5.8.1 Database
5.8.2 Results of Gender Grouping
5.8.3 Results of Age Grouping
5.8.4 Results of Age Estimation within Age Groups
5.8.5 Results of Fusion of Decisions
5.8.6 Results of Outlier Prediction and Error Compensation
5.8.7 Performance Comparison
5.9 Conclusion and Future Work
Chapter 6: Conclusion and Future Work
6.1 Summary of the Research
6.2 Future Research Directions
Bibliography

List of Tables

3.1 Age Range Distribution on the FG-NET and MORPH-II Databases
3.2 The Statistics of Shape Features in FG-NET
3.3 The Chi-square Goodness-of-fit Test against the Standard Normal in FG-NET. The Critical Value for the χ² Distribution Is 5.991 with the Degree of Freedom = 2
3.4 Performance of Shape Feature Methods in FG-NET
3.5 Age Classification Results by GOP+SVM
3.6 Age Classification Results by GOP+ANOVA+SVM
3.7 Performance of Different Methods Tested on FG-NET and MORPH Datasets for Three Age Groups Classification, where GAS: GOP+ANOVA+SVM, CAGAS: CirFace+Angle + GOP+ANOVA+SVM
4.1 The 12 Features Used in the 2nd Stage
4.2 Correlation (p) between Any Two Decisions d_1 to d_12 in the 3-group System (s_3), Mean Correlation = 0.9333, on FG-NET
4.3 Correlation (p) between Any Two m-group Systems (s_3 to s_10) for Decision d_1, Mean Correlation = 0.7777, on FG-NET
4.4 Age Range Distribution on the FG-NET and MORPH-II Databases
4.5 Age Range Definition for Age Groups of m-group Age Grouping Systems on FG-NET Database and Age Grouping Results (m = No. of groups)
4.6 Age Range Definition for Age Groups of m-group Age Grouping Systems on MORPH-II Database and Age Grouping Results (m = No. of groups)
4.7 Performance Comparison between 'Without Feature Selection' and 'With Feature Selection' for Age Grouping on FG-NET Database, where FDR: Feature Dimension Reduction
4.8 Age Estimation Results (in MAE) of m-group Age Estimation Systems on FG-NET Database
4.9 Age Estimation Results (in MAE) of m-group Age Estimation Systems on MORPH-II Database
4.10 Intra Fusion - Finding Decision Subset by SFS for the 3-group System on FG-NET
4.11 Age Estimation Results by Intra Fusion on FG-NET
4.12 Age Estimation Results by Intra Fusion on MORPH-II
4.13 Inter Fusion - Finding System Subset for d_1 by SBS on FG-NET
4.14 Age Estimation Results by Inter Fusion on FG-NET
4.15 Age Estimation Results by Inter Fusion on MORPH-II
4.16 Age Estimation Results by Intra-inter Fusion (AEF) on FG-NET and MORPH-II
4.17 Age Estimation Results by Inter-intra Fusion (EAF) on FG-NET and MORPH-II
4.18 Age Estimation Results by Composite Fusion (CF) in FG-NET and MORPH-II
4.19 Total Number of Required Arithmetic Operations for Selecting N from 12 Decisions in Intra and Inter-intra Fusions
4.20 Total Number of Required Arithmetic Operations for Selecting N from 8 Decisions in Inter and Intra-inter Fusions
4.21 Total Number of Required Arithmetic Operations for Selecting N from 96 Decisions in Maximum Diversity Fusion and Composite Fusion
4.22 MAEs of Different Age Estimation Algorithms on the FG-NET Aging Database
4.23 MAEs of Different Age Estimation Algorithms on the MORPH-II Aging Database
5.1 Age Range Distribution on the FG-NET and MORPH-II Databases
5.2 The Number of Faces by Gender on the FG-NET and MORPH-II Databases
5.3 Gender Grouping Accuracies on the FG-NET and MORPH-II Databases
5.4 Definition of Age Groups and Age Grouping Results on FG-NET Database (m = No. of Groups)
5.5 Definition of Age Groups and Age Grouping Results on MORPH-II Database (m = No. of Groups)
5.6 MAE (in Years) Results of Age Estimation within m Age Groups on FG-NET Database
5.7 MAE (in Years) Results of Age Estimation within m Age Groups on MORPH-II Database
5.8 Correlation (p) between Any Two Decisions d_1 to d_12 in the 3-group System (s_3) on FG-NET; Average Correlation = 0.9333
5.9 Correlation (p) between Any Two m-group Systems (s_3 to s_10) for Decision d_1 on FG-NET; Average Correlation = 0.7777
5.10 MAE (in Years) Results of Intra Fusion (AF) on FG-NET
5.11 MAE (in Years) Results of Intra Fusion (AF) on MORPH-II
5.12 MAE (in Years) Results of Inter Fusion (EF) on FG-NET
5.13 MAE (in Years) Results of Inter Fusion (EF) on MORPH-II
5.14 MAE (in Years) Results of Intra-inter Fusion (AEF) on FG-NET and MORPH-II
5.15 MAE (in Years) Results of Inter-intra Fusion (EAF) on FG-NET and MORPH-II
5.16 MAE (in Years) Results of Composite Fusion (CF) on FG-NET and MORPH-II
5.17 Performance of Outlier Prediction on FG-NET
5.18 MAE (in Years) Results after Error Compensation for Outliers Defined from Composite Fusion (CF) on FG-NET
5.19 MAE (in Years) Results of Different Age Estimation Methods on the FG-NET Aging Database

List of Figures

3.1 A cascaded age group classification framework.
3.2 The "Angle" shape feature is the angle C in the triangle LRC, where L is the center of the left eye, R is the center of the right eye and C is the chin.
3.3 Relationship between "CirFace" and "Angle".
3.4 Some faces from the FG-NET database.
3.5 Some faces from the MORPH-II database.
3.6 Modeling shape features as normal distributions, top: CirFace; middle: Angle; bottom: jointly CirFace and Angle.
3.7 The relationship between accuracy and the percentages of selected GOP features in the surface-based classifier.
3.8 Classification accuracy vs. parameter K.
4.1 The age grouping system.
4.2 An example of facial components detection.
4.3 Age estimation within age groups.
4.4 The m-group age estimation system.
4.5 The intra-system fusion (AF) scheme.
4.6 The inter-system fusion (EF) scheme.
4.7 The intra-inter fusion (AEF) scheme.
4.8 The inter-intra fusion (EAF) scheme.
4.9 Some facial images from the FG-NET database.
4.10 Some facial images from the MORPH-II database.
4.11 Cumulative score (CS) curves of the error levels from 0 to 10 years of different age estimation algorithms on the FG-NET aging database.
4.12 Cumulative score (CS) curves of the error levels from 0 to 10 years of our age estimation algorithms on the MORPH-II database.
5.1 The gender grouping system (1st stage).
5.2 The age grouping system (2nd stage).
5.3 The age estimation within age group system (3rd stage).
5.4 The fusion of decisions: intra fusion (AF) and inter fusion (EF) (4th stage).
5.5 The fusion of decisions: intra-inter fusion (AEF) and inter-intra fusion (EAF) (4th stage).
5.6 Outlier prediction and error compensation (5th stage).
5.7 The multistage learning framework for age estimation.
5.8 Some sample faces from FG-NET database.
5.9 Some sample faces from MORPH-II database.
5.10 Cumulative scores (CS) of different age estimation methods on the FG-NET aging database.

Abstract

Age estimation has attracted a lot of attention over the last decade. This dissertation includes six chapters. In Chapter 1, we give an introduction to this dissertation, including the significance of the research, the contributions of the research, and the organization of the dissertation. The previous work in this area is also thoroughly reviewed.

In Chapter 2, we provide the research background, which includes a brief review of related work on soft biometrics, gender recognition, race classification, age group classification and age estimation. It addresses several feature extraction methods that could be useful in representing facial aging features. Also, some classification and regression algorithms used for age grouping and estimation are discussed.

In Chapter 3, we present a structured fusion method for facial age group classification. To utilize the structured fusion of shape features and surface features, we introduce the region of certainty (ROC), which not only controls the classification accuracy of the shape-feature-based system but also reduces the classification load on the surface-feature-based system. In the first stage, we design two shape features, which can be used to classify frontal faces with high accuracy. In the second stage, a surface feature is adopted and then selected by a statistical method. The statistically selected surface features combined with an SVM classifier can offer high classification rates.
By properly adjusting the ROC through a single non-sensitive parameter, the structured fusion of the two stages provides a performance improvement. In the experiments, we use face images from the publicly available FG-NET and MORPH databases and partition them into three pre-defined age groups. It is observed that the proposed method offers a correct classification rate of 95.1% on FG-NET and 93.7% on MORPH, which outperforms state-of-the-art methods by a significant margin.

In Chapter 4, we present a novel multistage learning system, called Grouping-Estimation-Fusion (GEF), for human age estimation via facial images. The GEF consists of three stages: 1) age grouping; 2) age estimation within each age group; and 3) decision fusion for final age estimation. In the first stage, faces are classified into different groups, where each group has a different age range. In the second stage, three methods are adopted to extract global features from the whole face and local features from facial components (e.g., eyes, nose, and mouth). Each global or local feature is individually utilized for age estimation in each group. Thus, several decisions (i.e., estimation results) are derived. In the third stage, we obtain the final estimated age by fusing the diverse decisions from the second stage. To create diverse decisions for fusion, we investigate multiple age grouping systems in the first stage, where each system has a different number of groups and different age ranges. Various decisions can thus be made in the second stage and delivered to the third stage for fusion, where six fusion schemes (i.e., intra-system fusion, inter-system fusion, intra-inter fusion, inter-intra fusion, maximum diversity fusion and composite fusion) are developed and compared. The performance of the GEF system is evaluated on the FG-NET and the MORPH-II databases, and it outperforms existing state-of-the-art age estimation methods by a significant margin.
That is, the mean absolute errors (MAEs) of age estimation are reduced from 4.48 to 2.73 on FG-NET and from 3.98 to 2.91 on MORPH-II.

In Chapter 5, the GEF framework is extended. Using a single machine learning model for facial age estimation poses several challenges: people of different genders and age ranges tend to have different aging processes. To account for these effects, we propose a multistage (5-stage) learning method for facial age estimation. In the first stage (gender grouping), a model is trained to classify faces into male and female groups. In the second stage (age grouping), faces in the male or female group are then classified into age groups, where each group has a different age range. In the third stage (age estimation within age groups), each age group has its own trained model to predict the ages of faces classified into that group. In the fourth stage (fusion of decisions), decisions (i.e., age estimation results from the third stage) are selected for fusion based on their diversity. To make fusion effective, diversity is created by generating several age grouping systems and adopting different facial features for age estimation, which yields a variety of decisions. In the final stage, through error analysis on the results of the previous stage, outlier faces are predicted and their estimation errors are compensated to further improve the estimation results. Our multistage learning method is verified on the FG-NET and MORPH databases, and shows better results than other state-of-the-art age estimation methods.

In Chapter 6, a summary of the research is made and possible future research directions are discussed.

Chapter 1

Introduction

1.1 Significance of the Research

Facial image processing has attracted a lot of attention in the computer vision community over the last two decades. The human face can reveal important perceptual characteristics such as identity, gender, race, emotion, pose, age, etc.
Among these characteristics, age information is of particular importance. The aging process is complicated, irreversible and uncontrollable [69]. It is affected by various factors, including the living environment, climate, health, lifestyle, and biological reasons. Age-related facial image processing is being extensively studied, and facial age group classification is one of the major research topics in this area.

Although age estimation attempts to offer an exact age for a face, it is a challenging task. For some applications, the age group information is sufficient. Examples include age-based facial image retrieval [46], internet access control, security control and surveillance [20], biometrics, age-based human-computer interaction (HCI) [23, 27], age prediction for finding missing children, and age estimation based on the result of age group classification. Age estimation can be done more accurately if it is performed on groups containing a narrower age range [38, 81]. Hence, the age group classification problem is an interesting one that demands further effort.

Estimating human age from a facial image requires a great amount of information obtained from the original image. This kind of information is often called a facial aging feature. One would expect facial aging features to convey more valuable characteristics regarding aging than the original images can provide. Another advantage of facial aging features is lower dimensionality: an image of 200×200 pixels may be considered as a 40,000-dimensional vector, which is very high-dimensional. Given facial aging features, the task of age estimation from a face image becomes that of deciding an age from the extracted aging features. This makes the extraction of facial aging features an important issue, since the performance of an age estimation system relies on the quality of the extracted features.
For this reason, much research on age estimation has been directed towards aging feature extraction, such as the active appearance model (AAM) [13], age manifold [21, 22], AGing pattErn Subspace (AGES) [26, 27], anthropometric model [45], and patch-based appearance model [86, 89].

Another issue for age estimation is how to build a reliable age prediction system (i.e., an age estimator) based on the extracted features. This age estimator could use a machine learning approach to train a model on the extracted features and predict the age of query faces with the trained model. Therefore, age estimation can be considered as a multiclass classification problem [26, 33, 46, 83], a regression problem [21, 22, 89, 39, 36, 87, 88, 91, 92], or a composite of those two [33, 35, 34]. From a different perspective, facial aging could also be treated as an ordinal process. For instance, the face of a 2-year-old child should be more closely related to the face of a 3-year-old child than to the face of a 15-year-old teenager. Inspired by this ordinal property of face aging, some approaches treat age estimation as a ranking problem [9, 56, 90].

In the past few years, human facial age estimation has drawn a lot of attention in the computer vision community because of its important applications in age-based image retrieval [46], internet access control, security control and surveillance [20, 77], biometrics [20, 65, 71], human-computer interaction (HCI) [27, 23], and electronic customer relationship management (ECRM) [20]. Although many approaches have been presented to deal with age estimation, most of them only aim to directly estimate an exact age from a very wide age range. However, it would be easier to estimate an exact age from a narrower age range. For example, estimating an age in the age range of 15 to 20 is easier than estimating an age in the age range of 0 to 60.
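A rough way to see the effect of the range width is to look at a trivial baseline that always guesses the midpoint of the range. The sketch below is only an illustration and not a method from this dissertation: the function name `midpoint_mae` is hypothetical, and it assumes that every integer age in the range is equally likely.

```python
def midpoint_mae(lowest_age, highest_age):
    """Mean absolute error, in years, of always guessing the range midpoint.

    Assumption (ours): every integer age in [lowest_age, highest_age]
    is equally likely.
    """
    ages = range(lowest_age, highest_age + 1)
    midpoint = (lowest_age + highest_age) / 2
    return sum(abs(age - midpoint) for age in ages) / len(ages)


narrow = midpoint_mae(15, 20)  # 1.5 years
wide = midpoint_mae(0, 60)     # about 15.25 years
print(narrow, round(wide, 2))
```

Under this assumption, even a predictor that ignores the face entirely is off by 1.5 years on average in the 15 to 20 range, against roughly 15 years in the 0 to 60 range, so a reliable grouping step leaves far less error for the within-group estimator to remove.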
Because a narrower age range is easier to handle, we separate the age estimation problem into two parts: age grouping and estimating ages within the classified age groups. For age grouping, which is also often called age group classification, the objective is to classify facial images into different age groups as accurately as possible. Age grouping involves a process similar to age estimation and also needs to be handled carefully. The higher the classification accuracy in age grouping, the smaller the estimation error that can be achieved within each age group. To obtain better performance, these two parts should be integrated for the age estimation problem.

As a matter of fact, aging processes differ considerably between genders and across age ranges. This motivates us to ask which is the better choice for age estimation: multiple simple learning models (MSLM) or a single complicated learning model (SCLM). An MSLM consists of multiple models, where each model aims at one task. An SCLM, on the other hand, contains only one model, which needs to consider several problem-related factors and may not be able to handle all of them well. In order to integrate the gender classification problem into the age grouping and age estimation problems, MSLM would be a better choice.

1.2 Review of Previous Work

In this section, we review some related work on age grouping, age estimation, and gender grouping. First, we start with age grouping.

Age grouping (i.e., age group classification) was first conducted by Kwon and Lobo in [45]. They categorized facial images into three age groups: babies, young adults, and senior adults. They computed six ratios of distances between primary components (e.g., eyes, nose, and mouth) and separated babies from the other two groups. Then, wrinkles on specific areas of a face were located using snakes, and wrinkle indices were used to distinguish senior adults from young adults and babies.
There were only 47 images in the experimental dataset, and the correct classification rate for the baby group was below 68%.

Horng et al. [43] proposed a system that classifies faces in three steps: primary components detection, feature extraction, and age classification. They categorized 230 facial images into four age groups: babies, young, middle-aged and senior adults. First, they used the Sobel edge operator [17] and region labeling to locate the positions of the eyes, nose, and mouth. Then, two geometric features and three wrinkle features were extracted. Finally, two back-propagation neural networks were constructed for classification. The correct classification rate was 81.58%, and the facial age groups were subjectively assigned (i.e., not actual ages) in their experiments.

Thukral et al. [81] extracted geometric features from faces and fused the results from five classifiers: -SVC, partial least squares (PLS), Fisher linear discriminant (FLD), Naïve Bayes, and k-nearest neighbor (KNN), by adopting the majority decision rule. The final rate was 70.04% for three age groups (namely, 0-15, 15-30, and 30+).

Gunay and Nabiyev [31] proposed an automatic age classification system based on local binary patterns (LBP) [63] for face description. Faces were divided into small regions from which the LBP histograms were extracted and concatenated into a feature vector. For every new face presented to the system, spatial LBP histograms were produced and used to classify the image into one of six age groups: 10±5, 20±5, 30±5, 40±5, 50±5, 60±5. The minimum distance, the nearest neighbor and the k-nearest neighbor classifiers were used. Their system offered a classification rate of 80%.

Hajizadeh et al. [41] used histograms of oriented gradients (HOG) [15] as the facial feature. HOG features were computed in several regions and these regional features were concatenated to construct a feature vector for each face.
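The region-wise descriptors reviewed above (LBP in [31], HOG in [41]) share one construction: compute a histogram per face region and concatenate the histograms into a single feature vector. The sketch below illustrates that construction with the basic 3×3 LBP operator in plain Python. It is a minimal illustration under our own assumptions, not the implementation of either cited work: the function names, the grid layout and the clockwise bit order are hypothetical choices.

```python
def lbp_code(img, y, x):
    """8-bit local binary pattern of the 3x3 neighbourhood centred at (y, x)."""
    center = img[y][x]
    # Neighbours visited clockwise, starting at the top-left corner
    # (the bit order is an arbitrary convention chosen for this sketch).
    neighbours = [
        img[y - 1][x - 1], img[y - 1][x], img[y - 1][x + 1], img[y][x + 1],
        img[y + 1][x + 1], img[y + 1][x], img[y + 1][x - 1], img[y][x - 1],
    ]
    # Bit i is set when neighbour i is at least as bright as the centre.
    return sum(1 << bit for bit, value in enumerate(neighbours) if value >= center)


def regional_lbp_feature(img, rows, cols):
    """Concatenate 256-bin LBP histograms from a rows x cols grid of regions."""
    height, width = len(img) - 2, len(img[0]) - 2  # interior pixels only
    feature = []
    for r in range(rows):
        for c in range(cols):
            hist = [0] * 256
            for y in range(1 + r * height // rows, 1 + (r + 1) * height // rows):
                for x in range(1 + c * width // cols, 1 + (c + 1) * width // cols):
                    hist[lbp_code(img, y, x)] += 1
            feature.extend(hist)
    return feature


# A flat 6x6 image: every interior pixel gets code 255 (all neighbours >= centre).
flat = [[7] * 6 for _ in range(6)]
feature = regional_lbp_feature(flat, rows=2, cols=2)
print(len(feature))  # 4 regions x 256 bins = 1024
```

Keeping one histogram per region preserves coarse spatial layout (for example, forehead versus eye corners) that a single whole-face histogram would discard. The resulting vector can then be passed to any classifier, such as the nearest-neighbor or neural-network classifiers mentioned in the text.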
Hajizadeh et al. then used a probabilistic neural network (PNN) classifier to classify facial images into one of four age groups. The classification rate was 87.25%.

Liu et al. [52] proposed a structured fusion method for age group classification by building a region of certainty (ROC) to connect the uncertainty-driven shape features with selected surface features. In the first stage, two shape features are designed to determine the certainty of a face and classify it. In the second stage, the gradient orientation pyramid (GOP) [49] features are selected by a statistical method and then combined with an SVM classifier to perform age grouping. Their method was tested on classifying faces into three age groups, and a classification accuracy of 95.1% was reported.

Now, we review some work on age estimation. Lanitis et al. [46] used active appearance models (AAMs), which combine shape and appearance features, to represent facial features. Age estimation is regarded as a classification problem that can be solved by the shortest distance classifier and neural networks. Their approach also differentiates between age-specific estimation and appearance-specific estimation. Age-specific estimation assumes the aging process is the same for everyone, whereas appearance-specific estimation assumes that people with similar looks tend to have similar aging processes. Personalized age estimation is introduced to cluster similar faces before classification.

Geng et al. [26, 27] proposed an automatic age estimation method named AGES (AGing pattErn Subspace), which models the long-term aging process of a person (i.e., a sequence of a person's face images), and estimates the person's age by minimizing the reconstruction error. However, the facial features of the same person could be similar at different ages.

Guo et al. [38] examined the use of biologically inspired features (BIF) with manifold learning techniques for face representation.
They treated each age as one class label and adopted the SVM technique for age estimation. Guo et al. [guo2009human] extracted biologically inspired features (BIF) for each face and used principal component analysis (PCA) for efficient dimensionality reduction of the features. They used both the classification and regression approaches to age estimation, but on different databases.

Yan et al. [89] proposed a patch-based regression method for age estimation in which the regression error is minimized by a three-complementary-stage procedure. First, each image is encoded as an ensemble of orderless coordinate patches of GMM (Gaussian Mixture Model) distribution. Then, the patch-kernel is designed for characterizing the Kullback-Leibler divergence between the derived models for any two images, and its discriminating power is further enhanced by a weak learning process, called inter-modality similarity synchronization. Finally, kernel regression is employed for ultimate human age estimation.

Zhang et al. [91] proposed a multi-task warped Gaussian process (MTWGP) model for age estimation. Age estimation is formulated as a multi-task regression problem in which each learning task refers to estimation of the age function for each person. While MTWGP models common features shared by different tasks (persons), it also allows task-specific (person-specific) features to be learned automatically. The form of the regression functions in MTWGP is implicitly defined by the kernel function, and all its model parameters can be learned from data automatically.

Chang et al. [9] designed an ordinal hyperplane ranking algorithm (OHRank) to estimate human ages via facial images. Their algorithm is based on the relative order information among the age labels in a database. Each ordinal hyperplane separates all the facial images into two groups according to the relative order, and a cost-sensitive property is exploited to find a better hyperplane based on the classification costs.
Human ages are inferred by aggregating a set of preferences from the ordinal hyperplanes with their cost sensitivities.

Guo and Mu [36] were the first to use kernel partial least squares (KPLS) regression for age estimation, based on the following three perspectives. Firstly, KPLS can reduce feature dimensionality and learn the aging function simultaneously in a single learning framework, instead of performing each task separately using different techniques. Secondly, KPLS can find a small number of latent variables (e.g., 20) to project thousands of features into a very low-dimensional subspace, which may have great impact on real-time applications. Thirdly, KPLS regression has an output vector that contains multiple labels, so several related problems (e.g., age estimation, gender classification, and ethnicity estimation) can be solved together.

Li et al. [48] considered the temporally ordinal and continuous characteristics of the aging process and proposed learning ordinal discriminative features for facial age estimation. Their method aims not only at preserving the local manifold structure of facial images, but also at keeping the ordinal information among aging faces. The redundant information is removed from both locality information and ordinal information as much as possible by minimizing nonlinear correlation and rank correlation. Finally, these two issues are formulated into a unified optimization problem of feature selection, and a solution is presented.

Gender recognition from human faces has drawn much attention over the last two decades. Golomb et al. [30] used a trained two-layer neural network, named SEXNET, to recognize males and females from facial images of size 30 × 30. In their experiment, 90 facial images, including 45 males and 45 females, were tested and the reported accuracy was 91.9%.
Brunelli and Poggio [67] presented HyperBF networks for gender classification, where 16 geometric features, such as pupil to nose vertical distance, eyebrow thickness, and mouth height, were used to train two competing RBF networks (one for male and the other for female). Their method was tested on a dataset consisting of 168 images (21 males and 21 females), and an accuracy of 79% was reported. Gutta et al. [40] designed a hybrid gender recognizer combining neural networks and decision trees to identify men and women. From their experiments on 3,000 facial images of size 64 × 72 selected from the FERET database [66], an accuracy of 96% was achieved. Moghaddam and Yang [58] proposed to use nonlinear support vector machines (SVMs) for appearance-based gender classification. They tested on low-resolution "thumbnail" faces of size 21 × 12 processed from 1,755 images (1,044 males and 711 females) from the FERET face database. An accuracy of 96.6% was reported, and the SVM was shown to be superior to other classifiers, such as linear, quadratic, Fisher linear discriminant (FLD), and nearest-neighbor classifiers. Baluja and Rowley [3] used an AdaBoost-based method for gender identification, where five types of pixel comparison operators were used. The experiments were carried out on 2,409 face images (1,495 males and 914 females) from the FERET database, and the reported accuracy was 94.4%. A performance comparison between AdaBoost and SVM was conducted, and they also showed that a face image of size 20 × 20 is better than one of size 12 × 21.

Makinen and Raisamo [57] presented a systematic study on gender classification with automatically detected and aligned faces. One finding was that the gender classification rate can be increased if the automatic face alignment methods are further improved. They also found that gender classification methods performed almost equally well with different image sizes. A neural network and AdaBoost achieved almost as good classification rates as the SVM. Recently, Guo et al.
[32] showed that gender recognition accuracy is affected significantly by the age of the person. The experiments were done on the YGA database [32] of 8,000 images with ages from 0 to 93 years. The results showed that the gender classification accuracy on adult faces can be 10% higher than that on young or senior faces.

1.3 Contributions of the Research

Several contributions are made in this research. They are described as follows.

Chapter 2 provides the research background, which includes a brief review of related work on soft biometrics, gender recognition, race classification, age group classification, and age estimation. It addresses several feature extraction methods that could be useful in representing facial aging features. Also, some classification and regression algorithms used for age grouping and estimation are discussed.

In Chapter 3, we design a two-stage structured fusion method for facial age group classification. The contributions of Chapter 3 include the following three aspects:
• Two shape features are proposed for age group classification for the first time.
• The surface feature, the GOP feature, is further selected by ANOVA (analysis of variance) to increase the discriminating ability of the extracted feature and reduce the feature dimensionality significantly.
• A ROC (region of certainty) concept is introduced to fuse the shape feature based system and the surface feature based system, and the fused two-stage system provides better performance.

In Chapter 4, we propose a novel multistage learning system, called Grouping-Estimation-Fusion (GEF), for human age estimation via facial images, which contains the following major contributions:
• We introduce the concept of integrating age grouping and age estimation to improve the performance on age estimation.
• We use global and local features to obtain several decisions (estimation results), which could offer a certain degree of diversity for our fusion stage.
• We design several age grouping systems in order to further create higher diversity among decisions.
• The analysis of diversity is also provided as the basis for the decision selection algorithms used in our fusion schemes.
• Six fusion schemes, intra-system fusion (AF), inter-system fusion (EF), intra-inter fusion (AEF), inter-intra fusion (EAF), maximum diversity fusion (MDF), and composite fusion (CF), are proposed to boost the final performance on age estimation.
• The complexity of the arithmetic operations of the six fusion schemes is also discussed.

In Chapter 5, we present a multistage learning and deep fusion framework, called GGEF-OE, which contains five stages, to deal with the facial age estimation problem. The contributions of this method are listed below:
• In the first stage, we train a binary classifier to perform gender recognition, where any test face is classified as either male or female.
• In the second stage, two multi-class models are trained to classify faces from the male and female groups, respectively, into several age groups. The age grouping system could have a different number of groups, varying from 2 to 10. The more age grouping systems are created, the more diversity is generated. Each age group has a different definition of age range.
• In the third stage, a trained model is built for each age group to predict the age of any test face in that age group. Three facial features are used in the third stage, and each feature is used individually. Hence, there will be three decisions (estimation results) for any test face.
• In the fourth stage, the various decisions are fused based on the diversity analyzed on them. We will demonstrate that with appropriate fusion the performance could be significantly improved.
• In the fifth stage, outliers, i.e., faces with higher estimation errors, are defined and used to train a model to predict potential outliers for any test face.
Once outliers are detected, error estimation is conducted on them, and the estimated errors are then compensated to further improve the overall estimation results.

1.4 Organization of the Dissertation

The rest of this dissertation is organized as follows. Recent developments of age group classification and age estimation methods are addressed in Chapter 2. A two-stage cascaded age group classification system is proposed to fuse the shape feature based system and the surface feature based system in Chapter 3. In Chapter 4, we design a novel three-stage ensemble learning method, called GEF (Grouping-Estimation-Fusion), to solve the facial age estimation problem. In Chapter 5, we propose a multistage learning and deep fusion framework, an extended version of GEF, to deal with facial age estimation. The corresponding experimental results are reported in Chapter 3, Chapter 4, and Chapter 5, respectively, where extensive performance comparisons are made and results are analyzed. Finally, concluding remarks and possible future research directions are given in Chapter 6.

Chapter 2 Research Background

2.1 Soft Biometrics

Soft biometric information can be used to classify an individual into broad categories but is not sufficiently discriminative to perform recognition tasks. For example, the knowledge about gender, ethnicity, age, or other traits such as height, weight, dimensions of limbs, eye color, skin color, hair color, etc., can be termed soft biometrics. While such information is too broad to identify an individual, it can be valuable to narrow down the search space, or actually help improve results while performing identification.

Although there are a wide variety of soft biometric traits that can be gathered from an individual, only a limited number of traits can be gathered from a given sensor.
For example, from a camera set up to acquire face images for facial recognition, soft biometric traits such as gender, ethnicity, or age can be determined much more accurately than height or weight. Due to the popularity and reliability of the face-based biometric, facial images have been used extensively to obtain gender, ethnicity, and age information.

Most key approaches in this area follow the strategy of training classifiers for a given set of classes in order to perform classification. A majority of the approaches rely on the appearance information present in face images. Typical feature representations used for this purpose include gray scale pixel intensities (used directly or represented in terms of PCA eigenvectors), Local Binary Patterns (LBP), Haar wavelets, and Gabor wavelets, among others. The classifiers of choice are AdaBoost (along with various variants of boosting), SVM, Neural Networks, and LDA, among others. While each of the classifiers has its own advantages and limitations, SVM seems to be the most popular choice for gender and ethnicity classification due to its relatively high accuracy and generalizing ability.

2.1.1 Gender Recognition

Gender recognition is a fundamental task for human beings, as many social functions critically depend on correct gender perception [60]. Automatic gender classification has many important applications, for example, biometric authentication, high-tech surveillance and security systems, criminology, automatic psychophysiological inspection, augmented reality, intelligent user interfaces, visual surveillance, and collecting demographic statistics for marketing. The applicability of gender recognition is also growing in areas such as social science, statistics, and marketing research. In addition, there are many applications (especially in social networks) based on different face recognition algorithms (including sex classification) for the entertainment of users.
Human faces provide important visual information for gender perception. Gender recognition from face images is one of the current problems of computer vision and has received much research interest in the last two decades. Gender recognition can be regarded as a problem of classifying detected faces into two classes (males and females). The gender recognition task has been investigated since the beginning of the 1990s.

Generally, gender classification involves a process of determining the gender of a subject from face images. Two key components of gender classification are feature extraction and pattern classification. A number of different techniques based on facial images have been reported in the literature for solving this problem. The best state-of-the-art results reported in scientific papers are about 95% accuracy. After testing several of the most promising approaches, we succeeded in achieving 96% for male and 95% for female faces on the FERET face image database. This was achieved with LBP (Local Binary Patterns) features classified by an SVM (Support Vector Machine) with the RBF (Radial Basis Function) kernel.

2.1.2 Race Classification

Unlike gender, racial (ethnic) categories are loosely defined due to the intermingling of races and the natural variations within races [12]. Cross-cultural studies, for example, have shown that people generally agree on attractiveness ratings across different ethnic groups. However, evidence also suggests that we perceive our own ethnic group differently from other ethnic groups. Firstly, people can recognize individuals belonging to different races and ethnic groups (where ethnic group refers to distinct populations within a particular racial grouping, e.g., comparing Germans to Britons within the Caucasian grouping). Secondly, faces from the same race as the observer elicit more brain activity in regions linked to face recognition. Lastly, humans are better at recognizing faces of their own ethnicity/race than faces of other races.
One plausible explanation for superior recognition of same race and same ethnic group faces is exposure. Most people, especially young people, have more exposure to their own ethnic group. This variation in exposure can contribute to the development of visual expertise for same group faces. If individuals are exposed more frequently to different ethnic groups, one might expect their visual expertise to include other ethnic groups as well. Recent studies [29, 79] showed that individuals from minority ethnic groups are better at recognizing other ethnic groups in their area than individuals from majority ethnic groups. Thus, despite agreement on attractiveness across races, there may remain a significant element of ethnic recognition, and potential preference, within particular racial categories that potentially may influence mate preferences and subsequent mate choice.

To date, however, studies comparing differences within ethnic groups have focused on groups that show a significant separation of culture and geography (North America, Germany, and the Czech Republic). This means one cannot discount an influence of environmental and/or sociocultural factors on facial morphology and/or greater familiarity with faces of one's own ethnicity compared to other groups. In order to resolve these issues, we tested whether recognition is also possible in a population where there is a large overlap of both culture and geography between the different ethnic groups.

Race recognition using face images is a relatively new topic in computer vision, and it has not been fully explored with state-of-the-art features, though significant progress has been made in face recognition. Some attempts have been made only for two- or three-class problems (e.g., Asian and non-Asian), which are relatively easier than problems with more classes. Also, no feature selection methods have been applied to race recognition problems. Therefore, race recognition is still a challenging problem and needs to be investigated.
2.1.3 Age Classification and Estimation

There are many popular real-world applications related to age estimation [20]. Computer-aided age synthesis significantly relieves the burden of tedious manual work while at the same time providing more photorealistic effects and high-quality pictures. Age estimation by machine is useful in applications where we do not need to specifically identify the individual, such as a government employee, but want to know his or her age. With the input of a monitoring camera, an age estimation system can warn or stop underage drinkers from entering bars or wine shops; prevent minors from purchasing tobacco products from vending machines; turn away elderly visitors who want to ride a roller coaster in an amusement park; and deny children access to adult websites or restricted movies. In Japan, police found that a particular age group is more prone to money transfer fraud at ATMs, for which age estimation from surveillance monitoring can play an important role. Age estimation software can also be used in health care systems, such as robotic nurses and intelligent intensive care units, for customized services.

Since different groups of customers have very different consuming habits, preferences, responsiveness, and expectations regarding marketing, companies can gain more profits by acknowledging this fact, responding directly to all customers' specific needs, and providing customized products or services. For example, a fast food vendor might want to know what percentage of each age group prefers and purchases what kind of sandwiches; advertisers want to target specific audiences (potential customers) for specific advertisements in terms of age groups; a mobile phone company wants to know which age group is more interested in its new product models shown in a public kiosk; a store display might show a business suit as an adult walks by or jeans as a teenager walks by. Obviously, it is almost impossible to realize those due to privacy issues.
However, with the help of a computer-based automatic age estimation system, a camera snapping photos of customers could collect demographic data by capturing customers' face images and automatically labeling age groups. All of these can be done without violating anyone's privacy.

Although, as aforementioned, the real-world applications are very rich and attractive, existing facts and attitudes from the perception field reveal the difficulties and challenges of automatic age estimation by computer. Different people have different rates of the aging process, which is determined by not only the person's genes but also many other factors, such as health condition, living style, working environment, and sociality. The effects of ultraviolet radiation, usually through exposure to sunlight, may cause solar aging, which is another strong cause for advanced signs of face aging.

Age estimation is not a standard classification problem. Derived from different application scenarios, it can be taken as either a multiclass classification problem or a regression problem. A large aging database is often hard to collect, especially the chronometrical image series for an individual. Age progression displayed on faces is uncontrollable and personalized. Such special characteristics of aging variation cannot be captured accurately due to the prolific and diversified information conveyed by human faces. The existing age estimation systems using face images typically consist of two concatenated modules: age image representation and age estimation techniques. Age estimation can be approached by either a classifier or a regressor, since different databases and systems may be too biased or unbalanced for evaluation. But it is not purely a classification or regression problem. A promising approach to age estimation is to combine regression and classification methods as demonstrated in previous work.
It is still interesting to develop more advanced schemes for the combination of classifiers and regressors so that the accuracy of age estimation might be improved further.

2.2 Classification and Regression Using a Machine Learning Method

Support Vector Machines (SVMs) are a popular machine learning method for classification, regression, and other learning tasks.

2.2.1 Classification

A classification task usually involves separating data into training and testing sets. Each instance in the training set contains one "target value" (i.e., the class label) and several "attributes" (i.e., the features or observed variables). The goal of SVM is to produce a model (based on the training data) which predicts the target values of the test data given only the test data attributes. Two types of SVM used for classification are described below.

C-Support Vector Classification. Given training vectors $\mathbf{x}_i \in R^n$, $i = 1, \ldots, l$, in two classes, and an indicator vector $\mathbf{y} \in R^l$ such that $y_i \in \{1, -1\}$, C-SVC [6, 14] solves the following primal problem:

$$\min_{\mathbf{w}, b, \boldsymbol{\xi}} \ \frac{1}{2}\mathbf{w}^T\mathbf{w} + C\sum_{i=1}^{l}\xi_i \quad \text{subject to} \quad y_i\left(\mathbf{w}^T\phi(\mathbf{x}_i) + b\right) \ge 1 - \xi_i, \ \ \xi_i \ge 0, \ \ i = 1, \ldots, l, \tag{2.1}$$

where $\phi(\mathbf{x}_i)$ maps $\mathbf{x}_i$ into a higher dimensional space and $C$ is the regularization parameter. Due to the possible high dimensionality of the vector variable $\mathbf{w}$, usually we solve the following dual problem:

$$\min_{\boldsymbol{\alpha}} \ \frac{1}{2}\boldsymbol{\alpha}^T Q \boldsymbol{\alpha} - \mathbf{e}^T\boldsymbol{\alpha} \quad \text{subject to} \quad \mathbf{y}^T\boldsymbol{\alpha} = 0, \ \ 0 \le \alpha_i \le C, \ \ i = 1, \ldots, l, \tag{2.2}$$

where $\mathbf{e} = [1, \ldots, 1]^T$ is the vector of all ones, $Q$ is an $l$ by $l$ positive semidefinite matrix, $Q_{ij} \equiv y_i y_j K(\mathbf{x}_i, \mathbf{x}_j)$, and $K(\mathbf{x}_i, \mathbf{x}_j) \equiv \phi(\mathbf{x}_i)^T\phi(\mathbf{x}_j)$ is the kernel. Once (2.2) is solved, using the primal-dual relationship, the decision function is

$$\mathrm{sgn}\left(\mathbf{w}^T\phi(\mathbf{x}) + b\right) = \mathrm{sgn}\left(\sum_{i=1}^{l} y_i \alpha_i K(\mathbf{x}_i, \mathbf{x}) + b\right). \tag{2.3}$$

ν-Support Vector Classification. The ν-support vector classification [72] introduces a new parameter $\nu \in (0, 1]$. It is proved that $\nu$ is an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors.
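Given training vectors $\mathbf{x}_i \in R^n$, $i = 1, \ldots, l$, in two classes, and a vector $\mathbf{y} \in R^l$ such that $y_i \in \{1, -1\}$, the primal problem is:

$$\min_{\mathbf{w}, b, \boldsymbol{\xi}, \rho} \ \frac{1}{2}\mathbf{w}^T\mathbf{w} - \nu\rho + \frac{1}{l}\sum_{i=1}^{l}\xi_i \quad \text{subject to} \quad y_i\left(\mathbf{w}^T\phi(\mathbf{x}_i) + b\right) \ge \rho - \xi_i, \ \ \xi_i \ge 0, \ \ i = 1, \ldots, l, \ \ \rho \ge 0. \tag{2.4}$$

The dual problem is:

$$\min_{\boldsymbol{\alpha}} \ \frac{1}{2}\boldsymbol{\alpha}^T Q \boldsymbol{\alpha} \quad \text{subject to} \quad \mathbf{y}^T\boldsymbol{\alpha} = 0, \ \ \mathbf{e}^T\boldsymbol{\alpha} \ge \nu, \ \ 0 \le \alpha_i \le \frac{1}{l}, \ \ i = 1, \ldots, l, \tag{2.5}$$

where $Q_{ij} \equiv y_i y_j K(\mathbf{x}_i, \mathbf{x}_j)$. The decision function is:

$$\mathrm{sgn}\left(\sum_{i=1}^{l} y_i \alpha_i K(\mathbf{x}_i, \mathbf{x}) + b\right). \tag{2.6}$$

2.2.2 Regression

Suppose we have a set of training data $(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_m, y_m)$, where $\mathbf{x}_i \in R^n$ is a feature vector and $y_i \in R$ is the target output. In ε-support vector regression (ε-SVR) [5], we want to find a linear function,

$$f(\mathbf{x}) = \langle \mathbf{w}, \mathbf{x} \rangle + b = \mathbf{w}^T\mathbf{x} + b, \tag{2.7}$$

which has at most deviation $\varepsilon$ from the actually obtained target outputs $y_i$ for all the training data and at the same time is as flat as possible, where $\mathbf{w} \in R^n$, $b \in R$. In other words, we want to find $\mathbf{w}$ and $b$ such that

$$\| f(\mathbf{x}_i) - y_i \|_1 \le \varepsilon, \quad \forall i = 1, \ldots, m, \tag{2.8}$$

where $\|\cdot\|_1$ is the $l_1$ norm and $\varepsilon \ge 0$. Flatness in (2.7) means we have to seek a smaller $\mathbf{w}$ [75]. For this reason, it is required to minimize $\|\mathbf{w}\|_2^2$, where $\|\cdot\|_2$ is the Euclidean ($l_2$) norm. Generally, this can be written as a convex optimization problem by requiring

$$\min \ \frac{1}{2}\|\mathbf{w}\|_2^2 \quad \text{subject to} \quad \| f(\mathbf{x}_i) - y_i \|_1 \le \varepsilon, \ \ i = 1, \ldots, m. \tag{2.9}$$

Introducing two slack variables $\varepsilon_i \ge 0$, $\hat{\varepsilon}_i \ge 0$ to cope with infeasible constraints of (2.9), (2.9) becomes

$$\min \ \frac{1}{2}\|\mathbf{w}\|_2^2 + C\sum_{i=1}^{m}(\varepsilon_i + \hat{\varepsilon}_i) \quad \text{subject to} \quad y_i \le f(\mathbf{x}_i) + \varepsilon + \varepsilon_i, \ \ y_i \ge f(\mathbf{x}_i) - \varepsilon - \hat{\varepsilon}_i, \ \ \varepsilon_i \ge 0, \ \ \hat{\varepsilon}_i \ge 0, \ \ i = 1, \ldots, m, \ \ \varepsilon \ge 0, \tag{2.10}$$

where $C$ is a penalty parameter for the error term. The optimization problem (2.10) can be solved through its Lagrangian dual problem

$$\max \ -\frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}(a_i - \hat{a}_i)(a_j - \hat{a}_j)\,\mathbf{x}_i^T\mathbf{x}_j - \varepsilon\sum_{i=1}^{m}(a_i + \hat{a}_i) + \sum_{i=1}^{m}(a_i - \hat{a}_i)\,y_i \quad \text{subject to} \quad \sum_{i=1}^{m}(a_i - \hat{a}_i) = 0, \ \ 0 \le a_i, \hat{a}_i \le C, \ \ i = 1, \ldots, m, \tag{2.11}$$

where $a_i \ge 0$ and $\hat{a}_i \ge 0$ are Lagrange multipliers.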
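For completeness, the step from (2.10) to (2.11) follows the standard Lagrangian argument. The sketch below is one way to write it out; the multipliers $\eta_i$ and $\hat{\eta}_i$ attached to the non-negativity constraints on the slack variables are auxiliary symbols introduced here for illustration and are not part of the notation used elsewhere in this chapter.

```latex
\begin{aligned}
L ={}& \tfrac{1}{2}\|\mathbf{w}\|_2^2
      + C\sum_{i=1}^{m}(\varepsilon_i + \hat{\varepsilon}_i)
      - \sum_{i=1}^{m}(\eta_i\varepsilon_i + \hat{\eta}_i\hat{\varepsilon}_i) \\
     &- \sum_{i=1}^{m} a_i\left(\varepsilon + \varepsilon_i - y_i + \mathbf{w}^T\mathbf{x}_i + b\right)
      - \sum_{i=1}^{m} \hat{a}_i\left(\varepsilon + \hat{\varepsilon}_i + y_i - \mathbf{w}^T\mathbf{x}_i - b\right), \\[4pt]
\frac{\partial L}{\partial \mathbf{w}} = 0
  \ &\Rightarrow\ \mathbf{w} = \sum_{i=1}^{m}(a_i - \hat{a}_i)\,\mathbf{x}_i, \\
\frac{\partial L}{\partial b} = 0
  \ &\Rightarrow\ \sum_{i=1}^{m}(a_i - \hat{a}_i) = 0, \\
\frac{\partial L}{\partial \varepsilon_i} = 0
  \ &\Rightarrow\ C - a_i - \eta_i = 0, \qquad
\frac{\partial L}{\partial \hat{\varepsilon}_i} = 0
  \ \Rightarrow\ C - \hat{a}_i - \hat{\eta}_i = 0.
\end{aligned}
```

Because $\eta_i, \hat{\eta}_i \ge 0$, the last two conditions give the box constraints $0 \le a_i, \hat{a}_i \le C$. Substituting the stationarity conditions back into $L$ cancels the slack and bias terms and leaves the objective of (2.11).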
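After solving (2.11), we can obtain

$$\mathbf{w} = \sum_{i=1}^{m}(a_i - \hat{a}_i)\,\mathbf{x}_i, \tag{2.12}$$

$$f(\mathbf{x}) = \sum_{i=1}^{m}(a_i - \hat{a}_i)\,\mathbf{x}_i^T\mathbf{x} + b. \tag{2.13}$$

Using the Karush-Kuhn-Tucker (KKT) conditions

$$a_i\left(\varepsilon + \varepsilon_i + \mathbf{w}^T\mathbf{x}_i + b - y_i\right) = 0, \quad \hat{a}_i\left(\varepsilon + \hat{\varepsilon}_i - \mathbf{w}^T\mathbf{x}_i - b + y_i\right) = 0, \quad (C - a_i)\,\varepsilon_i = 0, \quad (C - \hat{a}_i)\,\hat{\varepsilon}_i = 0, \tag{2.14}$$

$b$ can be computed as follows:

$$b = \begin{cases} y_i - \varepsilon - \mathbf{w}^T\mathbf{x}_i, & 0 < a_i < C, \\ y_i + \varepsilon - \mathbf{w}^T\mathbf{x}_i, & 0 < \hat{a}_i < C. \end{cases} \tag{2.15}$$

The support vectors are defined as those data points that contribute to predictions given by (2.13), i.e., the $\mathbf{x}_i$'s for which $a_i - \hat{a}_i \ne 0$. The complexity of $f(\mathbf{x})$ is related to the number of support vectors.

Similarly, for nonlinear regression, we just define $f(\mathbf{x}) = \mathbf{w}^T\varphi(\mathbf{x}) + b$, where $\varphi(\mathbf{x})$ denotes a fixed feature-space transformation. Then

$$\mathbf{w} = \sum_{i=1}^{m}(a_i - \hat{a}_i)\,\varphi(\mathbf{x}_i), \tag{2.16}$$

$$f(\mathbf{x}) = \sum_{i=1}^{m}(a_i - \hat{a}_i)\,\varphi^T(\mathbf{x}_i)\varphi(\mathbf{x}) + b = \sum_{i=1}^{m}(a_i - \hat{a}_i)\,K(\mathbf{x}_i, \mathbf{x}) + b, \tag{2.17}$$

where $K(\mathbf{x}_i, \mathbf{x})$ is a kernel function. The kernel function $K(\mathbf{x}_i, \mathbf{x}_j)$ can be defined as

$$K(\mathbf{x}_i, \mathbf{x}_j) = \varphi^T(\mathbf{x}_i)\varphi(\mathbf{x}_j). \tag{2.18}$$

There are four basic kernels [44]. We list two commonly used ones:

Linear:
$$K(\mathbf{x}_i, \mathbf{x}_j) = \mathbf{x}_i^T\mathbf{x}_j \tag{2.19}$$

Radial basis function (RBF):
$$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\gamma\,\|\mathbf{x}_i - \mathbf{x}_j\|_2^2\right), \quad \gamma > 0, \tag{2.20}$$

where $\gamma$ is a kernel parameter.

Since the proper value for the parameter $\varepsilon$ is difficult to determine, we try to resolve this problem by using a different version of the regression algorithm, ν-support vector regression (ν-SVR) [5], in which $\varepsilon$ itself is a variable in the optimization process and is controlled by another new parameter $\nu \in (0, 1)$. In fact, $\nu$ is a parameter that can be used to control the number of support vectors and the upper bound on the fraction of error points. Hence, this makes $\nu$ a more convenient parameter than $\varepsilon$ in adjusting the accuracy level to the data.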
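As a brief aside, the two kernels (2.19) and (2.20) and the kernel expansion in (2.17), written with the dual coefficients $a_i - \hat{a}_i$ made explicit, can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function names, the default value of `gamma`, and the toy support vectors and coefficients are assumptions chosen to exercise the formulas, not values from this work.

```python
import math

def linear_kernel(xi, xj):
    """Linear kernel (2.19): K(xi, xj) = xi^T xj."""
    return sum(a * b for a, b in zip(xi, xj))

def rbf_kernel(xi, xj, gamma=0.5):
    """RBF kernel (2.20): K(xi, xj) = exp(-gamma * ||xi - xj||_2^2), gamma > 0."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * sq_dist)

def svr_predict(x, support_vectors, dual_coefs, b, kernel):
    """Evaluate f(x) = sum_i (a_i - a_hat_i) * K(x_i, x) + b, as in (2.17).

    dual_coefs[i] holds the difference a_i - a_hat_i for support vector i.
    """
    return sum(c * kernel(sv, x) for sv, c in zip(support_vectors, dual_coefs)) + b

# Hypothetical toy values (not from the dissertation), used only for illustration.
support_vectors = [[1.0, 0.0], [0.0, 2.0]]
dual_coefs = [0.5, -0.25]   # a_i - a_hat_i
b = 1.0
x = [2.0, 1.0]

f_linear = svr_predict(x, support_vectors, dual_coefs, b, linear_kernel)
f_rbf = svr_predict(x, support_vectors, dual_coefs, b, rbf_kernel)
```

With the linear kernel, the expansion reduces to $\mathbf{w}^T\mathbf{x} + b$ with $\mathbf{w}$ given by (2.12); switching to the RBF kernel changes only the `kernel` argument, which is the practical appeal of the dual form.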
Formally, ν-SVR solves

$$\min \ \frac{1}{2}\|\mathbf{w}\|_2^2 + C\left(\nu\varepsilon + \frac{1}{m}\sum_{i=1}^{m}(\varepsilon_i + \hat{\varepsilon}_i)\right) \quad \text{subject to} \quad y_i \le f(\mathbf{x}_i) + \varepsilon + \varepsilon_i, \ \ y_i \ge f(\mathbf{x}_i) - \varepsilon - \hat{\varepsilon}_i, \ \ \varepsilon_i \ge 0, \ \ \hat{\varepsilon}_i \ge 0, \ \ i = 1, \ldots, m, \ \ \varepsilon \ge 0. \tag{2.21}$$

The dual problem is

$$\max \ -\frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}(a_i - \hat{a}_i)(a_j - \hat{a}_j)\,K(\mathbf{x}_i, \mathbf{x}_j) + \sum_{i=1}^{m}(a_i - \hat{a}_i)\,y_i \quad \text{subject to} \quad \sum_{i=1}^{m}(a_i - \hat{a}_i) = 0, \ \ \sum_{i=1}^{m}(a_i + \hat{a}_i) \le C\nu, \ \ 0 \le a_i, \hat{a}_i \le C/m, \ \ i = 1, \ldots, m. \tag{2.22}$$

Following the same procedure, we can obtain the same expressions for $\mathbf{w}$ and $f(\mathbf{x})$ as in (2.16) and (2.17). In this work, we choose ν-SVR as our tool for all the regressions because of its convenience in parameter selection.

Chapter 3 Age Group Classification via Structured Fusion of Uncertainty-driven Shape Features and Selected Surface Features

3.1 Introduction

Facial image processing has attracted a lot of attention in the computer vision community over the last two decades. The human face can reveal important perceptual characteristics such as the identification, gender, race, emotion, pose, age, etc. Among these characteristics, the age information has its particular importance. The aging process is complicated, nonreversible, and uncontrollable [69]. It is affected by various factors, including the living environment, climate, health, life style, and biological reasons. Age-related facial image processing is being extensively studied, and facial age group classification is one of the major research topics in this area.

Although age estimation attempts to offer an exact age for a face, it is a challenging task. For some applications, the age group information is sufficient. Examples include age-based facial image retrieval [46], internet access control, security control and surveillance [20], biometrics, age-based human-computer interaction (HCI) [23], age prediction for finding missing children, and age estimation based on the result of age group classification. Age estimation can be done more accurately if it is performed on groups containing a narrower age range [81].
Hence, the age group classification problem is an interesting one that demands further efforts.

One key module in an age group classification system is facial feature extraction. To be qualified as a good and reliable facial feature, it should have high discriminating power and be invariant under various transformations. In general, human facial features change as the age increases. They could be categorized into two types: shape-based and surface-based features. Both are utilized in our proposed structured fusion system for age group classification as shown in Fig. 3.1.

Typically, for shape-based features, the locations of key facial components such as the eyes and the mouth are detected first. After the localization procedure, they could be used to compute the ratios of distances between the components as described in [16, 43, 45]. Here, we develop two new shape features. The first one is the circularity of a face, denoted by "CirFace". The second one is the angle between two eyes and the chin as shown in Fig. 3.2, which is denoted by "Angle". These two features will be used in the first stage of the system. We conduct the statistical analysis on values of "CirFace" and "Angle" for each group and use the Gaussian distribution to model these data. Then, the maximum likelihood (ML) decision rule is used for classification.

For surface-based features, the gradient orientation (GO) [10, 15] and structure information were shown to be robust to illumination change and successfully applied to many areas, such as disparity estimation [51], visual quality assessment [55], and face recognition tasks. It has also been shown that collecting gradient orientation in a hierarchical way can retain most visual information [42]. Based on this observation, the gradient orientation pyramid (GOP) [49] is a feature which could provide gradient information in a hierarchical manner.
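As a rough illustration of what collecting gradient orientation hierarchically can look like, the following sketch stacks unit-normalized gradients over the levels of a small image pyramid. It is written under stated assumptions and is not the definition used in [49]: the 2 × 2 box averaging stands in for proper Gaussian smoothing and decimation, the threshold `tau`, the central-difference gradient, and all function names are hypothetical choices made for this example.

```python
import math

def downsample(img):
    """Halve the resolution by averaging non-overlapping 2x2 blocks
    (an assumed stand-in for Gaussian smoothing followed by decimation)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * r][2 * c] + img[2 * r][2 * c + 1]
              + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(w)] for r in range(h)]

def gradient_orientation(img, tau=1e-6):
    """Return a flat list of unit-length gradients (gx, gy) per pixel.
    Pixels whose gradient magnitude is at most tau contribute (0, 0)."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(h):
        for c in range(w):
            # Central differences with indices clamped at the image border.
            gx = img[r][min(c + 1, w - 1)] - img[r][max(c - 1, 0)]
            gy = img[min(r + 1, h - 1)][c] - img[max(r - 1, 0)][c]
            mag = math.hypot(gx, gy)
            out.extend((gx / mag, gy / mag) if mag > tau else (0.0, 0.0))
    return out

def gop_feature(img, levels=2):
    """Concatenate gradient orientations over `levels` pyramid levels.
    The image must be large enough to be halved levels - 1 times."""
    feat = []
    for _ in range(levels):
        feat.extend(gradient_orientation(img))
        img = downsample(img)
    return feat

# Toy 4x4 horizontal intensity ramp and a brighter copy of it.
image = [[float(c) for c in range(4)] for _ in range(4)]
brighter = [[3.0 * v for v in row] for row in image]

feature = gop_feature(image, levels=2)
feature_brighter = gop_feature(brighter, levels=2)
```

In this toy case the two feature vectors coincide, because normalizing each gradient to unit length discards the gradient magnitude; this is the intuition behind the robustness to illumination change mentioned above. The feature length grows with both the image size and the number of levels.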
Motivated by the effectiveness of GOP, the second stage of our facial age group classification system adopts the GOP to represent facial attributes. However, the dimension of the GOP feature vector could be very high depending on the number of pyramid layers and the size of face images and, as a result, a feature selection/reduction technique is needed. In this work, we use a simple feature selection method due to its lower complexity in the training and test stages while offering a good classification rate. The selected GOP features are first determined by a hypothesis testing method known as the analysis of variance (ANOVA) [28], and then the selected GOP features are sent to an SVM classifier [7] for accuracy evaluation.

For further performance improvement, we design a two-stage structured fusion framework that has shape-based and surface-based features in cascade. That is, the shape features are used in the first stage to determine, depending on the classification confidence level, whether an input face can be classified with some degree of certainty or should be sent to the next stage. If a face is sent to the second stage, it will be classified based on the GOP method with feature selection. Finally, results from the two stages will be combined to get the overall classification accuracy.

Our current work has the following main contributions. First, two new shape-based facial features, "CirFace" and "Angle", are introduced for age group classification for the first time. Second, a jointly Gaussian distribution is used to model these new shape features, and an ML decision rule is chosen based on this model. Third, a feature selection technique is presented for selecting GOP features for complexity reduction and accuracy improvement.
Finally, a novel two-stage structured fusion system for age group classification is proposed by designing a region of certainty (ROC) to utilize both shape and surface features, and our method outperforms state-of-the-art methods by a significant margin.

Figure 3.1: A cascaded age group classification framework.

3.2 Related Work

Joint shape/surface features. Age classification was first conducted by Kwon and Lobo in [45]. They categorized facial images into three age groups: babies, young adults, and senior adults. They computed six ratios of distances between primary components (e.g., eyes, nose, and mouth) and separated babies from the other two groups. Then, wrinkles on specific areas of a face were located using snakes, and wrinkle indices were used to distinguish senior adults from young adults and babies. There were only 47 images in the experimental dataset, and the correct classification rate for the baby group was below 68%. Horng et al. [43] proposed a system that classifies faces in three steps: primary component detection, feature extraction, and age classification. They categorized 230 facial images into four age groups: babies, young adults, middle-aged adults, and senior adults. They first used the Sobel edge operator [17] and region labeling to locate the positions of the eyes, nose, and mouth. Then, two geometric features and three wrinkle features were extracted. Finally, two back-propagation neural networks were constructed for classification. The correct classification rate was 81.58%. The facial age groups were subjectively assigned (i.e., not actual ages) in their experiments. Lanitis et al. [46] used active appearance models (AAMs) [13], which combine shape and appearance features, to represent facial features. Age estimation is regarded as a classification problem that can be solved by the shortest distance classifier and neural networks. Their approach also differentiates between age-specific estimation and appearance-specific estimation.
The age-specific estimation assumes the aging process is the same for everyone, whereas the appearance-specific one assumes that people with similar looks tend to have similar aging processes. Personalized age estimation is introduced to cluster similar faces before classification.

Shape features. Shen and Ji [74] determined the ratio of the distance between the eyes to the distance between the eyes and the nose. Instead of applying simple thresholding, they conducted a statistical analysis on the ratio for babies and adults, and fit the data with a Gaussian distribution model for age classification. The correct classification rates were 75.9% and 71% for babies and adults, respectively. Thukral et al. [81] extracted geometric features from faces and fused the results from five classifiers: ν-SVC, partial least squares (PLS), Fisher linear discriminant (FLD), Naive Bayes, and k-nearest neighbor (KNN), by adopting the majority decision rule. The final rate was 70.04% for three age groups (namely, 0-15, 15-30, and 30+).

Surface features. Tonchev et al. [82] presented a combination of the subspace projection algorithm and a classifier for two age groups (children and adults), and their mean classification rate was 77.7%. Gunay and Nabiyev [31] proposed an automatic age classification system based on local binary patterns (LBP) [63] for face description. Faces were divided into small regions from which the LBP histograms were extracted and concatenated into a feature vector. For every new face presented to the system, spatial LBP histograms were produced and used to classify the image into one of six age groups: 10±5, 20±5, 30±5, 40±5, 50±5, and 60±5. The minimum distance, the nearest neighbor and the k-nearest neighbor classifiers were used. Their system offered a classification rate of 80%. Hajizadeh et al. [41] used histograms of oriented gradients (HOG) [15] as the facial feature. HOG features were computed in several regions and these regional features were concatenated to construct a feature vector for each face.
A probabilistic neural network (PNN) classifier was used to classify facial images into one of four age groups. The classification rate was 87.25%. Guo et al. [39] extracted biologically inspired features (BIF) for each face and used principal component analysis (PCA) for efficient dimensionality reduction of the features. They used both classification and regression approaches to age estimation, but on different databases. Guo et al. [38] examined the use of the BIF with manifold learning for face representation, and adopted the SVM technique for age estimation.

Aging process modeling. Geng et al. [26] proposed an automatic age estimation method named AGES (AGing pattErn Subspace), which models the aging process, i.e., a sequence of a person's face images. Yan et al. [89] proposed a patch-based regression method for age estimation in which the regression error is minimized by a three-complementary-stage structure. Zhang et al. [91] proposed a multi-task warped Gaussian process (MTWGP) model for age estimation. Age estimation is formulated as a multi-task regression problem in which each learning task refers to the estimation of the age function for one person. While MTWGP models common features shared by different tasks (persons), it also allows task-specific (person-specific) features to be learned automatically. The form of the regression functions in MTWGP is implicitly defined by the kernel function, and all its model parameters can be learned from data automatically. Chang et al. [9] designed an ordinal hyperplane ranking algorithm (OHRank) to estimate human ages from facial images. Their algorithm is based on the relative order information among the age labels in a database. Each ordinal hyperplane separates all the facial images into two groups according to the relative order, and a cost-sensitive property is exploited to find a better hyperplane based on the classification costs.
Human ages are inferred by aggregating a set of preferences from the ordinal hyperplanes with their cost sensitivities.

To summarize, two types of face features have been used most often for facial image analysis: shape features and surface features. For example, children's faces tend to be round and their skin is smoother. Both are valuable features. The key is how to combine them. In the next section, we propose a structured way to integrate these two types of features and classifiers into a better system.

3.3 Structured Fusion Framework for Age Group Classification

The proposed facial age group classification system consists of two stages. They are detailed in Section 3.3.1 and Section 3.3.2, respectively. Then, the integrated system is presented in Section 3.3.3.

3.3.1 Shape-based Classifier

The shape-based classifier has three components (i.e., shape feature extraction, statistical analysis of features, and maximum likelihood (ML) classification) as described below.

Shape feature extraction. Studies on craniofacial growth indicate that the human face shape changes from a circular one to an oval one as a person grows [69]. Motivated by this biological observation, we choose two new shape features that target frontal faces. The first one is the circularity of the face, denoted by "CirFace". Its value decreases (from the circular to the oval shape) as a person becomes older. We extract "CirFace" from color or grey-scale face images as proposed in [73]. The calculation of "CirFace" involves three steps: skin detection, edge detection, and modification. The final result of these steps is a binary image, where 1 denotes skin pixels and 0 denotes non-skin pixels. The area of each face is defined as

$$\text{Area} = \sum_{(r,c) \in I} 1 \tag{3.1}$$

where $r$ and $c$ are the row and the column of image $I$, respectively.
The perimeter could be calculated by:

$$\text{Perimeter} = \#\{k \mid (r_{k+1}, c_{k+1}) \in N_4(r_k, c_k)\} + \sqrt{2}\,\#\{k \mid (r_{k+1}, c_{k+1}) \in N_8(r_k, c_k) \setminus N_4(r_k, c_k)\} \tag{3.2}$$

where $k+1$ is computed modulo $K$, the length of the pixel sequence; $N_4$ denotes the 4-connected neighbors and $N_8$ denotes the 8-connected neighbors. The circularity of each face can be calculated as

$$\text{CirFace} = \frac{4\pi \cdot \text{Area}}{\text{Perimeter}^2}. \tag{3.3}$$

"CirFace" describes how circular the shape of a face is, and its value is between 0 and 1. The closer this value is to 1, the more the face looks like a circle.

The second shape feature is called "Angle", which is the angle C of the triangle LRC as shown in Fig. 3.2. Like "CirFace", the value of "Angle" tends to change from a large to a small value (from the circular to the oval shape) as a person becomes older. To extract this feature, the eyes and the chin need to be located in a face image. Several algorithms [2, 16, 25, 45, 59, 80] can be used to locate these facial components. Here, we adopt the method by Dehshibi et al. [16] to find the eyes and the chin. After the position of the left eye $p_L$, the position of the right eye $p_R$, and the position of the chin $p_C$ are found, the "Angle" feature can be calculated as:

$$\text{Angle} = \cos^{-1} \frac{\langle (p_C - p_R) \cdot (p_C - p_L) \rangle}{\|p_C - p_R\| \, \|p_C - p_L\|} \tag{3.4}$$

where $\langle \cdot \rangle$ is the dot product of two vectors and $\|\cdot\|$ is the length of a vector.

Figure 3.2: The "Angle" shape feature is the angle C in the triangle LRC, where L is the center of the left eye, R is the center of the right eye and C is the chin.

Statistical analysis of features. We conduct the Chi-square goodness-of-fit test [62] to check whether the probability distributions of the "CirFace" and "Angle" features can be well approximated by the Gaussian distribution for each age group.
The test is performed by grouping the data into bins, calculating the observed and expected counts for those bins, and computing the chi-square test statistic $\chi^2$:

$$\chi^2 = \sum_{i=1}^{N} \frac{(O_i - E_i)^2}{E_i} \tag{3.5}$$

where $O_i$ are the observed counts and $E_i$ are the expected counts. Here, the null hypothesis is that a shape feature is Gaussian distributed with its mean and variance estimated from the extracted shape feature data. The significance level for the test is 0.05. That is, if the null hypothesis is not rejected at the 5% significance level, we claim that this shape feature is Gaussian distributed. It turns out that both features pass the test individually and jointly.

The shape features, "CirFace" and "Angle", are correlated as shown in Fig. 3.3. Hence, we may consider them at the same time by constructing a joint Gaussian distribution model. From the training data set, we find CirFace $x_{i,j}$ and Angle $y_{i,j}$ for each face in a certain age group, where $i \in \{b, c, a\}$ and $j = 1, \ldots, n$; here $b$, $c$, $a$ denote baby, child, and adult, respectively, and $n$ is the number of faces in each group. Every group has two random variables $X_i$ and $Y_i$ representing "CirFace" and "Angle", respectively.

Figure 3.3: Relationship between "CirFace" and "Angle".

Let $W_i = [X_i \; Y_i]'$ be a vector representation for random variables $X_i$ and $Y_i$. Then, the joint pdf of $W_i$ can be written as

$$p_{W_i}(w) = \frac{1}{(2\pi)^{n/2} \, |K_{W_i}|^{1/2}} \exp\left( -\frac{1}{2} (w - \mu_{W_i})' K_{W_i}^{-1} (w - \mu_{W_i}) \right) \tag{3.6}$$

where $w = [x \; y]'$ is the CirFace and Angle pair for an input test face, $n = 2$ is the dimension of $w$, $\mu_{W_i} = [\mu_{X_i} \; \mu_{Y_i}]'$ is the mean vector, and $K_{W_i} = E[(W_i - \mu_{W_i})(W_i - \mu_{W_i})']$ is the covariance matrix.

2D ML classifier.
Given the CirFace and Angle feature pair $w = [x \; y]'$, we use the following maximum likelihood (ML) classifier for age group classification:

$$\text{face} = \begin{cases} \text{baby} & \text{if } \max(p_{W_b}(w), p_{W_c}(w), p_{W_a}(w)) = p_{W_b}(w) \\ \text{child} & \text{if } \max(p_{W_b}(w), p_{W_c}(w), p_{W_a}(w)) = p_{W_c}(w) \\ \text{adult} & \text{if } \max(p_{W_b}(w), p_{W_c}(w), p_{W_a}(w)) = p_{W_a}(w) \end{cases} \tag{3.7}$$

3.3.2 Surface-based Classifier

The surface-based classifier consists of three important components; namely, GOP feature extraction, ANOVA feature selection, and SVM classification.

GOP feature extraction. The gradient orientation pyramid (GOP) can provide the image gradient information as well as the pyramid information. For a given image, we first build a pyramid of this image and then compute the gradients in each layer of the pyramid. Finally, these gradients are combined together as a GOP feature. The procedure is detailed below.

For a given image $I(x,y)$, where $(x,y)$ indicates the pixel coordinates, the pyramid of $I$ can be defined as

$$P(I) = \{ I((x,y); \sigma) \}_{\sigma=0}^{s}, \tag{3.8}$$

where

$$I((x,y); 0) = I(x,y), \tag{3.9}$$

$$I((x,y); \sigma) = [I((x,y); \sigma - 1) * \Phi(x,y)] \downarrow_2, \quad \sigma = 1, \ldots, s, \tag{3.10}$$

and $\Phi(x,y)$ is the Gaussian kernel, with 0.5 as the standard deviation used in our experiments. Also, $*$ in (3.10) denotes the convolution operation, $\downarrow_2$ denotes down-sampling by a factor of 2, and $s$ is the total number of pyramid layers.

Once the pyramid $P(I)$ is constructed, the gradient orientation at every layer can be computed by finding the normalized gradient vector at each pixel as

$$GO(I((x,y); \sigma)) = \begin{cases} \dfrac{\nabla(I((x,y); \sigma))}{|\nabla(I((x,y); \sigma))|} & \text{if } |\nabla(I((x,y); \sigma))| > \tau \\ (0, 0)^T & \text{otherwise} \end{cases} \tag{3.11}$$

where $\tau$ is a threshold used for dealing with flat pixels. Then, we reshape the GO of each layer via

$$GO(I; \sigma) = \{ GO(I((x,y); \sigma)) \} = \begin{bmatrix} g_{1,1} & \cdots & g_{1,n} \\ \vdots & \ddots & \vdots \\ g_{m,1} & \cdots & g_{m,n} \end{bmatrix} \xrightarrow{\text{reshape}} GO_r(I; \sigma) = [g_{1,1} \; \cdots \; g_{1,n} \; \cdots \; g_{m,n}]' \in \mathbb{R}^{h_\sigma \times 2} \tag{3.12}$$

where $h_\sigma = m \times n$ is the number of pixels in the pyramid layer and $g_{i,j}$ is a row vector of size $1 \times 2$.
Finally, the GOP of $I$ can be obtained by stacking up the $GO_r(I; \sigma)$ from all layers as

$$GOP(I) = \begin{bmatrix} GO_r(I; \sigma = 0) \\ GO_r(I; \sigma = 1) \\ \vdots \\ GO_r(I; \sigma = s) \end{bmatrix} = \begin{bmatrix} g_{x,1} & g_{y,1} \\ \vdots & \vdots \\ g_{x,h} & g_{y,h} \end{bmatrix} \in \mathbb{R}^{h \times 2} \tag{3.13}$$

where $h = h_0 + h_1 + \cdots + h_s$ is the total number of pixels across all pyramid layers.

Feature reduction via selection. The purpose of feature selection is to keep those features having higher discriminating power and discard those features having lower discriminating power. The dimension (i.e., the number of elements) of a GOP feature vector is very high since it is related to the total number of pixels in the pyramid. To reduce the dimension, we need to determine which features in a GOP feature vector are significantly different across age groups. Based on the idea of hypothesis testing, the "unpaired t test" [28] or the "analysis of variance (ANOVA)" [28] method can be used for the separation of two groups. However, to separate three groups, "ANOVA" is more suitable and is adopted here. The procedure to identify which feature has higher discriminating power among three age groups using ANOVA is described below.

Given $m$ groups and $n$ faces per group, for a feature $X$ in the GOP feature vector, we calculate the following quantities:

Mean of each group, $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_m$:
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \tag{3.14}$$

Variance of each group, $s_1^2, s_2^2, \ldots, s_m^2$:
$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \tag{3.15}$$

Within-group variance:
$$s_{within}^2 = \frac{1}{m} \sum_{i=1}^{m} s_i^2 \tag{3.16}$$

Overall mean:
$$\bar{X} = \frac{1}{m} \sum_{i=1}^{m} \bar{x}_i \tag{3.17}$$

Standard error of the mean (squared):
$$s_{\bar{X}}^2 = \frac{1}{m-1} \sum_{i=1}^{m} (\bar{x}_i - \bar{X})^2 \tag{3.18}$$

Between-groups variance:
$$s_{between}^2 = n \, s_{\bar{X}}^2 \tag{3.19}$$

F statistic value:
$$F = \frac{s_{between}^2}{s_{within}^2} \tag{3.20}$$

Degrees of freedom:
$$\nu_n = m - 1, \quad \nu_d = m(n - 1) \tag{3.21}$$

F critical:
$$F_{crit} = F(\nu_n, \nu_d) \tag{3.22}$$

where $F(\nu_n, \nu_d)$ can be obtained from Table 3-1 of [28].
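As a rough illustration of (3.14)-(3.20), the sketch below computes the F statistic for one feature and ranks the features of a small feature matrix by it. It is a minimal sketch under stated assumptions rather than the implementation used in this work: the function names `anova_f` and `rank_features_by_f` are hypothetical, equal group sizes are assumed as in the text, and the lookup of the critical value in (3.22) is left out.

```python
from statistics import mean, variance


def anova_f(groups):
    """One-way ANOVA F statistic for m groups of n samples each, per (3.14)-(3.20)."""
    m = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes the same number of faces in each group")
    group_means = [mean(g) for g in groups]        # (3.14)
    group_vars = [variance(g) for g in groups]     # (3.15), divisor n - 1
    s2_within = sum(group_vars) / m                # (3.16)
    # variance() of the group means uses their average (3.17) and divisor m - 1 (3.18)
    s2_of_means = variance(group_means)
    s2_between = n * s2_of_means                   # (3.19)
    if s2_within == 0:
        # Degenerate case (no spread inside any group); not discussed in the text.
        return float("inf")
    return s2_between / s2_within                  # (3.20)


def rank_features_by_f(samples_by_group):
    """Return feature indices sorted by decreasing F.

    samples_by_group[g][i] is the feature vector (a list) of face i in group g.
    """
    dim = len(samples_by_group[0][0])
    scored = []
    for k in range(dim):
        per_group = [[face[k] for face in group] for group in samples_by_group]
        scored.append((anova_f(per_group), k))
    return [k for _, k in sorted(scored, reverse=True)]
```

With SciPy available, `scipy.stats.f_oneway` computes the same one-way ANOVA statistic together with a p-value, which avoids a table lookup for the critical value.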
If $F > F_{crit}$, we reject the null hypothesis $H_0$: {There is no significant difference on feature $X$ between age groups} with $P < \alpha$, where $\alpha$ is the significance level, which is usually set to 0.05 or 0.01. Hence, we select those features with higher F values to get the selected GOP feature vector.

SVM classification. We explain how to get the SVM classifier from the training features and use this classifier to predict the labels of testing features. The steps of feature classification are described below.

Feature Labeling: Each face has a corresponding feature vector, which is labeled with value $i$ if the face belongs to class $i$.

Linear Scaling: Linearly scale training and testing data. Every feature in a feature vector is linearly scaled to the range [0, 1] among all faces. This is conducted to avoid the dominance of attributes with a large dynamic range over those with a smaller dynamic range.

N-Fold Cross-Validation: We divide the entire set of faces into N subsets of equal size, where each subset consists of the same number of faces from each class. Then, we choose one subset as the testing set while using the other N-1 subsets as the training set. This process is repeated N times, where each subset is used as the testing set once. The technique, called N-fold cross-validation, is employed to average the testing results and increase the confidence level.

Kernel Selection: Given a training set of feature-label pairs $(x_i, y_i)$, $i = 1, \ldots, l$, where $x_i \in \mathbb{R}^n$ and $y_i \in \{1, -1\}$, the SVM requires the solution of the following optimization problem:

$$\min_{w, b, \xi} \; \frac{1}{2} w^T w + C \sum_{i=1}^{l} \xi_i \quad \text{subject to} \quad \begin{cases} y_i (w^T \varphi(x_i) + b) \geq 1 - \xi_i \\ \xi_i \geq 0 \end{cases} \tag{3.23}$$

Here, the training feature vectors $x_i$ are mapped into a higher dimensional space by a function $\varphi$ with the kernel function $K(x_i, x_j) \equiv \varphi(x_i)^T \varphi(x_j)$. There are two common choices for the kernel function: the linear kernel and the radial basis function (RBF) kernel. The RBF kernel is often used when the dimension of the feature vector is low [44, 53, 54].
On the other hand, if the dimension of the feature vector is high, which is our current case, the nonlinear mapping does not improve the performance much [44, 50]. Thus, we choose the simpler linear kernel for the SVM algorithm in our experiment for lower complexity. Mathematically, the linear and the RBF kernel functions can be written as:

Linear:
$$K(x_i, x_j) = x_i^T x_j \tag{3.24}$$

Radial basis function:
$$K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2), \quad \gamma > 0 \tag{3.25}$$

where $\gamma$ is the kernel parameter.

3.3.3 Two-stage Structured Fusion Framework

In this section, we describe how to get the shape-based and the surface-based classifiers to work together under one framework. Here, we propose a two-stage structured fusion framework as illustrated in Fig. 3.1.

Stage 1: Region of certainty (ROC) determination with shape features. The shape-based classifier is adopted in the first stage. We consider the joint "CirFace" and "Angle" classifier. Since the shape feature dimension is only 2, directly concatenating the shape features with the high dimensional surface features in SVM classification is not effective. Instead, we consider an indirect fusion approach. A region of certainty (ROC) is set to determine whether an input face can be accurately classified into the baby, child or adult group in this stage or should be sent to the next stage due to uncertainty. As a result, the ROC serves as the bridge that connects the two stages. We choose the ROC as follows. First, we decide the certain area A for the baby group:

$$A = \{ p_{W_b}(w) > K p_{W_c}(w) \} \cap \{ p_{W_b}(w) > K p_{W_a}(w) \},$$

and the certain area B for the adult group:

$$B = \{ p_{W_a}(w) > K p_{W_b}(w) \} \cap \{ p_{W_a}(w) > K p_{W_c}(w) \}.$$

There is no certain area for the child group since its features overlap heavily with those of the other two groups. Finally, we have the desired ROC:

$$ROC = A \cup B \tag{3.26}$$

For the ROC to be a good choice, the classification results should be insensitive to the parameter K. We will analyze the sensitivity to K in Section 3.4.3.
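The first stage described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the system used in the experiments: the function names are hypothetical, the means and standard deviations are the FG-NET values reported in Table 3.2, the correlation `RHO` is an assumed placeholder (the text states that the two features are correlated but does not report a value), and "Angle" is taken to be in degrees. The density is (3.6) with n = 2 written out in scalar form, and a return value of `None` stands for "send the face to the second stage".

```python
import math


def angle_feature(p_left, p_right, p_chin):
    """Angle C of triangle LRC per (3.4), returned in degrees (unit is an assumption)."""
    v_r = (p_chin[0] - p_right[0], p_chin[1] - p_right[1])
    v_l = (p_chin[0] - p_left[0], p_chin[1] - p_left[1])
    dot = v_r[0] * v_l[0] + v_r[1] * v_l[1]
    cos_c = dot / (math.hypot(*v_r) * math.hypot(*v_l))
    # Clamp against rounding so acos never sees a value slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_c))))


def joint_gaussian_pdf(w, mean_x, std_x, mean_y, std_y, rho):
    """Bivariate Gaussian density: (3.6) with n = 2, expressed via stds and correlation."""
    zx = (w[0] - mean_x) / std_x
    zy = (w[1] - mean_y) / std_y
    one_minus_r2 = 1.0 - rho * rho
    quad = (zx * zx - 2.0 * rho * zx * zy + zy * zy) / one_minus_r2
    norm = 2.0 * math.pi * std_x * std_y * math.sqrt(one_minus_r2)
    return math.exp(-0.5 * quad) / norm


def stage1_decision(w, models, K):
    """Return the ML label (3.7) if w lies in ROC = A ∪ B (3.26), otherwise None."""
    p = {group: joint_gaussian_pdf(w, *params) for group, params in models.items()}
    in_a = p["baby"] > K * p["child"] and p["baby"] > K * p["adult"]
    in_b = p["adult"] > K * p["baby"] and p["adult"] > K * p["child"]
    if in_a or in_b:
        return max(p, key=p.get)
    return None  # uncertain: defer to the surface-based classifier


RHO = 0.5  # assumed correlation between CirFace and Angle; not reported in the text
# (mean CirFace, std CirFace, mean Angle, std Angle, correlation), from Table 3.2
FGNET_MODELS = {
    "baby": (0.9511, 0.0246, 39.1888, 3.0105, RHO),
    "child": (0.8914, 0.0279, 35.7340, 1.9667, RHO),
    "adult": (0.8599, 0.0242, 32.8581, 1.4098, RHO),
}
```

For example, `stage1_decision((0.95, 39.0), FGNET_MODELS, K=5.5)` evaluates the three densities once and either labels the face or defers it. Larger values of K shrink the ROC, so fewer faces are decided in the first stage.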
For a face with CirFace $x$ and Angle $y$, if $w = (x, y) \in ROC$, this face is classified using the ML decision rule stated in (3.7). Otherwise, it is sent to the 2nd stage.

Stage 2: SVM classifier with surface features. If a face is sent to the 2nd stage, it is classified using the surface-based classifier. The GOP features of this face are extracted and then selected by the ANOVA technique. After that, an SVM classifier is used to decide which group the face belongs to.

Performance of overall system. Finally, the results of the 1st stage and the 2nd stage are combined to compute the overall classification accuracy. For example, suppose 100 faces are tested. In the 1st stage, 30 faces are in the ROC and the classification rate is 29/30. Then, there are 70 faces to be classified in the 2nd stage, and its classification rate is 64/70. The overall correct classification rate is (29+64)/(30+70) = 0.93.

3.4 Experimental Results

We choose the FG-NET Aging Database [1], which contains 1002 faces of 82 individuals, and the MORPH database [70], which contains 55,134 images of more than 13,000 individuals, in our experiments. Some facial images from the FG-NET and MORPH-II databases are shown in Fig. 3.4 and Fig. 3.5, respectively. The age range distribution of face images is listed in Table 3.1. Three age groups are used in FG-NET: baby (0-3), child (4-19) and adult (20-59). Three age groups are used in MORPH: child (16-19), adult (20-59) and senior (60-77). For the shape-based and surface-based classifiers, 5-fold cross-validation is utilized to calculate the averaged accuracy. Each subset is tested in turn using the classifier trained on the remaining 4 subsets.

3.4.1 Results of Shape-based Classifier (1st Stage)

The "CirFace" and "Angle" features were extracted from frontal face images in each age group, and the statistics of these two features for each group are listed in Table 3.2.
We apply the Chi-square goodness-of-fit test [62] against the normal distribution to the two shape features for each age group and show the results in Table 3.3. We see that the normal distribution assumption of the shape features in each age group is valid. Hence, we use the normal distribution to model these two shape features and show the modeling results in Fig. 3.6. The method described in Section 3.3.1 is used to classify facial images into three age groups. The results of the proposed new shape features, CirFace and Angle, are listed in Table 3.4. The results of existing features, Ratio 1 and Ratio 2 [16, 45], are also listed in Table 3.4 for comparison.

This result shows that the proposed new shape features outperform existing shape features by a significant margin. The degree of separation of the distribution curves shown in Fig. 3.6 supports this result. That is, the higher the degree of separation of a shape feature, the higher its discriminating power and its classification accuracy.

Figure 3.4: Some faces from the FG-NET database.

Figure 3.5: Some faces from the MORPH-II database.

Table 3.1: Age Range Distribution on the FG-NET and MORPH-II Databases

| Age | FG-NET: No. of images | FG-NET: Percentage | MORPH-II: No. of images | MORPH-II: Percentage |
|---|---|---|---|---|
| 0-9 | 371 | 37.03% | 0 | 0.00% |
| 10-19 | 339 | 33.83% | 7,469 | 13.55% |
| 20-29 | 144 | 14.37% | 16,325 | 29.61% |
| 30-39 | 79 | 7.88% | 15,357 | 27.85% |
| 40-49 | 46 | 4.59% | 12,050 | 21.85% |
| 50-59 | 15 | 1.50% | 3,599 | 6.53% |
| 60-69 | 8 | 0.80% | 318 | 0.58% |
| 70-77 | 0 | 0.00% | 16 | 0.03% |
| Total | 1,002 | 100.00% | 55,134 | 100.00% |

Table 3.2: The Statistics of Shape Features in FG-NET

| Feature | Group | Mean | STD |
|---|---|---|---|
| CirFace | Baby | 0.9511 | 0.0246 |
| CirFace | Child | 0.8914 | 0.0279 |
| CirFace | Adult | 0.8599 | 0.0242 |
| Angle | Baby | 39.1888 | 3.0105 |
| Angle | Child | 35.7340 | 1.9667 |
| Angle | Adult | 32.8581 | 1.4098 |

Table 3.3: The Chi-square Goodness-of-fit Test against the Standard Normal in FG-NET. The Critical Value for the $\chi^2$ Distribution Is 5.991 with the Degree of Freedom = 2

| Feature | Group | $\chi^2$ | Accept/Reject |
|---|---|---|---|
| CirFace | Baby | 2.1695 | Accept |
| CirFace | Child | 0.8523 | Accept |
| CirFace | Adult | 1.5330 | Accept |
| Angle | Baby | 0.5483 | Accept |
| Angle | Child | 2.5075 | Accept |
| Angle | Adult | 3.0299 | Accept |

Table 3.4: Performance of Shape Feature Methods in FG-NET (Classification Rate)

| Shape feature | Baby | Child | Adult | Average |
|---|---|---|---|---|
| CirFace [ours] | 84% | 54% | 82% | 73.3% |
| Angle [ours] | 68% | 58% | 86% | 70.7% |
| Ratio 1 [45] | 56% | 50% | 58% | 54.7% |
| Ratio 2 [45] | 62% | 44% | 88% | 64.7% |

3.4.2 Results of Surface-based Classifier (2nd Stage)

To test the performance of the surface-based classifier, we use the LIBSVM library [7] in the training and testing of SVM classifiers. Some pre-processing of facial images is needed, such as face alignment, cropping the facial area to remove the background, and histogram equalization. For computational reasons, all facial images are reduced to the same size of 180×150. To demonstrate the importance of feature selection, we compare two approaches in the age group classification system:

(i) GOP+SVM: a method without feature selection, and
(ii) GOP+ANOVA+SVM: the proposed method (with feature selection).

In using the GOP method, we consider four pyramid layers (s = 4). In this case, the dimension of a GOP feature vector is 172034. The results for (i) and (ii) are shown in Table 3.5 and Table 3.6, respectively.

Table 3.5: Age Classification Results by GOP+SVM (Classification Rate)

| Database | Baby | Child | Adult | Senior | Average |
|---|---|---|---|---|---|
| FG-NET | 82.8% | 68.6% | 69.7% | - | 71.1% |
| MORPH | - | 93.7% | 76.2% | 72.1% | 80.6% |

As shown in Table 3.6, the use of ANOVA feature selection improves the classification accuracy and also reduces the training and testing complexity. In FG-NET, the classification rate is 91.4% for GOP+ANOVA+SVM as compared with 71.1% for GOP+SVM, and only 12% of the GOP features are selected.
In MORPH, the classification rate is 90.3% for GOP+ANOVA+SVM as compared with 80.6% for GOP+SVM, and only 21% of the GOP features are selected.

Table 3.6: Age Classification Results by GOP+ANOVA+SVM (Classification Rate)

| Database | Baby | Child | Adult | Senior | Average |
|---|---|---|---|---|---|
| FG-NET | 92.1% | 91.9% | 90.1% | - | 91.4% |
| MORPH | - | 96.1% | 88.1% | 86.8% | 90.3% |

To analyze the benefit of feature selection, the relationship between the accuracy and the percentage of selected features is plotted in Fig. 3.7. The classification achieves its highest accuracy when a certain fraction of features is selected (12% of the features in FG-NET and 21% of the features in MORPH). These features show the most significant differences between age groups. If we select more or fewer features, the classification accuracy starts to decrease.

3.4.3 Results of the Structured Fusion System

In this subsection, we perform experiments on the proposed two-stage age group classification system as described in Section 3.3. We tested different ROCs by varying the parameter K and show the corresponding results in Fig. 3.8. Starting from K = 1, as K increases (i.e., as the area of the ROC becomes smaller), the classification rate becomes higher and achieves its highest value of 95.1% at K = 5.5-5.6 and 7.4-7.9 for the FG-NET dataset and 93.7% at K = 5.4-6.7 for MORPH. Regarding the sensitivity to K, when 4 ≤ K ≤ 30, we observe that the overall rate stays within a small range and, hence, is not sensitive to the value of K.

Next, we compare the performance of the proposed age classification method with some state-of-the-art methods in Table 3.7. The same amount of data and the same training/testing data settings are used for the methods in [31, 41, 43] and our method. The results show that our method is the most effective and offers better performance on the age group classification problem than the others.

3.5 Conclusion and Future Work

We proposed a facial age group classification system using a structured fusion of shape-feature and surface-feature based classifiers.
Two new shape features were developed and a new surface feature based method was designed. By setting a ROC to jointly classify frontal face images with two stages, the resulting system gave a highly accurate classification result. Experimental results demonstrated that the proposed method outperforms the state-of-the-art methods.

In the future, we will explore other shape/surface features and fuse different shape/surface features together. We will explore the impact of facial landmarks in the proposed shape-based approach. Also, we may consider the effects of the number of pyramid layers for GOP and use/combine other surface-based features. Besides, non-frontal faces will be considered in the future. Classification into more than three age groups (e.g., decades of life: 0-9, 10-19, 20-29, 30-39, 40-49, etc.) will also be an interesting and challenging topic.

Table 3.7: Performance of Different Methods Tested on FG-NET and MORPH Datasets for Three Age Groups Classification, where GAS: GOP+ANOVA+SVM, CAGAS: CirFace+Angle + GOP+ANOVA+SVM

| Method | FG-NET | MORPH |
|---|---|---|
| WAS [26] | 78.7% | 75.1% |
| AGES [26] | 81.1% | 77.8% |
| MLPs [46] | 80.5% | 77.3% |
| BIF [39] | 87.3% | 85.6% |
| MTWGP [91] | 86.4% | 84.8% |
| OHRank [9] | 88.5% | 86.6% |
| PLO [48] | 86.6% | 84.9% |
| GAS [ours] | 91.4% | 90.3% |
| CAGAS [ours] | 95.1% | 93.7% |

Figure 3.6: Modeling shape features as normal distributions, top: CirFace; middle: Angle; bottom: jointly CirFace and Angle.

Figure 3.7: The relationship between accuracy and the percentages of selected GOP features in the surface-based classifier.

Figure 3.8: Classification accuracy vs. parameter K.
Chapter 4

Age Estimation via Multistage Learning: from Age Grouping to Decision Fusion

4.1 Introduction

In the past few years, human facial age estimation has drawn a lot of attention in the computer vision community because of its important applications in age-based image retrieval [46], internet access control, security control and surveillance [20, 77], biometrics [20, 65, 71], human-computer interaction (HCI) [27, 23], and electronic customer relationship management (ECRM) [20].

Estimating human age from a facial image requires a great amount of information from the input image. This kind of information is often called facial aging features. Extraction of these features is important since the performance of an age estimation system relies heavily on the quality of the extracted features [20]. Much research on age estimation has been directed towards aging feature extraction. Examples include: the active appearance model (AAM) [13], age manifold [21, 22], AGing pattErn Subspace (AGES) [26, 27], anthropometric model [45], biologically inspired features (BIF) [39], and patch-based appearance model [86, 89].

Another aspect of age estimation is to build a reliable age prediction system (i.e., age estimator) based on extracted features. The age estimator can use a machine learning approach to train a model on the extracted features and make age predictions for query faces with the trained model. Generally speaking, age estimation can be viewed as a multiclass classification problem [26, 33, 46, 83], a regression problem [21, 22, 89, 39, 36, 87, 88, 91, 92] or a composite of these two [33, 35, 34]. From a different perspective, facial aging can also be treated as an ordinal process. For instance, the face of a 2-year-old child should be more closely related to the face of a 3-year-old child than to the face of a 15-year-old teenager. Thus, age estimation can also be treated as a ranking problem [9, 56, 90].
Although many approaches have been presented to deal with age estimation, most of them directly estimate an age from a very wide age range. However, it would be more meaningful to estimate the age from a narrower age range. For example, estimating an age in the age range of 15 to 20 is easier than estimating an age in the age range of 0 to 60. The task of age grouping (or, age group classification) is to classify facial images into different age groups. With higher accuracy in age grouping, the age estimation error in each age group is expected to be lower. Motivated by the above observation, we present a novel age estimation framework, called Grouping-Estimation-Fusion (GEF), in this work.

The proposed age estimation system consists of three main stages: 1) age grouping, 2) age estimation within age groups, and 3) fusion of decisions. Our current work has several main contributions: 1) diverse decisions (i.e., different age estimation results) are generated by creating multiple age grouping systems; 2) the relationship between the performances of age estimation and age grouping is extensively explored; 3) a systematic way of measuring the intra-system and inter-system diversity between decisions is proposed; and 4) six decision fusion schemes are presented to perform age estimation. The performance of our proposed solution is evaluated on the FG-NET and the MORPH-II databases, and it outperforms existing state-of-the-art age estimation methods by a significant margin. That is, the mean absolute error (MAE) of age estimation is reduced from 4.48 to 2.73 on FG-NET and from 3.98 to 2.91 on MORPH-II.

The remainder of this chapter is organized as follows. Related previous work is briefly reviewed in Section 4.2. An overview of the proposed GEF age estimation scheme
Analysis of diversity, designs of fusion schemes, and decision selection algorithms used in our fusion schemes are discussed in Section 4.6. Experimental results are shown in Section 4.7. Finally, concluding remarks and future work are given in Section 4.8. 4.2 Related Work In recent years, facial age grouping and facial age estimation problems are being exten- sively studied. A lot of approaches have been proposed in these two research topics. We brie y review age grouping in Section 4.2.1 and age estimation in Section 4.2.2. 4.2.1 Review of Age Grouping Age grouping (i.e., age group classication) was rst conducted by Kwon and Lobo in [45]. They categorized facial images into three age groups: babies, young adults, and senior adults. They computed six ratios of distances between primary components (e.g., eyes, noses, mouth, etc.) and separated babies from the other two groups. Then, wrinkles on specic areas of a face were located using snakes, and wrinkle indices were used to distinguish senior adults from young adults and babies. There were only 47 images in the experimental dataset, and the correct classication rate for the baby group was below 68%. Horng et al. [43] proposed a system that classies faces with three steps: primary components detection, feature extraction, and age classication. They classied 230 facial images into four age groups: babies, young, middle-aged and senior adults. They rst used the Sobel edge operator [18] and region labeling to locate the positions of eyes, noses, and mouths. Then, two geometric features and three wrinkle features were 50 extracted. Finally, two back-propagation neural networks were constructed for classica- tion. The correct classication rate was 81.58%. The facial age groups were subjectively assigned (i.e., not actual ages) in their experiments. Thukral et al. 
[81] extracted geometric features from faces and fused the results from five classifiers, namely ν-SVC [72], partial least squares (PLS) [4], Fisher linear discriminant (FLD), Naïve Bayes, and k-nearest neighbor (KNN) [18], by adopting the majority decision rule. The final rate was 70.04% for three age groups (namely, 0-15, 15-30, and 30+). Gunay and Nabiyev [31] proposed an automatic age classification system based on local binary patterns (LBP) [63] for face description. Faces were divided into small regions from which the LBP histograms were extracted and concatenated into a feature vector. For every new face presented to the system, spatial LBP histograms were produced and used to classify the image into one of six age groups: 10±5, 20±5, 30±5, 40±5, 50±5, and 60±5. The minimum distance, the nearest neighbor, and the k-nearest neighbor classifiers were used. Their system gave a classification rate of 80%. Hajizadeh et al. [41] used histograms of oriented gradients (HOG) [15] as the facial feature. HOG features were computed in several regions, and these regional features were concatenated to construct a feature vector for each face. A probabilistic neural network (PNN) classifier was used to classify facial images into one of four age groups. The classification rate was 87.25%. Liu et al. [52] proposed a structured fusion method for age group classification by building a region of certainty (ROC) to connect the uncertainty-driven shape features with selected surface features. In the first stage, two shape features are designed to determine the certainty of a face and classify it. In the second stage, the gradient orientation pyramid (GOP) [49] features are selected by a statistical method and then combined with an SVM classifier to perform age grouping. Their method was tested in classifying faces into three age groups, and a classification accuracy of 95.1% was reported.

4.2.2 Review of Age Estimation

Lanitis et al.
[46] used active appearance models (AAMs), which combine shape and appearance facial features. Age estimation is treated as a classification problem and solved by the shortest distance classifier and neural networks. They differentiated age-specific and appearance-specific estimation problems. Personalized age estimation is introduced to cluster similar faces before classification. Geng et al. [26, 27] proposed an automatic age estimation method named AGES (AGing pattErn Subspace), which models the long-term aging process of a person (i.e., a sequence of a person's face images) and estimates the person's age by minimizing the reconstruction error. However, the facial features of the same person could be similar at different ages. Guo et al. [38] used biologically inspired features (BIF) with manifold learning for face representation. They treated each age as a class label and adopted SVM for age estimation. Guo et al. [39] extracted BIF for each face and applied principal component analysis (PCA) [85] for feature dimensionality reduction. They used classification and regression approaches to age estimation. Yan et al. [89] proposed a patch-based regression method for age estimation in which the regression error is minimized by a three-complementary-stage procedure. First, each image is encoded as an ensemble of orderless coordinate patches of GMM (Gaussian Mixture Model) distribution. Then, the patch-kernel is designed for characterizing the Kullback-Leibler divergence between the derived models for any two images, and its discriminating power is further enhanced by a weak learning process, called inter-modality similarity synchronization. Finally, kernel regression is employed for ultimate human age estimation. Zhang et al. [91] proposed a multi-task warped Gaussian process (MTWGP) model for age estimation. Age estimation is formulated as a multi-task regression problem in which each learning task refers to the estimation of the age function for each person.
Besides modelling common features shared by different tasks (persons), MTWGP also allows task-specific (person-specific) features to be learned automatically. Chang et al. [9] proposed an ordinal hyperplane ranking algorithm (OHRank) using the relative order information among age labels in a database. Each ordinal hyperplane separates all facial images into two groups by the relative order, and a cost-sensitive property is used to find a better hyperplane by minimizing the classification cost. Human age is then inferred by aggregating a set of preferences from multiple ordinal hyperplanes. Guo and Mu [36] used the kernel partial least squares (KPLS) regression for age estimation with three advantages: 1) the KPLS can reduce feature dimensionality and learn the aging function simultaneously in a single learning framework; 2) the KPLS can find a small number of latent variables (e.g., 20) to project thousands of features into a low-dimensional subspace, which is attractive in real-time applications; and 3) the KPLS has an output vector consisting of multiple labels to solve several related problems (e.g., age estimation, gender classification, and ethnicity estimation) together. Li et al. [48] considered temporally ordinal and continuous characteristics of the aging process and proposed to learn ordinal discriminative facial features. Their method aimed at preserving the local manifold structure of facial images while keeping the ordinal information among aging faces. The two factors were formulated into a unified optimization problem, and a solution was presented.

Existing approaches handle age estimation as either age grouping or exact age estimation. In this work, we integrate age grouping and exact age estimation in the GEF framework to increase age estimation ability. Several global and local features are separately used in each system to create several decisions. Also, several such systems are built to increase diversity between decisions.
Furthermore, six fusion schemes, namely intra-system fusion (AF), inter-system fusion (EF), intra-inter fusion (AEF), inter-intra fusion (EAF), maximum diversity fusion (MDF), and composite fusion (CF), are created to achieve better estimation accuracy.

4.3 Overview of Proposed GEF System

The proposed GEF age estimation scheme consists of three stages. In the first stage, we adopt the age grouping method in [52] to classify face images into different age groups. The entire age range is divided into several non-overlapping ranges, and each age group has a different range. Then, the gradient orientation pyramid (GOP) [49] is adopted to represent overall facial features. To further increase the discriminating ability of the feature space, the analysis of variance (ANOVA) [28] is employed to select the more discriminative features from the GOP feature vector, which also significantly reduces the dimensionality of the GOP feature vector. Then the support vector machine (SVM) [7] with a linear kernel is adopted to learn a model and classify faces into age groups.

In the second stage, an exact age for each face is estimated to be a value within its group range. Here, both local and global features are used. Local features are obtained by extracting features from local facial areas. A cascade object detector using the Viola-Jones [84] algorithm is adopted to detect three facial components (eyes, nose, and mouth). Three methods, namely biologically inspired features (BIF) [39], histograms of oriented gradients (HOG) [15], and local binary patterns (LBP) [63], are adopted to extract local aging features from the detected facial components. For global features, since the facial landmarks are provided in the FG-NET [1] database, we use the active appearance model (AAM) [13] to represent them. However, since the facial landmarks of the MORPH-II [70] database are not provided, BIF, HOG, and LBP are used there to extract global features from the whole face.
Every global or local feature (e.g., BIF eyes or LBP mouth) is used by support vector regression (SVR) [7] to predict ages for faces in each group. At the end of this stage, decisions (i.e., estimation results) from the system outputs are produced, and they are used as the input features to the third stage.

In the third stage, we focus on the fusion of decisions from the second stage. A powerful fusion scheme requires rich diversity among the selected decisions. To achieve this goal, multiple systems are created, and each system has a different number of age groups and different age ranges. For example, if the entire age range is from 0 to 70, one system could have 3 age groups (0-10, 11-30, 31-70) while another may have 5 groups (0-10, 11-20, 21-30, 31-50, 51-70). With the analysis of diversity in decisions, six efficient fusion schemes are proposed and compared to yield the final age estimation result.

4.4 Age Grouping

The function of age grouping (i.e., age group classification) is to classify face images into different groups based on their ages. The entire age range is divided into several non-overlapping sub-ranges, and each sub-range is considered as an age group. In our previous work, it has been shown that when the number of groups is small (e.g., 2 or 3), both geometric and texture features may be utilized together to enhance the discriminative power in age grouping. However, if the number of groups is not small (e.g., 4 or more), the geometric features may not help to improve the classification accuracy. Hence, we propose a new age grouping method to tackle this larger-number-of-groups problem by carrying out the following procedures: feature extraction, feature synthesis, feature selection, and feature classification.

The objective of age grouping (i.e., age group classification) is to classify face images into different groups based on their ages.
The entire age range could be divided into several non-overlapping ranges, and each range constitutes an age group. From [52], it can be seen that when the number of groups is small (e.g., 2 or 3), both shape (geometric) and surface (texture) features may be utilized for age grouping. However, if the number of groups is larger (e.g., 4 or more), the shape features may not help to improve the classification accuracy. Hence, we adopt the surface feature based age grouping method from a previous work [52] to tackle the age grouping problem. This includes the following procedures: feature extraction, feature synthesis, feature selection, and age classification.

4.4.1 Feature Extraction

The facial aging features used in previous works can be categorized into two types: geometric features and texture features. The geometric features, such as facial shapes and distance ratios between facial components, reveal noticeable changes in babyhood (ages 0 to 3) and childhood (ages 4 to 17). The texture features, such as wrinkles and skin surfaces, become apparent in adulthood (ages 18 to 70). Since the range of adulthood is very wide, it would be divided into several sub-ranges. If more age groups are defined in adulthood, a reliable texture feature is needed for age grouping.

The gradient orientation pyramid (GOP) can provide the image gradient information as well as the pyramid information. For a given image, we first build a pyramid of this image and then compute the gradients in each layer of the pyramid. Finally, these gradients are combined together as a GOP feature. The procedure is detailed below.

For a given image $I(x,y)$, where $(x,y)$ indicates the pixel coordinates, the pyramid of $I$ can be defined as

$$P(I) = \{ I((x,y); \sigma) \}_{\sigma=0}^{s}, \tag{4.1}$$

where

$$I((x,y); 0) = I(x,y), \tag{4.2}$$

$$I((x,y); \sigma) = [\, I((x,y); \sigma - 1) * \Phi(x,y) \,] \downarrow_2, \quad \sigma = 1, \ldots, s, \tag{4.3}$$

and $\Phi(x,y)$ is the Gaussian kernel, with 0.5 as the standard deviation used in our experiments.
Also, $*$ in (4.3) denotes the convolution operation, $\downarrow_2$ denotes down-sampling by a factor of 2, and $s$ is the total number of pyramid layers. Once the pyramid $P(I)$ is constructed, the gradient orientation (GO) at every specific layer $\sigma$ can be computed by finding its normalized gradient vectors at each pixel as

$$GO(I((x,y);\sigma)) = \begin{cases} \dfrac{\nabla(I((x,y);\sigma))}{|\nabla(I((x,y);\sigma))|} & \text{if } |\nabla(I((x,y);\sigma))| > \tau, \\ (0,0)^T & \text{otherwise,} \end{cases} \tag{4.4}$$

where $\tau$ is a threshold used for dealing with flat pixels. Then, we reshape the GO of each layer $\sigma$ via

$$GO(I;\sigma) = \{ GO(I((x,y);\sigma)) \} = \begin{bmatrix} g_{1,1} & \cdots & g_{1,n} \\ \vdots & \ddots & \vdots \\ g_{m,1} & \cdots & g_{m,n} \end{bmatrix} \xrightarrow{\text{reshape}} GO_r(I;\sigma) = [\, g_{1,1} \cdots g_{1,n} \cdots g_{m,n} \,]' \in \mathbb{R}^{h \times 2}, \tag{4.5}$$

where $h = m \times n$ is the number of pixels in the pyramid layer and $g_{i,j}$ is a row vector of size $1 \times 2$. Finally, the GOP of $I$ can be obtained by stacking up the $GO_r(I;\sigma)$ from all layers as

$$GOP(I) = \begin{bmatrix} GO_r(I; \sigma = 0) \\ GO_r(I; \sigma = 1) \\ \vdots \\ GO_r(I; \sigma = s) \end{bmatrix} = \begin{bmatrix} g_{x,1} & g_{y,1} \\ \vdots & \vdots \\ g_{x,h} & g_{y,h} \end{bmatrix} \in \mathbb{R}^{h \times 2}, \tag{4.6}$$

where, in (4.6), $h = h_0 + h_1 + \cdots + h_s$ is the total number of pixels across all pyramid layers.

4.4.2 Feature Selection

The purpose of feature selection is to pick out features having higher discrimination and discard features with worse discrimination. To achieve this purpose, we propose to use a statistics-based method to select features for the classification steps. Using the idea of hypothesis testing, the analysis of variance (ANOVA) method is adopted here to measure which feature has higher discriminating power among age groups. The feature selection procedure is described below.

Analysis of Variance (ANOVA)

1. Divide the total number of faces $N$ into $m$ groups, where each group has $n_i$ ($i = 1, \ldots, m$) faces:

$$N = \sum_{i=1}^{m} n_i$$

2. Compute the $F$ value for feature $X$ in the GOP feature vector $G(I)$ via the following equations.

Mean of each group, $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_m$:

$$\bar{x}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} x_{ij}, \quad i = 1, \ldots, m,$$

where $x_{ij}$ represents the feature of the $j$th face of the $i$th group.
Sum of squared deviations for each group, $SS_1, SS_2, \ldots, SS_m$:

$$SS_i = \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2, \quad i = 1, \ldots, m$$

Within groups sum of squares:

$$SS_{within} = \sum_{i=1}^{m} SS_i$$

Within group variance:

$$\sigma^2_{within} = \frac{SS_{within}}{DF_{within}} = MS_{within}, \quad \text{where } DF_{within} = N - m$$

Between groups sum of squares:

$$SS_{between} = \sum_{i=1}^{m} n_i \bar{x}_i^2 - \frac{1}{N} \left( \sum_{i=1}^{m} n_i \bar{x}_i \right)^2$$

Between groups variance:

$$\sigma^2_{between} = \frac{SS_{between}}{DF_{between}} = MS_{between}, \quad \text{where } DF_{between} = m - 1$$

F statistic value:

$$F = \frac{\sigma^2_{between}}{\sigma^2_{within}} \tag{4.7}$$

3. Select features from the GOP feature vector $G(I)$ by their $F$ statistic values. The feature with the largest $F$ value is selected first, the feature with the second largest $F$ value is selected next, and so on.

4. If $F > F_{crit}$, we reject the null hypothesis $H_0$: {There is no significant difference on feature $X$ between age groups} with probability $P < \alpha$, where $\alpha$ is the significance level, which is usually set to 0.05 or 0.01.

Hence, we select those features with larger $F$ values (i.e., higher discriminating powers) to get the selected feature vector $AG$.

4.4.3 Age Classification

Support vector machines (SVM) are a widely used machine learning method for classification, regression, and other learning tasks. In this age grouping stage, we use support vector classification (SVC) with a linear kernel to learn a model and classify faces into different age groups. The procedure for using SVC to implement feature classification is described as follows.

Feature classification by SVC

1. Divide the total number of faces $N$ into $m$ groups, where each group has $n_i$ ($i = 1, \ldots, m$) faces:

$$N = \sum_{i=1}^{m} n_i$$

2. Labeling: The ANOVA-selected GOP feature vector $AG_k$ extracted from face $k$ ($k = 1, \ldots, N$) is marked with label $i$ ($i = 1, \ldots, m$) if face $k$ belongs to group $i$.

3. Scaling: Every feature in a feature vector $AG_k$ is linearly scaled to the range [0, 1] among all faces. This is conducted to avoid the dominance of attributes with a large dynamic range over those with a smaller dynamic range.
The linear scaling operation is performed for both training and testing data via

$$x = \frac{r - \min(R)}{\max(R) - \min(R)},$$

where $x$ is the scaled feature, $r$ is the raw feature, and $\max(R)$ and $\min(R)$ specify the maximum and minimum values of the feature range $R$, respectively.

4. Cross-Validation: M-fold cross-validation is employed to average the results and increase the confidence level. Each age group is divided into $M$ subsets of equal or near-equal size. First, we choose one subset from each age group to form a testing set, while using the other $M-1$ subsets from each age group to form a training set. This process is repeated $M$ times, so that each subset is used as the testing set once.

5. SVC: Given a training set of feature-label pairs $(x_i, y_i)$, $i = 1, \ldots, l$, where $x_i \in \mathbb{R}^n$ and $y_i \in \{1, -1\}$, the SVC requires the solution of the following optimization problem:

$$\min_{w, b, \xi} \; w^T w + C \sum_{i=1}^{l} \xi_i \quad \text{subject to} \quad \begin{cases} y_i (w^T \varphi(x_i) + b) \geq 1 - \xi_i, \\ \xi_i \geq 0. \end{cases}$$

The training feature vectors $x_i$ are mapped into a higher dimensional space by a function $\varphi$ with the kernel function $K(x_i, x_j) \equiv \varphi(x_i)^T \varphi(x_j)$.

6. Kernel Selection: There are two common choices for the kernel function: the linear kernel and the radial basis function (RBF) kernel. Mathematically, the linear and the RBF kernel functions can be written as:

Linear:
$$K(x_i, x_j) = x_i^T x_j \tag{4.8}$$

Radial basis function:
$$K(x_i, x_j) = \exp(-\gamma \| x_i - x_j \|^2), \quad \gamma > 0, \tag{4.9}$$

where $\gamma$ is the kernel parameter. The RBF kernel is often used when the dimension of the feature vector is low [44]. On the other hand, if the dimension of the feature vector is high, which is our current case, the nonlinear mapping does not improve the performance much [44]. Thus, we choose the simpler linear kernel for the SVC algorithm in our age grouping stage for lower complexity.

Figure 4.1: The age grouping system.

The entire age grouping algorithm, including feature extraction, feature selection, and feature classification, is summarized in Fig. 4.1.
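The feature selection and scaling steps of Sections 4.4.2 and 4.4.3 can be illustrated with a minimal sketch. It is not the implementation used in this work: the function names `anova_f`, `rank_features`, and `min_max_scale` are hypothetical, the data layout (one row of feature values per face) is an assumption, and the SVC training step is omitted. The F statistic follows (4.7) term by term.

```python
from typing import List, Sequence


def anova_f(groups: Sequence[Sequence[float]]) -> float:
    """One-way ANOVA F statistic of a single feature, following (4.7).

    `groups` holds the values of the feature for each age group.
    Assumes the within-group variance is non-zero.
    """
    m = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    # Within groups sum of squares: sum of the per-group SS_i.
    ss_within = sum((x - mu) ** 2 for g, mu in zip(groups, means) for x in g)
    # Between groups sum of squares: sum n_i * mean_i^2 - (sum n_i * mean_i)^2 / N.
    weighted_sum = sum(len(g) * mu for g, mu in zip(groups, means))
    ss_between = (
        sum(len(g) * mu ** 2 for g, mu in zip(groups, means))
        - weighted_sum ** 2 / n_total
    )
    ms_within = ss_within / (n_total - m)   # DF_within = N - m
    ms_between = ss_between / (m - 1)       # DF_between = m - 1
    return ms_between / ms_within


def rank_features(samples: Sequence[Sequence[float]], labels: Sequence[int]) -> List[int]:
    """Return feature indices ordered from the largest F value to the smallest."""
    group_ids = sorted(set(labels))
    scores = []
    for f in range(len(samples[0])):
        groups = [
            [row[f] for row, lab in zip(samples, labels) if lab == g]
            for g in group_ids
        ]
        scores.append(anova_f(groups))
    return sorted(range(len(scores)), key=lambda f: scores[f], reverse=True)


def min_max_scale(column: Sequence[float]) -> List[float]:
    """Linearly scale one feature to [0, 1]: x = (r - min(R)) / (max(R) - min(R))."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant feature: the formula is undefined, return zeros
        return [0.0 for _ in column]
    return [(r - lo) / (hi - lo) for r in column]
```

In practice the top-ranked features would be kept, scaled with the training-set minimum and maximum, and passed to a linear-kernel SVC.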
4.5 Age Estimation within Age Groups

After age grouping is completed, each face is classified into an age group, which has a defined range. An exact age for each classified face will be estimated to be a value within the defined age range of its age group. The approach includes the following procedures: (i) facial components detection, (ii) feature extraction from facial components, and (iii) age estimator learning.

4.5.1 Facial Components Detection

In addition to global facial information, we explore the local facial information by detecting facial components and extracting several features from the detected facial components. In image processing, the Viola-Jones [84] algorithm is one of the most efficient and widely used algorithms in object detection. This algorithm also demonstrates exceptional competence in detecting faces.

Since the eyes, nose, and mouth are important parts of a face, we intend to extract local aging features from them. In this step, a cascade object detector using the Viola-Jones [84] algorithm is adopted to detect these three important facial components. Fig. 4.2 shows an example of the detection results on facial components.

Figure 4.2: An example of facial components detection.

One advantage of using these detected facial components is the lower feature dimensionality, since the image sizes of detected facial components are much smaller than the image size of a whole face. Another advantage is richer diversity, because more features are available.

4.5.2 Feature Extraction in Facial Components

Recently, the biologically inspired features (BIF) [39], histograms of oriented gradients (HOG) [15], and local binary pattern (LBP) [63] methods have been widely used to extract facial aging information. Thus, we adopt the BIF, HOG, and LBP methods for global and local aging feature extraction. For LBP, we use a modified one, called uniform LBP, which is briefly described below.
First, the local binary pattern (LBP) operator [63] is defined as

$$LBP_{P,R} = \sum_{p=0}^{P-1} s(g_p - g_c) \, 2^p, \tag{4.10}$$

where

$$s(g_p - g_c) = \begin{cases} 1 & \text{if } g_p \geq g_c, \\ 0 & \text{otherwise,} \end{cases} \tag{4.11}$$

$g_c$ corresponds to the gray value of the center pixel of the local neighbourhood, and $g_p$ ($p = 0, \ldots, P-1$) corresponds to the gray values of $P$ equally spaced pixels on a circle of radius $R$ ($R > 0$) from a circularly symmetric neighbour set. Then the uniformity measure $U$ of $LBP_{P,R}$ is defined as

$$U(LBP_{P,R}) = | s(g_{P-1} - g_c) - s(g_0 - g_c) | + \sum_{p=1}^{P-1} | s(g_p - g_c) - s(g_{p-1} - g_c) |, \tag{4.12}$$

which corresponds to the number of spatial transitions (bitwise 0/1 changes) in $LBP_{P,R}$. For example, pattern $00000000_2$ has a $U$ value of 0, and pattern $00000100_2$ has a $U$ value of 2. The uniform LBP patterns ($LBP^{u2}_{P,R}$) refer to the patterns which have limited transitions or discontinuities ($U \leq 2$) in the circular binary representation [63]. In practice, the mapping from $LBP_{P,R}$ to $LBP^{u2}_{P,R}$, which has $P(P-1)+3$ distinct output values, is implemented with a lookup table of $2^P$ elements. Finally, the uniform LBP feature (LBPu) can be written as

$$LBPu = nhist(b), \quad b = 0, 1, \ldots, P(P-1)+2, \tag{4.13}$$

where $b$ represents the bin of the histogram, and $nhist$ denotes the normalized histogram of $LBP^{u2}_{P,R}$. In this work, we choose $P = 8$ and $R = 1$ for the uniform LBP operator. Therefore, we have 59 (i.e., $P(P-1)+3$) values to represent the feature in (4.13).

Table 4.1: The 12 Features Used in the 2nd Stage

Feature                                        Type
F1:  AAM app (FG-NET) / BIF (MORPH-II)         Global
F2:  AAM sha (FG-NET) / HOG (MORPH-II)         Global
F3:  AAM tex (FG-NET) / LBPu (MORPH-II)        Global
F4:  BIF eyes                                  Local
F5:  BIF nose                                  Local
F6:  BIF mouth                                 Local
F7:  HOG eyes                                  Local
F8:  HOG nose                                  Local
F9:  HOG mouth                                 Local
F10: LBP eyes                                  Local
F11: LBP nose                                  Local
F12: LBP mouth                                 Local

After extracting local aging features, we have 9 different local aging features (F4, F5, ..., F12), as shown in Table 4.1.
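The uniform LBP feature of (4.10) to (4.13) can be sketched as follows. This is an illustrative sketch only, under two stated assumptions: the eight neighbours for $P = 8$, $R = 1$ are taken from the square 3x3 neighbourhood without interpolating the diagonal samples onto the circle, and $nhist$ is taken to mean a histogram normalized to sum to one. All function names are hypothetical.

```python
from typing import Dict, List, Sequence

# Assumed neighbour order: clockwise around the centre pixel, starting top-left.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(neighbors: Sequence[float], center: float) -> int:
    """LBP_{P,R} of (4.10): threshold each neighbour g_p against g_c, weight by 2^p."""
    return sum((1 if g >= center else 0) << p for p, g in enumerate(neighbors))


def uniformity(code: int, P: int = 8) -> int:
    """U of (4.12): number of 0/1 transitions in the circular P-bit pattern."""
    bits = [(code >> p) & 1 for p in range(P)]
    # bits[p - 1] with p = 0 wraps to bits[P - 1], which gives the circular term.
    return sum(bits[p] != bits[p - 1] for p in range(P))


def uniform_lookup(P: int = 8) -> Dict[int, int]:
    """Lookup table with 2^P entries mapping each code to one of P(P-1)+3 bins."""
    table: Dict[int, int] = {}
    next_bin = 0
    for code in range(2 ** P):
        if uniformity(code, P) <= 2:
            table[code] = next_bin  # each uniform pattern gets its own bin
            next_bin += 1
    for code in range(2 ** P):
        table.setdefault(code, next_bin)  # non-uniform patterns share the last bin
    return table


def uniform_lbp_histogram(image: Sequence[Sequence[float]]) -> List[float]:
    """Normalized uniform-LBP histogram (P = 8) over the interior pixels of a gray image."""
    table = uniform_lookup(8)
    hist = [0.0] * (max(table.values()) + 1)
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            neighbors = [image[y + dy][x + dx] for dy, dx in OFFSETS]
            hist[table[lbp_code(neighbors, image[y][x])]] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist
```

For $P = 8$ the table yields 58 uniform bins plus one shared bin, which matches the 59 values stated in the text.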
As for the global aging feature extraction, since the facial landmarks are provided in the FG-NET database, we are able, like other age estimation methods, to adopt active appearance models (AAM) to extract global features, including appearance, shape, and texture features (F1, F2, and F3), as shown in Table 4.1. However, the MORPH-II database does not provide facial landmarks, so the BIF, HOG, and LBPu features extracted from the whole face serve as the global features (F1, F2, and F3), as shown in Table 4.1.

4.5.3 Age Estimators Learning

Each of the 9 types of local aging features and 3 types of global aging features can be used to obtain an age estimator (AE). Similar to the first stage, the support vector machine (SVM) method is adopted for learning the age estimators (AEs) in the second stage. In each classified age group, we use support vector regression (SVR) with a linear kernel to learn a model from training faces and predict ages for testing faces. The nonlinear radial basis function (RBF) kernel is also tested in the experiment. However, its results are almost the same as those of the linear kernel. To lower the complexity, the linear kernel is chosen in this stage. The procedure for using SVR to implement an age estimator is described as follows.

Age estimation using feature regression by SVR

1. Classified Results: For an age grouping system with $m$ groups, each face is classified with label $i$, $i \in \{1, 2, \ldots, m\}$.

2. Feature labeling: The feature vector of each face is labeled with three parameters: group $i$, classified group $i_c$, and age $a$.

3. Scaling: Every feature in a feature vector is linearly scaled to the range [0, 1] among all faces. This is conducted to avoid the dominance of attributes with a large dynamic range over those with a smaller dynamic range.
The linear scaling operation is performed for both training and testing data via

$$x = \frac{r - \min(R)}{\max(R) - \min(R)}, \tag{4.14}$$

where $x$ is the scaled feature, $r$ is the raw feature, and $\max(R)$ and $\min(R)$ specify the maximum and minimum values of the feature range $R$, respectively.

4. Cross validation: The leave-one-person-out (LOPO) cross validation technique is used on the FG-NET database. For the MORPH-II database, the same setting as in previous studies [36, 37] is followed.

5. Training: The training feature vectors are divided into $m$ groups based on their first label $i$ (e.g., a feature vector is in group $i$ if it has label $i$). The feature vectors of each group are used to train a model (i.e., an age estimator). In total, $m$ age estimators $AE_i$ ($i = 1, \ldots, m$) will be trained.

6. Testing: A testing feature vector is first assigned to an age estimator based on its second label $i_c$ ($i_c = 1, \ldots, m$). If $i_c = 1$, then the age of the testing feature vector will be predicted by the age estimator $AE_1$, and the predicted age will be confined to the age range of group 1.

7. The SVR with a linear kernel is used in training and testing.

Figure 4.3: Age estimation within age groups.

The procedure of age estimation within age groups is also briefly described in Fig. 4.3.

4.6 Fusion of Decisions

In order to further improve the performance, we investigate several fusion schemes based on the decisions (i.e., estimation results from the second stage). In the third stage (i.e., the fusion stage), six novel fusion schemes are proposed: intrA-system Fusion (AF), intEr-system Fusion (EF), intrA-intEr Fusion (AEF), intEr-intrA Fusion (EAF), Maximum Diversity Fusion (MDF), and Composite Fusion (CF). Here a system means the $m$-group age estimation system, which is a combination of the first stage and the second stage, as shown in Fig. 4.4. In FG-NET, we investigate $m$ = 3, 4, 5, 6, 7, 8, 9, 10-group age estimation systems, and each system has 12 decisions (i.e., 12 estimation results from 12 AEs).
In total, 96 decisions can be used for fusion. In MORPH-II, due to the lack of ages 0 to 15, only $m$ = 2, 3, 4, 5, 6, 7-group systems are investigated. Each system has 12 decisions, so 72 decisions can be used for fusion. In this section, we will use FG-NET to demonstrate our fusion schemes.

Figure 4.4: The m-group age estimation system.

In the fusion stage, the decisions (i.e., prediction results) obtained from the second stage are treated as input features and are fused by the fusion scheme to train another SVR function. Age estimation is realized through a procedure similar to that of the second stage (steps 2 to 7 for learning an age estimator). In this stage, we propose two decision selection algorithms for the six fusion schemes to find a decision subset whose result is competitive with that of the optimal subset obtained by exhaustive search.

4.6.1 Diversity Analysis

The goal of analyzing the diversity between different decisions is to find out how to fuse them in a more efficient way and gain improvements after fusion. Since each estimator would make different errors on different faces, a strategic fusion of these estimators could reduce the total estimation error. Therefore, we need to fuse a set of estimators whose decisions are adequately different from those of others.
Table 4.2: Correlation (p) between Any Two Decisions $d_1$ to $d_{12}$ in the 3-group System ($s_3$), Mean Correlation = 0.9333, on FG-NET

p      d1     d2     d3     d4     d5     d6     d7     d8     d9     d10    d11    d12
d1     1.000  0.991  0.991  0.962  0.955  0.956  0.906  0.906  0.897  0.976  0.972  0.969
d2     0.991  1.000  0.980  0.961  0.961  0.959  0.901  0.911  0.899  0.982  0.979  0.976
d3     0.991  0.980  1.000  0.964  0.958  0.957  0.907  0.910  0.898  0.979  0.976  0.972
d4     0.962  0.961  0.964  1.000  0.938  0.939  0.910  0.899  0.883  0.958  0.953  0.950
d5     0.955  0.961  0.958  0.938  1.000  0.942  0.880  0.905  0.889  0.970  0.974  0.968
d6     0.956  0.959  0.957  0.939  0.942  1.000  0.887  0.891  0.893  0.958  0.960  0.960
d7     0.906  0.901  0.907  0.910  0.880  0.887  1.000  0.832  0.831  0.899  0.894  0.893
d8     0.906  0.911  0.910  0.899  0.905  0.891  0.832  1.000  0.831  0.911  0.914  0.908
d9     0.897  0.899  0.898  0.883  0.889  0.893  0.831  0.831  1.000  0.901  0.902  0.908
d10    0.976  0.982  0.979  0.958  0.970  0.958  0.899  0.911  0.901  1.000  0.989  0.984
d11    0.972  0.979  0.976  0.953  0.974  0.960  0.894  0.914  0.902  0.989  1.000  0.987
d12    0.969  0.976  0.972  0.950  0.968  0.960  0.893  0.908  0.908  0.984  0.987  1.000

To measure the diversity between pair-wise decisions of estimators, some measures can be used for quantitative assessment of diversity. Here we propose to use Pearson's linear correlation coefficient $p$ to measure the diversity between pair-wise decisions ($d_1, \ldots, d_{12}$). Diversity is measured as the correlation between two estimator outputs, with $0 \leq p \leq 1$. Maximum diversity is observed when $p = 0$, indicating that the two decisions are uncorrelated.

Table 4.2 shows the intra correlation $p_{intra}$ between any two decisions ($s_3 d_i$, $s_3 d_j$) for the 3-group system $s_3$. Table 4.3 shows the inter correlation $p_{inter}$ between any two systems' decisions ($s_m d_1$, $s_n d_1$) for decision $d_1$. It is clear that the intra-system correlation $p_{intra}$ is much higher than the inter-system correlation $p_{inter}$. This means that the intra-system diversity $Div_{intra}$ is much lower than the inter-system diversity $Div_{inter}$.
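The pair-wise correlation measure can be sketched in a few lines. This is a minimal illustration with hypothetical function names; each decision is assumed to be a list of predicted ages over the same set of faces. The usage at the end averages the off-diagonal entries of the inter-system correlations reported in Table 4.3, which is assumed here to be how the mean correlation in the table captions is obtained.

```python
import math
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson's linear correlation coefficient between two decision vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)  # undefined if a decision is constant


def correlation_matrix(decisions: Sequence[Sequence[float]]) -> List[List[float]]:
    """Correlation between every pair of decisions, laid out as in Tables 4.2 and 4.3."""
    return [[pearson(a, b) for b in decisions] for a in decisions]


def mean_off_diagonal(matrix: Sequence[Sequence[float]]) -> float:
    """Mean correlation over all distinct pairs (upper triangle, diagonal excluded)."""
    k = len(matrix)
    upper = [matrix[i][j] for i in range(k) for j in range(i + 1, k)]
    return sum(upper) / len(upper)


# Inter-system correlations for decision d1, copied from Table 4.3 (s3 ... s10).
TABLE_4_3 = [
    [1.000, 0.851, 0.829, 0.812, 0.783, 0.750, 0.704, 0.636],
    [0.851, 1.000, 0.839, 0.792, 0.786, 0.745, 0.701, 0.632],
    [0.829, 0.839, 1.000, 0.841, 0.841, 0.766, 0.726, 0.649],
    [0.812, 0.792, 0.841, 1.000, 0.846, 0.829, 0.772, 0.693],
    [0.783, 0.786, 0.841, 0.846, 1.000, 0.856, 0.821, 0.748],
    [0.750, 0.745, 0.766, 0.829, 0.856, 1.000, 0.899, 0.808],
    [0.704, 0.701, 0.726, 0.772, 0.821, 0.899, 1.000, 0.818],
    [0.636, 0.632, 0.649, 0.693, 0.748, 0.808, 0.818, 1.000],
]

# 0.7776 from the rounded entries, close to the reported mean of 0.7777.
print(round(mean_off_diagonal(TABLE_4_3), 4))
```

A lower mean off-diagonal value indicates higher diversity among the decisions being compared.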
Based on the diversity, it is expected that inter-system fusion will offer greater performance improvement than intra-system fusion, since the inter-system diversity is higher. From the experiments, it is verified that the performance of inter-system fusion (MAE in Table 4.8) is better than that of intra-system fusion (MAE in Table 4.10).

Table 4.3: Correlation (p) between Any Two m-group Systems ($s_3$ to $s_{10}$) for Decision $d_1$, Mean Correlation = 0.7777, on FG-NET

p        s3d1   s4d1   s5d1   s6d1   s7d1   s8d1   s9d1   s10d1
s3d1     1.000  0.851  0.829  0.812  0.783  0.750  0.704  0.636
s4d1     0.851  1.000  0.839  0.792  0.786  0.745  0.701  0.632
s5d1     0.829  0.839  1.000  0.841  0.841  0.766  0.726  0.649
s6d1     0.812  0.792  0.841  1.000  0.846  0.829  0.772  0.693
s7d1     0.783  0.786  0.841  0.846  1.000  0.856  0.821  0.748
s8d1     0.750  0.745  0.766  0.829  0.856  1.000  0.899  0.808
s9d1     0.704  0.701  0.726  0.772  0.821  0.899  1.000  0.818
s10d1    0.636  0.632  0.649  0.693  0.748  0.808  0.818  1.000

4.6.2 Intra-system Fusion (AF)

For each $m$-group age estimation system, the 12 AEs will deliver 12 different decisions (i.e., estimation results) $d_1, \ldots, d_{12}$, and there are $2^{12}$ possible ways of selection for fusion. Therefore, a systematic algorithm is needed to find an effective subset from the 12 decisions $d_1, \ldots, d_{12}$. Here we propose to apply the sequential forward selection (SFS) algorithm to achieve this goal. The intra-system fusion scheme is illustrated in Fig. 4.5.

Figure 4.5: The intra-system fusion (AF) scheme.

First, given a decision set $D = \{ d_j \mid j = 1, \ldots, 12 \}$, we want to find a subset $D_N = \{ d_{i1}, d_{i2}, \ldots, d_{iN} \}$, with $N \leq 12$, to optimize an objective function $J(D_N)$, which can be defined in the following form:

$$J(D_N) = MAE(f(D_N), A_{GT}), \tag{4.15}$$

where MAE, which is defined in (4.16) in Section 4.7.3, represents the mean absolute error between the estimated age $f(D_N)$ and the ground truth age $A_{GT}$.
The objective function evaluates decision subsets by their estimation accuracy, using cross validation to avoid overfitting.

Sequential Forward Selection (SFS) is one of the simplest greedy search algorithms to achieve the above goal. Starting from a decision set $D_k$ (being empty at the start), we sequentially add the one decision $d^*$ that, when combined with the decisions already selected in $D_k$, results in the lowest objective function $J(D_k + d^*)$ between the ground truth age $A_{GT}$ and the estimated age $f(D_k + d^*)$. The algorithm is stated below for clarity.

Algorithm - Sequential Forward Selection (SFS)

1. Start with the empty decision set $D_0 = \{\emptyset\}$.

2. Select the next best decision:

$$d^* = \arg\min_{d \in D - D_k} J(D_k + d)$$

3. Update $D_{k+1} = D_k + d^*$; $k = k + 1$.

4. Go to 2.

An illustration of intra-system fusion is also presented below. For example, we consider the 3-group age estimation system and try to find a subset from the 12 decisions ($s_3 d_j$, $j = 1, \ldots, 12$) by using the SFS algorithm to obtain the best performance (smallest MAE). First, each decision $\{d_j\}$ for $j = 1, \ldots, 12$ is selected for MAE performance evaluation. If $d_7$ provides the smallest MAE (denoted as $MAE_1$), then the decision subset $D_1$ is updated to $\{d_7\}$. Next, every decision set $(d_7, d_j)$ for $j \neq 7$ is selected for MAE performance evaluation. If $(d_7, d_3)$ provides the smallest MAE (denoted as $MAE_2$), which is also smaller than $MAE_1$, then the decision subset is updated to $D_2 = \{d_7, d_3\}$. After that, every decision set $(d_7, d_3, d_j)$ for $j \neq 3, 7$ is selected for performance evaluation. If $(d_7, d_3, d_8)$ provides the smallest MAE (denoted as $MAE_3$), which is not smaller than $MAE_2$, then the decision subset will not be updated, and the final decision subset is $D_2 = \{d_7, d_3\}$.
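The SFS procedure, with the stopping rule of the worked example (stop as soon as the best extension no longer lowers the MAE), can be sketched as below. This is an illustrative sketch rather than the implementation used in this work: the objective is passed in as a function, and the toy objective in the usage example (MAE of the plain average of the selected decisions) is a stand-in assumption for the cross-validated MAE of the SVR-fused estimate in (4.15). The names and toy data are hypothetical.

```python
from typing import Callable, Hashable, List, Sequence, Tuple


def sequential_forward_selection(
    candidates: Sequence[Hashable],
    objective: Callable[[List[Hashable]], float],
) -> Tuple[List[Hashable], float]:
    """Greedy SFS: repeatedly add the candidate giving the lowest objective value.

    Stops when the best extension does not lower the objective any further.
    """
    selected: List[Hashable] = []
    best = float("inf")
    remaining = list(candidates)
    while remaining:
        # Evaluate J(D_k + d) for every decision d not yet selected.
        scored = [(objective(selected + [d]), d) for d in remaining]
        score, choice = min(scored, key=lambda t: t[0])
        if score >= best:  # MAE_{k+1} is not smaller than MAE_k: stop
            break
        selected.append(choice)
        remaining.remove(choice)
        best = score
    return selected, best


# Toy data (hypothetical): three decisions over three faces.
truth = [10.0, 20.0, 30.0]
decisions = {
    "d1": [12.0, 22.0, 32.0],
    "d2": [8.0, 18.0, 28.0],
    "d3": [15.0, 25.0, 35.0],
}


def toy_mae(subset: List[str]) -> float:
    """MAE of the plain average of the selected decisions (stand-in for SVR fusion)."""
    fused = [sum(decisions[d][k] for d in subset) / len(subset) for k in range(len(truth))]
    return sum(abs(f - t) for f, t in zip(fused, truth)) / len(truth)


subset, mae = sequential_forward_selection(list(decisions), toy_mae)
print(subset, mae)  # ['d1', 'd2'] 0.0
```

In the toy run, d1 is picked first, adding d2 cancels its bias, and adding d3 would raise the MAE again, so the search stops with two decisions.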
4.6.3 Inter-system Fusion (EF)

To investigate the effectiveness of inter-system fusion, we first focus on one specific decision (e.g., $d_1$) out of the 12 decisions, and then select $d_1$ from all 8 systems ($s_3, s_4, \ldots, s_{10}$) for fusion, where $s_m$ represents the m-group age estimation system. Therefore, in FG-NET we have 8 systems (i.e., $s_3, s_4, s_5, s_6, s_7, s_8, s_9, s_{10}$) for fusion, and there are still $2^8$ ways of selection. In MORPH-II, we have 6 systems for fusion because 6 systems were used. To efficiently select good candidates, we could adopt the SFS algorithm to find a good subset of the 8 systems $s_m$ ($m = 3, 4, \ldots, 10$). However, different systems have different age range definitions for their age groups, and it is expected that the same decision from different systems (e.g., $s_m d_1$ for $m = 3, 4, \ldots, 10$) would exhibit higher diversity than different decisions from the same system (e.g., $s_3 d_j$ for $j = 1, 2, \ldots, 12$). Instead of SFS, we therefore propose to utilize the Sequential Backward Selection (SBS) algorithm to quickly find a subset of the 8 systems $s_m$ ($m = 3, 4, \ldots, 10$) that achieves the minimum MAE. The inter-system fusion scheme is illustrated in Fig. 4.6.

Figure 4.6: The inter-system fusion (EF) scheme.

SBS works in the opposite direction of SFS. SBS starts with the full system set $S = \{s_3, s_4, s_5, s_6, s_7, s_8, s_9, s_{10}\}$ and sequentially removes the system $s^{*}$ whose removal gives the lowest value of the objective function $J(S_k - s^{*})$. The SBS algorithm is stated below.

Algorithm - Sequential Backward Selection (SBS)

1. Start with the full system set $S_0 = S$.
2. Remove the worst system: $s^{*} = \arg\min_{s \in S_k} J(S_k - s)$.
3. Update $S_{k+1} = S_k - s^{*}$; $k = k + 1$.
4. Go to 2.

For a given decision $d_1$, SBS first evaluates the MAE (denoted as $\mathrm{MAE}_0$) for fusion of the full system set $S = \{s_3, s_4, s_5, s_6, s_7, s_8, s_9, s_{10}\}$. Then it removes one system and evaluates the MAE for fusion of 7 systems.
Let the fusion of 7 systems (with $s_{10}$ removed) have the smallest MAE (denoted as $\mathrm{MAE}_1$); then the system subset is updated to $S_1 = \{s_3, s_4, s_5, s_6, s_7, s_8, s_9\}$ if $\mathrm{MAE}_1$ is smaller than $\mathrm{MAE}_0$. The update stops if $\mathrm{MAE}_{i+1}$ is not smaller than $\mathrm{MAE}_i$.

4.6.4 Intra-inter Fusion (AEF)

Through intra-system fusion, each system $s_m$ has its best decision subset $DS_m$ selected, where $m = 3, \ldots, 10$. In addition to considering intra-system information, we would like to include inter-system information. The basic idea of intra-inter fusion is described below.

Given the decision subsets $\{DS_m, m = 3, \ldots, 10\}$ obtained from intra-system fusion, we perform fusion on the decision subsets from all systems by using the SBS algorithm. First, the full set $\{DS_3, \ldots, DS_{10}\}$ is evaluated to obtain $\mathrm{MAE}_0$. Then one $DS_m$ is removed and $\mathrm{MAE}_1$ is evaluated for fusion of the remaining 7 subsets $DS_n$ ($n \ne m$). If $\mathrm{MAE}_1$ is smaller than $\mathrm{MAE}_0$, the DS subset is updated; otherwise, it is not updated. The same procedure continues until $\mathrm{MAE}_{i+1}$ is not smaller than $\mathrm{MAE}_i$. The intra-inter fusion scheme is illustrated in Fig. 4.7.

Figure 4.7: The intra-inter fusion (AEF) scheme.

4.6.5 Inter-intra Fusion (EAF)

After inter-system fusion, each decision $d_j$ has its best system subset $SS_j$ selected, where $j = 1, \ldots, 12$. Besides the inter-system information, we also would like to add information from within the systems. The basic principle of inter-intra fusion is described in the following.

Given the system subsets $\{SS_j, j = 1, \ldots, 12\}$ obtained from inter-system fusion, the SFS algorithm is utilized for selecting $SS_j$ to cover the intra-system information. First, each $SS_j$ is selected for $\mathrm{MAE}_1$ evaluation. The SS subset is updated to $\{SS_a\}$ if $SS_a$ has the minimum $\mathrm{MAE}_1$. Then each $\{SS_a, SS_j\}$ (for $j = 1, \ldots, 12$ and $j \ne a$) is evaluated for $\mathrm{MAE}_2$.
If $\{SS_a, SS_b\}$ has the minimum $\mathrm{MAE}_2$ and $\mathrm{MAE}_2$ is smaller than $\mathrm{MAE}_1$, the SS subset is updated to $\{SS_a, SS_b\}$. Otherwise, the SS subset is not updated. The same procedure continues until $\mathrm{MAE}_{i+1}$ is not smaller than $\mathrm{MAE}_i$. The inter-intra fusion scheme is illustrated in Fig. 4.8.

Figure 4.8: The inter-intra fusion (EAF) scheme.

4.6.6 Maximum Diversity Fusion (MDF)

The four fusion schemes in Sections 4.6.2 to 4.6.5 are structured along one direction (intra-system or inter-system) or along one direction followed by the other. Different from these four schemes, we propose two further fusion schemes that consider both directions at the same time. The first two-directional fusion scheme is called maximum diversity fusion (MDF).

In MDF, given the full set $M = \{s_n d_i \mid n = 3, \ldots, 10 \text{ and } i = 1, \ldots, 12\}$ containing 96 decisions, we want to find a subset $M_k = \{m_1, \ldots, m_k\}$, where each $m_k$ represents some $s_n d_i$, that optimizes the performance based on the diversity. Initially, the decision with the minimum MAE is selected as the first decision set $M_1$. Among the non-selected decisions, a decision $m^{*}$ is selected and added to the current decision set $M_k$ if it shows the maximum diversity (i.e., minimum correlation) with $M_k$. $\mathrm{DIV}(M_k, m)$ denotes the diversity between the decision set $M_k$ and the decision $m$. The MDF algorithm is described below.

Algorithm - Maximum Diversity Fusion (MDF)

1. Start with the best decision set $M_1 = \{m_1\}$.
2. Find the decision having the maximum diversity with $M_k$: $m^{*} = \arg\max_{m \in M - M_k} \mathrm{DIV}(M_k, m)$.
3. Update $M_{k+1} = M_k + m^{*}$; $k = k + 1$.
4. If $J(M_{k+1}) \le J(M_k)$, then go to 2.
5. If $J(M_{k+1}) > J(M_k)$, then stop and report the final decision set as $M_k$.

4.6.7 Composite Fusion (CF)

The other two-directional fusion scheme is composite fusion (CF). To carry out the decision selection in CF, we need to consider the 96 decisions together and use the SFS or SBS algorithm to find the best subset of the 96 decisions.
It is expected that most of the decisions have low diversity with the others, and it may be less efficient to do the selection directly with the SBS algorithm because it starts from the full set. So the SFS algorithm is chosen to perform decision selection in CF. For a given full set $C = \{s_n d_i \mid n = 3, \ldots, 10 \text{ and } i = 1, \ldots, 12\}$ having 96 decisions, the goal is to find a subset $C_k = \{c_1, \ldots, c_k\}$, where each $c_k$ represents some $s_n d_i$, that optimizes the objective function $J(C_k)$, which has the same definition as (4.15). The CF algorithm is described below.

Algorithm - Composite Fusion (CF)

1. Start with the empty decision set $C_0 = \{\}$.
2. Find the next best decision: $c^{*} = \arg\min_{c \in C - C_k} J(C_k + c)$.
3. Update $C_{k+1} = C_k + c^{*}$; $k = k + 1$.
4. If $J(C_{k+1}) \le J(C_k)$, then go to 2.
5. If $J(C_{k+1}) > J(C_k)$, then stop; $C_k$ is the final decision set.

The estimation results of these six fusion schemes are reported in the experimental part in Section 4.7.

4.7 Experiments

4.7.1 Database

Table 4.4: Age Range Distribution on the FG-NET and MORPH-II Databases

| Age | FG-NET: No. of images | FG-NET: Percentage | MORPH-II: No. of images | MORPH-II: Percentage |
|---|---|---|---|---|
| 0-9 | 371 | 37.03 % | 0 | 0.00 % |
| 10-19 | 339 | 33.83 % | 7,469 | 13.55 % |
| 20-29 | 144 | 14.37 % | 16,325 | 29.61 % |
| 30-39 | 79 | 7.88 % | 15,357 | 27.85 % |
| 40-49 | 46 | 4.59 % | 12,050 | 21.85 % |
| 50-59 | 15 | 1.50 % | 3,599 | 6.53 % |
| 60-69 | 8 | 0.80 % | 318 | 0.58 % |
| 70-77 | 0 | 0.00 % | 16 | 0.03 % |
| Total | 1,002 | 100.00 % | 55,134 | 100.00 % |

The two databases used to evaluate the performance of our proposed framework in this work are the FG-NET aging database [1] and the MORPH database [70] (MORPH-II is used for our study). The FG-NET aging database is the most frequently used database for age estimation related works because it is publicly available and free. FG-NET has 1,002 color or gray facial images of 82 Europeans with a wide age range from 0 to 69 years old. Each individual has 6-18 images labeled with the ground truth ages. The MORPH-II database contains 55,134 images from 13,618 individuals with ages ranging from 16 to 77.
MORPH-II is a multi-racial database, including African, European, Asian, Hispanic and others. Each individual has about 4 images labeled with the ground truth ages. Some facial images from the FG-NET and MORPH-II databases are shown in Fig. 4.9 and Fig. 4.10, respectively. The age range distribution of the face images is listed in Table 4.4. The face images are preprocessed and resized to 180 × 150. Only gray level images are used to extract the BIF, HOG and LBPu global and local features.

4.7.2 Results of Age Grouping

In the age grouping stage, following the previous work [52], a 5-fold cross validation scheme is used to evaluate the classification accuracy of our algorithm on the FG-NET and MORPH-II databases. To increase diversity between decisions for the fusion stage, multiple age grouping systems are investigated. Table 4.5 and Table 4.6 list the definitions of age groups for each system on FG-NET and MORPH-II, respectively. Here we define groups based on the group size in order to avoid having too few faces in any one group. Note that the number of groups in MORPH-II is smaller than that in FG-NET because MORPH-II does not have faces with ages from 0 to 15 years old. Table 4.5 and Table 4.6 also show the classification accuracy for these age grouping systems. We can observe that the overall classification accuracy decreases as the number of groups increases.

To demonstrate the importance of feature selection, we also investigate our algorithm with and without using ANOVA for feature selection in the age grouping systems. We used FG-NET for this purpose, and the results are shown in Table 4.7. From the performance comparison, we can see that the classification accuracy is improved and the feature dimension is significantly reduced by using ANOVA for feature selection.

Figure 4.9: Some facial images from the FG-NET database.

Figure 4.10: Some facial images from the MORPH-II database.

Table 4.5: Age Range Definition for Age Groups of m-group Age Grouping Systems on FG-NET Database and Age Grouping Results (m = No. of groups)

| m | AG1 | AG2 | AG3 | AG4 | AG5 | AG6 | AG7 | AG8 | AG9 | AG10 | Classification accuracy |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 0-3 | 4-19 | 20-69 | - | - | - | - | - | - | - | 93.7 % |
| 4 | 0-5 | 6-12 | 13-21 | 22-69 | - | - | - | - | - | - | 91.4 % |
| 5 | 0-4 | 5-10 | 11-16 | 17-25 | 26-69 | - | - | - | - | - | 88.5 % |
| 6 | 0-4 | 5-9 | 10-14 | 15-19 | 20-29 | 30-69 | - | - | - | - | 86.7 % |
| 7 | 0-4 | 5-9 | 10-14 | 15-19 | 20-25 | 26-35 | 36-69 | - | - | - | 83.6 % |
| 8 | 0-4 | 5-9 | 10-14 | 15-19 | 20-24 | 25-29 | 30-39 | 40-69 | - | - | 81.3 % |
| 9 | 0-4 | 5-9 | 10-14 | 15-19 | 20-24 | 25-29 | 30-34 | 35-40 | 41-69 | - | 77.3 % |
| 10 | 0-4 | 5-9 | 10-14 | 15-19 | 20-24 | 25-29 | 30-34 | 35-39 | 40-49 | 50-69 | 74.4 % |

Table 4.6: Age Range Definition for Age Groups of m-group Age Grouping Systems on MORPH-II Database and Age Grouping Results (m = No. of groups)

| m | AG1 | AG2 | AG3 | AG4 | AG5 | AG6 | AG7 | Classification accuracy |
|---|---|---|---|---|---|---|---|---|
| 2 | 16-39 | 40-77 | - | - | - | - | - | 95.4 % |
| 3 | 16-29 | 30-49 | 50-77 | - | - | - | - | 90.9 % |
| 4 | 16-19 | 20-29 | 30-49 | 50-77 | - | - | - | 87.2 % |
| 5 | 16-19 | 20-29 | 30-39 | 40-49 | 50-77 | - | - | 78.5 % |
| 6 | 16-19 | 20-29 | 30-35 | 36-41 | 42-49 | 50-77 | - | 75.0 % |
| 7 | 16-19 | 20-29 | 30-34 | 35-39 | 40-44 | 45-49 | 50-77 | 71.4 % |

Table 4.7: Performance Comparison between 'Without Feature Selection' and 'With Feature Selection' for Age Grouping on FG-NET Database, where FDR: Feature Dimension Reduction

| m | Without feature selection: classification accuracy | With feature selection using ANOVA: classification accuracy | With feature selection using ANOVA: FDR rate |
|---|---|---|---|
| 3 | 71.1% | 91.4% | 80% |
| 4 | 66.7% | 88.7% | 80% |
| 5 | 58.7% | 86.7% | 81% |
| 6 | 53.6% | 83.6% | 81% |
| 7 | 50.3% | 81.3% | 82% |
| 8 | 45.3% | 77.3% | 86% |
| 9 | 41.4% | 74.4% | 90% |
| 10 | 37.4% | 72.4% | 85% |

4.7.3 Results of Age Estimation within Age Groups

For the age estimation stage, leave-one-person-out (LOPO) cross validation (i.e., in each fold, the images of one person are used as the test set and those of the others are used as the training set) is used for tests on FG-NET. For MORPH-II, the same experimental setting is used as in the previous studies [36, 37].
The whole MORPH-II database $W$ is divided into 3 subsets $S1$, $S2$, $S3$. $S1$ (or $S2$) is used for training and the remaining $W \setminus S1$ (or $W \setminus S2$) is used for testing. Then the two testing results are averaged.

The performance of age estimation is measured by the mean absolute error (MAE) and the cumulative score (CS) [21, 27]. The MAE is defined as the average of the absolute errors between the estimated ages and the ground truth ages:

$$\mathrm{MAE} = \sum_{i=1}^{N} \left| a'_i - a_i \right| / N, \tag{4.16}$$

where $a_i$ is the ground truth age for the test image $i$, $a'_i$ is the estimated age, and $N$ is the total number of test images. The cumulative score (CS) is defined as

$$\mathrm{CS}(L) = (n_{e \le L} / N) \times 100\%, \tag{4.17}$$

where $n_{e \le L}$ is the number of test images on which the age estimation makes an absolute error $e$ no larger than $L$ years.

Each feature listed in Table 4.1 is used to test our two-stage m-group age estimation system (as shown in Fig. 4.4). The MAE results on FG-NET and MORPH-II are shown in Table 4.8 and Table 4.9, respectively. Since estimating exact ages could be easier under narrower age ranges (e.g., in a system with more groups), lower MAEs would be expected for systems containing more groups. However, the classification accuracy of the age grouping systems in the first stage decreases as the number of groups increases. So there is a trade-off between age grouping accuracy and the number of age groups. On FG-NET, for example, neither the 3-group system (the highest age grouping accuracy) nor the 10-group system (the largest number of groups and the lowest age grouping accuracy) has the lowest MAEs. Instead, the 7-group system has the lowest MAE for 11 of the 12 features.
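The two measures in (4.16) and (4.17) can be computed as in the following short sketch. The function names and the toy ages are hypothetical and serve only to make the definitions concrete.

```python
def mean_absolute_error(estimated, truth):
    """MAE as in (4.16): average absolute difference in years."""
    return sum(abs(e - t) for e, t in zip(estimated, truth)) / len(truth)


def cumulative_score(estimated, truth, level):
    """CS(L) as in (4.17): percentage of images with absolute error <= L years."""
    within = sum(1 for e, t in zip(estimated, truth) if abs(e - t) <= level)
    return 100.0 * within / len(truth)


# Toy example (made up): absolute errors are 2, 0, 5, and 12 years.
truth = [5, 18, 30, 62]
estimated = [7, 18, 25, 50]
print(mean_absolute_error(estimated, truth))   # 4.75
print(cumulative_score(estimated, truth, 5))   # 75.0
print(cumulative_score(estimated, truth, 10))  # 75.0
```

The example shows why both measures are reported: a single large error (12 years) raises the MAE noticeably, while CS at a given error level only records that one of the four images falls outside the tolerance.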
Table 4.8: Age Estimation Results (in MAE, years) of m-group Age Estimation Systems on FG-NET Database

| Features | m = 1 | m = 3 | m = 4 | m = 5 | m = 6 | m = 7 | m = 8 | m = 9 | m = 10 |
|---|---|---|---|---|---|---|---|---|---|
| AAM app | 6.47 | 4.53 | 4.04 | 3.73 | 3.57 | 3.56 | 3.79 | 4.20 | 4.98 |
| AAM shape | 6.92 | 4.71 | 4.10 | 3.76 | 3.60 | 3.52 | 3.79 | 4.20 | 4.98 |
| AAM tex | 7.44 | 4.79 | 4.15 | 3.83 | 3.68 | 3.65 | 3.84 | 4.29 | 5.07 |
| BIF EyePair | 7.48 | 4.61 | 4.09 | 3.88 | 3.77 | 3.74 | 3.97 | 4.39 | 5.14 |
| BIF Nose | 9.25 | 5.45 | 4.39 | 4.13 | 3.91 | 3.80 | 4.09 | 4.49 | 5.29 |
| BIF Mouth | 8.29 | 4.99 | 4.20 | 3.87 | 3.83 | 3.82 | 4.05 | 4.44 | 5.23 |
| HOG EyePair | 8.18 | 5.30 | 4.30 | 4.14 | 3.78 | 3.75 | 3.97 | 4.39 | 5.11 |
| HOG Nose | 10.19 | 5.84 | 4.73 | 4.31 | 4.10 | 3.90 | 4.13 | 4.50 | 5.25 |
| HOG Mouth | 9.76 | 5.64 | 4.50 | 4.04 | 3.68 | 3.77 | 4.03 | 4.42 | 5.19 |
| LBPu EyePair | 8.96 | 5.31 | 4.32 | 3.96 | 3.75 | 3.66 | 3.91 | 4.34 | 5.11 |
| LBPu Nose | 9.48 | 5.35 | 4.31 | 3.93 | 3.78 | 3.70 | 3.96 | 4.37 | 5.15 |
| LBPu Mouth | 9.15 | 5.23 | 4.20 | 3.90 | 3.75 | 3.73 | 3.98 | 4.40 | 5.19 |

Table 4.9: Age Estimation Results (in MAE, years) of m-group Age Estimation Systems on MORPH-II Database

| Features | m = 2 | m = 3 | m = 4 | m = 5 | m = 6 | m = 7 |
|---|---|---|---|---|---|---|
| BIF | 4.10 | 4.41 | 4.73 | 4.29 | 4.06 | 4.23 |
| HOG | 4.19 | 5.20 | 4.76 | 4.34 | 4.06 | 4.23 |
| LBPu | 4.07 | 4.68 | 4.76 | 4.31 | 4.06 | 4.18 |
| BIF EyePair | 4.75 | 4.90 | 4.70 | 4.32 | 4.06 | 4.24 |
| BIF Nose | 5.11 | 5.00 | 4.76 | 4.33 | 4.05 | 4.26 |
| BIF Mouth | 4.97 | 4.79 | 4.76 | 4.31 | 4.08 | 4.25 |
| HOG EyePair | 5.28 | 5.25 | 4.74 | 4.33 | 4.05 | 4.25 |
| HOG Nose | 5.61 | 5.45 | 4.79 | 4.36 | 4.07 | 4.25 |
| HOG Mouth | 5.61 | 5.40 | 4.77 | 4.34 | 4.08 | 4.26 |
| LBPu EyePair | 4.91 | 4.99 | 4.77 | 4.32 | 4.05 | 4.24 |
| LBPu Nose | 5.17 | 5.02 | 4.78 | 4.37 | 4.05 | 4.24 |
| LBPu Mouth | 4.62 | 4.77 | 4.74 | 4.31 | 4.06 | 4.22 |

4.7.4 Results of Fusion of Decisions

The experimental setting in the fusion stage is the same as in the previous stage and the same cross validation strategy is utilized.

Intra Fusion (AF) - We first show the MAE results of using the SFS algorithm to determine a decision subset in the 3-group system for FG-NET. In Table 4.10, the SFS algorithm keeps updating the subset until MAE starts to increase.
After four updates, the fusion subset is finalized as $\{d_4, d_1, d_7, d_9\}$, which achieves the lowest MAE. Table 4.11 and Table 4.12 show the decision subsets and MAE results for each system on FG-NET and MORPH-II, respectively. Most systems need to fuse only a few decisions to achieve the best performance.

Table 4.10: Intra Fusion - Finding Decision Subset by SFS for the 3-group System on FG-NET

| Decision Subset | k-th update | MAE |
|---|---|---|
| $D_1 = \{d_4\}$ | 1 | 4.847 |
| $D_2 = \{d_4, d_1\}$ | 2 | 4.554 |
| $D_3 = \{d_4, d_1, d_7\}$ | 3 | 4.518 |
| $D_4 = \{d_4, d_1, d_7, d_9\}$ | 4 | 4.511 |
| $D_5 = \{d_4, d_1, d_7, d_9, d_6\}$ | 5 | 4.520 |

Table 4.11: Age Estimation Results by Intra Fusion on FG-NET

| m | Decision Subset | No. of updates k | MAE |
|---|---|---|---|
| 3 | $DS_3 = \{d_4, d_1, d_7, d_9\}$ | 4 | 4.51 |
| 4 | $DS_4 = \{d_1, d_4, d_6, d_7, d_9, d_2\}$ | 6 | 3.97 |
| 5 | $DS_5 = \{d_2, d_1, d_9, d_3\}$ | 4 | 3.72 |
| 6 | $DS_6 = \{d_2, d_9, d_7, d_1\}$ | 4 | 3.54 |
| 7 | $DS_7 = \{d_2, d_1, d_9, d_7\}$ | 4 | 3.57 |
| 8 | $DS_8 = \{d_1, d_2, d_3\}$ | 3 | 3.82 |
| 9 | $DS_9 = \{d_1, d_1, d_7, d_3\}$ | 4 | 4.23 |
| 10 | $DS_{10} = \{d_1, d_2, d_{10}, d_{11}\}$ | 4 | 4.99 |

Table 4.12: Age Estimation Results by Intra Fusion on MORPH-II

| m | Decision Subset | No. of updates k | MAE |
|---|---|---|---|
| 2 | $DS_2 = \{d_4, d_2, d_3, d_7, d_5, d_1\}$ | 6 | 3.58 |
| 3 | $DS_3 = \{d_1\}$ | 1 | 4.56 |
| 4 | $DS_4 = \{d_2, d_{10}, d_4, d_1, d_5\}$ | 5 | 4.35 |
| 5 | $DS_5 = \{d_3\}$ | 1 | 4.77 |
| 6 | $DS_6 = \{d_4, d_{12}, d_5\}$ | 3 | 4.39 |
| 7 | $DS_7 = \{d_{11}\}$ | 1 | 4.52 |

Inter Fusion (EF) - We then show the MAE results of using the SBS algorithm to find the system subset for the decision $d_1$. From Table 4.13, the SBS algorithm finds the system subset after one update and achieves the lowest MAE. Table 4.14 and Table 4.15 show the system subset and MAE results for each decision on FG-NET and MORPH-II, respectively. Most decisions need only 1 or 2 updates to find the system subset and achieve the best results. This also means that the SBS algorithm can find the system subset faster than SFS.

Table 4.13: Inter Fusion - Finding System Subset for $d_1$ by SBS on FG-NET

| System Subset | k-th update | MAE |
|---|---|---|
| $S_0 = \{s_3, s_4, s_5, s_6, s_7, s_8, s_9, s_{10}\}$ | 0 | 2.844 |
| $S_1 = \{s_3, s_4, s_5, s_6, s_7, s_8, s_9\}$ | 1 | 2.833 |
| $S_2 = \{s_3, s_4, s_5, s_7, s_8, s_9\}$ | 2 | 2.841 |

Table 4.14: Age Estimation Results by Inter Fusion on FG-NET

| Decision | System Subset | No. of updates k | MAE |
|---|---|---|---|
| $d_1$ | $SS_1 = \{s_3, s_4, s_5, s_6, s_7, s_8, s_9\}$ | 1 | 2.83 |
| $d_2$ | $SS_2 = \{s_4, s_5, s_6, s_7, s_8, s_9, s_{10}\}$ | 1 | 2.84 |
| $d_3$ | $SS_3 = \{s_3, s_4, s_5, s_6, s_7, s_8, s_9\}$ | 1 | 2.89 |
| $d_4$ | $SS_4 = \{s_3, s_4, s_5, s_6, s_7, s_9\}$ | 2 | 2.98 |
| $d_5$ | $SS_5 = \{s_3, s_4, s_5, s_7, s_8, s_9\}$ | 2 | 3.08 |
| $d_6$ | $SS_6 = \{s_3, s_4, s_5, s_6, s_7, s_8, s_9, s_{10}\}$ | 0 | 2.98 |
| $d_7$ | $SS_7 = \{s_4, s_5, s_6, s_7, s_8\}$ | 3 | 3.02 |
| $d_8$ | $SS_8 = \{s_3, s_4, s_5, s_7, s_8, s_9\}$ | 2 | 3.24 |
| $d_9$ | $SS_9 = \{s_3, s_4, s_5, s_6, s_7, s_8\}$ | 2 | 3.08 |
| $d_{10}$ | $SS_{10} = \{s_4, s_5, s_6, s_7, s_8, s_9\}$ | 2 | 2.95 |
| $d_{11}$ | $SS_{11} = \{s_3, s_4, s_5, s_7, s_8, s_9\}$ | 2 | 2.96 |
| $d_{12}$ | $SS_{12} = \{s_3, s_4, s_5, s_7, s_8, s_9, s_{10}\}$ | 1 | 2.96 |

Table 4.15: Age Estimation Results by Inter Fusion on MORPH-II

| Decision | System Subset | No. of updates k | MAE |
|---|---|---|---|
| $d_1$ | $SS_1 = \{s_2, s_3, s_4, s_5, s_6, s_7\}$ | 0 | 3.21 |
| $d_2$ | $SS_2 = \{s_2, s_3, s_4, s_6, s_7\}$ | 1 | 3.18 |
| $d_3$ | $SS_3 = \{s_2, s_3, s_4, s_6, s_7\}$ | 1 | 3.22 |
| $d_4$ | $SS_4 = \{s_2, s_3, s_4, s_5, s_7\}$ | 1 | 3.44 |
| $d_5$ | $SS_5 = \{s_2, s_4, s_6, s_7\}$ | 2 | 3.40 |
| $d_6$ | $SS_6 = \{s_2, s_3, s_4, s_7\}$ | 2 | 3.36 |
| $d_7$ | $SS_7 = \{s_2, s_3, s_4, s_6, s_7\}$ | 1 | 3.37 |
| $d_8$ | $SS_8 = \{s_2, s_3, s_4, s_6, s_7\}$ | 1 | 3.46 |
| $d_9$ | $SS_9 = \{s_2, s_3, s_4, s_6, s_7\}$ | 1 | 3.41 |
| $d_{10}$ | $SS_{10} = \{s_2, s_3, s_4, s_5, s_6, s_7\}$ | 0 | 3.30 |
| $d_{11}$ | $SS_{11} = \{s_2, s_3, s_4, s_7\}$ | 2 | 3.37 |
| $d_{12}$ | $SS_{12} = \{s_2, s_3, s_4, s_5, s_7\}$ | 1 | 3.31 |
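The SBS search used for inter fusion (Section 4.6.3) can be sketched in Python as follows. This is an illustrative sketch rather than the implementation behind the tables above: the fusion rule is assumed to be the per-image mean of the selected systems' estimates, the names `mae`, `fuse`, and `sbs` are hypothetical, and cross validation is omitted.

```python
from statistics import fmean


def mae(estimates, truth):
    """Mean absolute error between estimated and ground truth ages."""
    return fmean([abs(e - t) for e, t in zip(estimates, truth)])


def fuse(decisions, names):
    """Assumed fusion rule: average the selected systems' estimates per image."""
    return [fmean(values) for values in zip(*(decisions[n] for n in names))]


def sbs(decisions, truth):
    """Greedy backward elimination starting from the full system set."""
    selected = list(decisions)
    best = mae(fuse(decisions, selected), truth)  # MAE_0 of the full set
    while len(selected) > 1:
        # Evaluate J(S_k - s) for every system s still in the set.
        score, worst = min(
            (mae(fuse(decisions, [n for n in selected if n != s]), truth), s)
            for s in selected
        )
        if score >= best:  # removing any system no longer lowers the MAE
            break
        selected.remove(worst)
        best = score
    return selected, best


# Toy data (made up): decision d1 as delivered by three systems for four faces.
truth = [10, 20, 30, 40]
d1_by_system = {
    "s3": [12, 22, 32, 42],  # overestimates by 2 years
    "s4": [8, 18, 28, 38],   # underestimates by 2 years
    "s5": [16, 26, 36, 46],  # overestimates by 6 years
}
subset, score = sbs(d1_by_system, truth)
print(subset, score)  # ['s3', 's4'] 0.0
```

Here the full set has an MAE of 2.0 years, removing `s5` brings it to 0.0, and no further removal helps, so the search stops after one update. When most systems belong in the final subset, as in Tables 4.14 and 4.15, a backward search of this kind needs only a few updates.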
Intra-Inter Fusion (AEF) - We apply the SBS algorithm to find the DS subset, where each system has its specific decision subset (as shown in Table 4.11 and Table 4.12). The resulting DS subsets and MAE results are shown in Table 4.16.

Table 4.16: Age Estimation Results by Intra-inter Fusion (AEF) on FG-NET and MORPH-II

| Database | DS Subset | MAE |
|---|---|---|
| FG-NET | $\{DS_3, DS_4, DS_5, DS_7, DS_8, DS_9, DS_{10}\}$ | 2.75 |
| MORPH-II | $\{DS_2, DS_4, DS_6\}$ | 2.96 |

Inter-Intra Fusion (EAF) - We use the SFS algorithm to find the SS subset, where each decision has its specific system subset (as shown in Table 4.14 and Table 4.15). The resulting SS subsets and MAE results are shown in Table 4.17.

Table 4.17: Age Estimation Results by Inter-intra Fusion (EAF) on FG-NET and MORPH-II

| Database | SS Subset | MAE |
|---|---|---|
| FG-NET | $\{SS_1, SS_7, SS_9\}$ | 2.77 |
| MORPH-II | $\{SS_2, SS_{12}, SS_3\}$ | 3.08 |

Maximum Diversity Fusion (MDF) - Applying MDF to all the decisions for final age estimation, 21 and 23 decisions are needed to achieve the best MAE performance on FG-NET and MORPH-II, respectively. The minimum MAEs are 2.89 and 3.23 years on FG-NET and MORPH-II, respectively.

Composite Fusion (CF) - The experimental results of CF for FG-NET and MORPH-II are shown in Table 4.18. We list the final decision subset and its corresponding MAE. For FG-NET and MORPH-II respectively, the numbers of final selected decisions are 11 and 9, and the minimum MAEs are 2.73 and 2.91 years, which are the lowest MAEs among the six fusion schemes.

Table 4.18: Age Estimation Results by Composite Fusion (CF) on FG-NET and MORPH-II

| Database | Decision Subset | MAE |
|---|---|---|
| FG-NET | $\{s_7 d_2, s_4 d_1, s_5 d_1, s_8 d_1, s_6 d_7, s_3 d_9, s_4 d_7, s_6 d_9, s_5 d_7, s_6 d_3\}$ | 2.73 |
| MORPH-II | $\{s_2 d_4, s_4 d_4, s_2 d_1, s_4 d_{12}, s_3 d_{10}, s_3 d_{11}, s_4 d_1, s_2 d_3, s_3 d_3\}$ | 2.91 |
4.7.5 Complexity Comparison

The complexity analysis of fusion schemes can show how efficient the fusion could be. Here we measure the computational complexity based on the total number of required arithmetic operations for 1) MAE computation and 2) MAE value sorting in ascending order. Each fusion scheme is compared with its corresponding exhaustive search. We assume that the intra fusion and inter-intra fusion (inter fusion and intra-inter fusion) have the same complexity if the inter (intra) fusion result is known. Also, the maximum diversity fusion and composite fusion have the same complexity because their subsets are both selected from all possible decisions. Table 4.19, Table 4.20, and Table 4.21 list the computational complexities of each fusion scheme and the corresponding exhaustive search. From the comparison, it can be seen that the complexity could be greatly reduced if a proper fusion scheme is used.

Table 4.19: Total Number of Required Arithmetic Operations for Selecting N from 12 Decisions in Intra and Inter-intra Fusions

| Method | N = 3 | N = 4 | N = 5 | N = 6 |
|---|---|---|---|---|
| Intra fusion | 36 | 46 | 55 | 63 |
| Inter-intra fusion | 36 | 46 | 55 | 63 |
| Exhaustive search | 221 | 496 | 793 | 925 |

Table 4.20: Total Number of Required Arithmetic Operations for Selecting N from 8 Decisions in Inter and Intra-inter Fusions

| Method | N = 4 | N = 5 | N = 6 |
|---|---|---|---|
| Inter fusion | 30 | 24 | 17 |
| Intra-inter fusion | 30 | 24 | 17 |
| Exhaustive search | 71 | 57 | 29 |

Table 4.21: Total Number of Required Arithmetic Operations for Selecting N from 96 Decisions in Maximum Diversity Fusion and Composite Fusion

| Method | N = 9 | N = 10 | N = 21 |
|---|---|---|---|
| Maximum diversity fusion | 837 | 925 | 1827 |
| Composite fusion | 837 | 925 | 1827 |
| Exhaustive search | $1.3 \times 10^{12}$ | $1.1 \times 10^{13}$ | $7.8 \times 10^{20}$ |

Figure 4.11: Cumulative score (CS) curves of the error levels from 0 to 10 years of different age estimation algorithms on the FG-NET aging database.
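The operation counts in Tables 4.19 to 4.21 can be reproduced with a simple counting rule. The rule below is inferred from the table entries and is not stated explicitly in the text: each greedy round is assumed to cost one MAE computation per candidate subset plus one sorting operation, and the exhaustive search is assumed to cost one MAE computation per size-N subset plus a single sort. The function names are hypothetical.

```python
from math import comb


def forward_ops(total, n):
    """Forward selection of n items out of `total`: n rounds, one sort per round."""
    evaluations = sum(total - k for k in range(n))
    return evaluations + n


def backward_ops(total, n):
    """Backward elimination from `total` items down to n: total - n rounds."""
    rounds = total - n
    evaluations = sum(total - k for k in range(rounds))
    return evaluations + rounds


def exhaustive_ops(total, n):
    """Evaluate every size-n subset once, then sort once."""
    return comb(total, n) + 1


print([forward_ops(12, n) for n in (3, 4, 5, 6)])     # [36, 46, 55, 63]
print([exhaustive_ops(12, n) for n in (3, 4, 5, 6)])  # [221, 496, 793, 925]
print([backward_ops(8, n) for n in (4, 5, 6)])        # [30, 24, 17]
print([exhaustive_ops(8, n) for n in (4, 5, 6)])      # [71, 57, 29]
print([forward_ops(96, n) for n in (9, 10, 21)])      # [837, 925, 1827]
```

Under this rule the greedy counts grow roughly linearly in the number of rounds, whereas the exhaustive count follows the binomial coefficient, which is why the gap widens to many orders of magnitude for the 96-decision case.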
4.7.6 Performance Comparison

To demonstrate the robustness and effectiveness of our proposed methods in age estimation, we compare their MAEs and cumulative scores (CS) with those of other methods. Table 4.22 and Table 4.23 show the MAEs of different age estimation methods on FG-NET and MORPH-II, respectively. Fig. 4.11 shows the cumulative scores of different methods on FG-NET and Fig. 4.12 shows the cumulative scores of our methods on MORPH-II. To have a fair comparison, our experimental configurations are exactly the same as those of the other methods. It can be observed that our proposed method outperforms other state-of-the-art methods in terms of MAE and CS. More specifically, the MAEs of our method can be as low as 2.73 on FG-NET and 2.91 on MORPH-II. The CS at the 10-year error level of our method can be as high as 95.7% on FG-NET and 95.7% on MORPH-II.

Table 4.22: MAEs of Different Age Estimation Algorithms on the FG-NET Aging Database

| Method | MAE |
|---|---|
| WAS [26] | 8.06 |
| AGES [26] | 6.77 |
| KAGES [24] | 6.18 |
| QM [46] | 6.55 |
| MLPs [46] | 6.98 |
| RUN [88] | 5.78 |
| RankBoost [90] | 5.67 |
| GP [91] | 5.39 |
| BM [87] | 5.33 |
| RED-SVM [8] | 5.24 |
| LARR [33] | 5.07 |
| PFA [34] | 4.97 |
| RPK [89] | 4.95 |
| MHR [68] | 4.87 |
| MTWGP [91] | 4.83 |
| PLO [48] | 4.82 |
| BIF [39] | 4.77 |
| NDF [19] | 4.67 |
| FLP [61] | 4.61 |
| OHRank [9] | 4.48 |
| GEF (intra-fusion) [ours] | 3.54 |
| GEF (inter-fusion) [ours] | 2.83 |
| GEF (intra-inter-fusion) [ours] | 2.75 |
| GEF (inter-intra-fusion) [ours] | 2.77 |
| GEF (MDF) [ours] | 2.89 |
| GEF (CF) [ours] | 2.73 |

4.8 Conclusion and Future Work

In this work, we proposed a three-stage learning framework for age estimation. In the first stage, we presented an age grouping approach to classify faces into different groups. Then, in the second stage, to lower the feature dimensionality, some local features were extracted. These local features and some global features, 12 features in total, were separately used by SVR to predict facial ages in each classified group. Therefore, 12 decision results are delivered at the output of the second stage.
Finally, in the third stage, based on the analysis of diversity, a fusion framework is designed to effectively utilize decision contexts from several age grouping systems to boost the performance of age estimation. The proposed GEF framework was evaluated on the most frequently used FG-NET and MORPH aging databases and demonstrated better performance than the state-of-the-art age estimation methods.

Table 4.23: MAEs of Different Age Estimation Algorithms on the MORPH-II Aging Database

| Method | MAE |
|---|---|
| BIF [39] | 5.09 |
| OHRank [9] | 6.07 |
| KPLS [36] | 4.04 |
| KCCA [37] | 3.98 |
| RED-SVM [8] | 6.49 |
| Rank-FFS [11] | 4.42 |
| GEF (intra-fusion) [ours] | 3.58 |
| GEF (inter-fusion) [ours] | 3.18 |
| GEF (intra-inter-fusion) [ours] | 2.96 |
| GEF (inter-intra-fusion) [ours] | 3.08 |
| GEF (MDF) [ours] | 3.23 |
| GEF (CF) [ours] | 2.91 |

Figure 4.12: Cumulative score (CS) curves of the error levels from 0 to 10 years of our age estimation algorithms on the MORPH-II database.

We plan to improve the age estimation performance of our proposed method by further exploiting other fusion schemes from the following perspectives: 1) increasing diversity among decision contexts by including other features or age grouping systems; 2) investigating other decision selection algorithms; 3) exploring other machine learning methods. We will also verify our method on other databases and in cross-database settings to demonstrate its robustness.

Chapter 5
Facial Age Estimation via Multistage Learning and Deep Fusion: from Gender Grouping to Outlier Prediction and Error Compensation

5.1 Introduction

Age estimation on human faces has been an active research topic for the past decade. Recently, cameras have been installed in many places around the world, capturing our faces everywhere we go. Facial soft biometrics might still be the most common trait that people rely on for age-related research.
Normally, humans can predict the identity, expression, gender, and race of a person more accurately than the person's age, and the aging process also differs considerably from person to person. Therefore, many automatic facial age estimation methods have been developed to compensate for the weak human ability in age estimation.

In the past several years, human facial age estimation has attracted a lot of interest in the computer vision community because of its importance in many applications, such as age-based image retrieval [46], internet access control, security control and surveillance [20, 77], biometrics [20, 65, 71], human-computer interaction (HCI) [27, 23], and electronic customer relationship management (ECRM) [20]. So far most age estimation approaches use a single machine learning model trained on some trait(s) or feature(s) to predict the ages of the test data. The existing age estimation methods can be divided into three categories: multiclass classification [26, 33, 46, 83], regression [21, 22, 89, 39, 36, 87, 88, 91, 92], or a composite of these two [33, 35, 34]. From another point of view, facial aging can also be treated as an ordinal process. Intuitively, the face of a 1-year-old baby should be more closely related to the face of a 5-year-old child than to the face of an 18-year-old teenager. Motivated by the ordinal property of face aging, some methods consider age estimation as a ranking problem [9, 56, 90].

As a matter of fact, aging processes are very different for humans of different genders and within different age ranges. This motivates us to ask which is the better choice for age estimation: multiple simple learning models (MSLM) or a single complicated learning model (SCLM)? An MSLM consists of multiple models, where each model aims at one task. On the other hand, an SCLM contains only one model, which needs to consider several problem-involved factors and may not be able to take care of all of them well.
In the experiments, we will show that multiple simple learning models (MSLM), which is our approach, can be a better option for age estimation.

Recent studies [78, 47, 76] on facial aging showed that the aging process differs between males and females in some ways, including the presence of facial hair, increased facial vascularity, increased thickness, increased sebaceous content, hormonal influences, and potentially differing rates of fat and bone absorption during the life cycle. Women tend to develop more and deeper wrinkles in the perioral region than men; their skin contains a significantly smaller number of appendages than that of men [64]. Therefore, gender should be determined before age is estimated.

To cover possible factors affecting the aging process, we propose an MSLM-based method, called GGEF-OE, which contains five stages, to deal with the age estimation problem. In the first stage, we train a binary classifier to perform gender recognition, where any test face is classified as either male or female. In the second stage, two multi-class models are trained to classify faces from the male and female groups, respectively, into several age groups. The age grouping system can have a different number of groups, varying from 2 to 10. The more age grouping systems are created, the more diversity is generated. Each age group has a different definition of age range. In the third stage, a trained model is built for each age group to predict the age of any test face in that age group. Three facial features are used in the third stage, and each feature is used individually. Hence, there will be three decisions (estimation results) for any test face. In the fourth stage, the various decisions are fused based on the diversity analyzed on them. We will demonstrate that with appropriate fusion the performance can be significantly improved. In the last stage, outliers, i.e., faces with higher estimation errors, are defined and used to train a model to predict potential outliers for any test face.
Once outliers are detected, error estimation is conducted on them and the estimated errors are then compensated to further improve the overall estimation results. Experimental results on the commonly used FG-NET [1] and MORPH [70] databases are provided to demonstrate the effectiveness of the proposed method.

The rest of the chapter is organized as follows. Section 5.2 introduces related work on gender classification, age grouping and age estimation. Section 5.3 discusses gender grouping. Section 5.4 focuses on our proposed age grouping method. Section 5.5 is about age estimation within age groups. Section 5.6 covers the diversity analysis and the fusion of decisions. Section 5.7 introduces the concept of outlier prediction and error compensation. Then, the experimental results are given in Section 5.8. Finally, Section 5.9 draws the conclusion and discusses future work.

5.2 Related Work

Some studies have proposed novel methods for the gender classification, age grouping and age estimation problems. We briefly review the work on these three topics.

5.2.1 Review of Gender Grouping

Human gender recognition has drawn much attention over the last two decades. Golomb et al. [30] used a trained two-layer neural network, named SEXNET, to recognize male and female from facial images of size 30 × 30. In their experiment, 90 facial images, including 45 males and 45 females, were tested and the reported accuracy was 91.9%. Brunelli and Poggio [67] presented HyperBF networks for gender classification, where 16 geometric features, such as pupil to nose vertical distance, eyebrow thickness, and mouth height, were used to train two competing RBF networks (one for male and the other for female). Their method was tested on a dataset consisting of 168 images (21 males and 21 females) and an accuracy of 79% was reported. Gutta et al.
[40] designed a hybrid gender recognizer combining neural networks and decision trees to identify men and women. In their experiments on 3,000 facial images of size 64 × 72 selected from the FERET database [66], an accuracy of 96% was achieved. Moghaddam and Yang [58] proposed to use nonlinear support vector machines (SVMs) for appearance-based gender classification. They tested on low resolution "thumbnail" faces of size 21 × 12 processed from 1,755 images (1,044 males and 711 females) from the FERET face database. An accuracy of 96.6% was reported and the SVM was shown to be superior to other classifiers, such as linear, quadratic, Fisher linear discriminant (FLD), and nearest-neighbor classifiers. Baluja and Rowley [3] used an AdaBoost based method for gender identification, where five types of pixel comparison operators were used. The experiments were carried out on 2,409 face images (1,495 males and 914 females) from the FERET database and the reported accuracy was 94.4%. A performance comparison between AdaBoost and SVM was conducted, and they also showed that a face image of size 20 × 20 is better than one of size 12 × 21.

Makinen and Raisamo [57] presented a systematic study on gender classification with automatically detected and aligned faces. One finding was that the gender classification rate can be increased if the automatic face alignment methods are further improved. They also found that gender classification methods performed almost equally well with different image sizes. A neural network and AdaBoost achieved almost as good classification rates as the SVM. Recently, Guo et al. [32] showed that gender recognition accuracy is affected significantly by the age of the person. The experiments were done on the YGA database [32] of 8,000 images with ages from 0 to 93 years. The results showed that the gender classification accuracy on adult faces can be 10% higher than that on young or senior faces.

5.2.2 Review of Age Grouping

Age grouping (i.e., age group classification) was first conducted by Kwon and Lobo in [45].
They categorized facial images into three age groups: babies, young adults, and senior adults. They computed six ratios of distances between primary components (e.g., eyes, nose, and mouth) and separated babies from the other two groups. Then, wrinkles on specific areas of a face were located using snakes, and wrinkle indices were used to distinguish senior adults from young adults and babies. There were only 47 images in the experimental dataset, and the correct classification rate for the baby group was below 68%.

Horng et al. [43] proposed a system that classifies faces in three steps: primary component detection, feature extraction, and age classification. They classified 230 facial images into four age groups: babies, young, middle-aged and senior adults. They first used the Sobel edge operator [18] and region labeling to locate the positions of the eyes, nose, and mouth. Then, two geometric features and three wrinkle features were extracted. Finally, two back-propagation neural networks were constructed for classification. The correct classification rate was 81.58%. The facial age groups were subjectively assigned (i.e., not actual ages) in their experiments.

Thukral et al. [81] extracted geometric features from faces and fused the results from five classifiers: ν-SVC [72], partial least squares (PLS) [4], Fisher linear discriminant (FLD), Naive Bayes, and k-nearest neighbor (KNN) [18], by adopting the majority decision rule. The final rate was 70.04% for three age groups (namely, 0-15, 15-30, and 30+).

Gunay and Nabiyev [31] proposed an automatic age classification system based on local binary patterns (LBP) [63] for face description. Faces were divided into small regions from which the LBP histograms were extracted and concatenated into a feature vector. For every new face presented to the system, spatial LBP histograms were produced and used to classify the image into one of six age groups: 10±5, 20±5, 30±5, 40±5, 50±5, 60±5.
The minimum distance, the nearest neighbor and the k-nearest neighbor classifiers were used. Their system gave a classification rate of 80%.

Hajizadeh et al. [41] used histograms of oriented gradients (HOG) [15] as the facial feature. HOG features were computed in several regions and these regional features were concatenated to construct a feature vector for each face. A probabilistic neural network (PNN) classifier was used to classify facial images into one of four age groups. The classification rate was 87.25%.

Liu et al. [52] proposed a structured fusion method for age group classification by building a region of certainty (ROC) to connect the uncertainty-driven shape features with selected surface features. In the first stage, two shape features are designed to determine the certainty of a face and classify it. In the second stage, the gradient orientation pyramid (GOP) [49] features are selected by a statistical method and then combined with an SVM classifier to perform age grouping. Their method was tested in classifying faces into three age groups, and a classification accuracy of 95.1% was reported.

5.2.3 Review of Age Estimation

Lanitis et al. [46] used active appearance models (AAMs), which combine shape and appearance facial features. Age estimation is treated as a classification problem and solved by the shortest distance classifier and neural networks. They differentiated age-specific and appearance-specific estimation problems. Personalized age estimation is introduced to cluster similar faces before classification.

Geng et al. [26, 27] proposed an automatic age estimation method named AGES (AGing pattErn Subspace), which models the long-term aging process of a person (i.e., a sequence of a person's face images), and estimates the person's age by minimizing the reconstruction error. However, the facial features of the same person could be similar at different ages.

Guo et al. [38] used biologically inspired features (BIF) with manifold learning for face representation.
They treated each age as a class label and adopted SVM for age estimation. Guo et al. [39] extracted BIF for each face and applied principal component analysis (PCA) [85] for feature dimensionality reduction. They used both classification and regression approaches to age estimation.

Yan et al. [89] proposed a patch-based regression method for age estimation, in which the regression error is minimized by a three-complementary-stage procedure. First, each image is encoded as an ensemble of orderless coordinate patches of GMM (Gaussian Mixture Model) distribution. Then, the patch-kernel is designed for characterizing the Kullback-Leibler divergence between the derived models for any two images, and its discriminating power is further enhanced by a weak learning process, called inter-modality similarity synchronization. Finally, kernel regression is employed for ultimate human age estimation.

Zhang et al. [91] proposed a multi-task warped Gaussian process (MTWGP) model for age estimation. Age estimation is formulated as a multi-task regression problem in which each learning task refers to the estimation of the age function for each person. Besides modelling common features shared by different tasks (persons), MTWGP also allows task-specific (person-specific) features to be learned automatically.

Chang et al. [9] proposed an ordinal hyperplane ranking algorithm (OHRank) using the relative order information among age labels in a database. Each ordinal hyperplane separates all facial images into two groups by the relative order, and a cost-sensitive property is used to find a better hyperplane by minimizing the classification cost. Human age is then inferred by aggregating a set of preferences from multiple ordinal hyperplanes.
Guo and Mu [36] used kernel partial least squares (KPLS) regression for age estimation with three advantages: 1) the KPLS can reduce feature dimensionality and learn the aging function simultaneously in a single learning framework; 2) the KPLS can find a small number of latent variables (e.g., 20) to project thousands of features into a low-dimensional subspace, which is attractive in real-time applications; and 3) the KPLS has an output vector consisting of multiple labels to solve several related problems (e.g., age estimation, gender classification, and ethnicity estimation) together.

Li et al. [48] considered temporally ordinal and continuous characteristics of the aging process and proposed to learn ordinal discriminative facial features. Their method aimed at preserving the local manifold structure of facial images while keeping the ordinal information among aging faces. The two factors were formulated into a unified optimization problem, and a solution was presented.

Existing approaches handle age estimation as either age grouping or exact age estimation. In this work, we integrate gender grouping, age grouping and exact age estimation into a multistage learning system to enhance age estimation performance.

5.3 Gender Grouping (1st Stage)

The gender grouping method used in our multistage learning system is described in the following subsections. One of the benefits of gender grouping is reduced estimation complexity, because age can be predicted within the male and female groups separately. As discussed in Section 5.1, males and females tend to undergo different aging processes.

The first stage, gender grouping, classifies any face into the male or the female group. Here we propose to adopt GOP features to represent facial information, and then apply the Fisher score (FS) to select discriminant features.

Figure 5.1: The gender grouping system (1st stage).

Once the feature selection is done, an SVM classifier is used to train a model to recognize male and female faces.
The procedure of gender grouping is depicted in Fig. 5.1.

5.3.1 Feature Extraction

The first step is to extract features from faces. Since the GOP feature provides both image gradient information and pyramid information, it is adopted to represent facial features. To obtain a GOP feature, a pyramid of the image is first built, and then the gradients in each layer of the pyramid are computed. Finally, these gradients are combined into a GOP feature.

5.3.2 Feature Selection by Fisher Score

The second step is feature selection, which is realized by applying the Fisher score (FS). The key idea of the Fisher score is to find a subset of features such that, in the data space spanned by the selected features, the distances between data points in different classes are as large as possible, while the distances between data points in the same class are as small as possible. We briefly describe the Fisher score for feature selection below.

Given a data set $\{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$, where $\mathbf{x}_i \in \mathbb{R}^{1 \times d}$ and $y_i \in \{1, 2, \ldots, c\}$, we intend to obtain a feature subset of size $m$ which contains the most informative features. We use $X = [\mathbf{x}_1; \mathbf{x}_2; \ldots; \mathbf{x}_n] \in \mathbb{R}^{n \times d}$ to represent the feature matrix, and $\mathbf{x}^j$ denotes the $j$-th column (feature) of $X$. Specifically, let $\mu_k^j$ and $\sigma_k^j$ be the mean and standard deviation of the $k$-th class, corresponding to the $j$-th feature, and let $n_k$ be the number of samples in the $k$-th class. Let $\mu^j$ and $\sigma^j$ denote the mean and standard deviation of the whole data set corresponding to the $j$-th feature. Then the Fisher score of the $j$-th feature can be computed by (5.1),

$$F(\mathbf{x}^j) = \frac{\sum_{k=1}^{c} n_k \left(\mu_k^j - \mu^j\right)^2}{\left(\sigma^j\right)^2} \qquad (5.1)$$

where $\left(\sigma^j\right)^2 = \sum_{k=1}^{c} n_k \left(\sigma_k^j\right)^2$. After computing the Fisher score for each feature, we select the top-$m$ ranked features with the largest scores. The original feature matrix $X \in \mathbb{R}^{n \times d}$ thus reduces to $Z \in \mathbb{R}^{n \times m}$.

5.3.3 Classification

Support vector machine (SVM) is a widely used machine learning method for classification, regression, and other learning tasks.
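As an illustration of the Fisher score selection in Section 5.3.2, the following minimal Python sketch evaluates (5.1) for each feature and keeps the top-m features. It is not the implementation used in this work: the function names `fisher_scores` and `select_top_m` are hypothetical, class statistics use the population variance (an assumption, since the normalization is not specified above), and a feature with zero within-class variance is given an infinite score when its class means differ.

```python
from statistics import fmean


def fisher_scores(X, y):
    """Return one Fisher score per feature column of X, following Eq. (5.1).

    X is a list of n rows (each a list of d feature values) and y is the
    list of n class labels.
    """
    n_features = len(X[0])
    classes = sorted(set(y))
    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        mu = fmean(column)  # mean of feature j over the whole data set
        between = 0.0  # numerator of (5.1)
        within = 0.0   # denominator of (5.1): sum_k n_k * (sigma_k^j)^2
        for k in classes:
            vals = [v for v, label in zip(column, y) if label == k]
            n_k = len(vals)
            mu_k = fmean(vals)
            # Population variance of class k (assumption, see lead-in).
            var_k = fmean([(v - mu_k) ** 2 for v in vals])
            between += n_k * (mu_k - mu) ** 2
            within += n_k * var_k
        if within > 0:
            scores.append(between / within)
        else:
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def select_top_m(X, y, m):
    """Keep the m features with the largest Fisher scores.

    Returns the reduced matrix Z (n rows, m columns) and the kept column
    indices in ascending order.
    """
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:m])
    Z = [[row[j] for j in keep] for row in X]
    return Z, keep
```

In this sketch a feature whose class means are well separated relative to its within-class spread receives a large score, which is the behavior the selection step relies on.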
Here an SVM classifier with a linear kernel is chosen to train a model, and the model is then used to recognize the gender of testing faces.

5.4 Age Grouping (2nd Stage)

The objective of age grouping (i.e., age group classification) is to classify face images into different groups based on their ages. The entire age range is divided into several non-overlapping ranges, and each range constitutes an age group. From [52], it can be seen that when the number of groups is small (e.g., 2 or 3), both shape (geometric) and surface (texture) features may be utilized for age grouping. However, if the number of groups is larger (e.g., 4 or more), the shape features may not help improve the classification accuracy. Hence, we adopt the surface-feature-based age grouping method from a previous work [52] to tackle the age grouping problem.

5.4.1 Feature Extraction

The facial feature used in age grouping is also the GOP feature, on account of its advantages. The GOP feature is extracted by following the same procedure described in Section 5.3.1.

5.4.2 Feature Selection by ANOVA

The purpose of feature selection is to pick out features with higher discrimination and discard features with lower discrimination. To achieve this purpose, we propose to use a statistics-based method to select features for the classification step. Based on the idea of hypothesis testing, the analysis of variance (ANOVA) method is adopted to measure which features have higher discriminating power among age groups.

5.4.3 Classification

In the age grouping stage, we use a multi-class SVM with a linear kernel to train two classifiers (one for the male group and the other for the female group). These two classifiers are used to classify male faces and female faces into different age groups, respectively. The fundamental concept of age grouping is summarized in Fig. 5.2.

Figure 5.2: The age grouping system (2nd stage).
5.5 Age Estimation with Age Groups (3rd Stage)

After the age grouping stage, each face has been classified into an age group, which has a defined range. In this stage, an exact age is estimated for each classified face as a value within the defined age range of its age group.

5.5.1 Feature Extraction

In this stage, we adopt three different methods for feature extraction. Recently, the biologically inspired features (BIF) [13], histograms of oriented gradients (HOG) [36], and local binary pattern (LBP) [34] methods have been widely used to extract facial aging information. Since they are quite effective features, we adopt the BIF, HOG, and LBP methods for aging feature extraction. For the LBP feature, we use a modified version, called uniform LBP (LBPu), because it has been shown to be more effective than LBP for dealing with face-related problems.

5.5.2 Estimation within Groups

Predicting an exact age for a face can be treated as a classification problem or a regression problem. Since there are many classes (i.e., each age is a class), we choose the regression method to estimate ages. In each classified age group, we choose one type of feature (e.g., BIF) and use support vector regression (SVR) with a linear kernel to learn a model from training faces and predict ages for testing faces. A nonlinear kernel, the radial basis function (RBF), is also tested in the experiment. However, its results are almost the same as those of the linear kernel. To lower the complexity, the linear kernel is chosen in this stage.

Since there are three different types of features and different numbers of age groups, several age estimators are needed. Given one selected feature type, for example, a 5-group age grouping system needs 5 estimators (1 estimator for each group) to predict the ages of the faces classified in the previous stage.

Figure 5.3: The age estimation within age groups system (3rd stage).

After utilizing
all 3 different features, 3 different decisions (i.e., estimation results) can be acquired for each m-group age grouping system. The implementation procedure of age estimation within age groups is briefly summarized in Fig. 5.3.

To evaluate the performance of age estimation, we can select one decision (i.e., estimation result) from an m-group age estimation system with one type of feature, and compare the decision with the ground truth.

5.6 Fusion of Decisions (4th Stage)

In order to further improve the estimation performance obtained from the age estimation stage, we investigate several fusion schemes based on the decisions (i.e., estimation results) from the previous stage.

Suppose we have m-group (m = 3, 4, ..., 10) age estimation systems built from previous stages, and each system has 12 different decisions because of 12 different features. In total, 96 decisions $d_{s,f}$ (s = 3, 4, ..., 10 and f = 1, ..., 12) can be used for fusion, where s is the system index and f is the feature index. During the fusion stage, the decisions are treated as input features and are selected and used in training/testing. Since no grouping is involved in the fusion stage, only one model is needed, and SVR is used to train it. Here we propose two decision selection algorithms for the fusion schemes. First, we describe how to choose the algorithms efficiently for fusion by analyzing the diversity between decisions.

5.6.1 Diversity Analysis

The purpose of analyzing the diversity between different decisions is to find a more efficient way to fuse them and gain improvements after the fusion. Since all 96 decisions could show different errors on the same faces, a strategic fusion of these decisions could reduce the estimation error. Therefore, we need to fuse a set of decisions that are adequately different from one another. To measure the diversity between pairwise decisions, several quantitative measures can be used.
Here we propose to use Pearson's linear correlation coefficient $p$ to measure the diversity between pairwise decisions $d_{s,f}$ (s = 3, 4, ..., 10 and f = 1, ..., 12). Diversity is measured as the correlation between two decisions, with $0 \le p \le 1$. Maximum diversity is observed when $p = 0$, indicating that the two decisions are uncorrelated.

Diversity can be measured in the following two main directions. One direction is intra-system diversity, which computes the correlation between any two decisions from the same system (where the decisions are all obtained by using different types of features). The other is inter-system diversity, which finds the correlation between any two decisions from different systems (where the decisions are all obtained by using the same type of feature). Considering these two diversities allows us to anticipate which direction would provide the higher performance improvement. Fig. 5.4 shows the two directions for diversity measurement. In the experiment section, the diversity analysis will show that the intra-system diversity is much lower than the inter-system diversity.

Figure 5.4: The fusion of decisions: intra fusion (AF) and inter fusion (EF) (4th stage).

5.6.2 Intra-system Fusion (AF)

For the intra-system fusion, as indicated in Fig. 5.4, each m-group age estimation system has 12 decisions (i.e., $d_{s,1}, d_{s,2}, \ldots, d_{s,12}$) for fusion. Since there are 12 decisions per system, there are $2^{12} - 1$ possible ways of selection for fusion. Therefore, a systematic algorithm is needed to effectively find a decision subset from the 12 decisions. Here we propose to apply the sequential forward selection (SFS) algorithm to achieve this goal.
Given a decision set $D = \{d_j \mid j = 1, \ldots, 12\}$, we want to find a subset $D_N = \{d_{i_1}, d_{i_2}, \ldots, d_{i_N}\}$, with $N \le 12$, to optimize an objective function $J(D_N)$, which is defined as

$$J(D_N) = \mathrm{MAE}(f(D_N), A_{GT}) \qquad (5.2)$$

where MAE represents the mean absolute error between the estimated age $f(D_N)$ and the ground truth age $A_{GT}$.

The SFS algorithm starts from a decision set $D_k$ (empty at the start). We sequentially add one decision $d^*$, where $d^* \in \{d_{s,f} \mid f = 1, \ldots, 12\}$ for a given $s$ ($= 3, 4, \ldots, 10$), namely the one that yields the lowest objective function $J(D_k + d^*)$ between the ground truth age $A_{GT}$ and the estimated age $f(D_k + d^*)$ when combined with the decisions already selected in $D_k$. The SFS algorithm is stated below for clarity:

Algorithm - Sequential Forward Selection (SFS)
1. Start with the empty set $D_0 = \{\}$.
2. Select the next best decision: $d^* = \arg\min_{d \in D \setminus D_k} J(D_k + d)$.
3. Form $D_{k+1} = D_k + d^*$.
4. If $J(D_{k+1}) \le J(D_k)$, then set $k = k + 1$ and go to 2.
5. If $J(D_{k+1}) > J(D_k)$, then stop and $D_k$ is the desired subset.

5.6.3 Inter-system Fusion (EF)

For the inter-system fusion, as shown in Fig. 5.4, each feature type has 8 decisions (i.e., $d_{3,f}$, $d_{4,f}$, $d_{5,f}$, $d_{6,f}$, $d_{7,f}$, $d_{8,f}$, $d_{9,f}$, and $d_{10,f}$) for fusion. There are $2^8 - 1 = 255$ possible combinations for the inter fusion (EF). As mentioned in the diversity analysis, inter-system decisions tend to show higher diversity, so more decisions need to be selected to achieve better performance. Therefore, we propose to use the Sequential Backward Selection (SBS) algorithm to quickly find a subset of the 8 decisions that achieves the minimum mean absolute error (MAE).

Given a decision set $D = \{d_j \mid j = 1, \ldots, 8\}$, we want to find a subset $D_N = \{d_{i_1}, d_{i_2}, \ldots, d_{i_N}\}$, with $N \le 8$, to optimize an objective function $J(D_N)$, which is defined as

$$J(D_N) = \mathrm{MAE}(f(D_N), A_{GT}) \qquad (5.3)$$

where MAE represents the mean absolute error between the estimated age $f(D_N)$ and the ground truth age $A_{GT}$.
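The SFS procedure of Section 5.6.2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the implementation used in this work: the fusion function f is replaced by a plain average of the selected decisions (the actual system trains an SVR), the empty set is assigned an infinite objective value so that the first decision is always accepted, and the names `mae`, `fuse_by_mean`, and `sequential_forward_selection` are hypothetical.

```python
def mae(estimates, truth):
    """Mean absolute error between estimated and ground truth ages."""
    return sum(abs(e - t) for e, t in zip(estimates, truth)) / len(truth)


def fuse_by_mean(decisions, subset):
    """Hypothetical stand-in for f: average the selected decisions per face.

    The thesis trains an SVR on the selected decisions instead.
    """
    n_faces = len(next(iter(decisions.values())))
    return [
        sum(decisions[name][i] for name in subset) / len(subset)
        for i in range(n_faces)
    ]


def sequential_forward_selection(decisions, truth, fuse=fuse_by_mean):
    """Greedy SFS over a dict {decision name: list of estimated ages}.

    Returns the selected decision names (in order of selection) and the
    objective value J of the selected subset.
    """
    selected = []
    best = float("inf")  # J of the empty set (assumption, see lead-in)
    remaining = set(decisions)
    while remaining:
        # Step 2: candidate whose addition gives the lowest J.
        cand, cand_err = min(
            ((d, mae(fuse(decisions, selected + [d]), truth))
             for d in sorted(remaining)),
            key=lambda pair: pair[1],
        )
        # Step 5: stop as soon as the objective gets worse.
        if cand_err > best:
            break
        # Steps 3 and 4: accept the candidate and continue.
        selected.append(cand)
        remaining.remove(cand)
        best = cand_err
    return selected, best
```

Swapping `fuse` for a function that fits and applies a regression model would bring the sketch closer to the setting described above, at the cost of one model fit per candidate subset.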
The SBS algorithm starts with the full decision set $S_0 = \{d_{3,f}, d_{4,f}, d_{5,f}, d_{6,f}, d_{7,f}, d_{8,f}, d_{9,f}, d_{10,f}\}$, and it sequentially removes a decision $s^*$, where $s^* \in \{d_{3,f}, d_{4,f}, d_{5,f}, d_{6,f}, d_{7,f}, d_{8,f}, d_{9,f}, d_{10,f}\}$, whose removal yields the lowest value of the objective function $J(S_k - s^*)$. The SBS algorithm is stated below.

Algorithm - Sequential Backward Selection (SBS)
1. Start with the full system set $S_0 = \{d_{3,f}, d_{4,f}, \ldots, d_{10,f}\}$.
2. Remove the worst system: $s^* = \arg\min_{s \in S_k} J(S_k - s)$.
3. Form $S_{k+1} = S_k - s^*$.
4. If $J(S_{k+1}) \le J(S_k)$, then set $k = k + 1$ and go to 2.
5. If $J(S_{k+1}) > J(S_k)$, then stop and $S_k$ is the desired subset.

5.6.4 Intra-inter Fusion (AEF)

After performing the intra-system fusion, each system $s_m$ has obtained its best decision subset $DS_m$, where $m = 3, \ldots, 10$. In addition to considering intra-system information, we can also include the inter-system information by fusing the results of the intra-system fusion (AF). Fig. 5.5 shows the concept of the intra-inter fusion (AEF).

Figure 5.5: The fusion of decisions: intra-inter fusion (AEF) and inter-intra fusion (EAF) (4th stage).

5.6.5 Inter-intra Fusion (EAF)

After obtaining the results of the inter-system fusion, each feature $f_j$ has its best system subset $SS_j$ selected, where $j = 1, \ldots, 12$. Besides the inter-system information, we can add intra-system information by fusing the results of the inter-system fusion (EF). The idea of inter-intra fusion (EAF) is also depicted in Fig. 5.5.

5.6.6 Composite Fusion (CF)

Composite fusion (CF) is a two-directional fusion, unlike intra fusion and inter fusion, where fusion takes place in one direction only. To carry out the CF, all 96 decisions $d_{s,f}$ (s = 3, 4, ..., 10 and f = 1, ..., 12) need to be considered. Obviously, it is not realistic to try all $2^{96} - 1$ possible ways of selection to locate the best composite fusion option.
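The SBS procedure of Section 5.6.3 can be sketched in the same way. This is again only an illustration under stated assumptions, not the implementation used in this work: the fusion function f is replaced by a plain average of the retained decisions (the actual system trains an SVR), at least one decision is always kept, and the names `mae`, `fuse_by_mean`, and `sequential_backward_selection` are hypothetical.

```python
def mae(estimates, truth):
    """Mean absolute error between estimated and ground truth ages."""
    return sum(abs(e - t) for e, t in zip(estimates, truth)) / len(truth)


def fuse_by_mean(decisions, subset):
    """Hypothetical stand-in for f: average the retained decisions per face.

    The thesis trains an SVR on the retained decisions instead.
    """
    n_faces = len(next(iter(decisions.values())))
    return [
        sum(decisions[name][i] for name in subset) / len(subset)
        for i in range(n_faces)
    ]


def sequential_backward_selection(decisions, truth, fuse=fuse_by_mean):
    """Greedy SBS over a dict {decision name: list of estimated ages}.

    Returns the retained decision names and the objective value J of the
    retained subset.
    """
    selected = sorted(decisions)  # Step 1: start from the full set.
    best = mae(fuse(decisions, selected), truth)
    while len(selected) > 1:
        # Step 2: decision whose removal gives the lowest J.
        cand, cand_err = min(
            ((s, mae(fuse(decisions, [d for d in selected if d != s]), truth))
             for s in selected),
            key=lambda pair: pair[1],
        )
        # Step 5: stop as soon as removal makes the objective worse.
        if cand_err > best:
            break
        # Steps 3 and 4: drop the decision and continue.
        selected.remove(cand)
        best = cand_err
    return selected, best
```

Because SBS starts from the full set, it evaluates large subsets first, which is why it suits cases where many decisions are expected to survive the selection.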
Among the 96 decisions, it is expected that most of the decisions have low diversity with the others, and it would be less efficient to use the SBS algorithm because it starts from the full set. Hence, we propose to apply the Sequential Forward Selection (SFS) algorithm to find a subset with the minimum MAE.

The SFS algorithm starts from a decision set $D_k$ (empty at the start). We sequentially add one decision $d^*$, where $d^* \in \{d_{s,f} \mid s = 3, 4, \ldots, 10 \text{ and } f = 1, \ldots, 12\}$, namely the one that yields the lowest objective function $J(D_k + d^*)$ between the ground truth age $A_{GT}$ and the estimated age $f(D_k + d^*)$ when combined with the decisions already selected in $D_k$. The SFS algorithm is stated below for clarity:

Algorithm - Sequential Forward Selection (SFS)
1. Start with the empty set $D_0 = \{\}$.
2. Select the next best decision: $d^* = \arg\min_{d \in D \setminus D_k} J(D_k + d)$.
3. Form $D_{k+1} = D_k + d^*$.
4. If $J(D_{k+1}) \le J(D_k)$, then set $k = k + 1$ and go to 2.
5. If $J(D_{k+1}) > J(D_k)$, then stop and $D_k$ is the desired subset.

5.7 Outlier Prediction and Error Compensation (5th Stage)

From the estimation results of the fusion stage, we can perform an error analysis on the training dataset by comparing the estimated ages with the ground truth ages. Based on the error analysis, we can train a model to predict which testing faces would contribute large errors to the final performance evaluation.

5.7.1 Definition of Outlier

First, we need to define an outlier face. An outlier face is a face whose estimation error $e$ is larger than a certain error level of $L$ years, as defined in (5.4):

$$\text{face} = \begin{cases} \text{outlier} & \text{if } e > L \\ \text{inlier} & \text{if } e \le L \end{cases} \qquad (5.4)$$

Therefore, the outlier prediction problem becomes a two-group classification problem. One group is the outliers and the other is the inliers. Usually, an estimation error within 10 years is acceptable in the age estimation problem. In this case, we can set L = 10. However, the error level L can be set to other values, such as 5, 8, or 12.
5.7.2 Feature Representation and Selection

Since the GOP feature is effective in dealing with classification problems, we continue to use the GOP feature to represent facial information. Regarding feature selection, the ANOVA method is adopted to evaluate which features show a higher discriminating ability between the outlier and inlier groups.

5.7.3 Outlier Prediction

The fundamental procedure of outlier prediction is shown in Fig. 5.6. Here an SVM classifier with a linear kernel is again chosen to train an outlier prediction model, and the model is then used to identify outliers and inliers among the testing faces.

Figure 5.6: Outlier prediction and error compensation (5th stage).

5.7.4 Error Compensation on Outlier

After the outlier prediction, in order to improve the estimation performance, we need to reduce the estimation error on the predicted outliers. We therefore use the following procedure to mitigate the estimation error on the outliers. The basic concept of the error compensation is also displayed in Fig. 5.6.

First, the estimation error $e$ of an outlier is defined as

$$e = a - \hat{a}, \qquad (5.5)$$

where $a$ is the ground truth age of the outlier and $\hat{a}$ is the estimated age of the outlier. Then, our focus is to predict $e$ in order to get the estimated error $\hat{e}$. As in the outlier prediction step, the GOP feature representation and ANOVA feature selection are also adopted for the error estimation on outliers. To estimate the error $e$, SVR is chosen to train an error estimation model. This model is then used to estimate the error $e$ for outliers predicted from testing faces.

After the error estimation, the estimated error $\hat{e}$ is added back to the estimated age $\hat{a}$ of the outlier, following the rule in (5.6):

$$\text{Final estimated age} = \begin{cases} \hat{a} + \hat{e} & \text{if face} \in \text{outlier} \\ \hat{a} & \text{otherwise} \end{cases} \qquad (5.6)$$

If the error could be estimated more accurately, it would reduce the MAE after the error compensation.

Figure 5.7: The multistage learning framework for age estimation.

Finally, Fig.
5.7 is used to summarize our multistage learning framework for age estimation.

5.8 Experimental Results

5.8.1 Database

The two databases used to evaluate the performance of our proposed framework in this work are the FG-NET aging database [1] and the MORPH database [70] (MORPH-II is used for our study). The FG-NET aging database is the most frequently used database for age estimation related works because it is publicly available and free. FG-NET has 1,002 color or gray facial images of 82 Europeans with a wide age range from 0 to 69 years old. Each individual has 6-18 images labeled with the ground truth ages. The MORPH-II database contains 55,134 images from 13,618 individuals with ages ranging from 16 to 77. MORPH-II is a multi-racial database, including African, European, Asian, Hispanic and others. Each individual has about 4 images labeled with the ground truth ages. Some sample face images from the FG-NET and MORPH-II databases are shown in Fig. 5.8 and Fig. 5.9, respectively. The age range distribution of face images is listed in Table 5.1. The face images are preprocessed and resized to 180×150. Only gray level images are used to extract the BIF, HOG and LBPu features.

Table 5.1: Age Range Distribution on the FG-NET and MORPH-II Databases

Age      FG-NET                         MORPH-II
         No. of images   Percentage     No. of images   Percentage
0-9      371             37.03 %        0               0.00 %
10-19    339             33.83 %        7,469           13.55 %
20-29    144             14.37 %        16,325          29.61 %
30-39    79              7.88 %         15,357          27.85 %
40-49    46              4.59 %         12,050          21.85 %
50-59    15              1.50 %         3,599           6.53 %
60-69    8               0.80 %         318             0.58 %
70-77    0               0.00 %         16              0.03 %
Total    1,002           100.00 %       55,134          100.00 %

Figure 5.8: Some sample faces from FG-NET database.

5.8.2 Results of Gender Grouping

The number of faces by gender for the FG-NET and MORPH-II databases is shown in Table 5.2. In the experimental setup, a 5-fold cross validation scheme is used to evaluate our gender grouping algorithm on the FG-NET and MORPH-II databases.
Table 5.3 shows the accuracy of our gender grouping algorithm on the FG-NET and MORPH-II databases. The average accuracies are 96.7% on FG-NET and 98.4% on MORPH-II.

Figure 5.9: Some sample faces from MORPH-II database.

Table 5.2: The Number of Faces by Gender on the FG-NET and MORPH-II Databases

Gender    FG-NET    MORPH-II
Male      571       46,645
Female    431       8,489
Total     1,002     55,134

Table 5.3: Gender Grouping Accuracies on the FG-NET and MORPH-II Databases

Gender     FG-NET    MORPH-II
Male       97.9 %    99.8 %
Female     95.1 %    85.6 %
Average    96.7 %    98.4 %

5.8.3 Results of Age Grouping

In the age grouping stage, a 5-fold cross validation scheme is also used to evaluate the classification accuracy of our algorithm on the FG-NET and MORPH-II databases. To increase the diversity between decisions for the fusion stage, multiple age grouping systems were investigated. Table 5.4 and Table 5.5 list the definitions of age groups for each system on FG-NET and MORPH-II, respectively. Note that the number of groups for MORPH-II is smaller than that for FG-NET because MORPH-II does not have faces with ages from 0 to 15 years old.

Table 5.4 and Table 5.5 also show the classification accuracy on male and female faces for these age grouping systems. We can observe that the average classification accuracy decreases as the number of groups increases.

5.8.4 Results of Age Estimation within Age Groups

For the age estimation stage, leave-one-person-out (LOPO) cross validation (i.e., in each fold, the images of one person are used as the test set and those of the others are used as the training set) is used for tests on FG-NET. For MORPH-II, the same experimental setting is used as in the previous studies [36, 37]. The whole MORPH-II database W is divided into 3 subsets S1, S2, S3. S1 (or S2) is used for training and the remaining $W \setminus S1$ (or $W \setminus S2$) is used for testing. Then the two testing results are averaged.
Table 5.4: Definition of Age Groups and Age Grouping Results on FG-NET Database (m = No. of Groups)

     Age range definition for age group                                      Accuracy
m    AG1   AG2    AG3     AG4     AG5     AG6     AG7     AG8     AG9     AG10     Male      Female    Average
3    0-3   4-19   20-69   -       -       -       -       -       -       -        96.0 %    97.9 %    96.8 %
4    0-5   6-12   13-21   22-69   -       -       -       -       -       -        94.9 %    97.2 %    95.9 %
5    0-4   5-10   11-16   17-25   26-69   -       -       -       -       -        94.0 %    94.9 %    94.4 %
6    0-4   5-9    10-14   15-19   20-29   30-69   -       -       -       -        94.6 %    94.9 %    94.7 %
7    0-4   5-9    10-14   15-19   20-25   26-35   36-69   -       -       -        91.8 %    94.2 %    92.8 %
8    0-4   5-9    10-14   15-19   20-24   25-29   30-39   40-69   -       -        87.2 %    92.1 %    89.3 %
9    0-4   5-9    10-14   15-19   20-24   25-29   30-34   35-40   41-69   -        83.7 %    90.0 %    86.4 %
10   0-4   5-9    10-14   15-19   20-24   25-29   30-34   35-39   40-49   50-69    81.8 %    89.3 %    85.0 %

Table 5.5: Definition of Age Groups and Age Grouping Results on MORPH-II Database (m = No. of Groups)

     Age range definition for age group                         Accuracy
m    AG1     AG2     AG3     AG4     AG5     AG6     AG7        Male      Female    Average
2    16-39   40-77   -       -       -       -       -          94.3 %    100 %     94.8 %
3    16-29   30-49   50-77   -       -       -       -          90.3 %    98.3 %    91.0 %
4    16-19   20-29   30-49   50-77   -       -       -          87.2 %    97.7 %    88.2 %
5    16-19   20-29   30-39   40-49   50-77   -       -          78.5 %    96.0 %    80.1 %
6    16-19   20-29   30-35   36-41   42-49   50-77   -          73.8 %    92.0 %    75.4 %
7    16-19   20-29   30-34   35-39   40-44   45-49   50-77      70.9 %    88.6 %    72.5 %

The performance of age estimation is measured by the mean absolute error (MAE) and the cumulative score (CS) [21, 27]. The MAE is defined as the average of the absolute errors between the estimated ages and the ground truth ages:

$$\mathrm{MAE} = \sum_{i=1}^{N} \left| a'_i - a_i \right| / N, \qquad (5.7)$$

where $a_i$ is the ground truth age for the test image $i$, $a'_i$ is the estimated age, and $N$ is the total number of test images. The cumulative score (CS) is defined as:

$$\mathrm{CS}(L) = \left( n_{e \le L} / N \right) \times 100\%, \qquad (5.8)$$

where $n_{e \le L}$ is the number of test images on which the age estimation makes an absolute error $e$ no larger than $L$ years.

The MAE (in years) results on FG-NET and MORPH-II are shown in Table 5.6 and Table 5.7, respectively.
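The two evaluation measures in (5.7) and (5.8) can be written down directly. The short Python sketch below is only an illustration; the function names `mean_absolute_error` and `cumulative_score` and the sample ages in the usage note are hypothetical.

```python
def mean_absolute_error(estimated, truth):
    """MAE of Eq. (5.7): average absolute difference, in years."""
    return sum(abs(e - t) for e, t in zip(estimated, truth)) / len(truth)


def cumulative_score(estimated, truth, level):
    """CS(L) of Eq. (5.8): percentage of test images whose absolute
    error is no larger than `level` years."""
    hits = sum(1 for e, t in zip(estimated, truth) if abs(e - t) <= level)
    return 100.0 * hits / len(truth)
```

For example, with ground truth ages [20, 30, 40, 50] and estimates [22, 27, 40, 62], the absolute errors are 2, 3, 0 and 12 years, so the MAE is 4.25 years and CS(5) is 75%.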
Since estimating exact ages could be easier under narrower age ranges (e.g., in a system with more groups), lower MAEs would be expected for systems containing more groups. However, the classification accuracy decreases when the number of groups increases in age grouping systems. So there is a trade-off between the age grouping accuracy and the number of age groups.

Table 5.6: MAE (in Years) Results of Age Estimation within m Age Groups on FG-NET Database

Features             m = 3   m = 4   m = 5   m = 6   m = 7   m = 8   m = 9   m = 10
f1: AAM app          4.26    3.55    3.11    2.56    2.57    2.65    3.24    5.68
f2: AAM shape        4.42    3.61    3.14    2.59    2.51    2.65    3.23    5.64
f3: AAM tex          4.53    3.66    3.22    2.65    2.58    2.70    3.32    5.76
f4: BIF EyePair      4.29    3.61    3.32    2.80    2.65    2.81    3.40    5.78
f5: BIF Nose         5.11    3.92    3.50    2.88    2.63    2.93    3.49    5.96
f6: BIF Mouth        4.71    3.74    3.29    2.82    2.68    2.94    3.52    5.90
f7: HOG EyePair      5.05    3.88    3.56    2.86    2.67    2.87    3.45    5.79
f8: HOG Nose         5.61    4.17    3.64    3.12    2.66    2.99    3.55    5.87
f9: HOG Mouth        5.44    4.09    3.50    2.69    2.64    2.86    3.42    5.83
f10: LBPu EyePair    5.00    3.83    3.32    2.71    2.55    2.76    3.35    5.82
f11: LBPu Nose       5.04    3.82    3.28    2.72    2.57    2.79    3.38    5.83
f12: LBPu Mouth      4.96    3.72    3.25    2.72    2.60    2.81    3.41    5.84

Table 5.7: MAE (in Years) Results of Age Estimation within m Age Groups on MORPH-II Database

Features        m = 2   m = 3   m = 4   m = 5   m = 6   m = 7
BIF             4.10    4.41    4.73    4.29    4.06    4.23
HOG             4.19    5.20    4.76    4.34    4.06    4.23
LBPu            4.07    4.68    4.76    4.31    4.06    4.18
BIF EyePair     4.75    4.90    4.70    4.32    4.06    4.24
BIF Nose        5.11    5.00    4.76    4.33    4.05    4.26
BIF Mouth       4.97    4.79    4.76    4.31    4.08    4.25
HOG EyePair     5.28    5.25    4.74    4.33    4.05    4.25
HOG Nose        5.61    5.45    4.79    4.36    4.07    4.25
HOG Mouth       5.61    5.40    4.77    4.34    4.08    4.26
LBPu EyePair    4.91    4.99    4.77    4.32    4.05    4.24
LBPu Nose       5.17    5.02    4.78    4.37    4.05    4.24
LBPu Mouth      4.62    4.77    4.74    4.31    4.06    4.22

On FG-NET, for example, neither the 3-group system (the highest age grouping accuracy, Table 5.4) nor the 10-group system (the largest number of groups) has the lowest MAE.
Instead, the 7-group system has the lowest MAE, which is 2.51 years.

5.8.5 Results of Fusion of Decisions

Table 5.8: Correlation (p) between Any Two Decisions d1-d12 in the 3-group System (s3) on FG-NET; Average Correlation = 0.9333

| p | d1 | d2 | d3 | d4 | d5 | d6 | d7 | d8 | d9 | d10 | d11 | d12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| d1 | 1.000 | 0.991 | 0.991 | 0.962 | 0.955 | 0.956 | 0.906 | 0.906 | 0.897 | 0.976 | 0.972 | 0.969 |
| d2 | 0.991 | 1.000 | 0.980 | 0.961 | 0.961 | 0.959 | 0.901 | 0.911 | 0.899 | 0.982 | 0.979 | 0.976 |
| d3 | 0.991 | 0.980 | 1.000 | 0.964 | 0.958 | 0.957 | 0.907 | 0.910 | 0.898 | 0.979 | 0.976 | 0.972 |
| d4 | 0.962 | 0.961 | 0.964 | 1.000 | 0.938 | 0.939 | 0.910 | 0.899 | 0.883 | 0.958 | 0.953 | 0.950 |
| d5 | 0.955 | 0.961 | 0.958 | 0.938 | 1.000 | 0.942 | 0.880 | 0.905 | 0.889 | 0.970 | 0.974 | 0.968 |
| d6 | 0.956 | 0.959 | 0.957 | 0.939 | 0.942 | 1.000 | 0.887 | 0.891 | 0.893 | 0.958 | 0.960 | 0.960 |
| d7 | 0.906 | 0.901 | 0.907 | 0.910 | 0.880 | 0.887 | 1.000 | 0.832 | 0.831 | 0.899 | 0.894 | 0.893 |
| d8 | 0.906 | 0.911 | 0.910 | 0.899 | 0.905 | 0.891 | 0.832 | 1.000 | 0.831 | 0.911 | 0.914 | 0.908 |
| d9 | 0.897 | 0.899 | 0.898 | 0.883 | 0.889 | 0.893 | 0.831 | 0.831 | 1.000 | 0.901 | 0.902 | 0.908 |
| d10 | 0.976 | 0.982 | 0.979 | 0.958 | 0.970 | 0.958 | 0.899 | 0.911 | 0.901 | 1.000 | 0.989 | 0.984 |
| d11 | 0.972 | 0.979 | 0.976 | 0.953 | 0.974 | 0.960 | 0.894 | 0.914 | 0.902 | 0.989 | 1.000 | 0.987 |
| d12 | 0.969 | 0.976 | 0.972 | 0.950 | 0.968 | 0.960 | 0.893 | 0.908 | 0.908 | 0.984 | 0.987 | 1.000 |

Table 5.9: Correlation (p) between Any Two m-group Systems (s3-s10) for Decision d1 on FG-NET; Average Correlation = 0.7777

| p | s3 d1 | s4 d1 | s5 d1 | s6 d1 | s7 d1 | s8 d1 | s9 d1 | s10 d1 |
|---|---|---|---|---|---|---|---|---|
| s3 d1 | 1.000 | 0.851 | 0.829 | 0.812 | 0.783 | 0.750 | 0.704 | 0.636 |
| s4 d1 | 0.851 | 1.000 | 0.839 | 0.792 | 0.786 | 0.745 | 0.701 | 0.632 |
| s5 d1 | 0.829 | 0.839 | 1.000 | 0.841 | 0.841 | 0.766 | 0.726 | 0.649 |
| s6 d1 | 0.812 | 0.792 | 0.841 | 1.000 | 0.846 | 0.829 | 0.772 | 0.693 |
| s7 d1 | 0.783 | 0.786 | 0.841 | 0.846 | 1.000 | 0.856 | 0.821 | 0.748 |
| s8 d1 | 0.750 | 0.745 | 0.766 | 0.829 | 0.856 | 1.000 | 0.899 | 0.808 |
| s9 d1 | 0.704 | 0.701 | 0.726 | 0.772 | 0.821 | 0.899 | 1.000 | 0.818 |
| s10 d1 | 0.636 | 0.632 | 0.649 | 0.693 | 0.748 | 0.808 | 0.818 | 1.000 |

First, the intra-system diversity and inter-system diversity are analyzed.
Table 5.8 shows the intra-system diversity between the decisions of the 3-group age estimation system. Table 5.9 shows the inter-system diversity between the decisions for feature f1 (AAM app in Table 5.6). The two tables show that decisions are more diverse across systems (average correlation 0.7777) than within a system (average correlation 0.9333). Hence, fusion along the inter-system direction is expected to yield a larger performance improvement.

The experimental setting in the fusion stage is the same as in the previous stage, and the same cross validation strategy is used. First, the MAE results of the intra fusion (AF) of each system on FG-NET and MORPH-II are shown in Table 5.10 and Table 5.11, respectively. Then, the MAE results of the inter fusion (EF) of each feature type on FG-NET and MORPH-II are shown in Table 5.12 and Table 5.13, respectively. The MAE results of the intra-inter fusion (AEF) on FG-NET and MORPH-II are shown in Table 5.14. The MAE results of the inter-intra fusion (EAF) on FG-NET and MORPH-II are shown in Table 5.15. Finally, the MAE results of the composite fusion (CF) on FG-NET and MORPH-II are shown in Table 5.16. Comparing AF with EF, the MAEs of EF are smaller than those of AF, which indicates that fusing more diverse decisions offers a real benefit.

Table 5.10: MAE (in Years) Results of Intra Fusion (AF) on FG-NET

| m | Decision Subset | No. of updates k | MAE |
|---|---|---|---|
| 3 | DS3 = {d1, d4, d7} | 3 | 4.20 |
| 4 | DS4 = {d1, d4, d7, d9} | 4 | 3.50 |
| 5 | DS5 = {d1, d2, d9, d6, d3} | 5 | 3.09 |
| 6 | DS6 = {d1, d9, d3} | 3 | 2.54 |
| 7 | DS7 = {d2, d1, d11, d5} | 4 | 2.53 |
| 8 | DS8 = {d1, d2, d3, d9} | 4 | 2.67 |
| 9 | DS9 = {d2, d3, d1, d4} | 4 | 3.26 |
| 10 | DS10 = {d2, d6, d4, d9, d1, d5} | 6 | 5.36 |

Table 5.11: MAE (in Years) Results of Intra Fusion (AF) on MORPH-II

| m | Decision Subset | No. of updates k | MAE |
|---|---|---|---|
| 2 | DS2 = {d4, d2, d3, d7, d5, d1} | 6 | 3.58 |
| 3 | DS3 = {d1} | 1 | 4.56 |
| 4 | DS4 = {d2, d10, d4, d1, d5} | 5 | 4.35 |
| 5 | DS5 = {d3} | 1 | 4.77 |
| 6 | DS6 = {d4, d12, d5} | 3 | 4.39 |
| 7 | DS7 = {d11} | 1 | 4.52 |
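The diversity comparison rests on pairwise correlations between decisions, with a lower average correlation read as higher diversity. The sketch below shows one way such an average could be computed. It assumes Pearson correlation and an average over the off-diagonal entries only; neither choice is stated in this section, and the function names are hypothetical. Applied to the values transcribed from Table 5.9, the off-diagonal average should land close to the reported 0.7777, with any small gap attributable to the table's three-decimal rounding.

```python
from itertools import combinations
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long sequences of age estimates."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def average_correlation(decisions: Sequence[Sequence[float]]) -> float:
    """Mean correlation over all unordered pairs of decisions."""
    pairs = list(combinations(decisions, 2))
    return sum(pearson(x, y) for x, y in pairs) / len(pairs)


def mean_off_diagonal(corr: List[List[float]]) -> float:
    """Mean of the upper-triangle entries of a symmetric correlation matrix."""
    n = len(corr)
    vals = [corr[i][j] for i in range(n) for j in range(i + 1, n)]
    return sum(vals) / len(vals)


# Values transcribed from Table 5.9 (systems s3..s10, decision d1, FG-NET).
TABLE_5_9 = [
    [1.000, 0.851, 0.829, 0.812, 0.783, 0.750, 0.704, 0.636],
    [0.851, 1.000, 0.839, 0.792, 0.786, 0.745, 0.701, 0.632],
    [0.829, 0.839, 1.000, 0.841, 0.841, 0.766, 0.726, 0.649],
    [0.812, 0.792, 0.841, 1.000, 0.846, 0.829, 0.772, 0.693],
    [0.783, 0.786, 0.841, 0.846, 1.000, 0.856, 0.821, 0.748],
    [0.750, 0.745, 0.766, 0.829, 0.856, 1.000, 0.899, 0.808],
    [0.704, 0.701, 0.726, 0.772, 0.821, 0.899, 1.000, 0.818],
    [0.636, 0.632, 0.649, 0.693, 0.748, 0.808, 0.818, 1.000],
]

print(mean_off_diagonal(TABLE_5_9))
```

With raw age estimates instead of a precomputed matrix, `average_correlation` takes one sequence per decision, each holding that decision's estimates for the same test images.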
Overall, the CF gives the best MAEs, which are 1.95 years on FG-NET and 2.88 years on MORPH-II.

Table 5.12: MAE (in Years) Results of Inter Fusion (EF) on FG-NET

| Feature | Decision Subset | No. of updates k | MAE |
|---|---|---|---|
| f1 | SS1 = {s4, s5, s6, s7, s8, s9, s10} | 1 | 2.01 |
| f2 | SS2 = {s4, s5, s6, s7, s8, s10} | 2 | 1.99 |
| f3 | SS3 = {s4, s5, s6, s7, s8, s10} | 2 | 2.05 |
| f4 | SS4 = {s4, s5, s6, s7, s8, s10} | 2 | 2.16 |
| f5 | SS5 = {s3, s4, s5, s6, s7, s8, s9, s10} | 0 | 2.20 |
| f6 | SS6 = {s4, s5, s6, s7, s8, s9, s10} | 1 | 2.19 |
| f7 | SS7 = {s4, s5, s6, s7, s8, s10} | 2 | 2.22 |
| f8 | SS8 = {s3, s4, s5, s6, s7, s8, s9, s10} | 0 | 2.35 |
| f9 | SS9 = {s3, s4, s5, s6, s7, s8, s9, s10} | 0 | 2.27 |
| f10 | SS10 = {s4, s5, s6, s7, s8, s9, s10} | 1 | 2.07 |
| f11 | SS11 = {s4, s5, s6, s7, s8, s9, s10} | 1 | 2.08 |
| f12 | SS12 = {s4, s5, s6, s7, s8, s9, s10} | 1 | 2.10 |

Table 5.13: MAE (in Years) Results of Inter Fusion (EF) on MORPH-II

| Feature | Decision Subset | No. of updates k | MAE |
|---|---|---|---|
| f1 | SS1 = {s2, s3, s4, s5, s6, s7} | 0 | 3.21 |
| f2 | SS2 = {s2, s3, s4, s6, s7} | 1 | 3.18 |
| f3 | SS3 = {s2, s3, s4, s6, s7} | 1 | 3.22 |
| f4 | SS4 = {s2, s3, s4, s5, s7} | 1 | 3.44 |
| f5 | SS5 = {s2, s4, s6, s7} | 2 | 3.40 |
| f6 | SS6 = {s2, s3, s4, s7} | 2 | 3.36 |
| f7 | SS7 = {s2, s3, s4, s6, s7} | 1 | 3.37 |
| f8 | SS8 = {s2, s3, s4, s6, s7} | 1 | 3.46 |
| f9 | SS9 = {s2, s3, s4, s6, s7} | 1 | 3.41 |
| f10 | SS10 = {s2, s3, s4, s5, s6, s7} | 0 | 3.30 |
| f11 | SS11 = {s2, s3, s4, s7} | 2 | 3.37 |
| f12 | SS12 = {s2, s3, s4, s5, s7} | 1 | 3.31 |

Table 5.14: MAE (in Years) Results of Intra-inter Fusion (AEF) on FG-NET and MORPH-II

| Database | DS Subset | MAE |
|---|---|---|
| FG-NET | {DS4, DS5, DS6, DS7, DS8, DS10} | 1.99 |
| MORPH-II | {DS2, DS4, DS6} | 2.96 |

Table 5.15: MAE (in Years) Results of Inter-intra Fusion (EAF) on FG-NET and MORPH-II

| Database | SS Subset | MAE |
|---|---|---|
| FG-NET | {SS2, SS1, SS9} | 1.97 |
| MORPH-II | {SS2, SS12, SS3} | 2.98 |

Table 5.16: MAE (in Years) Results of Composite Fusion (CF) on FG-NET and MORPH-II

| Database | Decision Subset | MAE |
|---|---|---|
| FG-NET | {s7d2, s6d2, s4d1, s5d1, s8d7, s10d4, s8d1, s6d9} | 1.95 |
| MORPH-II | {s2d4, s4d4, s2d1, s4d12, s3d10, s3d11, s4d1, s2d3, s3d3} | 2.89 |

5.8.6 Results of Outlier Prediction and Error Compensation

Here the error analysis is conducted on the result of the composite fusion (CF) from the previous stage. As discussed earlier, the definition of an outlier depends on the error level L. First, we investigate the outlier prediction performance for different error levels L (from 5 to 10 years). 5-fold cross validation is used to test the reliability of the method. The performance of the outlier prediction is evaluated with the following measures: $P_D$ (detection rate), $P_M$ (missing rate), and $P_{FA}$ (false alarm rate):

$$P_D = \frac{n_d}{n_t}, \qquad (5.9)$$

$$P_M = \frac{n_m}{n_t} = 1 - P_D, \qquad (5.10)$$

$$P_{FA} = \frac{n_f}{n_i}, \qquad (5.11)$$

where $n_d$ is the number of detected outliers, $n_t$ is the total number of outliers, $n_m$ is the number of missed outliers, $n_f$ is the number of wrongly detected outliers, and $n_i$ is the total number of inliers. The results of outlier prediction on FG-NET are shown in Table 5.17. The detection rates vary with the error level L. The false alarm rate is 0% or very low, which keeps the inliers unaffected (i.e., no error estimation is performed on them).
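The three rates in Eqs. (5.9) to (5.11) can be computed from per-image labels as in the sketch below. It assumes that an outlier is a test image whose absolute error exceeds L years, which mirrors the "no larger than L" wording of the CS definition but is not spelled out in this section. The function names and the sample errors are illustrative only.

```python
from typing import Dict, List, Sequence


def label_outliers(abs_errors: Sequence[float], level: float) -> List[bool]:
    """Assumed definition: an image is an outlier if its absolute error exceeds `level` years."""
    return [e > level for e in abs_errors]


def outlier_prediction_rates(is_outlier: Sequence[bool],
                             flagged: Sequence[bool]) -> Dict[str, float]:
    """Return P_D, P_M and P_FA as fractions, following Eqs. (5.9)-(5.11)."""
    if len(is_outlier) != len(flagged):
        raise ValueError("label sequences must have equal length")
    n_t = sum(1 for o in is_outlier if o)            # total outliers
    n_i = len(is_outlier) - n_t                      # total inliers
    if n_t == 0 or n_i == 0:
        raise ValueError("need at least one outlier and one inlier")
    n_d = sum(1 for o, f in zip(is_outlier, flagged) if o and f)      # detected outliers
    n_f = sum(1 for o, f in zip(is_outlier, flagged) if not o and f)  # false alarms
    n_m = n_t - n_d                                  # missed outliers
    return {"P_D": n_d / n_t, "P_M": n_m / n_t, "P_FA": n_f / n_i}


# Made-up absolute errors (years) and predictor output, for illustration only.
abs_errors = [1, 12, 3, 15, 0, 11, 2, 4]
is_outlier = label_outliers(abs_errors, level=10)
flagged = [False, True, False, True, False, False, True, False]

rates = outlier_prediction_rates(is_outlier, flagged)
print(rates)
```

In this toy case two of the three outliers are detected, one is missed, and one of the five inliers is flagged by mistake.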
For the detected outliers, error estimation is applied and the estimated error is used to compensate their estimated ages. The final MAE results for different outlier error levels L are shown in Table 5.18. The minimum MAE achieved is 1.78 years.

Table 5.17: Performance of Outlier Prediction on FG-NET

| Error level L (years) | P_D (detected) | P_M (missed) | P_FA (false alarm) |
|---|---|---|---|
| 10 | 88.6 % | 11.4 % | 0.0 % |
| 9 | 89.4 % | 10.6 % | 0.0 % |
| 8 | 84.9 % | 15.1 % | 0.1 % |
| 7 | 80.4 % | 19.6 % | 0.0 % |
| 6 | 76.1 % | 23.9 % | 0.0 % |
| 5 | 81.0 % | 19.0 % | 0.2 % |

Table 5.18: MAE (in Years) Results after Error Compensation for Outliers Defined from Composite Fusion (CF) on FG-NET

| Error level L (years) | MAE (years) |
|---|---|
| 10 | 1.86 |
| 9 | 1.85 |
| 8 | 1.82 |
| 7 | 1.83 |
| 6 | 1.82 |
| 5 | 1.78 |

5.8.7 Performance Comparison

To demonstrate the robustness and effectiveness of the proposed methods, we compare their MAEs and cumulative scores (CS) with those of other state-of-the-art methods. Table 5.19 shows the MAEs of different age estimation methods on FG-NET. Fig. 5.10 shows the cumulative scores (CS) of different methods on FG-NET. The proposed method outperforms the other state-of-the-art methods in terms of both MAE and CS. More specifically, the MAE of our method can be as low as 1.78 years on the FG-NET database.

5.9 Conclusion and Future Work

In this work, we proposed a multistage learning and deep fusion framework, called GGEFOE, for age estimation. The proposed GGEFOE method was evaluated on the FG-NET and MORPH databases with significantly improved performance over existing solutions.
We will continue to improve its performance by: 1) increasing the diversity among decisions by including other features or age grouping systems; 2) investigating other decision selection algorithms; and 3) exploring other machine learning methods. We will also verify the robust performance of the GGEFOE framework on other databases or across multiple aging face databases.

Table 5.19: MAE (in Years) Results of Different Age Estimation Methods on the FG-NET Aging Database

| Method | MAE |
|---|---|
| WAS [26] | 8.06 |
| AGES [26] | 6.77 |
| KAGES [24] | 6.18 |
| QM [46] | 6.55 |
| MLPs [46] | 6.98 |
| RUN [88] | 5.78 |
| RankBoost [90] | 5.67 |
| GP [91] | 5.39 |
| BM [87] | 5.33 |
| RED-SVM [8] | 5.24 |
| LARR [33] | 5.07 |
| PFA [34] | 4.97 |
| RPK [89] | 4.95 |
| MHR [68] | 4.87 |
| MTWGP [91] | 4.83 |
| PLO [48] | 4.82 |
| BIF [39] | 4.77 |
| NDF [19] | 4.67 |
| FLP [61] | 4.61 |
| OHRank [9] | 4.48 |
| GEF(CF) [ours] | 2.73 |
| GGEF(CF) [ours] | 1.95 |
| GGEF(CF)-OE [ours] | 1.78 |

Figure 5.10: Cumulative scores (CS) of different age estimation methods on the FG-NET aging database.

Chapter 6: Conclusion and Future Work

6.1 Summary of the Research

In Chapter 2, we provided a brief review of related work on soft biometrics, gender recognition, race classification, age group classification and age estimation. We then reviewed one machine learning method, the Support Vector Machine (SVM), covering both its classification algorithm and its regression algorithm.

In Chapter 3, we proposed a facial age group classification system using a structured fusion of shape-feature and surface-feature based classifiers. Two new shape features were developed and a new surface feature based method was designed. By setting a region of certainty (ROC) to jointly classify the facial images in two stages, the resulting system gave a highly accurate classification result. Experimental results on the FG-NET and MORPH aging databases demonstrated that the proposed method outperforms the state-of-the-art methods.

In Chapter 4, we proposed a three-stage ensemble learning framework for age estimation.
In the first stage, we presented an age grouping method to classify faces into different age groups. In the second stage, to lower the feature dimensionality, local features were extracted from facial components. These local features and some global features, 12 features in total, were used separately by SVR to predict facial ages within each classified group, so 12 decision results are delivered at the output of the second stage. Finally, in the third stage, based on an analysis of diversity, several fusion schemes were designed to select decisions from different systems effectively and boost the performance of the final age estimation. We also discussed the arithmetic complexity of the fusion methods. The proposed GEF framework was evaluated on the most frequently used FG-NET and MORPH aging databases and demonstrated better performance than other state-of-the-art age estimation methods.

In Chapter 5, we proposed a multistage (5-stage) learning and deep fusion method for facial age estimation. In the first stage, gender grouping, a model is trained to classify faces into male and female groups. In the second stage, age grouping, faces in the male or female group are classified into age groups, each covering a different age range. In the third stage, age estimation within age groups, each age group has its own trained model to predict the ages of the faces classified into that group. In the fourth stage, fusion of decisions, decisions (i.e., the age estimation results from the third stage) are selected for fusion based on their diversity. To make the fusion effective, diversity is created by generating several age grouping systems and adopting different facial features for age estimation, which yields a variety of decisions. In the final stage, through error analysis on the results of the previous stage, outlier faces are predicted and their estimation errors are compensated to further improve the estimation results.
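The control flow of the five stages can be sketched as below. This is only a structural illustration under several assumptions: every model is a hypothetical callable supplied by the caller, the selected decisions are fused by a plain mean (the fusion rule is not given in this summary), and the compensation subtracts a predicted signed error (estimate minus truth) that is zero for faces not flagged as outliers.

```python
from statistics import mean
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Face = Sequence[float]  # stand-in for whatever face representation is used


def estimate_age(
    face: Face,
    gender_of: Callable[[Face], str],
    groupers: Dict[Tuple[str, int], Callable[[Face], int]],
    regressors: Dict[Tuple[str, int, int, Hashable], Callable[[Face], float]],
    selected: List[Tuple[int, Hashable]],
    predicted_error: Callable[[Face, float], float],
) -> float:
    """Route one face through the five stages and return a compensated age."""
    gender = gender_of(face)                              # stage 1: gender grouping
    decisions = []
    for system, feature in selected:                      # decisions chosen in stage 4
        group = groupers[(gender, system)](face)          # stage 2: age grouping
        model = regressors[(gender, system, group, feature)]
        decisions.append(model(face))                     # stage 3: age within the group
    fused = mean(decisions)                               # stage 4: fusion (mean assumed)
    return fused - predicted_error(face, fused)           # stage 5: outlier compensation


# Toy stand-ins so the sketch runs; none of these are trained models.
toy_face = [0.0]
result = estimate_age(
    toy_face,
    gender_of=lambda f: "male",
    groupers={("male", 3): lambda f: 0, ("male", 7): lambda f: 2},
    regressors={
        ("male", 3, 0, "f1"): lambda f: 20.0,
        ("male", 7, 2, "f2"): lambda f: 24.0,
    },
    selected=[(3, "f1"), (7, "f2")],
    predicted_error=lambda f, age: 2.0 if age > 21 else 0.0,
)
print(result)
```

Keying the models by (gender, system, group, feature) makes the source of diversity explicit: each (system, feature) pair in `selected` contributes one decision to the fusion.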
Our multistage learning method is verified on the FG-NET and MORPH databases, and shows better results than other state-of-the-art age estimation methods.

6.2 Future Research Directions

Race classification has been investigated for several years, and some studies have reported good results on this problem. However, race classification is a multi-class classification problem because many races exist. Moreover, it may require us to find other types of features to represent the racial information in faces. By using race classification to first identify the race of a face, we could perform age grouping within each race. This may improve the classification accuracy of age grouping, since each race may have a different aging process. In this direction, we would first investigate a good race classification algorithm. Once race classification is in place, we can combine the gender classification and race classification systems with the age grouping system, and find ways to improve the performance of age grouping and estimation.

Bibliography

[1] The FG-NET aging database, http://www.fgnet.rsunit.com/.
[2] L. Bai, L. Shen, and Y. Wang. A novel eye location algorithm based on radial symmetry transform. In 18th IEEE International Conference on Pattern Recognition (ICPR), volume 3, pages 511-514, 2006.
[3] S. Baluja and H. A. Rowley. Boosting sex identification performance. International Journal of Computer Vision, 71(1):111-119, 2007.
[4] M. Barker and W. Rayens. Partial least squares for discrimination. Journal of chemometrics, 17(3):166-173, 2003.
[5] D. Basak, S. Pal, and D. C. Patranabis. Support vector regression. Neural Information Processing-Letters and Reviews, 11(10):203-224, 2007.
[6] B. E. Boser, I. M. Guyon, and V. N. Vapnik. A training algorithm for optimal margin classifiers. In Proceedings of the fifth annual workshop on Computational learning theory, pages 144-152. ACM, 1992.
[7] C.-C. Chang and C.-J. Lin. Libsvm: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2(3):27, 2011.
[8] K.-Y. Chang, C.-S. Chen, and Y.-P. Hung. A ranking approach for human ages estimation based on face images. In Pattern Recognition (ICPR), 2010 20th International Conference on, pages 3396-3399. IEEE, 2010.
[9] K.-Y. Chang, C.-S. Chen, and Y.-P. Hung. Ordinal hyperplanes ranker with cost sensitivities for age estimation. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pages 585-592. IEEE, 2011.
[10] H. F. Chen, P. N. Belhumeur, and D. W. Jacobs. In search of illumination invariants. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 1, pages 254-261, 2000.
[11] Y. Chen and C. Hsu. Subspace learning for facial age estimation via pairwise age ranking. Information Forensics and Security, IEEE Transactions on, x(x):(Accepted, to be published), 2014.
[12] V. Coetzee, J. M. Greeff, L. Barrett, and S. P. Henzi. Facial-based ethnic recognition: insights from two closely related but ethnically distinct groups. South African Journal of Science, 105(11-12):464-466, 2009.
[13] T. F. Cootes, G. J. Edwards, and C. J. Taylor. Active appearance models. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 23(6):681-685, 2001.
[14] C. Cortes and V. Vapnik. Support-vector networks. Machine learning, 20(3):273-297, 1995.
[15] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 1, pages 886-893, 2005.
[16] M. M. Dehshibi and A. Bastanfard. A new algorithm for age recognition from facial images. Signal Processing, 90(8):2431-2444, 2010.
[17] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern classification and scene analysis 2nd ed. 1995.
[18] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern classification. John Wiley & Sons, 2012.
[19] N. Fan. Learning nonlinear distance functions using neural network for regression with application to robust human age estimation. In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 249-254. IEEE, 2011.
[20] Y. Fu, G. Guo, and T. S. Huang. Age synthesis and estimation via faces: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(11):1955-1976, 2010.
[21] Y. Fu and T. S. Huang. Human age estimation with regression on discriminative aging manifold. Multimedia, IEEE Transactions on, 10(4):578-584, 2008.
[22] Y. Fu, Y. Xu, and T. S. Huang. Estimating human age by manifold analysis of face pictures and regression on aging features. In Multimedia and Expo, 2007 IEEE International Conference on, pages 1383-1386. IEEE, 2007.
[23] F. Gao and H. Ai. Face age classification on consumer images with gabor feature and fuzzy lda method. Advances in biometrics, pages 132-141, 2009.
[24] X. Geng, K. Smith-Miles, and Z.-H. Zhou. Facial age estimation by nonlinear aging pattern subspace. In Proceedings of the 16th ACM international conference on Multimedia, pages 721-724. ACM, 2008.
[25] X. Geng, Z.-H. Zhou, and S.-F. Chen. Eye location based on hybrid projection function. Journal of Software, 8:1394-1400, 2003.
[26] X. Geng, Z.-H. Zhou, and K. Smith-Miles. Automatic age estimation based on facial aging patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(12):2234-2240, 2007.
[27] X. Geng, Z.-H. Zhou, Y. Zhang, G. Li, and H. Dai. Learning from facial aging patterns for automatic age estimation. In Proceedings of the 14th annual ACM international conference on Multimedia, pages 307-316, 2006.
[28] S. A. Glantz. Primer of biostatistics. McGraw Hill Medical Publishing, 6th edition, 2005.
[29] A. J. Golby, J. D. Gabrieli, J. Y. Chiao, and J. L. Eberhardt. Differential responses in the fusiform region to same-race and other-race faces. Nature neuroscience, 4(8):845-850, 2001.
[30] B. A. Golomb, D. T. Lawrence, and T. J. Sejnowski. Sexnet: A neural network identifies sex from human faces. In NIPS, pages 572-579, 1990.
[31] A. Gunay and V. V. Nabiyev. Automatic age classification with lbp. In 23rd IEEE International Symposium on Computer and Information Sciences (ISCIS'08), 2008, pages 1-4, 2008.
[32] G. Guo, C. R. Dyer, Y. Fu, and T. S. Huang. Is gender recognition affected by age? In Computer Vision Workshops (ICCV Workshops), 2009 IEEE 12th International Conference on, pages 2032-2039. IEEE, 2009.
[33] G. Guo, Y. Fu, C. R. Dyer, and T. S. Huang. Image-based human age estimation by manifold learning and locally adjusted robust regression. Image Processing, IEEE Transactions on, 17(7):1178-1188, 2008.
[34] G. Guo, Y. Fu, C. R. Dyer, and T. S. Huang. A probabilistic fusion approach to human age prediction. In Computer Vision and Pattern Recognition Workshops, 2008. CVPRW'08. IEEE Computer Society Conference on, pages 1-6. IEEE, 2008.
[35] G. Guo, Y. Fu, T. S. Huang, and C. R. Dyer. Locally adjusted robust regression for human age estimation. In Applications of Computer Vision, 2008. WACV 2008. IEEE Workshop on, pages 1-6. IEEE, 2008.
[36] G. Guo and G. Mu. Simultaneous dimensionality reduction and human age estimation via kernel partial least squares regression. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pages 657-664. IEEE, 2011.
[37] G. Guo and G. Mu. Joint estimation of age, gender and ethnicity: Cca vs. pls. In Automatic Face and Gesture Recognition, 2013. FGR 2013. 10th International Conference and Workshops on, pages 1-6. IEEE, 2013.
[38] G. Guo, G. Mu, Y. Fu, C. Dyer, and T. Huang. A study on automatic age estimation using a large database. In 12th IEEE International Conference on Computer Vision (ICCV), pages 1986-1991, 2009.
[39] G. Guo, G. Mu, Y. Fu, and T. S. Huang. Human age estimation using bio-inspired features. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 112-119, 2009.
[40] S. Gutta, H. Wechsler, and P. J. Phillips. Gender and ethnic classification of face images. In Automatic Face and Gesture Recognition, 1998. Proceedings. Third IEEE International Conference on, pages 194-199. IEEE, 1998.
[41] M. A. Hajizadeh and H. Ebrahimnezhad. Classification of age groups from facial image using histograms of oriented gradients. In 2011 7th IEEE Iranian Machine Vision and Image Processing (MVIP), pages 1-5, 2011.
[42] D. Hammond and E. Simoncelli. Nonlinear image representation via local multiscale orientation. Rapport technique, Courant Institute of Mathematical Sciences, New York University, 2005.
[43] W.-B. Horng, C.-P. Lee, and C.-W. Chen. Classification of age groups based on facial features. Tamkang Journal of Science and Engineering, 4(3):183-192, 2001.
[44] C.-W. Hsu, C.-C. Chang, and C. Lin. A practical guide to support vector classification. Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan, 2003.
[45] Y. Kwon and N. Lobo. Age classification from facial images. Computer vision and image understanding, 74(1):1-21, 1999.
[46] A. Lanitis, C. Draganova, and C. Christodoulou. Comparing different classifiers for automatic age estimation. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 34(1):621-628, 2004.
[47] P. L. Leong. Aging changes in the male face. Facial plastic surgery clinics of North America, 16(3):277-279, 2008.
[48] C. Li, Q. Liu, J. Liu, and H. Lu. Learning ordinal discriminative features for age estimation. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 2570-2577. IEEE, 2012.
[49] H. Ling, S. Soatto, N. Ramanathan, and D. W. Jacobs. Face verification across age progression using discriminative methods. IEEE Transactions on Information Forensics and Security, 5(1):82-91, 2010.
[50] K.-H. Liu, P.-Y. Chiang, and C.-C. J. Kuo. A machine learning approach to 3d model retrieval. In Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, pages 13-17, 2011.
[51] K.-H. Liu, T.-J. Liu, and H.-H. Liu. A sift descriptor based method for global disparity vector estimation in multiview video coding. In Multimedia and Expo (ICME), 2010 IEEE International Conference on, pages 1214-1218. IEEE, 2010.
[52] K.-H. Liu, S. Yan, and C.-C. J. Kuo. Age group classification via structured fusion of uncertainty-driven shape features and selected surface features. In Winter Applications of Computer Vision (WACV), 2014 IEEE Conference on. IEEE, 2014.
[53] T.-J. Liu, W. Lin, and C.-C. J. Kuo. A multi-metric fusion approach to visual quality assessment. In Quality of Multimedia Experience (QoMEX), 2011 Third International Workshop on, pages 72-77. IEEE, 2011.
[54] T.-J. Liu, W. Lin, and C.-C. J. Kuo. Image quality assessment using multi-method fusion. Image Processing, IEEE Transactions on, 22(5):1793-1807, 2013.
[55] T.-J. Liu, K.-H. Liu, and H.-H. Liu. Temporal information assisted video quality metric for multimedia. In Multimedia and Expo (ICME), 2010 IEEE International Conference on, pages 697-701. IEEE, 2010.
[56] Y. Ma, T. Xiong, Y. Zou, and K. Wang. Person-specific age estimation under ranking framework. In Proceedings of the 1st ACM International Conference on Multimedia Retrieval, page 38. ACM, 2011.
[57] E. Makinen and R. Raisamo. Evaluation of gender classification methods with automatically detected and aligned faces. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 30(3):541-547, 2008.
[58] B. Moghaddam and M.-H. Yang. Gender classification with support vector machines. In Automatic Face and Gesture Recognition, 2000. Proceedings. Fourth IEEE International Conference on, pages 306-311. IEEE, 2000.
[59] N. Nakao, W. Ohyama, T. Wakabayashi, and F. Kimura. Automatic detection of facial midline and its contributions to facial feature extraction. Electron. Lett. Comput. Vis. Image Anal, 6(3):55-65, 2008.
[60] M. Nazir, M. Ishtiaq, A. Batool, M. A. Jaffar, and A. M. Mirza. Feature selection for efficient gender classification. In Proceedings of the WSEAS international conference, Wisconsin, pages 70-75, 2010.
[61] B. Ni, S. Yan, and A. Kassim. Learning a propagable graph for semisupervised learning: classification and regression. Knowledge and Data Engineering, IEEE Transactions on, 24(1):114-126, 2012.
[62] NIST. NIST/SEMATECH e-Handbook of Statistical Methods, 2006.
[63] T. Ojala, M. Pietikainen, and T. Maenpaa. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7):971-987, 2002.
[64] E. C. Paes, H. J. Teepen, W. A. Koop, and M. Kon. Perioral wrinkles: Histologic differences between men and women. Aesthetic Surgery Journal, 29(6):467-472, 2009.
[65] E. Patterson, A. Sethuram, M. Albert, K. Ricanek, and M. King. Aspects of age variation in facial morphology affecting biometrics. In Biometrics: Theory, Applications, and Systems, 2007. BTAS 2007. First IEEE International Conference on, pages 1-6. IEEE, 2007.
[66] P. J. Phillips, H. Moon, S. A. Rizvi, and P. J. Rauss. The feret evaluation methodology for face-recognition algorithms. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 22(10):1090-1104, 2000.
[67] B. Poggio, R. Brunelli, and T. Poggio. HyperBF networks for gender classification. 1992.
[68] T. Qin, X.-D. Zhang, D.-S. Wang, T.-Y. Liu, W. Lai, and H. Li. Ranking with multiple hyperplanes. In Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, pages 279-286. ACM, 2007.
[69] N. Ramanathan and R. Chellappa. Face verification across age progression. IEEE Transactions on Image Processing, 15(11):3349-3361, 2006.
[70] K. Ricanek and T. Tesafaye. Morph: A longitudinal image database of normal adult age-progression. In 7th IEEE International Conference on Automatic Face and Gesture Recognition (FGR'06), pages 341-345, 2006.
[71] K. Ricanek Jr and E. Boone. The effect of normal adult aging on standard pca face recognition accuracy rates. In Neural Networks, 2005. IJCNN'05. Proceedings. 2005 IEEE International Joint Conference on, volume 4, pages 2018-2023. IEEE, 2005.
[72] B. Schölkopf, A. J. Smola, R. C. Williamson, and P. L. Bartlett. New support vector algorithms. Neural computation, 12(5):1207-1245, 2000.
[73] H. Shehadeh, A. Al-khalaf, and M. Al-khassaweneh. Human face detection using skin color information. In 2010 IEEE International Conference on Electro/Information Technology (EIT), pages 1-5, 2010.
[74] L.-L. Shen and Z. Ji. Modelling geiometric features for face based age classification. In 2008 IEEE International Conference on Machine Learning and Cybernetics, volume 5, pages 2927-2931, 2008.
[75] A. J. Smola and B. Schölkopf. A tutorial on support vector regression. Statistics and computing, 14(3):199-222, 2004.
[76] H. Smulyan, R. G. Asmar, A. Rudnicki, G. M. London, and M. E. Safar. Comparative effects of aging in men and women on the properties of the arterial tree. Journal of the American College of Cardiology, 37(5):1374-1380, 2001.
[77] Z. Song, B. Ni, D. Guo, T. Sim, and S. Yan. Learning universal multi-view age estimator using video context. In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 241-248. IEEE, 2011.
[78] K. Sveikata, I. Balciuniene, J. Tutkuviene, et al. Factors influencing face aging. literature review. Stomatologija, 13(4):113-115, 2011.
[79] J. W. Tanaka, M. Kiefer, and C. M. Bukach. A holistic account of the own-race effect in face recognition: Evidence from a cross-cultural study. Cognition, 93(1):B1-B9, 2004.
[80] X. Tang, Z. Ou, T. Su, H. Sun, and P. Zhao. Robust precise eye location by adaboost and svm techniques. Advances in Neural Networks-ISNN 2005, pages 93-98, 2005.
[81] P. Thukral, K. Mitra, and R. Chellappa. A hierarchical approach for human age estimation. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1529-1532, 2012.
[82] K. Tonchev, I. Paliy, O. Boumbarov, and S. Sokolov. Human age-group classification of facial images with subspace projection and support vector machines. In 2011 IEEE 6th International Conference on Intelligent Data Acquisition and Advanced Computing Systems (IDAACS), volume 1, pages 439-443, 2011.
[83] K. Ueki, T. Hayashida, and T. Kobayashi. Subspace-based age-group classification using facial images under various lighting conditions. In Automatic Face and Gesture Recognition, 2006. FGR 2006. 7th International Conference on, pages 6-pp. IEEE, 2006.
[84] P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. In Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, volume 1, pages I-511. IEEE, 2001.
[85] A. R. Webb. Statistical pattern recognition. Wiley.com, 2003.
[86] S. Yan, M. Liu, and T. S. Huang. Extracting age information from local spatially flexible patches. In Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on, pages 737-740. IEEE, 2008.
[87] S. Yan, H. Wang, T. S. Huang, Q. Yang, and X. Tang. Ranking with uncertain labels. In Multimedia and Expo, 2007 IEEE International Conference on, pages 96-99. IEEE, 2007.
[88] S. Yan, H. Wang, X. Tang, and T. S. Huang. Learning auto-structured regressor from uncertain nonnegative labels. In Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on, pages 1-8. IEEE, 2007.
[89] S. Yan, X. Zhou, M. Liu, M. Hasegawa-Johnson, and T. S. Huang. Regression from patch-kernel. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1-8, 2008.
[90] P. Yang, L. Zhong, and D. Metaxas. Ranking model for facial age estimation. In Pattern Recognition (ICPR), 2010 20th International Conference on, pages 3404-3407. IEEE, 2010.
[91] Y. Zhang and D.-Y. Yeung. Multi-task warped gaussian process for personalized age estimation. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pages 2622-2629. IEEE, 2010.
[92] S. K. Zhou, B. Georgescu, X. S. Zhou, and D. Comaniciu. Image based regression using boosting method. In Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on, volume 1, pages 541-548. IEEE, 2005.
Abstract
Age estimation has been attracted lots of attention last decade. This dissertation includes six chapters. In Chapter 1, we give an introduction on this dissertation, including significance of the research, contributions of the research, and organization of the dissertation. The previous work in this area is also thoroughly reviewed. ❧ In Chapter 2, we provide the research background, which includes a brief review of related work on soft biometrics, gender recognition, race classification, age group classification and age estimation. It addresses several feature extraction methods which could be useful in representing facial aging features. Also, some classification and regression algorithms used for age grouping and estimation are discussed. ❧ In Chapter 3, we present a structured fusion method for facial age group classification. To utilize the structured fusion of shape features and surface features, we introduced the region of certainty (ROC) to not only control the classification accuracy for shape feature based system but also reduce the classification needs on surface feature based system. In the first stage, we design two shape features, which can be used to classify frontal faces with high accuracies. In the second stage, a surface feature is adopted and then selected by a statistical method. The statistical selected surface features combined with a SVM classifier can offer high classification rates. With properly adjusting the ROC by a single non‐sensitive parameter, the structured fusion of two stages can provide a performance improvement. In the experiments, we use face images in the public available FG‐NET and MORPH databases and partition them into three pre‐defined age groups. It is observed that the proposed method offers a correct classification rate of 95.1% in FG‐NET and 93.7% in MORPH, which outperforms state‐of‐the‐art methods by a significant margin. 
In Chapter 4, we present a novel multistage learning system, called Grouping-Estimation-Fusion (GEF), for human age estimation via facial images. The GEF consists of three stages: 1) age grouping
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
A learning-based approach to image quality assessment
Advanced techniques for object classification: methodologies and performance evaluation
Landmark-free 3D face modeling for facial analysis and synthesis
Machine learning techniques for outdoor and indoor layout estimation
Advanced techniques for human action classification and text localization
Multimodal image retrieval and object classification using deep learning features
Machine learning methods for 2D/3D shape retrieval and classification
Green learning for 3D point cloud data processing
Efficient machine learning techniques for low- and high-dimensional data sources
Object detection and recognition from 3D point clouds
Labeling cost reduction techniques for deep learning: methodologies and applications
Advanced features and feature selection methods for vibration and audio signal classification
Data-driven image analysis, modeling, synthesis and anomaly localization techniques
Efficient template representation for face recognition: image sampling from face collections
Data-efficient image and vision-and-language synthesis and classification
Learning to optimize the geometry and appearance from images
Efficient graph learning: theory and performance evaluation
Behavior understanding from speech under constrained conditions: exploring sparse networks, transfer and unsupervised learning
Noise aware methods for robust speech processing applications
Mutual information estimation and its applications to machine learning
Asset Metadata
Creator
Liu, Kuan-Hsien (author)
Core Title
Facial age grouping and estimation via ensemble learning
School
Viterbi School of Engineering
Degree
Doctor of Philosophy
Degree Program
Electrical Engineering
Publication Date
07/16/2016
Defense Date
06/11/2014
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
age estimation,age grouping,classification,feature selection,fusion,machine learning,OAI-PMH Harvest,regression
Format
application/pdf (imt)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Kuo, C.-C. Jay (committee chair), Georgiou, Panayiotis G. (committee member), Nakano, Aiichiro (committee member), You, Suya (committee member)
Creator Email
khliu1212@gmail.com,liuk@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c3-444922
Unique identifier
UC11286964
Identifier
etd-LiuKuanHsi-2710.pdf (filename),usctheses-c3-444922 (legacy record id)
Legacy Identifier
etd-LiuKuanHsi-2710.pdf
Dmrecord
444922
Document Type
Dissertation
Rights
Liu, Kuan-Hsien
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA