Compositing Real and Virtual Objects
with Realistic, Color-Accurate Illumination
by
Chloe LeGendre
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(Computer Science)
August 2019
Copyright 2019 Chloe LeGendre
Acknowledgments
Many individuals have mentored, supported, taught, and challenged me over the initial years of my research
career, culminating in this dissertation. I want to take this opportunity to sincerely thank them all and express
my gratitude.
First, I want to thank my advisor Prof. Paul Debevec for the ongoing mentorship, and for continually
challenging me to be more rigorous, creative, and visually and narratively compelling in my scientific endeavors.
One of the most difficult parts of research is handling uncertainty, especially not knowing how, when, or if a
difficult problem will be solved. I want to thank Paul for showing me how to plot the course forward despite an
often uncertain destination and how to find joy along the way. I also thank him for giving me the opportunity to
work with and learn from an incredibly talented interdisciplinary team of engineers, artists, and scientists.
I want to thank my dissertation and qualification committee members Prof. Michael Fink, Prof. Jernej
Barbič, Prof. Randall Hill, and Prof. Hao Li for providing valuable questions, perspectives, and insights that
helped shape this final work. I want to thank Prof. Hao Li in particular for welcoming me to study computer
graphics at the University of Southern California and nominating me for USC’s Annenberg PhD Fellowship. I
want to thank Lizsl DeLeon for her organization and patience and Prof. Parag Havaldar for teaching me how to
teach during my time at USC. I also thank Prof. Philippos Mordohai, my M.S. advisor from Stevens Institute
of Technology, for taking a chance on me as a research assistant when I was relatively new to both research and
programming. I want to thank him and Dr. Gang Hua for first introducing me to the scientific challenges of
computer vision.
This thesis stems from extensive collaborations with colleagues from the Vision and Graphics Lab at the
USC Institute for Creative Technologies. I thank Xueming Yu, Jay Busch, Shanhe Wang, Xinglei Ren, and Bipin
Kishore for all their incredible hard work maintaining and building state-of-the-art systems for computational
photography and for teaching me about electronics. I thank Jay Busch in particular, who patiently taught me
about rendering, video production, and many more essential topics for a computer graphics researcher. I also
want to thank current and past VGL members Dr. Andrew Jones, Kalle Bladin, Adair Liu, Pratusha Prasad,
Marcel Ramos, Prof. Sumanta Pattanaik and Loc Huynh for all their hard work, patience, and dedication,
and Kathleen Haase, Christina Trejo, and Michael Trejo for continually supporting VGL’s research goals with
their important work. I thank our lab’s graduates Dr. Koki Nagano and Dr. Wan-Chun Ma for their senior
advice, along with my PhD comrades Eli Pincus, Su Lei, Zeng Huang, and Tianye Li. I also thank our paper
models Greg Downing, Jonetta Thomas, Jessica Robertson, Donna Ruffin, Xinglei Ren, and Shanhe Wang for
generously volunteering their time.
During the course of my USC PhD program, I was also fortunate enough to have two separate internships at
Google, in both engineering and research roles. I want to thank my advisor Prof. Paul Debevec and Dr. Shahram
Izadi for these opportunities and thank Ivan Neulander and Dr. Wan-Chun Ma for their guidance and mentorship
during this time. I thank my Google colleagues Graham Fyffe, Konstantine Tsotsos, Laurent Charbonnel, and
Christina Tong for their expertise and hard work, and, in particular, I thank John Flynn for patiently reviewing
my code and giving me significant technical guidance regarding machine learning.
I was also lucky to be included in The Academy of Motion Picture Arts and Sciences’ sci-tech subcommittee
on Solid State Lighting. I thank Paul for inviting me to join this project, and I thank Dan Sherlock, Joe di
Gennaro, Jack Holm, Jonathan Erland, and Joshua Pines for welcoming my contributions.
Prior to USC, I worked in two corporate research organizations (Johnson & Johnson, L’Oréal USA Research
& Innovation), where my peers and mentors first encouraged me to pursue a career in scientific research. I thank
my L’Oréal mentor Dr. Guive Balooch for resourcing me to work on creative projects and for financing my very
first inspirational SIGGRAPH conference trip. I also thank my colleagues Nghi Nguyen, Dr. Anne Young, and
Helga Malaprade for their dedication and vision, and I thank Dr. Paulo Bargo, Dr. Gregory Payonk, and Dr.
Kevin Walker for supporting my transition to academia. I thank my J&J mentor Dr. Nikoleta Batchvarova for
taking a chance on me right out of college. The late research leader Dr. Nikiforos Kollias of J&J first showed
me how a research group could resemble a family, and I thank him for including me as an honorary member and
introducing me to Dr. Paulo Bargo, Dr. Gregory Payonk, and Dr. Gabriela Oana Cula, who were the very first
people to ignite my passion for computer vision and optical measurement.
I want to thank my parents Stephen and Glenda LeGendre for an actual lifetime of support, love, and encour-
agement along with their patience and understanding as I built a career so far away from home. They encourage
me equally in both my many artistic and scientific pursuits, and this dissertation is a reflection of that support. I
also want to thank my grandmother Bessie Goldman, my aunt Sheila Goldman, and Louisa Goldstein for their
continued love and encouragement. I want to thank my friends: Harry Anastopulos, Wyn Furman, Dr. Martin
Naradikian, Dr. Barbara Coons, Claudia Montoya, Christian Montoya, Gerard Leone, Carin Bortz, and lastly
Prof. Roland Betancourt, the sibling I never had. Finally, I want to thank my partner Devin O’Neill, whose fas-
cination with the world and intellectual curiosity have inspired me from the beginning of my time in California,
and whose bottomless well of patience and humor have kept me afloat during more challenging times. I thank
him for his enduring love and friendship.
This dissertation work was sponsored by a USC Annenberg PhD Fellowship, the U.S. Army Research,
Development, and Engineering Command (RDECOM), and partially by Google Inc. The content of this thesis
does not necessarily reflect the position or policy of the U.S. Government, and no official endorsement should
be inferred.
Table of Contents
Acknowledgements ii
Abstract xii
I Introduction 1
I.1 Summary of Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
II Background and Related Work 7
II.1 Digital Image Compositing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
II.2 Image-Based Lighting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
II.3 Spectral Measurement, Color Rendition, and Multispectral Illumination . . . . . . . . . . . . 15
II.4 Facial Scanning for Photo-realistic Virtual Humans . . . . . . . . . . . . . . . . . . . . . . . 22
II.5 Lighting Estimation from an Unconstrained Image . . . . . . . . . . . . . . . . . . . . . . . . 25
III Practical Multispectral Lighting Reproduction 29
III.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
III.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
III.3 Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
III.3.1 Driving Multispectral Lights with a Color Chart . . . . . . . . . . . . . . . . . . . . . 32
III.3.2 Recording a Color Chart Panorama . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
III.3.3 Lighting a Subject with a Color Chart Panorama . . . . . . . . . . . . . . . . . . . . 39
III.3.4 Augmenting RGB Panoramas with Color Charts . . . . . . . . . . . . . . . . . . . . 41
III.3.5 Fast Multispectral Lighting Environment Capture . . . . . . . . . . . . . . . . . . . . 42
III.3.6 Reproducing Dynamic Multispectral Lighting . . . . . . . . . . . . . . . . . . . . . . 46
III.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
III.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
III.6 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
III.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
IV Optimal LED Selection for Multispectral Lighting Reproduction 57
IV.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
IV.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
IV.3 Methods and Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
IV.3.1 Spectral Illuminant Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
IV.3.2 Metameric Illuminant Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
IV.3.3 Metameric Reflectance Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
IV.3.4 Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
IV.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
IV.4.1 Spectral Illuminant Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
IV.4.2 Metameric Reflectance Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
IV.5 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
IV.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
V Efficient Multispectral Image-Based Relighting 73
V.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
V.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
V.3 Method and Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
V.3.1 Multispectral Reflectance Field Capture . . . . . . . . . . . . . . . . . . . . . . . . . 79
V.3.2 Spectral Promotion of Reflectance Functions . . . . . . . . . . . . . . . . . . . . . . 79
V.3.3 Spectral Promotion with Diffuse and Specular Separation . . . . . . . . . . . . . . . 81
V.3.4 Multispectral Image-Based Relighting . . . . . . . . . . . . . . . . . . . . . . . . . . 82
V.3.5 Measurement Setup and Calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
V.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
V.4.1 Color Rendition of Multispectral IBRL . . . . . . . . . . . . . . . . . . . . . . . . . 88
V.4.2 Spectral Promotion of Reflectance Fields . . . . . . . . . . . . . . . . . . . . . . . . 89
V.4.3 Diffuse/Specular Separation Spectral Promotion . . . . . . . . . . . . . . . . . . . . 92
V.5 Application: Camouflage Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
V.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
VI Efficient Multispectral Facial Capture with Monochrome Cameras 110
VI.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
VI.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
VI.3 Analysis and Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
VI.3.1 Monochrome Multiview Stereo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
VI.3.2 Monochrome Specular Normals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
VI.3.3 Diffuse Reflectance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
VI.3.4 Multispectral Polarization Promotion . . . . . . . . . . . . . . . . . . . . . . . . . . 119
VI.3.5 Monochrome Diffuse Normals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
VI.3.6 Color Channel Mixing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
VI.3.7 Optical Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
VI.4 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
VI.4.1 Monochrome Camera Facial Scan . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
VI.4.2 Comparison with Color Imaging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
VI.4.3 Multispectral Polarization Promotion . . . . . . . . . . . . . . . . . . . . . . . . . . 130
VI.4.4 Multispectral Optical Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
VI.4.5 Color Mixing for Diffuse Albedo . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
VI.5 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
VI.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
VII DeepLight: Lighting Estimation for Mobile Mixed Reality in Unconstrained Environments 142
VII.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
VII.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
VII.3 Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
VII.3.1 Training Data Acquisition and Processing . . . . . . . . . . . . . . . . . . . . . . . . 146
VII.3.2 Network Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
VII.3.3 Reflectance Field Acquisition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
VII.3.4 Loss Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
VII.3.5 Implementation Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
VII.4 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
VII.4.1 Quantitative Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
VII.4.2 Qualitative Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
VII.4.3 Comparisons with Previous Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
VII.4.4 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
VII.5 Limitations and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
VII.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
VIII Conclusion and Future Work 173
VIII.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
VIII.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
IX Basis Publications 178
BIBLIOGRAPHY 179
List of Figures
II.1 Example digital image with alpha channel, showing the typical compositing workflow. . . . 8
II.2 Demonstrating the rear projection compositing technique. . . . . . . . . . . . . . . . . . . 9
II.3 Example green screen chroma-based keying. . . . . . . . . . . . . . . . . . . . . . . . . . 10
II.4 Demonstrating environment matting. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
II.5 Demonstrating rendering synthetic objects into a real-world scene using image-based lighting. 13
II.6 Demonstrating RGB LED lighting reproduction. . . . . . . . . . . . . . . . . . . . . . . . 14
II.7 Demonstrating image-based relighting. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
II.8 The X-Rite Color Checker Chart. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
II.9 The emission spectra of typical red, green, blue, and white LEDs. . . . . . . . . . . . . . . 21
II.10 Comparing a photograph with a rendered virtual human. . . . . . . . . . . . . . . . . . . . 22
II.11 A subject lit by the polarized spherical gradient illumination conditions. . . . . . . . . . . . 24
III.1 Spectra of the six multispectral LEDs of the light stage used for lighting reproduction. . . . 34
III.2 A color chart illuminated by each spectral channel of the light stage. . . . . . . . . . . . . . 35
III.3 Comparing color charts for different illuminants under RGB, RGBW, and RGBCAW mul-
tispectral lighting reproduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
III.4 Quantitative error for different illuminants under RGB, RGBW, and RGBCAW multispec-
tral lighting reproduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
III.5 Using a different camera for lighting capture and lighting reproduction. . . . . . . . . . . . 37
III.6 Capturing a color chart panorama. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
III.7 Two visualizations of a color chart panorama. . . . . . . . . . . . . . . . . . . . . . . . . . 39
III.8 Lighting a subject inside the light stage with a color chart panorama. . . . . . . . . . . . . . 40
III.9 Fast multispectral lighting environment capture. . . . . . . . . . . . . . . . . . . . . . . . . 43
III.10 Processing a multispectral lighting environment from photographs captured using the fast
technique. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
III.11 A subject in a variety of lighting environments with three different lighting reproduction
techniques: RGB, RGBW, and RGBCAW. . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
III.12 A subject in a variety of lighting environments with three different lighting reproduction
techniques: RGB, RGBW, and RGBCAW. . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
III.13 Comparing multispectral illumination capture techniques of chapter III. . . . . . . . . . . . 49
III.14 Comparing lighting reproduction using linear least squares solves and non-negative least
squares solves. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
III.15 Comparing color rendition for lighting reproduction using linear least squares solves and
non-negative least squares solves. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
III.16 Quantitative errors for comparing color rendition for lighting reproduction using linear least
squares solves and non-negative least squares solves . . . . . . . . . . . . . . . . . . . . . 52
III.17 Comparison of single-chart reconstruction versus five-chart reconstruction. . . . . . . . . . 52
III.18 Comparing subjects lit by real-world illumination and its reproduction in the light stage. . . 53
III.19 Example frames for dynamic lighting reproduction. . . . . . . . . . . . . . . . . . . . . . . 54
IV.1 Spectra of commercially available LEDs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
IV.2 A circuit board with 11 LEDs of distinct spectra. . . . . . . . . . . . . . . . . . . . . . . . . 62
IV.3 Sum of Squared Errors for Spectral Illumination Matching for different LED combinations. . 64
IV.4 Average color error for different illuminants for different LED combinations, using metameric
reflectance matching. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
IV.5 Average color error for different illuminants for different LED combinations, using metameric
reflectance matching, when considering skin tones specifically. . . . . . . . . . . . . . . . . 66
IV.6 Average color error when considering different lighting reproduction optimization techniques. 67
IV.7 Color charts generated for the human observer, comparing real-world illumination with
LED-reproduced lighting, for different combinations of LEDs. . . . . . . . . . . . . . . . . 68
IV.8 Comparing the spectra of LED-reproduced illumination to real-world illumination. . . . . . 69
IV.9 Skin tone colors generated for the human observer, comparing real-world illumination with
LED-reproduced lighting, for different combinations of LEDs. . . . . . . . . . . . . . . . . 70
V.1 A still life scene setup inside the light stage. . . . . . . . . . . . . . . . . . . . . . . . . . . 76
V.2 A complex, real-world multispectral lighting environment captured for the results of chapter V. 87
V.3 Quantitatively comparing three methods of IBRL: multispectral LEDs with an RGB camera,
white LEDs only (previous work), and multispectral LEDs using a monochrome camera. . . 88
V.4 Qualitative Assessment of Spectral Promotion techniques presented in chapter V. . . . . . . 91
V.5 Quantitative Assessment of Spectral Promotion techniques presented in chapter V. . . . . . 92
V.6 Comparing white LED IBRL and efficient multispectral IBRL with spectral promotion. . . . 93
V.7 Comparing three methods of IBRL: multispectral LEDs with an RGB camera, white LEDs
only (previous work), and multispectral LEDs using a monochrome camera. . . . . . . . . . 95
V.8 Qualitative evaluation of different techniques of spectral promotion of chapter V. . . . . . . 96
V.9 Illustrating spectral promotion with diffuse/specular separation. . . . . . . . . . . . . . . . 97
V.10 An inset illustrating spectral promotion with diffuse/specular separation. . . . . . . . . . . . 98
V.11 Comparing measured and spectrally-promoted multispectral basis images. . . . . . . . . . . 99
V.12 A light stage designed for full body capture, equipped only with white LEDs. . . . . . . . . 102
V.13 Example reflectance basis images for military camouflage evaluation with multispectral
image-based relighting. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
V.14 Multispectral image-based relighting result including compositing visualization with alpha
channel. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
V.15 Quantitative color error to compare RGB and multispectral image-based relighting. . . . . . 105
V.16 Comparing multispectral and RGB-scaling image-based relighting for a military camouflage
evaluation application. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
V.17 Comparing multispectral and RGB-scaling image-based relighting for a military camouflage
evaluation application. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
V.18 Comparing multispectral and RGB-scaling image-based relighting for a military camouflage
evaluation application. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
VI.1 Monochrome photographs of a subject illuminated by multispectral LEDs and the resulting
colorized images and renderings of the high-resolution facial scan. . . . . . . . . . . . . . . 112
VI.2 A cross-polarized photograph of a subject and a white-balanced polarization difference im-
age to show colorless specular reflections from the skin. . . . . . . . . . . . . . . . . . . . 118
VI.3 Detail view of skin illuminated by different sources of narrow-band multispectral LED illu-
mination. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
VI.4 Adding images of skin illuminated by different spectral channels. . . . . . . . . . . . . . . 126
VI.5 Photograph of a subject and renderings produced using the monochrome facial scanning
pipeline. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
VI.6 Diffuse normals, specular normals, and diffuse albedo generated from a monochrome cam-
era facial scan. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
VI.7 Comparing specular normals computed for a color camera and a monochrome camera,
showing improved image resolution. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
VI.8 Experimentally evaluating multispectral polarization promotion by comparing with ground
truth multispectral polarization difference images. . . . . . . . . . . . . . . . . . . . . . . . 133
VI.9 A painted mannequin photographed for multispectral optical flow experiments. . . . . . . . 135
VI.10 Qualitative results demonstrating our multispectral optical flow technique. . . . . . . . . . . 137
VI.11 Evaluating color rendition for a monochrome imaging facial scanning pipeline. . . . . . . . 139
VI.12 Images of a subject captured with a monochrome camera used for generating a high-resolution
scan, including hallucinated colorized polarization difference images. . . . . . . . . . . . . 140
VII.1 Visual synopsis of DeepLight. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
VII.2 Comparing the FOV of a mobile phone video relative to the 360° environment. . . . . . . . 144
VII.3 Capture apparatus developed for DeepLight data collection. . . . . . . . . . . . . . . . . . 147
VII.4 The network architecture and training schema for DeepLight. . . . . . . . . . . . . . . . . 148
VII.5 Visual results for an ablation study, showing the utility of the different loss terms used for
Deep Light. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
VII.6 Qualitative comparisons between ground truth spheres and renderings using DeepLight in-
ference. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
VII.7 Comparing spheres rendered with DeepLight inference and those rendered using lighting
inference from previous state-of-the-art techniques. . . . . . . . . . . . . . . . . . . . . . . 158
VII.8 Comparing real objects with rendered objects using ground truth illumination, DeepLight
inference, and inference of the previous state-of-the-art techniques. . . . . . . . . . . . . . 159
VII.9 Additional qualitative comparisons between DeepLight and the state-of-the-art lighting in-
ference, for indoor scenes (part a). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
VII.10 Additional qualitative comparisons between DeepLight and the state-of-the-art lighting in-
ference, for indoor scenes (part b). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
VII.11 Additional qualitative comparisons between DeepLight and the state-of-the-art lighting in-
ference, for outdoor scenes (part a). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
VII.12 Additional qualitative comparisons between DeepLight and the state-of-the-art lighting in-
ference, for outdoor scenes (part b). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
VII.13 Comparing real objects with rendered objects using ground truth illumination, DeepLight
inference, and inference of the previous state-of-the-art techniques. . . . . . . . . . . . . . 167
VII.14 Comparing real objects with rendered objects using ground truth illumination, DeepLight
inference, and inference of the previous state-of-the-art techniques. . . . . . . . . . . . . . 168
VII.15 Comparing real objects with rendered objects using ground truth illumination, DeepLight
inference, and inference of the previous state-of-the-art techniques. . . . . . . . . . . . . . 169
VII.16 Comparing real objects with rendered objects using ground truth illumination, DeepLight
inference, and inference of the previous state-of-the-art techniques. . . . . . . . . . . . . . 170
VII.17 Comparing the relative radiance accuracy for DeepLight and other state-of-the-art techniques. . 171
VII.18 Comparing qualitative temporal consistency for DeepLight and the state-of-the-art tech-
nique for lighting estimation from indoor scenes. . . . . . . . . . . . . . . . . . . . . . . . 171
VII.19 Visualizing using a different camera for lighting inference than was used for training data
capture. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
VII.20 Visualizing challenging scenes for lighting inference. . . . . . . . . . . . . . . . . . . . . . 172
Abstract
In digital compositing, images from a variety of sources are combined to form a seamless, realistic result
that appears as though it could have been photographed in the real world. This technique is extensively used in
visual effects and film production, for example when an actor filmed in a studio is composited into a different
real or virtual environment. It is also widely used in augmented or mixed reality, where rendered virtual imagery
is combined with a live video feed. For a realistic and convincing composite, the content from the various source
images must appear to have been captured at the same time under the same lighting conditions. Thus, whether
such content is photographed or synthetically rendered, digital compositing benefits from accurate measure-
ment of scene illumination, and, for rendered objects, also from accurate material reflectance measurement. In
this dissertation, we therefore present new techniques for the measurement and estimation of illumination and
reflectance, geared towards the goal of producing realistic and color-accurate digital composites.
First, we present a multispectral illumination measurement and playback technique to extend studio lighting
reproduction for live-action compositing to the multispectral domain, improving color rendition as compared
with previous work. Next, a theoretical analysis affords the selection of an optimal, minimal set of light emitting
diodes (LEDs) of distinct spectra for this technique. For post-production methods, we extend image-based
relighting to the multispectral domain for the same goal of improved color rendition, forming the appearance
of an object in a novel lighting environment by combining images of the object as lit by an omnidirectional,
multispectral lighting basis. We show one real-world application of multispectral image-based relighting where
color accuracy is critical: military camouflage material evaluation. For another specific application of digitizing
humans, which enables rendering a person under a novel viewpoint or lighting environment, we also present
an efficient approach to multispectral high-resolution facial scanning using monochrome cameras, where both
geometric resolution and color rendition are improved compared to previous work.
The final technique we present is a machine learning based method to estimate high dynamic range (HDR),
omnidirectional illumination given only a single, unconstrained mobile phone image with a limited field-of-
view. As image formation using the aforementioned post-production techniques requires accurate lighting mea-
surement, typically realized using panoramic HDR photography and a special color calibration target, real-time
mobile mixed reality compositing with convincing illumination remains elusive because these off-line light mea-
surement techniques are both impractical and not accessible for the average mobile phone user. Our learning
based approach for omnidirectional HDR lighting estimation from a single image is the first to generalize to both
indoor and outdoor scenes, while comparing favorably to previous methods developed to handle only a single
class of lighting.
Chapter I
Introduction
Compositing, or realistically combining imagery from a variety of sources to form one final image or video,
has long been used by visual effects practitioners, filmmakers, and cinematographers as a creative, narrative-
enhancing tool. The technique enables actors filmed in a studio to appear as though they were recorded on
location outside the studio, or even to appear as though they were part of a computer-generated, virtual scene.
More broadly, compositing need not involve an actor’s performance; rather, many photographed or rendered
components can be combined together to form an image that is impossible or too difficult, dangerous, or expen-
sive to be natively filmed in-camera, opening new realms of creative possibility where content is limited only by
the creator’s imagination rather than by photographic feasibility.
Compositing is differentiated from "bricolage" or "collage," the creation of a work through the assemblage
of materials from a diverse range of sources, in that its final goal is realism, even if some of the source material is
fantastical or computer-generated imagery. The professional compositor and educator Steve Wright succinctly
stated "The Compositor’s Creed," [151] writing: "The ultimate objective of compositing is to take images from
a variety of sources and combine them in such a way that they appear to have been shot at the same time under
the same lighting conditions by the same camera." The techniques described in this dissertation serve this goal,
with a particular focus on matching the illumination across the source elements to be combined.
Compositors typically match component imagery from different sources by first manually adjusting color
balance and overall image brightness, although such efforts are not always successful. For improved results,
skilled cinematographers and lighting artists often expend considerable effort to match the illumination condi-
tions across component imagery in advance, particularly focusing on the colors and directions of the various light
sources in the scene. For instance, when filming an actor in a studio to composite onto an outdoor background,
this could include simulating the sun with a bright studio light source placed at just the right relative location
on set. For computer-generated imagery, this often means lighting the virtual content using Global Illumination
(GI) rendering techniques with photographed high dynamic range (HDR) panoramas captured in the real world
[27]. In both examples, while the directions, relative intensities, and colors of light sources in the scene are
considered, their spectra are overlooked. Unfortunately, this lack of spectral consideration is problematic for
color rendition in compositing, as the spectral composition of a light source is responsible for color appearance.
Color is the result of integrating the spectral modulation of a scene illuminant by the reflectance spectrum of
the illuminated material, weighted by the spectral sensitivity of the observer (camera or human eye). Both the human
visual system and most digital cameras are "tristimulus observers," measuring color by integrating across three
different portions of the visible electromagnetic spectrum, turning a continuous signal across wavelengths into
three values for a set of color primaries. For digital cameras, these primaries are usually red, green, and blue,
realized by filtering broad-spectrum incident light with three differently colored translucent materials. For the
human visual system, these primaries correspond to the three different photoreceptors of the human eye (S, M, L
cone cells), which have different sensitivities to light of various wavelengths due to their specialized photopsins,
or photoreceptive proteins.
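Written as an equation, this tristimulus integration takes a standard form; the notation below is illustrative and is not drawn from the later chapters of this dissertation.

```latex
% Response of observer channel k (R, G, B for a camera; L, M, S for the eye) to a
% material with spectral reflectance R(\lambda), lit by an illuminant with spectral
% power distribution L(\lambda), given channel spectral sensitivity S_k(\lambda):
\begin{equation}
  C_k \;=\; \int_{\lambda_{\min}}^{\lambda_{\max}}
            L(\lambda)\, R(\lambda)\, S_k(\lambda)\, \mathrm{d}\lambda ,
  \qquad k \in \{1, 2, 3\},
\end{equation}
% where the integration bounds span the visible range (roughly 380--730 nm).
```

Because the continuous spectra are collapsed to only three numbers, two illuminants with different spectral power distributions can produce identical values of C_k for one material and yet different values for another; this is the source of the color rendition problems described next.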
This physics of color appearance yields two significant consequences for compositing. First, cameras with
different spectral sensitivities will usually produce different observed colors for the same scene and illumination,
making the seamless integration of imagery captured by multiple devices difficult. Next, even for a single
camera, matching only the color of two light sources is not enough to guarantee matching color rendition for all
materials in the scene, assuming the spectral compositions of the light sources are not the same. It is therefore
challenging to produce studio lighting with color rendition that perfectly matches that of light sources from the
real world, e.g. the sun. A film’s colorist may ultimately want to spend their time expressing a certain mood
or feeling through color, but unfortunately must first expend considerable effort to correct color mismatches
arising from using multiple cameras and light sources with variable spectra. The techniques described in the
first sections of this dissertation (chapter III - chapter VI) thus propose solutions for improved color rendition
in compositing, based on the insights that image formation involves spectral modulation and that the world is a
spectral renderer.
The first section (chapter III) focuses on lighting reproduction for live-action compositing, where the goal
is to computationally light an actor in a studio with the light of the scene into which they will ultimately be
composited. In this technique, first described by Debevec et al. [32], an actor is surrounded by a sphere of
computer-controllable red, green, and blue light emitting diodes (RGB LEDs), which display photographed or
synthetically rendered HDR panoramic lighting environments. We extend both the illumination measurement
and playback techniques to the multispectral domain, adding a variety of LEDs of distinct spectra and thereby
improving color rendition over the RGB LED-only state-of-the-art. The following section (chapter IV) provides
a theoretical analysis of the optimal set of LEDs of distinct spectra to use for this technique, with broader appli-
cability to luminaire design for any color sensitive workflows such as film and television production, archiving,
and cultural heritage preservation.
Subsequent portions of this dissertation consider the spectral aspects of various post-production techniques,
where computer graphics methods enable realistically relighting or rendering a subject into a novel scene, in
contrast to the earlier chapters concerned with producing color-accurate studio lighting. In chapter V, we extend
image-based relighting (IBRL) [30, 104] to the multispectral domain, forming the appearance of an object in a
novel lighting environment by combining images of it as lit by an omnidirectional, multispectral lighting basis.
Although LED studio lighting presents the aforementioned challenges for color rendition in film-making, the
availability of relatively bright but narrow-band LED sources affords a whole new way to form color images
without using a tristimulus camera. In chapter VI, we present a new way to scan and digitize a human face
measuring both geometry and reflectance using multispectral solid state lighting with monochrome cameras.
We improve both color rendition and geometric resolution compared with the state-of-the-art technique. Just
as chapter III presents a multispectral extension of earlier work [32] for improved color rendition, chapter V
similarly extends IBRL as first described by Debevec et al. [30] and Nimeroff et al. [104], while chapter VI
extends the polarized gradient illumination high-resolution facial scanning techniques of Ma et al. [98] and
Ghosh et al. [49]. We improve each technique by using multispectral LEDs and knowledge of the spectral
nature of image formation.
Beyond visual effects and film production, compositing also forms the basis for a newer form of media:
augmented or mixed reality (AR or MxR). In AR, the goal is to realistically render virtual content into a live
camera feed, often that of a mobile phone. Unlike in off-line rendering scenarios for visual effects, real-world
environmental illumination measurement via panoramic HDR photography would be impractical for casual AR
applications, as it would require specialized capture equipment. While the live camera feed is available during
an AR session, estimating scene illumination from a single image with low dynamic range (LDR) and a limited
field-of-view (FOV) is a challenging, under-constrained problem. One reason is that an object’s appearance in
an image is the result of light arriving from the full sphere of directions around the object, including from those
outside the camera’s FOV. Furthermore, even light sources within the FOV will likely be too bright to be mea-
sured properly in a single exposure if the rest of the scene is well-exposed, saturating the image sensor due to
limited dynamic range and thus yielding an incomplete record of relative scene radiance. Thus, in chapter VII,
the dissertation concludes with a machine learning based technique for estimating a 360° HDR lighting environment
given only a limited field-of-view, single exposure image, as would be available from the live camera feed
during a typical AR session. Key to our technique is that spheres with diverse bidirectional reflectance distri-
bution functions (BRDFs) reveal different lighting cues, enabling us to record training data for a deep learning
model using a standard LDR video stream.
I.1 Summary of Contributions
In this dissertation, we present new techniques for the measurement and estimation of illumination and re-
flectance, geared towards the goal of producing realistic and color-accurate digital composites for visual effects,
film production, and augmented reality applications.
The main contributions are:
(i). A practical framework for reproducing omnidirectional incident illumination conditions with complex spectra using a light stage with multispectral LED lights.
• A set of capture techniques to measure the omnidirectional color rendition properties of a scene’s illumination. One technique is suitable to capture such data at video-rate.
• An optimization-based technique to reproduce such illumination inside a multispectral light stage, improving color rendition for live-action compositing compared with the state-of-the-art, while requiring no spectral measurements.
• A theoretical analysis that affords the selection of an optimal, minimal set of LEDs to use for this technique.
(ii). A framework to efficiently extend image-based relighting to the multispectral domain.
• A technique to form the appearance of an object in a new lighting environment by combining images of it as lit by an omnidirectional, multispectral lighting basis, improving color rendition as compared with the state-of-the-art.
• A technique for the spectral promotion of white light basis images to multispectral, enabling efficient capture of a fully-multispectral basis.
(iii). A framework to efficiently extend polarized gradient illumination high-resolution facial scanning to the multispectral domain.
• A technique for high-resolution facial scanning using monochrome cameras and multispectral LEDs that trades the spatial multiplexing of a typical camera’s Bayer pattern for temporal multiplexing with narrow-band illumination for improved geometric resolution and color rendition.
• A step-by-step analysis of the utility of color data in the face scanning pipeline that affords a more efficient multispectral capture compared with the naïve approach.
• A technique for multispectral polarization promotion, such that only one spectral channel of the multispectral scanning system requires polarization.
• A technique for multispectral optical flow, used for aligning images across spectral channels.
(iv). A deep learning based method to infer plausible high dynamic range, omnidirectional illumination given an unconstrained, low dynamic range image with a limited field-of-view.
• An end-to-end method to infer 360° HDR illumination from mobile phone imagery at interactive frame rates, for augmented reality applications.
• A data collection technique using spheres of varying reflective properties for training the lighting estimation algorithm.
• A novel image-based relighting rendering loss function, used for training the HDR lighting inference network using only LDR data.
Chapter II
Background and Related Work
Great care and consideration are required to generate realistic digital image composites from multiple sources,
whether they are photographed or rendered. Significant research efforts in computer graphics, computer vision,
and computational photography have endeavored to match relative camera poses, lighting conditions, and film
grain or camera noise characteristics, as well as to generate reasonable "edges" of foreground elements to use in
seamless blending. In this section, we present a brief history of compositing, starting in the era of film cameras,
and then we describe the related works from both film production and research for each of the core contributions
of this dissertation. Thus our main focus areas are lighting and reflectance measurement for various forms of
compositing: live-action, post-production or rendering-based including of human faces, and in an unconstrained
setting for augmented reality.
II.1 Digital Image Compositing
The Alpha Channel and Matting. To quote computer graphics researcher Alvy Ray Smith, "The history of
digital image compositing ... is essentially the history of the alpha channel" [126], the record of where the
various foreground and background elements should show in an image, and in what ratio or with what opacity,
as first described for digital images by Porter and Duff [110]. An example of the digital compositing process for
combining foreground and background elements with an alpha channel is shown in Fig. II.1. As with many other
modern visual effects methods, digital compositing descends from techniques originally developed for motion
pictures recorded using film rather than digital cameras. The goal of these methods at first was to realistically
combine foreground and background elements filmed separately, so that an actor filmed in a studio could be made
to appear elsewhere in the final film. Early compositing methods usually required the analog equivalent of the
alpha channel: the matte, a secondary, companion film strip with selective transparency, indicating where the
final composited image should resemble the foreground (and whose complement indicated where the final image
should resemble the background). Such film where the transparency was spatio-temporally aligned to a moving
actor’s performance was called a traveling matte. The process of extracting this matte was termed matting.
Figure II.1: a. An original digital input image. b. The alpha channel (α). c. The result of multiplying a and b,
also referred to as the pre-multiplied foreground element. d. The foreground element (brightened by one stop)
composited into a natural scene. e. A checkerboard background image. f. The complement of the alpha channel
(1 − α). g. The result of multiplying e and f, also referred to as the pre-multiplied background element. h.
The foreground element composited onto the checkerboard, computed by summing pre-multiplied images c and
g. Image a and ground truth alpha channel b courtesy of Rhemann et al. [115].
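For concreteness, the short sketch below implements the alpha compositing operation illustrated in Fig. II.1, the "over" operation of Porter and Duff [110], using NumPy. The function and array names are mine, and the foreground is assumed to be stored un-premultiplied.

```python
import numpy as np

def composite_over(foreground, alpha, background):
    """Porter-Duff 'over': blend a foreground onto a background using an alpha matte.

    foreground, background: float arrays of shape (H, W, 3), values in [0, 1]
    alpha: float array of shape (H, W), values in [0, 1]
    """
    a = alpha[..., np.newaxis]           # broadcast alpha across the color channels
    pre_fg = a * foreground              # pre-multiplied foreground (Fig. II.1c)
    pre_bg = (1.0 - a) * background      # pre-multiplied background (Fig. II.1g)
    return pre_fg + pre_bg               # final composite (Fig. II.1h)

# Toy usage: composite a gray square onto a checkerboard-like background.
h, w = 64, 64
fg = np.full((h, w, 3), 0.5)
bg = np.indices((h, w)).sum(axis=0) % 2                     # 0/1 checker pattern
bg = np.repeat(bg[..., np.newaxis], 3, axis=2).astype(float)
alpha = np.zeros((h, w))
alpha[16:48, 16:48] = 1.0                                   # opaque square matte
result = composite_over(fg, alpha, bg)
```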
Film-Based Compositing Methods. Some of the earliest techniques for motion picture compositing included
the "Williams Process" (1916) and the "Dunning and Pomeroy Processes" (1924) for black-and-white film, along
with rear screen projection (1915) or Lawrence Butler’s blue screen technique (1940) for color film [120, 40,
144, 20]. In the Williams Process [146], an actor was filmed in front of a black background, and the final
foreground shot was processed a second time to create an over-exposed image such that the actor could serve as
their own traveling matte. In the Dunning and Pomeroy Processes [34, 108], an actor illuminated by orange light
performed in front of a blue screen, and "an unexposed negative was bipacked with a developed and orange-
toned positive" that depicted the desired background. "Since the orange light passed the positive of the same
color without any interference, the foreground was depicted unaltered," while "the blue background light ...
printed the toned background positive anywhere where it was not blocked out by the studio set and actors"
[144]. In rear screen projection, or simply "rear projection," as shown in Fig. II.2, an actor performed in front
of a translucent screen back-lit by a synchronized projector playing the mirror image of a previously-filmed
background sequence [144], so no traveling matte was required. All three techniques were used in filming the
black-and-white King Kong (1933), among numerous other films [120].
Figure II.2: Demonstrating the rear projection compositing technique. Image courtesy Wikipedia (author: Wiki-
wikiyarou, reproduced under Creative Commons Attribution-Share Alike 3.0 license).
Chroma-Based Keying Methods. Upon the introduction of color film, Lawrence Butler developed the first
predecessor to modern green or blue screen chroma-keying [120], where color differences are used to separate
the foreground and background from a single image. In Butler’s technique, an actor performed in front of a
blue screen, and "black-and-white separations" were used to separate a color film negative into individual color
channels, where the resulting blue channel produced the required traveling matte [120]. The method worked
well as long as the reflected colors from the actor (e.g. their skin, hair, and clothing) were not the same hue as
the blue studio backdrop. It helped Butler win the Best Special Effects Academy Award in 1941 for The Thief
of Bagdad (1940) [120, 6] and was later improved by Petro Vlahos (1959) [140] for better color rendition of
the foreground elements, as used in Ben Hur (1959), securing Vlahos a 1964 Academy Award [51]. Vlahos’s
technique was also employed in the first Star Wars trilogy films (1977-1983) [60].
Figure II.3: Demonstrating modern green screen chroma-based keying. a. An original input image of a subject
filmed in front of a green screen. b. The alpha channel (α) obtained using chroma-based keying techniques. c.
The result of multiplying a and b, also referred to as the pre-multiplied foreground element. d. The foreground
element composited onto a checkerboard background. Images reproduced from LeGendre et al. [85].
While chroma-keying was subsequently used for decades in the motion picture industry with great practical
success, in 1996 Smith and Blinn analyzed the mathematics of accurately "separating a desired foreground image
from a background of a constant color" and showed it to be provably unsolvable [127], introducing techniques
to prune the search space of possible solutions. Nonetheless, digitally chroma-keying an actor’s performance
in front of a green or blue screen, as shown in Fig. II.3, is still frequently performed for contemporary film
and television production, with many modern visual effects software packages offering a panoply of proprietary
algorithmic options to extract the coveted alpha channel. Production-ready, automated techniques still lever-
age chroma or luma-based color differences for foreground / background separation; however, the single image
natural matting problem, where the background is unconstrained, has also been explored in a research context
[90, 91, 115].
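As a toy illustration of the color-difference idea only, and not of any of the production algorithms referenced above or the method of Smith and Blinn [127], the sketch below derives a crude alpha matte from how strongly the green channel dominates each pixel; the normalization scheme and the spill_weight parameter are my own choices.

```python
import numpy as np

def green_screen_alpha(image, spill_weight=0.5):
    """Toy color-difference key: alpha from how much green exceeds the other channels.

    image: float RGB array of shape (H, W, 3) in [0, 1], shot against a green screen.
    Returns an alpha matte in [0, 1] (1 = foreground, 0 = green background).
    """
    r, g, b = image[..., 0], image[..., 1], image[..., 2]
    green_excess = g - spill_weight * (r + b)      # how "green-screen-like" each pixel is
    scale = max(float(green_excess.max()), 1e-6)   # normalize by the strongest green response
    alpha = 1.0 - np.clip(green_excess / scale, 0.0, 1.0)
    return alpha
```

Production keyers add spill suppression, edge softening, and user-tunable tolerances on top of a core color-difference term of this kind.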
Environment Matting. Each of the aforementioned techniques for digital and analog film compositing tack-
led the first order challenge of neatly separating an image into foreground and background components using
opacity, but did not consider how well the extracted foreground elements would otherwise visually blend into
the desired background. For the composite to look convincing, however, the foreground subjects must reflect
light in a manner consistent with the scene into which they will be composited.
Figure II.4: Demonstrating the environment matting technique of Zongker et al. [161]. From left to right: a
standard alpha-matte composite, an environment matte composite, and a photograph of the vase in front of a
background image. Light refraction is more faithfully rendered using this technique. Figure reproduced from
Zongker et al. [161].
For real-world objects, Zongker et al. [161] therefore introduced Environment Matting, describing
techniques to capture not only a foreground subject’s opacity, but also how it reflected and refracted
illumination, as shown in Fig. II.4. The technique produced realistic composites for static scenes, but, as it
approximated light transport using an array of changing backdrops realized with computer monitors, it was not
applicable for moving subjects. Follow-up work [23] extended the technique for moving colorless objects (such
as water pouring into a glass).
II.2 Image-Based Lighting
High Dynamic Range Image-Based Lighting. To realistically composite an image of a virtual or synthetic
object into a natural scene, Debevec [27] introduced a method where the object was lit with omnidirectional illu-
mination photographed in the real-world, and rendered using Global Illumination (GI) techniques, as shown in
Fig. II.5. Blinn and Newell [17] had previously introduced Environment Mapping, where the color of a rendered
object at a particular point was represented by its surface normal, which indexed into the corresponding direc-
tion of a 360
, low dynamic range (LDR) environment map. This technique generated realistic sharp specular
reflections, but only for mirror-like, high-shine objects. GI rendering, in contrast, enabled the realistic simula-
tion of light transport for both direct and indirect illumination, for materials reflecting light both specularly and
diffusely [54, 72]. Since an object’s appearance in an image is the result of light arriving from a full sphere of
incident lighting directions around the object, Debevec [27] employed omnidirectional or panoramic photogra-
phy techniques to capture a complete record of scene illumination, such as by photographing a mirror sphere or
stitching across several wide-angle photographs captured with a fisheye lens. To recover the full dynamic range
of the lighting, multiple exposures were captured of a given panoramic scene and radiometrically aligned using
the technique of Debevec and Malik [31]. Since the resulting high dynamic range (HDR), panoramic image
served as the record of scene illumination used for rendering, the technique came to be termed Image-Based
Lighting (IBL) [28]. GI rendering using HDR IBL is used extensively in modern visual effects and is now
available in most sophisticated 3D rendering software programs (e.g. Maya, Cinema 4D, and 3ds Max).
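The bracketed-exposure merge at the heart of this pipeline can be sketched as below. Note that the actual method of Debevec and Malik [31] also recovers the camera's nonlinear response curve; this simplified version sidesteps that step by assuming the input images are already linearized. The names and the hat-shaped weighting here are illustrative assumptions.

```python
import numpy as np

def merge_exposures(images, exposure_times):
    """Merge bracketed, already-linearized exposures into one HDR radiance map.

    images: list of float arrays (H, W, 3) in [0, 1], linear sensor values.
    exposure_times: list of exposure durations in seconds, one per image.
    """
    numerator = np.zeros_like(images[0])
    denominator = np.zeros_like(images[0])
    for img, t in zip(images, exposure_times):
        # Hat-shaped weight: trust mid-range pixels, ignore near-black or clipped ones.
        weight = 1.0 - np.abs(2.0 * img - 1.0)
        numerator += weight * (img / t)          # per-image relative radiance estimate
        denominator += weight
    return numerator / np.maximum(denominator, 1e-6)
```

Dividing each image by its exposure time places every pixel on a common relative-radiance scale, so each HDR value is drawn from the exposures in which it was well measured.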
Figure II.5: Top: demonstrating rendering synthetic objects into a real-world scene using Image-Based Lighting,
as in the work of Debevec [27]. Image reproduced from the companion SIGGRAPH Electronic Theater work,
"Rendering with Natural Light". Bottom: the HDR IBL used for this work, represented as an image of a mirror
sphere and displayed at multiple exposures to show the dynamic range.
Image-Based Lighting Reproduction for Live-Action Compositing. Just as photographed HDR panoramas
have been used as a record of environmental lighting for rendering virtual objects, they have since also been used
to drive omnidirectional studio illumination, lighting real-world subjects for live-action compositing. Debevec
et al. [32] introduced Image-Based Lighting Reproduction: the technique of computationally reproducing the
lighting of a real-world scene inside a studio, surrounding an actor by a whole sphere of computer-controllable
RGB LED lights driven to replicate the color and intensity of a scene’s measured incident illumination from
each direction, as shown in Fig. II.6. This method has since been applied in commercial films such as Gravity
(2013, Academy Award Winner for Best Visual Effects in 2014) [58, 8], Rogue One: A Star Wars Story (2016,
Nominated for Academy Award for Best Visual Effects in 2017) [125, 1], and First Man (2018, Academy Award
Winner for Best Visual Effects in 2019) [93, 9].
Figure II.6: Left: Demonstrating image-based lighting reproduction for live-action compositing, where a subject
is lit in a studio with RGB LEDs driven to replicate the color and intensity of a scene’s measured incident
illumination. Right: Three examples of live-action composites generated using this technique, with lighting
environments in the bottom row. Images reproduced from Debevec et al. [32].
Image-Based Relighting. Similarly, just as HDR panoramas have been used to realistically render a virtual
object with GI or to light an actor in a studio with real-world illumination, they have also been used for Image-
Based Relighting (IBRL). In this technique, introduced by Nimeroff et al. [104] and Debevec et al. [30], the
appearance of a subject under novel lighting conditions is formed as a linear combination of its appearance in
a set of basis reflectance images. These are captured by first lighting the subject with a series of illumination
conditions forming an orthonormal basis on a sphere (see Fig. II.7, left). This process produces realistic imagery
for compositing into real-world scenes because light is additive. An example rendering produced is shown in Fig.
II.7, right. The technique bears some resemblance to lighting reproduction, except that rather than filming the
subject live in the studio as lit by a reproduction of the environmental illumination, the photographed reflectance
basis affords post-production relighting instead. Since its development, the method has been used in
commercial films including Spider-Man 2 (2004, Academy Award Winner for Best Visual Effects in 2005)
[117, 7].
Figure II.7: Demonstrating Image-Based Relighting for a human face. Left: visualizing the reflectance basis
images of the subject as lit by the spherical lighting basis. Right: the subject relit using the inset HDR image-
based lighting environment. Images reproduced from Debevec et al. [30].
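Because the relighting operation is purely linear, it can be expressed in a few lines of code. The following sketch is illustrative only, with hypothetical array shapes: a reflectance basis of N lighting directions and per-direction RGB weights sampled from an HDR lighting environment.

```python
import numpy as np

def relight(basis, weights):
    """Image-based relighting as a weighted sum of reflectance basis images.

    basis:   (N, H, W, 3) images of the subject, one per basis lighting direction
    weights: (N, 3) RGB intensities of the target lighting for each direction
    Because light is additive, the relit (H, W, 3) image is a per-channel
    linear combination of the basis images.
    """
    return np.einsum('nhwc,nc->hwc', basis, weights)
```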
II.3 Spectral Measurement, Color Rendition, and Multispectral Illumination
In the previous section, we described the core IBL-based techniques for compositing that have been developed
to ensure consistent, accurate illumination across source elements to be combined, whether they be real-world
subjects or synthetic renderings. While each of these techniques (GI rendering, lighting reproduction, and IBRL)
have yielded remarkable realism in compositing, they operate only in the tristimulus color or RGB domain,
without considering the spectral nature of real-world image formation. As this dissertation extends lighting
measurement (IBL), playback (lighting reproduction), and reflectance measurement (IBRL) to the multispectral
domain for improved color rendition in compositing, in this section, we therefore present the related work
covering spectral lighting measurement and multispectral illumination sources.
Spectral Lighting Measurement. Traditional omnidirectional illumination measurement techniques such as
that of Debevec [27] capture only tristimulus RGB panoramic imagery rather than fully spectral information,
which can lead to color rendition errors as noted in chapter I. In chapter III, we thus first introduce a method to
measure the omnidirectional color rendition properties of a scene using panoramic HDR photography augmented
by images of one or more color charts. However, other significant work has endeavored to estimate or capture
fully spectral incident illumination conditions. With measured camera spectral sensitivity functions, Tominaga
and Tanaka [135] promoted RGB omnidirectional imagery to spectral estimates by projecting onto the first
three principal components of a set of illuminant basis spectra, showing successful modeling of daylight and
incandescent spectra within the same scene. However, the technique required measuring the camera spectral
sensitivities and was not demonstrated to solve for more than two illuminant spectra. Tominaga and Fukuda
[134] acquired mirror sphere photographs with a monochrome camera and multiple bandpass filters to achieve
six spectral bands and also swapped the bandpass filters for a Liquid Crystal Tunable Filter to achieve 31 or 61
bands. Since the filters’ bands overlapped, they solved a linear system to best estimate the illuminant spectra.
However, this system required specialized equipment to achieve these results, unlike the capture approach we
present in chapter III that requires only standard photographs. Kider et al. [76] augmented HDR panoramic
photography with a mechanically instrumented spectroradiometer, capturing 81 sample spectra over the upper
hemisphere. They used bi-cubic interpolation over the hemisphere to validate a variety of spectral sky models
but did not explicitly fuse the spectral data with the RGB photography of the sky. Kawakami et al. [74] promoted
RGB sky imagery to spectral estimates by inferring sky turbidity and fitting to a spectral skylight model.
Multispectral Image-Based Lighting. Darling et al. [26] explored multispectral IBL for rendering synthetic
subjects, describing a six-channel multispectral environmental illumination capture method and real-time ren-
dering workflow with superior color matching results as compared with RGB-based rendering, where the target
colors were defined based on fully-spectral rendering at narrow-bandwidth resolution. Their method of multi-
spectral environmental illumination capture extended the technique of Debevec [27] to include capturing HDR
photographs of a mirrored sphere as seen by a camera through cyan and yellow filters, but required multiple
photographs of the same scene and a modified camera with a removed infrared filter, also differing from our
approach in chapter III.
Lighting Measurement using Color Charts. Rather than directly measuring emission or reflectance spectra,
throughout this dissertation, we often describe techniques that leverage color chart photographs to reproduce
the color rendition properties of complex multispectral illumination environments. This approach was inspired
by the success of estimating both illuminant spectra and camera spectral sensitivities using color charts with
samples of known or measured reflectance. The Macbeth ColorChecker™ Chart [101] (now sold by X-Rite™,
see Fig. II.8) simulates a variety of natural reflectance spectra, including light and dark human skin, foliage,
blue sky, certain types of flowers, and a ramp of spectrally flat neutral tones. Its 24 patches have 19 independent
reflectance spectra, which, when photographed by an RGB camera, yield 57 linearly independent integrals of
the spectrum of light falling on the chart. Rump et al. [116] showed that camera spectral sensitivities could be
estimated from an image of such a chart (actually, a more advanced version with added spectra) under a known
illuminant, and Shi et al. [123] showed that relatively smooth illuminant spectra could be estimated from a
photograph of such a chart. Our goal in chapter III is not to explicitly estimate scene illuminant spectra as in
Shi et al., but rather to drive a sphere of multispectral LEDs to produce the same appearance of a subject to a
camera under spectrally complex lighting environments. While neither of these works addressed omnidirectional
lighting capture or lighting reproduction, both showed that surprisingly rich spectral information is available
from color chart observations, which we leverage throughout this dissertation.
Figure II.8: The X-Rite (Macbeth) Color Checker chart commonly photographed and used for ensuring consis-
tent image color balance. This chart or a variant with additional neutral, gray squares are used throughout this
work.
Multispectral Lighting Reproduction for a Single Light Source. Our idea to extend image-based light-
ing reproduction to the multispectral domain was largely inspired by a detailed study by Wenger et al. [145]
demonstrating that the spectral mismatches between common real-world illumination spectra and the LED-based
sources used for lighting reproduction severely impact color rendition. To address this, they showed that by mea-
suring the spectra of the LEDs, the spectral reflectance of a set of material samples, and the camera’s spectral
sensitivity functions, the color rendition properties of particular illuminants such as incandescent and fluores-
cent lights could be simulated reasonably well with a nine-channel multispectral LED light source. However,
unlike our technique described in chapter III, this work required specialized spectral measurement equipment
and limited discussion to individual light sources, rather than a fully spherical lighting environment.
Multispectral LED-Based Illumination Sources. Throughout this dissertation and for lighting reproduction
specifically, we use a six-channel multispectral light stage with sources arranged around a sphere. Spherical
or dome-shaped multispectral LED lighting rigs have been previously designed and constructed to illuminate
a subject from all directions for the purpose of multispectral material reflectance measurement, rather than
studio lighting reproduction. The system of Ajdin et al. [11] used sixteen spectral channels, while that of
Gu and Liu [57] used six and Kitahara et al. [79] used nine. Besides that of Wenger et al. [145], individual
multispectral light sources have also been built [52, 77, 105, 106, 124], also predominantly for image-based
spectral reflectance measurements of scenes and materials rather than lighting reproduction. LED light sources
with individually-controllable spectral channels have also been used in professional studio lighting or consumer
electronics contexts. In particular, we note the availability of multispectral professional studio luminaires, such
as the RGBW (red, green, blue, white) LED ARRI Skypanel and various RGBW and RGBA (red, green, blue,
amber) ColorKinetics LED light sources, as well as the consumer-focused Philips Hue LED light bulbs, which
are three-channel Red, Lime, Royal Blue or RGB luminaires.
Optimal Multispectral LED Combinations. After introducing techniques for multispectral lighting repro-
duction in chapter III, in chapter IV we present a theoretical analysis of the best selection of LEDs of distinct
spectra to use for this technique when optimizing for color rendition. To the best of our knowledge, Ajdin et al.
[11], Gu and Liu [57], and Kitahara et al. [79] selected LEDs for their spherical lighting systems based on maxi-
mizing the coverage of the visible light spectrum according to LED availability, without specifically considering
color rendition. For single light sources rather than sphere or dome-shaped systems, Wenger et al. [145] sought
to cover the gaps between the emission spectra of RGB LEDs with their nine-channel multispectral light, but
did not discuss a minimal, sufficient LED set. Park et al. [105] selected optimal combinations from a set of
five multispectral LEDs for a sequential illumination imaging system, but the initial selection of LEDs was not
discussed. Shrestha et al. [124] described a multispectral LED-based imaging system, starting with a total of
19 LEDs of distinct spectra and generating optimal sets of three LEDs each, based on the accuracy of per-pixel
estimated reflectance spectra or colors from either two or three photographs under sequential multispectral il-
lumination conditions. While this approach is the most similar to our optimal LED selection contribution, our
objective differs in that we optimize LED selections for color rendition rather than spectral estimation, and we
seek only one multispectral illumination condition for a single photograph while using any number of LEDs of
distinct spectra.
Color Rendition in Image-Based Relighting. In chapter V, we efficiently extend the IBRL framework to
the multispectral domain, forming an image of a subject in a new lighting environment by summing a set of
multispectral reflectance basis images. In the original work, Debevec et al. [30] acquired basis photographs with
an RGB camera by spiraling a broad-spectrum light source around the subject on a customized gantry. They then
showed that taking the dot product of the reflectance field with a high dynamic range (HDR), omnidirectional
image representing scene illumination effectively re-lights the subject using this captured lighting. However,
this work restricted the discussion to RGB pixel values only, and thus is unlikely to be able to record or simulate
the effect of spectrally complex illumination sources on spectrally complex subjects, e.g. fluorescent light on
human skin. Thus, in chapter V we extend the number of channels of the reflectance field such that we can
relight scenes with multispectral illumination.
In chapter III, we demonstrate superior studio lighting reproduction using multiple spectral channels as
compared with using only RGB LEDs. However, this result is not directly applicable to IBRL, because the
problems are not exactly analogous. RGB LED lighting reproduction yields poor color rendition for various
materials due to the lack of energy produced across the visible spectrum when using just RGB LEDs – which
is a problem mostly avoided in IBRL when using broad-spectrum white light for reflectance field acquisition
(see the LED spectra in Fig. II.9). However, most broad-spectrum white LEDs, including those we use in our
LED sphere, still have a spectral gap between the shorter wavelength blue emitter part of the spectrum and
longer wavelength phosphor-converted part (again, see Fig. II.9). Thus, the spectral mismatch problem is still a
concern for IBRL when using LEDs for the reflectance basis imagery.
Figure II.9: The emission spectra of typical red, green, blue, and white LEDs. The white LED (black line) has
a noticeable gap in its emission spectrum at around 480nm, potentially causing color rendition errors for classic
IBRL using broad-spectrum LED-based illumination. For the RGB LEDs, there is a noticeable gap at around
590nm where the emission spectra of typical red and green LEDs do not overlap. For lighting reproduction,
Wenger et al. [145] demonstrated that RGB LEDs alone could not sufficiently reproduce many real-world
illuminants.
Spectral Promotion. One technique we introduce in chapter V is a way to avoid capturing the full multispec-
tral reflectance basis for IBRL, while still gaining the improved color rendition of our method. We term this
spectral promotion: we illuminate our subject with each of the spectrally distinct sources, but only for
a full-on spherical lighting condition. We then propagate this spectral reflectance information to each of the
individual lighting directions of the basis set. Previous work by Ma et al. [97] showed that a high-resolution
RGB image could be spectrally promoted to a multispectral image, using a multispectral image acquired with a
lower-resolution camera. This bears some similarity to our technique, except that we operate in the angular illumination
domain: we take spectral information from a very low resolution image of the sphere (where all lights
of each LED spectrum are driven to the same brightness) and propagate these spectral relationships to all of the
individual lighting directions in the spherical basis.
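The exact formulation appears in chapter V; one plausible reading of the idea, sketched below with hypothetical inputs and a simple ratio-image assumption, scales a white-LED reflectance basis by per-pixel spectral ratios measured under the full-on spherical lighting conditions.

```python
import numpy as np

def spectrally_promote(white_basis, fullon_white, fullon_spectral, eps=1e-6):
    """Ratio-based spectral promotion (a hypothetical sketch, not the exact method).

    white_basis:     (N, H, W) reflectance basis captured under white LEDs only
    fullon_white:    (H, W) subject lit by all white LEDs at once
    fullon_spectral: (K, H, W) subject lit by all LEDs of each of the K spectra
    Returns a (K, N, H, W) multispectral basis estimate: each basis direction
    is scaled by the per-pixel spectral ratios observed under full-on lighting.
    """
    ratios = fullon_spectral / (fullon_white[None] + eps)   # (K, H, W)
    return ratios[:, None] * white_basis[None]              # (K, N, H, W)
```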
II.4 Facial Scanning for Photo-realistic Virtual Humans
Beyond lighting reproduction and image-based relighting, in chapter VI, we further leverage multispectral LED-
based illumination to improve the geometric resolution and color rendition of a facial scanning technique used
to generate the assets required for rendering and compositing an actor’s digital double into a real-world or
computer-generated scene, as shown in Fig. II.10, right. The ability to render photo-realistic digital humans
is one major goal of computer graphics, with broad applications in film production, virtual reality, training
simulations, and healthcare.
Figure II.10: Left: Photograph of a subject. Right: A rendering of the subject, using geometry and reflectance
captured from a high-resolution facial scanning system. Images reproduced from Alexander et al. [13].
High-resolution Facial Scanning. To render a photo-realistic virtual human as in Fig. II.10, right, high-
resolution skin geometry and reflectance information are required, and so static high-resolution scans of the
subject in a variety of extreme facial expressions are typically captured. After a topological registration step,
animators can mix or cross-dissolve between these facial expressions to create a new animation [92]. These
static facial scans are often generated using computational illumination techniques based on polarized gradient
illumination conditions [98, 49], which we extend to the multispectral domain in chapter VI.
When light reflects from a dielectric surface such as human skin, the specular component can be polarized
while the diffuse reflection is largely unpolarized. This difference has been leveraged in the classic computer
vision literature [149, 148, 103] to aid in the separation of diffuse and specular reflections. Most high-resolution
facial scanning techniques [98, 49] use polarized spherical lighting conditions to estimate per-pixel surface
normals from each reflectance component separately. These polarized lighting conditions are realized using
linear polarizers positioned both in front of each light source in the lighting rig, as well as in front of each
camera. Fig. II.11 shows a subject illuminated by a set of these lighting conditions. In chapter VI, we similarly
use polarization difference imaging with monochrome cameras for diffuse-specular separation, but other filter-
free approaches ([136, 42, 103, 99]) would be of interest for future work.
Dynamic Facial Performance Capture. Since the high-resolution facial capture methods [98, 49] are limited
to static poses, significant research has endeavored to capture dynamic human facial performance. Such captures
allow for an actor’s performance to be transferred to a new digital character or allow the actor’s performance
to be re-rendered in some other novel way. Multi-camera video recording setups enable dynamic multiview
stereo based 3D reconstructions [41, 16, 18, 44]. Since these techniques only require an image of the subject
in a single, even lighting condition per viewpoint, they can be applied to video sequences of an actor’s perfor-
mance. However, such methods do not capture skin reflectance information, and they estimate high resolution
geometric details using the technique referred to as "dark is deep," where dark pixels are assumed to represent
Figure II.11: A subject lit by the polarized spherical gradient illumination conditions of Ma et al. [98]. The
top row represents the sub-surface scattered reflections, or the cross-polarized imaging state. The bottom
row represents the parallel-polarized imaging state, including some sub-surface scattered reflections and some
polarization-preserving specular reflections. Images reproduced from Alexander et al. [13].
surface concavities [142]. This assumption breaks down for the commonly-occurring cases of skin pigmenta-
tion or blemishes, where dark pixels are the result of light absorption rather than shadowing. As such, multiview
computational illumination approaches [49, 98], though requiring multiple input photographs of the subject
under different time-multiplexed lighting conditions, still provide superior geometric accuracy. Dynamic scan-
ning methods can be augmented or initialized by one or more high resolution static scans [13, 12, 43, 67, 47],
providing an additional application for our monochrome multispectral facial scanning approach.
Complementary Optical Flow. To capture the details of a high-resolution scan for dynamic facial perfor-
mances, Wilson et al. [147] extended the work of Ma et al. [98] by introducing a novel optical flow based
approach. They added gradient illumination conditions that were complementary to those of Ma et al. [98], such
that pairs of images of a subject under different lighting conditions could be added together to produce an image
of the subject as lit by a full sphere of even illumination, enabling temporal image alignment by satisfying the
brightness constancy constraint of optical flow. This flow-based image alignment also improved the quality of
static scans, offering superior image alignment and better signal-to-noise ratios as an over-complete image ba-
sis was acquired. The monochrome facial scanning approach that we describe is applicable to this technique as
well, specifically as we define a custom optical flow step that aligns images captured under different illumination
spectra.
Multispectral Facial Capture. In chapter VI, we seek to generate a high-resolution 3D facial model with
multispectral textures. However, our method leverages techniques presented in chapter III and chapter V to
produce color images from a set of multispectral basis images. Multispectral light sources have been previously
used directly for 3D reconstruction as well, extending photometric stereo [150] to a single-shot and therefore
video-rate approach by trading temporal-multiplexing for spectral multiplexing [59]. Fyffe et al. [45] extended
this approach to the dynamic 3D reconstruction of faces, although they used cross-polarized images and assumed
Lambertian skin reflectance, limiting the resolution of the recovered 3D geometry.
II.5 Lighting Estimation from an Unconstrained Image
As described above, panoramic HDR photography has been required for rendering and compositing realistic
virtual objects or humans into real-world scenes using IBL [27]. In chapter VII, our goal is to estimate omnidi-
rectional HDR lighting for mixed or augmented reality (AR) compositing applications, where multi-exposure,
panoramic photography is impractical. In a typical AR session, the goal is to render virtual objects into a live
camera feed. Thus we assume that a single exposure, low-dynamic range image is available. To rephrase the
problem: we would like to have realistic illumination estimated only from the background image into which the
virtual object will be composited.
Recovering HDR Illumination from LDR Imagery. Our method is based on deep learning, where typically
large volumes of training data are required. Rather than collect many real-world HDR panoramas, which would
be challenging and time consuming, our HDR lighting inference method is trained using only low dynamic
range (LDR) input data. Key to our technique is that spheres with diverse bidirectional reflectance distribution
functions (BRDFs) reveal different lighting cues, enabling us to record training data using a standard LDR video
stream. This insight has been previously leveraged for sun intensity recovery from clipped panoramas using a
diffuse, gray sphere [113, 29]. Debevec et al. [29] recorded the full dynamic range of natural illumination in
a single photograph based on observed irradiance on diffuse gray strips embedded within a cut-apart mirrored
sphere, allowing reconstruction of saturated light source intensities.
Unconstrained Illumination Recovery With Joint Reflectance or Geometry Inference. Illumination es-
timation has long been considered an open computer vision research problem, because the appearance of a
scene depends simultaneously on its geometry, reflectance properties, and lighting, as well as the camera’s
exposure, color balance, and depth-of-field. Within this research community, the joint recovery of geometry,
reflectance, and lighting has been termed the inverse rendering problem [157, 112]. A similar problem, in-
trinsic image decomposition [15], aims to separate an image into shading and reflectance. However, shading is
an effect of lighting, rather than its direct observation. Several recent approaches jointly inferred material re-
flectance and illumination from an object comprised of an unknown material [102, 94], one or more images of a
segmented object [141, 96], specular objects of a known class [114, 48], or with measured or known geometry
[95, 143, 100, 56, 14]. However, in chapter VII we estimate lighting from unconstrained images with unknown
geometry and arbitrarily complex scenes.
Unconstrained Illumination Recovery for Complex Scenes. Several recent works have investigated this
precise task. Khan et al. [75] projected a limited-FOV HDR image onto a hemisphere and flipped it to infer
360° lighting. For LDR input images, Karsch et al. [73] estimated a scene’s geometry and diffuse albedo,
detected in-view light sources, and, for unseen lights, found a matching LDR panorama from a large database
[152]. They then promoted the result to HDR, minimizing a diffuse scene rendering loss. For indoor scenes,
Gardner et al. [46] learned a mapping from a limited-FOV LDR image to HDR lighting using a convolutional
neural network (CNN). Noting the lack of HDR panoramas, they leveraged the same LDR panorama dataset
[152] to regress first from the input image to a LDR panorama and light source locations and then refined the
model for light source intensities with 2,100 new, captured HDR panoramas. Though demonstrating state-of-
the-art results, they noted two key limitations. First, the predicted LDR panorama and HDR light sources were
white-balanced to match the input image using the Gray World assumption [21]. Second, renderings improved
when an artist manually tuned the predicted lighting intensity. In chapter VII, we propose a novel rendering-
based loss function that allows our network to learn both the colors and intensities of the incident illumination
relative to the input image, without HDR imagery. Furthermore, we propose a lighting model that generalizes
to both indoor and outdoor scenes, though outdoor HDR lighting estimation from a single image or from a LDR
panorama has also received attention ([80, 81, 62, 159]).
Illumination Recovery using Rendering-Based Loss Terms. In chapter VII, we train an HDR lighting pre-
diction model by rendering three spheres with different BRDFs during training using the estimated illumination.
Our model thus learns how to produce more accurate lighting predictions by reasoning on rendered object ap-
pearances, rather than on the lighting directly. Our rendering process uses IBRL [30, 104] with photographed
reflectance bases for each BRDF. The IBRL process is trivially differentiable as required for all gradient-based
learning techniques since it consists of only image multiply and add operations. Our technique follows sev-
eral recent works that estimate lighting from images of faces, similarly reasoning on renderings and modeling
image formation via differentiable rendering during CNN training [22, 133, 132, 160, 121]. However, each
of these prior works has relied on simple, low frequency shading models, thereby preventing the recovery of
high-frequency illumination. Xu et al. [153] recently trained a deep neural network to perform IBRL, jointly
learning a low-dimensional reflectance basis and renderer, rather than applying IBRL as a fixed function as we
do. Hold-Geoffroy et al. [62] used a synthetic Lambertian reflectance basis in a rendering loss term for outdoor
lighting estimation, but they did not use a photographed basis or consider multiple BRDFs.
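Because IBRL consists only of multiplies and adds, such a rendering-based loss is simple to express; the sketch below uses illustrative shapes and a plain squared error rather than the exact loss of chapter VII.

```python
import numpy as np

def render_sphere(basis, light):
    """IBRL rendering of one sphere BRDF: a linear combination of its photographed
    reflectance basis images (N directions) weighted by the predicted HDR lighting
    (N, 3). Being multiply-and-add only, it is differentiable in any gradient-based
    learning framework."""
    return np.einsum('nhwc,nc->hwc', basis, light)

def rendering_loss(pred_light, bases, gt_renders):
    """Sum of per-BRDF squared-error image losses (e.g., over three sphere BRDFs)."""
    return sum(np.mean((render_sphere(b, pred_light) - gt) ** 2)
               for b, gt in zip(bases, gt_renders))
```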
Chapter III
Practical Multispectral Lighting Reproduction
III.1 Summary
In this chapter, we present a practical framework for reproducing omnidirectional incident illumination con-
ditions with complex spectra using a light stage with multispectral LED lights. For lighting acquisition, we
augment standard RGB panoramic photography with one or more observations of a color chart with numerous
reflectance spectra. We then solve for how to drive the multispectral light sources so that they best reproduce the
appearance of the color charts in the original lighting. Even when solving for non-negative intensities, we show
that accurate lighting reproduction is achievable using just four or six distinct LED spectra for a wide range of
incident illumination spectra. A significant benefit of our approach is that it does not require specialized
equipment (other than the light stage) such as monochromators or spectroradiometers, nor explicit knowledge
of the LED power spectra, camera spectral response functions, or color chart reflectance spectra. We describe
two simple devices for multispectral lighting capture, one for slow measurements with detailed angular spectral
resolution, and one for fast measurements with coarse angular detail. We validate the approach by realistically com-
positing real subjects into acquired lighting environments, showing accurate matches to how the subject would
actually look within the environments, even for those including complex multispectral illumination. We also
demonstrate dynamic lighting capture and playback using the technique.
III.2 Introduction
Lighting greatly influences how a subject appears in both a photometric and aesthetic sense. When a subject
recorded in a studio is composited into a real or virtual environment, their lighting will either complement
or detract from the illusion that they are actually present within the scene. Thus, being able to match studio
lighting to real-world illumination environments is a useful capability for visual effects, studio photography, and
in designing consumer products, garments, and cosmetics.
Lighting reproduction systems as in Debevec et al. [32] and Hamon et al. [58] surround a subject with red,
green, and blue light-emitting diodes (RGB LEDs) and drive them to match the colors and intensities of the light-
ing of the scene into which the subject will be composited. The lighting environments are typically recorded as
omnidirectional, high dynamic range (HDR) RGB images photographed using panoramic photography or com-
puted in a virtual scene using a global illumination algorithm. While the results can be believable – especially
under the stewardship of color correction artists – the accuracy of the color rendition is suspect since only RGB
colors are used for recording and reproducing the illumination, and there is significantly more detail across the
visible spectrum than what is being recorded and simulated. Wenger et al. [145] noted that lighting reproduced
with RGB LEDs can produce unexpected color casts even when each light source strives to mimic the directly
observable color of the original illumination. Ideally, a lighting reproduction system could faithfully match the
appearance of the subject under any combination of illuminants including incandescent, daylight, fluorescent,
solid-state, and any filtered or reflected version of such illumination.
Recently, several efforts have produced multispectral light stages with more than just red, green, and blue
LEDs in each light source for purposes such as multispectral material measurement [57, 11, 79]. These systems
add additional LED colors such as amber and cyan, as well as white LEDs that use phosphors to broaden their
emission across the visible spectrum. In this work, we present a practical technique for driving the intensities of
such arrangements of LEDs to accurately reproduce the effects of real-world illumination environments with any
number of spectrally distinct illuminants in the scene. The practical nature of our approach rests in that we do not
require explicit spectroradiometer measurements of the illumination; we use only traditional HDR panoramic
photography and one or more observations of a color chart reflecting different directions of the illumination in
the environment. Furthermore, we drive the LED intensities directly from the color chart observations and HDR
panoramas, without explicitly estimating illuminant spectra, and without knowledge of the reflectance spectra
of the color chart samples, the emission spectra of LEDs, or the spectral sensitivity functions of the cameras
involved. Our straightforward process is:
(i). Photograph the color chart under each of the different LEDs.
(ii). Record scene illumination using panoramic photography plus observations of a color chart facing one or
more directions.
(iii). For each LED light source, estimate the appearance of a virtual color chart reflecting its direction of light
from the environment.
(iv). Drive the light source LEDs so that they best illuminate the virtual color chart with the estimated appear-
ance.
Step one is simple, and step four simply uses a non-negative least squares solver. For step two, we present
two assemblies for capturing multispectral lighting environments that trade spectral angular resolution for speed
of capture: one assembly acquires unique spectral signatures for each lighting direction; the other permits video
rate acquisition. For step three, we present a straightforward approach to fusing RGB panoramic imagery and
directional color chart observations. The result is a relatively simple and visually accurate process for driving
a multispectral light stage to reproduce the spectrally complex illumination effects of real-world lighting envi-
ronments. We demonstrate our approach by recording several lighting environments with natural and synthetic
illumination and reproduce this illumination within an LED sphere with six distinct LED spectra. We show this
enhanced lighting reproduction process produces accurate appearance matches for color charts, a still life scene,
and human subjects and can be extended to real-time multispectral lighting capture and playback.
III.3 Method
In this section we describe our techniques for capturing incident illumination and driving multispectral light
sources in order to reproduce the effect of the illumination on a subject as seen by a camera.
III.3.1 Driving Multispectral Lights with a Color Chart
We first consider the sub-problem of reproducing the appearance of a color chart to a given camera in a particular
lighting environment using a multispectral light source. Photographing the chart with an RGB camera produces
pixel values $P_{ij}$, where $i$ is the index of the given color chart patch and $j$ is the camera’s $j$’th color channel.
Because light is linear, the superposition principle states that the chart’s appearance to the camera under the
multispectral light source will be a linear combination of its appearance under each of the spectrally distinct
LEDs. Thus, we record the basis for every way the chart can appear by photographing the chart lit by each
of the LED spectra at unit intensity to produce images as seen in Fig. III.2. We then construct a matrix L
where $L_{ijk}$ is the averaged pixel value of color chart square $i$ for camera color channel $j$ under LED spectrum
$k$. To achieve even lighting, we do this using the full sphere of LED sphere light sources of a given spectrum
simultaneously, though a single multispectral light could be used. We consider $L$ to be the $(i \cdot j) \times k$ matrix whose
columns correspond to the LED spectrum $k$ and whose rows unroll the indices $i$ and $j$ to place the RGB pixel
values for all chart squares in the same column. To optimally reproduce the chart appearance, we simply need to
solve for the LED intensity coefficients $\alpha_k$ that minimize Eq. III.1, where $m$ is the number of color chart patches
and $n$ is the number of different LED spectra.

$$\sum_{i=1}^{m}\sum_{j=1}^{3}\Big(P_{ij}-\sum_{k=1}^{n}\alpha_k L_{ijk}\Big)^2 = \|P - L\alpha\|^2 \qquad \text{(III.1)}$$
Conveniently, this process does not require measuring the illuminant spectra, the camera spectral sensitivity
functions, the color chart’s reflectance spectra, or even the spectra of the LEDs.
We can also state the problem in terms of the unknown scene illuminant spectrum $I(\lambda)$, the color chart
spectral reflectance functions $R_i(\lambda)$, and the camera spectral sensitivity functions $C_j(\lambda)$, assuming a spectral
camera model:

$$P_{ij} = \int_{\lambda} I(\lambda)\, R_i(\lambda)\, C_j(\lambda)\, d\lambda \qquad \text{(III.2)}$$
If we define the emission spectrum of LED $k$ as $I_k(\lambda)$ and drive each LED to a basis weight intensity of $\alpha_k$,
the reproduced appearance of the chart patches under the multispectral light is:

$$P_{ij} \approx \int_{\lambda} \Big[\sum_{k=1}^{n}\alpha_k I_k(\lambda)\Big] R_i(\lambda)\, C_j(\lambda)\, d\lambda \qquad \text{(III.3)}$$

Since $\alpha_k$ does not depend on $\lambda$, we can pull the summation out of the integral, which then simplifies to the
appearance of the color chart patches under each of the basis LED spectra $L_{ijk}$:

$$P_{ij} \approx \sum_{k=1}^{n}\alpha_k \int_{\lambda} I_k(\lambda)\, R_i(\lambda)\, C_j(\lambda)\, d\lambda = \sum_{k=1}^{n}\alpha_k L_{ijk} \qquad \text{(III.4)}$$
Thus, everything we need to know about the spectra of the illuminant, color chart patches, LEDs, and camera
sensitivities is measured directly by photographs of the color chart and without explicit spectral measurements.
Of course, this technique does not specifically endeavor to reproduce the precise spectrum of the original illu-
minant, and with only six LED spectra this would not generally be possible. However, as more LED sources
of distinct spectra are added, the reproduced illumination $\sum_{k=1}^{n}\alpha_k I_k(\lambda)$ will better approximate the original
illuminant $I(\lambda)$.
We can easily solve for the LED intensities $\alpha_k$ that minimize Eq. III.1 using linear least squares (LLS), but
this may lead to negative weights for some of the LED colors, which is not physically realizable. We could,
however, simulate such illumination by taking two photographs, one where the positively-weighted LEDs are
turned on, and a second where the absolute values of the negatively-weighted LEDs are turned on, and subtract
the pixel values of the second image from the first. If our camera can take these two images quickly, this might be
an acceptable approach. But to avoid this complication (and facilitate motion picture recording) we can instead
solve for the LED intensity weights using non-negative least squares (NNLS), yielding the optimal solution
where the lighting can be reproduced all at once and captured in a single photograph.
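Concretely, the non-negative solve can be carried out with an off-the-shelf NNLS routine. The sketch below assumes the chart observations have already been averaged into arrays of the stated shapes.

```python
import numpy as np
from scipy.optimize import nnls

def solve_led_weights(P, L):
    """Solve Eq. III.1 for non-negative LED intensities.

    P: (m, 3) pixel values of the m chart patches in the target lighting
    L: (m, 3, n) pixel values of the same patches under each of the n LED
       spectra driven at unit intensity
    Returns alpha, the (n,) weights minimizing ||P - L alpha||^2 with alpha >= 0.
    """
    m, _, n = L.shape
    A = L.reshape(m * 3, n)   # unroll patch and color-channel indices into rows
    b = P.reshape(m * 3)
    alpha, _ = nnls(A, b)
    return alpha
```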
To test this technique, we used a Canon 1DX DSLR camera to photograph a Matte ColorChecker Nano
from Edmund Optics, which includes 12 neutral squares and 18 color squares of distinct spectra [101] under
incandescent, fluorescent, and daylight illuminants. We then recorded the color chart’s L matrix under our six
LED spectra (Fig. III.1) producing the images in Fig. III.2.
Figure III.1: The illumination spectra of the six LEDs in each light source of the light stage (LED sphere).
Figure III.2: A color chart illuminated by the red, green, blue, cyan, amber, and white LEDs of the light stage,
allowing measurement of the L matrix.
The background squares of the first row of Fig. III.3 show the appearance of the color chart under the three
illuminants (for display, RAW pixel values are converted to sRGB color space), while the circles inside the
squares show the appearance under the non-negative least squares solve
for six LEDs (red, green, blue, cyan, amber, and white, or RGBCAW) for direct comparison. This yields charts
that are very similar in appearance, to the point that many circles are difficult to see at all. The second two
rows show the results of reproducing the illumination with just four (RGBW) or three (RGB) LED spectra. The
RGBW matches are also close, but the RGB matches are generally poor, producing oversaturated colors that are
easily distinguishable from the original appearance. These differences are reported quantitatively in Fig. III.4.
Using Different Cameras. Although not preferred, cameras with differing spectral sensitivity functions may
be used to record the environment lighting and the reproduced illumination, as the solving process will still
endeavor to find LED intensities that reproduce the chart’s appearance. Fig. III.5 shows a color chart pho-
tographed under several illuminants with a Canon 1DX DSLR, and then as it appears photographed with a
Nikon D90 DSLR under reproduced illumination calculated from an L matrix from the Nikon D90. Despite
different sensitivities, the matches are reasonably close, with the most notable discrepancies in the blue and
Figure III.3: Comparison color charts for three illuminants where the background of each square is the original
chart appearance, and the circles (sometimes invisible) show the color chart under the reproduced illumination
in the LED sphere. Six (RGBCAW), four (RGBW), and three (RGB) spectral channels are used.
purple squares under fluorescent illumination. If both cameras are available to photograph a color chart in the
target illumination, applying a lighting-specific 3×3 color transfer matrix first can further improve the match.
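One simple way to fit such a matrix (not necessarily the procedure used here) is a least-squares fit between the two cameras' chart observations under the target illumination, as sketched below.

```python
import numpy as np

def fit_color_transfer(chart_a, chart_b):
    """Fit a 3x3 matrix M mapping camera A's RAW chart colors to camera B's,
    given (m, 3) chart patch values photographed under the same lighting."""
    X, _, _, _ = np.linalg.lstsq(chart_a, chart_b, rcond=None)  # chart_a @ X ~= chart_b
    return X.T   # apply to a single color as: rgb_b ~= M @ rgb_a
```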
III.3.2 Recording a Color Chart Panorama
Real lighting environments often feature a variety of illuminant spectra, with indirect light from numerous mate-
rials modulating these spectra. A color chart, lit by an entire hemisphere of light, lacks the angular discrimination
[Bar charts: average error relative to white chart patch intensity (0% to 5%) for RGBCAW, RGBW, and RGB reproduction, with theoretical and experimental values for the red, green, and blue channels, under incandescent, fluorescent, and daylight illuminants.]
Figure III.4: Quantitative error plots for Fig. III.3 for different illuminants and number of LED spectra used for
lighting reproduction, based on the average squared errors of raw pixel values from all 30 ColorChecker Nano
patches. Both the theoretical error from the minimization and actual error from reproducing the illumination are
reported.
Figure III.5: Color charts photographed with a Canon 1DX camera (background squares) with lighting repro-
duced in the LED sphere photographed by a Nikon D90 camera (foreground circles).
needed to record detailed directional effects of such illumination. To address this, we built a black paper enclo-
sure (Fig. III.6) to allow only a narrow cone of light to fall on the color chart and secured it in front of a camera.
We chose the cone width to roughly match the angular resolution of our light stage, which has light sources
spaced 12° apart around its equator. We cover the aperture with three sheets of 5-degree light-shaping diffuser
from Luminit, Inc. to antialias incident light.
To capture a spherical lighting environment, we place a Canon Rebel 3Ti DSLR with a Sigma 50mm macro
lens with the color chart box on a GigaPan EPIC 100 pan/tilt rig, and program the rig to record the 360°
environment as a set of 30 horizontal by 24 vertical directions, which roughly matches the angular resolution
of the LED sphere. As the chart becomes brightly illuminated when pointed toward a light source, we set the
camera to aperture-priority (Av) mode so that the shutter speed will be automatically chosen to expose the color
chart properly at each position. The shutter speeds are recorded within each image’s EXIF metadata. We graphed
pixel values of the grayscale chart patches under fixed lighting for a range of shutter speeds, finding the "shutter
speed" EXIF tag to be accurate. We cover the viewfinder to prevent stray light from reaching the light meter.
Capturing the 720 images takes about one hour.
Figure III.6: We use a black box to photograph a small color chart illuminated by only a small section of
the environment. The box is placed over the lens of a camera on a pan/tilt rig to capture an omnidirectional
multispectral lighting environment.
We stabilize the color chart image sequence using fiducial markers, so we may reliably sample a 20×20
pixel area in the center of each square. We divide the mean pixel values by the EXIF exposure time for each
photograph, producing values proportional to the actual HDR radiance from the chart squares. By choosing
an appropriate f/stop and ISO, the exposure times vary from 1/2 to 1/1000 second in a typical environment. A
full lat-long panorama of charts can be seen in Fig. III.7 (left). Transposing the data yields a chart of lat-long
panoramas, where each section is a low-resolution image of the environmental illumination modulated by the
spectral reflectance of each chart square as in Fig. III.7 (right).
a. Panorama of charts b. Chart of panoramas
Figure III.7: Two visualizations of an outdoor multispectral lighting environment recorded as a color chart
panorama as in Sec. III.3.2.
III.3.3 Lighting a Subject with a Color Chart Panorama
We know from Sec. III.3.1 how to drive a multispectral LED source to match the appearance of a color chart,
and we have now measured how each section of our lighting environment, at about the resolution of our LED
sphere, illuminates a color chart. Thus, for each light source, we bilinearly interpolate the four nearest charts
to the precise direction of the light to create an interpolated chart P for that light. We then drive the LEDs in
the light so that it illuminates the chart as it appeared lit by the environment. The LED sphere creates varying
LED intensities through pulse-width modulation (PWM), using 12 bits of lighting input to achieve 4,096 possible
intensity values. We automatically scale the overall brightness of the lighting environment so that all of the LED
intensity values fit within range, and the lighting environment is reproduced up to this scaling factor.
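As an illustration of this step, with hypothetical indexing conventions for the 30 by 24 chart grid of Sec. III.3.2, the interpolated chart for one light source might be computed as follows.

```python
import numpy as np

def interpolate_chart(u, v, charts):
    """Bilinearly interpolate the four measured charts nearest a light's direction.

    u, v:   continuous horizontal/vertical coordinates of the light in the chart grid
    charts: (rows, cols, m, 3) HDR chart patch values for each captured direction
    Returns an (m, 3) interpolated chart P used to drive that light (Sec. III.3.1).
    """
    rows, cols = charts.shape[:2]
    u0, v0 = int(np.floor(u)), int(np.floor(v))
    du, dv = u - u0, v - v0
    u1 = (u0 + 1) % cols            # wrap horizontally around the panorama
    v1 = min(v0 + 1, rows - 1)      # clamp vertically at the poles
    return ((1 - du) * (1 - dv) * charts[v0, u0] + du * (1 - dv) * charts[v0, u1]
            + (1 - du) * dv * charts[v1, u0] + du * dv * charts[v1, u1])
```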
Fig. III.8 shows a subject illuminated by two lighting environments that were spectrally recorded using
the panoramic color chart technique of Sec. III.3.2, next to results of driving the LED sphere with 6-channel
RGBCAW LED intensities solved for from the chart data. It achieves good matches for the skin tones, clothing,
and specular highlights.
a. Subject in the multispectral light stage, seen through a fisheye lens.
b. Real light c. LED sphere d. Real light e. LED sphere
Figure III.8: a. A fisheye view of a subject in the multispectral light stage. b. The subject in a daylight
environment captured as the panorama of color charts in Fig. III.7. c. The subject in the LED sphere (a) with
the lighting reproduced as in Sec. III.3.3. d. A subject in an indoor lighting environment with LED, fluorescent,
and incandescent light sources also captured as a panorama of color charts. e. The subject lit in the LED sphere
with the indoor lighting.
III.3.4 Augmenting RGB Panoramas with Color Charts
Recording a lighting environment with the panoramic color chart method takes significantly longer than typical
HDRI lighting capture techniques such as photographing a mirrored sphere or acquiring HDR fisheye pho-
tographs. To address this, we describe a simple process to promote a high-resolution RGB HDRI map to multi-
spectral estimates using a sparse set of color chart observations.
Suppose that one of our LED sphere light sources is responsible for reproducing the illumination from a
particular set of pixels $T$ in the RGB HDR panorama, and the average pixel value of this area is $Q_j$ (where
$j$ indicates the color channel). $Q$’s three color channels are not enough information to drive the multispectral
light source, and we wish we had seen an entire color chart lit by this part of the environment $T$. Suppose that
we can estimate, from the sparse sampling of color charts available, the appearance of a color chart $P_{ij}$ lit by
the same general area of the environment as $T$, and we consider it to be reflecting an overall illuminant $I(\lambda)$
corresponding to our environment area $T$. Again, we presume the spectral camera model of Eq. III.2, where
$R_i(\lambda)$ is the spectral reflectance function of patch $i$ of the color chart and $C_j(\lambda)$ is the spectral sensitivity function
of the $j$’th color channel of the camera:

$$P_{ij} = \int_{\lambda} I(\lambda)\, R_i(\lambda)\, C_j(\lambda)\, d\lambda$$
Our color chart features a white, spectrally flat neutral square which we presume to be the zeroth index patch
$R_0(\lambda) = 1$. (In practice, we scale up $P_0$ to account for the fact that the white patch is only 90% reflective.) This patch
reflects the illumination falling on the chart as seen by the camera, which yields our RGB pixel observation $Q$.
In general, since the illuminant estimate $I(\lambda)$ corresponds to a larger area of the environment than $T$, $Q$ will not
be equal to $P_0$. For example, if $T$ covers an area of foliage modulating $I(\lambda)$ by spectral reflectance
$S(\lambda)$, and the illuminant $I$ broadly accounts for the incident daylight, we would have:

$$Q_j = \int_{\lambda} I(\lambda)\, S(\lambda)\, C_j(\lambda)\, d\lambda$$
Since we want to estimate the appearance of a color chart $P'_{ij}$ illuminated by the environment area $T$, we are
interested in knowing how the modulated illuminant $I(\lambda)S(\lambda)$ would illuminate the color chart squares $R_i(\lambda)$
as:

$$P'_{ij} = \int_{\lambda} I(\lambda)\, S(\lambda)\, R_i(\lambda)\, C_j(\lambda)\, d\lambda$$
We do not know the spectral reflectance $S(\lambda)$, but we know that environmental reflectance functions are
generally smooth, whereas illuminants can be spiky. If we assume that $S(\lambda) \approx \bar{s}_j$ over each camera sensitivity
function $C_j(\lambda)$, we have:

$$P'_{ij} = \bar{s}_j \int_{\lambda} I(\lambda)\, R_i(\lambda)\, C_j(\lambda)\, d\lambda$$
We can now write $P'_{ij} = \bar{s}_j P_{ij}$ and since $R_0(\lambda) = 1$, we can write:

$$P'_{0j} = \bar{s}_j \int_{\lambda} I(\lambda)\, C_j(\lambda)\, d\lambda = Q_j$$
so $\bar{s}_j = Q_j / P_{0j}$ and we compute $P'_{ij} = Q_j P_{ij} / P_{0j}$. In effect, we divide the estimated illuminant color chart $P$
by its white square and recolor the whole chart lit by the observed RGB pixel average $Q$ to arrive at the estimate
$P'$ for a color chart illuminated by $T$. This chart is consistent with $Q$ and retains the same relative intensities
within each color channel of the estimated illuminant falling on the chart patches.
If our camera spectral response functions were known, then it might be possible to estimate $S(\lambda)$ as more
than a constant per color channel to yield a more plausible $P'$. This is of interest to investigate in future work.
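In code, this promotion is a simple per-channel recoloring; the sketch below assumes the white patch is stored at index 0 of the chart array.

```python
import numpy as np

def promote_chart(P, Q, eps=1e-8):
    """Recolor the estimated chart P so it is consistent with the RGB average Q of
    the panorama area T (Sec. III.3.4): P'_ij = Q_j * P_ij / P_0j.

    P: (m, 3) chart estimate for the broad environment area, white patch at index 0
    Q: (3,) average RGB of the panorama pixels the light will reproduce
    """
    s = Q / (P[0] + eps)      # per-channel scale s_j = Q_j / P_0j
    return P * s[None, :]
```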
III.3.5 Fast Multispectral Lighting Environment Capture
Recording even a sparse set of color chart observations using the pan/tilt technique is still relatively slow com-
pared to shooting an RGB HDR panorama. If the scene is expected to comprise principally one illuminant, such
as daylight, one could promote the entire HDRI map to multispectral color information using a photograph of
a single color chart to comprise every color chart estimate $P'$. However, for scenes with mixed illumination
sources, it would be desirable to record at least some of the angular variation of the illumination spectra.
To this end, we constructed the fast multispectral lighting capture system of Fig. III.9(a), which points a
DSLR camera at chrome and black 8cm spheres from Dube Juggling Equipment and five Matte Nano Col-
orChecker charts from Edmund Optics aimed in different directions. The chrome sphere was measured to be
57.5% reflective using the reflectance measurement technique of Reinhard et al. Ch. 9 [113]. The black acrylic
sphere, included to increase the observable dynamic range of illumination, reflects 4% of the light at normal
incidence, increasing significantly toward grazing angles in accordance with Fresnel’s equations for a dielectric
material. The five color charts face forward and point ±45° vertically and horizontally. A rigid aluminum beam
secures the DSLR camera and 100mm macro lens 135cm away from the sphere and chart arrangement. Two
checker fiducial markers can be used to stabilize dynamic footage if needed.
Figure III.9: a. Photographer using the fast multispectral lighting capture system. b. Image from the lighting
capture camera with spheres and color charts. c. Color-coded visualization of interpolating the five charts over
the front hemisphere of surface normals.
If the lighting is static, we can record an HDR exposure series and reconstruct the RGB lighting directly
from the chrome sphere as in the work of Debevec [27]. If the lighting is dynamic and must be recorded in a
single shot, we can set the exposure so that the indirect light from the walls, ceiling, or sky exposes acceptably
well in the chrome ball, and that the light sources can be seen distinctly in the black ball, and combine these two
reflections into an HDR map. We begin by converting each sphere reflection to a latitude-longitude (lat-long)
mapping as in Reinhard et al. Ch. 9 [113]. Since the spheres are 3° apart from each other with respect to the
camera, we shift the black sphere lat-long image to rotate it into alignment with the mirrored sphere image. The
spheres occlude each other from a modest set of side directions, so we take care to orient the device so that no
major sources of illumination fall within these areas.
We correct both the chrome and black sphere maps to 100% reflectivity before combining their images in
HDR. For the black ball, this involves dividing by the angularly-varying reflectivity resulting from Fresnel gain.
We measured the angularly-varying reflectivity of the two spheres in a dark room, moving a diffuse light box to
a range of angles incident upon the spheres, allowing us to fit a Fresnel curve with index of refraction 1.51 to
produce the corrective map. Since the dielectric reflection of the black ball depends significantly on the light’s
polarization, we avoid using the black sphere image when reconstructing skylight.
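For reference, the unpolarized dielectric Fresnel reflectance underlying this correction can be evaluated as below for the fitted index of refraction of 1.51; the black-sphere lat-long map is then divided by this factor at each pixel's angle of incidence. This is a sketch assuming unpolarized incident light, not the measured corrective map itself.

```python
import numpy as np

def fresnel_reflectance(cos_theta, n=1.51):
    """Unpolarized Fresnel reflectance of a dielectric (air to acrylic) at the
    incidence angle whose cosine is cos_theta; about 4% at normal incidence for
    n = 1.51, rising toward grazing angles."""
    cos_theta = np.clip(cos_theta, 0.0, 1.0)
    sin_t = np.sqrt(np.clip(1.0 - cos_theta ** 2, 0.0, 1.0)) / n   # Snell's law
    cos_t = np.sqrt(np.clip(1.0 - sin_t ** 2, 0.0, 1.0))
    r_s = (cos_theta - n * cos_t) / (cos_theta + n * cos_t)
    r_p = (n * cos_theta - cos_t) / (n * cos_theta + cos_t)
    return 0.5 * (r_s ** 2 + r_p ** 2)

# Hypothetical usage: corrected = black_latlong / fresnel_reflectance(cos_incidence)
```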
In either the static or dynamic case, if there are still bright light sources (such as the sun) which saturate in
all of the available exposures, we reconstruct their RGB intensities indirectly from the neutral grey squares of
the five color charts using the single-shot light probe technique of Debevec et al. [29]. Other dynamic RGB
HDR lighting capture techniques could be employed, such as that of Unger and Gustavson [138].
Once the RGB HDR lighting environment is assembled, we need to promote it to a multispectral record of
the environment using the five color charts $P_1 \ldots P_5$. For each LED sphere light, we estimate how a virtual color
chart $P'$ would appear lit by the light it is responsible for reproducing. Since we do not have a chart which points
backwards, we postulate a backwards-facing color chart $P_6$ as the average of the five observed color charts. The
six color charts now point toward the vertices of an irregular octahedron. To estimate $P'$, we determine which
octahedral face the LED sphere light aims toward, and compute $P'$ as the barycentric interpolation of the charts at
the face’s vertices. Fig. III.9(c) visualizes this interpolation map over the forward-facing directions of a diffuse
sphere.
We are nearly ready to drive the LED sphere with the captured multispectral illumination. For each light,
we determine the average RGB pixel color Q of the HDR lighting environment area corresponding to the light
source. We then scale the color channels of our color chart estimate $P$ to form $P'$ consistent with $Q$ as in Sec.
III.3.4. We then solve for the LED intensities which best reproduce the appearance of color chart $P'$ using the
technique of Sec. III.3.1.
Figure III.10: Upper left. Unwrapped chrome and black acrylic sphere images from Fig. III.9 in latitude-
longitude mapping. Upper right. Sampled colors from the five color charts. Bottom. The six derived maps for
driving the red, green, blue, cyan, amber, and white LEDs to reproduce the multispectral illumination.
III.3.6 Reproducing Dynamic Multispectral Lighting
The fast multispectral capture system allows us to record dynamic lighting environments. Most digital cin-
ema cameras, such as the RED Epic and ARRI Alexa, record a compressed version of RAW sensor data,
which can be mapped back into radiometrically linear measurements according to the original camera response
curves. While this feature is not usually present in consumer cameras, the Magic Lantern software add-on
(http://www.magiclantern.fm/) allows many Canon DSLR cameras to record RAW video at high definition reso-
lution. We installed the add-on onto a Canon 5D Mark III video camera to record 24fps video of the multispectral
lighting capture apparatus, which we could play back at 24fps on the LED sphere as a spherical frame sequence.
Fig. III.19 shows some results made using this process.
III.4 Results
In this section, we compare various lighting reproduction results to photographs of subjects in the original
lighting environments. We tried to match poses and expressions from real to reproduced illumination, although
sometimes the shoots were completed hours or a day apart. We placed a black foamcore board 1m behind the
actor in the light stage (Fig. III.8(a)) for a neutral background. We photographed the actors in RAW mode on a
Canon 1DX DSLR camera, and no image-specific color correction was performed. To create printable images,
all raw RGB pixel values were transformed to sRGB color space using the 3×3 color matrix [1.879, -0.879,
0.006; -0.228, 1.578, -0.331; 0.039, -0.696, 1.632], which boosts color saturation to account for the spectral
overlap of the sensor filters. Finally, a single brightness scaling factor was applied equally to all three color
channels of each image to account for the brightness variation between the real and reproduced illumination and
the variations in shutter speed, aperture, and ISO settings.
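Applying that matrix to a radiometrically linear image is a single matrix multiply per pixel (the sRGB display encoding curve is a separate, subsequent step). The sketch below uses the quoted values and assumes the matrix left-multiplies each RGB column vector.

```python
import numpy as np

# 3x3 matrix quoted in Sec. III.4, applied to linear RAW RGB values.
RAW_TO_SRGB = np.array([[ 1.879, -0.879,  0.006],
                        [-0.228,  1.578, -0.331],
                        [ 0.039, -0.696,  1.632]])

def apply_color_matrix(img):
    """Transform an (H, W, 3) linear RAW image toward sRGB primaries."""
    return np.einsum('ij,hwj->hwi', RAW_TO_SRGB, img)
```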
Columns: Recorded Illumination; Original Lighting; RGBCAW Reproduction; RGBW Reproduction; RGB Reproduction
Figure III.11: A subject in a variety of lighting environments (left columns) with three different lighting reproduction
techniques. Generally, RGBCAW and RGBW reproduction produces accurate results, whereas using just RGB LEDs
tends to oversaturate colors.
Figure III.12: A subject in a variety of lighting environments (left columns) with three different lighting reproduction techniques (columns, left to right: recorded illumination, original lighting, RGBCAW, RGBW, and RGB reproduction). Generally, RGBCAW and RGBW reproduction produces accurate results, whereas using just RGB LEDs tends to oversaturate colors. The final two images of the bottom row show the subject and camera rotated 90 degrees to the left in the same sunset lighting of the fifth row, with RGBCAW lighting reproduction.
Figs. III.11 and III.12 show two subjects in indoor, sunlit, and cloudy lighting environments. The indoor
environment featured an incandescent soft box light and spectrally distinct blue-gelled white LED light panels,
with fluorescent office lighting in the ceiling. The lighting environment was recorded using the fast capture
technique of Sec. III.3.5, and the subject was photographed in the same environment. Later, in the LED sphere,
the lighting was reproduced using non-negative 6-channel RGBCAW lighting, 4-channel RGBW lighting, and
3-channel RGB lighting solves as described in Sec. III.3.4. Generally, the matches are visually very close for
RGBCAW and RGBW lighting reproduction, whereas colors appear too saturated using RGB lighting. The fact
that the non-negative RGBW lighting reproduction performs nearly as well as RGBCAW suggests that these
four spectra may be sufficient for many lighting reproduction applications (see chapter IV for a more detailed
discussion). The bottom rows include a sunset lighting condition reproduced with RGBCAW lighting where the
light of the setting sun was changing rapidly. We recorded the illumination both before and after photographing
the subject, averaging the two sets of color charts and RGB panoramas to solve for the lighting to reproduce. In
the bottom row, the subject appears somewhat shinier in the reproduced illumination, possibly because the real
lighting environment photo was taken closer to the time of her makeup application.
Figure III.13: Comparing the illumination capture techniques of Sections III.3.5 and III.3.2, using 6-channel RGBCAW illumination for a still life scene (columns, left to right: recorded illumination and RGBCAW reproduction using the fast capture of Sec. III.3.5; original lighting; RGBCAW reproduction and recorded illumination using the chart panorama of Sec. III.3.2).
Fig. III.13 compares the results using the fast capture technique of Sec. III.3.5 with the color chart panorama
method of Sec. III.3.2, with RGBCAW lighting reproduction for a still life scene. The recorded lighting envi-
ronment included incandescent and fluorescent area light sources placed directly against the left and right sides
of the light stage, allowing the lighting to be captured and reproduced by the LEDs without moving the still life
or the camera. The still life contained materials with diverse reflectance spectra, including a variety of prism-
shaped crayons and twelve fabric swatches arranged as a chart. We used a Canon Rebel 3Ti DSLR to capture
both the lighting and the subject. Qualitatively, both illumination capture techniques produced appearances for
the still life materials that closely matched those in the real illumination (center of Fig. III.13).
Figure III.14: Comparing linear least squares (LLS) solves with positive and negative LED intensities (photographed in separate images) to non-negative least squares (NNLS) solves, which can be photographed in a single image, using 6-channel RGBCAW illumination (columns, left to right: LLS positive lights, LLS negative lights, difference, original lighting, NNLS lighting).
Fig. III.14 compares the results of a linear least squares (LLS) RGBCAW solve with positive and negative
LED intensities to a non-negative least squares (NNLS) solve for the lighting environment of Fig. III.13, using
the fast illumination capture method of Sec. III.3.5. The LLS solve reproduced the incandescent source with
mostly positive LED intensities with a small amount of negative blue and green, and the fluorescent source as
mostly positive with a small amount of negative cyan. Both solutions closely approximate the appearance of the
still life in the original environment.
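The practical difference between the two solves can be sketched as follows: the unconstrained fit may return signed LED intensities, which are realized by photographing the positive and the negated negative weights in two separate images and subtracting, while the non-negative fit can be photographed in one shot. The matrix A (chart colors per unit LED intensity) and the target b are hypothetical placeholders:

import numpy as np
from scipy.optimize import nnls

def lls_and_nnls(A, b):
    # A: (m, n_leds) stacked chart-patch colors per unit LED intensity; b: (m,) target colors.
    x_lls = np.linalg.lstsq(A, b, rcond=None)[0]   # may contain negative intensities
    x_pos = np.maximum(x_lls, 0.0)                 # driven and photographed as image 1
    x_neg = np.maximum(-x_lls, 0.0)                # driven and photographed as image 2
    # LLS result = image(x_pos) - image(x_neg); this subtraction compounds sensor noise.
    x_nnls, _ = nnls(A, b)                         # single-shot, non-negative solve
    return x_pos, x_neg, x_nnls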
Fig. III.15 shows the color chart pixel values for each of the lighting reproduction methods of Figs. III.13
and III.14 (foreground), compared with the values under the real illumination (background), with quantitative
results in Fig. III.16. We evaluated error for the ColorChecker Nano and, separately, our homemade chart of
fabric swatches. The LLS and NNLS solves yielded similar color accuracy for both material sets, although
unexpectedly NNLS slightly outperformed LLS. We suspect the image subtraction required for LLS contributed
to increased error by compounding sensor noise. Thus, the slight theoretical improvement of LLS did not
outweigh the simplicity of single-shot acquisition with NNLS. The chart panorama technique of Sec. III.3.2
yielded slightly superior color accuracy compared with the faster capture method of Sec. III.3.5. However, the
Figure III.15: Charts photographed in a real lighting environment (background squares) with reproduction in the light stage (foreground circles) using LLS and NNLS 6-channel RGBCAW solves for the techniques of Secs. III.3.5 and III.3.2. Left: fast capture, LLS (3rd image of Fig. III.14). Center: fast capture, NNLS (last image of Fig. III.14). Right: chart panorama, NNLS (4th image of Fig. III.13).
error is still low for the fast method since its limited spectral angular resolution is mitigated by the fact that
diffuse surfaces integrate illumination from a wide area of the environment.
For each image of Figs. III.13 and III.14, the fabric colors were reproduced nearly as accurately as the
ColorChecker squares, showing that the chart-based solve generalized well to materials with other reflectance
spectra. The yellow bowl in the still life even contained a fluorescent pigment (appearing yellow under our blue
LED), and its appearance under the reproduced illumination also matched that of the real environment.
Fig. III.17 compares the results of using interpolated observations from all five color charts of the fast
lighting capture device to using the colors from only the front-facing chart, for the indoor environment of Fig.
III.12. The 5-chart example produces slightly richer colors and can be seen to match the original effect of
the lighting in the first row of Fig. III.12, but the effect is subtle. The color charts in Fig. III.17 visualize
the error between the actual appearance of the left-facing and right-facing color charts in the lighting capture
device compared to their appearance under the reproduced illumination, showing slightly better matches for the
5-chart result. In this case, observing only a front-facing color chart achieves acceptable multispectral lighting
reproduction.
(Bar chart: average error relative to white chart patch intensity, 0-5%, reported per color channel for the LLS Fast, NNLS Fast, and NNLS chart-panorama solves, separately for the ColorChecker patches and the fabric swatches, under the multispectral fluorescent and incandescent illumination environment of the still life scene.)
Figure III.16: Quantitative experimental error plot for Fig. III.15, for the LLS and NNLS solves (RGBCAW), based on the average squared errors of raw pixel values from all 30 ColorChecker Nano patches and, separately reported, our 12 fabric swatches. “Fast” refers to the spheres-and-charts lighting capture method of Sec. III.3.5, while no designation refers to the chart panorama technique of Sec. III.3.2.
Figure III.17: Comparison of single-chart reconstruction versus five-chart reconstruction (columns, left to right: 1-chart result, 5-chart result, left-facing chart error for 1 and 5 charts, right-facing chart error for 1 and 5 charts).
Fig. III.18 shows each subject in real and reproduced outdoor illumination using RGBCAW lighting, where
compositing is used to place the subject in the light stage on top of the original background. The alpha matte
was obtained by taking a second silhouetted photograph a fraction of a second after the first, with the LED lights
off, and the foamcore background lit brightly with a pair of flashes. The results are images where it is easy to
believe that the subject is really in the environment instead of in the light stage, and the lighting reproduction is
close to the original.
Figure III.18: The left images show subjects photographed in real outdoor lighting environments. The lighting was captured with panoramic HDR images and color charts, allowing six-channel multispectral lighting reproduction. The right images show the subjects photographed inside the multispectral light stage under reproductions of the captured lighting environments, composited into background photos, producing close matches to the originals. (Columns, left to right: real photo, light stage composite, lighting, real photo, light stage composite.)
Fig. III.19 shows our technique applied to an image sequence with dynamic lighting, where an actor is rolled
through a set with fluorescent, incandescent, and LED lights with various colored gels applied. We reconstructed
the full dynamic range of the illumination using the chrome and black sphere reflections and generated a dynamic
RGBCAW lighting environment using the fast capture technique of Sec. III.3.5. The lighting was played in the
light stage as the actor repeated the performance in front of a green screen. Finally, the actor was composited
into a clean plate of the set. The real and reproduced lighting scenarios are similar, although differences arise
from discrepancies in the actor’s pose and some spill light from the green screen, especially under the hat, and
some missing rim light blocked by the green screen. We ensured that the automated matting technique did not
alter the colors of the foreground element, and better matte lines could be achieved by a professional compositor.
The limited lighting resolution of the LED sphere can be seen in the shadows of the hat on the face, though not
much in the eye reflections. Overall, the lighting is effectively reproduced from the original set to the LED
sphere.
Figure III.19: Frames from an image sequence where a subject moves through an environment, causing dynamic lighting from fluorescent, incandescent, and LED lights with various gels and diffusers (columns, left to right: real environment, lighting capture, lighting reproduction, final composite). The last column composites the subject onto a clean plate of the set.
III.5 Discussion
A strength of our technique is that it does not require measurement, knowledge, or estimation of any spectral
illumination or reflectance functions; we just need to know how our camera sees a color chart lit by the desired
section of the environment and lit by each of the differently colored LEDs. We could alternately have measured
the spectra of the LED sources as well as the spectrum of every lighting direction in the environment, and then
approximated the incident illumination spectra using linear combinations of the LEDs. Such a technique would
require spectral measurement equipment and would likely be cumbersome for lighting capture.
Alternatively, to avoid omnidirectional spectral illumination capture, we could use the approach of Shi et
al. [123] to estimate illuminant spectra directly from color chart appearances once the HDRI maps were fused
with color chart observations as in Sec. III.3.4, and then approximate these spectra with linear combinations
of LEDs. This approach still requires color chart spectral reflectance measurements and, also, has only been
shown to accurately reconstruct smooth illuminant spectra. In contrast, we showed good color matching even
for a spiky fluorescent source (Figs. III.4, III.3).
In either alternative approach with explicit spectral modeling, errors made in approximating the illuminant
spectra with the LEDs will not necessarily be minimal with respect to how the camera will see materials lit by
these spectra. Since we optimize directly so that the LEDs illuminate the color charts the way they appeared in
the original lighting, there is theoretically no better lighting reproduction solution to be had from the observations
available.
III.6 Future Work
We currently consider only the visible spectrum of light. Nonetheless, the ability to record and reproduce
hyperspectral illumination with IR and UV components could have applications in medicine, garment design,
and night vision research. Adding hyperspectral sensors and chart patches (and LEDs as in Ajdin et al. [11])
could extend the technique into this domain.
Since the ColorChecker has only nineteen distinct spectra, it may fail to record information in certain areas
of the spectrum where an item, such as a particular garment, has interesting spectral detail. In this case, a
sample of the item could be added to the color chart, and the L matrices and lighting capture could proceed with
this additional spectral detail taken into consideration. Furthermore, if a particular item’s colors are not being
reproduced faithfully, its weight in the non-negative least squares solve could be increased, improving its color
rendition at the expense of the other patches.
While our technique tailors the lighting reproduction to look correct to the camera, it does not specifically
attempt to replicate the effects of the illumination as seen by the human eye. Empirically, for RGBCAW and
RGBW lighting, the subject does appear similarly lit to their original lighting (though it is often dimmer than
the original environment), and RGB-only solves look even worse to the eye than to the camera for which the
solution has been optimized. We could tailor the lighting reproduction to the human eye by constructing the L matrices for the color chart and LEDs in the photometric XYZ color space: measuring each color chart square lit by each LED color with a spectroradiometer and building the L matrices from the reported XYZ values. We could also optimize for multiple simultaneous observers, such as for both a photographer and their camera.
III.7 Conclusion
In this chapter, we have presented a practical way to reproduce complex, multispectral lighting environments
inside a light stage with multispectral light sources. The process is easy to practice, since it simply adds a small
number of color chart observations to traditional HDR lighting capture techniques, and the only calibration
required is to observe a color chart under each of the available LED colors in the sphere. The technique produces
visually close matches to how subjects would actually look in real lighting environments, even with as few
as four available LED spectra (RGB and white), and can be applied to dynamic scenes. The technique may
have useful applications in visual effects production, virtual reality, studio photography, cosmetics testing, and
clothing design.
Chapter IV
Optimal LED Selection for Multispectral Lighting Reproduction
IV.1 Summary
In chapter III, we used six spectral channels for image-based lighting reproduction, although even using four-
channel lighting with RGBW LEDs yielded improved results compared with RGB-only illumination. In this
chapter, we provide a theoretical analysis to show the sufficiency of using as few as five LEDs of distinct spectra
for color-accurate multispectral lighting reproduction and solve for the optimal set of five from 11 such commer-
cially available LEDs. We leverage published spectral reflectance, illuminant, and camera spectral sensitivity
datasets to show that two approaches of lighting reproduction, matching illuminant spectra directly and matching
material color appearance observed by one or more cameras or a human observer, yield the same LED selec-
tions. Our proposed optimal set of five LEDs includes red, green, and blue with narrow emission spectra, along
with white and amber with broader spectra.
IV.2 Introduction
Both Wenger et al. [145] and chapter III extended lighting reproduction to the multispectral domain, considering
light sources comprised of more than just RGB LEDs, for the single light source and omnidirectional lighting
scenarios respectively. Intuitively, adding more individually-controllable spectral channels to a light source
should improve its color rendition capability, but each additional spectral channel also adds complexity. As such,
a minimal, sufficient set of LEDs may be of interest to visual effects practitioners and luminaire manufacturers.
As previously described, Debevec et al. [32] and Hamon et al. [58] reproduced omnidirectional lighting
environments using only RGB LEDs, with computer-controllable light sources and LED flat panels respectively,
but neither evaluated color rendition or optimized LED selection based on emission spectra. Wenger et al. [145],
seeking to cover the gaps between the emission spectra of RGB LEDs for a single light source, demonstrated
improved color rendition for skin and the squares of a color chart when using a nine-channel multispectral light
instead of an RGB source but did not discuss a minimal, sufficient LED set. As bright yellow LEDs were not
then available, they filtered two white LEDs of their nine-channel source to produce dim yellow light. Although
multispectral illumination can be achieved by selectively filtering broad spectrum white light, we consider this
approach out of scope for our analysis since the overall quantity of light after filtering will be far lower for the
same input power, although the same optimization approaches can be applied. In chapter III, we reproduced
omnidirectional lighting in a six-channel, multispectral light stage with RGB, cyan (C), amber (A), and white
(W) LEDs. When using only RGBW LEDs, color rendition was improved for skin, the squares of a color chart,
and various fabric samples, as compared with RGB-only lighting. Even closer color matches were achieved
when using all six multispectral (RGBCAW) LEDs. However, the LED selection process was not justified,
beyond the implicit goal of again covering the spectral gaps between the RGB LEDs. Our results of chapter III
and those of Wenger et al. [145] motivate our current goal of selecting a minimal, sufficient set of LEDs for
multispectral lighting reproduction.
In this chapter, we approach the selection of such a set of LEDs by evaluating the color rendition capabilities
of different LED combinations. We evaluate color matches for various materials of diverse reflectance spectra
as viewed under different real-world illuminants by multiple observers. We find the optimal set of LEDs from
an initial set of 11 commercially-available LEDs with distinct, visible light emission spectra. Along with white
(W), we consider the 10 Lumileds Luxeon Rebel ES colors: Royal Blue (Y), Blue (B), Cyan (C), Green (G),
Lime (L), PC Amber (P), Amber (A), Red-Orange (O), Red (R), and Deep Red (D), with spectra in Fig. IV.1.
Figure IV.1: Spectra of the 11 Lumileds Luxeon LEDs evaluated: Royal Blue (Y), Blue (B), Cyan (C), Green (G), Lime (L), PC Amber (P), Amber (A), Red-Orange (O), Red (R), Deep Red (D), and White (W).
We first determine the number k of LEDs of distinct spectra such that adding an additional LED no longer
meaningfully improves lighting reproduction, and we next determine the optimal subset of k such LEDs to use
for reproducing diverse and complex illuminants.
We solve for a minimal, sufficient set of LEDs using two different approaches from Wenger et al. [145]:
Spectral Illuminant Matching (SIM) and Metameric Reflectance Matching (MRM). With SIM, we compute a
linear combination of individual LED emission spectra that best matches a target illuminant spectrum in a least-
squares sense. With MRM, we compute a linear combination of individual LED emission spectra that best
matches the appearance of a color chart as observed by one or more cameras, or the standard human observer.
From our analysis, our contributions are the following:
(i). We show that using more than five LEDs of distinct spectra yields diminishing returns for color rendition
in multispectral lighting reproduction, for the diverse illuminants, materials, and observers we consider.
(ii). We propose an optimal set of five LEDs which includes: red, green, and blue LEDs with narrow emission
spectra, plus white and amber LEDs with broader emission spectra.
IV.3 Methods and Equations
Wenger et al. [145] described three optimization approaches for evaluating the performance of lighting repro-
duction systems. In this section, we summarize them and describe their applicability to computing an optimal
combination of LEDs for multispectral lighting reproduction. For each possible LED combination, we solve for
relative spectral channel intensities α_k using these approaches, and we then evaluate the color rendition performance using the accumulated error from each optimization approach. We seek to reproduce lighting with only positive spectra, so we obtain α_k ≥ 0 for all k by using non-negative least squares solves.
Throughout our analysis, we assume a spectral camera model, with pixel values p_j generated by integrating over all wavelengths the modulation of an illuminant spectrum I(λ) by the reflectance spectrum R(λ) of a material, by the camera spectral sensitivity C_j for the observer's jth color channel:

p_j = \int_{\lambda} I(\lambda)\, R(\lambda)\, C_j(\lambda)\, d\lambda \qquad (IV.1)
We compute pixel values using a discrete approximation of Eq. IV.1, where i is the index over the wavelength samples, dictated by measurement spectral resolution:

p_j = \sum_i I_i R_i C_{j,i} \qquad (IV.2)
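As a minimal illustration of Eq. IV.2, with every spectrum tabulated at the same wavelength samples (e.g., 400-700 nm in 10 nm steps):

import numpy as np

def pixel_value(illuminant, reflectance, sensitivity):
    # All arguments: 1-D arrays over the shared wavelength samples.
    # Returns the scalar response p_j of one color channel (Eq. IV.2).
    return np.sum(illuminant * reflectance * sensitivity)

# rgb = [pixel_value(I, R, C[j]) for j in range(3)]  # tristimulus response of one material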
IV.3.1 Spectral Illuminant Matching
Using Spectral Illuminant Matching (SIM), the goal is to directly match a target illumination spectrum. We
compute a linear combination of individual LED emission spectra that best matches a target illuminant spectrum
in a least-squares sense. If two illuminants have the same spectra, then they will produce identical color appear-
ances for all materials as seen by all observers. While this method maintains the observer-independence of LED
selection, Wenger et al. [145] demonstrated for a nine-channel multispectral light source that SIM yielded the
poorest color rendition of the various approaches. Finding the relative spectral channel intensities α_k given a target illuminant spectrum I is formulated as a least squares problem, considering the kth LED with emission spectrum I_k:

\arg\min_{\alpha} \sum_i \Big( \sum_k \alpha_k I_{k,i} - I_i \Big)^2 \quad \text{subject to } \alpha_k \ge 0 \;\forall k \qquad (IV.3)
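Eq. IV.3 is a standard non-negative least-squares problem; a sketch using SciPy, with the LED emission spectra stacked as the columns of a matrix (array names hypothetical):

import numpy as np
from scipy.optimize import nnls

def sim_weights(led_spectra, target):
    # led_spectra: (n_wavelengths, n_leds) emission spectra I_k as columns.
    # target: (n_wavelengths,) illuminant spectrum I.
    alpha, rnorm = nnls(led_spectra, target)   # alpha_k >= 0 minimizing Eq. IV.3
    return alpha, rnorm ** 2                   # nnls returns the residual norm; square it for SSE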
IV.3.2 Metameric Illuminant Matching
Using Metameric Illuminant Matching (MIM), the goal is to match the color rather than the spectrum of a target
illuminant, which requires knowledge or assumptions about the spectral sensitivity functions of the observer.
Generating light that is metameric to a target illuminant only ensures that materials with spectrally-flat, neutral
reflectance spectra will appear the same color under both illuminants to the target observer. Producing metameric
light is also formulated as a least-squares problem, where C_j represents the spectral sensitivity of the observer's jth color channel:

\arg\min_{\alpha} \sum_j \Big( \sum_i C_{j,i} \sum_k \alpha_k I_{k,i} - \sum_i I_i C_{j,i} \Big)^2 \quad \text{subject to } \alpha_k \ge 0 \;\forall k \qquad (IV.4)
IV.3.3 Metameric Reflectance Matching
As the goal of a lighting reproduction system is color rendition, Metameric Reflectance Matching (MRM) seeks
to directly optimize the relative LED weights α_k for a multispectral light source to best match color appearances, for particular materials of interest, with known or measured reflectance spectra, and observer(s) with spectral sensitivities C_j. For n materials with known reflectance spectra R_n, producing light that affords material color matches is again formulated as a least-squares problem:

\arg\min_{\alpha} \sum_n \sum_j \Big( \sum_i R_{n,i} C_{j,i} \sum_k \alpha_k I_{k,i} - \sum_i R_{n,i} I_i C_{j,i} \Big)^2 \quad \text{subject to } \alpha_k \ge 0 \;\forall k \qquad (IV.5)
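Eq. IV.5 can likewise be assembled into one non-negative least-squares system whose rows enumerate every (material, color channel) pair; a sketch under the same tabulated-spectra assumption (array names hypothetical):

import numpy as np
from scipy.optimize import nnls

def mrm_weights(led_spectra, illuminant, reflectances, sensitivities):
    # led_spectra: (n_wl, n_leds); illuminant: (n_wl,)
    # reflectances: (n_materials, n_wl); sensitivities: (n_channels, n_wl)
    rows, targets = [], []
    for R in reflectances:
        for C in sensitivities:
            rows.append((R * C) @ led_spectra)           # sum_i R_i C_i I_{k,i} for each LED k
            targets.append(np.sum(R * illuminant * C))   # sum_i R_i I_i C_i
    alpha, _ = nnls(np.vstack(rows), np.asarray(targets))
    return alpha

# Including a neutral gray patch among the reflectances folds the MIM constraints
# into this solve, as noted in the following paragraph.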
For MIM, the modulation of a target illuminant by the spectral sensitivities of one tristimulus observer yields
only three equations, so we can theoretically only solve for optimal weights α_k for three-channel light sources.
However, Wenger et al. [145] found a minimum norm, positive solution for a nine-channel source. Rather than
solve this under-constrained MIM problem, we instead observe that the MIM constraints may be included in the
MRM solve if a material with a flat reflectance spectrum, such as any neutral grayscale square of a color chart,
is included. Such a spectrum modulates the incident illumination spectrum only by an overall, wavelength-
independent scale factor. Similarly, if we sought to include SIM constraints in the MRM solve, we could design
a set of theoretical reflectance spectra as pulse functions at increments equivalent to our measurement resolution
(10 nm). Minimizing MRM error using a set of such functions is equivalent to minimizing SIM error.
Figure IV .2: Circuit board with 11 LEDs of distinct spectra (some duplicates).
IV.3.4 Datasets
Towards an observer-agnostic optimal LED selection, we evaluate MRM color error for various LED combi-
nations by using a database of spectral sensitivity functions for 28 cameras [71] and include the CIE 1931 2° standard observer. For test illuminants, we consider CIE illuminant A (tungsten lighting), D65 (average day-
light), and F4 (fluorescent, used for calibrating the CIE Color Rendering Index, with CRI = 51). Since light
sources of a light stage frequently reproduce indirect illumination, we also consider D65 modulated by mea-
sured reflectance spectra of grass and sand from the USGS Digital Spectral Library [24]. We use these materials
because of their frequent occurrence in natural scenes, but other materials may be of interest, depending on
the lighting environment. We also consider diffuse skylight, which we measured at midday using a Photo Re-
search PR-650 spectroradiometer. We use the 24 reflectance spectra of the X-Rite ColorChecker™ chart for
MRM [101]. We also evaluate MRM color error for different skin tones using a database of 120 skin reflectance
spectra, with measurements from the forehead and cheeks of 40 subjects [139]. As the spectral resolution and
sampling extent vary across datasets, we resample each spectrum with trapezoidal binning between 400 and 700
nm to the coarsest resolution, 10 nm. As camera spectral sensitivity functions are typically broad, re-sampling
error should be small. We affix each LED onto a custom circuit board (Fig. IV .2) and measure the emission
spectra (Fig. IV .1) using the PR-650 spectroradiometer, though manufacturer-supplied spectral data could also
be used.
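A sketch of this resampling step, reading trapezoidal binning as averaging each spectrum over 10 nm bins with the trapezoid rule (the thesis' exact binning convention may differ slightly):

import numpy as np

def resample_10nm(wavelengths, values, lo=400.0, hi=700.0, step=10.0):
    # wavelengths, values: the original sampling of one spectrum.
    centers = np.arange(lo, hi + step, step)
    binned = np.empty_like(centers)
    for n, c in enumerate(centers):
        a, b = c - step / 2.0, c + step / 2.0
        grid = np.linspace(a, b, 11)                    # fine grid spanning the bin
        samples = np.interp(grid, wavelengths, values)  # linear interpolation of the spectrum
        binned[n] = np.trapz(samples, grid) / (b - a)   # bin average via the trapezoid rule
    return centers, binned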
IV.4 Results
IV.4.1 Spectral Illuminant Matching
Fig. IV .3 shows the SIM sum of squared errors (SSE) (400-700 nm) for the normalized test illuminants, for
several LED combinations. Optimal weights α_k are determined for each LED combination for each illuminant.
We report the theoretical error using all 11 LEDs, the minimal error combinations of six and five LEDs, RGBW,
and RGB only. As there are 462 combinations of six LEDs to construct from 11 LEDs, we always include
RGBW in our sets of five or six LEDs, as these are useful for other light stage applications such as high resolution
facial scanning. We therefore evaluate 21 LED combinations, varying two LEDs of a six-channel source.
(Bar chart: SSE over the visible spectrum, 0-0.05, for illuminants A, D65, F4, Skylight, D65×Grass, and D65×Sand, comparing 11 LEDs; six LEDs (RGBW+CP, RGBW+LP, RGBW+LA, RGBW+PA); five LEDs (RGBW+A, RGBW+P, RGBW+L); RGBW; and RGB.)
Figure IV.3: Spectral Sum of Squared Errors, 400-700 nm, using SIM for six different illuminants, for various combinations of LEDs.
The black bars of Fig. IV .3 represent the theoretically minimal SSE when using all 11 LEDs of distinct
spectra. D65, Skylight, and the D65 modulated spectra are reproduced comparably with 4 (RGBW) and 11
LEDs. The five-channel RGBWP solve reproduces illuminants A and F4 comparably to the 11-channel solve.
IV.4.2 Metameric Reflectance Matching
Fig. IV .4 shows the theoretical MRM average error for computed tristimulus values of the color chart, relative
to the white square values. For each LED combination, we solve for optimal weights α_k for each observer separately and report the average color error across the 24 chart squares and the 29 observers (28 cameras and the human observer). Therefore, we constrain that the LEDs used must be consistent across observers, but the LED intensities α_k may vary from observer to observer.
(Bar chart: average error relative to white chart patch intensity, 0-4.5%, for the same LED combinations as in Fig. IV.3, for illuminants A, D65, F4, Skylight, D65×Grass, and D65×Sand.)
Figure IV.4: Average squared error of tristimulus values from all 24 color chart squares for six different illuminants, using Metameric Reflectance Matching. Each bar represents the average error across 29 observers.
Five-channel RGBWA and RGBWP solves again have similar error to the full 11-channel solve for MRM,
with average error under 0.5% for illuminants A, D65, Skylight, and the D65 modulated spectra, and under 1%
for illuminant F4. For a given LED arrangement and illuminant, average color errors for various observers are
similar.
Although the reflectance spectra of the color chart are designed to replicate various real-world material
spectra [101], skin tones are of particular interest for lighting reproduction. Fig. IV .5 shows the theoretical
average MRM color error for 120 skin tones, corresponding to the forehead and cheek reflectance spectra from
40 subjects [139], relative to color chart white square values. For Fig. IV.5, we solve for optimal LED intensities α_k using only the 24 ColorChecker reflectance spectra for MRM but independently report error for skin spectra
(not included in the minimization).
We also separately solve for optimal LED intensities α_k when including these 120 spectra in the optimization, which reduces average skin tone color error to under 1% even for RGB-only lighting for each illuminant, at the expense of increased color error for the color chart squares. The color rendition for the color chart squares is 0.1% worse on average and at most 0.6% worse, adding to the errors presented in Fig. IV.4.
Figure IV .5: Average squared error of tristimulus values computed from 120 skin reflectance spectra for six
different illuminants, using Metameric Reflectance Matching solves from Fig. IV .4. Each bar represents the
average error across 29 observers.
Notably, SIM and MRM each produce an identical optimal set of five LEDs (RGBWP) for the illuminants
and observers considered, and both error metrics show diminishing returns for spectral matching or color rendi-
tion when including more than five spectral channels in a light source.
While for the experiments presented in Fig. IV.4 we tailor the light intensities α_k to each observer, we also solve for one optimal set of intensities α_k for each illuminant for all observers and again report the average error
across all chart squares and observers. Fig. IV .6 shows the theoretical difference in the MRM color error as a
function of the solving method (SIM, all-observers MRM, single-observer MRM). The single-observer MRM
solves yield lower color error than the all-observer MRM solves, but the all-observer MRM error is still low,
owing to the general similarity of camera spectral sensitivity functions.
We also present qualitative results for color rendition using the color chart MRM for the six illuminants
for the CIE 2° standard observer in Fig. IV.7. The background squares represent the ground truth color chart
appearances under the six illuminants, and the foreground circles represent the color chart appearances as illu-
minated by the MRM optimized lighting reproduction using 11 LEDs, RGBWP, RGBW, and RGB-only. All
(Bar chart: average error relative to white chart patch intensity, 0-14%, for the 11-LED, RGBW+P, and RGBW combinations, each using single-observer MRM, all-observers MRM, and SIM solves, for illuminants A, D65, F4, Skylight, D65×Grass, and D65×Sand.)
Figure IV.6: Average squared error of tristimulus values from all 24 color chart squares for six different illuminants, using SIM, MRM (single-observer), and MRM (all-observers). Each bar represents the average error across 29 observers.
pixel values are computed using Eq. IV .2, and the XYZ tristimulus values are converted to sRGB for display.
RGBW lighting produces visually accurate color rendition for illuminants D65, Skylight, and D65 modulated
by grass and sand reflectance spectra. Adding the PC Amber LED yields visually accurate color rendition for
the remaining illuminants, A and F4.
In Fig. IV .8, we compare the spectra from the human observer color chart MRM lighting reproduction with
11 LEDs, RGBWP, RGBW, and RGB-only with the six ground truth illuminant spectra re-sampled at 10 nm
resolution (solid black lines). The spectra of Fig. IV .8 therefore correspond to those used to generate the color
charts of Fig. IV .7. While the reproduced lighting spectra do not exactly match the ground truth illuminant
spectra, even with 11 spectral channels, the LED reproduced lighting still yields close visual matches (Fig.
IV .7).
Using the optimal LED intensities α_k computed for the CIE 2° standard observer color charts of Fig. IV.7,
we also present qualitative results for color rendition for a selection of 24 skin reflectance spectra in Fig. IV .9.
These 24 spectra were selected from the original set of 120 so as to evenly cover the possible values for total light
reflected across the visible spectrum. Again, background squares represent the ground truth skin appearances
(Columns: Illuminant A, Illuminant D65, Illuminant F4, Skylight, D65×Grass, D65×Sand. Rows: all 11 LEDs, RGBWP, RGBW, RGB.)
Figure IV.7: Color matching results using Metameric Reflectance Matching for different direct and indirect illuminants for the CIE 2° standard observer, with illumination reproduced using various LED combinations. The
background squares represent the ground truth computed color chart appearances under the target illuminants,
while the foreground circles represent the computed color chart appearances under LED-reproduced illumina-
tion. Row 1 shows the lighting reproduction results using all 11 LEDs of distinct spectra. Row 2 shows results
using 5 LEDs only (RGB, White and PC Amber). Row 3 shows results using RGBW only. Row 4 shows the
result using RGB only. XYZ tristimulus values are converted to the sRGB color space for display. RGBW
lighting produces accurate color rendition for D65, Skylight, and D65 modulated by grass and sand reflectance
spectra. Adding PC Amber yields accurate color rendition for illuminants A and F4.
(Plots: relative power versus wavelength, 400-700 nm, for Illuminant A, Illuminant D65, Illuminant F4, Skylight, D65×Grass, and D65×Sand; each panel compares the 11-LED, RGBWP, RGBW, and RGB reproductions against the actual illuminant spectrum.)
Figure IV.8: Spectra produced using Metameric Reflectance Matching for different direct and indirect illuminants for the CIE 2° standard observer, using various LED combinations. Solid black lines represent the ground truth illuminant spectra re-sampled at 10 nm resolution.
under the six illuminants, and foreground circles represent appearances as illuminated by the MRM optimized
lighting reproduction using 11 LEDs, RGBWP, RGBW, and RGB-only. RGBW lighting produces visually
accurate color rendition for illuminants A, D65, Skylight, and D65 modulated by the sand reflectance spectrum.
Adding the PC Amber LED yields visually accurate color rendition for the remaining illuminants, F4 and D65
modulated by grass reflectance spectrum.
(Columns: Illuminant A, Illuminant D65, Illuminant F4, Skylight, D65×Grass, D65×Sand. Rows: all 11 LEDs, RGBWP, RGBW, RGB.)
Figure IV.9: Color matching results using Metameric Reflectance Matching for different direct and indirect illuminants for the CIE 2° standard observer, with illumination reproduced using various LED combinations.
The background squares represent the ground truth computed skin tone appearances under the target illuminants,
while the foreground circles represent the computed skin tone appearances under LED-reproduced illumination.
Row 1 shows the lighting reproduction results using all 11 LEDs of distinct spectra. Row 2 shows results using
5 LEDs only (RGB, White and PC Amber). Row 3 shows results using RGBW only. Row 4 shows the result
using RGB only. XYZ tristimulus values are converted to the sRGB color space for display. RGBW lighting
produces accurate color rendition for A, D65, Skylight, and D65 modulated by the sand reflectance spectrum.
Adding PC Amber yields accurate color rendition for illuminants F4 and D65 modulated by the grass reflectance
spectrum.
IV.5 Future Work
The experiments of this chapter describe the theoretically achievable color rendition capabilities of various com-
binations of LEDs, considering specific illuminants, materials, and observers. While we have shown that a
five-channel RGBWP light source should be theoretically sufficient for multispectral lighting reproduction, it
would be interesting to validate these results with physical measurements, to compare theoretical and experi-
mental error as in prior work [145] and chapter III. Such a validation would require the physical existence of
particular illuminants, and so the direct image-based MRM optimization approach of chapter III (Eqn. III.1)
could be employed, which does not rely on spectral measurements.
Additionally, visual effects practitioners may have particular interest in digital cinema cameras such as the
RED Epic and the ARRI Alexa, which were not included in the camera database [71]. The same optimization
approaches could be applied to ensure the optimal LED selection for these cameras if their spectral sensitiv-
ities were known or measured. Without knowledge of the camera spectral response, the image-based MRM
minimization approach of chapter III could again be employed.
While our optimization approach did not evaluate the overall brightness of the light produced by each com-
bination of LEDs for a given power input, it is well known that LEDs of different spectra have varied luminous
efficiency. Future work could optimize for luminous efficiency in tandem with color rendition performance and
identify potential trade-offs between these goals.
Although multispectral lighting reproduction produces visually accurate color rendition without the need for
color correction in post-production, camera raw pixel values produced by the sensor will inevitably have a color
matrix applied to them for display as part of a typical color workflow. A 3×3 color matrix allows for linear color
channel mixing, in effect producing a different theoretical camera sensor. Another direction for future work
could include optimizing LED selection based on both luminous efficiency and color rendition performance,
while allowing for a post-processing color matrix step.
IV.6 Conclusion
We have demonstrated that material color appearance under various direct and indirect illuminants may be
accurately matched using as few as five LEDs of distinct spectra for multispectral lighting reproduction: red,
green, blue, white, and broad-spectrum PC Amber. Using more than these five LEDs of distinct spectra yields
diminishing returns. Spectral illuminant matching and color appearance matching via metameric reflectance
yield the same optimal set of five LEDs for the illuminants and observers considered.
Chapter V
Efficient Multispectral Image-Based Relighting
V.1 Summary
In chapter III, we described how to computationally light an actor in a studio with illumination photographed
in the real world, extending image-based lighting reproduction to the multispectral domain. In this chapter,
we similarly introduce a multispectral extension to image-based relighting (IBRL). This technique renders the
appearance of a subject in a novel lighting environment as a linear combination of the images of its reflectance
field, the appearance of the subject lit by each incident lighting direction. Traditionally, a tristimulus color cam-
era records the reflectance field as the subject is sequentially illuminated by broad-spectrum white light sources
from each direction. Using a multispectral LED sphere and either a tristimulus (RGB) or monochrome camera,
we photograph a still life scene to acquire its multispectral reflectance field – its appearance for every light-
ing direction for multiple incident illumination spectra. For the tristimulus camera, we demonstrate improved
color rendition for IBRL when using the multispectral reflectance field, producing a closer match to the scene’s
actual appearance in a real-world illumination environment. For the monochrome camera, we also show close
visual matches. We additionally propose an efficient method for acquiring such multispectral reflectance fields,
augmenting the traditional broad-spectrum lighting basis capture with only a few additional images equal to
the desired number of spectral channels. In these additional images, we illuminate the subject by a complete
sphere of each available narrow-band LED light source, in our case: red, amber, green, cyan, and blue. From
the full-sphere illumination images, we promote the white-light reflectance functions for every direction to mul-
tispectral, effectively hallucinating the appearance of the subject under each LED spectrum for each lighting
direction. We also use polarization imaging to separate the diffuse and specular components of the reflectance
functions, spectrally promoting these components according to different models. We validate that the approxi-
mated multispectral reflectance functions closely match those generated by a fully multispectral omnidirectional
lighting basis, suggesting a rapid multispectral reflectance field capture method which could be applied for live
subjects.
V.2 Introduction
Image-based relighting (IBRL) techniques [104, 30] allow computer graphics practitioners to compute the ap-
pearance of a subject under novel lighting conditions from how it appears in a set of basis illumination images.
With image-based lighting environments [27], the scene can then be rendered to appear as it would in a real-
world environment for realistic compositing into a background photograph. Typically, the appearance of the
subject lit from each incident lighting direction, called a 4D reflectance field [30], is recorded with an omnidi-
rectional lighting sphere comprised of individual white LED light sources. However, white LEDs have emission
spectra different from many target illumination spectra commonly found in the real world, such as incandescent
or fluorescent illumination, or indirect illumination such as bounce light from vegetation or a brick wall.
The spectral mismatch between the light sources used to generate lighting basis images and the various
direct and indirect light sources captured in image-based lighting (IBL) environments has largely been ignored,
since the final values output by rendering systems are tristimulus RGB pixel values for displays. Typically, the
lighting basis images are first white-balanced with a diagonal color matrix, and then the RGB pixel values of
the environment lighting are used to weight the individual color channels of each basis image. This tristimulus
scaling approach is prone to producing color errors in the final relit result, because it does not consider spectral
rendition differences. In tristimulus imaging systems, RGB pixel values are produced by integrating over all
wavelengths the fully spectral modulation of the reflectance spectrum of a material by the incident illumination
spectrum and the camera spectral sensitivities. Accordingly, a simple white balance operation, or any linear color
channel mixing, usually cannot completely correct for material color appearance mismatches across spectrally-
different illumination environments.
As an extreme example, suppose that we wish to simulate the appearance of a blue-green material under
monochromatic sodium illumination. During lighting capture, the 589nm sodium emission line will likely fall
within the spectral sensitivity of both the camera's red and green pixels and produce a positive response for
those pixels, but not the blue, producing a yellow color as expected. During reflectance capture, the blue-green
material, lit by white LED light, would produce positive response for the green and blue pixels, but not the red.
In the IBRL process, the RGB color of the light is multiplied by the RGB color of the material seen under white
LED light, yielding zero for red and blue and a positive value for green, indicating green reflectance. However,
this represents a significant color mismatch, since the color of the monochromatic sodium light reflecting from
the blue-green material should be, if anything, the yellow sodium color. Thus, performing IBRL in the RGB
domain can in theory lead to significant errors in color rendition.
For more accurate color rendition with IBRL, intuition suggests merging the practiced techniques of re-
flectance field capture with multispectral imaging methods. If both the image basis set and the environment
illumination map provided information beyond RGB pixel values, color rendition could theoretically be im-
proved. Various multispectral imaging techniques could augment the RGB data – for instance, we could use an
array of imaging filters placed in front of the camera to gain additional color channels for either case, or, for the
image basis set, use a set of differently colored LEDs with multiple narrow-band spectral power distributions to
illuminate the subject for each lighting basis or direction.
In this work, we record multispectral reflectance fields using an LED sphere (Fig. V .1) comprised of six
different LEDs of distinct spectra: red, green, blue, cyan, amber, and white (RGBCAW), spectra in Fig. III.1.
Once we have acquired a multispectral reflectance field, we can relight the subject according to the multispectral
lighting conditions of any environment, presuming that the lighting environment has been rendered or captured
in a multispectral manner. In practice, we reproduce a lighting environment using each of the multispectral light
sources of the sphere, producing relative intensities of each spectral channel for each lighting direction using the
methods of chapter III.
Figure V .1: A wide-angle image peering into the light stage or LED sphere, with the still life we use for the
results of chapter V placed at the center.
A main advantage of the typical white light spherical lighting basis image acquisition process is the rapid
capture time, which enables visual effects practitioners to acquire such image basis sets for live actors, so that
digital doubles can be relit and realistically composited into any scene. Multiplying the number of lighting basis
functions or lighting directions by the number of desired spectral channels, while theoretically desirable, would
be too slow to practice for live subjects. To avoid lengthy acquisition times, we promote white-LED reflectance
functions to multispectral reflectance functions, requiring only an additional n photographs more than for white
light reflectance field capture, where n is the number of multispectral channels.
Our process is the following:
(i). Acquire white LED reflectance field basis images R(θ, φ, u, v, k=0, j), where (θ, φ) is the incident lighting direction, (u, v) is the pixel coordinate in the image, k is the index of the LED color (where 0 indicates white LED light is used), and j is the color channel of the camera.
(ii). Acquire images of the subject lit by a full sphere of illumination F(u, v, k, j) for each available LED color, where (u, v) is the pixel coordinate, k is the index of the LED color where 0 is the white LED and 1..n are the color LEDs, and j is the index of the color channel of the camera.
(iii). Promote each pixel (u, v) of each lighting direction basis image (θ, φ) to one which includes the response to all LED spectra as R(θ, φ, u, v, k, j) = R(θ, φ, u, v, 0, j) · F(u, v, k, j) / F(u, v, 0, j), as sketched in the code below.
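A sketch of step (iii), assuming the basis and full-sphere images are stored as floating-point arrays (names and shapes hypothetical):

import numpy as np

def promote_reflectance_field(R_white, F, eps=1e-8):
    # R_white: (n_dirs, H, W, n_cam_channels) white-LED basis images R(theta, phi, u, v, 0, j).
    # F: (n_leds + 1, H, W, n_cam_channels) full-sphere images F(u, v, k, j), index 0 = white.
    # Returns (n_dirs, n_leds + 1, H, W, n_cam_channels): the promoted reflectance field.
    ratio = F / (F[0:1] + eps)              # per-pixel spectral ratio F(u,v,k,j) / F(u,v,0,j)
    return R_white[:, None] * ratio[None]   # scale every white-light basis image by the ratio

# Substituting head-light images H for F gives the single-light variant described below.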
If multispectral light sources are not available for each direction of the lighting rig, as would be the case
with those designed to capture only white light reflectance fields, we also show a method of spectral promotion
of reflectance fields basis images that uses only a head light positioned to illuminate the scene from the front.
We also demonstrate spectrally promoting the diffuse and specular components of R independently and sum-
ming their promoted results. This requires photographing the diffuse and specular components separately, or
estimating them from the reflectance functions.
To evaluate multispectral IBRL color rendition, we construct a still life scene with diverse materials of vari-
ous reflectance spectra, and photograph it illuminated by a mixed illumination real-world environment consisting
of diffused incandescent and fluorescent illumination. Using a fully multispectral reflectance field we demon-
strate superior color rendition as compared to using the tristimulus-scaling approach. We also demonstrate that
multispectral reflectance fields enable the use of monochrome cameras for IBRL. Even with our spectrally-
promoted multispectral reflectance field, we show improved color rendition for the still life scene as compared
with tristimulus-scaling IBRL, with very few additional images required at capture time.
In summary, the contributions of this chapter are the following:
(i). To the best of our knowledge, we show the first example of multispectral IBRL.
(ii). We demonstrate experimentally that using a multispectral reflectance field acquired with a tristimulus
(RGB) camera improves IBRL color rendition, as compared with the tristimulus scaling of white light
reflectance field basis images.
(iii). We show that multispectral IBRL enables the use of monochrome cameras for reflectance field acquisi-
tion.
(iv). We propose a fast method of approximate multispectral reflectance field capture, by "colorizing" the white
light reflectance field, and we experimentally assess the accuracy of the approximation.
(v). We propose an alternative method of approximate multispectral reflectance field capture, where diffuse
and specular components are spectrally-promoted with different models.
V.3 Method and Equations
In this section we describe our techniques for acquiring a fully multispectral reflectance field of a subject and
producing an IBRL rendered result using such a dataset. We also describe methods for promoting white light
reflectance functions to multispectral and for radiometrically and colorimetrically calibrating the various parts
of our acquisition system.
V.3.1 Multispectral Reflectance Field Capture
In our multispectral LED sphere, we acquire the appearance of a subject lit by every lighting direction for every
available LED of distinct spectra (RGBCAW). Our LED sphere includes 346 light sources surrounding a subject,
with 30 light sources spaced around the equator with a separation of 12°, with all lighting directions accounted
for excluding a small portion near the bottom of the sphere. Each light source includes one LED of each spectral
channel, with the LEDs spaced around a 3 cm diameter circle. The diameter of the LED sphere is 2.7 m, so
the angular difference between the LEDs of each spectral channel per light source is small. While we use point
lights coming from a dense set of lighting directions on a sphere as our lighting basis, we could also employ any
of the basis conditions typically used in IBRL, such as Hadamard patterns or the spherical harmonic basis.
V.3.2 Spectral Promotion of Reflectance Functions
Although our LED sphere allows for the capture of a fully multispectral reflectance field (each LED color for
each lighting direction), we would ideally like to minimize the number of images required without sacrificing
color rendition or angular resolution. Towards this goal, we evaluate several methods for promoting a white light
reflectance field (just the white LEDs for each lighting direction) to a multispectral reflectance field, for both a
tristimulus (RGB) and monochrome camera. This is especially notable for the monochrome camera case, since
effectively we need to "colorize" the monochrome images lit from each direction in order to generate a color
IBRL image.
Our general technique for spectral promotion is motivated by the fact that a diffuse, Lambertian surface
will reflect the same spectrum of light toward the camera – up to a scale factor – no matter which direction
it is lit from as long as the illuminant remains the same. Thus, we can measure the spectral reflectance of a
point (u, v) on the subject from a reference lighting condition A, and measure only the brightness of the point as lit by a second condition B, and surmise that the spectral reflectance of the subject lit by condition B is the spectral reflectance of the point lit by condition A scaled to have the brightness as when lit by B. We could sensibly choose the reference lighting condition to be a full sphere of even-intensity illumination, since this will minimize (though not eliminate) the appearance of specular reflectance. Thus, if F(u, v, k, j) is the appearance of the subject at each pixel (u, v) under a full sphere of illumination for each LED spectrum k for each camera sensitivity function j, we can write:

R(\theta, \phi, u, v, k, j) = R(\theta, \phi, u, v, 0, j)\, \frac{F(u, v, k, j)}{F(u, v, 0, j)} \qquad (V.1)
However, since many LED sphere systems have only white LEDs, it might not be possible to produce full
spheres of differently colored illumination F(u, v, k, j). In this case, we can use a variant of this method of spectral promotion by adding a single multispectral "head light" to illuminate the scene from the front with each of the desired spectral channels, ideally as a ring light around the camera to minimize shadowing. Thus, if H(u, v, k, j) is the appearance of the subject at each pixel (u, v) under a head light of each LED spectrum k, we substitute H(u, v, k, j) for F(u, v, k, j) and H(u, v, 0, j) for F(u, v, 0, j), writing:

R(\theta, \phi, u, v, k, j) = R(\theta, \phi, u, v, 0, j)\, \frac{H(u, v, k, j)}{H(u, v, 0, j)} \qquad (V.2)
Specular reflections from the head light are more likely to affect the estimation of the subject’s diffuse
spectral reflectance, although cross-polarizing the head light could conceivably alleviate the issue. Additionally,
by lighting the scene only from the front direction, the interreflections observed in the scene will be specific
to the frontal lighting direction, rather than to the more general condition of omnidirectional environmental
illumination. Our results shown later indicate that both methods are generally successful, but produce different
errors in interreflection regions.
V.3.3 Spectral Promotion with Diffuse and Specular Separation
Spectral promotion with Eq. V .1 assumes that the materials in the scene are Lambertian and ignores the effects of
specular reflections. For a particular lighting direction, the spectrum of the light reflected off a non-Lambertian
object towards the camera could be the color of the light source for dielectrics or a different color for conductors.
For each lighting direction, if we can separate the diffuse and specular components of the reflectance, then we
could apply a different spectral promotion model to each component and sum the results, potentially producing
a closer match to the ground truth multispectral reflectance field.
Separating diffuse and specular components could in theory be done in several ways, including polarization
difference imaging as in Ma et al. [98] or Ghosh et al. [49], spherical harmonic frequency separation [136],
chromaticity analysis [119, 30], or a combination of these approaches, e.g., Nayar et al. [103]. In this chapter,
we use polarization difference imaging using vertically and horizontally polarized LEDs as in Ghosh et al. [49].
In addition to the unpolarized multispectral LEDs, each light source in our sphere includes a ring of twelve
white LEDs, with six polarized vertically and six polarized horizontally, allowing us to compute the specu-
lar and diffuse components of each white light reflectance function as in Ghosh et al. [49]. We can pro-
mote the diffuse component of the white light reflectance functions with Eq. V.1, producing R_D(θ, φ, u, v, k, j),
where the subscript D denotes the diffuse component. To compute the spectrally-promoted specular component
R_S(θ, φ, u, v, k, j), where the subscript S denotes the specular component, if p(k, j) is the pixel color of a
spectrally-flat object illuminated by each LED spectrum k for the camera sensitivity function j we can write:

R_S(θ, φ, u, v, k, j) = R_S(θ, φ, u, v, 0, j) · p(k, j) / p(0, j)    (V.3)
Effectively, with Eq. V.3, we tint the white light reflectance function specular component R_S(θ, φ, u, v, 0, j)
by the color of the light source of each LED spectrum k, assuming that the scene materials are dielectrics. This
prevents the specular reflections of dielectric materials from becoming tinted by the underlying diffuse color of
the material.
We can further improve the spectral promotion of the specular component for the tristimulus (RGB) camera
by using chromaticity analysis to separate materials reflecting light of the same color as the incident illumination
from those reflecting light of a different color than the incident illumination. As the color of light reflected
specularly from conductors closely matches that of the diffuse component of the material, we use Eq. V .1 to
spectrally promote the parts of white light specular reflectance functions that do not exhibit behavior typical
of dielectrics, generating R_SC(θ, φ, u, v, k, j), with the subscript SC denoting a specular conductor. In practice,
we compute both R_SC(θ, φ, u, v, k, j) and R_S(θ, φ, u, v, k, j), and produce a total spectrally-promoted specular
reflectance function by linearly interpolating between the images produced by the two models. To control the
interpolation, for the white incident light we compute the pixel color ratios R_w/G_w and R_w/B_w from p(0, j = 1, 2, 3).
For each pixel of a specular white light reflectance function, we compute the pixel color ratios R/G and R/B. If
|R_w/G_w - R/G| = 0 and |R_w/B_w - R/B| = 0, then the color of the specular reflectance function for that pixel is the same
as the light source color, and so we spectrally promote this pixel using R_S(θ, φ, u, v, k, j). If |R_w/G_w - R/G| > t and
|R_w/B_w - R/B| > t, where t is a threshold that we empirically set to 0.3, then we use R_SC(θ, φ, u, v, k, j), interpolating
in all other cases. Exploring other methods of separating conductors from dielectrics in specular reflectance
functions is of interest for future work.
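As an illustration of the specular handling just described, the following Python/NumPy sketch applies the dielectric model of Eq. V.3 and blends it per pixel with the conductor model. The particular linear ramp between the two thresholds is an assumption, since the text only states that the two results are linearly interpolated, and all names and array shapes are illustrative.

import numpy as np

def tint_specular(spec_white, p_k, p_white, eps=1e-6):
    # Eq. V.3: scale the (H, W, 3) white-light specular component by the ratio of the
    # LED-k color to the white-LED color of a spectrally flat reference object.
    return spec_white * (p_k / np.maximum(p_white, eps))[None, None, :]

def blend_specular_models(spec_white, R_S_k, R_SC_k, p_white, t=0.3, eps=1e-6):
    # spec_white: (H, W, 3) specular component under the white LED (RGB camera).
    # R_S_k:      (H, W, 3) dielectric model result for spectrum k (Eq. V.3).
    # R_SC_k:     (H, W, 3) conductor model result for spectrum k (Eq. V.1).
    # p_white:    (3,) color of the spectrally flat reference under the white LED.
    Rw_over_Gw = p_white[0] / max(p_white[1], eps)
    Rw_over_Bw = p_white[0] / max(p_white[2], eps)
    R_over_G = spec_white[..., 0] / np.maximum(spec_white[..., 1], eps)
    R_over_B = spec_white[..., 0] / np.maximum(spec_white[..., 2], eps)
    # Weight 0 where the specular color matches the source color (dielectric),
    # 1 where both chromaticity differences exceed t (conductor), ramped in between.
    d = np.minimum(np.abs(Rw_over_Gw - R_over_G), np.abs(Rw_over_Bw - R_over_B))
    w = np.clip(d / t, 0.0, 1.0)[..., None]
    return (1.0 - w) * R_S_k + w * R_SC_k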
V.3.4 Multispectral Image-Based Relighting
Once we have a multispectral reflectance field, through exhaustive imaging with the omnidirectional multi-
spectral lighting basis or through one of the spectral promotion techniques, we can relight the object with a
multispectral lighting environment. We capture real-world multispectral lighting environments by photograph-
ing a mirrored sphere and an arrangement of five "Nano" ColorChecker™ charts from Edmund Optics using
traditional HDR photography with an RGB camera. Using an image-based metameric reflectance matching ap-
proach as in chapter III, we independently solve for the optimal relative intensity of each spectral channel for
each lighting direction, matching color chart appearances for each narrow cone of the lighting environment.
To optimally reproduce the target chart appearances, we solve for the LED intensity coefficients α_k that
minimize Eq. III.1, where m is the number of color chart patches and n is the number of different LED spectra.
However, rather than driving the LEDs to directly illuminate a subject in a studio environment as in chapter III,
we use these intensities as scaling factors for the measured (or spectrally-promoted) multispectral reflectance
field basis images.
If a 3×3 color matrix M is to be applied to the IBRL result for display, and M unequally weights the color
channels, then the quantity to minimize should be modified, as both the target values P and the reproduced
values Lα will change, and errors for all color channels should be weighted equally in the minimization. Before
constructing P and L, we apply the color matrix M to the RGB triples, producing P_M and L_M.

argmin(||P_M - L_M α||^2)    (V.4)
When using a monochrome camera, LED intensity coefficients α_k must be computed separately for each
color channel, yielding 3n degrees of freedom for each lighting direction. Eq. III.1 is modified for the monochrome
camera case in Eq. V.5. N_ik is the average pixel value of color chart square i under LED spectrum k, and N is the
m × n matrix whose columns correspond to the LED spectrum k and whose rows correspond to the color chart
square i. Eq. V.5 should be minimized for each color channel of the target chart, producing in our case three α
vectors, which are the lighting primaries for that direction of the lighting environment.

argmin Σ_{i=1}^{m} ( P_i - Σ_{k=1}^{n} α_k N_ik )^2 = argmin(||P - Nα||^2)    (V.5)
In the monochrome camera case, as the relative intensities of the spectral channels are optimized indepen-
dently for each color channel of the target chart, applying a 3×3 color channel mixing matrix does not impact
the minimization.
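For reference, a minimal sketch of the per-direction solve for the monochrome camera case (Eq. V.5) is shown below. The non-negativity constraint via scipy's nnls solver is my assumption, since the text specifies only a least-squares minimization, and all variable names are illustrative.

import numpy as np
from scipy.optimize import nnls

def solve_lighting_primaries(P_rgb, N):
    # P_rgb: (m, 3) target color chart patch values for one lighting direction.
    # N:     (m, n) monochrome chart patch values under each of the n LED spectra.
    # Returns an (n, 3) array: one non-negative alpha vector per output color
    # channel, i.e. the lighting primaries for this direction.
    return np.stack([nnls(N, P_rgb[:, c])[0] for c in range(3)], axis=1)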
V.3.5 Measurement Setup and Calibration
We use a tristimulus (RGB) Ximea MQ042CG-CM machine vision camera with 50mm Fujinon lens and a ver-
tical linear polarizer filter for all measurements, including: the capture of a real-world lighting environment
mirrored sphere and color chart photographs, a reference photograph of a subject in this real lighting environ-
ment, the subject’s multispectral reflectance field and multispectral full sphere illumination conditions, and its
horizontally and vertically polarized white light reflectance fields. This Bayer pattern camera produces linearly
encoded 8-bit images of resolution 2048×2048 at frame rates up to 72 frames per second, and its image ac-
quisition can be triggered by an external signal such that the sensor is precisely exposed to each basis lighting
condition individually. Since our experimental still life scene contains highly reflective conductors as well as
dark regions in shadows, we photographed the lighting environment, the appearance of the still life in the real
lighting environment, and all reflectance fields at multiple exposure times, assembling them into HDR images
[27], and scaling each to the correct relative radiance. We measured the radiometrically linear range of pixel
values for this camera to be from 0.1 to 0.85 on the scale of 0.0 to 1.0. The camera has a non-zero black level,
so we subtract an averaged dark current image from each frame for each exposure time prior to demosaicing.
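A simplified sketch of this exposure-bracketed assembly is shown below, assuming linear frames, per-exposure dark frames, and a box weighting function over the measured linear range; the actual weighting used for [27] may differ, and the function and its defaults are illustrative.

import numpy as np

def assemble_hdr(frames, exposure_times, dark_frames, lo=0.1, hi=0.85):
    # frames:         list of (H, W) linear images scaled to [0, 1], one per exposure.
    # exposure_times: matching list of exposure times in seconds.
    # dark_frames:    matching list of averaged dark current images.
    # Pixels outside the radiometrically linear range [lo, hi] are excluded;
    # each remaining sample is scaled to relative radiance by 1 / exposure time.
    num = np.zeros(frames[0].shape, dtype=np.float64)
    den = np.zeros(frames[0].shape, dtype=np.float64)
    for img, t, dark in zip(frames, exposure_times, dark_frames):
        w = ((img >= lo) & (img <= hi)).astype(np.float64)
        num += w * (img - dark) / t
        den += w
    return num / np.maximum(den, 1e-6)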
Although the Lumileds Luxeon Rebel ES multispectral LEDs of our LED sphere are binned in a quality
control process to ensure that all of the LEDs of the same spectral channel have a similar emission spectrum
and overall radiance, we observed LED-to-LED variation in radiance that was greater than measurement-to-
measurement or day-to-day variation. Accordingly, we multiply each color channel of the reflectance field basis
images by a unique brightness scaling factor per LED, such that each image represents the appearance of the
scene as lit by a uniform amount of metameric light from each direction. In practice, calibrating the relative radi-
ance of 3468 lighting conditions (346 light sources, 6 spectral channels, plus the horizontally and vertically
polarized LED groups) in a spherical arrangement is a non-trivial task. We compute these scaling factors by
placing a diffuse, spectrally-flat 33% reflective gray sphere at the center of the LED sphere and photographing
it with the Ximea camera as illuminated by each multispectral and polarized basis lighting condition. For each
lighting basis, we then compute the relative light source intensity for each image color channel using photomet-
ric stereo [150], assuming the surface normals follow the shape of a sphere. Since the light sources aim in from
all directions, we used four cameras placed 90° apart around the equator to photograph the sphere under each
basis lighting condition, interpolating between the photometric stereo’s light intensity values from each camera
according to each light’s distance from the camera.
Given our LED arrangement, our full-sphere illumination conditions should produce about 90× more light
than a single point light source illuminating the scene from the front. In order to properly expose these conditions
with the same exposure bracketing used for the lighting basis images, we dim the LEDs for the full sphere
condition using pulse-width modulation (PWM). To achieve the correct relative radiance for the full sphere
spectral promotion of reflectance fields, we need to ensure that F(u, v, k, j) / F(u, v, 0, j) = R(u, v, k, j) / R(u, v, 0, j). For each spectral channel,
we sum the scaled lighting basis images of the diffuse gray sphere, comparing the value with PWM dimmed
full sphere condition. We observed that summing across the basis images for each lighting direction produced
a diffuse sphere image that was on average 1.2× brighter than the photographed full sphere condition for the
RGBCW and polarized LEDs, and 1.14× brighter for the amber LED, after adjusting for the PWM, due to the
increased current draw required to power all the light sources simultaneously.
We measured the reflectivity of our mirrored sphere using the technique in Reinhard et al. [113], ch. 9, and
found it to differ slightly for each color channel (R = 60.7%, G = 61.6%, B = 63.4%), so we scale the captured
mirrored sphere image of the target lighting environment accordingly to the same relative radiance
per color channel.
As our experiments require that we not move the camera, we could not test multispectral IBRL for both a
tristimulus and true monochrome camera simultaneously as desired. Instead, after acquiring the reflectance fields
and full sphere illumination conditions with our tristimulus camera, we compute artificial monochrome images
as M = 0.986R + 0.368G + 0.578B, using the pixel values after dark current subtraction and demosaicing, which
we measured as the best color channel mix to match the appearances of a color chart illuminated by daylight for
the tristimulus camera and the comparable monochrome Ximea camera.
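This conversion is a single fixed channel mix; the one-line helper below simply restates the measured coefficients quoted above (the function name is illustrative).

def simulate_monochrome(rgb):
    # rgb: (..., 3) linear camera-raw array after dark current subtraction and
    # demosaicing; the mix was measured to match the monochrome Ximea camera.
    return 0.986 * rgb[..., 0] + 0.368 * rgb[..., 1] + 0.578 * rgb[..., 2]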
V.4 Results
We quantitatively and qualitatively compare the multispectral and tristimulus-scaling IBRL results for pho-
tographs of a still life scene in a real lighting environment, using both an RGB and (simulated) monochrome
camera. We also compare the IBRL results using several methods of multispectral promotion to results achieved
using the full multispectral omnidirectional lighting basis.
For these comparisons, we constructed a still life scene containing various materials with diverse reflectance
spectra, including colorful paper straws, a red metallic sphere, prism-shaped wax crayons, and twelve fabric
swatches arranged as a chart (see Fig. V.1). We also included a "Nano" ColorChecker™ chart from Edmund
Optics in the scene, and a paper cup flanked by bright red and green squares of felt to elucidate the rendering of
interreflections by each method for reflectance field spectral promotion.
The recorded real-world lighting environment was constructed to include incandescent and fluorescent area
lights placed up against the left and right sides of the LED sphere, allowing both the appearance of the still
life in the real lighting and the multispectral reflectance field to be captured without moving the still life or the
camera. For visualization, we show in Fig. V .2 the appearance of the lighting environment reflected off two
spheres of different reflectivities (chrome and black acrylic), and the appearance of color charts facing in five
different directions.
Figure V .2: Left: The lighting environment observed as reflected in chrome and black acrylic spheres. Right:
Sampled colors from the five color charts placed in the lighting environment.
For all comparison images we apply one consistent color matrix M= [1.768, -0.187, -0.329; -0.461, 1.780, -
0.525; -0.139, -0.845, 2.882] to convert camera RAW values to sRGB for display, but we apply no image-specific
color correction. We used M in the lighting environment optimization as described in Eq. V .5. We applied no
brightness scaling to any images of the still life in this section, although we multiplied the pixel values of the
color charts with one overall brightness scaling factor for the qualitative and quantitative color chart matching
assessments.
V.4.1 Color Rendition of Multispectral IBRL
Figure V .3: Quantitative experimental error plot for Fig. V .7, computing the color error for the three IBRL
methods, based on the average squared errors of linear RGB pixel values, with the color matrix M first applied
to both the target chart in the real lighting environment and the IBRL results. We report average error from all 30
ColorChecker nano patches, and, then, separately, the twelve fabric swatches from the 3×4 chart in the scene,
relative to the white square pixel value of the color chart in the original lighting environment.
We recorded the lighting environment using the multispectral lighting capture technique of chapter III and
photographed the still life from the front using a Ximea MQ042CG-CM RGB machine vision camera as lit
by this environment. We then photographed the still life lit by each of the six spectral channels of each
of the 346 point light sources in the LED sphere to acquire the still life’s multispectral reflectance field. One
multispectral basis lighting condition example is shown in row C of Fig. V .11, as seen by the RGB tristimulus
camera. We also simulated how a monochrome camera might have observed the multispectral reflectance field
by converting the RGB images to grayscale as previously described.
To synthetically illuminate the scene with the recorded lighting environment based on its recorded multispec-
tral reflectance field, we solved for the spectral channel intensities for each direction of the lighting environment
with Eq. V .4 for the tristimulus camera and the lighting primaries for each direction of the environment for the
monochrome camera with Eq. V .5. The lighting basis elements were then weighted by the computed channel
intensities and summed. For comparison, we also performed traditional RGB image-based relighting [30] by
scaling the RGB channels of the white-balanced white-LED reflectance field basis images according to an RGB
light probe image.
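The relighting step itself is then a weighted sum over the basis images. A minimal sketch for one camera color channel follows, with illustrative array shapes.

import numpy as np

def relight(basis, weights):
    # basis:   (L, K, H, W) multispectral reflectance field basis images for one
    #          camera channel (L lighting directions, K LED spectra).
    # weights: (L, K) solved per-direction, per-spectrum intensity coefficients.
    # Returns the (H, W) relit image for that channel.
    return np.tensordot(weights, basis, axes=([0, 1], [0, 1]))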
We compare the color rendition of each approach in Fig. V .7. At first glance, all three approaches (tristimulus-
scaling IBRL and multispectral IBRL with both color and monochrome cameras) produce results which are sim-
ilar to the ground truth lighting. But, on closer inspection, both the (simulated) monochrome camera IBRL and
the white light only tristimulus-scaling approach produce color mismatches, as observed in the close-up inset of
the colorful straws, and in the color charts below them, both shown in Fig. V.7. In these charts, the fore-
ground dots represent the measured values in the IBRL results, and the background squares represent the chart
colors measured in the real lighting environment. When the foreground dots visually fade into the background
squares, IBRL produced good color rendition. Multispectral IBRL with the color camera visually produces the
best color rendition and quantitatively reduces color error for both the fabric and color chart squares for the red
and green channels (Fig. V .3). We observed with all three IBRL approaches some amount of haze in the dark
regions of the images, which we believe to be the result of dark current which was not entirely compensated
for. We suspect that this haze is responsible for the lack of improved color rendition for the blue channel for
multispectral IBRL, as the haze in the multispectral image produced with the color camera appears blue.
V.4.2 Spectral Promotion of Reflectance Fields
In addition to acquiring the multispectral reflectance fields, we also photographed the scene lit by a full even
sphere of illumination for each LED spectrum (Row A of Fig. V .11). We generated a similar set of images for
our simulated monochrome camera. For both the color and monochrome cameras, we promoted the white LED
reflectance field to multispectral using Eq. V .1, hallucinating the appearance of the still life scene as illuminated
by every spectrum of light from every direction. We show an example spectrally-promoted basis image for
each spectral channel in Row D of Fig. V .11, and, in row E, the absolute value of the difference between the
full sphere spectrally-promoted basis image and the ground truth image for each spectral channel for the same
lighting direction of the sphere. Using the spectrally-promoted reflectance fields, we then compute the IBRL
results using the multispectral relighting equations for color and monochrome cameras, respectively Eq. V .4 and
Eq. V .5. We show the full sphere reflectance field spectral promotion IBRL renderings in the second row of Fig.
V .8, for comparison with the results computed using the measured multispectral reflectance field (first row of
Fig V .8). We also show the color differences resulting from the full sphere spectral promotion in the first row
of color charts of Fig. V .4. The background squares show the sampled colored charts from the multispectral
reflectance field IBRL result, and the foreground dots represent the sampled charts from the spectral promotion.
To evaluate the head light spectral promotion, we computed a virtual head light image as the sum of three
multispectral reflectance field basis images for the lighting directions closest to our camera. For both the color
and simulated monochrome camera, we promoted the white LED reflectance field to multispectral using Eq.
V.2, again hallucinating our scene's appearance as illuminated by each spectrum of light from each direction.
We show an example of such a spectrally-promoted basis image of the reflectance field for each spectrum in
Row F of Fig. V .11, and the absolute value of the difference between the spectrally-promoted basis image and
the actual image for that lighting condition in Row G of Fig. V .11. We produce the color and monochrome
IBRL results with the head light promoted multispectral reflectance field using Eq. V .4 and Eq. V .5. We show
the head light promoted IBRL renderings in the third row of Fig. V .8 and the color differences resulting from
this method of spectral promotion in the second row of color charts of Fig. V .4, where the background squares
again show the sampled colored charts from the multispectral reflectance field IBRL result and the foreground
dots represent the sampled color charts from this method of spectral promotion.
In Fig. V .5, we report the quantitative error for spectral promotion for the fabric swatches and color chart
squares, by comparing the sampled color values from the IBRL result using the different techniques of spectral
promotion with the results that could be achieved using the full, measured multispectral reflectance field. The
head light approach does introduce color rendition error, and we observed an overall brightness increase for the
[Figure V.4 panel labels. Columns: Color Camera, Monochrome Camera. Rows: Full Sphere Pro., Head Light Pro.]
Figure V .4: Background squares represent the sampled values from the IBRL results using the full multispectral
reflectance field. Row 1. Foreground dots represent sampled color chart values from the full sphere spectral
promotion IBRL results for each camera. Row 2. Foreground dots represent sampled color chart values from
head light promotion IBRL results. Best viewed on a monitor.
color squares relative to the white square intensity (Fig. V .4), likely owing to the increased amount of light
reflected specularly from the not-entirely-diffuse color chart in this lighting configuration.
The average color error for the fabric and color charts rendered using the full sphere spectrally-promoted
reflectance field for the color camera is 0.7% (Fig. V .5), indicating that full sphere promotion is a viable method
that can yield the color rendition improvement of multispectral IBRL, while requiring only a few additional
photographs for the lighting basis. As the first row of color chart results of Fig. V .4 only compared the full
sphere promotion with the color charts rendered by the fully-multispectral IBRL, in Fig. V .6 we also compare
the full sphere promoted multispectral IBRL result with the sampled color chart pixels from the real environment.
We show that even with this approximation step, we achieve superior color rendition as compared to the existing
method for this spectrally-complex real-world environment.
Figure V .5: Color error for sampled pixel values from the fabric and color charts, comparing the multispectral
IBRL results computed using the spectrally-promoted reflectance fields for the full sphere and head light meth-
ods. We evaluate both techniques for color rendition by comparing with the IBRL result using the measured
multispectral reflectance fields. We report average error from all 30 ColorChecker nano patches, and, then, sep-
arately, the twelve fabric swatches from the 3×4 chart in the scene, relative to the white square pixel value of
the color chart rendered using the multispectral IBRL. Left. Color Camera. Right. Monochrome camera.
In the case of the monochrome camera imagery on the right of Fig. V .8, it is apparent that spectral promotion
can introduce color errors in the presence of interreflections. In the real image at the top, there are strongly hued
shadows on the sides of the central cup from the bounced light from the red and green felt cards to its sides.
In the full sphere promotion image in the middle row, these interreflections are somewhat muted, since in the
full sphere image the shadows are filled in with more direct white LED light, and thus are less saturated. The
situation is even worse for the headlight condition, where the shadows are very desaturated. This results from
the headlight illuminating the cup well, but not the red and green felt squares, which are nearly perpendicular to
the lighting direction, so that little bounced light is received by the cup. As a result, the shadows are promoted
using relatively desaturated lighting estimates and do not match the coloration of the correct shadows at all well.
V.4.3 Diffuse/Specular Separation Spectral Promotion
We also acquired the reflectance field of the still life scene as it was illuminated by each of the polarized light
sources, such that we could spectrally promote the diffuse and specular components of the reflectance field
according to the different proposed models. While the IBRL result (Fig. V .9i) did produce color rendition
[Figure V.6 column labels: White LEDs IBRL; Full Sphere Promoted Multispectral IBRL]
Figure V .6: Foreground dots represent sampled color chart values from the IBRL results, while background
squares represent the sampled values from the chart in the real lighting environment. Best viewed on a monitor.
results visually comparable to those of the other methods of spectral promotion, we did not observe an overall
benefit for the still life scene in this particular lighting environment, likely because our scene did not contain
many dielectric materials, and the light sources of the real lighting environment were relatively diffused and
oriented in a way that minimized the appearance of specular reflections relative to the camera position.
Nonetheless, we show in Fig. V .9 that promoting the diffuse and specular components of the reflectance
fields with different models can reduce color error for the individual basis lighting conditions, where we do
observe specular reflections for our scene. Fig. V .9g shows the result of combining the spectrally-promoted
diffuse, specular conductor, and specular dielectric classified pixels of this lighting basis image separately, where
Fig. V .9h is the absolute value of the difference between Fig. V .9g and the still life appearance when really lit
by the amber LED for that lighting direction (Fig. V .9j). Fig. V .9k shows the result from the spectral promotion
using only the full sphere of illumination, with the corresponding error map as compared with Fig. V .9j in Fig.
V .9l. Spectrally promoting the diffuse and specular components differently minimizes the color error particularly
for the dielectric, green wax crayon prism to the right in the scene.
Fig. V .9b shows the result of the polarization difference imaging for this lighting direction, where we observe
only specular reflections. Fig. V .9c shows the result of spectrally promoting the specular lighting basis image
with Eq. V .1, and V .9d shows the result of promoting the specular lighting basis image with Eq. V .3. Promoting
the specular reflection with the diffuse model selects the wrong color for the dielectric specular highlight in Fig.
V .9c, as it appears red. When using the specular model, the highlight color appears to match that of the real
amber LED lighting basis image. Fig. V .10 enlarges a region of interest of Fig. V .9c and d.
Figure V .7: Row 1. Comparison among three different methods of image-based relighting (IBRL) for a still life
scene illuminated by a complex multispectral lighting environment and the real photographed appearance of the
scene under the actual illumination. Left: IBRL result using white LEDs only for capturing the reflectance field
(similar to technique of previous work [30]). Second from the left: photograph of the scene lit by the real-world
illumination. Second from the right: Multispectral IBRL using six spectral LED channels to capture the multi-
spectral reflectance field, as recorded by a color camera. Right: Multispectral IBRL again using multispectral
LEDs, as observed by a monochrome camera. Row 2. Close-up insets of the images of Row 1, demonstrating
superior color rendition with Multispectral IBRL with a color camera. Row 3. After sampling the pixel values
from the color charts centered in the scene, we scale the charts to the same overall brightness with a single fac-
tor. The foreground dots represent sampled values from the images of the same column, while the background
squares represent the target colors from the color chart in the real mixed-illumination lighting environment.
Good color rendition is indicated when the foreground dots visually fade into the background squares. (For the
real lighting environment color chart, the foreground and background colors would be the same.) Best viewed
on a monitor.
[Figure V.8 panel labels. Columns: Color Camera, Monochrome Camera. Rows: Multispectral Basis, Full Sphere Promoted, Head Light Promoted]
Figure V .8: Row 1. IBRL result using the measured multispectral reflectance field, for color and monochrome
cameras. Row 2. IBRL result using the full-sphere spectrally-promoted reflectance field, using Eq. V .1. Row 3.
IBRL result using the head light spectrally-promoted reflectance field, using Eq. V.2. The shadows at the sides
of the cup are relatively desaturated in the spectrally-promoted images for the monochrome camera.
Figure V .9: a. The diffuse component of the reflection of the scene. b. The specular reflection of the scene
as computed from polarization difference imaging using white LEDs. c. The result of spectrally promoting the
specular reflection component using Eq. V .1, hallucinating the appearance under the amber LED. d. The result
of spectrally promoting the specular reflection component using Eq. V .3. e. The result of spectrally promoting
the diffuse component using Eq. V .1. f. The blended result, interpolating between the images of c and d
according to our approach. g. The sum of the spectrally-promoted diffuse reflection component and the blended
result spectrally-promoted specular component. h. The absolute difference between g and the real lighting basis
image j. i. The multispectral IBRL result using polarizing difference imaging, with the multispectral reflectance
field computed in the same manner as g. j. The actual appearance of the scene when lit by the amber LED for
this direction of the LED dome. k. The result of promoting both the diffuse and specular components of the
reflection of the scene using V .1, equivalent to the sum of e and c. l. The absolute difference between k and the
real lighting basis image j.
Figure V .10: a. An inset of Fig. V .9c, showing that promoting specular reflections produces color mismatches
for dielectric materials such as the wax crayon (right in each figure). b. An inset of Fig. V .9d, showing that
promoting the specular reflections with Eq. V .3 generates the expected appearance, although we did not measure
the real specular-only reflectance components when the scene was lit by the amber LED.
[Figure V.11 panel labels. Columns: White LED, Red LED, Amber LED, Green LED, Cyan LED, Blue LED. Rows: A. Full Sphere, B. Head Light, C. Real Basis, D. Promoted w/ A, E. |C - D|, F. Promoted w/ B, G. |C - F|]
Figure V .11: Row A. Full sphere of illumination for each spectral channel. Row B. Head light illumination.
Row C. Example measured multispectral basis images. Row D. Example spectrally-promoted basis images,
computed using Eq. V .1 with full sphere of illumination (White - actual image; Red, Amber, Green, Cyan, and
Blue - Virtual images). Row E. Absolute difference of the measured and full sphere promoted basis images.
Row F. Example spectrally-promoted basis images, computed using Eq. V .2 with head light illumination. Row
G. Absolute difference of the measured and head light promoted basis images.
V.5 Application: Camouflage Evaluation
In this section, we describe one application of multispectral IBRL and thus show that computer graphics tech-
niques can be used to improve the assessment of novel military camouflage materials, where color rendition is
particularly critical. Ultimately, the goal of such materials is to allow a soldier to visually blend into their envi-
ronment when observed by the human eye or a variety of surveillance cameras with different spectral response
functions. While care and consideration are taken such that the reflectance spectra of the pigmented fabrics used
for military uniforms best match the real-world reflectance spectra of various natural materials such as grass,
foliage, or sand, human perception studies are still used to assess the camouflaging ability of any new materials.
A typical study will include having a soldier wear a novel uniform and stand in the desired natural environment,
while a photographer captures their appearance at a variety of distances. Then, the images are displayed on a
computer monitor while users are asked to evaluate whether or not they are able to see the soldier in the resulting
photographs. There are several problems with this material evaluation approach:
1) New materials cannot be exactly benchmarked against existing materials, since it is not possible to recreate
the exact lighting conditions present on the day of the previous materials’ real-world imaging sessions.
2) Outdoor lighting conditions change rapidly with time due to the movement of the sun and clouds, so even
photographs captured on a given day may be challenging to compare with one another.
3) For each material or uniform, photographs must be taken at a variety of distances, which is time consuming
and theoretically unnecessary, since the material or uniform’s appearance does not change, but only the
percentage of the field-of-view it occupies for a given imaging distance, camera, and lens.
4) The camera spectral response and the incident illumination spectra are effectively embedded in each cap-
tured dataset, since only RGB data are recorded. Another light source could potentially change the effec-
tiveness of the camouflage, due to the effect of "illuminant metamerism," or another camera could change
the effectiveness due to "observer metamerism." In both cases, a spectral mismatch between the materials
of the natural world and the camouflage materials may cause colors to match for some illuminants or some
cameras, but not others.
Problem 1, of saving exact lighting conditions in order to later recreate their effect when illuminating multi-
ple uniforms, is easily solved using the capture techniques of chapter III. Problem 2 is also eliminated, as our fast
lighting capture method requires only about 1.5s for HDR acquisition. Problem 3 can be addressed using image
processing operations during compositing by downsizing the foreground element before combining it with the
desired background, although background plates at a variety of distances would still be required for the final
composites.
Problem 4 can be addressed for the case of multiple cameras by storing a record of each lighting environment
in the form of an HDR mirrored sphere photograph and color chart observation as recorded by multiple imaging
devices, although the human observer case is more challenging. Illuminant metamerism could be evaluated by
recording a wide variety of lighting environments using our capture technique, which would be less challenging
since only a record of the lighting and not each uniform’s appearance would need to be captured for each
environment. Formally, if I is the number of test lighting environments, n is the number of uniforms, and v is
the number of views, the existing outdoor photography method required I × n × v photographs for the material
evaluation. Our multispectral image-based relighting approach requires I HDR photographs for the lighting
environments plus I × v photographs for the background plates used for compositing. Of course n(b + k)
photographs would be required for the spectrally-promoted reflectance field basis images, where b is the number
of lighting directions and k is the number of spectral channels in the LED sphere, but these photographs are not
captured on location, are straightforward to acquire, and may be reused for as many lighting environments as
desired.
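As a purely hypothetical illustration of the savings, with I = 4 lighting environments, n = 10 uniforms, and v = 3 views, the existing method would require I × n × v = 120 on-location photographs, while the proposed approach would require I + I × v = 16 on-location photographs plus n(b + k) in-lab reflectance field images that can be reused for every additional lighting environment.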
To relight a military camouflage uniform in a novel lighting condition with good color rendition, we use
the techniques of Sec. V .3. The LED sphere used for all previously shown results was designed to uniformly
illuminate a volume about the size of a human head placed at the center of the sphere, but in order to capture
a set of reflectance field basis images for a full human body, we used a larger LED sphere designed to evenly
illuminate a human-sized figure (Fig. V .12), as used in Einarsson et al. [35]. Importantly, this LED sphere
consists only of white LEDs, and so we designed a six-channel multispectral "head light" as described in Sec.
V .3.2. Using Eq. V .2, we could therefore record a white light reflectance field basis image set and promote it to
multispectral.
Figure V .12: A light stage designed for full-body reflectance field capture, but equipped only with white LEDs.
Examples of the white light reflectance field basis images captured with a monochrome Ximea machine
vision camera and 25mm Fujinon lens are shown in Fig. V .13 for three different camouflage uniforms. (We
note that a color camera could also be used). We captured 225 basis lighting conditions in total, plus the six
multispectral head light conditions. Images were captured of a mannequin wearing each of the uniforms, and
the camera was oriented at a 45° angle to fit the entire mannequin into the field of view. Immediately after the
reflectance basis was captured, a reflective gray board was placed behind the mannequin and illuminated from a
grazing angle, such that an alpha channel could be obtained. An example of a recorded alpha channel used for
compositing is shown in Fig. V .14.
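For reference, the final composite is a standard "over" operation; the sketch below is illustrative, and the normalization of the backlit board frame into a matte is an assumed detail.

import numpy as np

def composite_over(relit_fg, background, alpha):
    # relit_fg:   (H, W, 3) relit foreground (the IBRL result).
    # background: (H, W, 3) clean background plate captured with the lighting environment.
    # alpha:      (H, W) matte in [0, 1], e.g. the backlit gray board frame divided by
    #             a frame of the board photographed without the subject.
    a = np.clip(alpha, 0.0, 1.0)[..., None]
    return a * relit_fg + (1.0 - a) * background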
Figure V .13: Example images from the white light reflectance field basis image set for the camouflage uniform
evaluation. Each row represents a different uniform made of a different patterned camouflage material, while
each column represents a different basis lighting condition.
Two outdoor multispectral lighting environments were captured using the technique of chapter III, and si-
multaneously background plates were acquired for later use in compositing. The lighting environments included
direct sunlight and so required missing light reconstruction [113]. Additionally, three different subjects were as-
signed to wear one of the uniforms, and they were photographed quickly after the lighting capture, as illuminated
by the real-world lighting conditions. These real images are used as "ground truth" for visually comparing with
Figure V .14: Upper left: multispectral image-based relighting result using the spectrally-promoted lighting
basis images for one of the camouflage uniforms. Upper right: alpha channel obtained by illuminating a board
placed behind the subject. Lower left: clean plate of a background captured at the same time as the lighting
environment, with the same camera. Lower right: composite of relit foreground and the background clean plate.
The lighting environment is pictured in Fig. V .17.
our camouflage uniform relighting results. We also photographed a large color chart in each lighting condition
for quantitative analysis.
In Figures V .16 and V .17, we show the composited images of the relit uniforms using the multispectral
approach as compared with the previous technique’s RGB-scaling approach and the ground truth. Visually, the
colors in the multispectral IBRL result more closely match the ground truth than the RGB-scaling result for each
of the uniforms. One obvious difference between the relit results of Figures V .16 and V .17 is the lack of sharp
shadowing on the sand or grass, which is caused by each scene’s direct sunlight. Future work should endeavor to
capture shadow information for compositing during the reflectance field capture. While theoretically possible,
we did not include this in our approach. Other than the lacking ground shadows, the multispectral results yield
convincing composites with good color rendition. Camouflage material is intentionally designed to be relatively
diffuse, as specular reflections would cause a uniform’s wearer to be more easily observed. Accordingly, our
multispectral relit composites were convincing even without using the separate diffuse and specular spectral
promotion models previously described.
In Fig. V .18, we show the composited color charts in each lighting environment. Again, the multispectral
IBRL chart colors more closely match those of the ground truth image. From the color charts we computed
quantitative error results for both lighting environments and both the multispectral and RGB-only IBRL tech-
niques, which are shown in Fig. V .15. Quantitative results show improved color rendition for the multispectral
IBRL for the color charts, in alignment with visual perception of the results in Figures V .16, V .17, and V .18, as
well as the previously reported still-life scene measurements.
[Figure V.15 panel titles: Sunlight Environment (Grassy ground plane); Sunlight Environment (Sandy ground plane)]
Figure V .15: Average percent RGB color error (camera raw values) across all 24 color chart squares, comparing
both multispectral and RGB-only relit results with the ground truth colors after scaling for overall brightness,
reported for two lighting environments. RGB-only relighting shows a higher average color error compared with
the multispectral result. These results correspond with the images in Fig. V .18.
Figure V .16: The lighting environment is shown on the far left, with bright direct sunlight and bright light from
below bounced off the sand. For three different uniforms, we show the composited multispectral IBRL result
using our approach (left), the ground truth photograph of a uniform captured in the real lighting environment
(middle), and the RGB-scaling IBRL result. The uniform colors in the multispectral IBRL result more closely
visually match the ground truth photograph colors. Best viewed on a monitor.
Figure V .17: The lighting environment is shown on the far left, with bright direct sunlight and green light from
below bounced off the grass. For three different uniforms, we show the composited multispectral IBRL result
using our approach (left), the ground truth photograph of a uniform captured in the real lighting environment
(middle), and the RGB-scaling IBRL result. The uniform colors in the multispectral IBRL result more closely
visually match the ground truth photograph colors. Best viewed on a monitor.
Figure V .18: The lighting environments are shown on the far left, with bright direct sunlight for both. For the
color checker chart, we show the composited multispectral IBRL result using our approach (left), the ground
truth photograph of a chart captured in the real lighting environment (middle), and the RGB-scaling IBRL result.
The second row contains an inset of the first row. The fourth row contains an inset of the third row. The chart
square colors in the multispectral IBRL result more closely visually match the ground truth photograph colors.
The quantitative results for these images are shown in Fig. V .15. Best viewed on a monitor.
V.6 Conclusion
In this chapter, we have shown that performing image-based relighting in the multispectral domain produces
more accurate relighting results than the traditional approach of multiplying lighting and reflectance in the tris-
timulus domain, at least for a still life scene with a variety of colorful objects and certain camouflage materials.
Recording a multispectral reflectance field requires more data, however, so we introduced the process of spec-
tral promotion where monochrome or RGB images of an object under a particular direction of white light are
promoted into their plausible appearance under a variety of narrow-band spectra of illumination based on the re-
flectance of the objects as observed by an alternate white-LED lighting condition. We showed that this produces
good image-based relighting results in the multispectral domain with only a few more images than traditional
reflectance field capture, although errors can arise where there are specular reflections or interreflected light. To
address the problem of specular reflections, we showed that if the reflectance field is recorded under two polar-
ization conditions that allow the separation of diffuse and specular reflections, then the reflectance components
can be spectrally promoted according to independent color models and more accurate results can be obtained. As
a result, we have shown that multispectral image-based relighting can be performed accurately and efficiently,
enabling new applications for the simulation of realistic illumination on real-world subjects.
Chapter VI
Efficient Multispectral Facial Capture with Monochrome
Cameras
VI.1 Summary
In previous chapters, we showed how lighting reproduction and image-based relighting could be improved us-
ing a light stage with multispectral LEDs. Another use of a light stage is for high-resolution facial scanning,
which we similarly extend to the multispectral domain in this chapter. Specifically, we propose a multispectral
variant to polarized gradient illumination facial scanning which uses monochrome instead of color cameras to
achieve more efficient and higher-resolution results. In typical polarized gradient facial scanning, sub-millimeter
geometric detail is acquired by photographing a subject in eight or more polarized spherical gradient lighting
conditions made with white LEDs, and RGB cameras are used to acquire color texture maps of the subject’s
appearance. In our approach, we replace the color cameras and white LEDs with monochrome cameras and mul-
tispectral, colored LEDs, noting that color images can be formed from successive monochrome images recorded
under different illumination spectra. While a naïve extension of the scanning process to this setup would require
multiplying the number of polarized gradient illumination images by the number of spectral channels, we show
that the surface detail maps can be estimated directly from monochrome imagery, so that only an additional n
photographs are required, where n is the number of added spectral channels. We also introduce a new multi-
spectral optical flow approach to align images across spectral channels in the presence of slight subject motion.
Lastly, for the case where a capture system’s white light sources are polarized and its multispectral colored LEDs
are not, we introduce the technique of multispectral polarization promotion, where we estimate the cross- and
parallel-polarized monochrome images for each spectral channel from their corresponding images under a full
sphere of even, unpolarized illumination. We demonstrate that this technique allows us to efficiently acquire
a full color (or even multispectral) facial scan using monochrome cameras, unpolarized multispectral colored
LEDs, and polarized white LEDs.
VI.2 Introduction
Creating high-quality digital human characters, particularly those based on the likeness of real people, is a long-
standing goal in computer graphics, with applications in films, video games, simulations, and virtual reality.
Computational imaging and illumination systems have been developed to faithfully capture a subject’s facial
shape and appearance to produce highly photo-realistic renderings of the subject’s digital double for compositing
into a novel scene. Such systems can be broadly grouped into two categories: (1) those using even, diffuse
illumination and multiple camera viewpoints to estimate facial geometry using multiview stereo approaches,
and (2) those using multiple camera viewpoints in combination with specialized illumination patterns, generating
facial geometry of a comparatively higher resolution using variants of shape from shading or photometric stereo
for surface normal estimation [150].
For the methods of the second category, such as that of Ma et al. [98] and subsequent approaches [49, 147],
polarization filters are placed in front of the lights and the cameras, and polarization difference imaging sepa-
rates the sub-surface and specular reflections from images for each gradient illumination condition, exploiting
Figure VI.1: a,b,e,f. Monochrome photographs of a subject lit by red, green, blue, and white LEDs; c. Colorized
diffuse reflection image produced by mixing a,b,e, and f; g. Monochrome polarization difference image showing specular
reflections; d. Full-color rendering of the subject; h. Geometry rendering of the subject with no diffuse albedo. In this
chapter, all renderings are produced using only monochrome images.
the polarization-preserving aspect of specular reflections from the skin. Ma et al. [98] demonstrated that sep-
arating sub-surface and specular reflections improved 3D reconstruction, since surface normals estimated from
the sharp, unblurred specular reflections more accurately captured the pore-level details of facial geometry. Ad-
ditionally, the cross-polarized images provided a better representation of the subject’s colored diffuse albedo,
recording light that has traveled through the skin, having undergone multiple sub-surface scattering events and
therefore depolarization before reflecting back towards the camera.
Since a color diffuse albedo texture map is required for rendering a photo-realistic digital double, most facial
scanning systems of both categories [16, 18, 98, 49, 147] use tristimulus RGB cameras and broad-spectrum white
light sources for acquisition. One recent exception is the work of Fyffe et al. [44], where monochrome machine
vision cameras were used in combination with unpolarized LED-generated blue light. This shorter wavelength
light revealed more texture cues in the skin for multiview stereo and optical flow computations. However, this
single-shot approach only generated facial geometry and did not capture reflectance information or produce the
colored texture maps needed to produce photo-realistic renderings.
Photography with monochrome cameras offers a few obvious theoretical advantages. First, for a color cam-
era, the light-absorbing color filters placed in front of the imaging sensor may absorb more than two thirds of
the total incident light per pixel, requiring a light-boosting compromise somewhere else in the system across the
exposure triad (increasing sensor gain, exposure time, or aperture size, none of which are desirable for high-
resolution facial scanning), or the use of even brighter lights, which could be uncomfortable for live subjects.
Additionally, most modern color cameras include RGB filters in the color filter array (CFA) arrangement of a
typical Bayer pattern, reducing the effective image resolution for each color channel. Full-color images are pro-
duced using various up-sampling methods (referred to as debayering or demosaicing algorithms). In contrast,
monochrome cameras allow for imaging with comparatively less incident light at a higher true image resolution,
while only generating a single-channel intensity image. In this chapter, our goal is to gain the aforementioned
benefits of monochrome camera image acquisition in a high-resolution facial scanning system, without sacrific-
ing the color information that is required for rendering.
While color images are most often produced using tristimulus RGB cameras with spectral sensitivity func-
tions similar to the human visual system, they can also be produced using active illumination – by sequentially
illuminating a subject with at least three differently colored light sources, capturing images using a monochrome
sensor with a broad spectral response. This technique was exploited in chapter V in the service of full-color
image-based relighting using a reflectance field captured with a monochrome camera. Using colored illumi-
nation trades the spatial multiplexing of typical RGB cameras for temporal multiplexing, increasing the image
resolution for each color channel at the expense of requiring more photographs. The resulting color images will
be similar if the emission spectra of the narrow band light sources (modulated by the monochrome camera’s
spectral response curve) are similar to those of the broad-spectrum white light source modulated by a color
camera’s spectral response curves. LED light sources have been used for this technique, though their emission
spectra are often narrower than typical camera spectral sensitivity functions.
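In its simplest form, this time-multiplexed color imaging just stacks the monochrome frames into color channels; the small sketch below is illustrative, and the per-channel gains stand in for whatever white-balance calibration a particular system uses.

import numpy as np

def rgb_from_monochrome(frame_r, frame_g, frame_b, gains=(1.0, 1.0, 1.0)):
    # Each frame is a single-channel image of the subject under one narrow-band LED
    # spectrum; the gains would be calibrated against a gray or white reference.
    return np.stack([gains[0] * frame_r,
                     gains[1] * frame_g,
                     gains[2] * frame_b], axis=-1)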
Although the lighting systems used for high-resolution facial scanning [98, 49, 147] have been comprised of
broad-spectrum white LEDs, other omnidirectional lighting rigs have been built using red, green, and blue LEDs
[32, 58]. Furthermore, several omnidirectional multispectral light stages have been built [11, 57, 79, 89, 39] as
described in chapter II and chapter III, incorporating RGB LEDs along with other LEDs of distinct spectra,
often with six or more spectral channels per light source. Such systems would enable multispectral image
acquisition, rather than just RGB. While these systems were not developed for facial scanning, they could
enable monochrome camera color imaging via time-multiplexed multispectral illumination.
The computational lighting-based facial scanning techniques already employ time-multiplexed illumination,
requiring at least eight images [98] of a subject under different lighting conditions for a complete scan. At first
glance, one might assume that using a monochrome camera and colored LEDs instead of a color camera and
white LEDs would require at least 3 times the originally required number of images, assuming three-channel
RGB output. Or, in the multispectral case, one might expect to need at least n times as many images, where n
is the number of desired spectral channels of the output. Such a large number of images would be impractical
to acquire for live subjects, where scan time should be minimized to reduce the negative effects of slight subject
motion across frames. However, in this chapter, we evaluate for each stage of the high-resolution facial scanning
process, whether or not full color imaging is necessary. From this analysis, we show how to efficiently capture
a full color (or multispectral) facial scan using monochrome cameras and colored LEDs, requiring only n addi-
tional images, where n is the desired number of added spectral channels used for producing the diffuse albedo
texture map. We also introduce a new multispectral optical flow approach that corrects for subject motion and
chromatic aberrations, based on the method of complementary flow [147].
Additionally, for the case where a system’s broad-spectrum white LEDs are already polarized for facial
scanning while its multispectral colored LEDs are not, we introduce an approach that we call multispectral po-
larization promotion, which allows us to hallucinate cross- and parallel-polarized images of a subject for each
spectral channel even when all but one are unpolarized. We compute the per-pixel amount of light reflected
specularly relative to the quantity of incident light for a given lighting condition using the polarized white LEDs
and polarization difference imaging. Since skin is a dielectric, the proportion of incident light reflected specu-
larly should mostly not depend on the spectrum of the incident illumination. Therefore, the specular reflection
image can be approximated as consistent across the other spectral channels, up to a scale factor accounting for
the monochrome camera’s potentially different sensitivities to each of the spectral channels or differing LED
intensities. With this information, for each spectral channel we can hallucinate its cross-polarized image of
sub-surface scattered light by subtracting the hallucinated per-pixel specular reflection image from a captured
photograph of the subject under an unpolarized lighting condition for that incident spectrum.
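The following sketch summarizes this multispectral polarization promotion, assuming the per-channel scale factor s has already been calibrated from the relative LED intensities and the camera's spectral sensitivities; the clipping and the function name are illustrative.

import numpy as np

def promote_cross_polarized(unpolarized_k, parallel_white, cross_white, s=1.0):
    # unpolarized_k:  (H, W) monochrome image under unpolarized LEDs of spectrum k.
    # parallel_white: (H, W) parallel-polarized image under the white LEDs.
    # cross_white:    (H, W) cross-polarized image under the white LEDs.
    # s:              scale relating white-LED specular levels to spectral channel k.
    specular_white = np.clip(parallel_white - cross_white, 0.0, None)  # polarization difference
    return np.clip(unpolarized_k - s * specular_white, 0.0, None)      # estimated cross-polarized image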
In summary, our contributions are:
(i). We demonstrate that full-color or multispectral high-resolution facial scans using state-of-the-art tech-
niques [98, 49, 147] can be efficiently acquired using monochrome cameras and colored LEDs instead of
color cameras and broad-spectrum white LEDs, thereby increasing image resolution and reducing light
requirements.
(ii). We show that such a scan can be acquired by adding only n photographs to the normal scan sequence,
where n is the number of added spectral channels.
(iii). We introduce a new multispectral complementary flow approach to align images captured under illumi-
nants of different spectra.
(iv). We show that polarized light sources are only required for one spectral channel of the illumination sys-
tem, and for the remaining spectral channels we can promote unpolarized images to cross- and parallel-
polarized as required.
VI.3 Analysis and Equations
In this section, we analyze which parts of a high-resolution facial scanning pipeline can be achieved with
monochrome cameras alone, and which parts require the addition of multispectral LEDs to obtain color in-
formation. In our analysis, we consider a monochrome camera extension to the scanning approach of Ghosh et
al. [49], which incorporates the gradient illumination patterns of Ma et al. [98] and the optical flow techniques of
Wilson et al. [147], but with a spherical polarization scheme that enables diffuse-specular reflection separation
across multiple camera viewpoints. We also justify our technique of multispectral polarization promotion and
introduce a method for multispectral optical flow.
VI.3.1 Monochrome Multiview Stereo
Initially, a low-resolution 3D reconstruction of the face is generated using passive multiview stereo. Images of
the subject are captured under a single even, diffuse lighting condition from a variety of viewpoints. Multiview
stereo approaches do not require RGB images; they even operate more efficiently when using only intensity
information (RGB data converted to gray-scale), extracting cross-view correspondences on one third of the
input data. As noted by Fyffe et al. [44], skin texture cues are highly visible under shorter wavelength blue light,
which is absorbed by spatially-varying skin pigmentation (see Fig. VI.3). Thus, color images are clearly not
required for multiview stereo, and blue illumination with monochrome cameras will even improve stereo matching.
VI.3.2 Monochrome Specular Normals
Ma et al. [98] described eight polarized, spherical gradient illumination conditions used to infer surface nor-
mals, for surfaces that primarily reflect light diffusely (Lambertian) and specularly. They introduced "specular
normal maps," showing that polarization difference imaging combined with gradient illumination conditions
could yield geometry with resolution comparable to that achieved with laser scanning. Cross-polarized images
were subtracted from parallel-polarized images, producing a specular reflection image for each gradient lighting
condition, from which the specular normals were derived.
Since skin is a dielectric, specularly-reflected light is mostly of the same spectrum as the incident illumina-
tion. Fresnel’s equations describe the amount of light incident to an interface that is reflected versus refracted,
depending on the indices of refraction of the interface materials. For typical dielectric materials, the index of
refraction has some dependency on wavelength, but this dependency is very slight across the visible spectrum,
and computer graphics practitioners commonly assume that reflectivity is not a function of wavelength. This is a
reasonable assumption for dielectric materials. Accordingly, the specular reflection image of a face produced via
polarization difference imaging is largely "colorless" (see Fig. VI.2). Therefore, we do not need color cameras
to compute "specular normals." This makes sense intuitively, since specular normals encode geometry rather
than color. Therefore, to compute specular normals, we could theoretically generate the eight polarized gradient
illumination conditions using any illuminant to which a monochrome camera has some sensitivity.
So far, both 3D reconstruction steps of a high-resolution facial scan, the multiview stereo coarse recon-
struction followed by the specular reflection based surface normal computation, do not require color images.
If we only wanted to measure high-resolution facial geometry without computing the subject’s diffuse albedo
texture map, then a monochrome camera scan would require exactly the same number of time-multiplexed lighting
conditions as a color camera scan.
a. cross-polarized b. (parallel - cross)
Figure VI.2: a. Cross-polarized color photograph of a subject under an even sphere of white light (sRGB). b. Specular
reflection image computed via full color polarization difference imaging. The image has been white-balanced to the color
of the white light source. Specular reflections are largely colorless.
VI.3.3 Diffuse Reflectance
However, to render a subject’s digital double in color, artists require a colored texture map of the subject’s diffuse
reflectance, which is approximated using a view-dependent synthesis of color images of the subject illuminated
by a full sphere of cross-polarized white light. If our light stage includes polarized colored LEDs in the same
polarization arrangement as the white LEDs of Ghosh et al. [49], then we could capture a cross-polarized image
of the subject under a full even sphere of illumination for each of the available spectral channels, producing
the images required to generate a multispectral diffuse texture map of the subject. This multispectral diffuse
texture map could then be used to generate an RGB texture map of the subject’s appearance under a particular
illuminant (see subsection: Color Channel Mixing, VI.3.6).
Note that up to this point, despite capturing the (at least) eight spherical gradient illumination images with
a monochrome camera, the only additional images that must be captured for the diffuse reflectance are those
of a subject under a full sphere of cross-polarized illumination for each added spectral channel. For n added
spectra (excluding the white LED, for which we already have the full sphere cross-polarized condition), we have
only added n images to the scan process. Importantly, we do not need to capture the full polarized gradient
illumination sequence for each spectral channel (or even with just RGB LEDs) to obtain a high-quality scan
complete with the subject’s color or multispectral diffuse albedo.
VI.3.4 Multispectral Polarization Promotion
However, polarizing all the colored LEDs of a lighting rig not only adds complexity, but it also absorbs over half
of the light emitted by the LEDs. Since colored LEDs are often used for applications like live-action compositing
with lighting reproduction [32, 58, 89] (as in chapter III), where video-rate recording demands short exposure
times, halving the light output of the colored LEDs is undesirable. As an alternative, we develop a technique that
we call multispectral polarization promotion in which we hallucinate cross-polarized images for each spectral
channel from unpolarized lighting images, so that we can generate a multispectral diffuse albedo texture map
of the subject. Our process requires that only one of the spectral channels in the lighting rig is polarized in the
pattern of Ghosh et al. [49]. We note that this process is different from the spectral promotion of chapter V,
where our goal was to propagate spectral information across lighting directions rather than polarization states.
For clarity, we extend the variable naming conventions of Ma et al. [98]. We define a spherical gradient illumination image of the subject $L_{l,i,s}$, where $l$ describes the gradient condition, $i$ describes the polarization state (one of cross or parallel), and $s$ defines the index of the spectrum of illumination, ranging from $0$ to $n-1$, where $n$ is the number of spectral channels in the lighting rig and $0$ represents the white LED. The gradient illumination images required [98] are therefore:

$L_{x,c,0}$, cross-polarized, x gradient
$L_{y,c,0}$, cross-polarized, y gradient
$L_{z,c,0}$, cross-polarized, z gradient
$L_{f,c,0}$, cross-polarized, full sphere
$L_{x,p,0}$, parallel-polarized, x gradient
$L_{y,p,0}$, parallel-polarized, y gradient
$L_{z,p,0}$, parallel-polarized, z gradient
$L_{f,p,0}$, parallel-polarized, full sphere
When linear polarizers over the light sources are oriented perpendicularly to those in front of the camera, the polarizer will block all of the specularly-reflected light and about half of the diffusely reflected light, such that $L_{l,c,s} = \frac{1}{2} D_{l,s}$, representing an image of the diffuse or sub-surface scattered reflections. When the polarizer in front of the camera is parallel, the polarizer will block about half of the diffusely reflected light and none of the specularly-reflected light, such that $L_{l,p,s} = \frac{1}{2} D_{l,s} + S_{l,s}$. Therefore, for each gradient lighting condition $l$ and spectrum $s$, the specular reflection image $S_{l,s}$ is produced via polarization differencing:

$$S_{l,s} = L_{l,p,s} - L_{l,c,s} \quad \text{(VI.1)}$$
Using a monochrome spectral camera model, a pixel value $p_{s,j}$ of a material $j$ lit by spectrum $s$ is produced by integrating a fully-spectral modulation of the scene illuminant $I_s(\lambda)$ by the reflectance spectrum of the material $R_j(\lambda)$ and the monochrome camera's spectral sensitivity function $C(\lambda)$:

$$p_{s,j} = \int_{400}^{700} I_s(\lambda) \, R_j(\lambda) \, C(\lambda) \, d\lambda \quad \text{(VI.2)}$$
We again assume that light reflected specularly from the skin preserves both the polarization and spectrum of the incident source. This assumption implies, for an image pixel representing specular reflection, that the reflectance spectrum $R_j(\lambda)$ of Eq. VI.2 is a constant value over the visible wavelength range. This value represents the per-pixel reflectivity or specular albedo ($\rho_{spec}$) of the surface, modulated by a per-pixel constant scale factor $F_l$ that only depends on the geometry of the illumination relative to the geometry of the surface. The intuition behind the constant $F_l$ is that a different amount of light will be reflected specularly towards the camera for a pixel depending on the incident illumination condition $l$ and the pixel's surface normal. Both constants can be pulled out from the integral, and the pixel values of the specular reflection image $S_{l,s}$ are computed as:

$$S_{l,s} = (\rho_{spec} F_l) \int_{400}^{700} I_s(\lambda) \, C(\lambda) \, d\lambda \quad \text{(VI.3)}$$
In Eq. VI.3, the integral represents the intensity of $I_s(\lambda)$ as observed by the monochrome camera with spectral response $C(\lambda)$. We call this quantity $W_s$:

$$W_s = \int_{400}^{700} I_s(\lambda) \, C(\lambda) \, d\lambda \quad \text{(VI.4)}$$
$W_s$ can be directly measured as a calibration step by photographing a reflective white Spectralon disk or the white square of a color chart as lit by each spectrum of illumination $s$ (scaled up to represent the true reflectance of these calibration targets). No spectral measurements are required. By substitution, we can write that the specular reflection image $S_{l,s}$ is a scaled multiple of the incident light intensity, depending on the per-pixel specular albedo and per-pixel geometric factor: $S_{l,s} = (\rho_{spec} F_l) W_s$. Or, by rearranging:

$$(\rho_{spec} F_l) = \frac{S_{l,s}}{W_s} \quad \text{(VI.5)}$$
We can equate these ratios across spectral channels for a given gradient illumination condition $l$. Without loss of generality, we can compare the white LED with another spectrum $s$:

$$\frac{S_{l,0}}{W_0} = \frac{S_{l,s}}{W_s} \quad \text{(VI.6)}$$
We assume that with our lighting rig we are able to capture cross- and parallel-polarized images $L_{f,c,0}$ and $L_{f,p,0}$ for the white LED for the full sphere lighting condition $f$, producing $S_{f,0}$ using Eq. VI.1. After measuring $W_s$ for each spectral channel, we therefore solve for $S_{f,s}$ for each spectral channel by substitution into Eq. VI.6. The intuition behind this step is again that the amount of light reflected specularly does not depend on the incident spectrum, but rather depends only on the relative intensity of the different spectral channels as observed by the camera.
However, the specular reflection images $S_{f,s}$ are not sufficient. For the texture maps, cross-polarized images $L_{f,c,s}$ are required for each spectral channel (or, equivalently, $D_{f,s}$). So, using the unpolarized multispectral LEDs of the lighting rig, we capture the unpolarized ("mixed polarization") image $M_{f,s}$ for each spectrum $s$. An unpolarized lighting image $M_{l,s}$ for lighting condition $l$ can be approximated as the sum of cross- and parallel-polarized images:

$$M_{l,s} = L_{l,p,s} + L_{l,c,s} \quad \text{(VI.7)}$$
Or equivalently, by substitution:

$$M_{l,s} = D_{l,s} + S_{l,s} \quad \text{(VI.8)}$$
Since we capture images $M_{f,s}$ and estimate $S_{f,s}$ for each spectral channel, we can compute $D_{f,s}$, or equivalently $L_{f,c,s}$. The multispectral set of hallucinated images $D_{f,s}$ provides the diffuse albedo maps required for rendering, after RGB images are formed via color channel mixing. Again, for $n$ added spectra, we have only added $n$ unpolarized multispectral images to the scan process.
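To make the promotion step concrete, here is a minimal numpy sketch of Eqs. VI.1 and VI.6-VI.8 for the full-sphere condition. The function and variable names are illustrative assumptions, and the calibration values $W_s$ are assumed to have been measured from a white chart square as described above.

```python
import numpy as np

def promote_polarization(L_f_p_0, L_f_c_0, M_f, W):
    """Hallucinate diffuse (cross-polarized) images for unpolarized spectral channels.

    L_f_p_0, L_f_c_0: parallel- and cross-polarized full-sphere images for the white LED.
    M_f: dict mapping spectrum index s -> unpolarized ("mixed") full-sphere image.
    W:   dict mapping spectrum index s -> calibrated intensity W_s (white LED at key 0).
    Returns dicts of per-spectrum specular images S and diffuse images D.
    """
    # Eq. VI.1: polarization difference imaging on the polarized (white) channel.
    S_f_0 = L_f_p_0 - L_f_c_0

    S, D = {}, {}
    for s, M in M_f.items():
        # Eq. VI.6: specular reflection scales only with the observed channel intensity.
        S[s] = S_f_0 * (W[s] / W[0])
        # Eqs. VI.7/VI.8: mixed = diffuse + specular, so subtract to recover the
        # hallucinated diffuse (cross-polarized) image for spectrum s.
        D[s] = np.clip(M - S[s], 0.0, None)
    return S, D
```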
With polarization promotion, we have effectively hallucinated cross- and parallel-polarized images for all
spectral channels using only the polarized lighting conditions of one spectral channel and the corresponding un-
polarized lighting conditions of the others. In principle, any of the spectral channels could serve as the polarized one; polarizing the white LED channel is not a requirement of our approach. However, since the index of refraction has some
slight wavelength dependence, comparing specular images under the broad-spectrum white LED with those of
the other spectra is advisable to minimize errors caused by the assumption of spectrum-preserving reflections.
VI.3.5 Monochrome Diffuse Normals
Ma et al. [98] introduced "diffuse normals," which could be used in a "hybrid normal" shader to simulate the
effects of sub-surface scattering in a real-time rendering application. In our multispectral polarization promotion
approach, we only photograph diffuse gradient lighting conditions for a single monochrome image channel for
the white LED, so we generate single-channel diffuse normals rather than typical RGB diffuse normals. In our
results section, we visualize the colorized spherical gradient illumination images by considering that the ratio of
a cross-polarized gradient condition to a full sphere cross-polarized condition should be approximately the same
across spectral channels. After photographing $L_{l,c,0}$ and $L_{f,c,0}$, and hallucinating $L_{f,c,s}$ for each spectral channel, we can approximately compute $L_{l,c,s}$ for the lighting conditions $l$ of the gradients $x, y, z$:

$$\frac{L_{l,c,s}}{L_{f,c,s}} \approx \frac{L_{l,c,0}}{L_{f,c,0}} \quad \text{(VI.9)}$$
This is an approximation that does not consider the wavelength-dependent optical properties of the skin
demonstrated through diffuse normals computation.
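A one-function numpy sketch of Eq. VI.9, assuming the white-LED gradient and full-sphere images and the hallucinated full-sphere image for spectrum $s$ are already aligned; the epsilon guard against division by zero is an added assumption.

```python
import numpy as np

def colorize_gradient(L_l_c_0, L_f_c_0, L_f_c_s, eps=1e-6):
    # Eq. VI.9: assume the gradient-to-full-sphere ratio is constant across spectra,
    # so scale the hallucinated full-sphere image of spectrum s by the white-LED ratio.
    return L_f_c_s * (L_l_c_0 / np.maximum(L_f_c_0, eps))
```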
VI.3.6 Color Channel Mixing
Once the diffuse or sub-surface scattered reflection images for each spectral channel have been photographed or
computed via polarization promotion, we generate an RGB image of the diffuse reflections using image-based
multispectral metameric reflectance matching as in chapter III, extended to the monochrome imaging case as in chapter V, Eqn. V.5. This process requires a color chart photograph captured in the desired environment; thus
the computed RGB diffuse albedo best represents the skin’s color as it would appear in that scene.
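The metameric reflectance matching solve itself (Eq. V.5) is defined in chapter V; a hedged sketch of how its non-negative least squares step might look, with assumed array shapes, is shown below.

```python
import numpy as np
from scipy.optimize import nnls

def solve_mixing_weights(N, chart_rgb_target):
    """Solve non-negative weights that mix spectral channels into an RGB image.

    N: (num_patches, num_spectra) monochrome color chart patch values, one column
       per LED spectrum as observed by the monochrome camera.
    chart_rgb_target: (num_patches, 3) chart patch values under the desired
       illuminant as observed by the desired color camera.
    Returns a (3, num_spectra) weight matrix, one row per output RGB channel.
    """
    return np.stack([nnls(N, chart_rgb_target[:, k])[0] for k in range(3)])

# Usage sketch: the RGB diffuse albedo is a per-channel weighted sum of the
# per-spectrum diffuse images D[s]:
#   rgb[..., k] = sum_s weights[k, s] * D[s]
```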
VI.3.7 Optical Flow
As in other high-resolution facial scanning approaches [147, 49, 98], temporal alignment between photographs
is a prerequisite for all computations requiring more than one image, i.e. estimation of diffuse and specular
normals, polarization promotion, and color channel mixing. For the monochrome imaging approach, we need to
flow across spectral channels, not only to account for potential movement between frames but also to correct for
chromatic aberrations. Formally, our optical flow approach must additionally align $L_{f,m,s}$ for $s = 1..n$ to $L_{f,m,0}$, where $m$ indicates the mixed polarization condition. Below, we discuss the special case of adding spectral channels comprised of the red, green, and blue LEDs (spectra in Fig. III.1).
The appearance of skin illuminated by red, green, and blue light is different owing to the wavelength-
dependent effects of sub-surface scattering [50, 70]. When skin is illuminated by a broad-spectrum light source,
shallow sub-surface scattered light appears blueish in color, while deeper scattered light appears reddish, owing to this wavelength-dependent scattering and to light absorption by the skin's chromophores. Under the narrow-band LED illumination, the image under the red LED exhibits less distinct skin texture and a more diffused, soft appearance, in contrast with the image under the blue LED, which shows a great deal of high frequency detail, predominantly from short-wavelength light absorption by epidermal melanin. The image under the green LED
is similar to blue, but slightly "softer" (see Fig. VI.3).
a. white b. red c. green d. blue
Figure VI.3: Inset of facial detail photographed by monochrome camera under different incident illumination spectra
(RGBW), with spectra in Fig. III.1. Images have been scaled to the same relative brightness for display.
To flow from an image of a subject illuminated by one spectrum to that of a different spectrum, we can naïvely assume that these images are the same, modulated only by an overall average scale factor $x_s$ that accounts for the differing fully-spectral modulation of the subject's average spectral reflectance by the differing incident LED spectra $s$ and the monochrome camera's spectral sensitivity. Formally, the assumption is that we can compute $x_s$ such that $L_{f,m,0} \approx x_s L_{f,m,s}$ for $s = 1..n$, approximately satisfying the brightness constancy constraint, so that $x_s L_{f,m,s}$ may be flowed to $L_{f,m,0}$. This naïve assumption ignores spatially-varying skin spectral reflectance and the effects of sub-surface scattering.
Wilson et al. [147] defined an iterative optical flow solution to align a pair of complementary images that
when added together produced a third target image. The method flowed cross- and parallel-polarized images
to mixed polarization images, and flowed spherical gradient illumination images and their inverse counterparts
towards images of a subject lit by a full-on, even sphere of illumination. We extend the complementary flow
of Wilson et al. to the multispectral domain, increasing the accuracy of the brightness constancy assumption
by combining images across spectral channels. Our key observation is that some linear combination of aligned
multispectral images will more closely match the target image $L_{f,m,0}$, as compared with a scaled version of each aligned image alone. Fig. VI.4 demonstrates this effect, where the absolute value of the pixel error is lowest for the linear combination of red, green, and blue images when trying to match the per-pixel values of the white image.
Inspired by the metameric reflectance matching expression (Eq. V.5 of chapter V) and complementary optical flow [147], we define a least squares procedure to incrementally align images of the same subject captured as illuminated by different spectral channels. For a set of two or more unaligned images, we compute the amounts $x_s$ of each image $L_{f,m,s}$ for $s = 1..n$ that, when all added together, best produce the target image $L_{f,m,0}$, as in Eqn. VI.10:

$$\underset{x \geq 0}{\operatorname{argmin}} \; \left\| L_{f,m,0} - \sum_{s=1}^{n} x_s L_{f,m,s} \right\|^2 \quad \text{(VI.10)}$$
a. $|x_1 R - W|$   b. $|x_2 G - W|$   c. $|x_3 B - W|$   d. $|x_4 R + x_5 B - W|$   e. $|x_6 R + x_7 G + x_8 B - W|$
Figure VI.4: Absolute difference $e$ in pixel values when approximating the image for the white LED, using a. the red LED ($e_r = 0.00971$), b. the green LED ($e_g = 0.00816$), c. the blue LED ($e_b = 0.01136$), d. the red and blue LEDs ($e_{rb} = 0.00684$), and e. the red, green, and blue LEDs ($e_{rgb} = 0.00561$), respectively.
We scale the unaligned images by these amounts $x_s$. Then, as in Wilson et al. [147], we initialize the flow fields for each scaled image to 0, indicating no motion, and iteratively update the flow estimates for each spectral channel. During the first step of one iteration, the flow field for an unaligned image is estimated by assuming that the other unaligned images' flow fields are constant. In the next step of the iteration, we estimate the flow field for a different unaligned image, while the flow fields for the other images, including the scaled aligned image of the previous step, are assumed constant.
To find the $x_s$ values, we could use linearly independent samples from a color chart lit by each incident spectrum, or pixel values sampled from the actual images after low-pass filtering to account for motion, as they are initially unaligned. In practice, we sample pixel values from filtered images and use a non-negative least squares solver. However, the solver may suggest $x_s = 0$ for some spectrum $s$, which means that the corresponding image will never be aligned during the complementary flow. A weight of $x_s = 0$ in the solve means that adding this spectrum does not further help to minimize color error, which implies that its spectral contribution is either redundant or not useful due to a lack of spectral overlap with the target illuminant. To handle redundancy, for each source image with $x_s = 0$ in the initial solve, we can subsequently find the set of already aligned images that, when combined, best match it in a least squares sense. Then, we can compute simple optical flow to align the source image to the already-aligned linear combination target. We note that this is different from the iterative complementary step, as the color mixing solve endeavors to best match an unaligned source image $L_{f,m,s}$ rather than the target image $L_{f,m,0}$, and furthermore this step does not use the complementation constraint. This final step ensures that every image is ultimately aligned to the target.
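The sketch below outlines the weight solve (Eq. VI.10) and the alternating complementary alignment loop just described. The optical flow and warping routines (compute_flow, warp) are placeholders for the TV-L1 flow and image remapping actually used, and the blur-then-sample weighting strategy is a simplified assumption.

```python
import numpy as np
from scipy.optimize import nnls

def solve_weights(target, sources, blur):
    """Eq. VI.10: non-negative x_s so that sum_s x_s * sources[s] approximates target."""
    A = np.stack([blur(src).ravel() for src in sources], axis=1)
    x, _ = nnls(A, blur(target).ravel())
    return x

def multispectral_complementary_flow(target, sources, x, compute_flow, warp, iters=3):
    """Iteratively align the scaled source images so that their sum matches the target."""
    scaled = [xs * src for xs, src in zip(x, sources)]
    flows = [np.zeros(target.shape + (2,), np.float32) for _ in sources]  # init: no motion
    for _ in range(iters):
        for i in range(len(sources)):
            # Hold the other channels' flow fields fixed; the residual target for
            # channel i is the full target minus the other warped, scaled channels.
            others = sum(warp(scaled[j], flows[j]) for j in range(len(sources)) if j != i)
            flows[i] = compute_flow(scaled[i], target - others)
    return flows
```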
To further improve the robustness of our multispectral optical flow, we note that negative pixels may be produced when linearly combining images across spectral channels, even when using a non-negative least squares solve. As an example, during one iteration, say we solve for the amount of a red image $R$ and a green image $G$ that best produce the target white image $W$. Then, the flow step would alternate between aligning the source image $R$ to the target image $W - G$ and the source image $G$ to the target image $W - R$, possibly producing some negative pixel values. Rather than clamping the images to zero, we find the minimum and maximum pixel values of both the source and target images, and remap both to the same pixel value range. This changes the relative intensity of both images, so we further normalize each remapped image by dividing each by a highly smoothed, Gaussian-blurred version of itself.
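A compact sketch of this remapping and local normalization, with the joint min/max range and the blur width treated as tunable assumptions:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def normalize_pair(source, target, sigma=10.0, eps=1e-6):
    # Remap both images to a shared [0, 1] range rather than clamping negatives to zero.
    lo = min(source.min(), target.min())
    hi = max(source.max(), target.max())
    src = (source - lo) / max(hi - lo, eps)
    tgt = (target - lo) / max(hi - lo, eps)
    # Divide each by a heavily Gaussian-blurred copy of itself to equalize local intensity.
    src = src / (gaussian_filter(src, sigma) + eps)
    tgt = tgt / (gaussian_filter(tgt, sigma) + eps)
    return src, tgt
```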
VI.4 Results and Discussion
VI.4.1 Monochrome Camera Facial Scan
First, we show sample images from a monochrome camera facial scan captured using four spectral channels:
red, green, blue and broad-spectrum white (RGBW, spectra in Fig. III.1). The lighting rig used for this facial
scan only has polarizing filters for the white LEDs, so we employ our polarization promotion technique and
multispectral optical flow. In Fig. VI.12, we show input monochrome images and the full-color cross- and
parallel-polarized images that can be produced in our pipeline. In Fig. VI.5 we show a side-by-side comparison
of a flash-lit photograph of the subject acquired with a Canon 1DX DSLR camera with a rendering of the subject
produced using our monochrome imaging pipeline. For the facial scan, we used 14 monochrome Ximea xiQ
MQ042MG-CM machine vision cameras, each fitted with a 50mm Fujinon lens and linear polarizer. For the
rendering, we used a custom alSurface skin shader and the Arnold global illumination ray-tracer. We tried
to match the camera and lighting positions, although in this case the flash-lit photograph of the subject was
acquired many days apart from her facial scan. Nonetheless, the subject’s likeness has clearly been captured,
and high resolution facial details are produced along with the color texture map required for rendering. (Note,
for the renderings in Fig. VI.5, in keeping with the state-of-the-art, image-based skin microgeometry could be
added as in Graham et al. [55] to improve the appearance of specular reflections, and vellus facial hair or "peach
fuzz" could be added to further improve realism as in LeGendre et al. [84].) In Fig. VI.6, we show a region
of the subject’s cheek for the single-channel diffuse normal, specular normal, and diffuse albedo texture maps
generated from the scan images of Fig. VI.12. Although our technique produces only a single-channel diffuse normal, the diffuse normal map exhibits the expected smoother appearance compared with the specular normal map.
a. photograph of flash-lit subject b. rendering with diffuse texture c. rendering without diffuse texture
Figure VI.5: a. Color photograph of a female subject under a flash-lit condition. b. Rendering of the same female
subject from the monochrome scan, photographs in Fig. VI.12. Note that the scan of the subject and her photograph were
completed several days apart, with different cameras. c. Rendering to show captured geometry without color detail.
VI.4.2 Comparison with Color Imaging
To demonstrate the improvement in the resolution of geometric "specular normals" when using a monochrome
camera as compared with a color camera, we photographed a subject using two cameras with the same sensor,
but one color and one monochrome. The two machine vision cameras, a Ximea xiC MC124CG-SY (color) and
a Ximea xiC MC124MG-SY (monochrome) both use the Sony IMX253 sensor. They were placed immediately
adjacent to one another, though not on the exact same optical axis. With both cameras, we photographed a subject
a. diffuse normals b. specular normals c. diffuse albedo
Figure VI.6: a. Diffuse normal map for a crop of the cheek region. b. Specular normal map for the same region. c.
Corresponding diffuse albedo texture map. Each was generated using our monochrome facial scanning pipeline with
multispectral polarization promotion.
illuminated by polarized spherical gradient illumination conditions in the lighting rig, aligned the images with
optical flow, and then computed the "specular normals" of Ma et al. [98]. We show the results in Fig. VI.7.
In the top row, we compare the specular normals obtained using a monochrome camera with those obtained
using a color camera, using the adaptive homogeneity-directed demosaicing algorithm [61] for obtaining color
images. In the bottom row, we compare the specular normals obtained using a monochrome camera with those
obtained using a color camera and a simple linear interpolation demosaicing algorithm. In both cases, the
specular normals computed with the monochrome camera images are sharper and show a greater level of detail.
VI.4.3 Multispectral Polarization Promotion
Next, to validate our new technique for hallucinating cross- and parallel-polarized multispectral images when
only one spectral channel is polarized, we photographed a subject in a different lighting rig where all four
spectral channels (red, green, blue, and broad-spectrum white) are polarized. We were therefore able to generate
"ground truth" polarization difference images for each spectral channel. For this experiment, we used a color
Ximea xiC MC124CG-SY camera, fitted with a linear polarizer and 50 mm Schneider lens (though only a
a. monochrome camera b. color camera (AHDD)
c. monochrome camera d. color camera (linear demosaicing)
Figure VI.7: a. Specular normals of a subject’s cheek, computed using polarized gradient illumination with a monochrome
camera, and b. with a color camera, using the adaptive homogeneity-directed demosaicing algorithm (AHDD) [61] to
obtain color images. c. Specular normals of a subject’s lip region, again computed using polarized gradient illumination
with a monochrome camera, and d. with a color camera, using simple linear Bayer pattern interpolation to obtain color
images. Even when using a color camera in combination with a sophisticated demosaicing algorithm, photography with
monochrome cameras yields higher resolution geometry, as observed via the sharpness of skin details in a and c. Best
viewed on a monitor.
monochrome camera is required). We photographed the subject under each spectral channel for the cross and
parallel polarization states, producing the color polarization difference images in the top row of Fig. VI.8. Next,
we simulated a monochrome camera response for the polarization difference images, converting the color images
to grayscale using Y = 0.2126R + 0.7152G + 0.0722B. We calibrated the relative LED intensities across spectral
channels as observed by the camera, photographing a color chart and measuring the pixel values of the white
square, and we then scaled the monochrome polarization differences for each spectral channel according to this
calibration. These scaled images are shown in the middle row of Fig. VI.8. They all appear to be visually the
same, validating the assumption that the amount of light reflected specularly from the skin does not depend on
the incident illumination spectrum. Indeed, for incident illumination of pixel intensity 1.0, the average absolute
difference between the polarization difference images for the white LED as compared with the red, green, and
blue LEDs are 0.0160, 0.0016, and 0.0012 respectively, sampled along a large section of the subject's cheek
region. These absolute difference images are shown in the bottom row of Fig. VI.8.
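A short sketch of this validation step (grayscale simulation plus intensity calibration); the chart-sampling region and the reference white-LED intensity W_0 are assumed inputs.

```python
import numpy as np

def simulate_mono(rgb):
    # Rec. 709 luma weights used above to simulate a monochrome sensor response.
    return 0.2126 * rgb[..., 0] + 0.7152 * rgb[..., 1] + 0.0722 * rgb[..., 2]

def scaled_polarization_difference(parallel_rgb, cross_rgb, chart_white_rgb, W_0):
    # Monochrome polarization difference image for one spectral channel.
    S = simulate_mono(parallel_rgb) - simulate_mono(cross_rgb)
    # Calibrated intensity of this channel, sampled from the chart's white square.
    W_s = simulate_mono(chart_white_rgb).mean()
    # Scale so that channels of differing LED intensity are directly comparable.
    return S * (W_0 / W_s)
```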
VI.4.4 Multispectral Optical Flow
Next, to evaluate our optical flow technique, we aligned the red, green, and blue LED images of our scan subject,
in Fig. VI.1a, b, and e, to the white LED, Fig. VI.1f. For these images, we found that even naïvely flowing
a weighted version of each spectral channel directly to the white image worked well in practice and visually
aligned the images. We expect that naïve optical flow was successful for a number of reasons. First, the re-
flectance spectra of skin are relatively smooth, and the spectrum of our target white LED covered a large portion
of the visible wavelength range and overlapped with each of the other LED spectra (see Fig. III.1). Additionally,
the blue and green images revealed high frequency details required for optical flow approaches, allowing them
to be easily aligned to white after global intensity matching followed by local intensity normalization. The red
image lacked high frequency detail, showing a large texture-less region for the skin, a challenging input for most
optical flow techniques. However, if a pixel is misaligned in a texture-less region, the result can still be visually
red LED green LED blue LED white LED
Figure VI.8: Top row: Ground truth polarization difference images, computed by photographing a subject under cross-
and parallel-polarized lighting conditions for each spectrum. Middle row: Monochrome ground truth polarization differ-
ence images for each spectral channel, computed via RGB to grayscale conversion of the images of the top row. These
are then scaled based on the calibrated LED intensities as observed by the camera. These images are qualitatively similar
across spectral channels, indicating that the amount of light reflected specularly from the skin does not depend on the inci-
dent spectrum, justifying our technique of multispectral polarization promotion. Bottom row: Absolute difference across
the images of the middle row. From left to right: |White - Red|, |White - Green|, |White - Blue|, and the trivially 0-valued |White - White|. Quantitative average absolute errors are 0.016, 0.0016, and 0.0012 for a large region of the subject's cheek, for
red, green, and blue respectively as compared with the white LED. Errors are relative to incident light of intensity 1.0.
acceptable, since it will take on a nearby pixel’s nearly identical intensity. Finally, each image was captured
with a 12 millisecond exposure time, and therefore a complete facial scan took less than one quarter of a second.
With a short capture time and the subject trying not to move, motion across frames was very slight, on the order
of less than 3 pixels.
Though we found our complementary multispectral optical flow technique was not required for the input
images in Fig. VI.1, we captured an additional scan subject to demonstrate the utility of our approach for the
challenging case of photographing objects with diverse reflectance spectra. We painted a Styrofoam mannequin
head with different brightly colored paints (see Fig. VI.9a) and photographed it under slight rigid motion inside
the lighting rig with a Ximea monochrome xiQ MQ042MG-CM camera, using the same four spectral illumi-
nation conditions (red, green, blue and white LEDs), while adding amber and cyan LEDs (spectra also in Fig.
III.1). In Fig. VI.10, we compare results for four different multispectral optical flow approaches, where all four
use the same core optical flow algorithm, parameters, intensity normalization techniques, and non-negative least
squares solver. Fig. VI.10 visualizes the magnitude and direction of the computed flow fields as saturation and
hue respectively. As the mannequin only moves rigidly, the pixel values of these flow visualizations should be
smoothly varying within a spectral channel, indicating a similar direction of motion for all image pixels. To
demonstrate the appearance of a smoothly varying flow field, we trivially aligned a white LED image to a dif-
ferent white LED image (Fig. VI.9c), and visualized the flow result in Fig. VI.9b. The four multispectral optical
flow approaches with results in Fig. VI.10 are the following:
Naïve Flow: Separately flow a scaled version of each spectral channel to white.
Single-Image Incremental Flow: First, find the image that, when scaled, best matches white in a least
squares sense, and flow this scaled image to white. Then flow each subsequent image individually to a
linear combination of all previously aligned images, including white. (This uses metameric reflectance
matching but not complementary flow.)
Two-Image Complementary Flow: First, find the pair of images that, when combined, best match white
in a least squares sense, and apply two-image complementary flow to iteratively align both scaled images
to white. Then flow each subsequent image individually to a linear combination of all previously aligned
images, including white. (This uses metameric reflectance matching and two-image complementary flow.)
Multi-Image Complementary Flow: Find the linear combination of all of images that, when combined,
best match white in a least squares sense, and apply multi-image complementary flow to iteratively align
all images to white. If the initial solve does not use all spectral channels, flow any unused images indi-
vidually to a linear combination of all previously aligned images, including white. (This uses metameric
reflectance matching and multi-image complementary flow.)
a. mannequin b. visualized flow c. white LED
Figure VI.9: a. A color image of the painted mannequin head used for the optical flow experiments. b. The flow field
visualized by naïvely flowing one white LED image to another. Smoothly varying pixel colors indicate rigid motion.
Hue represents the direction of the per-pixel motion vector, and saturation represents its magnitude. c. The image of the
mannequin lit by the white LED, the target image for the flow calculations of Fig. VI.10.
Visually, our results demonstrate that the two-image complementary approach yields more smoothly-varying
flow fields for the rigidly moving mannequin than either the naïve or single-image incremental approach, indicat-
ing that our complementary multispectral optical flow approach can improve alignment for challenging subjects
with diverse reflectance spectra and considerable movement. Though our results are not perfect, as some non-
smoothly varying color changes are observable, there is clear improvement over the naïve approach. While the
residual color error when approximating white is smaller for the multi-image complementary approach, which
might help to better meet the brightness constancy constraint of optical flow, this comes at the cost of introducing
a larger number of "guessed" initial flow fields in the complementary flow framework. As a result, qualitatively,
we observe the best results for the two-image complementary approach, where fewer initial flow field guesses
are required.
For all experiments, we used the OpenCV 3.3 [19] implementation of TV-L1 optical flow [158], with
the following parameters: $\tau = 0.25$, $\lambda = 0.15$, $\theta = 0.3$, nscales = 5, warps = 5, $\epsilon = 0.01$, innerIterations = 30, outerIterations = 10, scaleStep = 0.5, $\gamma = 0$, and no median filtering. For images of size 2048 × 2048, we used a Gaussian blurring kernel of width 21 for both the metameric reflectance solve and the intensity normaliza-
tion.
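For reference, a sketch of configuring these parameters through the OpenCV Python bindings. The factory function name differs across OpenCV releases (cv2.createOptFlow_DualTVL1 in the 3.x bindings, cv2.optflow.createOptFlow_DualTVL1 with the contrib modules in later ones), so treat the exact call below as an assumption to adapt to the installed version.

```python
import cv2

# Assumes OpenCV 3.x-style bindings exposing the Dual TV-L1 optical flow object.
tvl1 = cv2.createOptFlow_DualTVL1()
tvl1.setTau(0.25)
tvl1.setLambda(0.15)
tvl1.setTheta(0.3)
tvl1.setScalesNumber(5)
tvl1.setWarpingsNumber(5)
tvl1.setEpsilon(0.01)
tvl1.setInnerIterations(30)
tvl1.setOuterIterations(10)
tvl1.setScaleStep(0.5)
tvl1.setGamma(0.0)
tvl1.setMedianFiltering(0)  # no median filtering

def flow_between(src_gray, dst_gray):
    # Both inputs are single-channel images of equal size; returns an (H, W, 2) flow field.
    return tvl1.calc(src_gray, dst_gray, None)
```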
VI.4.5 Color Mixing for Diffuse Albedo
Another benefit of our multispectral scanning approach is that we can approximate the color rendition properties
of different illuminants as observed by a particular camera when generating cross-polarized images and diffuse
albedo texture maps (see bottom row of Fig. VI.11), using Eqn. V.5. In our technique, the color rendition of the scanning illumination condition (typically that of broad-spectrum white LEDs) is not baked into the diffuse albedo. It is well known that simply applying a 3×3 color channel mixing matrix to an RGB image cannot
correct for spectral differences between illuminants. We therefore evaluate our technique’s ability to match the
color rendition properties of three different illuminants (daylight, fluorescent, and tungsten) as observed by a
Canon 1DX camera, compared with applying a color matrix to an image illuminated by white LEDs.
Columns: red LED, amber LED, green LED, cyan LED, blue LED. Rows: Input Images; Naïve Flow; Single-Image Incremental Flow; Two-Image Complementary Flow; Multi-Image Complementary Flow.
Figure VI.10: Top row: Input monochrome images for each spectral channel. Rows 2-5: Visualized flow fields for each
spectral channel, when aligning to the white LED image of Fig. VI.9c, for four different outlined flow techniques. Hue
represents direction of the per-pixel motion vector, while saturation represents magnitude. Images have been scaled 2.0x
for display. The Two-Image Complementary Flow technique yields the smoothest flow fields representing rigid motion,
best aligning the multispectral images for this challenging scan subject with diverse reflectance spectra.
With the same Canon 1DX camera, we photographed a color chart lit by white LEDs. We then computed the
best 3×3 color matrices to try to "correct" this image to match the color rendition properties of different real-
world illuminant images. We repeated this test for a different Ximea color camera as well. With a monochrome
Ximea camera, we then captured the appearance of the color chart lit by six LEDs of distinct spectra, WRGB plus
cyan (C) and amber (A), generating the N matrix of Eq. V.5. We solved for the multispectral image primaries for
each illuminant when using WRGBCA, WRGB, and RGB LEDs only. We compare the color rendition of each
approach in Fig. VI.11. The background squares represent the target colors under the real-world illuminants,
while the foreground dots represent the colors achieved by applying the color matrices (top two rows) or by
mixing spectral channels (bottom chart row). When the dots "disappear," it means we have achieved good color
rendition. The multispectral approach with six spectral channels visually and quantitatively out-performs the
color-matrix approach for daylight and incandescent light, especially for the important skin-colored square of
the color chart. Color rendition can thus be improved with multispectral monochrome imaging using six spectral channels compared with simply applying a color matrix, which suggests that the skin's wavelength-dependent sub-surface scattering effects may be better estimated using our approach, provided an artist knows ahead of time the color rendition properties of a virtual scene's dominant illuminant. We show that generating texture maps with WRGB alone with monochrome cameras does not out-perform the color matrix approach; however, it performs significantly better than using RGB only.
VI.5 Future Work
In this chapter, we have used polarization difference imaging for diffuse-specular separation. It would be of in-
terest to evaluate other methods for this step as well, particularly an approach employing higher order spherical
harmonic lighting conditions [136] instead of polarizers. Ideally, polarizing filters could be avoided for all spec-
tral channels to maximize light output and further reduce exposure times. Furthermore, fully-spectral rendering
Columns: Daylight, Tungsten, Fluorescent. Rows: color matrix (same camera); color matrix (diff. camera); WRAGCB; WRGB; RGB; WRGB.
Figure VI.11: Matching color rendition with a 3×3 color matrix applied to color images captured with white LED illumination, as compared to color channel mixing (Eq. V.5) with multispectral monochrome imaging. Background squares are pixel values sampled from a color chart illuminated by three real-world illuminants, photographed by a Canon 1DX camera. Foreground dots represent the closest achievable color rendition for each method. Row 1: 3×3 color matrix from a color image under white LED lighting, photographed by the same camera as the target, and Row 2: by a different camera from the target. Rows 3, 4, and 5: Monochrome imaging with six (WRAGCB), four (WRGB), and three (RGB) spectral channels. Last row: Scan subject of Fig. VI.12, cross-polarized images produced from a multispectral basis with Eq. V.5. Color matrices alone cannot perfectly color-correct, especially when the input and target images are captured with
different cameras. Quantitatively, multispectral imaging with monochrome cameras and six spectral channels improves
color rendition for tungsten and daylight as compared with applying a color matrix alone. All charts have been converted
to sRGB for display and are best viewed on a monitor.
Columns: lighting condition $l$; monochrome $L_{l,p,s=0}$; monochrome $L_{l,c,s=0}$; monochrome $S_{l,s=0}$; color $L_{l,p}$; color $L_{l,c}$.
Figure VI.12: From left to right: lighting conditions l, monochrome parallel-polarized images, monochrome cross-
polarized images, monochrome polarization difference images, colorized hallucinated parallel-polarized images, and col-
orized hallucinated cross-polarized images, for a female subject.
has received considerable attention in the computer graphics field in recent years, and several commercial global
illumination rendering software programs now include spectral information. It would be of theoretical interest to
evaluate how our multispectral texture maps could be used in spectral global illumination rendering, particularly
as the sub-surface scattering appearance for a given illuminant can be "baked" into our diffuse albedo, albeit for
the full sphere of incident illumination.
VI.6 Conclusion
In this chapter, we have demonstrated that high-resolution facial scanning with a light stage system can be
efficiently achieved using monochrome cameras and multispectral LEDs, with advantages over the usual setup
with color cameras and white LEDs. Only a few more images are required, equivalent to the number of added
spectral channels, and the increased camera sensitivity can be used to shorten the exposure times to speed
up the scan and/or reduce the light on the subject. In the case where only one spectral channel (e.g. white)
is polarized, we introduced an approach to hallucinate cross- and parallel-polarized images for the remaining
channels, as required for diffuse albedo texture map generation. We also introduced a novel multispectral optical
flow technique based on complementary flow, enabling our multispectral 3D scanning technique to be applied
to live subjects with diverse reflectance spectra.
Chapter VII
DeepLight: Lighting Estimation for Mobile Mixed Reality in
Unconstrained Environments
VII.1 Summary
In previous chapters, HDR image-based lighting environments were required for forming realistic composites,
whether for lighting reproduction (chapter III), image-based relighting (chapter V), or global illumination based
rendering (chapter VI). In this chapter, we introduce a technique to estimate such a lighting environment when
radiometric measurements are not available, as would typically be the case for unconstrained mobile mixed
reality. Specifically, we present a learning-based method to infer plausible HDR, omnidirectional illumination
given an unconstrained, low dynamic range (LDR) image from a mobile phone camera with a limited field-
of-view (FOV). For training data, we collect videos of various reflective spheres placed within the camera’s
FOV, leaving most of the background unoccluded, leveraging that materials with diverse reflectance functions
reveal different lighting cues in a single exposure. We train a deep neural network to regress from the LDR
background image to HDR lighting by matching the LDR ground truth sphere images to those rendered with the
predicted illumination using image-based relighting, which is differentiable. Our inference runs at interactive
frame rates on a mobile device, enabling realistic rendering of virtual objects into real scenes for mobile mixed
reality. Training on automatically exposed and white-balanced videos, we improve the realism of rendered
objects compared to state-of-the-art methods for both indoor and outdoor scenes.
VII.2 Introduction
a. training data b. input image c. output lighting d. rendered object e. real object f. rendered object g. real object
Figure VII.1: Given an arbitrary low dynamic range (LDR) input image captured with a mobile device b., our method
produces omnidirectional high dynamic range lighting (c. lower) useful for rendering and compositing virtual objects into
the scene. We train a CNN with LDR images a. containing three reflective spheres, each revealing different lighting cues
in a single exposure. d. and f. show renderings produced using our lighting, closely matching photographs of real 3D
printed and painted objects in the same scene (e., g.).
Compositing rendered virtual objects into photographs or videos is a fundamental technique for mixed or
augmented reality, borrowed from the world of visual effects and film production, as previously described in
chapter I. Ultimately, the realism of a composite depends on both geometric and lighting related factors. An
object "floating in space" rather than placed on a surface will immediately appear fake; similarly, a rendered
object that is too bright, too dark, or lit from a direction inconsistent with other objects in the scene can be just
as unconvincing. In this chapter, we propose a method to estimate plausible illumination from mobile phone
images or video to convincingly light synthetic 3D objects for real-time compositing.
Estimating scene illumination from a single photograph with low dynamic range (LDR) and a limited field of
view (FOV) is a challenging, under-constrained problem. One reason is that an object’s appearance in an image
is the result of the light arriving from the full sphere of directions around the object, including from directions
outside the camera’s FOV. However, in a typical mobile phone video, only 6% of the panoramic scene is observed
by the camera (see Fig. VII.2). Furthermore, even light sources within the FOV will likely be too bright to be
measured properly in a single exposure if the rest of the scene is well-exposed, saturating the image sensor due
to limited dynamic range and thus yielding an incomplete record of relative scene radiance. To measure this
missing information, Debevec [27] merged omnidirectional photographs captured with different exposure times
and lit synthetic objects with these high dynamic range (HDR) panoramas using global illumination rendering.
But in the absence of such measurements, professional lighting artists often create convincing illumination by
reasoning on cues like shading, geometry, and context, suggesting that a background image alone may provide
sufficient information for plausible lighting estimation.
Figure VII.2: The field of view (FOV) of mobile phone video (inset shown in full color), relative to the 360° environment.
As with other challenging visual reasoning tasks, convolutional neural networks (CNNs) comprise the state-
of-the-art techniques for lighting estimation from a limited-FOV, LDR image, for both indoor [46] and outdoor
[62] scenes. Naïvely, many pairs of background images and lighting (HDR panoramas) would be required for
training; however, capturing HDR panoramas is complex and time-consuming, so no such dataset exists for both
scene types. For indoor scenes, Gardner et al. [46] first trained a network with many LDR panoramas [152], and
then fine-tuned it with 2100 captured HDR panoramas. For outdoor scenes, Hold-Geoffroy et al. [62] fit a sky
model to LDR panoramas for training data. We also use a CNN, but our model generalizes to both indoor and
outdoor scenes and requires no HDR imagery.
In this work, our training data is captured as LDR images with three spheres held within the bottom portion
of the camera’s FOV (Fig. VII.3), each with a different material that reveals different cues about the scene’s
ground truth illumination. For instance, a mirrored sphere reflects omnidirectional, high-frequency lighting, but,
in a single exposure, bright light source reflections usually saturate the sensor so their intensity and color are
misrepresented. A diffuse gray sphere, in contrast, reflects blurred, low-frequency lighting, but captures a rela-
tively complete record of the total light in the scene and its general directionality. We regress from the portion
of the image unoccluded by the spheres to the HDR lighting, training the network by minimizing the difference
between the LDR ground truth sphere images and their appearances rendered with the estimated lighting. We
first measure each sphere’s reflectance field as in Debevec et al. [30]. Then, during training, we render the
spheres with the estimated HDR lighting using image-based relighting [30, 104], which is differentiable. Fur-
thermore, we add an adversarial loss term to improve recovery of plausible high-frequency illumination. As only
one exposure comprises each training example, we can capture videos of real-world scenes, which increases the
volume of training data and gives a prior on the automatic exposure and white-balance of the camera.
For a public benchmark, we collect 200k new images in indoor and outdoor scenes, each containing the
three different reflective spheres. We show on a random subset that our method out-performs the state-of-the-art
lighting estimation techniques for both indoor and outdoor scenes for mobile phone imagery, as our inferred
lighting more accurately renders synthetic objects. Furthermore, our network runs at interactive frame rates
on a mobile device, and, when used in combination with real-time rendering techniques, enables more realistic
mobile mixed reality composites.
In summary, our key contributions are:
(i). A data collection technique for training a lighting estimation algorithm.
(ii). A CNN-based method to predict plausible omnidirectional HDR illumination from a single unconstrained
image. To the best of our knowledge, ours is the first to generalize to both indoor and outdoor scenes.
(iii). A novel image-based relighting rendering loss function, used for training the HDR lighting inference
network using only LDR data.
VII.3 Method
Here we describe how we acquire our training data, our network architecture, and the loss functions of our
end-to-end lighting estimation method.
VII.3.1 Training Data Acquisition and Processing
Gardner et al. [46] fine-tuned a pre-trained network using 2100 HDR panoramas, fewer examples than would
be typically required for deep learning without pre-training. However, our key insight is that we can infer HDR
lighting from only LDR images with reference objects in the scene, provided they span a range of BRDFs
that reveal different lighting cues. Thus, we collect LDR images of indoor and outdoor scenes, where each
contains three spheres located in the bottom portion of the camera's FOV, occluding as little of the background
as possible (Fig. VII.3, center). The three spheres are plastic holiday ornaments with diverse finishes that
differently modulate the incident illumination: mirrored silver, matte silver (rough specular), and diffuse gray
(spray-painted), measured as 82.7%, 64.4%, and 34.5% reflective respectively. We built a capture rig to fix the
sphere-to-phone distance, stabilizing the sphere positions in each image (see Fig. VII.3, left).
As we require only LDR input imagery, we collect portrait HD (1080×1920) video at 30 fps, rather than
static photographs. This increases the speed of training data acquisition compared with HDR panoramic photog-
raphy, enabling the capture of millions of images, albeit with significant redundancy for adjacent frames. The
Figure VII.3: Left. Capture apparatus. Center. Example frame. Right. Processed data (top: input; bottom:
ground truth).
videos feature automatic exposure and white balance, providing a prior to help disambiguate color, reflectance,
and illumination.
We locate the three spheres in each video frame by detecting circular boundaries in the optical flow field
between neighboring frames, though marker-based tracking could also be used. Our reference spheres seen in
the training data are designed to be held in a fixed position in the camera's FOV, but in practice their position
wanders and jitters somewhat during data collection. To address this, we extract images of the three spheres for
each video frame as follows. We compute optical flow for each frame against the following frame, and compute
the gradient of the flow field. We modulate the gradient magnitude by $0.1^{\theta^2}$, where $\theta$ is the angle in radians
between the gradient vector and the direction to a manually marked sphere center (within 5 pixels of the actual
center). This produces a noisy image with bright rings around the spheres due to the difference in flow between
the spheres and the background. We find the highest confidence bright circles in this image using template
matching within a tolerance of the marked spheres and over a range of radii, and average the detected positions
and radii across all frames of each video to eliminate jitter. We re-sample cropped images of the spheres using
an idealized camera model oriented towards the sphere center with a view frustum tangent to the sphere on all
four sides to eliminate perspective distortion. For the background images, we remove the lower 20% of each
frame during both training and inference. The final training data consists of cropped background images, each
paired with a set of three cropped spheres, one per BRDF (Fig. VII.3, right).
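A condensed sketch of the ring-scoring step in the sphere localization just described. The exact "gradient of the flow field" computation and the use of an unsigned angle are simplifying assumptions, and the subsequent template matching over candidate radii is omitted.

```python
import numpy as np

def ring_score(flow, center, eps=1e-6):
    """Highlight circular flow discontinuities around a marked sphere center.

    flow: (H, W, 2) optical flow between neighboring frames.
    center: (cx, cy) manually marked approximate sphere center (pixel coordinates).
    Simplification: the flow-field gradient is taken on the flow magnitude image.
    """
    flow_mag = np.linalg.norm(flow, axis=-1)
    gy, gx = np.gradient(flow_mag)
    grad_mag = np.hypot(gx, gy)

    # Angle theta between each gradient vector and the direction toward the marked center.
    h, w = flow_mag.shape
    ys, xs = np.mgrid[0:h, 0:w]
    to_center = np.stack([center[0] - xs, center[1] - ys], axis=-1).astype(np.float64)
    to_center /= np.linalg.norm(to_center, axis=-1, keepdims=True) + eps
    grad_dir = np.stack([gx, gy], axis=-1) / (grad_mag[..., None] + eps)
    cos_t = np.abs(np.sum(grad_dir * to_center, axis=-1))   # unsigned angle: added assumption
    theta = np.arccos(np.clip(cos_t, 0.0, 1.0))

    # Modulate the gradient magnitude by 0.1 ** theta^2 so that only gradients oriented
    # radially with respect to the marked center survive, leaving bright rings at the
    # sphere boundaries.
    return grad_mag * (0.1 ** (theta ** 2))
```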
[Network diagram: the background (BG) image feeds a MobileNetV2 encoder; an FC layer produces a 256-D latent vector; a convolutional decoder with bilinear resizing outputs 32×32 HDR lighting; a BRDF-basis reconstruction loss ($L_1$) compares renderings of the mirror ball (MB), matte silver (MS), and diffuse (D) spheres to ground truth (GT); an adversarial discriminator classifies mirror ball images as real or fake.]
Figure VII.4: Overview of our network. We regress to HDR lighting from an LDR, limited-FOV input image captured with a mobile device. We include a multi-BRDF image-based relighting reconstruction loss for a diffuse (D), matte silver (MS), and mirror ball (MB), and an adversarial loss for the mirror ball only. Only the part outlined in red occurs at inference time.
VII.3.2 Network Architecture
The input to the model is an unconstrained LDR, gamma-encoded image captured with a mobile phone, resized from the native cropped resolution of 1080×1536 to 135×192 and normalized to the range of [−0.5, 0.5]. Our architecture is an encoder-decoder type, where the encoder includes fast depthwise-separable convolutions [66]. We use the first 17 MobileNetV2 [118] layers, processing the output feature maps with a fully-connected (FC) layer to generate a latent vector of size 256. For the decoder, we reshape this vector and upsample thrice by a factor of two to generate a 32×32 color image of HDR lighting. We regress to natural log space illumination as the sun can be more than five orders of magnitude brighter than the sky [129]. Although we experimented with fractionally-strided convolutions, bilinear upsampling with convolutions empirically improved our results. We train the network to produce omnidirectional lighting in the mirror ball mapping [113], where each pixel in image space represents an equal solid angular portion of a sphere for direction $(\theta, \phi)$. Thus, the corners of the output image are unused, but this mapping allows for equal consideration of all lighting directions in the loss function, if desired. For network details, see Fig. VII.4.
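A rough tf.keras sketch of this encoder-decoder, shown for orientation only: the exact MobileNetV2 truncation point ("first 17 layers"), decoder channel counts, and activation choices are illustrative assumptions rather than the network actually trained.

```python
import tensorflow as tf

def build_lighting_net(input_shape=(135, 192, 3), latent_size=256):
    inputs = tf.keras.Input(shape=input_shape)
    # Encoder: a MobileNetV2 backbone (depthwise-separable convolutions); the full
    # backbone is used here, whereas the dissertation truncates it after 17 layers.
    backbone = tf.keras.applications.MobileNetV2(
        input_shape=input_shape, include_top=False, weights=None)
    features = backbone(inputs)                        # low-resolution feature maps
    x = tf.keras.layers.Flatten()(features)
    latent = tf.keras.layers.Dense(latent_size)(x)     # FC layer -> 256-D latent vector

    # Decoder: reshape and upsample three times by a factor of two (4x4 -> 32x32),
    # using bilinear resizing followed by convolutions.
    x = tf.keras.layers.Dense(4 * 4 * 128)(latent)
    x = tf.keras.layers.Reshape((4, 4, 128))(x)
    for filters in (128, 64, 32):
        x = tf.keras.layers.UpSampling2D(interpolation="bilinear")(x)
        x = tf.keras.layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    # 3-channel log-space HDR lighting in the mirror ball mapping.
    log_light = tf.keras.layers.Conv2D(3, 3, padding="same")(x)
    return tf.keras.Model(inputs, log_light)

model = build_lighting_net()
```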
VII.3.3 Reflectance Field Acquisition
As described in chapter V, Debevec et al. [30] introduced the 4D reflectance field $R(\theta, \phi, u, v)$ to denote the image of a subject with pixels $(u, v)$ as lit from any lighting direction $(\theta, \phi)$, and showed that taking the dot product of the reflectance field with an HDR illumination map relights the subject to appear as they would in that lighting. During training, we use this method to render spheres with the predicted HDR lighting. We photograph reflectance fields for the matte silver and diffuse gray spheres using a computer-controllable sphere of white LEDs [98], spaced 12° apart at the equator. This produces an anti-aliased reflectance field for the diffuse and matte silver sphere, which has some specular roughness. However, this LED spacing aliases the mirror BRDF. As we infer lighting in a mirror ball mapping, we instead construct the mirror ball basis as a set of 32×32 one-hot matrices of size 32×32, scaled by its measured reflectivity. We convert the lighting bases for the other BRDFs to the same geometric and relative radiometric space. The photographed bases are normalized based on the incident light source color and converted to the mirror ball mapping, accumulating energy from the photographs for each new lighting direction $i$ for the set of directions on the 32×32 mirror ball using a Phong lobe ($n = 64$) and super-sampling with a 4×4 grid of directions on a sphere.
VII.3.4 Loss Function
To train the lighting prediction network, we minimize an image-based relighting loss and add an adversarial loss
to ensure inference of plausible high-frequency illumination.
Image-based relighting rendering loss: We train the network by minimizing the reconstruction loss between
the ground truth sphere images I and rendered spheres lit with the predicted HDR lighting. With the reflectance
fields R(θ, φ, u, v), pixel values for each sphere lit by each lighting direction (θ, φ) of the 32 × 32 mirror ball, we can compute a linear image Î of each sphere under a novel lighting environment L̂ as a linear combination of its basis images. Slicing the reflectance field into individual pixels R_{u,v}(θ, φ), we generate Î_{u,v} with (VII.1), where L_i(θ, φ) represents the color and intensity of light in the novel lighting environment for the direction (θ, φ):

\hat{I}_{u,v} = \sum_{\theta,\phi} R_{u,v}(\theta,\phi)\, L_i(\theta,\phi).    (VII.1)
The network outputs Q, a log space image of omnidirectional HDR lighting in the mirror ball mapping, with pixel values Q_i(θ, φ). Thus we render each sphere with (VII.2):

\hat{I}_{u,v} = \sum_{\theta,\phi} R_{u,v}(\theta,\phi)\, e^{Q_i(\theta,\phi)}.    (VII.2)
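In a deep learning framework this render is a single differentiable tensor contraction. Below is a minimal TensorFlow sketch of (VII.2) with illustrative tensor shapes, assuming the basis is flattened over the 32 × 32 lighting directions with all-zero basis images for the unused corner pixels.

    import tensorflow as tf

    def render_ibrl(basis, log_light):
        """Differentiable image-based relighting (eq. VII.2).
        basis:     [N, H, W, 3] reflectance-field basis images, one per lighting
                   direction of the flattened 32x32 mirror ball (N = 1024).
        log_light: [B, N, 3]    predicted natural-log HDR lighting Q.
        Returns    [B, H, W, 3] linear renderings of the sphere."""
        light = tf.exp(log_light)                        # back to linear radiance
        return tf.einsum('bnc,nhwc->bhwc', light, basis)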
The ground truth sphere images I are LDR, 8-bit, gamma-encoded images, possibly with clipped pixels. Accordingly, we clip the rendered sphere images with a differentiable soft-clipping function Λ, using n = 40:

\Lambda(p) = 1 - \frac{1}{n}\log\left(1 + e^{-n(p-1)}\right).    (VII.3)

We then gamma-encode the clipped linear renderings with γ to match I. We mask out the pixels in the corners of each ball image with a binary mask M̂, producing the masked L1 reconstruction loss L_rec for BRDFs b ∈ {0, 1, 2}, where λ_b represents an optional weight for each BRDF:

L_{rec} = \sum_{b=0}^{2} \lambda_b \left\lVert \hat{M}\left(\Lambda(\hat{I}_b)^{\frac{1}{\gamma}} - \Lambda(I_b)\right)\right\rVert_1.    (VII.4)
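A minimal TensorFlow sketch of the soft clip and the masked reconstruction loss of (VII.3) and (VII.4) follows; the use of softplus and the small clamp before gamma encoding are implementation conveniences assumed here, not details taken from the original system.

    import tensorflow as tf

    def soft_clip(p, n=40.0):
        """Differentiable soft clip of eq. VII.3, approximating min(p, 1):
        Lambda(p) = 1 - (1/n) * log(1 + exp(-n * (p - 1)))."""
        return 1.0 - tf.math.softplus(n * (1.0 - p)) / n   # softplus(x) = log(1 + e^x)

    def reconstruction_loss(renders, targets, mask, weights, gamma=2.2):
        """Masked L1 loss of eq. VII.4. `renders` holds linear renderings and
        `targets` the gamma-encoded LDR ground truth images, one per BRDF;
        `mask` zeros out the unused mirror-ball corner pixels."""
        loss = 0.0
        for lam, pred, gt in zip(weights, renders, targets):
            # Clip, then gamma-encode the rendering (small clamp avoids NaNs).
            pred_ldr = tf.maximum(soft_clip(pred), 1e-6) ** (1.0 / gamma)
            loss += lam * tf.reduce_sum(mask * tf.abs(pred_ldr - soft_clip(gt)))
        return loss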
Adversarial loss: Minimizing only E[L_rec] produces blurred, low-frequency illumination. While this might be acceptable for lighting diffuse objects, rendering shiny objects with realistic specular reflections requires
higher frequency lighting. Recent works in image inpainting and synthesis [107, 82, 156, 68, 155, 128] leverage
Generative Adversarial Networks [53] for increased image detail, adding an adversarial loss to promote multi-
modal outputs rather than a blurred mean of the distribution. We train our network in a similar framework
to render plausible clipped mirror ball images, of which we have many real examples. This is perceptually
motivated, as humans have difficulty reasoning about reflected light directions [109, 131], which digital artists
leverage when environment mapping [17] reflective objects with arbitrary images. Furthermore, real-world
lighting has been shown to possess a high degree of statistical regularity, despite its complexity [33].
Similar to Pathak et al. [107], we use an auxiliary discriminator network D with our base CNN as the
generator G. During training, G tries to trick D, producing clipped mirror ball images appearing as "real" as
possible. D tries to discriminate between real and generated images. We condition D on a few pixels from
the original image surrounding the ball: we sample the four corners of the cropped ground truth mirror ball
image, and bilinearly interpolate a 32 × 32 hallucinated background, as if the mirror ball were removed. We then softclip and composite both the ground truth and predicted mirror ball onto this "clean plate" with alpha blending (yielding I_c and Î_c), providing D with local color cues and ensuring that samples from both sets have the same perceptual discontinuity at the sphere boundary. Given input image x, G learns a mapping to Q, G : x → Q, used to render a mirror ball with (VII.2). The adversarial loss term, then, is:
L_{adv} = \log D(\Lambda(I_c)) + \log\!\left(1 - D\!\left(\Lambda\!\Big(\sum_{\theta,\phi} R(\theta,\phi)\, e^{G(x;\,\theta,\phi)}\Big)^{\frac{1}{\gamma}}\right)\right).    (VII.5)
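The clean-plate conditioning and adversarial term described above can be sketched as follows in TensorFlow, with the compositing applied to both the real and generated mirror balls as in the text; the tensor shapes, the discriminator interface, and the epsilon guard are assumptions for illustration, not the exact implementation.

    import tensorflow as tf

    def composite_on_clean_plate(ball, corners, alpha):
        """Alpha-blend a 32x32 mirror-ball image over a 'clean plate' hallucinated
        by bilinearly interpolating the four corner colors of the original crop.
        ball: [B, 32, 32, 3], corners: [B, 2, 2, 3], alpha: [32, 32, 1] (1 on the ball)."""
        bg = tf.image.resize(corners, (32, 32), method='bilinear')
        return alpha * ball + (1.0 - alpha) * bg

    def adversarial_loss(disc, real_ball, fake_ball, corners, alpha, eps=1e-6):
        """Eq. VII.5: real_ball is the soft-clipped ground truth mirror ball and
        fake_ball is the soft-clipped, gamma-encoded rendering under the
        predicted lighting; disc outputs a probability of being real."""
        real_score = disc(composite_on_clean_plate(real_ball, corners, alpha))
        fake_score = disc(composite_on_clean_plate(fake_ball, corners, alpha))
        return tf.reduce_mean(tf.math.log(real_score + eps) +
                              tf.math.log(1.0 - fake_score + eps))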
Joint objective: The full objective is therefore:
G^{*} = \arg\min_{G}\max_{D}\; (1-\lambda_{rec})\,\mathbb{E}[L_{adv}] + \lambda_{rec}\,\mathbb{E}[L_{rec}].    (VII.6)
VII.3.5 Implementation Details
We use TensorFlow [10] and train for 16 epochs using the ADAM [78] optimizer with β₁ = 0.9, β₂ = 0.999, a learning rate of 0.00015 for G, and, as is common, one 100× lower for D, alternating between training D and G. We set λ_rec = 0.999, with λ_b = 0.2, 0.6, 0.2 for the mirror, diffuse, and matte silver BRDFs respectively, and use γ = 2.2, as the camera's video mode employs image-dependent tone-mapping. We use a batch size of 32 and batch normalization [69] for all layers but the last of G and D. We use ReLU6 activations for G and ELU [25] for D. For data augmentation, we horizontally flip the input and ground truth images. We found that data augmentation by modifying white balance and exposure did not improve results, perhaps because it simulated unlikely camera responses.
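For reference, a sketch of this training configuration in TensorFlow/Keras using the hyperparameters stated above; the variable names and the exact form of the generator objective are illustrative.

    import tensorflow as tf

    # Hyperparameters reported in the text.
    LAMBDA_REC = 0.999
    LAMBDA_BRDF = (0.2, 0.6, 0.2)   # mirror, diffuse, matte silver
    GAMMA = 2.2

    g_optimizer = tf.keras.optimizers.Adam(learning_rate=1.5e-4,
                                           beta_1=0.9, beta_2=0.999)
    d_optimizer = tf.keras.optimizers.Adam(learning_rate=1.5e-6,   # 100x lower for D
                                           beta_1=0.9, beta_2=0.999)

    def generator_objective(l_rec, l_adv):
        # Weighted combination of eq. VII.6; G minimizes this while D is
        # updated in alternation to maximize the adversarial term.
        return (1.0 - LAMBDA_REC) * l_adv + LAMBDA_REC * l_rec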
Datasets: We collected 37.6 hours of training video using a Google Pixel XL mobile phone, in a variety of
indoor and outdoor locations, times of day, and weather conditions, generating 4.06 million training examples.
We bias the data towards imagery of surfaces or ground planes where one might want to place a virtual AR
object. For test data, we collected 116 new one-minute videos (211.7k frames) with the same camera and
separated them into four sets: unseen indoor and outdoor (UI, UO) and seen indoor and outdoor (SI, SO).
"Unseen" test videos were recorded in new locations, while the "seen" were new videos recorded in previously-
observed environments. We evaluate our method on the following videos: 28 UI (49.3k frames), 27 UO (49.7k
frames), 27 SI (49.9k frames), and 34 SO (62.7k frames).
VII.4 Evaluation
In this section, we evaluate the lighting inference quantitatively and qualitatively, assess the different loss terms,
and compare with the state-of-the-art for both indoor and outdoor scenes.
VII.4.1 Quantitative Results
Accurate lighting estimates should correctly render objects with arbitrary materials, so we measure lighting accuracy first using L_rec, comparing with ground truth LDR spheres. We show the average per-pixel L1 loss for each unseen test dataset for each material, and the per-pixel linear RGB angular error θ_rgb for the diffuse ball, a distance metric commonly used to evaluate white-balance algorithms (see Hordley and Finlayson [64]), in Table VII.1 (top). (Minimizing θ_rgb during training did not improve results.) We show results for the seen test sets in Table VII.2 (top).
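A small NumPy sketch of this error metric, the per-pixel angle between predicted and ground-truth linear RGB vectors averaged over the sphere, is shown below; the function name and the epsilon guard are illustrative.

    import numpy as np

    def rgb_angular_error_degrees(pred, gt, eps=1e-8):
        """Mean per-pixel angle (degrees) between predicted and ground-truth
        linear RGB vectors, as used for the diffuse sphere (cf. Hordley and
        Finlayson [64]). pred, gt: arrays of shape [..., 3]."""
        dot = np.sum(pred * gt, axis=-1)
        norm = np.linalg.norm(pred, axis=-1) * np.linalg.norm(gt, axis=-1) + eps
        cos = np.clip(dot / norm, -1.0, 1.0)
        return np.degrees(np.arccos(cos)).mean()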
Table VII.1: Average L1 loss by BRDF: diffuse (d), mirror (m), and matte silver (s), and RGB angular error θ_rgb for diffuse (columns), for our network trained with different loss terms (rows). We compare ground truth images with those rendered using our HDR lighting inference, for unseen indoor (UI) and outdoor (UO) locations. Our full method is in the top row.

                       L1(d)         L1(s)         L1(m)         θ_rgb(d)
Loss terms             UI     UO     UI     UO     UI     UO     UI     UO
L1(m,d,s) + L_adv      0.12   0.13   0.13   0.13   0.17   0.16   9.8    10.8
L1(m,d,s)              0.12   0.13   0.12   0.13   0.15   0.14   9.9    11.0
L1(m)                  0.20   0.18   0.16   0.15   0.14   0.13   11.0   13.5
L1(s)                  0.12   0.13   0.13   0.13   0.21   0.20   10.0   11.4
L1(d)                  0.12   0.13   0.15   0.15   0.28   0.27   10.0   11.2
Ablation studies: We assess the importance of the different loss terms, L_rec for each BRDF and L_adv, and report L_rec and θ_rgb for networks supervised using subsets of the loss terms in Table VII.1. Training with only the mirror BRDF or only the diffuse BRDF leads to higher L_rec for the others. However, training with only the matte silver BRDF still yields low L_rec for the diffuse sphere, suggesting they reveal similar lighting cues. In Fig. VII.5, we show the ground truth images and renderings produced for each loss variant. Visually, training with only the mirror ball loss L1(m) fails to recover the full dynamic range of lighting, as expected. Training with only the matte silver L1(s) or diffuse L1(d) loss fails to produce a realistic mirror ball; thus objects with sharp specular reflections could not be plausibly rendered. Training with L_adv yields higher frequency illumination, as expected.
Table VII.2: Average L1 loss by BRDF: diffuse (d), mirror (m), and matte silver (s), and RGB angular error θ_rgb for diffuse (columns), for our network trained with different loss terms (rows). We compare ground truth images with those rendered using our HDR lighting inference, for seen indoor (SI) and outdoor (SO) locations. Our full method is in the top row.

                       L1(d)         L1(s)         L1(m)         θ_rgb(d)
Loss terms             SI     SO     SI     SO     SI     SO     SI     SO
L1(m,d,s) + L_adv      0.11   0.09   0.12   0.10   0.18   0.14   8.0    10.4
L1(m,d,s)              0.11   0.08   0.12   0.10   0.15   0.12   8.0    10.6
L1(m)                  0.25   0.15   0.18   0.13   0.14   0.11   8.6    13.2
L1(s)                  0.14   0.10   0.12   0.10   0.23   0.18   7.9    11.3
L1(d)                  0.11   0.08   0.14   0.12   0.30   0.25   7.9    10.4
VII.4.2 Qualitative Results
Ground truth comparisons: In Fig. VII.6, we show examples of ground truth spheres compared with those
rendered using image-based relighting and our HDR lighting inference, for each BRDF. These examples correspond to the 25th, 50th, and 75th percentiles for the L_rec loss.
Virtual object relighting: We 3D-print two identical bunnies using the model from The Stanford 3D Scanning
Repository [5]. The two are coated with paints of measured reflectance: diffuse gray (34.5% reflective) and matte
silver (49.9% reflective), respectively. We photograph these "real" bunnies in different scenes using the Google
Pixel XL, also capturing a clean plate for lighting inference and virtual object compositing. In Fig. VII.8 we
compare the real bunny images (b, f) to off-line rendered composites using our lighting estimates (d, h). We
also record ground truth HDR lighting as in [27] using a Canon 5D Mark III, color correcting the raw linear
HDR panorama so it matches the LDR phone image. We fit a linearization curve for each LDR input using a color chart; however, the phone's image-dependent tone-mapping makes radiometric alignment challenging. We compare renderings using the ground truth and predicted lighting in Fig. VII.8 (c, g).

Figure VII.5: Ablation study: unseen image inputs, ground truth, and rendered images of diffuse (d), matte silver (s), and mirror (m) spheres, lit with HDR lighting inference from networks trained using different loss terms (columns: L1(all) + L_adv, L1(all), L1(m), L1(s), L1(d)). Our full method is L1(all) + L_adv.
The bunny renderings are generated using global illumination (GI) techniques and image-based lighting
(IBL) [27], using the Arnold renderer. For all images, we use Arnold’s aiStandardSurface shader for both
the diffuse and matte silver virtual bunnies, with shader parameters given in Tables VII.6 and VII.7, respectively. These
parameters were selected to best visually match the reflectance field basis images for spheres coated with the
same paint as the real, 3D printed bunnies. For these off-line renders, virtual objects were set on a virtual planar
surface manually placed by an artist to best match the background plate, after first setting the virtual camera’s
approximate FOV. We use the Arnold aiShadowMatte shader for the virtual planes, with parameters in Table
VII.8, which allows virtual objects to cast shadows onto them. Furthermore, this shader enables indirect diffuse
and specular shading of virtual objects using colors sampled from the background image at the location where
the placed virtual plane projects into the background image. Thus, the light bouncing off the virtual plane is tinted the color of the background image, which is then used to light the underside of any virtual object placed on the surface.

Figure VII.6: Qualitative comparisons between ground truth spheres and renderings using our HDR lighting inference and IBRL, for unseen indoor (UI) and outdoor (UO) inputs and each BRDF (d, s, m). Examples shown for the 25th, 50th, and 75th percentiles of L_rec.
VII.4.3 Comparisons with Previous Work
We retrain our network for the 3:4 aspect ratio input of the state-of-the-art methods for indoor [46] and outdoor
[62] scenes, cropping a 1080 × 810 landscape image from the center of each portrait input and resizing to 192 × 144 to maintain our FC layer size. (Our comparison network thus observes half of the FOV of our standard network.) Gardner et al. [46] host a server to predict HDR lighting given an input image; Hold-Geoffroy et al. [62] also predict camera elevation. We randomly select 450 images from test sets UI and UO and retrieve their lighting estimates as HDR panoramas, converting them to the 32 × 32 mirror ball mapping and rotating them to camera space using the predicted camera elevation if given. We render spheres of each BRDF with IBRL and compare with ground truth, showing the average L1 loss for each BRDF and θ_rgb for the diffuse ball in Table VII.3. We also show the relative error in total scene radiance, measured by summing all diffuse sphere linear pixel values*, in Fig. VII.17. We show comparison sphere renderings in Fig. VII.7 and bunny renderings
in Fig. VII.8 (e, i), with more spheres in Figs. VII.9, VII.10, VII.11, and VII.12, and more bunny renderings
in Figs. VII.13, VII.14, VII.15, and VII.16. We show significant improvements compared to both approaches, while
requiring only one model that generalizes to both indoor and outdoor scenes. Without a specific sun and sky
model, our network also infers diverse light sources for outdoor scenes. However, we present these results with
two caveats: first, our training data are generated with a fixed FOV camera, which was varied and unknown for
previous approaches, and second, our training and test data are generated with the same camera. Nonetheless,
for mobile mixed reality with a fixed FOV, we show that optimizing for accurately rendered objects for multiple
BRDFs improves lighting estimation.
Table VII.3: Quantitative comparisons with the previous state-of-the-art in indoor [46] and outdoor [62] lighting estimation. Average L1 loss by BRDF: diffuse (d), mirror (m), and matte silver (s), and RGB angular error θ_rgb for the diffuse sphere. n = 450 for each.

            unseen indoor (UI)            unseen outdoor (UO)
            ours           [46]           ours           [62]
L1(d)       0.13 ± 0.07    0.21 ± 0.11    0.13 ± 0.08    0.25 ± 0.12
L1(s)       0.14 ± 0.05    0.22 ± 0.06    0.14 ± 0.06    0.25 ± 0.07
L1(m)       0.18 ± 0.03    0.23 ± 0.06    0.17 ± 0.04    0.34 ± 0.06
θ_rgb(d)    10.3 ± 8.8     11.9 ± 7.2     11.2 ± 10.9    14.3 ± 6.6
Temporal consistency: We do not explicitly optimize for temporal consistency, but the adjacent video frames
in our training data provide an indirect form of temporal regularization. In Fig. VII.18 we compare rendered
results from four sequential frames for our approach and for that of Gardner et al. [46]. While we show
qualitative improvement, adding a temporal loss term is of interest for future work.
* Scene radiance is modulated by the albedo and foreshortening factor of the diffuse sphere, with greater frontal support, and we use γ = 2.2.
Figure VII.7: Ground truth and rendered spheres produced via IBRL using our predicted HDR lighting and that of the previous state-of-the-art for indoor [46] and outdoor [62] scenes, for each BRDF (d, s, m).
VII.4.4 Performance
Our inference runs at 12–20 fps on various mobile phone CPUs. We report speed and accuracy for networks trained to predict lower resolution lighting (8 × 8 or 16 × 16) in Table VII.4. We report the lighting inference time for our standard network on a variety of mobile devices in Table VII.5.
VII.5 Limitations and Future Work
Spatially-varying illumination: The reference spheres of the training data reflect the illumination from a
point 60 cm in front of the camera and do not reveal spatially-varying lighting cues. Virtual AR objects are often
placed on surfaces visible in the scene, and the light bouncing up from the surface should be the illumination
on the object coming from below. A potential improvement to our technique would be to replace the bottom directions of our lighting estimate with pixel values sampled from the scene surface below each object, allowing objects placed in different parts of the scene to receive differently colored bounce light from their environments.
Figure VII.8: For each input image (a), we photograph a real 3D-printed bunny placed in the scene for two different BRDFs (b, f) and capture ground truth HDR panoramas at the bunny's location (scenes: indoor, outdoor shade, outdoor sun). Using GI rendering with IBL, we render a virtual bunny into the scene using ground truth lighting (c, g), our lighting inference (d, h), and that of the state-of-the-art methods for indoor [46] or outdoor [62] scenes (e, i).
Table VII.4: Average inference time on an NVIDIA Quadro K1200 GPU (N) and Google Pixel 3 XL mobile CPU (P), and L1 loss by BRDF: diffuse (d), mirror (m), and matte silver (s), and RGB angular error θ_rgb for diffuse (columns), for variants of our network trained for different output lighting resolutions [n × n] and sizes of latent representation. Our baseline network is [32 × 32]-256.

                 L1(d)         L1(s)         L1(m)         θ_rgb(d)       Time (ms)
Network          UI     UO     UI     UO     UI     UO     UI     UO      N       P
Baseline         0.12   0.13   0.13   0.13   0.17   0.16   9.8    10.8    6.53    80.0
[16 × 16]-256    0.11   0.08   0.12   0.10   0.17   0.13   6.5    8.9     6.06    62.7
[16 × 16]-128    0.22   0.13   0.13   0.13   0.17   0.15   8.5    9.9     5.93    61.4
[8 × 8]-128      0.11   0.12   0.12   0.12   0.13   0.13   14.1   15.6    5.76    50.9
Table VII.5: Average inference time on various mobile devices / Qualcomm Snapdragon systems-on-a-chip (QS SoCs) CPUs for our standard (baseline) lighting estimation network.

Mobile device          QS SoC                       Time (ms)   Rate (fps)
ASUS ROG               845 (2.96 GHz Kryo 385)      57.1        17.5
Samsung Galaxy S9+     845 (2.8 GHz Kryo 385)       59.8        16.7
Google Pixel XL        821 (2.15 GHz Kryo)†         57.8        17.3
Google Pixel 2 XL      835 (2.35 GHz Kryo 280)      102.1       9.8
Google Pixel 3 XL      845 (2.5 GHz Kryo 385)       80.0        12.5
Using a different camera: Our test and training data are captured with the same camera. In Fig. VII.19 we
show results for two images captured using a different mobile phone camera (Apple iPhone 6). Qualitatively,
we observe differences in white balance, suggesting an avenue for future work. Similarly, our network is trained
for a particular camera FOV and may not generalize to others.
† QS 821 has been shown to out-perform QS 835 for single-threaded floating point operations [4].
Challenging image content: Simple scenes lacking variation in surface normals and albedo (Fig. VII.20,
left) can challenge our inference approach, and scenes dominated by a strongly hued material can also pose a
challenge (Fig. VII.20, right). Adding knowledge of the camera’s exposure and white balance used for each
input image might improve the robustness of the inference.
Future work: During mobile mixed reality sessions, objects are positioned on planes detected using sensor
data fused with structure-from-motion [137]. Thus, computational resources are already devoted to geometric
reasoning, which would be of interest to leverage for improved mixed reality lighting estimation. Furthermore,
inertial measurements could be leveraged to continuously fuse and update lighting estimates as a user moves a
phone throughout an environment. Similarly, as our training data already includes temporal structure, explicitly
optimizing for temporal stability would be of interest. Lastly, one could increase generality by acquiring training
data in a raw video format and simulating different camera models during training.
VII.6 Conclusion
We have presented an HDR lighting inference method for mobile mixed reality, trained using only LDR imagery,
leveraging reference spheres with different materials to reveal different lighting cues in a single exposure. This
work is the first CNN-based approach to generalize to both indoor and outdoor scenes using a single image as
input, and for mobile mixed reality compares favorably to previous work developed to handle a single class of
lighting.
Table VII.6: Diffuse gray object aiStandardSurface shader parameters.
parameters values
base weight 0.700
base color RGB [0.649, 0.700, 0.700]
specular weight 0.300
specular roughness 0.600
IOR 1.7
Table VII.7: Matte silver object aiStandardSurface shader parameters.
parameters values
base weight 1.000
base color RGB [0.588, 0.600, 0.600]
metalness 1.000
specular roughness 0.470
Table VII.8: Virtual plane object aiShadowMatte shader parameters.
parameters values
use background image on
shadow color RGB [0.000, 0.000, 0.000]
shadow opacity 1.000
diffuse indirect on
diffuse use background image on
diffuse intensity 1.000
specular indirect on
specular intensity 0.200
specular roughness 0.200
IOR 1.5
Figure VII.9: Additional qualitative comparisons between ground truth sphere images and renderings produced with IBRL using our predicted HDR lighting and that of the previous state-of-the-art for unseen indoor [46] scenes, for each BRDF (d, s, m). Continued on next page.
Figure VII.10: Additional qualitative comparisons between ground truth sphere images and renderings produced with IBRL using our predicted HDR lighting and that of the previous state-of-the-art for unseen indoor [46] scenes, for each BRDF (d, s, m). Continued from previous page.
Figure VII.11: Additional qualitative comparisons between ground truth sphere images and renderings produced with IBRL using our predicted HDR lighting and that of the previous state-of-the-art for unseen outdoor [62] scenes, for each BRDF (d, s, m). Continued on next page.
Figure VII.12: Additional qualitative comparisons between ground truth sphere images and renderings produced with IBRL using our predicted HDR lighting and that of the previous state-of-the-art for unseen outdoor [62] scenes, for each BRDF (d, s, m). Continued from previous page.
Figure VII.13: For each input image (a), we photograph a real 3D-printed bunny placed in the scene for two different BRDFs (b, f) and capture ground truth HDR panoramas at the bunny's location (all scenes indoor). Using GI rendering with IBL, we render a virtual bunny into the scene using ground truth lighting (c, g), our lighting inference (d, h), and that of the state-of-the-art methods for indoor [46] scenes (e, i). Continued on next page.
Figure VII.14: For each input image (a), we photograph a real 3D-printed bunny placed in the scene for two different BRDFs (b, f) and capture ground truth HDR panoramas at the bunny's location (all scenes indoor). Using GI rendering with IBL, we render a virtual bunny into the scene using ground truth lighting (c, g), our lighting inference (d, h), and that of the state-of-the-art methods for indoor [46] scenes (e, i). Continued from previous page.
Figure VII.15: For each input image (a), we photograph a real 3D-printed bunny placed in the scene for two different BRDFs (b, f) and capture ground truth HDR panoramas at the bunny's location (scenes: outdoor shade, outdoor shade, outdoor sun). Using GI rendering with IBL, we render a virtual bunny into the scene using ground truth lighting (c, g), our lighting inference (d, h), and that of the state-of-the-art methods for outdoor [62] scenes (e, i). Continued on next page.
Figure VII.16: For each input image (a), we photograph a real 3D-printed bunny placed in the scene for two different BRDFs (b, f) and capture ground truth HDR panoramas at the bunny's location (all scenes in outdoor shade). Using GI rendering with IBL, we render a virtual bunny into the scene using ground truth lighting (c, g), our lighting inference (d, h), and that of the state-of-the-art methods for outdoor [62] scenes (e, i). Continued from previous page.
Figure VII.17: Boxplot of RGB relative radiance accuracy, measured by summing linear pixel values of the diffuse ball rendered with the HDR lighting estimates and comparing with ground truth: (pred − gt)/gt, n = 450, for our approach and the previous state-of-the-art methods for indoor [46] and outdoor [62] scenes.
Figure VII.18: Example ground truth spheres (a) and renderings produced with IBRL using our predicted illumination (b) and that of [46] (c), for four sequential unseen indoor video frames (frames 0–3), demonstrating qualitative improvement in temporal consistency.
Figure VII.19: Example ground truth spheres and renderings produced with IBRL using our predicted HDR lighting, with input images from a different camera (seen indoor (SI) and outdoor (SO) inputs, each BRDF).
Figure VII.20: Example challenging scenes: ground truth spheres and renderings produced with IBRL using our predicted HDR lighting (unseen indoor inputs, each BRDF).
Chapter VIII
Conclusion and Future Work
VIII.1 Conclusion
In this dissertation, we have presented several techniques to improve the measurement and estimation of illu-
mination and reflectance, towards the high-level goal of improving the realism and color accuracy of digital
image composites, from visual effects or film production to real-time augmented reality. We have shown that
adding multispectral LED-based illumination to a light stage system improves a variety of techniques previously
developed using either only RGB LEDs or only broad-spectrum white LEDs, including: lighting reproduction,
image-based relighting, and high-resolution facial geometry and reflectance scanning. Throughout the first sec-
tions of this work, our ambition was to highlight that the world is a spectral renderer and leverage the spectral
nature of image formation to improve the color rendition accuracy of these various compositing-related tech-
niques. Finally, as compositing forms the basis for augmented and mixed reality applications, we concluded
with a technique to estimate a high dynamic range light probe image from a limited field-of-view mobile phone
image, towards accurate illumination for realistic rendering in real-time compositing when prior radiometric
lighting measurement is impractical.
VIII.2 Future Work
This dissertation has suggested several opportunities for related research directions on the topics of lighting
measurement, playback, and estimation, along with a variety of ways to potentially improve or expand upon the
individual techniques presented in each chapter. These specific suggestions can be found at each chapter’s end,
while in this section we identify more general or thematic future research directions.
Image-based Relighting in Convolutional Neural Networks. In chapter VII, we introduced the technique
of using image-based relighting as a differentiable renderer to use in training a deep neural network for lighting
estimation from an unconstrained mobile phone image. Compared with other differentiable shading models
and corresponding lighting representations from prior works, ours is the first to handle high frequency illumi-
nation effects. Furthermore, IBRL enables the simulation of realistic, complex shading effects that are thus far
impossible to achieve using other differentiable rendering techniques, e.g. realistic interreflections, subsurface
scattering, etc. We therefore envision that IBRL could be used in a deep learning context to solve other chal-
lenging, under-constrained problems such as lighting estimation from known objects or objects of a known class
(including from faces). Furthermore, it could be useful towards solving the much more challenging problem of
learning reflectance functions given known illumination. Taking this even one step further, solving for unknown
illumination and reflectance simultaneously could enable realistic object relighting in an unconstrained setting.
Indeed, since the publication of LeGendre et al. [86] (on which chapter VII is based), the work of Sun et al.
[130] describes training an encoder-decoder style neural network to both estimate lighting from faces and pre-
dict the appearance of a person in a novel lighting environment given only an unconstrained portrait as input.
While this work did not use IBRL during training as in our approach, it did use the technique to pre-render
training data: a large dataset of faces as they would appear in thousands of paired lighting environments. This
hybrid real/synthetic training dataset benefits from realistic light transport unlike most synthetic imagery, while
retaining the benefit of scaling to cover many lighting environments.
We believe our work has paved the way for other researchers to consider using data-driven rendering tech-
niques during model training, rather than relying on simplistic shading models e.g. Phong shading or spherical
harmonics with pre-computed radiance transfer [111]. While in chapter VII we avoided consideration of the
view-dependence of the reflectance field by using spheres (which are rotationally-symmetric) as our reference
objects, Debevec et al. [30] described a viewpoint interpolation technique for image-based relighting, suggest-
ing that changing the viewing angle during differentiable rendering with IBRL may still be feasible for future
work using objects that are not rotationally-symmetric.
Multispectral Lighting Estimation from an Unconstrained Image. In the earlier portions of this disserta-
tion, we extended classical image-based lighting measurement and playback to the multispectral domain, while
in chapter VII we predicted only tristimulus RGB light probe images. One obvious direction for future work
combining our contributions is therefore multispectral, HDR, omnidirectional lighting estimation from an un-
constrained LDR image. Our research in chapter III suggests that adding at least one color chart observation
to the training data could help work towards this goal, while the differentiable IBRL loss function could be ex-
tended to the multispectral domain using the techniques presented in chapter V. Real-time rendering techniques
on mobile devices remain relatively simplistic due to compute and power consumption constraints, making spec-
tral rendering on mobile platforms not particularly likely in the near future. However, visual effects and film
production could likely benefit from high-quality multispectral lighting estimation. Additionally, the type of data that would likely be required for training a multispectral lighting estimation network (triplets of corresponding RGB light probes, color chart observations, and un-occluded background images) could also be used to learn how to "promote" an RGB light probe to multispectral in a data-driven way.
LEDs Beyond the Light Stage. Much of this dissertation has focused on multispectral extensions to tech-
niques typically practiced using a light stage system of omnidirectional LEDs. However, it is also our ambition
to expand the "spectral awareness" of cinematographers more broadly. Due to technological advancements in
solid state lighting (SSL) and a mandate by the United States Department of Energy (DOE), LED lighting has
proliferated both in film production and even more generally in residential and commercial environments [154].
LED-based light sources offer higher energy efficiency, sophisticated computer controllability, and cooler run-
ning temperatures as compared with previous types of film studio luminaires such as tungsten sources. However,
there is currently no standardized way to produce "colorless" or white illumination using LED technology, and
studio LED sources with vastly different spectra are commonly used on film sets for this goal. Understand-
ably, these spectral variations have produced the "chromatic chaos" [37] in film-making that caught the attention
of the Science and Technology Council of The Academy of Motion Picture Arts and Sciences, catalyzing the
formation of its sub-committee on SSL [38] (of which the dissertation author is a member). Towards building
broader "spectral awareness," this sub-committee developed a Cinematographic Spectral Similarity Index [63]
intended for filmmakers to compare the color rendition properties of different light sources, which we briefly
summarize here.
A Cinematographic Spectral Similarity Index. As SSL sources have replaced Tungsten sources for use
during film and media production, new color rendition challenges in post-production have emerged. The Color
Rendering Index (CRI) is often reported for various light sources as an attempt to mitigate such challenges,
but its computation technique involving the human observer (rather than a film camera observer) limits its
accuracy and utility. The Academy’s SSL committee instead developed a novel index or rating system for light
sources, termed the Spectral Similarity Index (SSI), that is based on the similarity of the source’s spectrum to
a reference spectrum, thereby eliminating the need for any assumption of a specific observer or camera. The
index produces a "confidence factor," where a high score implies predictable color rendition for a light source for
cinematography, while a moderate score suggests possible color rendition challenges. The calculations behind
the SSI score are best conceptualized as a filtered and weighted sum of least squares error between the illuminant
to be rated and a reference illuminant spectrum, e.g. that of studio Tungsten. The weighting and filtering
functions are designed based on the known wavelength extent and frequency space analysis of cinematographic
camera spectral responses, respectively. Thus, while a standard observer is not required, general prior knowledge
of typical camera spectral sensitivities is incorporated into the design of the index. The SSI was presented at the
Society of Motion Picture and Television Engineers (SMPTE) technical conference in 2016 [63], and since the
publication of this initial work, the commercial light meter manufacturer Sekonic has integrated the Index into
their commercial spectrometer, the C-800 Spectromaster [3].
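The exact filtering and weighting functions of the SSI are defined in the SMPTE publication [63]; purely as an illustration of the general structure described above (a filtered, weighted squared-error comparison of relative spectral power distributions against a reference), a sketch might look like the following. This is not the standardized index, and all names and constants here are illustrative.

    import numpy as np

    def spectral_similarity_sketch(test_spd, ref_spd, weights, smooth_kernel):
        """Illustrative only -- not the official SSI computation. Compares a test
        illuminant spectrum to a reference via a filtered, weighted squared error
        on relative spectral power distributions sampled at the same wavelengths."""
        # Normalize each spectrum so overall brightness does not affect the score.
        test = test_spd / test_spd.sum()
        ref = ref_spd / ref_spd.sum()
        # Relative difference, low-pass filtered to de-emphasize narrow features
        # a camera cannot resolve (stand-in for the SSI filtering step).
        diff = np.convolve((test - ref) / (ref + 1e-9), smooth_kernel, mode='same')
        # Wavelength-dependent weighting (stand-in for the SSI weighting function).
        err = np.sum(weights * diff ** 2)
        return max(0.0, 100.0 * (1.0 - np.sqrt(err)))   # higher = more similar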
Chapter IX
Basis Publications
Several publications form the basis for this work.
Chapter III is based on LeGendre et al. (2016) [89].
Chapter IV is based on LeGendre et al. (2017) [88].
Chapter V is based on LeGendre et al. (2016) [87].
Chapter VI is based on LeGendre et al. (2018) [83].
Chapter VII is based on LeGendre et al. (2019) [86].
BIBLIOGRAPHY
[1] The 89th academy awards, 2017. https://www.oscars.org/oscars/ceremonies/2017 . Accessed:
2019-03-01.
[2] EBU TECH 3353, Development of a “Standard” Television Camera Model Implemented in the TLCI-
2012. European Broadcasting Union.
[3] New sekonic c-800 spectromaster. https://c800.sekonic.com/ . Accessed: 2019-04-19.
[4] Qualcomm snapdragon 835 performance preview. https://www.anandtech.com/show/11201/qualcomm-snapdragon-835-performance-preview/2. Accessed: 2018-11-18.
[5] The Stanford 3D Scanning Repository. http://graphics.stanford.edu/data/3Dscanrep/ .
[6] The 13th academy awards, 1941. https://www.oscars.org/oscars/ceremonies/1941 . Accessed:
2019-02-21.
[7] The 77th academy awards, 2005. https://www.oscars.org/oscars/ceremonies/2005 . Accessed:
2019-05-23.
[8] The 86th academy awards, 2014. https://www.oscars.org/oscars/ceremonies/2014 . Accessed:
2019-03-01.
[9] The 91st academy awards, 2019. https://www.oscars.org/oscars/ceremonies/2019 . Accessed:
2019-03-01.
[10] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Cor-
rado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp,
Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh
Levenberg, Dandelion Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster,
Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay
Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu,
and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Soft-
ware available from tensorflow.org.
[11] Boris Ajdin, Manuel Finckh, Christian Fuchs, Johannes Hanika, and Hendrik Lensch. Compressive
higher-order sparse and low-rank acquisition with a hyperspectral light stage. Technical report, Univ. of
Tuebingen, 2012.
[12] Oleg Alexander, Graham Fyffe, Jay Busch, Xueming Yu, Ryosuke Ichikari, Andrew Jones, Paul Debevec,
Jorge Jimenez, Etienne Danvoye, Bernardo Antionazzi, et al. Digital Ira: Creating a real-time photoreal
digital actor. In ACM SIGGRAPH 2013 Posters, page 1. ACM, 2013.
[13] Oleg Alexander, Mike Rogers, William Lambeth, Jen-Yuan Chiang, Wan-Chun Ma, Chuan-Chang Wang,
and Paul Debevec. The digital Emily project: Achieving a photorealistic digital actor. IEEE Computer
Graphics and Applications, 30(4):20–31, 2010.
[14] Jonathan T Barron and Jitendra Malik. Intrinsic scene properties from a single RGB-D image. In Pro-
ceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 17–24, 2013.
[15] Harry Barrow, J Tenenbaum, A Hanson, and E Riseman. Recovering intrinsic scene characteristics.
Comput. Vis. Syst, 2:3–26, 1978.
[16] Thabo Beeler, Bernd Bickel, Paul Beardsley, Bob Sumner, and Markus Gross. High-quality single-shot
capture of facial geometry. In ACM Transactions on Graphics (ToG), volume 29, page 40. ACM, 2010.
[17] James F Blinn and Martin E Newell. Texture and reflection in computer generated images. Communica-
tions of the ACM, 19(10):542–547, 1976.
[18] Derek Bradley, Wolfgang Heidrich, Tiberiu Popa, and Alla Sheffer. High resolution passive facial perfor-
mance capture. In ACM transactions on graphics (TOG), volume 29, page 41. ACM, 2010.
[19] G. Bradski. The OpenCV Library. Dr. Dobb’s Journal of Software Tools, 2000.
[20] Lura S. Brainerd. Method of producing moving pictures, June 1919. US Patent 1,307,846.
[21] G. Buchsbaum. A spatial processor model for object colour perception. Journal of the Franklin Institute,
310:1–26, July 1980.
[22] Dan A Calian, Jean-François Lalonde, Paulo Gotardo, Tomas Simon, Iain Matthews, and Kenny Mitchell.
From faces to outdoor light probes. In Computer Graphics Forum, volume 37, pages 51–61. Wiley Online
Library, 2018.
[23] Yung-Yu Chuang, Douglas E Zongker, Joel Hindorff, Brian Curless, David H Salesin, and Richard
Szeliski. Environment matting extensions: Towards higher accuracy and real-time capture. In Proceed-
ings of the 27th annual conference on Computer graphics and interactive techniques, pages 121–130.
ACM Press/Addison-Wesley Publishing Co., 2000.
[24] Roger Nelson Clark, Gregg A Swayze, Richard Wise, K Eric Livo, Todd M Hoefen, Raymond F Kokaly,
and Stephen J Sutley. USGS digital spectral library., 2007.
[25] Djork-Arné Clevert, Thomas Unterthiner, and Sepp Hochreiter. Fast and accurate deep network learning
by exponential linear units (elus). In International Conference on Learning Representations (ICLR),
2016.
[26] Benjamin A Darling, James A Ferwerda, Roy S Berns, and Tongbo Chen. Real-time multispectral render-
ing with complex illumination. In Color and Imaging Conference, volume 2011, pages 345–351. Society
for Imaging Science and Technology, 2011.
[27] Paul Debevec. Rendering synthetic objects into real scenes: Bridging traditional and image-based graph-
ics with global illumination and high dynamic range photography. In Proc. of the 25th Annual Conf. on
Computer Graphics and Interactive Techniques, SIGGRAPH ’98, pages 189–198, New York, NY , USA,
1998. ACM.
[28] Paul Debevec. Image-based lighting. IEEE Computer Graphics and Applications, 22(2):26–34, 2002.
[29] Paul Debevec, Paul Graham, Jay Busch, and Mark Bolas. A single-shot light probe. In ACM SIGGRAPH
2012 Talks, SIGGRAPH ’12, pages 10:1–10:1, New York, NY , USA, 2012. ACM.
[30] Paul Debevec, Tim Hawkins, Chris Tchou, Haarm-Pieter Duiker, Westley Sarokin, and Mark Sagar. Ac-
quiring the reflectance field of a human face. In Proceedings of the 27th annual conference on Computer
graphics and interactive techniques, pages 145–156. ACM Press/Addison-Wesley Publishing Co., 2000.
[31] Paul Debevec and Jitendra Malik. Recovering high dynamic range radiance maps from photographs.
In Proceedings of the 24th annual conference on Computer graphics and interactive techniques, pages
369–378. ACM Press/Addison-Wesley Publishing Co., 1997.
[32] Paul Debevec, Andreas Wenger, Chris Tchou, Andrew Gardner, Jamie Waese, and Tim Hawkins. A
lighting reproduction approach to live-action compositing. In Proc. 29th Annual Conference on Computer
Graphics and Interactive Techniques, SIGGRAPH ’02, pages 547–556, New York, NY , USA, 2002.
ACM.
[33] Ron O Dror, Alan S Willsky, and Edward H Adelson. Statistical characterization of real-world illumina-
tion. Journal of Vision, 4(9):11–11, 2004.
[34] Carroll D. Dunning. Method of producing composite photographs, January 1927. US Patent 1,613,163.
[35] Per Einarsson, Charles-Felix Chabert, Andrew Jones, Wan-Chun Ma, Bruce Lamond, Tim Hawkins,
Mark T Bolas, Sebastian Sylwan, and Paul E Debevec. Relighting human locomotion with flowed re-
flectance fields. Rendering techniques, 2006:17th, 2006.
[36] Jonathan Erland. The state of solid state lighting: Color rendering index and metamerism considerations.
In SMPTE 2016 Annual Technical Conference and Exhibition. SMPTE, 2009.
[37] Jonathan Erland. Chromatic chaos: Implications of newly introduced forms of stagelight. National
Association of Broadcasters (NAB) Conference, 2011.
[38] Jonathan Erland. The Academy and cinema lighting: The pre-eminence of excellence over expedience.
Journal of Film Preservation, (98):53–60, 2018.
[39] Alix Fenies, Christos Kampouris, Husheng Deng, and Abhijeet Ghosh. Improving image-based lighting
reproduction using a multispectral light stage. In CVMP. CVMP Short Papers, 2017.
[40] Raymond Fielding. Techniques of Special Effects of Cinematography. Focal Press, 2013.
[41] Yasutaka Furukawa and Jean Ponce. Dense 3D motion capture for human faces. In Computer Vision and
Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 1674–1681. IEEE, 2009.
[42] Graham Fyffe, Paul Graham, Borom Tunwattanapong, Abhijeet Ghosh, and P Debevec. Near-instant
capture of high-resolution facial geometry and reflectance. In Computer Graphics Forum, volume 35,
pages 353–363. Wiley Online Library, 2016.
[43] Graham Fyffe, Andrew Jones, Oleg Alexander, Ryosuke Ichikari, Paul Graham, Koki Nagano, Jay Busch,
and Paul Debevec. Driving high-resolution facial blendshapes with video performance capture. In ACM
SIGGRAPH 2013 Talks, SIGGRAPH ’13, pages 33:1–33:1, New York, NY , USA, 2013. ACM.
[44] Graham Fyffe, Koki Nagano, Loc Huynh, Shunsuke Saito, Jay Busch, Andrew Jones, Hao Li, and Paul
Debevec. Multi-view stereo on consistent face topology. In Computer Graphics Forum, volume 36, pages
295–309. Wiley Online Library, 2017.
[45] Graham Fyffe, Xueming Yu, and Paul Debevec. Single-shot photometric stereo by spectral multiplexing.
In Computational Photography (ICCP), 2011 IEEE International Conference on, pages 1–6. IEEE, 2011.
[46] Marc-André Gardner, Kalyan Sunkavalli, Ersin Yumer, Xiaohui Shen, Emiliano Gambaretto, Christian
Gagné, and Jean-François Lalonde. Learning to predict indoor illumination from a single image. ACM
Trans. Graph., 36(6):176:1–176:14, November 2017.
[47] Pablo Garrido, Levi Valgaerts, Chenglei Wu, and Christian Theobalt. Reconstructing detailed dynamic
face geometry from monocular video. ACM Trans. Graph., 32(6):158–1, 2013.
[48] Stamatios Georgoulis, Konstantinos Rematas, Tobias Ritschel, Mario Fritz, Luc Van Gool, and Tinne
Tuytelaars. Delight-net: Decomposing reflectance maps into specular materials and natural illumination.
arXiv preprint arXiv:1603.08240, 2016.
[49] Abhijeet Ghosh, Graham Fyffe, Borom Tunwattanapong, Jay Busch, Xueming Yu, and Paul Debevec.
Multiview face capture using polarized spherical gradient illumination. ACM Transactions on Graphics
(TOG), 30(6):129, 2011.
[50] Abhijeet Ghosh, Tim Hawkins, Pieter Peers, Sune Frederiksen, and Paul Debevec. Practical modeling
and acquisition of layered facial reflectance. In ACM Transactions on Graphics (TOG), volume 27, page
139. ACM, 2008.
[51] Carolyn Giardina. Visual Effects Innovator Petro Vlahos Dies at 96. https://www.hollywoodreporter.com/news/visual-effects-innovator-petro-vlahos-421401/. Accessed: 2019-02-21.
[52] Mayank Goel, Eric Whitmire, Alex Mariakakis, T Scott Saponas, Neel Joshi, Dan Morris, Brian Guenter,
Marcel Gavriliu, Gaetano Borriello, and Shwetak N Patel. Hypercam: hyperspectral imaging for ubiqui-
tous computing applications. In Proceedings of the 2015 ACM International Joint Conference on Perva-
sive and Ubiquitous Computing, pages 145–156. ACM, 2015.
[53] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron
Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in neural information processing
systems, pages 2672–2680, 2014.
[54] Cindy M Goral, Kenneth E Torrance, Donald P Greenberg, and Bennett Battaile. Modeling the interaction
of light between diffuse surfaces. In ACM SIGGRAPH computer graphics, volume 18, pages 213–222.
ACM, 1984.
[55] Paul Graham, Borom Tunwattanapong, Jay Busch, Xueming Yu, Andrew Jones, Paul Debevec, and Ab-
hijeet Ghosh. Measurement-based synthesis of facial microgeometry. In Computer Graphics Forum,
volume 32, pages 335–344. Wiley Online Library, 2013.
[56] Lukas Gruber, Thomas Richter-Trummer, and Dieter Schmalstieg. Real-time photometric registration
from arbitrary geometry. In Mixed and Augmented Reality (ISMAR), 2012 IEEE International Symposium
on, pages 119–128. IEEE, 2012.
[57] Jinwei Gu and Chao Liu. Discriminative illumination: Per-pixel classification of raw materials based on
optimal projections of spectral brdf. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE
Conference on, pages 797–804, June 2012.
[58] Pierre-Loïc Hamon, James Harmer, Stuart Penn, and Nicolas Scapel. Gravity: Motion control and face
integration. In ACM SIGGRAPH 2014 Talks, SIGGRAPH ’14, pages 35:1–35:1, New York, NY , USA,
2014. ACM.
[59] Carlos Hernández, George Vogiatzis, Gabriel J Brostow, Bjorn Stenger, and Roberto Cipolla. Non-rigid
photometric stereo with colored lights. In Computer Vision, 2007. ICCV 2007. IEEE 11th International
Conference on, pages 1–8. IEEE, 2007.
[60] Jeff Heusser. Academy to Celebrate Petro Vlahos. https://www.fxguide.com/quicktakes/academy-to-celebrate-petro-vlahos/. Accessed: 2019-02-21.
[61] Keigo Hirakawa and Thomas W Parks. Adaptive homogeneity-directed demosaicing algorithm. IEEE
Transactions on Image Processing, 14(3):360–369, 2005.
[62] Yannick Hold-Geoffroy, Kalyan Sunkavalli, Sunil Hadap, Emiliano Gambaretto, and Jean-François
Lalonde. Deep outdoor illumination estimation. In IEEE International Conference on Computer Vision
and Pattern Recognition, volume 2, 2017.
[63] Jack Holm, Tom Maier, Paul Debevec, Chloe LeGendre, Joshua Pines, Jonathan Erland, George Joblove,
Scott Dyer, Blake Sloan, Joe di Gennaro, et al. A cinematographic spectral similarity index. In SMPTE
2016 Annual Technical Conference and Exhibition, pages 1–36. SMPTE, 2016.
[64] Steven D Hordley and Graham D Finlayson. Re-evaluating colour constancy algorithms. In Pattern
Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on, volume 1, pages
76–79. IEEE, 2004.
[65] Kevin W Houser, Minchen Wei, Aurélien David, Michael R Krames, and Xiangyou Sharon Shen. Review
of measures for light-source color rendition and considerations for a two-measure system for characteriz-
ing color rendition. Optics Express, 21(8):10393–10411, 2013.
[66] Andrew G Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand,
Marco Andreetto, and Hartwig Adam. Mobilenets: Efficient convolutional neural networks for mobile
vision applications. arXiv preprint arXiv:1704.04861, 2017.
[67] Haoda Huang, Jinxiang Chai, Xin Tong, and Hsiang-Tao Wu. Leveraging motion capture and 3D scanning
for high-fidelity facial performance acquisition. In ACM Transactions on Graphics (TOG), volume 30,
page 74. ACM, 2011.
[68] Satoshi Iizuka, Edgar Simo-Serra, and Hiroshi Ishikawa. Globally and locally consistent image comple-
tion. ACM Transactions on Graphics (TOG), 36(4):107, 2017.
[69] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing
internal covariate shift. In Proceedings of the 32Nd International Conference on International Conference
on Machine Learning - Volume 37, ICML’15, pages 448–456. JMLR.org, 2015.
[70] Henrik Wann Jensen, Stephen R Marschner, Marc Levoy, and Pat Hanrahan. A practical model for
subsurface light transport. In Proceedings of the 28th annual conference on Computer graphics and
interactive techniques, pages 511–518. ACM, 2001.
[71] Jun Jiang, Dengyu Liu, Jinwei Gu, and Sabine Susstrunk. What is the space of spectral sensitivity func-
tions for digital color cameras. In Applications of Computer Vision WACV, 2013 IEEE Workshop on,
pages 168–179, 2013.
[72] James T Kajiya. The rendering equation. In ACM SIGGRAPH computer graphics, volume 20, pages
143–150. ACM, 1986.
[73] Kevin Karsch, Kalyan Sunkavalli, Sunil Hadap, Nathan Carr, Hailin Jin, Rafael Fonte, Michael Sittig, and
David Forsyth. Automatic scene inference for 3D object compositing. ACM Transactions on Graphics
(TOG), 33(3):32, 2014.
[74] Rei Kawakami, Hongxun Zhao, Robby T. Tan, and Katsushi Ikeuchi. Camera spectral sensitivity and
white balance estimation from sky images. Int. J. Comput. Vision, 105(3):187–204, December 2013.
[75] Erum Arif Khan, Erik Reinhard, Roland W. Fleming, and Heinrich H. Bülthoff. Image-based material
editing. In ACM SIGGRAPH 2006 Papers, SIGGRAPH ’06, pages 654–663, New York, NY , USA, 2006.
ACM.
[76] Joseph T. Kider, Jr., Daniel Knowlton, Jeremy Newlin, Yining Karl Li, and Donald P. Greenberg. A
framework for the experimental comparison of solar and skydome illumination. ACM Trans. Graph.,
33(6):180:1–180:12, November 2014.
[77] Akira Kimachi, Hisashi Ikuta, Yusuke Fujiwara, Mitsuharu Masumoto, and Hitoshi Matsuyama. Spec-
tral matching imager using amplitude-modulation-coded multispectral light-emitting diode illumination.
Optical engineering, 43(4):975–985, 2004.
[78] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In International
Conference on Learning Representations (ICLR), volume 5, 2015.
[79] M. Kitahara, T. Okabe, C. Fuchs, and H.P.A. Lensch. Simultaneous estimation of spectral reflectance
and normal from a small number of images. In Proceedings of the 10th International Conference on
Computer Vision Theory and Applications, VISAPP 2015, pages 303–313, 2015.
[80] Jean-François Lalonde, Alexei A Efros, and Srinivasa G Narasimhan. Estimating natural illumination
from a single outdoor image. In Computer Vision, 2009 IEEE 12th International Conference on, pages
183–190. IEEE, 2009.
[81] Jean-François Lalonde and Iain Matthews. Lighting estimation in outdoor image collections. In 3D Vision
(3DV), 2014 2nd International Conference on, volume 1, pages 131–138. IEEE, 2014.
[82] Christian Ledig, Lucas Theis, Ferenc Huszár, Jose Caballero, Andrew Cunningham, Alejandro Acosta,
Andrew P Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, et al. Photo-realistic single image super-
resolution using a generative adversarial network. In CVPR, volume 2, page 4, 2017.
[83] Chloe LeGendre, Kalle Bladin, Bipin Kishore, Xinglei Ren, Xueming Yu, and Paul Debevec. Efficient
multispectral facial capture with monochrome cameras. In Color and Imaging Conference, volume 2018,
pages 187–202. Society for Imaging Science and Technology, 2018.
[84] Chloe LeGendre, Loc Hyunh, Shanhe Wang, and Paul Debevec. Modeling vellus facial hair from asperity
scattering silhouettes. In ACM SIGGRAPH 2017 Talks, page 27. ACM, 2017.
[85] Chloe LeGendre, David Krissman, and Paul Debevec. Improved chromakey of hair strands via orientation
filter convolution. In ACM SIGGRAPH 2017 Posters, page 62. ACM, 2017.
[86] Chloe LeGendre, Wan-Chun Ma, Graham Fyffe, John Flynn, Laurent Charbonnel, Jay Busch, and Paul
Debevec. Deeplight: Learning illumination for unconstrained mobile mixed reality. In Computer Vision
and Pattern Recognition, 2019. CVPR 2019. IEEE Computer Society Conference on. IEEE, 2019.
[87] Chloe LeGendre, Xueming Yu, and Paul Debevec. Efficient multispectral reflectance function capture
for image-based relighting. In Color and Imaging Conference, volume 2016, pages 47–58. Society for
Imaging Science and Technology, 2016.
[88] Chloe LeGendre, Xueming Yu, and Paul Debevec. Optimal LED selection for multispectral lighting
reproduction. Electronic Imaging, 2017(8):25–32, 2017.
[89] Chloe LeGendre, Xueming Yu, Dai Liu, Jay Busch, Andrew Jones, Sumanta Pattanaik, and Paul Debevec.
Practical multispectral lighting reproduction. ACM Trans. Graph., 35(4), July 2016.
[90] Anat Levin, Dani Lischinski, and Yair Weiss. A closed-form solution to natural image matting. IEEE
transactions on pattern analysis and machine intelligence, 30(2):228–242, 2008.
[91] Anat Levin, Alex Rav-Acha, and Dani Lischinski. Spectral matting. IEEE transactions on pattern anal-
ysis and machine intelligence, 30(10):1699–1712, 2008.
[92] John P Lewis, Ken Anjyo, Taehyun Rhee, Mengjie Zhang, Frederic H Pighin, and Zhigang Deng. Practice
and theory of blendshape facial models. Eurographics (State of the Art Reports), 1(8), 2014.
[93] Andrew Liszewski. How the Biggest TV You’ve Ever Seen Helped First Man’s Oscar-Winning Visual Effects Look So Authentic. https://io9.gizmodo.com/how-the-biggest-tv-youve-ever-seen-helped-first-mans-os-1832926888. Accessed: 2019-03-01.
[94] Guilin Liu, Duygu Ceylan, Ersin Yumer, Jimei Yang, and Jyh-Ming Lien. Material editing using a phys-
ically based rendering network. In Computer Vision (ICCV), 2017 IEEE International Conference on,
pages 2280–2288. IEEE, 2017.
[95] Stephen Lombardi and Ko Nishino. Reflectance and illumination recovery in the wild. IEEE transactions
on pattern analysis and machine intelligence, 38(1):129–141, 2016.
[96] Jorge Lopez-Moreno, Elena Garces, Sunil Hadap, Erik Reinhard, and Diego Gutierrez. Multiple light
source estimation in a single image. In Computer Graphics Forum, volume 32, pages 170–182. Wiley
Online Library, 2013.
[97] Chenguang Ma, Xun Cao, Xin Tong, Qionghai Dai, and Stephen Lin. Acquisition of high spatial and spec-
tral resolution video with a hybrid camera system. International Journal of Computer Vision, 110(2):141–
155, 2014.
[98] Wan-Chun Ma, Tim Hawkins, Pieter Peers, Charles-Felix Chabert, Malte Weiss, and Paul Debevec. Rapid
acquisition of specular and diffuse normal maps from polarized spherical gradient illumination. In Pro-
ceedings of the 18th Eurographics conference on Rendering Techniques, pages 183–194. Eurographics
Association, 2007.
[99] Satya P Mallick, Todd E Zickler, David J Kriegman, and Peter N Belhumeur. Beyond lambert: Recon-
structing specular surfaces using color. In Computer Vision and Pattern Recognition, 2005. CVPR 2005.
IEEE Computer Society Conference on, volume 2, pages 619–626. IEEE, 2005.
[100] David Mandl, Kwang Moo Yi, Peter Mohr, Peter M Roth, Pascal Fua, Vincent Lepetit, Dieter Schmal-
stieg, and Denis Kalkofen. Learning lightprobes for mixed reality illumination. In Mixed and Augmented
Reality (ISMAR), 2017 IEEE International Symposium on, pages 82–89. IEEE, 2017.
[101] C.S. McCamy, H. Marcus, and J.G. Davidson. A color-rendition chart. Journal of Applied Photographic
Engineering, 2(3):95–99, June 1976.
[102] Abhimitra Meka, Maxim Maximov, Michael Zollhoefer, Avishek Chatterjee, Hans-Peter Seidel, Christian
Richardt, and Christian Theobalt. Lime: Live intrinsic material estimation. In Proceedings of Computer
Vision and Pattern Recognition (CVPR), June 2018.
[103] Shree K Nayar, Xi-Sheng Fang, and Terrance Boult. Separation of reflection components using color and
polarization. International Journal of Computer Vision, 21(3):163–186, 1997.
[104] Jeffry S Nimeroff, Eero Simoncelli, and Julie Dorsey. Efficient re-rendering of naturally illuminated
environments. In Photorealistic Rendering Techniques, pages 373–388. Springer, 1995.
[105] Jong-Il Park, Moon-Hyun Lee, Michael D Grossberg, and Shree K Nayar. Multispectral imaging using
multiplexed illumination. In 2007 IEEE 11th International Conference on Computer Vision, pages 1–8.
IEEE, 2007.
[106] Manu Parmar, Steven Lansel, and Joyce Farrell. An LED-based lighting system for acquiring multispec-
tral scenes. In IS&T/SPIE Electronic Imaging, pages 82990P–82990P. International Society for Optics
and Photonics, 2012.
[107] Deepak Pathak, Philipp Krahenbuhl, Jeff Donahue, Trevor Darrell, and Alexei A Efros. Context encoders:
Feature learning by inpainting. In Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, pages 2536–2544, 2016.
[108] Roy J Pomeroy. Method of producing composite photographs, June 1928. US Patent 1,673,019.
[109] Sylvia C Pont and Susan F te Pas. Material illumination ambiguities and the perception of solid objects.
Perception, 35(10):1331–1350, 2006.
[110] Thomas Porter and Tom Duff. Compositing digital images. In ACM Siggraph Computer Graphics,
volume 18, pages 253–259. ACM, 1984.
[111] Ravi Ramamoorthi and Pat Hanrahan. An efficient representation for irradiance environment maps. In
Proceedings of the 28th annual conference on Computer graphics and interactive techniques, pages 497–
500. ACM, 2001.
[112] Ravi Ramamoorthi and Pat Hanrahan. A signal-processing framework for inverse rendering. In Proceed-
ings of the 28th annual conference on Computer graphics and interactive techniques, pages 117–128.
ACM, 2001.
[113] Erik Reinhard, Greg Ward, Sumanta Pattanaik, and Paul Debevec. High Dynamic Range Imaging: Ac-
quisition, Display, and Image-Based Lighting. Morgan Kaufmann Publishers Inc., San Francisco, CA,
USA, 2005.
[114] Konstantinos Rematas, Tobias Ritschel, Mario Fritz, Efstratios Gavves, and Tinne Tuytelaars. Deep
reflectance maps. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,
pages 4508–4516, 2016.
[115] Christoph Rhemann, Carsten Rother, Jue Wang, Margrit Gelautz, Pushmeet Kohli, and Pamela Rott. A
perceptually motivated online benchmark for image matting. In 2009 IEEE Conference on Computer
Vision and Pattern Recognition, pages 1826–1833. IEEE, 2009.
[116] Martin Rump, Arno Zinke, and Reinhard Klein. Practical spectral characterization of trichromatic cam-
eras. ACM Trans. Graph., 30(6):170:1–170:10, December 2011.
187
[117] Mark Sagar. Reflectance field rendering of human faces for Spider-Man 2. In ACM SIGGRAPH 2004
Sketches, page 118. ACM, 2004.
[118] Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. Inverted
residuals and linear bottlenecks: Mobile networks for classification, detection and segmentation. arXiv
preprint arXiv:1801.04381, 2018.
[119] Yoichi Sato and Katsushi Ikeuchi. Temporal-color space analysis of reflection. JOSA A, 11(11):2990–
3002, 1994.
[120] Mark Sawicki. Filming the fantastic: a guide to visual effects cinematography. Taylor & Francis, 2007.
[121] Soumyadip Sengupta, Angjoo Kanazawa, Carlos D Castillo, and David W Jacobs. Sfsnet: learning shape,
reflectance and illuminance of faces in the wild. In Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition (CVPR), pages 6296–6305, 2018.
[122] Gaurav Sharma, Wencheng Wu, and Edul N Dalal. The ciede2000 color-difference formula: Implementa-
tion notes, supplementary test data, and mathematical observations. Color Research & Application: En-
dorsed by Inter-Society Color Council, The Colour Group (Great Britain), Canadian Society for Color,
Color Science Association of Japan, Dutch Society for the Study of Color, The Swedish Colour Centre
Foundation, Colour Society of Australia, Centre Français de la Couleur, 30(1):21–30, 2005.
[123] Junsheng Shi, Hongfei Yu, Xiaoqiao Huang, Zaiqing Chen, and Yonghang Tai. Illuminant spectrum
estimation using a digital color camera and a color chart. In Proc. SPIE, pages 927307–927307–9, 2014.
[124] Raju Shrestha and Jon Yngve Hardeberg. Multispectral imaging using LED illumination and an RGB
camera. In Color and Imaging Conference, volume 2013, pages 8–13. Society for Imaging Science and
Technology, 2013.
[125] Lucas Siegel. Rogue One EP Explains New Lighting That Will Change
The Way You See Star Wars. https://www.syfy.com/syfywire/
rogue- one- a- star- wars- story- new- lighting- techniques . Accessed: 2019-03-01.
[126] Alvy Ray Smith. Alpha and the history of digital compositing. http://alvyray.com/Memos/CG/
Microsoft/7_alpha.pdf , 1995.
[127] Alvy Ray Smith and James F Blinn. Blue screen matting. In SIGGRAPH, volume 96, pages 259–268,
1996.
[128] Shuran Song, Andy Zeng, Angel X Chang, Manolis Savva, Silvio Savarese, and Thomas Funkhouser.
Im2Pano3D: Extrapolating 360 structure and semantics beyond the field of view. In Computer Vision and
Pattern Recognition (CVPR), 2018 IEEE Conference on, pages 3847–3856. IEEE, 2018.
[129] Jessica Stumpfel, Chris Tchou, Andrew Jones, Tim Hawkins, Andreas Wenger, and Paul Debevec. Direct
HDR capture of the sun and sky. In Proceedings of the 3rd Intl. Conf. on Computer Graphics, Virtual
Reality, Visualisation and Interaction in Africa, AFRIGRAPH ’04, pages 145–149, New York, NY , USA,
2004. ACM.
188
[130] Tiancheng Sun, Jonathan T. Barron, Yun-Ta Tsai, Zexiang Xu, Xueming Yu, Graham Fyffe, Christoph
Rhemann, Jay Busch, Paul Debevec, and Ravi Ramamoorthi. Single image portrait relighting. arXiv
preprint arXiv:1905.00824, 2019.
[131] Susan F te Pas and Sylvia C Pont. A comparison of material and illumination discrimination performance
for real rough, real smooth and computer generated smooth spheres. In Proceedings of the 2nd Symposium
on Applied Perception in Graphics and Visualization, pages 75–81. ACM, 2005.
[132] Ayush Tewari, Michael Zollhöfer, Pablo Garrido, Florian Bernard, Hyeongwoo Kim, Patrick Pérez, and
Christian Theobalt. Self-supervised multi-level face model learning for monocular reconstruction at over
250 hz. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages
2549–2559, 2018.
[133] Ayush Tewari, Michael Zollhöfer, Hyeongwoo Kim, Pablo Garrido, Florian Bernard, Patrick Pérez, and
Christian Theobalt. Mofa: Model-based deep convolutional face autoencoder for unsupervised monocular
reconstruction. In The IEEE International Conference on Computer Vision (ICCV), volume 2, page 5,
2017.
[134] Shoji Tominaga and Tsuyoshi Fukuda. Omnidirectional scene illuminant estimation using a multispectral
imaging system. In Color Imaging XII: Processing, Hardcopy, and Applications, volume 6493, page
649313. International Society for Optics and Photonics, 2007.
[135] Shoji Tominaga and Norihiro Tanaka. Omnidirectional scene illuminant estimation using a mirrored ball.
Journal of Imaging Science and Technology, 50(3):217–227, 2006.
[136] Borom Tunwattanapong, Graham Fyffe, Paul Graham, Jay Busch, Xueming Yu, Abhijeet Ghosh, and
Paul Debevec. Acquiring reflectance and shape from continuous spherical harmonic illumination. ACM
Transactions on graphics (TOG), 32(4):109, 2013.
[137] Shimon Ullman. The interpretation of structure from motion. Proc. R. Soc. Lond. B, 203(1153):405–426,
1979.
[138] Jonas Unger and Stefan Gustavson. High-dynamic-range video for photometric measurement of illumi-
nation. In Proc. SPIE, pages 65010E–65010E–10, 2007.
[139] Muhammad Uzair, Arif Mahmood, Faisal Shafait, Christian Nansen, and Ajmal Mian. Is spectral re-
flectance of the face a reliable biometric? Optics express, 23(12):15160–15173, 2015.
[140] Petro Vlahos. Composite color photography, November 1964. US Patent 3,158,477.
[141] Tuanfeng Y . Wang, Tobias Ritschel, and Niloy J. Mitra. Joint material and illumination estimation from
photo sets in the wild. In Proceedings of International Conference on 3DVision (3DV), 2018. selected for
oral presentation.
[142] Greg Ward and Mashhuda Glencross. A case study evaluation: Perceptually accurate textured surface
models. In Proceedings of the 6th Symposium on Applied Perception in Graphics and Visualization,
pages 109–115. ACM, 2009.
189
[143] Henrique Weber, Donald Prévost, and Jean-François Lalonde. Learning to estimate indoor lighting from
3D objects. In 2018 International Conference on 3D Vision (3DV), pages 199–207. IEEE, 2018.
[144] Birk Weiberg. Roy J. Pomeroy, Dunning Process Co., Inc., and Paramount Publix Corporation vs. Warner
Bros. Pictures, Inc., Vitaphone Corporation, and Frederick Jackman: How the Movie Industry Turned to
Rear Projection. In SCMS (Society For Cinema and Media Studies) 2014 Conference. SCMS (Society
For Cinema and Media Studies), 2014.
[145] Andreas Wenger, Tim Hawkins, and Paul Debevec. Optimizing color matching in a lighting reproduction
system for complex subject and illuminant spectra. In Proceedings of the 14th Eurographics Workshop
on Rendering, EGRW ’03, pages 249–259, Aire-la-Ville, Switzerland, 2003. Eurographics Association.
[146] Frank D. Williams. Method of taking motion pictures, July 1918. US Patent 1,273,435.
[147] Cyrus A Wilson, Abhijeet Ghosh, Pieter Peers, Jen-Yuan Chiang, Jay Busch, and Paul Debevec. Temporal
upsampling of performance geometry using photometric alignment. ACM Transactions on Graphics
(TOG), 29(2):17, 2010.
[148] Lawrence B Wolff. Material classification and separation of reflection components using polariza-
tion/radiometric information. Proc. of Image Understanding Workshop, pages 232–244, 1989.
[149] Lawrence B Wolff. Using polarization to separate reflection components. In Computer Vision and Pat-
tern Recognition, 1989. Proceedings CVPR’89., IEEE Computer Society Conference on, pages 363–369.
IEEE, 1989.
[150] Robert J Woodham. Photometric method for determining surface orientation from multiple images. Op-
tical engineering, 19(1):191139–191139, 1980.
[151] Steve Wright. Digital compositing for film and video. Focal Press, 2013.
[152] Jianxiong Xiao, Krista A Ehinger, Aude Oliva, and Antonio Torralba. Recognizing scene viewpoint
using panoramic place representation. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE
Conference on, pages 2695–2702. IEEE, 2012.
[153] Zexiang Xu, Kalyan Sunkavalli, Sunil Hadap, and Ravi Ramamoorthi. Deep image-based relighting from
optimal sparse samples. ACM Transactions on Graphics (TOG), 37(4):126, 2018.
[154] Mary Yamada and Dan Chwastyk. Adoption of light-emitting diodes in common lighting applications.
Technical report, Navigant Consulting, Suwanee, GA (United States), 2013.
[155] Chao Yang, Xin Lu, Zhe Lin, Eli Shechtman, Oliver Wang, and Hao Li. High-resolution image inpaint-
ing using multi-scale neural patch synthesis. In The IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), volume 1, page 3, 2017.
[156] Jiahui Yu, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, and Thomas S Huang. Generative image inpaint-
ing with contextual attention. In Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, pages 5505–5514, 2018.
190
[157] Yizhou Yu, Paul Debevec, Jitendra Malik, and Tim Hawkins. Inverse global illumination: Recovering
reflectance models of real scenes from photographs. In Proceedings of the 26th annual conference on
Computer graphics and interactive techniques, pages 215–224. ACM Press/Addison-Wesley Publishing
Co., 1999.
[158] Christopher Zach, Thomas Pock, and Horst Bischof. A duality based approach for realtime tv-l 1 optical
flow. In Joint Pattern Recognition Symposium, pages 214–223. Springer, 2007.
[159] Jinsong Zhang and Jean-François Lalonde. Learning high dynamic range from outdoor panoramas. In
IEEE International Conference on Computer Vision, 2017.
[160] Hao Zhou, Jin Sun, Yaser Yacoob, and David W Jacobs. Label denoising adversarial network (LDAN)
for inverse lighting of faces. In Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, pages 6238–6247, 2018.
[161] Douglas E Zongker, Dawn M Werner, Brian Curless, and David Salesin. Environment matting and com-
positing. In Siggraph, volume 99, pages 205–214, 1999.
191