DEMOCRATIZING OPTICAL BIOPSY
WITH DIFFUSE OPTICAL
IMAGING
by
Shanshan Cai
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of
the Requirements for the Degree
DOCTOR OF PHILOSOPHY
(Biomedical Engineering)
August 2024
Copyright 2024 Shanshan Cai
Acknowledgements
My academic journey took an unexpected turn in the summer of 2017, thanks to a fortuitous encounter
with a job posting at the Translational Imaging Center. As a master's student majoring in electrical
engineering, I found my horizons broadening as I engaged with the advanced work at the Center. Here,
my foray into physics theory, hardware implementation, and digital image processing—integral to my
research on diffuse optical imaging for skin analysis—fostered a burgeoning passion and propelled me
toward a Ph.D. in biomedical engineering. For this, my profound gratitude goes to the mentors,
colleagues, and friends whose support and wisdom have been instrumental to my growth. I dedicate this
section of my thesis to them, with heartfelt thanks for a journey that has been as rewarding as it has been
fulfilling.
I extend my deepest appreciation to Dr. Scott Fraser, my Ph.D. advisor, for his warm welcome into the
Fraser Lab. His insights and feedback have been pivotal in shaping my growth as a researcher. Dr. Fraser's
willingness to share his knowledge on a breadth of topics, from research projects to paper writing and
beyond, has been integral to my development. His commitment to creating a nurturing learning
environment has enriched my Ph.D. experience, and the Translational Imaging Center has provided a rich
soil for my intellectual growth, surrounded by experts in various scientific fields who have fostered my
curiosity and depth of research. Only with the diversity, liberty, and trust Dr. Fraser placed in me
could I delve into innovative research in diffuse optical imaging, a field not previously emphasized in the
lab. Dr. Fraser stands as a paragon in my eyes, embodying the ideals of what it means to be a distinguished
researcher.
Dr. Francesco Cutrale, my Ph.D. co-advisor, merits special acknowledgment for his invaluable mentorship.
He has guided me through a spectrum of professional aspects of our projects, from the intricacies of
technical execution to strategizing for commercialization. My gratitude extends particularly for his
patience and unwavering commitment to enhancing my work's presentation, a dimension I had not fully
appreciated before his influence. His inspiration has been pivotal, fostering my creative thinking and
optimizing my efforts. Dr. Cutrale's advice in developing a distinct skill set has been a catalyst for growth,
pushing me to explore new avenues and to confidently step beyond the realms of my comfort zone. His
mentorship has been vital in my PhD journey, both personally and professionally, and he has undoubtedly
become a lifelong role model and friend.
In the supportive atmosphere of the Fraser Lab, I have benefited from the wisdom and friendship of
esteemed colleagues. My thanks go to Dr. Le Trinh, Dr. Masahiro Kitano, and Dr. Thai Truong, whose
diverse insights have significantly enriched my research. Alongside them, my lab mates—Daniel, Pu, Peter,
Rose, Valerie, Wen, and Yangyang—have been companions in a shared journey of growth, offering
insights, engaging in meaningful discussions, and providing mutual support. The friendships formed here
are among the treasures of my Ph.D. experience. Additionally, the financial support from the USC Alfred
E. Mann Institute has been indispensable. I am especially thankful to Winn Hong and John Mai for their
substantial support and the resources that have been instrumental to my work.
In closing, I must express my boundless gratitude to my family. To my parents, who have always been my
pillars of strength, their unconditional support provides me with the strength to make every hard decision
in my life. To my husband, Jinze, for his unwavering emotional support — he is my most trustworthy
comrade who faces every challenge in the world with me, side by side. To my daughter, Celia, whose
radiant smiles and infectious joy provide daily encouragement and remind me of the wonder and
simplicity of life; and to myself, for the dedication and perseverance that have been the backbone of this
journey. This experience has reinforced my belief that with persistent resolve and the support of those
around me, I can tackle any challenges that lie ahead.
Table of Contents
Acknowledgements ...............................................................................................................................ii
List of Figures....................................................................................................................................... vii
List of Tables......................................................................................................................................... ix
Abstract.................................................................................................................................................. x
Chapter 1 Introduction ...................................................................................................................1
1.1 Skin analysis (melanoma) and optical biopsy .........................................................................1
1.1.1 Skin cancer and standard diagnosis .....................................................................................1
1.1.2 Common techniques for optical biopsies.............................................................................3
1.1.3 Surface reconstruction versus volumetric reconstruction ...................................................4
1.2 Light propagation in tissue .....................................................................................................6
1.2.1 Tissue optical properties......................................................................................................8
1.2.2 Diffuse Optical Imaging ......................................................................................................10
1.2.3 Light transport models.......................................................................................................12
1.2.4 Tissue photon migration ....................................................................................................18
1.3 Machine learning methods in medical imaging....................................................................20
1.3.1 Machine learning in skin diagnosis.....................................................................................21
1.3.2 Machine learning to improve DOI reconstruction .............................................................22
1.4 Motivation: Low-cost 3D skin optical biopsy ........................................................................24
Chapter 2 Rapid diffused optical imaging for accurate 3D estimation
of subcutaneous tissue features..........................................................................................................26
2.1 Introduction ..........................................................................................................................26
2.2 Low-cost multisite acquisition platform................................................................................27
2.2.1 Structural illumination in diffuse optical imaging ..............................................................28
2.2.2 Experimental Image Acquisition.........................................................................................29
2.3 Hybrid mathematical model .................................................................................................31
2.3.1 Modified photon migration model based on low-cost multisite acquisition platform......32
2.3.2 Mathematical derivation of hybrid 3D reconstruction algorithm......................................35
2.4 3D-mDOI Reconstruction Pipeline.........................................................................................37
2.4.1 Overview of 3D-mDOI reconstruction................................................................................39
2.4.2 Oversampling 3D-photon distributions: enable better reconstruction in low-cost system
...........................................................................................................................................................41
2.4.3 System computation and calibration .................................................................................43
2.5 Performance of 3D-mDOI .....................................................................................................48
2.5.1 Generation of testing samples...........................................................................................49
2.5.2 Validation with simulated tissue phantoms.......................................................................52
2.5.3 Validation with physical phantoms....................................................................................61
2.5.4 3D-mDOI Computational Efficiency....................................................................................73
2.6 Limitation and future work for 3D-mDOI..............................................................................75
2.7 Summary...............................................................................................................................78
Chapter 3 Multisite diffused optical imaging network (mDOI-Net)
for Sub-surface 3D Imaging of Tissues.......................................................................82
3.1 Introduction ..........................................................................................................................82
3.2 Inspirations from Neural Volumetric Rendering ...................................................................86
3.2.1 Introduction of neural rendering techniques.....................................................................86
3.2.2 Adaptation to 3D Reconstruction of DOI ...........................................................................89
3.3 Synthetic lesion datasets ......................................................................................................91
3.3.1 From 2D to 3D: 3D synthetic lesion phantom generation..................................................93
3.3.2 From 3D to 2D: 2D reflectance images estimation ..........................................................108
3.4 Multisite diffused optical imaging network (mDOI-Net).....................................................113
3.4.1 Limitations of 3D-mDOI and motivation for a novel approach ........................................115
3.4.2 Reconstruction module: an architecture for solving inverse problems...........................116
3.4.3 Projection module: an architecture for solving the forward problem.............................119
3.4.4 The challenge of dermatology 3D clinical data ................................................................121
3.4.5 mDOI-Net: a two-pronged model for training with limited 3D ground truth ..................122
3.5 Implementation of mDOI-Net Components........................................................................126
3.5.1 2D Light Properties Extraction Net in Reconstruction module ........................................127
3.5.2 3D Correction Net in Reconstruction module ..................................................................131
3.5.3 Gated Recurrent Unit Aggregation in Reconstruction module ........................................134
3.5.4 2D Rendering Net in Projection module ..........................................................................137
3.5.5 mDOI-Net for Domain Adaptation ...................................................................................139
3.6 Performance of mDOI-Net Components.............................................................................143
3.6.1 Dataset generation...........................................................................................................144
3.6.2 Reconstruction module....................................................................................................151
3.6.3 Projection module............................................................................................................158
3.6.4 Domain adaptation ..........................................................................................................165
3.7 mDOI-Net improvements over 3D-mDOI and standard ML ................................................170
3.7.1 Evaluation for reconstruction performance.....................................................................171
3.7.2 Specialized adoption scenarios among various methods.................................................176
3.8 Limitations and Future Work for mDOI-Net........................................................................177
3.9 Summary.............................................................................................................................180
Chapter 4 Conclusions................................................................................................................182
4.1 Summary of current mDOI findings ....................................................................................182
4.2 Advancing clinical practice with mDOI ...............................................................................184
4.3 Transitioning current mDOI to clinical devices....................................................................186
4.3.1 mDOI-Net: from simulation to clinical data .....................................................................186
4.3.2 Imaging System: From Prototype to Clinical Device.........................................................192
4.4 Summary.............................................................................................................................196
References.........................................................................................................................................198
Appendices ........................................................................................................................................210
Quantitative measurement for the synthetic phantoms...............................................................210
Network architectures and parameter summary ..........................................................................216
Reconstruction module: U-Net .................................................................................................216
Reconstruction module: Encoder-decoder ...............................................................................216
Reconstruction module: mDOI-Net...........................................................................................217
Projection Module: Encoder-decoder.......................................................................................219
Projection Module: mDOI-Net ..................................................................................................220
GAN Module: 2D Discriminator.................................................................................................220
GAN Module: 3D Discriminator.................................................................................................221
List of Figures
Figure 1.1 Imaris rendering Drosophila Egg Chamber
showing the difference between 3D surface vs volume reconstruction. .............................6
Figure 1.2 A schematic representation of light propagation in biological tissue...................................7
Figure 1.3 Wavelength-dependent absorption coefficients of skin chromophores
and the diagnostic potential of red light in melanoma detection. .....................................10
Figure 1.4 Theoretical overview for Semi-infinite Slab RTE model......................................................15
Figure 1.5 Theoretical overview for Stochastic Monte Carlo model....................................................19
Figure 2.1 3D-mDOI Acquisition Approach. .........................................................................................31
Figure 2.2 Sub cutaneous imaging with 3D-mDOI: approach overview..............................................39
Figure 2.3 3D printed mold design and customized PDMS phantom assembly. .................................52
Figure 2.4 The effect of photon counts and projected pattern size
on the quality of the reconstructed volumes from 3D-mDOI
and FEM simulations, using a numerical phantom.............................................................54
Figure 2.5 3D-mDOI captures information at various depths in simulated phantoms........................57
Figure 2.6 3D-mDOI reconstructs distinguishable sub-surface features
in the physical phantom. ....................................................................................................64
Figure 2.7 Depth-Dependent Variance in 3D-mDOI recovered Optical Parameters............................65
Figure 2.8 Estimation of the system imaging depth using the fitting error
from the Radiation transfer equation (RTE). ......................................................................68
Figure 2.9 Estimation of the system imaging depth Reconstruction performance
with variable depth constraints..........................................................................................70
Figure 2.10 Depth estimation using 3D-mDOI
and the significance of optimal Region of Analysis Selection. ..........................................72
Figure 2.11 Comparative Computational Efficiency – memory usage
and time - of 3D-mDOI vs. FEM.........................................................................................74
Figure 3.1 Exploring 3D structures via 2D imaging with Deep Learning:
indirect reconstruction pipeline sharing in Computer Graphics
and Diffuse Optical Imaging (DOI) research........................................................................91
Figure 3.2 Overview of synthetic lesion dataset generation from 2D dermoscopic image.................93
Figure 3.3 ISIC archive samples with corresponding biopsy-verified Breslow depth ..........................97
Figure 3.4 Increasing diversity of 3D Morphological Modeling of Melanomas...................................99
Figure 3.5 Increasing diversity of 3D optical parameter assignment.................................................102
Figure 3.6 Comparison of Synthetic Derm Cone series
Standard, constrained (-), and Enhanced (+) ....................................................................106
Figure 3.7 Synthetic and physical uniform cylinder lesion phantom.................................................107
Figure 3.8 Visualization of 2D reflectance patterns
generated with Monte Carlo across different computational approaches ......................111
Figure 3.9 Schematic representation of the mDOI-Net framework,
an advanced Deep Learning model for Diffuse Optical Imaging (DOI). ............................114
Figure 3.10 Reconstruction evolution: from 3D-mDOI to Reconstruction module ...........................118
Figure 3.11 Workflow of the Projection module in mDOI-Net ..........................................................119
Figure 3.12 Unsupervised learning framework of mDOI-Net utilizing GAN architecture ..................124
Figure 3.13 Semi supervised learning framework of mDOI-Net
utilizing GAN architecture...............................................................................................125
Figure 3.14 Comparison of two optical property extraction methods:
the 2D Light Property Extraction Net (2D LPE Net) and 2D Nonlinear Fitting................131
Figure 3.15 Histogram of lesion volume and thickness across synthetic datasets............................146
Figure 3.16 Statistical analysis of lesion absorption coefficients (µa)
across four synthetic datasets ........................................................................................147
Figure 3.17 Visual comparison of reflectance images
under various illumination in different synthetic lesion datasets...................................149
Figure 3.18 Comparative analysis of synthetic and physical reflectance data
in cylinder datasets.........................................................................................................151
Figure 3.19 Visual Comparison of reconstruction output for various approaches............................155
Figure 3.20 Relationship between dynamic range of absorption coefficient µa
in the synthetic dataset and Dice coefficient scores
for various reconstruction approaches...........................................................................157
Figure 3.21: Visual comparison of Projection Module
and Encoder-decoder in estimating 2D reflectance images ..........................................161
Figure 3.22 Effect of illumination location on normalized L1 Loss
for two network architectures........................................................................................164
Figure 3.23 Comparative analysis of domain adaptation
in Synthetic Derm Cone Datasets:
ground Truth versus computed reconstructions ............................................................168
Figure 3.24 Visual Comparison of Domain Adaptation Outcomes
in Reconstruction module and mDOI-Net.......................................................................169
Figure 4.1 Diverse Age-Standardized Incidence Rate of Melanoma of Skin
per 100,000 in Different Regions......................................................................................189
List of Tables
Table 2.1 Estimated optical parameters for two phantoms. ...............................................................51
Table 2.2 Comprehensive quantitative analysis of synthetic phantom reconstructions.....................60
Table 3.1 Overview of 3D synthetic lesion phantom series for enhanced generalization.................105
Table 3.2 Comparison of computational times and total reflectance ratios
across different computational approaches.......................................................................111
Table 3.3 Comparison of L1 Reference loss and computational times
for different 2D light properties extraction methods.........................................................129
Table 3.4 Performance comparison of different network architectures
for the 3D Correction Net...................................................................................................133
Table 3.5 Performance and processing time comparison
between standard and GRU-enhanced Reconstruction modules ......................................136
Table 3.6 Comparison of L1 Loss before and after average pooling
across different network architectures..............................................................................138
Table 3.7 Performance Evaluation of mDOI-Net Across Different Training Paradigms
for Domain Adaptation.......................................................................................................142
Table 3.8 Performance comparison of 3D reconstruction from various approaches........................153
Table 3.11 Test performance and processing times
for Encoder-decoder and Projection module ..................................................................159
Table 3.10 Comparative L1 Loss for test dataset
between Encoder-decoder and Projection module across four test cases......................162
Table 3.12 Comparison of the Reconstruction Module
and Teacher-student mDOI-Net's performance
across various cases of domain adaptations....................................................................166
Table 3.13 Performance and processing time comparison of different approaches
for 3D reconstruction.......................................................................................................172
Abstract
Conventional light imaging in living tissues is limited to depths under 100 µm by significant tissue scattering. Consequently, few commercial imaging devices can image tissue lesions beneath the surface or measure their invasion depth, which is critical in dermatology. We present 3D-Multisite Diffused Optical Imaging (3D-mDOI), a novel approach that combines photon migration techniques from diffuse optical tomography with automated controls and image analysis techniques to estimate a lesion's depth via its optical coefficients. 3D-mDOI is a non-invasive, low-cost, fast, and contact-free instrument capable of estimating the volumes of subcutaneous tissue structures through multisite acquisition of the reemitted light diffusing across the sample surface. It offers rapid estimation of Breslow depth, essential for staging melanoma.
Building upon 3D-mDOI, we designed an improved machine learning solution, the Multisite Diffused Optical Imaging Network (mDOI-Net), to estimate the depth of tissue lesions. mDOI-Net has an interpretable structure and provides tissue lesion depth information in diverse circumstances. Its training is performed with customized synthetic dermatology datasets that we generate from publicly available datasets, ensuring data diversity and adaptability. The network reconstructs the 3D optical properties of the tissue from 2D diffuse images by introducing physical modeling of steady-state diffuse optical imaging into the network. Our solution is poised to be more flexible, interpretable, and predictable than current end-to-end, black-box neural network benchmarks.
Chapter 1
Introduction
1.1 Skin analysis (melanoma) and optical biopsy
1.1.1 Skin cancer and standard diagnosis
Melanoma is a type of skin cancer arising from pigment-producing cells. It has been a rising
concern in the United States, with more than one million Americans affected by the disease.
The American Cancer Society [1] reports that in 2022, an estimated 99,780 Americans were
diagnosed with invasive melanoma, and approximately 7,650 died from this condition. The
overall prevalence of melanoma exhibits an upward trend. The incidence of melanoma in the
United States doubled between 1988 and 2019 [2], and the number of melanoma cases globally
is projected to climb to an estimated 510,000 new cases per year (a roughly 50% increase)
and 96,000 deaths (a 68% increase) [3].
Tumor depth is an essential diagnostic and staging parameter for this cancer because it
significantly influences the prognosis and treatment strategies. Melanoma is typically classified
into five stages by Clark level [4, 5] based on the extent of disease progression and the
involvement of surrounding tissues. The initial three stages of melanoma primarily involve the
superficial layers of the skin. In situ melanoma is confined to the epidermis (the outermost
layer) and has a depth of less than 1mm. Stage II melanoma extends into the dermis (the
second layer of skin), with a depth ranging from 1mm to 4mm. Early detection and appropriate
treatment of melanoma at these stages yield a high 5-year relative survival rate of more than
98.4% [6]. However, as the tumor penetrates deeper into the skin, the prognosis becomes less
favorable. The 5-year survival rate drops to 63.6% when melanoma reaches a depth of around
4mm and starts to involve the nearby lymph nodes. In the most advanced stage, metastatic
melanoma, where the tumor depth exceeds 4mm, and the cancer can spread to the deeper
locations in the skin and even to the internal organs, the result is a substantially reduced
survival rate of 22.5%.
In clinical settings, Breslow Depth serves as the more standardized classification, which
measures the invasive depth of melanoma within the skin. This metric, expressed in
millimeters, quantifies the vertical distance between the topmost point of the tumor (the
granular layer of the epidermis) and the deepest point of tumor invasion. A clinical
measurement of Breslow Depth typically involves imaging a biopsy sample using an optical
micrometer attached to the eyepiece of a standard microscope [7]. Complete lesion excision
through excisional biopsy is highly recommended [8], as obtaining sufficient biopsy specimens
is crucial for accurately determining the Breslow Depth. This assessment plays a vital role in
estimating the patient's prognosis and identifying the most appropriate treatment strategy.
Despite the importance of Breslow depth, the assessment of melanoma thickness is not
included in the primary dermatological examination. Current dermatology exams
often rely on the clinician’s ability to detect and recognize abnormal patterns based on white
light reflectance from the skin surface. Dermatologists continue to use the ABCDE principles
(Asymmetry, Border, Color, Diameter and Evolution) to examine worrisome moles [9].
Melanoma is characterized by being multi-colored, asymmetrical, and typically larger than 6mm
in diameter with irregular borders. It differs from stable benign moles as it tends to change in
size, shape, or color over time. Dermoscopy is a widely used screening technique that aids in
evaluating these characteristics of moles. It involves magnifying the moles and capturing
photographs of the surface structure, allowing for a more detailed examination of the features
associated with melanoma. Subsequently, the suspicious area is excised through a biopsy,
followed by an ex-vivo histology and a pathologist's interpretation [10]. This pipeline is
reasonable, as a comprehensive biopsy may not always be necessary for benign moles,
especially in cosmetically or functionally sensitive anatomical locations [11].
However, recent studies report that a significant number of skin biopsies, ranging from 23%
to 44.5% [12, 13], are either unnecessary or are diagnosed as benign. The effectiveness of
invasive biopsies is also currently under discussion in other medical screening practices, such as
those for Barrett's esophagus, ductal carcinoma of the breast, and thyroid cancers [14].
Therefore, there is a need for improved depth information during the diagnostic step before
resorting to biopsy.
1.1.2 Common techniques for optical biopsies
Several clinical approaches exist to conduct optical biopsies, enabling the assessment of 3D
information within tissues in vivo. These methods can also estimate the depth of skin lesions to
improve diagnostic accuracy. For current dermatologic imaging, the state-of-the-art techniques
use reflectance- and fluorescence-confocal microscopy and optical coherence tomography (OCT)
[15-17]. These approaches can improve diagnosis and reduce unnecessary biopsies [18-20].
However, these technologies have not been widely adopted as a standard for diagnosis owing
to limitations imposed by capital equipment cost, the need for specialized equipment and
training, and extended image acquisition times. For example, the imaging pipeline with
reflectance microscopy plus dermoscopy is reported to reduce the number of benign lesions
biopsied by more than 70%. The cost of the reflectance microscope is about $60k-$120k, while
training a qualified imaging expert costs another $20k-$30k. The imaging time is
approximately 7 to 10 minutes, setting a barrier to the widespread adoption of the method.
Fluorescence Microscopy has been used predominantly in experimental studies with promising
results in lesioned skin, but its requirement of a fluorescent agent prevents its prevalence [21].
OCT is already routinely established for diagnosis of nonmelanoma skin cancer, whereas for
pigmented lesions, the resolution is still not high enough [22]. At the most commonly used
imaging wavelength (1,300 nm), melanin exhibits low scattering properties, posing a challenge
in the identification of melanocytes in the OCT images, and even more in the detection of
malignancies. Therefore, the adoption of these high-performance imaging systems for
everyday clinical use remains problematic [19, 23]. The accuracy of these optical biopsy
approaches could be significantly improved with the development of a method that noninvasively obtains depth information for moles and lesions as they invade tissues.
1.1.3 Surface reconstruction versus volumetric reconstruction
The inherent difficulties in procuring direct 3D imagery of skin lesions have led to innovative
attempts to ascertain relative lesion depth from 2D dermoscopic images. Before diving into 3D
reconstructions, it is important to clearly distinguish different types of 3D reconstructions,
namely 3D surface and 3D volumetric. Surface reconstruction of the skin utilizes computer
vision algorithms to estimate the 3D location of the skin's outermost layer, focusing on the morphology of
skin lesions. The process estimates the degree of defocus in 2D images to infer the
surface location of the lesion in 3D, since areas that are out of focus are typically farther away
from the lens that captured the image. In contrast, volume reconstruction creates a 3D
representation of the sample that includes both the surface and the layers underneath. This
technique is particularly useful in assessing the Breslow depth and the structural complexity of
the skin tissue. Both morphology and Breslow depth of the skin tissue provide critical
information for the diagnosis and treatment of skin conditions.
To be more specific, surface reconstruction approaches like that of Satheesha et al. [9] obtain a 3D lesion
surface through estimating the defocus map of the 2D images. First, the spatial
defocus blur at edges is computed by examining gradient ratios between input and Gaussian
kernel re-blurred images. By extending this blur measure throughout the image, a
comprehensive defocus map is generated. Employing this method facilitates the determination
of relative depth for melanoma skin lesions, where an escalation in the estimated depth
correlates with an increase in Breslow depth. While the 3D features extracted from the
reconstructed lesion surfaces augment the classification accuracy of the system, there remains
a considerable discrepancy between relative estimations and the actual Breslow depth
determined through biopsies.
In this thesis, we differentiate our approach from surface reconstruction methods [9, 24, 25]
and venture deeper into estimating the 3D volumetric properties of a lesion's optical
characteristics (Figure 1.1). Our approach harnesses not just the 2D dermoscopic images but
also leverages the reemitted data of photon migration via structured illumination. As
highlighted in Chapter 2, our strategy presents an algorithmic solution for the 3D volumetric
mapping of a lesion's optical attributes. Specifically, we employ near-infrared light combined
with the photon migration model to reconstruct a 3D representation of a tissue's optical
properties, providing insight into the concentrations of various chromophores. The method's
prime benefits include swift automated data capture in a cost-effective imaging setup, and a
remarkable penetration depth of up to 5 mm. For enhanced adaptability and computational
efficiency, Chapter 3 introduces a Deep Learning alternative to address the same challenge.
Figure 1.1 Imaris rendering of a Drosophila egg chamber
showing the difference between 3D surface and volume reconstruction.
We present a side-by-side visual comparison of surface and volumetric reconstructions with the example
of a Drosophila egg chamber. 'Surface Reconstruction' at the top maps the object's external contours and
volumetric occupancy in 3D, highlighting the egg chamber's morphology with a purple surface. In contrast,
'Volumetric Reconstruction' at the bottom explores internal structures, offering an inside view that
contributes to a full understanding of the object's 3D composition. This is achieved through volume
rendering with three distinct channels: Channel 1 in Alexa Fluor 568 (red), Channel 2 in DAPI (blue), and
Channel 3 in Alexa Fluor 488 (green), all visualized using Maximum Intensity Projection. Such volume
rendering elucidates the complex interior and spatial interrelations within the egg chamber. In our thesis
work, we go beyond surface reconstruction, aiming to accurately estimate the 3D volumetric attributes
and the optical properties of lesions.
1.2 Light propagation in tissue
The interaction between light and tissue is a complex probabilistic phenomenon, governed
by the tissue's optical properties, specifically the absorption and scattering coefficients. When
light is incident on the tissue sample, a fraction of the light undergoes specular reflection while
the remaining portion experiences refraction and penetration into the tissue (Figure 1.2).
During interactions with tissue components, incident light loses energy through absorption.
Meanwhile, molecular collisions alter the direction of light photons, resulting in light scattering.
After multiple absorption and scattering occurrences, a portion of the light travels to the other
side of the tissue sample as transmitted light. Some light travels back to the tissue surface,
generating reemitted light. The optical properties and the tissue composition are encoded
within the transmitted and reemitted light, both of which can be measured outside of the
tissue sample by detectors.
Figure 1.2 A schematic representation of light propagation in biological tissue.
Incident light undergoes specular reflection, and the rest refracts and travels into the tissue. As photons
interact with the tissue, they experience various absorption and scattering events. While some photons
are fully absorbed (blue arrow), others reach the boundary between the tissue and air, creating remitted
(green arrow), scattered (yellow arrow), and transmitted light (brown arrow). Different tissues have
unique compositions and structures, which include varying amounts of proteins, melanin, and water. This
leads to distinctive optical properties, absorption, and scattering coefficients for each tissue type. The
path of each photon through the tissue is characterized by a complex random walk dictated by the tissue's
optical properties. The remitted light that emerges from the tissue carries crucial information about these
interactions, effectively encoding the tissue's optical properties. By analyzing the remitted light, it is
possible to gain valuable insights into the tissue composition and the underlying photon-tissue
interactions.
In-vivo measurement of reemitted light has been employed for reconstructing optical
properties in optical tomography. Diffuse optical imaging (DOI) [26-29] is a non-invasive
imaging technique that utilizes the diffusion of near infrared light and photon migration to
reconstruct the optical properties of a tissue. DOI enables tomographic (3D), noninvasive
reconstructions of optical tissue properties, even the concentrations of different chromophores.
In DOI, the diffuse light reemitted from the illuminated tissue is normally captured by an imaging
system. Diagnostic applications rely on mathematical models to further interpret the detected
reemitted light. Some promising applications have explored the use of DOI for imaging and
diagnosis of pathologies in the breasts [30-32] and the brain [33-35], hence leading to the
identification of specific optical parameters, or optical signatures, for cancerous tissues [31, 32,
36].
1.2.1 Tissue optical properties
Tissue optical properties refer to the physical and physiological characteristics of biological
tissues that influence the behavior of light as it penetrates through the tissue. These properties
determine the amount of light absorbed, scattered, and transmitted through the tissue. There
are three properties [37, 38] of the tissue which determine the diffusion rate of photons within
it, namely the absorption coefficient $\mu_a$, the scattering coefficient $\mu_s$, and the anisotropy
parameter $g$. The absorption coefficient measures the number of absorption events per unit
length, while the scattering coefficient describes the probability of a photon scattering after
traveling a certain distance. For a scattering event, both the phase and the travel
distance matter. The anisotropy parameter quantifies the directional preference of light scattering
in biological tissues. The refractive index ($n$) is another term that governs the reflection and
refraction effects at the air-tissue boundary. It is the ratio of the speed of light in a
vacuum to that in the medium and is assumed to be constant within the medium in most
applications. These tissue optical properties play a critical role in various optical imaging
techniques, including diffuse optical imaging, near-infrared spectroscopy, and optical coherence
tomography.
Due to the inhomogeneity and microstructure of the tissue, the majority of light that
penetrates into the tissue is scattered, rather than absorbed or transmitted. In other words,
the scattering coefficient of tissue is generally larger than the absorption coefficient. However, the
absorption coefficient carries the more interesting information: different tissue
components like hemoglobin, water, and lipids have unique absorption spectra.
Starting from the measured absorption and scattering coefficients, it is possible to
derive the concentrations of the tissue's main chromophores.
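As a minimal illustration of this last step, the Python sketch below unmixes chromophore concentrations from a measured absorption spectrum by linear least squares, assuming a Beer-Lambert mixing model $\mu_a(\lambda) = \sum_i \varepsilon_i(\lambda)\, c_i$ with known extinction spectra; the wavelengths and extinction values are illustrative placeholders, not measured data.

```python
import numpy as np

# Illustrative (not measured) extinction spectra for three chromophores,
# sampled at four wavelengths; real values come from published tables.
wavelengths = np.array([650, 700, 800, 900])   # nm
E = np.array([[0.80, 0.10, 0.30],              # 650 nm
              [0.40, 0.10, 0.20],              # 700 nm
              [0.20, 0.20, 0.20],              # 800 nm
              [0.10, 0.30, 0.20]])             # columns: melanin, HbO2, Hb

# Absorption coefficients mu_a(lambda) recovered by DOI at each wavelength.
mu_a = np.array([0.45, 0.25, 0.18, 0.14])      # 1/cm

# Beer-Lambert mixing model: mu_a = E @ c. Solve for the concentrations c;
# practical pipelines use non-negative least squares to enforce c >= 0.
c, *_ = np.linalg.lstsq(E, mu_a, rcond=None)
print("estimated relative concentrations:", c)
```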
Research has shown that melanoma cells can produce large amounts of melanin, leading to
heavily pigmented tissue [39, 40]. When compared to other chromophores in the tissue,
melanin has been discovered to have a comparatively high absorption coefficient under red
light (Figure 1.3). As a result, melanin-rich tissue absorbs red light more easily, making it a
potentially helpful tool for recognizing and distinguishing melanoma from healthy tissue.
Figure 1.3 Wavelength-dependent absorption coefficients of skin chromophores and the diagnostic
potential of red light in melanoma detection.
The figure illustrates the variation in the absorption coefficients of different skin chromophores as the
wavelength of incident light changes. Hemoglobin (Hb) and its oxygenated form (HbO2), alongside water,
are key chromophores within tissue. Compared to these, melanin exhibits a notably high absorption
coefficient at the red-light wavelength (650nm). Collagen and proteins present low absorption under red
light, ensuring that such wavelengths do not adversely affect these skin components, thereby maintaining
the safety of light-based measurements. Given that melanoma cells produce significant amounts of
melanin, resulting in highly pigmented tissue, melanin's absorption properties under red light can be
leveraged to differentiate melanoma from healthy tissue, serving as a useful diagnostic marker.
1.2.2 Diffuse Optical Imaging
Since the early 1990s, Diffuse Optical Imaging (DOI) [26-29] has been investigated as a way
to identify and interpret functional changes in tissue. The method uses near-infrared (NIR) light,
whose comparatively low absorption by hemoglobin, water, and fat at wavelengths between
650 and 1000 nm allows it to penetrate tissue several centimeters deep. The lateral resolution
of general DOI ranges from 0.5 to 10 mm. One significant advantage of optical imaging is its
penetration depth, which can reach up to 10 cm. This allows for the detection of light
transmission in deeper tissue with a favorable signal-to-noise ratio, providing valuable
information about the functional status of the tissue being imaged.
The imaging acquisition setup contains two major parts [30-32]. The optical fibers that
inject NIR light are placed on the surface of the imaging medium, while the detecting fibers or
detector array are positioned slightly away on the same surface. In general, DOI is a
non-invasive functional imaging tool with a compact setup.
A working pipeline for applying DOI toward diagnostics follows these steps. The process
initiates with data acquisition and processing using a DOI imaging acquisition setup equipped
with near-infrared light sources and detectors. This equipment emits light into the tissue and
records the diffused light that emerges back to the surface. In the conventional inverse
reconstruction phase of the DOI pipeline, a light-tissue interaction model—designed to
replicate the physical data acquisition—first simulates the diffusive light re-emitted from the
tissue's surface based on predefined optical parameters. This step, known as forward modeling
of reemitted light, is methodically approached using methods based on light transport models
or photon migration. The theoretical results from forward modeling are then compared with
the actual DOI measurements. This comparison sets the stage for updating the values of the
predefined tissue's optical parameters during the inverse reconstruction by employing
optimization methods such as least-squares fitting, regularization, and Newton's method. By
iterating these computations, the optical parameters of the tissue converge, yielding a precise
estimate of the tissue composition.
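The sketch below caricatures this iterative forward/inverse loop under strong simplifications: the forward model is a toy analytic expression standing in for a full light-transport simulation, and SciPy's least-squares solver plays the role of the parameter-update step.

```python
import numpy as np
from scipy.optimize import least_squares

rho = np.linspace(0.1, 1.0, 20)          # source-detector distances (cm)

def forward_model(params):
    """Toy stand-in for forward modeling of reemitted light: predicts surface
    reflectance from a guess of (mu_a, mu_s'). A real DOI pipeline would run
    a light transport or photon migration simulation here."""
    mu_a, mu_sp = params
    mu_eff = np.sqrt(3.0 * mu_a * (mu_a + mu_sp))
    return np.exp(-mu_eff * rho) / rho**2

# Stand-in for the actual DOI measurement: ground truth plus 1% noise.
rng = np.random.default_rng(0)
measured = forward_model([0.1, 10.0]) * (1.0 + 0.01 * rng.normal(size=rho.size))

# Inverse reconstruction: update the predefined optical parameters until the
# simulated reemission converges to the measurement (least-squares fitting).
fit = least_squares(lambda p: forward_model(p) - measured,
                    x0=[0.05, 5.0], bounds=([1e-3, 0.1], [1.0, 100.0]))
print("recovered (mu_a, mu_s'):", fit.x)
```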
DOI can be categorized into various forms, each measuring different aspects of the diffuse
pattern. Steady-state DOI measures the transmitted and backscattered light resulting from a
static light source pointing at a surface [41-43]. However, conventional steady-state diffusion
methods are less favored due to the instrument’s limits-of-detection, longer acquisition times,
and a highly ill-posed inverse model [44, 45]. Frequency-resolved or time-resolved [46-49]
measurements can overcome some of these challenges, providing information-rich models, at
the expense of increased system and computational costs. Spatial imaging in the frequency
domain [50, 51] utilizes a projector to create periodic spatial illumination and then solves the
inverse model to provide 2D reconstructions and depth-sensitive maps of the surface optical
properties of the samples [52, 53].
DOI is limited to 2D acquisition and planar reconstruction as it requires the use of light
transmittance data [26, 54, 55]. 3D-DOI has recently been proposed [56, 57]; however, it
requires a priori knowledge of the target objects [58, 59], and the reconstructed volume is
limited in size due to computational inefficiencies [60, 61]. These applications of DOI suggest a
path toward contact-free optical biopsy techniques by leveraging affordable commercial
components. Currently, few DOI-based approaches have been broadly adopted in clinical
practice due to these technical challenges.
1.2.3 Light transport models
In DOI settings, how light propagates through tissue can be modeled by either the light
transport model or tissue photon migration [38, 62, 63]. If light is considered as an
electromagnetic wave, its propagation can be described by the radiative transfer equation,
which accounts for absorption, emission, and scattering of optical energy. This equation can be
solved either analytically, using the Green's function for simple, homogeneous tissue
geometries, or numerically for complex geometries, although the latter requires more
computational time.
When light is viewed as a collection of a large number of ballistic photons, its behavior can
be represented as the sum of individual photon movements. Monte Carlo simulations model
the trajectories of photons, providing detailed insight into photon-tissue interactions. Solving
the inverse problems of these interactions through optimization frameworks allows for the
reconstruction of tissue characteristics. Photon migration modeling considers the path of each
photon separately, accommodating the highest level of tissue complexity but at the cost of the
most computational time. The details of photon migration will be discussed in the next section.
1.2.3.1 The analytical model
Radiative Transfer Equation (RTE) stands as one of the central principles in the domain of
Diffuse Optical Imaging, facilitating the depiction of radiation or light's pathway through a
medium, such as biological tissue. This critical equation encompasses the phenomena of
scattering and absorption that light encounters, as well as the exchange of radiative energy that
occurs during these events. These interactions are characterized by three tissue properties that
dictate the photon diffusion rate: the absorption coefficient $\mu_a$, the scattering coefficient $\mu_s$,
and the anisotropy parameter $g$. The reflectance ratio $R$ is another important metric,
demarcating the fraction of light reflected against that absorbed by or transmitted through the medium,
as a function of $\mu_a$, $\mu_s$, and $g$. Real-world measurement of this ratio entails illuminating a
test sample (such as biological tissue) with a light source $l_s$, while a detector $l_d$
quantifies the reflected light. By marrying the reflectance data with the RTE or its derived
approximations, we can extract vital parameters like the absorption and scattering coefficients
through optimization techniques. Effectively applying the RTE paves the way for a deeper
understanding of a sample's optical properties, enhancing our ability to study and analyze
biological tissues.
By applying specific geometric and boundary conditions, analytical solutions to the RTE
facilitate more rapid identification of the optical properties of test samples. To further alleviate
the complex nature of the original RTE, a diffuse approximation method [64] is employed, an
approach that assumes isotropic directionality of scattering events. A notable example is
Farrell’s model [41], which illustrates light interaction within a semi-infinite slab medium upon
pencil-beam light injection at $l_s$. The surface detector, labeled $l_d$, undertakes the role of
measuring the reflectance, symbolized as $R(\rho)$, wherein $\rho$ stands for the distance between
the source of the light and the detector. The complexity of the incident light beam is reduced
through the strategic use of two idealized light sources (Figure 1.4): the photon source and the
image source. The positions of these virtual sources are grounded in the medium's intrinsic
characteristics, specifically its total interaction coefficient, denoted as $\mu_t'$. The photon source
is located at a depth $z_0 = 1/\mu_t'$, while the image source is situated at a
depth $-(2z_b + z_0)$, where $z_b$ represents the extrapolated boundary with zero photon
diffusion. The distances between the detector and the two virtual sources are computed as
$r_1 = [z_0^2 + \rho^2]^{1/2}$ and $r_2 = [(z_0 + 2z_b)^2 + \rho^2]^{1/2}$.
Figure 1.4 Theoretical overview for Semi-infinite Slab RTE model.
A beam of incident light (magenta beam) shines normally onto the tissue surface at the location $l_s$. The
diffused reflectance is detected at $l_d$, a distance $\rho$ away from the source position $l_s$. In the radiation
transfer equation (RTE) model, the tissue is considered a semi-infinite slab. There exist two virtual
isotropic light sources (the image source and the photon source) that generate the same photon diffusion
in the tissue as the real incident light. The extrapolated boundary is where the light flux equals
zero, providing the boundary conditions for the RTE model. The distances between the detector $l_d$ and
the two virtual light sources ($r_1$, $r_2$) are used for fitting the optical properties of tissue.
With the previously outlined assumptions, it becomes possible to model the reflectance
$R(\rho)$ in the form of an exponential decay (Eq. (1.1)). The effective attenuation coefficient
$\mu_{eff} = [3\mu_a(\mu_a + \mu_s(1-g))]^{1/2}$, the transport albedo
$a' = \mu_s(1-g)/(\mu_a + \mu_s(1-g))$,
and the total interaction coefficient $\mu_t' = \mu_a + \mu_s(1-g)$ are expressed in terms of $\mu_a$, $\mu_s$, and $g$.
This equation provides an approximate solution for reflectance as a function of distance from
the source, based on the optical properties of the medium and the boundary conditions.
Utilizing a least-squares fitting approach to align the experimental and theoretical reflectance
data computed by Eq. (1.1) further empowers a robust estimation of the medium's optical
properties.

$$R(\rho) = \frac{a'}{4\pi}\left[\frac{1}{\mu_t'}\left(\mu_{eff} + \frac{1}{r_1}\right)\frac{e^{-\mu_{eff}\,r_1}}{r_1^2} + \left(\frac{1}{\mu_t'} + 2z_b\right)\left(\mu_{eff} + \frac{1}{r_2}\right)\frac{e^{-\mu_{eff}\,r_2}}{r_2^2}\right] \qquad \text{Eq. (1.1)}$$
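For concreteness, a direct implementation of Eq. (1.1) is sketched below. It assumes the detector sits on the tissue surface, so $r_1$ and $r_2$ reduce to the forms given above; the numeric inputs are illustrative values, not calibrated tissue parameters.

```python
import numpy as np

def farrell_reflectance(rho, mu_a, mu_s, g, z_b):
    """Diffuse reflectance R(rho) of a semi-infinite slab per Eq. (1.1).
    All lengths in cm; z_b is the extrapolated-boundary distance, which
    depends on the refractive-index mismatch at the surface."""
    mu_sp = mu_s * (1.0 - g)                   # reduced scattering mu_s(1-g)
    mu_t = mu_a + mu_sp                        # total interaction coefficient
    a_prime = mu_sp / mu_t                     # transport albedo a'
    mu_eff = np.sqrt(3.0 * mu_a * mu_t)        # effective attenuation
    z0 = 1.0 / mu_t                            # photon-source depth
    r1 = np.sqrt(z0**2 + rho**2)               # detector to photon source
    r2 = np.sqrt((z0 + 2.0*z_b)**2 + rho**2)   # detector to image source
    term1 = (1.0/mu_t) * (mu_eff + 1.0/r1) * np.exp(-mu_eff*r1) / r1**2
    term2 = (1.0/mu_t + 2.0*z_b) * (mu_eff + 1.0/r2) * np.exp(-mu_eff*r2) / r2**2
    return a_prime / (4.0*np.pi) * (term1 + term2)

rho = np.array([0.1, 0.2, 0.5, 1.0])           # source-detector distances (cm)
print(farrell_reflectance(rho, mu_a=0.1, mu_s=100.0, g=0.9, z_b=0.04))
```

In practice, this function serves as the theoretical side of the least-squares fit described above: $\mu_a$ and $\mu_s$ are varied until the predicted $R(\rho)$ matches the measured reflectance profile.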
1.2.3.2 The numerical model
The numerical models [65] also start from the radiative transport equation with the diffusion
approximation, in the frequency domain. At the air-tissue boundary, the photon fluence
$\Phi(r, \omega)$ at position $r$ with modulation frequency $\omega$ can only exit from the tissue, in agreement with
an index-mismatched type III condition. Eq. (1.2) below describes the light-tissue
interaction:

$$-\nabla \cdot D(r)\nabla\Phi(r,\omega) + \left(\mu_a(r) + \frac{i\omega}{c_n(r)}\right)\Phi(r,\omega) = q_0(r,\omega)$$
$$\text{with boundary condition}\quad \Phi(\xi,\omega) + 2A\,\hat{n}\cdot D(\xi)\nabla\Phi(\xi,\omega) = 0 \qquad \text{Eq. (1.2)}$$

where $D = 1/[3(\mu_a + \mu_s(1-g))]$ is the diffusion coefficient of the medium, $q_0(r,\omega)$ is an
isotropic source, and $c_n(r)$ is the speed of light in the medium. The boundary condition
represents the flux escaping the medium at a boundary point $\xi$, expressed using the fluence rate
weighted by internal reflection. Here, $A$ corresponds to the relative refractive index mismatch
derived from Fresnel's law.
Finite element method (FEM) is a numerical technique for solving partial differential
equations. It divides the problem domain into manageable elements and solves the discretized
version of the problem by optimization. In DOI, the fluence $\Phi(r)$ is then approximated by
piecewise continuous polynomial functions and can be expressed as a sparse matrix in the finite
element formalism. The tissue sample is also meshed into voxels, each containing the absorption and
scattering coefficients of a small section. The whole diffusion equation can then be rewritten
as a linear algebraic equation.
A least-squares-based approach is applied to reconstruct the optical properties at each
FEM node, as shown in Eq. (1.3). The inversion is achieved by minimizing the mismatch between the
fluence measured at the tissue surface, $\Phi^m$, and the fluence $\Phi^c$ calculated from the numerical model.
The Tikhonov minimization function is given by

$$\chi^2 = \min_{\mu}\left[\sum_{i=1}^{NM}\left(\Phi_i^m - \Phi_i^c\right)^2 + \lambda\sum_{j=1}^{K}\left(\mu_j - \mu_0\right)^2\right] \qquad \text{Eq. (1.3)}$$

where $N$, $M$, and $K$ are the numbers of light sources, detectors, and FEM tissue voxels,
respectively; $\lambda$ is the regularization term and $\mu_0$ is the prior on the optical properties. The
inverse problem is then computed by forming the Jacobian matrix between the fluence and the optical
coefficients.
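A single damped update of this inversion can be sketched as follows; the Jacobian here is random noise standing in for the FEM sensitivity matrix, purely to show the linear algebra behind Eq. (1.3).

```python
import numpy as np

rng = np.random.default_rng(0)
n_meas, n_voxels = 64, 500          # N*M source-detector pairs, K FEM voxels

# Stand-ins: J is the Jacobian d(fluence)/d(mu) that the FEM forward model
# would supply; residual is measured minus calculated fluence (Phi^m - Phi^c).
J = rng.normal(size=(n_meas, n_voxels))
residual = rng.normal(size=n_meas)
lam = 1e-2                          # Tikhonov regularization weight (lambda)

# Regularized normal equations for one update step:
#   (J^T J + lambda * I) delta_mu = J^T residual
delta_mu = np.linalg.solve(J.T @ J + lam * np.eye(n_voxels), J.T @ residual)
mu = np.full(n_voxels, 0.1) + delta_mu   # updated optical coefficients
print("update norm:", np.linalg.norm(delta_mu))
```

The regularization term keeps the normal equations invertible despite the ill-conditioning noted below, at the cost of biasing the solution toward the prior $\mu_0$.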
The inverse problem is highly ill-conditioned and computationally demanding. The total
matrix size scales with the number of measurements obtained from the imaging device and the
number of FEM nodes. This approach works well for models of limited scale, whereas it can
cause memory explosion during computation under large-scale data acquisition.
1.2.4 Tissue photon migration
Tissue Photon Migration (TPM) has also been studied for the past few decades [62, 66-68]
to understand the stochastic interactions that govern the diffusion of photons. Building on this
research, multiple groups have extended Monte Carlo modeling of light transport in tissues
[69-71] to reconstruct the absorption and scattering properties of tissues below the surface
and correlate them with tissue health.
Monte Carlo simulations serve as another pivotal tool for modeling tissue photon
migration, creating a discretized volume of the tissue and then solving for photon-tissue
interactions stochastically in each voxel. The model shares the same imaging geometry as the
RTE but implements a stochastic diffusion of photons in the tissue. This technique calculates
the diffuse reflectance in a complex 3D tissue phantom. In addition, it is more precise than the
diffusion theory approach close to the source or boundary and when
absorption dominates.
A typical Monte Carlo simulation starts with the launch of a photon into the tissue
[70, 72, 73]. After traveling a distance computed from a random number and the local optical
properties, the photon loses part of its energy to absorption. The remaining part of the photon
then changes its travel direction depending on the scattering coefficient. Each launched photon
undergoes absorption, scattering, and directional changes until its energy is depleted (Figure
1.2). The interactions are computed repeatedly to obtain a statistically significant 2D reflectance
at the tissue-air boundary. Details such as refraction, heating effects, and even changes in local
optical properties can be added to the simulation. This type of simulation accounts for the
diversity and structural complexity of depth-related optical diffusion in tissues.
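The stepping loop described above can be sketched in Python as follows; for brevity the sketch assumes isotropic scattering and a hypothetical weight cutoff, whereas practical simulators sample an anisotropic phase function such as Henyey-Greenstein.

import numpy as np

rng = np.random.default_rng(0)

def trace_photon(mu_a, mu_s, max_steps=10_000):
    # Random walk of a single photon in a homogeneous medium (minimal sketch).
    mu_t = mu_a + mu_s
    pos = np.zeros(3)
    direction = np.array([0.0, 0.0, 1.0])         # launched straight into the tissue
    weight = 1.0
    for _ in range(max_steps):
        step = -np.log(1.0 - rng.random()) / mu_t  # free path from exponential statistics
        pos = pos + step * direction
        if pos[2] < 0:                             # crossed the air-tissue boundary
            return pos, weight                     # re-emitted: score at the exit position
        weight *= mu_s / mu_t                      # fraction surviving absorption
        if weight < 1e-4:                          # energy effectively depleted
            return None
        v = rng.normal(size=3)                     # isotropic re-direction
        direction = v / np.linalg.norm(v)
    return None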
Figure 1.5 Theoretical overview for Stochastic Monte Carlo model.
(A) A stochastic Monte Carlo simulation of photons interacting with tissue at a distance ρ₁ and (B) at a
distance ρ₂, accounting for locally different optical properties. The colors in the tissue phantom (A and B)
denote the probability of the presence for a photon in each phantom voxel. The 3D tissue sample contains
a subcutaneous blood vessel placed along the y axis (shown in the yellow circle), causing distortions in the
shapes of the 3D photon diffusion maps. The maps for different distances ρ show the typical “banana”
distribution, whose central depth is approximately half of the distance between the light source and the
detector. A longer distance between the source and the detector leads to a deeper sampling depth. The
distribution of photons with the smaller ρ (A) is more concentrated in a shallow region and carries stronger
signals, while that with the larger ρ (B) spreads deeper into the tissue model with increasingly reduced
photon counts.
Although generally more computationally demanding than the RTE model, the Monte Carlo
approach can provide a statistical representation of diffuse reflectance for a heterogeneous 3D medium
(Figure 1.5). A noteworthy phenomenon here is the formation of a 3D spatial probability
density function characterized by a 2D cross-section shaped like a half-moon, and in 3D like a
banana as photons travel between a light source and a detector separated by a distance ρ [74].
The central depth of this distribution correlates with half the distance ρ (Figure 1.5 B). While
expanding the source-detector interval facilitates deeper and more voluminous tissue sampling, it
inversely affects the photon detection rate, which diminishes exponentially with growing ρ (Figure
1.5 C).
1.3 Machine learning methods in medical imaging
In the age of the Artificial Intelligence revolution, machine learning is a game-changer in clinical
practice, with enormous potential to analyze medical images and transform the way we
approach healthcare. Machine learning's prowess has been demonstrated in diagnosing diverse
conditions from medical images, with performance comparable to that of medical experts [75,
76]. Among the variety of techniques, Deep Learning has emerged as particularly influential
following its triumph in the field of computer vision.
Deep Learning, which enables automatic learning of hidden features from raw
input data during training, vastly outperforms traditional machine learning methods. Recent
advancements in efficient computational infrastructures, such as graphical processing units
(GPUs) and cloud computing systems, have further propelled the applications of Deep Learning.
This includes, but is not limited to, fields like medical image analysis and computer-aided
detection and diagnosis [75]. Deep Learning's successful application spans across various image
processing tasks like classification, segmentation, registration, and image reconstruction. MRI,
CT, PET, and ultrasound imaging systems have showcased improved image quality and more
efficient noise removal when compared to analytical, iterative, and compressed sensing-based
methods [76].
Despite the intense research momentum and promising outcomes, the proliferation of
Deep Learning research in medical imaging does not automatically translate into clinical
advancements. First, Deep Learning's dependence on vast training datasets and its black-box
nature call into question the reliability of deep-learning systems, especially if these systems are
deployed without radiologist oversight [76]. Furthermore, the issues of generalizability and
robustness of Deep Learning techniques have yet to be fully addressed [76].
1.3.1 Machine learning in skin diagnosis
Machine learning techniques have been applied to the field of skin diagnosis to create
computer-aided diagnostic tools that can help with the identification and categorization of skin
diseases [77-79]. These tools can be trained on the datasets of skin images and use algorithms
such as decision trees, random forests, support vector machines, and convolutional neural
networks to classify skin lesions based on features such as color, texture, and shape [78, 80].
The goal of these tools is to improve the accuracy and speed of skin diagnosis and reduce the
burden on healthcare providers.
Machine learning (ML) in medical diagnosis can be divided into shallow ML and Deep
Learning (DL) methods. In recent years, there has been a surge in the use of DL for the
differentiation of melanoma from benign nevus [81]. The diagnostic accuracy of DL methods for
melanoma has been found to be promising, with a mean accuracy of 89.5% (ranging from
59.7% to 100%) [77], a pooled sensitivity of 85%, and a specificity of 86% [82]. The accuracy of
DL methods is comparable to that of dermatologists, which highlights its potential for clinical
use.
However, the accuracy and safety of DL technologies in skin cancer detection require
further evaluation. Only a small number (2%) of computer-aided diagnosis studies aimed to
classify lesions into suspicious or non-suspicious categories, which is more reflective of the
diagnostic process performed in primary care [77]. Additionally, the images used for training
the algorithms were primarily from dermatology clinics, which may not be representative of the
general population. Furthermore, few studies considered the interpretability of their results, as
the neural network is often viewed as a black box. Although the networks have high scores in
terms of accuracy, sensitivity, and specificity, few studies reported measures of negative
predictive value. There is also a lack of consideration for integrating these algorithms
into clinical practice and existing diagnostic pathways, such as those involving 3D reconstruction of
melanoma [9].
1.3.2 Machine learning to improve DOI reconstruction
The application of neural networks in diffuse optical imaging was first demonstrated by
Farrell et al. [83] in 1992, where an Artificial Neural Network (ANN) was built to retrieve optical
properties from spatially resolved reflectance data. With the development of DL techniques,
neural networks have been applied to revolutionize every step of the DOI workflow, offering faster and
more accurate methods for data acquisition and processing, forward modeling, inverse
reconstruction, and diagnosis. Despite these advancements, the reconstruction process
remains one of the most challenging aspects of DOI due to its ill-posedness and nonlinearity.
This has sparked the interest of researchers who aim to enhance the reconstruction quality,
decrease analysis time, and increase resistance to noise [84-86].
The field of optical tomography has three main methods for integrating Deep Learning
techniques: model-based methods, post-processing methods, and end-to-end methods. The
model-based methods incorporate DNNs to simulate photon migration models and improve the
accuracy of inverse problem. Yoo [87, 88] et al. developed a neural network using an encoderdecoder convolutional structure to solve the Lippmann-Schwinger integral equation for photon
migration. The experiments, conducted on phantoms and tumor-bearing mice, demonstrated
that the trained network was capable of accurately reconstructing anomalies without the need
for additional iterative processes or linear approximations.
The post-processing approach focuses on refining the reconstructions obtained through
conventional iterative processes. This method utilizes a network to clarify any distortions and
improve the quality of the reconstruction while maintaining the inherent interpretable nature
of photon propagation.
The end-to-end approach in optical tomography employs Deep Learning techniques to
produce direct reconstructions from measured data. This method eliminates the need for any
mathematical models, leading to faster computations in the testing phase. As an example, Zuo
et al. [89] used an autoencoder architecture to inversely solve for the spatial distribution and absorption
values of targets in both phantom and clinically acquired data for breast cancer applications.
The architecture consisted of two sections: the first section reconstructed the optical properties
from DOT measurements while the second section regenerated the flux data to calculate the
perturbation. The network's accuracy was boosted by including physical restrictions like
ultrasound-aided anatomical distribution and Born constraint in the fine-tuning process. This
resulted in an improvement in the distinction between malignant and benign breast lesions, as
well as a better representation of lesion depth distributions. In conclusion, the integration of
ML and DL techniques in the DOI workflow has the potential to significantly improve the
accuracy and efficiency of the reconstruction process, making DOI a more accessible and
powerful imaging technique.
1.4 Motivation: Low-cost 3D skin optical biopsy
As discussed in Chapter 1.1.2, medical devices like optical biopsies can help in the
treatment of melanoma by providing non-invasive, real-time depth information about the
tissue being imaged. Medical professionals can use this information to make more
informed decisions about the diagnosis and treatment of melanoma. Moreover, by measuring
changes in the 3D optical properties of the tissue over time, medical professionals can monitor
the progress of treatment and determine when it may be necessary to adjust the treatment
plan.
Currently, there are only a few low-cost, commercial imaging devices that provide non-contact
depth measurements of tissue lesions for pathological diagnosis. Generally, these
solutions are limited to imaging only the superficial features of the tissue and cannot identify
subcutaneous nodules or estimate their invasive depth. There is a need for an approach
focused on assessing 3D insights based on subsurface composition differences with a versatile,
non-invasive and cost-effective hardware setup.
To effectively access 3D insights into melanoma, certain requirements must be met
in the design of a 3D skin optical biopsy. The lateral resolution should be below 1 mm to
accurately detect the minimum size of melanoma. The axial resolution should similarly be below
1 mm, with an imaging depth of over 5 mm to accommodate the Breslow depth of
melanoma. Additionally, factors such as lower imaging cost, quicker acquisition time, and a
more user-friendly experience must also be considered in the design.
Chapter 2 Rapid diffused optical imaging for accurate 3D
estimation of subcutaneous tissue features
2.1 Introduction
Current dermatology exams rely on surface observation, and biopsies are often necessary
for diagnosing malignant skin lesions. However, a significant number of biopsies are
unnecessary or diagnosed as benign. The availability and cost of 3D imaging systems
remain a driving problem, underscoring the need and urgency for a new way to
democratize optical biopsy. Here, we introduce 3D-multisite Diffused Optical Imaging (3D-mDOI),
an approach that estimates volumetric maps of scattering and absorption properties of
tissues, as a potential solution for non-invasive, contactless, 3D optical biopsy.
Our work innovates on two main aspects: a low-cost multisite acquisition platform, and an
innovative mathematical model with a sampling strategy that leverages multisite measurements
to enhance the quality of 3D reconstructions. Building on digital skin rendering techniques
[57-59], we developed a contact-free imaging platform that performs multisite detection by
acquiring diffuse reflectance data from a large number of locations on a sample's surface at
once. The platform prototype utilizes cost-effective consumer-grade digital projectors and
CMOS camera detection, maintaining a sub-$3,000 budget.
A robust hybrid mathematical model for 3D reconstruction, adapted to the imaging system,
combines analytical and stochastic methods to analyze diffusely reemitted light from samples.
This model dramatically simplifies the computation of the original volume from 2D images,
which is a known ill-posed inverse problem. The synergistic combination of imaging platform
and mathematical model enhances the efficacy of 3D reconstruction. Our results show that
3D-mDOI achieves over a tenfold increase in computational efficiency compared to traditional
approaches, such as the finite element method, in estimating 3D simulated and physical phantom
data.
3D-mDOI is well poised to facilitate contact-free investigation of sub-surface tissue volumes,
offering detection and quantitative depth estimation of areas exhibiting different optical
parameters. In clinical practice, these sub-surface volumetric changes are known to provide
valuable insights for patient assessment, making this non-invasive and cost-effective
technology translatable to a variety of clinical applications. For example, in dermatological
screening, subcutaneous changes are known to be related to the depth of moles and lesions of
interest [10, 45, 60]. This approach, based on assessing 3D insights from subsurface
composition differences, results in a versatile, non-invasive and cost-effective technology
translatable to a variety of clinical applications.
2.2 Low-cost multisite acquisition platform
Traditional diffuse optical imaging (DOI) systems are known for their precision and are
commonly employed in laboratory settings to determine the optical characteristics of biological
tissues. These systems typically use a finite array of laser diodes for illumination and
photomultiplier tubes (PMTs) for their high sensitivity in detecting photon fluence [90].
Despite their utility, the integration of DOI into regular clinical practice faces obstacles. First,
the cost of DOI systems is high, stemming from the high-precision nature of the light sources
and detection components. Second, the DOI systems are designed with a limited field of view
and necessitate manual adjustments when imaging larger sample areas. The imaging tasks
require the expertise of specialists and result in prolonged times for data capture. Third, the
system directly contacts the tissue to minimize the loss of signal and ensure accurate
measurements of light scattering and absorption. This requirement for tissue contact can lead
to diminished user comfort during the imaging process.
To address these limitations, we propose an acquisition setup, named the Optical Properties
Tissue Imaging Multisite Acquisition Platform (OPTIMAP), specifically designed for non-contact
and cost-efficient acquisition of the reemitted light. The platform includes a Digital
Micromirror Device (DMD) as a light projector and a near infrared-enhanced CMOS camera as
a detector array. The design positions the light source and the camera, each with an adjustable
field of view, vertically 30 cm from the sample. This configuration facilitates automated,
non-contact data collection from samples of varying dimensions. By utilizing consumer-grade
components for our imaging prototype, the cost is dramatically reduced from the million-dollar
range to just thousands, making it a much more affordable alternative to traditional DOI
systems.
2.2.1 Structural illumination in diffuse optical imaging
While reemitted diffused light holds valuable insights into a tissue's intrinsic properties, the
randomness of light-tissue interaction makes the reconstruction of optical properties through
diffuse optical imaging non-trivial. Scattering and absorption of the photon are modeled as
stochastic processes, which predict a range of possible outcomes, each associated with its
probability. Consequently, the reemitted diffused light often appears blurred, an effect that is
analogous to a low-pass filter in signal processing terminology. This blurring leads to a
reduction in both the temporal and spatial resolution of the reconstructed image when
compared to the original input light beam [91].
By selectively sampling photons that exhibit similar characteristics—such as following
similar paths, having equivalent wavelengths, or being equally polarized—the inherent
unpredictability of light scattering can be mitigated [92]. In the domain of DOI, Spatial
Frequency Domain Imaging (SFDI) [50] exemplifies this approach by utilizing active illumination,
where the light is intentionally modulated with specific patterns prior to tissue entry, coupled
with responsive detection systems tailored to these patterns. This synchronization between the
modulated light and the detectors is crucial, as it enables the capture of images at various
tissue depths by assessing the diffusion patterns of light. Hence, the technique of subsampling
photons sharing specific characteristics is widely used to mitigate the resolution-degrading
effects associated with diffusion in imaging acquisition.
The 3D-mDOI pipeline subsamples light having similar traveling paths through structured
illumination. In our acquisition setup, the DMD creates multiple illumination points, presented
as a structured dot-like pattern on the sample surface. In each illumination pattern, the
illumination sources are strategically spaced to ensure that each camera pixel captures light
from a single illumination source exclusively. As a result, the spatially subsampled region
between each illumination source and detector pair can be determined using a precomputed 3D
photon distribution map. The testing sample undergoes an automated scan using various
illumination patterns that together encompass the entire area, which facilitates the additional
oversampling explained in Chapter 2.4.2.
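For illustration, the following Python sketch generates a set of shifted dot-grid masks of the kind projected by the DMD; the 912x1140 resolution matches the DLP4500 mirror array, while the 16-pixel pitch is a hypothetical spacing chosen so neighboring diffuse halos do not overlap on the camera.

import numpy as np

def dot_patterns(height, width, pitch):
    # Yield shifted dot-grid illumination masks; one dot every `pitch` pixels.
    for dy in range(pitch):
        for dx in range(pitch):
            mask = np.zeros((height, width), dtype=np.uint8)
            mask[dy::pitch, dx::pitch] = 1
            yield mask

# Scanning all shifts covers every DMD pixel exactly once:
patterns = list(dot_patterns(912, 1140, 16))   # 256 patterns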
2.2.2 Experimental Image Acquisition
The cost-effective and non-invasive reflectance data acquisition is made possible by our
innovative OPTIMAP system, composed of readily available consumer-grade equipment. The
platform integrates a CMOS camera (acA2000-340km, Basler ace, Germany) and a
programmable digital light projector (DLP4500-C350REF, Texas Instruments, America),
positioned approximately 30 cm from the sample and roughly 15 cm apart from each other. The
DLP projects an array of dot-like patterns onto the sample surface using 630 nm red LED light,
while the CMOS camera simultaneously captures the light reemitted at the tissue-air interface.
We employ optical cross-polarization on both the camera and DLP to effectively suppress any
specular reflectance. The total expenditure for this setup is approximately $3,000, offering
further prospects for cost-reductions with future enhancements.
We designed an acquisition protocol to ensure the automated collection of high-quality
reemitted light data. A custom C program synchronized changes in the DLP's patterns with the
camera operations, streamlining the data acquisition process. To bolster the signal-to-noise
ratio, particularly for pixels farther from the light source, we captured 50 images for each
illumination pattern, utilizing the same acquisition settings, and averaged the data.
Additionally, we created an ultra-high dynamic range (UHDR) output by acquiring images across
multiple exposure times for each pattern; for the physical phantom samples, these spanned
exposure times of 10, 20, 40, 80 and 160 milliseconds.
The conventional HDR technique merges several 8-bit images captured at varied exposure
settings to yield an image with a superior dynamic range. We refined Matlab's 'makehdr'
function to enable output in a 16-bit image format. The optimal UHDR setting was determined
by balancing the dynamic range of the resultant UHDR images against the acquisition duration of
the raw images (Figure 2.1). Eventually, the UHDR image, derived from raw data with exposure
times of 10, 20, 40, and 80 ms, was employed to assess the efficacy of 3D-mDOI when working with
physical phantoms.
Figure 2.1 3D-mDOI Acquisition Approach.
A synchronized image acquisition platform is designed for capturing reflectance from tissue phantoms.
The platform (A) comprises a Digital Light Processor (DLP) for structured illumination and a high-speed
CMOS camera with a 12-bit depth, both equipped with optical polarizers to eliminate specular reflectance.
A customized PDMS tissue phantom (B) is used, featuring insertions with varying optical properties. High
dynamic range images are generated to maximize the signal-to-noise ratio of the captured reflectance.
Sequential samples of raw images (C) are captured under different exposure times, while the illumination
pattern remains. Combinations of these raw images (D) are then processed to estimate reflectance
outputs at different photon counts. Intensity comparison in a logarithmic scale is shown in (E), highlighting
the improvement in signal-to-noise ratio starting from 5 pixels away from the central light source.
2.3 Hybrid mathematical model
With respect to DOI, there are generally two approaches to describe optical transport in
tissues: the Radiative Transfer Equation (RTE) and photon migration via Monte
Carlo simulation (Chapter 1.2). The RTE associates randomly distributed optical parameters with a
continuous tissue medium and then solves the problem analytically or numerically. Various
analytical models utilizing RTE have been shown to achieve rapid estimation of the optical
parameters by simplifying assumptions in the tissue geometry and homogeneity [41, 63, 93,
94]. This simplification, however, generally comes at the expense of accuracy. While numerical
techniques such as the Finite Element Method can handle more complex geometries, they can
become impractical in large-scale problems due to the memory storage requirements for the
decomposition and the increase in computation time [95]. The Monte Carlo simulation creates
a discretized volume of the tissue and solves for photon-tissue interactions stochastically in
each voxel [73]. This approach can account for the highest level of diversity and structural
complexity of depth-related optical diffusion, but it is the most computationally expensive
method owing to the need for iterative 3D diffuse data fitting. Thus, there is a trade-off
between model complexity and reconstruction speed across these two approaches.
Here, we combine aspects of the analytical RTE model, as described by Farrell et al. [41],
with the stochastic realism of Monte Carlo simulation. This combination gathers advantages
from both approaches while reducing the shortcomings, enabling the reconstruction of features
with an unknown geometry, buried within the tissue volume, while maintaining a reasonable
numerical simulation time.
2.3.1 Modified photon migration model based on low-cost multisite acquisition
platform
Within the theoretical photon migration framework of scattering-media imaging [96], the
steady-state radiance R(r_s, r_d) (Eq. ( 2.1 )), accounting for the energy flow of light detected
at point r_d over an area and within a given angle, is determined by three primary elements:
the light source, the detector, and photon-medium interactions. S(r_s) denotes the emittance
function of the light source, providing insight into how photons are emitted from the source.
D(r_d) refers to the point spread function of the detector, illustrating how the detector
perceives or collects photons. In an ideal steady-state imaging situation, both S(r_s) and
D(r_d) are typically constant, not varying across experiments. P(ℓ) provides the probability of
selecting photons with a trajectory length ℓ. Ideally, P(ℓ) = 1 for every value of ℓ, meaning
no further selection based on trajectory length is applied in our experiment. R(r_s, r_d, ℓ) is
the remaining radiance reaching the detector after photons have interacted with the medium
along their respective trajectories. This term can be computed by Monte Carlo simulation with
known medium optical properties, with constraints set for fixed values of (r_s, r_d, ℓ).

$R(r_s, r_d) = \int_0^{\infty} S(r_s)\, D(r_d)\, P(\ell)\, R(r_s, r_d, \ell)\, d\ell$   Eq. ( 2.1 )
Given the underlying assumption of photon independence in relation to both the source
and detector [97], and considering the characteristics of our low-cost multisite acquisition
platform, we simplify R(r_s, r_d) with an experimental correction C(r_s, r_d) and an
experimental radiance R_e(r_s, r_d), as in Eq. ( 2.2 ). In the situations we examine, the CMOS
camera captures the reemitted steady-state radiance as the incident light projected by the
DMD interacts with the testing sample. C(r_s, r_d) is an experimental expression of the
acquisition parameters S(r_s)D(r_d)P(ℓ). The factor S(r_s) accounts for unevenness in the
illumination, leading to vignetting. S(r_s) also corrects for the ambient background lighting,
which is assumed to be constant during the experiment. D(r_d) is a calibration constant that
accounts for camera detector noise, such as shot noise, camera readout and background noise.
In contact-free imaging, P(ℓ) in Eq. ( 2.1 ) varies due to photons leaking between
neighboring detectors. The scattering nature of photons may cause them to be emitted from
tissue at angles not perpendicular to the surface. Consequently, there is a higher chance of
them being detected by a neighboring detector rather than the one directly aligned with them,
particularly in contact-free imaging as compared to contact imaging. We remove P(ℓ) from Eq.
( 2.1 ) and introduce a corrected pattern P(r_s, r_d) into C to simplify the model, compensating
for the energy loss when light travels back from the medium surface to the detector. The
integral expression $\int_0^{\infty} R(r_s, r_d, \ell)\, d\ell$ can then be represented more
succinctly as R(r_s, r_d), given that we account for all the photons reemitted at r_d.

$R_e(r_s, r_d) = C(r_s, r_d)\, R(r_s, r_d), \quad \text{with} \quad C(r_s, r_d) = S(r_s) \cdot D(r_d) \cdot P(r_s, r_d)$   Eq. ( 2.2 )
Building on the information provided earlier, the inverse problem of photon migration via
Monte Carlo simulation can be addressed by determining the optimal inference of the
medium's optical properties. This is achieved by using stochastic gradient descent to minimize
the discrepancy between the theoretical and experimental radiance values. However,
addressing the inverse problem of the Monte Carlo model utilizing steady-state measurements
poses two primary obstacles: computational time and result ambiguities. First, the μa and μs
estimations are extremely computationally intensive and impractical for reconstruction of the
optical properties of large, high-resolution volumes [98]. Second, determining the attributes
of non-uniform tissues remains underdetermined, with the potential for multiple viable solutions
[96]. Given that 2D diffuse reflectance is recorded solely at the tissue surface, a complication
arises: photons traversing from the source to the detector (r_s, r_d) can undertake a
virtually limitless array of paths, a complexity that magnifies in the presence of tissue
heterogeneity.
2.3.2 Mathematical derivation of hybrid 3D reconstruction algorithm
The developed hybrid mathematical model merges the 2D analytical insights garnered
from the RTE model with 3D photon diffusion maps shaped through Monte Carlo simulations.
While a digital micromirror device (DMD) manages the incident light, a 2-megapixel camera
precisely captures the reflectance data, with each pixel functioning as a separate detector.

Initially, we lean on the analytical RTE model mentioned in Chapter 1.2.3.1 to yield a 2D
depiction of surface-level optical properties. Despite the RTE's restriction to homogeneous tissues,
we hypothesize that a pixel subset in a small region represents a quasi-homogeneous structure.
This perspective enables us to construct μa and μs maps indicative of the region's optical
characteristics. Progressing from these 2D representations, we extend them into 3D space,
adhering to the 3D photon diffusion distributions mentioned in Chapter 2.3.1, generated
through Monte Carlo simulations.

In line with our quasi-homogeneous extension into 3D, we compute 3D photon diffusion
lookup maps on a homogeneous standard phantom. These maps, prepared for various light
source-detector distances, present spatial distributions for an estimated 50 million photons.
When refining our mapping strategy, we make a key assumption: melanin level variations
primarily affect the absorption coefficient, whereas the scattering coefficient largely dictates
the photon's trajectory within the medium. Given our emphasis on melanoma, we model
a pronounced change in the absorption coefficient with minimal scattering coefficient variance [40].
We employ a consistent lookup map for all tested samples, hypothesizing that potential deviations
in this map, resulting from optical property shifts, are minor [99]. Moreover, these subtle
variations can be rectified in later multi-calibration phases with the joint mathematical model.
To mathematically conjoin the two models, the reflectance in the semi-infinite slab RTE
needs to be represented as reemitted radiance in the Monte Carlo model. The transformation
between reflectance and radiance through Eq. ( 2.3 ) is based on two assumptions. One, the
scattering coefficient in tissues typically overshadows the absorption coefficient, leading to a
small absorption decrease of the incident radiance. Two, by ensuring adequate spacing
between the light source and detectors, reemitted signals captured by nearby detectors
primarily emanate from a single light source. Hence, R(ρ), the reflectance ratio, mirrors the ratio of
radiance reflected at r_d over the total from all detectors. Within the scope of our analysis, the
steady-state experimental radiance measurement R_e(r_s, r_d) is a direct product of the
experimental correction C(r_s, r_d) and the theoretical radiance R(r_s, r_d) (Eq. ( 2.2 )).
Recognizing that the Monte Carlo simulation naturally captures the 3D spatial distribution in its
calculation of R(r_s, r_d), it is justifiable to transition the 2D optical properties from the RTE into a
3D perspective, leveraging the 3D distribution deduced from Monte Carlo methods.

Eq. ( 2.3 ) also contains γ₁(ρ), a correction term necessary to bridge any
inconsistency between the two mathematical models. Generally, the difference in the
reflectance obtained via RTE or Monte Carlo simulations, under the same experimental
conditions, is negligible. However, when ρ is small, the resulting radial distances are smaller
than half of the mean free path of light in the tissue. Thus, any mismatch between the two
models results in systematic errors leading to an invalidation of the diffusion assumptions [41,
100]. γ₁(ρ) is originally determined from both the distance ρ and the optical properties of the
medium. In our scenarios, we utilize a fixed γ₁ to absorb the inconsistencies corresponding to the
medium properties. This rationale stems from our approach wherein every voxel undergoes
multiple samplings, ensuring a thorough accounting for tissue heterogeneities. Regions prone
to systematic errors due to sampling at smaller ρ values are inherently corrected by samples
taken at greater distances [101]. Consequently, the influence of γ₁(ρ) on the inverse model is
effectively averaged out.
$R(\rho) \cong \gamma_1(\rho)\, \frac{R_e(r_s, r_d)}{\sum_{r_d} R_e(r_s, r_d)} = \gamma_1(\rho)\, \frac{C(r_s, r_d)\, R(r_s, r_d)}{\sum_{r_d} C(r_s, r_d)\, R(r_s, r_d)}$   Eq. ( 2.3 )
Through the above analysis, the efficient reconstruction of the 3D optical coefficient matrix
is realized by transposing the 2D optical coefficient information into 3D space, syncing it
with a pre-determined 3D photon distribution map. In our approach, every voxel undergoes
oversampling via numerous source-detector pairs, guaranteeing a thorough consideration of
photon interactions. This methodology minimizes inaccuracies stemming from inconsistencies
in optical coefficients, often attributed to uncertainties in light travel over diverse distances.
2.4 3D-mDOI Reconstruction Pipeline
3D-Multisite Diffused Optical Imaging (3D-mDOI) enables the reconstruction of unknown
geometric features embedded within tissue volumes, based on their optical properties of
absorption and scattering, while ensuring a manageable computational runtime. This is
achieved through an optimized three-step pipeline: first, image capture using a multisite image
acquisition platform, named the Optical Properties Tissue Imaging Multisite Acquisition Platform
(OPTIMAP); second, 2D nonlinear fitting combined with 3D reconstruction through a hybrid
mathematical model; and third, uniformity calibration leveraging experimental reference
phantom data.
Initially, during image capture, the reemitted light information of the sample is captured via
our image acquisition platform OPTIMAP, which uses a multisite illumination that creates a
dot-like pattern on the sample (Figure 2.2 A) and a detection approach that exploits the large
number of independent pixels in CMOS cameras. This approach streamlines the acquisition
process, increasing the information throughput and shortening the path to experimental data.
Post-acquisition, the imaging data is processed by a hybrid mathematical model, which
effectively integrates elements of the steady-state RTE model with the stochastic realism of
Monte Carlo simulation to model the diffusion of light in the tissue (Chapter 2.3). The product
of the rapid RTE model is a 2D map of the optical coefficients at the sample's surface,
containing in each pixel both absorption and scattering values (Figure 2.2 B). These coefficients
are employed to initiate an accelerated inverse 3D-reconstruction process, assisted by
pre-calculated 3D photon distributions that represent tissue photon migration (Figure 2.2 C). The
reconstruction steps are repeated for each illumination pattern to systematically scan through
the entire sample (Figure 2.2 D). This step is followed by a uniformity calibration for reducing
system artifacts (Figure 2.2 E). The calibration matrix, derived from a standard sample with
known optical properties, is used to refine the partial 3D reconstructions, ensuring accuracy by
correcting for any deviations in the experimental data. By employing this methodology, we
enable the effective reconstruction and visualization of the intricate optical
property distribution within biological tissues.
Figure 2.2 Subcutaneous imaging with 3D-mDOI: approach overview.
(A) The imaging setup consists of a digital micromirror device (DMD) projector that generates patterns of
light-beams and a CMOS camera capturing the reflectance of the reemitted light from the sample.
Specular reflection is mitigated by a pair of polarizers (Blue cover), thus improving the camera's dynamic
range for diffuse light. Each captured image is split into (B, bottom) small patches, each centered on a
light source, with every pixel functioning as a detector. Reflectance values are selected from a neighboring
cross-section (Target Detector) and associated with a distance from the light source. These values serve
as the input for a (B, top) 2D nonlinear fitting of the RTE model, which computes the cross-section’s optical
coefficients (μa, μs) (magenta dot). The (C, top) 2D optical coefficient map for each patch is assembled
by integrating the optical coefficients from all target detectors. The corresponding (C, middle) 3D photon
distribution expands the 2D optical coefficient map into a 3D optical coefficient matrix. We integrate
multiple 3D optical coefficient matrices to form a (C, bottom) reconstructed 3D volume by a linear,
single-step reconstruction. Each voxel is sampled multiple times, improving the quality of the 3D reconstruction.
(D) The projection pattern is systematically scanned over the sample surface, repeating the steps A, B and
C to iteratively update the reconstructed 3D volume. (E) Measurements on a uniform phantom provide a
calibration for the reconstructed 3D volume, improving the results. The result is a depth estimation
of the ground truth features, evidenced by the visibility of objects or lesions at various depths within the
relative 3D coefficient volume.
2.4.1 Overview of 3D-mDOI reconstruction
Detailed information about the 3D-mDOI reconstruction process is illustrated below. The 3D-mDOI
process starts with image acquisition by OPTIMAP (Figure 2.2 A). Captured images of the
reemitted light are preprocessed by converting intensity values to diffuse reflectance and by
calibrating out the instrumental artifacts. The correction parameters utilized in the preprocessing step
are determined through an optimization process from experimental data of a uniform
calibration sample. The data is subsequently fed into the hybrid mathematical model to
compute the optical coefficients.

In the hybrid mathematical model, each illumination point is considered a light source, and
each pixel of the camera is treated as an individual detector. The data from the central detector
is grouped with that of its neighboring detectors, under the assumption of quasi-homogeneity
for each small tissue surface area. This pooled data is later fed to the semi-infinite slab RTE
model (Figure 2.2 B) to derive the optical coefficients μa and μs for each source-detector pair.
This approach enables the calculation of localized optical coefficients with a level of precision
tailored to each source-detector pair. The aggregated local optical coefficients for all
source-detector pairs form a 2D optical coefficient map.
The rapid reconstruction of the 3D optical coefficient matrix is achieved by mapping the 2D
optical coefficient data into a 3D domain, aligning it with a pre-computed 3D photon
distribution map (Figure 2.2 C). Photons traveling between a source-detector pair, spaced by a
distance ρ, create stochastically characteristic photon trajectories. This characteristic can be
described by a 3D spatial probability density function, characterized by a 2D cross-section
shaped like a half-moon, and in 3D like the banana mentioned in Chapter 1.2.4 [74]. The central
depth of the 3D distribution is approximately half of the source-detector distance (Figure 1.5).
We calculate each partial 3D reconstruction by assigning the (μa, μs) values for each
source-detector pair to its corresponding 3D photon distribution. The intersecting portions of the 3D
reconstructions are merged by weighted averaging. An additional uniformity calibration is
performed on each partial 3D reconstruction. Each illumination pattern comprises multiple
independent illumination sources, and the final reconstructed volume for an illumination
pattern is assembled from these partial 3D reconstructions.
Sequential scanning of multisite illumination patterns across the sample's entire surface
initiates a repeated algorithmic sequence, generating, with each pattern, a new corresponding
volume (Figure 2.2 D). The reconstructed 3D volume is computed by averaging the data from
the multiple pattern volumes (Figure 2.2 E). This process reduces errors caused by
non-uniformities in optical coefficients, which arise from uncertainties in light propagation across
varying distances ρ. The resulting volume is then normalized using a calibration matrix from
measurements on a featureless phantom. Such uniformity calibration yields a relative 3D
coefficient volume, crucial for reducing artifacts due to unevenness of the consumer-grade
components of our setup, such as the structured illumination, and consequently enhances the
quality of the 3D reconstructions. The final output of 3D-mDOI offers an insightful estimation of
the embedded features' depth in the sample. The clarity and precision of this output have the
potential to further researchers' or clinicians' understanding of the spatial distribution and
relative characteristics of these features or anomalies.
2.4.2 Oversampling 3D-photon distributions: enabling better reconstruction in a
low-cost system
Our innovative pipeline strikes a balance between the cost-effectiveness of the hardware
and the quality of the 3D reconstruction. Typically, more affordable hardware
configurations might compromise the photon data quality. To counteract this, we oversample
the photon information for each 3D voxel across various photon trajectories. We
achieve oversampling by employing structured illumination techniques to increase the number
of light sources and detectors. The numerous captured diffuse measurements are methodically
selected, adhering to the principles of tissue photon migration, to reconstruct the corresponding 3D
voxels. Specifically, we employ precomputed 3D photon distribution models from Monte Carlo
simulations as a guide in choosing the diffuse data most relevant for each voxel's
reconstruction. This strategy ensures every voxel contains detailed and comprehensive
information, enhancing the overall quality of the 3D reconstruction. Such an oversampling
technique effectively offsets the photon information loss associated with the cost savings and the
non-contact nature of the hardware setup.
2.4.2.1 3D-photon distributions and look-up map generation
A prerequisite for employing the oversampling strategy is to calculate 3D photon
distributions in a known tissue phantom, which then act as a reference map to expedite the
reconstruction process. In our work we employ a 3D Monte Carlo simulation [72] to model
photon propagation within a tissue phantom, reflecting specific parameters indicative of
melanoma characteristics. The constructed phantom, designed to mimic homogeneous dermis
tissue, had dimensions of 30x30x10 mm with a voxel size of 0.5x0.5x0.2 mm, aligning with
melanoma's typical size and Breslow depth. The simulation utilized a single collimated
Gaussian light beam as the incident light source, with a 680 nm wavelength, within the red
region of the visible light spectrum. Although the light beam's waist size (0.01 mm) is small
compared to the voxel size of the phantom, its central position within the phantom was
designated as the incident light's point of entry. The phantom's air-tissue interface voxels were
considered detectors.
The photon trajectories computed by the Monte Carlo simulation were grouped based on their
exit point at the air-tissue interface. By aggregating the trajectories with identical entry and
exit voxels, the resulting map represents photon distributions in three dimensions. Each
voxel's photon count was normalized by the total number of incident photons. To account
for the inherent stochastic nature of the Monte Carlo method, we executed this simulation five
times, yielding an averaged 3D photon distribution map. This process facilitated the
computation of the reflectance ratio, determined by normalizing the count of photons escaping
at specific locations against the total number of incident photons. This ratio is indicative of the
degree of signal loss in the 3D back-projections and offers important correction weights for
3D-mDOI's reconstruction stage.
2.4.3 System computation and calibration
The general analysis for processing diffusion data in 3D-mDOI is composed of a four-step
sequence: preprocessing of captured data, 2D nonlinear fitting, 3D reconstruction, and
uniformity calibration. Each step in the 3D-mDOI pipeline plays a critical role in ensuring that
the final reconstruction is both accurate and reliable.
2.4.3.1 Preprocessing of captured data
To ensure reliable reconstruction fidelity, multi-stage calibrations for background
illumination, camera noise, and illumination unevenness are necessary for the data captured
from OPTIMAP (Chapter 2.2.2). Background noise during acquisition predominantly arises
from light leakage in the DMD projector, ambient room light, and inherent camera noise. To
address this, we captured a background image with all the DMD mirrors turned off. This image,
which represents the static noise components, was then subtracted from the experimental
images, eliminating constant background noise from the DMD projector. Additionally, camera
noise was minimized by performing a black correction for each exposure setting, capturing 50
dark frames for each exposure time and averaging them to obtain an exposure-dependent
correction frame. Subsequently, the experimental data was adjusted by subtracting the
corresponding dark frame based on its exposure duration. We maintained a constant gain,
collecting datasets after the camera reached temperature stability to minimize acquisition
variations during the experiment [102]. For more uniform illumination, a normalized flat-field
correction matrix was generated by imaging white targets at varied exposure durations. During
the experimental acquisition phase, we utilized the pre-stored flat-field correction matrix to
preprocess the captured images.
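These corrections amount to a few array operations, sketched here with assumed variable names:

import numpy as np

def preprocess(raw, dark_frames, flat_field):
    # raw         : captured frame at a given exposure
    # dark_frames : the 50 dark frames at the same exposure
    # flat_field  : normalized flat-field matrix from white-target images
    dark = np.mean(dark_frames, axis=0)              # exposure-dependent correction frame
    corrected = np.clip(raw - dark, 0, None)         # remove static background/camera noise
    return corrected / np.maximum(flat_field, 1e-6)  # even out the illumination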
The Signal-to-Noise Ratio (SNR) is the metric for quantifying the amount of photon information
captured by our acquisition platform. This metric, expressed in decibels, measures the
proportion of the desired signal to the background noise. We compare the average intensity of
the detected signal (Ī_ρ) at a fixed source-detector distance ρ against the average of the
bottom 5% of intensities, taken as the background (Ī_bg). This approach allows for a standardized
assessment of OPTIMAP's capability to discern signal from background noise.

$SNR = 20 \times \log_{10}\left( \frac{\bar{I}_{\rho}}{\bar{I}_{bg}} \right)$   Eq. ( 2.4 )
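A direct transcription of Eq. ( 2.4 ) reads as follows; the helper name and the per-frame selection of the bottom 5% of intensities are illustrative.

import numpy as np

def snr_db(signal_at_rho, frame):
    # SNR in decibels at a fixed source-detector distance rho (Eq. (2.4)).
    i_signal = np.mean(signal_at_rho)
    i_bg = np.mean(np.sort(frame.ravel())[: max(1, frame.size // 20)])  # bottom 5%
    return 20 * np.log10(i_signal / i_bg)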
We convert the experimentally measured intensities to reflectance for each illumination
point. We assume each illumination point has minimal crosstalk with the others, covering a finite
Region of Interest (ROI) in a captured image, thanks to our systematically designed illumination
pattern. This premise allows us to crop the acquired images into smaller patches, each
containing just one illumination point. Based on reflectance data captured on the multisite image
acquisition platform with a single illumination point projected onto the test sample, we
predetermined the size of the ROIs. Specifically, we defined the ROI size by identifying the
broadest range that encompassed pixels with positive intensity after background noise
subtraction. During the experiments, we extracted local maximum pixels from the data,
identifying them as illumination point locations. Intensities in each patch were later normalized
by the local sum. This intermediary step yields a value close to the ratio of detected reflected
light at a pixel to the radiance from a specific light source in the hybrid mathematical model
(Chapter 2.3.2). The correction parameters utilized in this preprocessing step are determined
through an optimization process from experimental data of a uniform calibration sample. In
practice, we determined the best scaling and offset values by using a least-squares fitting
method to linearly represent the relationship between the normalized intensity and the
corresponding reflectance value. The theoretical reflectance ratio, computed using the
Radiative Transfer Equation (RTE) with an initial guess of the sample's optical coefficients,
allowed us to determine the optimal correction factors. We matched the intermediary results to
the theoretical reflectance ratio using a least-squares fit. These procedures are crucial for the
accurate computation of the subsequent 2D optical coefficients.
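The scaling-and-offset fit is an ordinary linear least-squares problem, sketched here with assumed names:

import numpy as np

def calibrate_reflectance(norm_intensity, r_theory):
    # norm_intensity : locally normalized intensities from the calibration sample
    # r_theory       : RTE reflectance ratios at the same source-detector distances
    A = np.stack([norm_intensity, np.ones_like(norm_intensity)], axis=1)
    (scale, offset), *_ = np.linalg.lstsq(A, r_theory, rcond=None)
    return scale, offset   # reused for all subsequent patches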
2.4.3.2 2D Nonlinear fitting in 3D-mDOI
We obtained a pair of parameters (μa, μs) for each pixel region utilizing the relationship
between the reflectance R and the source-detector distance ρ described in the semi-infinite
slab RTE (Figure 1.4). These parameters were computed using the Levenberg-Marquardt
algorithm. Leveraging the quasi-homogeneous assumption for neighboring pixels, the
tissue properties are considered uniform in a small neighboring region. The optical coefficients
of the center pixel were computed with multiple (R, ρ) pairs from neighboring pixels in a cross
section. In particular, the computation of the optical coefficients of a given center pixel was
performed by separately utilizing (R, ρ) pairs from neighboring pixels arrayed in both horizontal
and vertical configurations. To assure both consistency and accuracy in this process, we took
the average of these independently derived results when finalizing the optical coefficients
attributed to the same center pixel. A map of the 2D optical coefficients for each illumination
patch was later generated by assembling these optical parameters pixel by pixel.
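A sketch of the per-pixel fit, pairing the Farrell et al. dipole expression for R(ρ) with SciPy's Levenberg-Marquardt solver, is given below; the initial guess and the internal-reflection parameter A are assumptions rather than the values used in this work.

import numpy as np
from scipy.optimize import least_squares

def farrell_reflectance(rho, mu_a, mu_s, A=1.0):
    # Steady-state diffuse reflectance of a semi-infinite slab (dipole model).
    # mu_s denotes the (reduced) scattering coefficient here.
    mu_t = mu_a + mu_s                       # total interaction coefficient
    z0 = 1.0 / mu_t                          # depth of the isotropic point source
    D = 1.0 / (3.0 * mu_t)                   # diffusion constant
    zb = 2.0 * A * D                         # extrapolated boundary offset
    mu_eff = np.sqrt(3.0 * mu_a * mu_t)
    r1 = np.sqrt(z0**2 + rho**2)             # distance to the real source
    r2 = np.sqrt((z0 + 2*zb)**2 + rho**2)    # distance to the image source
    albedo = mu_s / mu_t
    return (albedo / (4*np.pi)) * (
        z0 * (mu_eff + 1/r1) * np.exp(-mu_eff*r1) / r1**2
        + (z0 + 2*zb) * (mu_eff + 1/r2) * np.exp(-mu_eff*r2) / r2**2)

def fit_pixel(rho, R_meas, x0=(0.01, 1.0)):
    # Levenberg-Marquardt fit of (mu_a, mu_s) from neighboring (R, rho) pairs.
    res = least_squares(lambda x: farrell_reflectance(rho, *x) - R_meas,
                        x0=x0, method="lm")
    return res.x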
2.4.3.3 3D reconstruction in 3D-mDOI
The structured illumination in the multisite image acquisition platform was designed to match
the 3D mapping of photon distribution for source-detector pairs in the hybrid mathematical model.
We isolated each illumination point within an illumination pattern to guarantee
that each pixel value on the camera detector was influenced by only a single illumination point.
In the mathematical model, each illumination point acted as a light source, and each camera
pixel was viewed as an individual detector. Consequently, the intensity value for each
source-detector pair is readily accessible, providing essential input for the subsequent 2D optical
coefficient calculations and the 3D photon distribution mapping. To ensure coverage of the
entire sample region within the 3D coefficient matrix for each illumination source, we
systematically shifted the illumination points based on a pre-designed scan pattern
and reiterated the computation processes accordingly (Figure 2.2 D). This approach minimized
non-uniformities in the tissue reconstruction by ensuring multiple samplings of photon migrations from
various directions.

The back-projection of the 2D optical coefficients to their associated 3D spatial photon
distributions allowed us to achieve a 3D reconstruction. The most appropriate 3D photon
distribution is chosen for each source-detector pair based on the source-detector distance ρ.
This distribution maps the relevant optical coefficient to the 3D voxels that photons are most
likely to traverse given the constraint of ρ (Figure 2.2 C). It is worth noting that each voxel in the
reconstructed 3D matrix can have overlapping contributions from multiple 3D photon
distributions originating from different source-detector pairs. To address this, we integrated
each voxel from all 3D photon distributions contributing to it in two phases. First, we
computed the 3D coefficient matrix for each illumination point and normalized it using a
precomputed summation of all 3D photon distributions. We followed this by realigning these
partial 3D coefficient matrices to their original positions within the captured image. A
weighted average was applied to merge all the partial outcomes, resulting in a consolidated 3D
reconstructed matrix for further calibration.
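The merging step amounts to a normalized back-projection, sketched here with hypothetical data structures:

import numpy as np

def backproject(coeff_2d, lookup_maps, vol_shape):
    # coeff_2d    : {detector_pixel: optical coefficient} for one illumination point
    # lookup_maps : {detector_pixel: 3D photon distribution for its rho}
    volume = np.zeros(vol_shape)
    weights = np.zeros(vol_shape)
    for det, coeff in coeff_2d.items():
        dist = lookup_maps[det]            # visit probabilities for this pair
        volume += coeff * dist             # weighted vote for every voxel
        weights += dist
    return volume / np.maximum(weights, 1e-12)   # weighted average per voxel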
2.4.3.4 Uniformity calibration
Uniformity calibration (Figure 2.2 E), the final stage of 3D-mDOI, was designed to produce a
calibration matrix and refine the reconstructed 3D coefficient matrix. Two distinct methods
were available for this purpose, each with its own experimental and computational
considerations. The first involved using a featureless, uniform phantom subjected to diverse
illumination patterns. Although this technique provided a more consistent calibration matrix
for subsequent reconstructions, it required both the presence of a uniform phantom and
additional experimental steps. The alternative strategy was based on the idea that the majority
of the phantom's volume consists of its base region rather than features. Through
random sampling and averaging of reconstructed 3D patches, a representative 3D coefficient
matrix for the base was created and utilized as the calibration matrix. While this method was
more efficient for assessing samples that aligned with the given assumption, it might have
affected accuracy. Regardless of the method selected, the resulting calibration matrix was used
to correct each of the partial 3D reconstructions from experimental data, effectively minimizing
grid artifacts and systematic noise and thereby accentuating the 3D optical coefficients of the
features within the phantom.
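Both calibration routes reduce to simple array operations, sketched here with assumed names:

import numpy as np

def calibrate_volume(recon, calib, eps=1e-9):
    # Divide the reconstructed 3D matrix by the calibration matrix
    # (from a featureless phantom) to suppress grid artifacts.
    return recon / np.maximum(calib, eps)

def base_region_calibration(patches):
    # Alternative: average randomly sampled reconstructed patches that are
    # assumed to contain only base material.
    return np.mean(np.stack(patches), axis=0)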
2.5 Performance of 3D-mDOI
To evaluate the performance of 3D-mDOI, we have designed a series of tests using both
simulated and physical phantoms, thereby validating the system's capabilities from multiple
angles. In our tests with simulated tissue phantoms, we perform both visual and quantitative
comparisons of the reconstructions to ensure they more closely match the expected simulated
phantoms compared to those generated by FEM. Moving beyond simulations, we incorporate
physical phantoms to validate the performance of the reconstruction with the data collected by
OPTIMAP. The experimental noise, which reduces the photon information of the captured
diffusion data, prompts us to segment the validation into two distinct phases: first, we visually
and quantitatively evaluate the system's ability to calculate the relative optical coefficients of
the phantom features, using both 3D-mDOI and FEM; second, we investigate the feature
depth estimation capabilities to test the diagnostic boundaries of 3D-mDOI. Finally, we examine
the computational efficiency of the system, a crucial factor for its practical application, to
ensure our technology is both accurate and viable for extensive clinical deployment. Through
this comprehensive testing methodology, our goal is to establish 3D-mDOI as a reliable and
efficient instrument for 3D reconstruction and, potentially, a solution for low-cost optical biopsy.
2.5.1 Generation of testing samples
To validate the performance of 3D-mDOI, we conduct rigorous tests with both synthetic and
physical tissue phantoms, meticulously designed to replicate tissue properties. These phantoms
serve as test samples for assessing 3D-mDOI's potential use in the 3D reconstruction of tissue
optical properties.
2.5.1.1 Simulated phantom design
In our effort to evaluate the performance of 3D-mDOI, we engineered multiple simulated
tissue phantoms incorporating various pigment-like features (Figure 2.4 and Figure 2.5). These
standardized phantoms, measuring 30x30x10 mm, were segmented into two distinct types of
tissue: the uniform dermis and the embedded pigment-like features. Each tissue type, owing to
its specific composition encompassing blood, water, fat, and melanin concentrations, exhibits
unique optical properties as derived from Jacques et al.'s tissue simulation model [73]. For
the skin pigment simulations, we utilized the melanin concentration as a means to drive the
optical properties. Specifically, we set a melanin volume fraction of 0% for dermis tissue and
20% for the pigment feature. The absorption coefficient (μa) for the dermis and the embedded
features is determined to be 0.001 mm⁻¹ and 0.4879 mm⁻¹ under red light (680 nm),
respectively [73]. Contrarily, the scattering coefficient (μs) remains consistent at 29.411 mm⁻¹
for both the dermis and embedded features, regardless of melanin fraction variations [73]. To
further vary the features with different embedding depths, we designed two distinct inclusion
designs for these embedded features: surface protrusions and subsurface inclusions. The
surface design encompassed semi-spherical features, 2 mm in radius, progressively embedded
at depths of 0.5, 1, and 3 mm, informed by Breslow depth [103]. In contrast, the subsurface
design introduced cylindrical inclusions, with depths spanning from 1 mm to 3 mm and a consistent
2 mm radius, epitomizing 3D-mDOI's capability to reconstruct subsurface objects.
We used simulated phantoms to benchmark both visual and quantitative performance
(Table 2.2) and to emulate the image acquisition phase (Figure 2.2 A) through Monte Carlo
simulation via photon migration [73]. This process mirrored structured illumination by altering
the location of the illumination point, yielding a spectrum of reflectances across the
heterogeneous tissue model. Apart from the incident light location, the parameters
employed for the Monte Carlo simulation remained consistent with those defined for the
3D-photon distribution look-up map, ensuring methodological consistency and data accuracy. The
computed reflectance data informed the 2D nonlinear fitting components of 3D-mDOI.
2.5.1.2 Tissue phantom construction
We develop representative physical phantoms utilizing materials known to emulate human
tissue’s optical coefficients [104]. We used Polydimethylsiloxane (PDMS, SYLGARD 182 silicone
elastomer, DOW, America), owing to its consistent behavior, stability, and tissue-like optical
characteristics. Changes in the scattering (�") were achieved with the addition of TiO2 (TiO2,
CAS#1317-70-0 Sigma-Aldrich, America), while the absorption (�!) was changed using India
Ink(Higgins, America). Adjusting the proportions of TiO2 and India Ink within the PDMS, we
engineered phantoms that replicated the optical coefficients of human tissue and moles [73].
We designed two types of phantoms: Phantom A (Figure 2.6) with surface-level features, and
Phantom B (Figure 2.10) focusing on sub-surface features, each designed to evaluate 3D-mDOI's
prowess in reconstructing optical parameters across depths. Further details on the
features' optical parameters are provided in Table 2.1.
Part 1: Estimated optical parameters in Phantom A

Feat.       |   1  |   2  |   3  |   4  |   5  |   6  |  b
μₐ [cm⁻¹]   | 0.69 | 0.86 | 1.38 | 0.86 | 0.86 | 0.83 | 0.5
μₛ [cm⁻¹]   |  90  |  90  |  97  |  97  |  93  |  53  |  90

Part 2: Estimated optical parameters in Phantom B

Feat.       | Type 1 | Type 2 |  b
μₐ [cm⁻¹]   |  0.5   |  0.2   | 0.2
μₛ [cm⁻¹]   |  100   |  200   | 100
Table 2.1 Estimated optical parameters for two phantoms.
The parameters are computed by measuring the concentration level of the TiO2 and India Ink following
protocols reported in literature [104].
In our approach to craft multi-feature phantoms, we utilized 3D printed molds, ensuring
precision and repeatability. Utilizing polylactic acid as the mold material (Figure 2.3), we
developed a process to effectively mitigate bubble formation. We embedded various solid
features within a 3D-printed lid as our mold structure, and cast a PDMS base beneath it. After
the PDMS base hardened, this lid was detached and the resulting cavities were filled with a
precisely tuned mixture of PDMS, TiO2, and India Ink to achieve a specific optical-coefficient
contrast with the PDMS background. The crafting of Phantom B introduced an extra layer,
where a 1mm bulk PDMS overlay ensured complete encapsulation of the features. To address
the challenge of bubble formation, each PDMS layer underwent vacuum chamber degassing.
While this step significantly reduced the presence of bubbles, we acknowledge the potential
inclusion of microbubbles and minute non-uniformities in the chemical mixture, which might
introduce variability in the optical parameters' readings (Figure 2.10).
Figure 2.3 3D printed mold design and customized PDMS phantom assembly.
(A) A rectangular container with a lid designed for generating PDMS phantoms with optical properties
similar to human dermis tissues. (B) Schematic representation of the lid, both inner circles and
pentagons depict features that have the same dimensions but differ in their protrusion depth. To ensure
the full coverage of the characteristic 'banana' distribution during depth estimation, the region of
analysis (Fig. S6) for a feature should extend at least four times its depth. By establishing concentric
circles with a radius twice the depth of the features, we ensure these features are adequately spaced
apart. The generation of the customized PDMS tissue phantom requires two steps. First, the phantom base area is cured by placing a (C) 3D-printed lid on top of the container. Then, the feature areas
left are filled with customized PDMS material, which possesses distinct optical properties (D). (E) Top
and (F) side views of the PDMS phantom provide a comprehensive overview of its spatial characteristics.
2.5.2 Validation with simulated tissue phantoms
We demonstrate the 3D-mDOI reconstruction quality utilizing digital tissue phantoms
resembling dermatological photo-physical properties, exploring a range of experimental
settings and obtaining an estimate of the quality of the 3D reconstructions. For this purpose, we first
synthesize digital phantoms with a variety of features that mimic the reported optical
parameters for tissue structures [67]. The input for the 3D-mDOI process, 2D reflectance data
at the phantom surface, is estimated using an established Monte Carlo simulation tool [73],
simulating, within reason, the data being acquired in the multisite image acquisition platform
OPTIMAP. We assess the impact of acquisition parameters like sampling frequency and step
size on the depth, diversity, and quality of the reconstructions (Figure 2.4). The chosen
parameters for the simulations are also employed in the OPTIMAP to ascertain the optimal
performance. Subsequently, we visually (Figure 2.5) and quantitatively (Table 2.2) compare the
3D matrix of optical coefficients μₐ and μₛ computed on the simulated reflectance data of this
proposed approach with those from the state-of-the-art finite element method (FEM)
simulations [65].
2.5.2.1 Extensive parametric investigation
The 3D-mDOI reconstruction's precision hinges on the signal-to-noise ratio (SNR) of the
reflectance image, dictated by factors like acquisition parameters, light injection power, and
sampling frequency from varied illumination sources (Figure 2.4). Elevating the incident light's
power, signified by an uptick in simulated photons for every illuminating point, is one approach
to improve the SNR. When reconstructing the deeper regions of a sample, it is crucial to rely on
reflectance data from source-detector pairs positioned further apart. Due to the energy loss or
absorption faced by photons on longer trajectories, increasing the photon count is crucial for
more reliable observations of such regions. Reconstruction quality also benefits from a higher sampling
frequency, which is inversely proportional to the step size between consecutive illumination
points. The increment in sampling frequency ensures that an individual voxel receives input
from a multitude of source-detector pairings, encompassing varied 3D-photon migration paths.
However, a surge in SNR, while enhancing precision, also elongates the acquisition duration.
Therefore, striking the right balance in acquisition parameters is key to the 3D-mDOI method's
efficiency.
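This tradeoff can be made concrete with a small bookkeeping sketch. The 0.5 mm lateral resolution matches the simulated phantoms, while the per-point exposure time is an assumed placeholder rather than an OPTIMAP specification.

```python
RES_MM = 0.5   # lateral resolution of the simulated phantoms [mm/pixel]

def acquisition_cost(rate_per_pixel, fov_mm=30.0, t_point_s=0.1):
    step_mm = RES_MM / rate_per_pixel       # e.g. 1/8 per pixel -> 4 mm steps
    n_points = int(fov_mm / step_mm) ** 2   # illumination grid over the FOV
    return step_mm, n_points, n_points * t_point_s

for rate in (1/16, 1/8, 1/4, 1/2):
    step, n, t = acquisition_cost(rate)
    print(f"step {step:g} mm -> {n:5d} points, ~{t:.0f} s")
```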
Figure 2.4 The effect of photon counts and projected pattern size on the quality of the reconstructed
volumes from 3D-mDOI and FEM simulations, using a numerical phantom.
(A) 3D-mDOI reconstructed volume of a phantom with a feature extending 5 mm beneath the surface
processed under different Signal-to-Noise Ratio (SNR) input conditions. (B) Reference ground truth for the
5mm deep feature. As the photon count increases, the quality of the 3D-mDOI reconstruction (A)
generally remains constant. Higher photon counts translate to better SNR input and result in reduced
noise at the surface. (C) Finite Element Method (FEM) results provide reduced subsurface details, even in
a high photon count input scenario. (D) Phantom with sub-surface features from 2mm to 4mm to test the
effects of different sampling rates (step size) of the sampling pattern, measured in pixels⁻¹. The lateral
resolution of the phantom is 0.5 mm, and the projected sampling rates of 1/2, 1/4, 1/8, and 1/16 (E)
correspond to illumination sampling steps of 1mm, 2mm, 4mm, and 8mm, respectively. For this test,
reflectance is simulated from individual illumination points to preclude interference between
illuminations. An increased sampling rate results in a denser distribution of illumination points on the
sample, providing more source-detector pairs for an improved phantom voxel reconstruction. (F) 3D-mDOI reconstruction quality improves with an increase in sampling rate, with the tradeoff being a longer
acquisition time. The 1/8 sampling rate results in an experimentally reasonable output with a faster
acquisition time. (G) The reconstructions using the FEM approach have a low dependency on the
illumination density.
Employing a gradient of simulated photon counts, we carried out multiple reconstructions
of a consistent 5 mm deep feature across a spectrum of SNRs (Figure 2.4 A). The 3D-mDOI
volumetric reconstructions show a consistent estimate of the 3D shape of the feature, across a
wide photon range from 1.6 to 20.0 × 10⁵ photons, supporting the robustness of our approach.
The position of the reconstructed feature matches the ground truth (Figure 2.4 B). The volume
is predominantly free from distortions and anomalies, though some are observed at its surface.
Contrarily, while the FEM accurately pinpoints the feature's position, its results exhibit notable
distortions and increased anomalies (Figure 2.4 C). Much like the 3D-mDOI, the FEM
reconstruction quality does not markedly improve with an increased photon count. These
observations suggest the presence of a saturation point beyond which intensifying the light's
strength has a diminishing impact on the final outcome's quality.
Continuing our exploration, we assess the effects of different sampling frequencies (Figure
2.4 E), ranging from 1/16 to 1/2 per pixel, on the reconstruction of a synthetic phantom with
sub-surface features (Figure 2.4 D) between 2mm and 4mm in depth. In general, a higher
sampling rate enhances reconstruction accuracy by addressing the undetermined regions of the
geometry and reducing artifacts (Figure 2.4 F). At the sparsest sampling rate of 1/16 per pixel,
3D-mDOI struggles to accurately depict the sub-surface feature, primarily due to an inadequate
number of source-detector pairs intersecting the central feature. When sampling rate
increases to 1/8 and 1/4 per pixel, we observe accurate reconstruction of the feature's position,
albeit with some geometrical distortions and surface artifacts. By the time the sampling rate
reaches 1/2 per pixel, artifacts become negligible, resulting in a commendable reconstruction
outcome. Contrastingly, the FEM methodology (Figure 2.4 G) deviated from this pattern. At
the 1/16 rate, FEM computes a reconstruction closely resembling the target output but placed
at a different location. While FEM reconstructions refine with narrower intervals, the outcomes
frequently exhibit skewed geometry, displaced positions, and exaggerated artifacts. One could infer from these
findings that this specific task is intrinsically arduous for FEM, implying that simple
enhancements in sampling frequencies may not redeem FEM reconstructions for such test
cases.
2.5.2.2 Visual comparison of reconstructions
To visually evaluate our method's ability, 3D-mDOI is applied to retrieve the location and
shape of features at different depths within simulated phantoms (Figure 2.5, A to C). The
testing features are designed to mimic the varying stages of melanoma progression according
to Breslow depth [103], which is the thickness of a melanoma from the skin's surface to its
deepest penetration point. Specifically, in these digital phantoms we simulate three features
that extend from the surface down to depths of 0.5mm, 1mm, and 3mm, respectively. 3D-mDOI's outputs are compared against both FEM results and the ground-truth simulated
phantoms, utilizing the XZ cross-sections as references (Figure 2.5 D). We normalize the 3D matrices
of optical coefficients, thereby enabling a fair comparison despite the differing dynamic ranges of
the results from the two methods. The 3D renderings of 3D-mDOI successfully locate the position
and size of the single features in the central region of the simulated volume (Figure 2.5 E). As
the depth of the features increases, 3D-mDOI accurately reconstructs them as sub-surface
volumes, showcasing an improved accuracy in the absorption coefficient matrix as compared to
the scattering coefficient matrix. In contrast, results from conventional FEM show the
limitations of the approach in these experimental settings, with multiple smaller artifact
features outside the expected ground truth region (Figure 2.5 F).
Figure 2.5 3D-mDOI captures information at various depths in simulated phantoms.
Comparison of reconstructions using 3D-mDOI and FEM results from numerically simulated dermis phantoms with (A) 0.5mm,
(B) 1mm, and (C) 3mm deep features, respectively. (D) XZ cross-section simulations and a 3D rendering of the feature ground
truth, as visual references. (E) 3D-mDOI correctly estimates the relative locations of the simulated features in the center of the x-y plane of the volume, corresponding to the ground truth. (F) The FEM reconstructions present incorrect regions for the features.
(G) A subsurface feature, invisible from the sample surface, is designed by placing the dye pigment between 1mm and 3mm
deep. (H) The 3D-mDOI reconstruction is at the correct centered position below the surface level in the x-y plane, while (I) the
results from FEM analysis are located in multiple incorrect locations.
Additionally, we construct a simulated phantom with only sub-surface features (Figure 2.5
D) to assess the performance of 3D-mDOI in scenarios where the sample surface lacks visible
information. Despite an increase in surface noise, the results show that 3D-mDOI can
reconstruct the hidden feature at the correct location within the simulated data (Figure 2.5 H).
In this scenario, the 3D-mDOI reconstructed structures are larger than the features of the
simulated phantom. This blurring is the result of an apparent light diffusion effect,
attributable to the characteristic shape of the 3D photon distributions, which act as the point
spread functions of the system. Comparatively, the FEM reconstruction recovers multiple
features at incorrect positions (Figure 2.5 I).
2.5.2.3 Quantitative comparisons for the synthetic phantom
We evaluate the quality of 3D-mDOI reconstruction using multiple quantitative, rigorous
measurements (Table 2.2). The simulated phantoms, featuring depths of 0.5mm, 1mm, 3mm,
and 5mm, serve as the ground truth. We compare the performance of reconstructions by 3D-mDOI against FEM across three criteria: intensity-based distance to ground truth, reconstructed
image quality, and segmentation error (Appendix).
For the intensity-based distance, we compute root-mean-square-error (RMSE) to assess
pixel-wise differences, as well as the Bhattacharyya distance to characterize the similarity in
histograms, utilizing as reference the simulated ground truth. Image contrast, a crucial metric
to identify reconstructed image quality, is then examined to evaluate the ability of 3D-mDOI to
distinguish between different structures or elements in the reconstruction.
After these assessments, we perform manual segmentations to analyze the quality of
segmentation post-reconstruction. For this latter criterion, we utilize the following metrics to
quantify the overlap of the segmented features between reconstructions and ground truth:
estimated depth of the reconstructed features, Specificity, Sensitivity, and Dice coefficient.
Estimated depth is a metric used to quantify how closely the reconstructed features match the
ground truth depth values. Specificity measures the ability of the segmentation to correctly
identify true negatives, while Sensitivity assesses the ability to identify true positives. The Dice
coefficient is a measure of the overlap between the segmented features and the ground truth.
We then utilize the Multi-scale Structural Similarity (MSSSIM) [105] to obtain a comprehensive
assessment capturing structural information, luminance, and texture facets of the
reconstructed volume. When comparing multiple reconstructions, the determination of
favorable values for these metrics relies on the context of the evaluation. Smaller
values are considered advantageous for intensity-based distance metrics such as RMSE and
Bhattacharyya Distance, whereas higher values are desirable for metrics concerning contrast,
specificity, sensitivity, Dice coefficient, and MSSSIM. These metrics collectively contribute to a
comprehensive assessment of the accuracy, quality, and fidelity of 3D-mDOI and FEM
reconstructions when compared to the ground truth data.
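For concreteness, a minimal sketch of several of these metrics follows, using only NumPy. MSSSIM is omitted here since it is typically computed with an external implementation, and the histogram binning choice is illustrative; the exact definitions used in this work are given in the Appendix.

```python
import numpy as np

def rmse(recon, truth):
    # Pixel-wise root-mean-square error between reconstruction and truth.
    return np.sqrt(np.mean((recon - truth) ** 2))

def bhattacharyya_distance(recon, truth, bins=64):
    # Distance between the normalized intensity histograms of the two volumes.
    rng = (min(recon.min(), truth.min()), max(recon.max(), truth.max()))
    p, _ = np.histogram(recon, bins=bins, range=rng)
    q, _ = np.histogram(truth, bins=bins, range=rng)
    p, q = p / p.sum(), q / q.sum()
    bc = np.sum(np.sqrt(p * q))          # Bhattacharyya coefficient
    return -np.log(max(bc, 1e-12))

def michelson_contrast(recon):
    # One common contrast definition, used here as an illustrative stand-in.
    return (recon.max() - recon.min()) / (recon.max() + recon.min())

def segmentation_scores(seg, gt):
    # seg, gt: boolean masks of the segmented feature and the ground truth.
    tp = np.sum(seg & gt)
    tn = np.sum(~seg & ~gt)
    fp = np.sum(seg & ~gt)
    fn = np.sum(~seg & gt)
    sensitivity = tp / (tp + fn)         # true-positive rate
    specificity = tn / (tn + fp)         # true-negative rate
    dice = 2 * tp / (2 * tp + fp + fn)   # overlap of segmentation and truth
    return sensitivity, specificity, dice
```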
Across all evaluated metrics (Table 2.2), 3D-mDOI consistently surpasses the gold-standard
FEM, outperforming it by 6-fold on average in the general test cases (Table 2.2, test cases 0.5
to 5mm). When comparing raw reconstructions, 3D-mDOI presents lower RMSE and
Bhattacharyya values, relative to the ground truth, owing to its improved accuracy in recovering
intensity distributions and structural information. Furthermore, results from 3D-mDOI display
heightened image contrast, underscoring its enhanced resolution capabilities within its
reconstructions. In comparing the segmentation of features, both methods accurately
determine the relative ordering of the reconstructed depths. 3D-mDOI's results closely
mirror the actual depth, especially for tests where the depth is less than 1mm. Both methods
register high scores in specificity, a result stemming from the feature’s volume being smaller
than the background. Segmentations of 3D-mDOI have higher sensitivity scores, indicating a
higher proportion of actual feature voxels being identified in the test. This trend is confirmed
by the higher Dice coefficient of 3D-mDOI, showing better alignment of the segmented region to the
expected ground truth. Lastly, the elevated MSSSIM score for 3D-mDOI testifies to its improved
reconstruction quality as perceived by human observers.
Table 2.2 Comprehensive quantitative analysis of synthetic phantom reconstructions.
The reconstruction efficacy of 3D-mDOI and FEM is quantitatively assessed using numerically simulated
dermis phantoms with features at depths of 0.5mm, 1mm, 3mm, and 5mm. This multi-faceted analysis
evaluates the volumetric reconstructions against a predefined ground truth, encompassing metrics such as
pixel-wise distance (Root-mean-square-error and Bhattacharyya Distance), reconstructed image quality
(Contrast) and segmentation accuracy (Feature's reconstructed depth, Specificity, Sensitivity, Dice
coefficient, and Multi-Scale Structural Similarity). Superior scores within each sample’s metric are
accentuated in bold in the table. The results of 3D-mDOI outperform those from FEM across the board.
Both methods, however, exhibit challenges when reconstructing phantoms with subsurface features
ranging from 1mm to 3mm in depth. The advantage of 3D-mDOI, in this case, is maintained, with
particular evidence in the Bhattacharyya distance, contrast, and segmentation metrics.
We undertake additional quantitative comparisons with a subsurface feature ranging
between 1mm and 3mm in depth (Table 2.2, Subsurface test case). The visual interpretation of
the reconstruction of this fully embedded feature indicates the superior accuracy of 3D-mDOI
over FEM (Figure 2.5), supported by further quantitative measurements. In this scenario of a
subsurface feature, 3D-mDOI presents a general 68% decrease in the performance metrics,
when compared to the other test cases with features extending from the surface into the tissue
(Table 2.2, test cases 0.5 to 5mm). This general degradation of results is due to a combination
of reduced amount of signal and a blurring effect due to the shape of the point spread function
of the system. The reduced amount of signal is due to fewer reflected photons diffusing
through the subsurface feature. The blurring distortion of features can be related to the shape
of the photon distribution [68], effectively a 3D spatial probability density function
characterizing photons that travel between a source-detector pair, and to the general
deterioration of signals as a function of depth.
2.5.3 Validation with physical phantoms
We utilize the OPTIMAP to perform 3D-mDOI on physical tissue phantoms containing 3D-molded features with different absorption (μₐ) and scattering (μₛ) coefficients. These
phantoms are crafted with titanium dioxide (TiO2) and India ink embedded in a
polydimethylsiloxane (PDMS) bulk medium with different ratios, in order to change the
absorption and scattering properties. The expected optical parameters for six different
features in the physical tissue phantom (Figure 2.6) are computed based on the ratios of
chemicals added in each feature (Table 2.1, Part 1). We visually and quantitatively assess the
performance of quantifying the relative optical coefficients (μₐ, μₛ) of the phantom features
for both 3D-mDOI and FEM. At the same time, we delve into how the 3D-mDOI-recovered optical
parameters vary with the reconstructive depth (Figure 2.7).
We implement an additional step in 3D-mDOI to understand and parametrize its depth
estimation of features in physical phantoms. To study the inherent constraints of the
OPTIMAP's capacity for depth estimation, we conduct an in-depth analysis using an
experimental uniform phantom (Figure 2.8). This analysis focuses on the relationship between
the Signal-to-Noise Ratio of the reemitted signal and the 2D RTE fitting error, observing how
this error escalates as the detector's position increasingly deviates from the center of
illumination. This result signifies the system’s maximum reconstruction depth with the
OPTIMAP setup.
Building on our prior depth estimation (Figure 2.8 E), we evaluate various region-of-analysis
(ROA) candidates to counterbalance the increased experimental noise observed in the data
from physical phantoms compared to synthetic ones (Figure 2.9). The goal is to include the
necessary data for proper depth estimation while filtering out data having low SNR. According
to the geometry of the photon distribution, the depth at the center of the 3D photon
distribution extends to approximately half of the distance between light source and detector.
The side length of the ROA area is restricted to four times that of the reconstructed depth of
the feature. Given the physical restriction associated with the shape of the photon distribution,
a 4mm-by-4mm ROA area, centered on the illumination point, enables a 1mm depth
reconstruction. A properly sized ROA (Figure 2.10) therefore retains the data needed for
reconstruction while excluding low-SNR pixels, leading to better depth estimation
outcomes for 3D-mDOI.
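This geometric rule reduces to two one-line relations, sketched below under the stated assumptions (probing depth of roughly half the source-detector separation, with the ROA centered on the illumination point).

```python
def roa_side_for_depth(target_depth_mm):
    """Side length of the square ROA needed to probe target_depth_mm.

    Assumes the 'banana' photon path probes to about half the source-detector
    separation; the farthest usable detector then sits 2 * depth from the
    source, giving an ROA side of about 4 * depth.
    """
    return 4.0 * target_depth_mm

def max_depth_for_roa(side_mm):
    return side_mm / 4.0

assert max_depth_for_roa(4.0) == 1.0  # the 4mm-by-4mm -> 1mm example above
```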
2.5.3.1 Optical coefficient validation with physical phantoms
To assess the performance of both 3D-mDOI and FEM, we quantify the relative optical
coefficients (�!, �") of the phantom features (Figure 2.6 A) and visually validate these findings
through 3D renderings. First, we perform corrections for instrument uneven illumination by
calculating the relative optical coefficients ratio for each distinctive feature. To accomplish this,
we normalize the feature data (Figure 2.6 B, red boxes), utilizing the values from the adjacent
backgrounds (Figure 2.6 B, yellow boxes). We then illustrate the comparison of the relative
optical coefficients ratio of the features derived from both the 3D-mDOI and FEM methods
(Figure 2.6 C) to delineate the distribution of results, pinpointing details such as the median,
potential outliers, and any discernible skewness in the data. The results from 3D-mDOI for
relative scattering coefficient (μₛ) ratios generally align with expected feature values, while
the relative absorption (μₐ) ratios are reasonably estimated ~66% of the time, owing
to features 4 and 5, which display deviations from the expected value. The 3D-mDOI method
presents greater variance in the reconstructed relative optical coefficients when compared to
FEM owing to the FEM results being an almost constant value of 1, negating distinctions
between features in the plot.
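A minimal sketch of this background normalization is given below; the bounding-box coordinates are hypothetical placeholders, not the regions actually used in Figure 2.6.

```python
import numpy as np

def relative_ratio(coeff_map, feature_box, background_box):
    # Normalize a feature region by its adjacent background region to cancel
    # the uneven illumination of the instrument.
    feature = coeff_map[feature_box]       # e.g. np.s_[r0:r1, c0:c1]
    background = coeff_map[background_box]
    return feature / np.median(background)

# Hypothetical bounding boxes (placeholders, not the ROIs of Figure 2.6):
feature_box = np.s_[10:20, 10:20]          # red box around a feature
background_box = np.s_[10:20, 22:32]       # neighboring yellow background box
# ratios = relative_ratio(mu_a_map, feature_box, background_box)
```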
To provide a visual insight into these results, we normalize the reconstructed optical
coefficients of the two approaches to a comparable dynamic range. 3D renderings of the 3D-mDOI reconstruction (Figure 2.6 D) clearly enable visualization of the tissue phantom features,
suggesting the method’s resilience against consumer-level instrumentation and its associated
experimental noise. FEM, on the other hand, offers an imprecise representation of the features
in its normalized μₐ results, while its normalized μₛ output remains static at μₛ = 1, lacking
sufficient numerical depth and insight.
Figure 2.6 3D-mDOI reconstructs distinguishable sub-surface features in the physical phantom.
Comprehensive analysis evaluating the physical phantom reconstruction for the methods of 3D-mDOI and
FEM. (A) A physical phantom, crafted with different proportions of titanium dioxide (TiO2) and India ink
in a polydimethylsiloxane (PDMS) medium, contains six unique features with varying absorption (μₐ) and
scattering (μₛ) coefficients. The multiplexed image acquisition platform facilitates systematic scans of the
phantom, employing a Digital Micromirror Device (DMD) to create specific illumination patterns and a
CMOS camera to capture the reemitted light. (B) In the experimental reference intensity image, regions
in red bounding boxes are selected features, which are then normalized by the regions of background in
neighboring yellow bounding boxes. This approach addresses uneven illumination in the reconstructed
data and provides the same pixel sample size for comparison. Box plots show the relative optical coefficient
(μₐ, μₛ) ratios for phantom features, with the distributions of approximately 10⁵ voxel samples for each
feature. (C) These plots provide a clear statistical representation where the central box spans from the
first quartile to the third quartile, bisected by a line representing the median. The whiskers extend to a
maximum of 1.5 times the inter-quartile range, while any data points beyond these whiskers are denoted
as flier points. μₛ results from 3D-mDOI largely coincide with the expected optical coefficient ratios of
the features. However, certain deviations can be observed in the ratio of μₐ, especially for features 4 and
5. Comparably, the FEM results have limited dynamic range, obscuring distinct differences between
feature values. (D) 3D renderings of the phantom showcasing the reconstructed features and highlighting
3D-mDOI's robustness against experimental noise compared to the FEM's more ambiguous renderings
with respect to the ground truth.
While the decrease of SNR presents minimal impact on recognizing features in the 3D-mDOI
reconstruction, it is essential to assess its effect on feature depth estimation and to implement
strategies to mitigate its impact. Inevitably, the reconstruction's fidelity wanes as depth
increases (Figure 2.7). Additionally, the values of μₛ are 100-fold larger than those of μₐ, leading to
considerably larger standard deviations in the fitted values for μₛ. These phenomena,
manifesting in experimental measurements, are attributed to the reduced SNR in the images
and require additional analysis steps to allow for depth estimation of multiple features within
the same phantom.
Figure 2.7 Depth-Dependent Variance in 3D-mDOI recovered Optical Parameters.
Plots representing the influence of depth on the calculated optical parameter values, emphasizing the
effects of diminished photon sampling at deeper sections. Utilizing experimental measurements from a
(A) physical phantom, we compute the ratio of each feature’s optical property to the phantom's base for
correcting the unevenness of the instrumental illumination. Ratios are reported for both relative (B)
absorption (μₐ) and scattering (μₛ) coefficients over a sample of 10⁵ voxels. The center line represents the
average, error bars represent the standard deviation. While the shift in optical parameter value minimally
affects post-processing operations, like feature segmentation, it compromises the precision of 3D-mDOI's
depth estimation. The estimated correlation between the decay in these parameters and the optical
parameter pair values suggests the possibility of correction through advanced fitting with a
depth-incorporating lookup table.
The apparent shift (Figure 2.7 B) in optical values as a function of depth has little impact on
post-processing steps like feature segmentation. However, it hampers the capability of 3D-mDOI for finer feature depth estimation. The shift in optical parameter values with respect to
depth is due to the decreased number of photons being sampled at those sections. We have
assumed that the decay in these parameters is correlated to the values of optical parameter
pairs and can be corrected by further fitting. An additional level of the lookup table,
incorporating depth to calibrate the variation of the optical parameters as a function of depth,
could attenuate this limitation in future work.
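A possible form of such a correction is sketched below; the depth-decay profile is purely illustrative and would, in practice, be calibrated on a uniform phantom.

```python
import numpy as np

# Illustrative decay of the recovered absorption value with depth (assumed
# values, not a calibrated OPTIMAP measurement).
lut_depths_mm = np.array([0.5, 1.0, 2.0, 3.0, 4.0, 5.0])
lut_decay = np.array([1.00, 0.95, 0.85, 0.72, 0.60, 0.50])

def depth_corrected(mu_map, voxel_depths_mm):
    # Rescale each voxel by the inverse of the depth-dependent decay factor.
    gain = 1.0 / np.interp(voxel_depths_mm, lut_depths_mm, lut_decay)
    return mu_map * gain
```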
2.5.3.2 OPTIMAP experimental boundaries of performance
Recognizing that the fidelity of the reconstruction diminishes with increased depth, we
need to determine the practical limits of reconstructive quality using the experimental data
obtained through OPTIMAP. To understand the system limitations of 3D-mDOI’s reconstruction
depth, we compute the fitting error of the Radiative Transfer Equation (RTE) (Figure 2.8, A, B and C)
and its standard deviation (SD) (Figure 2.8, D and E) for a single light beam onto the uniform
physical phantom. A higher fitting error indicates a poorer approximation of the phantom's
optical properties, whereas a larger standard deviation points to the inconsistency or instability
of the fitting. We assess the fitting error to determine the depth at which the reconstruction
remains accurate. Similarly, by evaluating the standard deviation, we ascertain the depth limit
where the reconstruction can be deemed reliable and trustworthy. As the source-detector
distance extends to 10mm, the SNR of the signal drops from 41dB to 11dB, and the reflectance
fitting quality drops sharply. This result signifies that the system’s maximum reconstruction
depth, with the setup utilized in this work, is constrained to approximately half that distance, or
5mm.
The fidelity of the fit is depicted through showcasing the logarithmic absolute fitting error
(Figure 2.8 A) along with its x, y profiles (Figure 2.8, B and C). We observe a diminishing
performance as the source-detection distance approaches 5 mm. Theoretically, the central
depth of 3D photon distributions is about half of the source-detector distance. This trend
indicates a satisfactory fit up to a depth of approximately 2.5 mm. The normalized SD of this
error (Figure 2.8, D and E) offers insight into the reliability of the fit. Notably, the SD error
exhibits a linear uptick until the source-detection distance hits 10 mm, at which point it doubles
(Figure 2.8 E). This abrupt increment in error is attributed to the edge effect of our test
phantom, resulting in an isotropic sampling discontinuity in 3D-mDOI. When the source-detection distance is longer than 10mm, the undetectable reflectance signal at those pixels leads to
reduced SNR and a drop in the SD fitting error. Hence, a practical 10mm limit in the source-detector distance translates to an approximate reconstruction depth cap of 5 mm, with depths
up to 2.5 mm being reconstructed optimally. These derived parameters are subsequently
utilized to determine the optimal depth for enhanced reconstruction.
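The depth-limit argument can be expressed as a short worked computation; the SNR falloff below is a stand-in consistent with the reported endpoints (41 dB near the source, 11 dB at 10 mm), not measured data.

```python
import numpy as np

dist_mm = np.arange(1, 13)                      # source-detector separation
snr_db = np.array([41, 37, 34, 31, 28, 25,      # illustrative falloff only
                   22, 19, 15, 11, 7, 4])

SNR_FLOOR_DB = 11                               # reliability floor from above
d_max = dist_mm[snr_db >= SNR_FLOOR_DB].max()   # -> 10 mm
print(f"max usable separation {d_max} mm -> depth cap ~{d_max / 2:.1f} mm")
```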
We discern a 2D pattern in the fitting error emerging from the central pixel extending along
the x and y axis, a byproduct of the 2D nonlinear fitting procedure inherent in 3D-mDOI. When
the light source distance is less than 0.5 times the mean free path of the photon, the pixels
have larger absolute fitting error (Figure 2.8 A), an inherent mathematical error when solving
the Radiative Transfer Equation in this situation. In 3D-mDOI, we use neighboring reflectance data
to determine the properties of a central pixel. Due to the reliance on this central data, which
has a higher system error, the resulting areas along the x and y axes stemming from this central
pixel exhibit an enhanced relative fitting SD error. This spatial error pattern, emerging in the
shape of a cross, can be expected given the Radiative Transfer Equation used. On a 2D map
(Figure 2.8 D), the relative fitting SD error is distinctly displayed as a cross pattern. Apart from
this, the fitting SD error maintains its consistency across different viewing angles, underscoring
the isotropic nature of 3D-mDOI. The uniformity calibration is an effective solution to ease this
system's inherent errors. By doing so, we capture and retain the cross artifact observed in the
uniform sample (Figure 2.2 E). When assessing samples with optical properties that differ from
the uniform sample, the pronounced cross pattern becomes less evident. This reduction in
artifact visibility is proof of the uniformity calibration's efficacy in representing the general
trend of the RTE fitting error.
Figure 2.8 Estimation of the system imaging depth using the fitting error from the Radiative Transfer Equation (RTE).
We analyze the fitting error for a uniform phantom with a single incident light beam centered at
coordinates (0,0) to understand the physical limitations of 3D-mDOI reconstruction. (A) 2D logarithmic
plot of the absolute error between the experimentally acquired reflectance and the fitted curve,
together with (B) its 1D profile along the Y axis and (C) along the X axis. The pixels close to the light
source (A) have a larger fitting error due to the physical limitation of the RTE model. The pixels distant
from the light source also have a larger fitting error caused by the lack of signal and diffusive
information. (D) The 2D normalized standard deviation (SD) of the fitting error represents the stability
of the nonlinear fitting and shows a cross-shaped artifact in 2D. This artifact is caused by the cross
shape of the sampling pattern from a pixel-wise 2D nonlinear fitting of the sample’s light parameters.
(E) Plot of normalized fitting SD error with respect to the detection distance shows a linear correlation
before the distance reaches 10mm. The error doubles with the distance ranges from 10mm to 11mm,
where the SNR level degrades. The error drops close to zero when the distance is longer than 11mm
because of the absence of a signal. The radial gradient descent color map (E, inset) encodes the angular
information as colors and the distance from the center as decreasing brightness, showing that the fitting
SD error is generally angle invariant.
2.5.3.3 Feature depth estimation with physical phantom
With the understanding of the OPTIMAP experimental boundaries, we apply 3D-mDOI
across varied analysis regions, aiming to achieve an enhanced and more accurate estimation of
feature depth. The example in Figure 2.10 shows two features which are 3D-molded (Figure
2.3) with a round subsurface shape (Table 2.1, Part 2) in a physical tissue phantom and that are
respectively 3 and 5 mm deep. For a comprehensive exploration, we first study two extreme
cases where the depth constraints are set at 1mm and 8mm, under- and over-sampling the depth,
defining the minimum and maximum ROA thresholds in our experiment (Figure 2.9). ROA
regions that deviate significantly from an optimal size produce 3D-mDOI reconstruction with
evident challenges. Specifically, excessively expansive ROAs (Figure 2.9, C Brown) are prone to
undesirable crosstalk from neighboring illumination sources, accompanied by noise infiltration
from aberrant pixels. Conversely, an overly constrained ROA (Figure 2.9, C Purple) might
capture an insufficient portion of the photon distribution's shape, which is imperative for a proper
3D reconstruction. The experimental results are consistent with the assumptions in our model.
For instance, when operating with an 8 mm depth constraint, the reconstructions displayed an
overarching geometrical smoothing of the feature's boundary (Figure 2.9, C Brown). Such
a depth constraint compelled the algorithm to focus primarily on the surface, neglecting the
intricate details in depth. When constrained, instead, to a 1mm depth, the reconstructions
appear flawed, primarily because of the absence of a comprehensive signal perspective,
resulting in pronounced local noise (Figure 2.9, C Purple). In the subsequent phases of our
analysis, optimal ROAs are estimated from experimental settings and utilized for further
analysis.
Figure 2.9 Reconstruction performance with variable depth constraints.
The fidelity of reconstructing the features at various depths is analyzed under two extreme regions of
analysis (ROA) constraints. Using experimental data acquired from (A) a physical tissue phantom featuring
3D-molded features at (B) depths of 3mm (B green triangle) and 5mm (B navy triangle), we demonstrate
the effects of overly constraining and excessively expanding ROA utilized for the reconstruction. The
depth constraints are set at 1mm (B purple circle) and 8mm (B brown circle), under- and over-sampling the
depth, defining the minimum and maximum ROA thresholds for further experiment. These constraints
are applied to the (C) reconstruction of features extending for 3mm and 5mm in depth (C, Ground Truth).
Excessively restricted ROAs do not capture the crucial 'photon-banana' data, causing pronounced local
noise in the reconstruction (C purple box), while oversized ROAs result in crosstalk and noise
contamination, leading to a smoothed feature boundary (C brown box). These findings underline the
significance of selecting an appropriate ROA depth to ensure accurate 3D reconstructions.
Following these boundary cases, we utilize ROAs with intermediate dimensions in 3D-mDOI
to evaluate the inherent depth estimation of features, revealing the relationship between
reconstruction depth and sampled area, a characteristic not mirrored in the standard FEM
(Figure 2.10). We execute four separate experiments, establishing depth constraints of 2, 3, 4,
and 6mm, respectively (Figure 2.10 C). For ROA areas between 4 x 4mm and 12 x 12mm, the
depth constraint ranges from 1 to 3mm, shorter than the true depth of Feature 1, which is
3mm. In these cases, the objects in the reconstructed volume are ill-informed and the
reconstructed objects present an artifact, appearing to extend to the bottom boundary of the
phantom (Figure 2.10 D, Navy, Blue).
Expanding the fitting ROA area increases the pixel count, which consequently enhances the
SNR. Larger ROAs over 12x12mm yield improved 3D-mDOI images with a sampling depth
exceeding 3mm, surpassing the depth of Feature 1 and leading to more accurate depth
estimations (Figure 2.10 D Magenta, Yellow). Besides yielding more accurate reconstructions,
variations in the profiles of the reconstructed features offer an additional method for estimating
a feature's depth.
Reconstructions are categorized into two groups based on whether the computed depth
reaches the phantom’s bottom edge. The transition between these groups suggests Feature 1
has a depth ranging from 3mm to 4mm, while a similar trend is observed for Feature 2 (Figure
2.10 E). Errors in the reconstructions are noticeable in ROA areas smaller than 24mm x 24mm
(Figure 2.10 E, Navy, Blue, Magenta). Beyond this, reconstructions begin to show a finite depth
for Feature 2 indicating a depth range of 4 mm to 6 mm (Figure 2.10 E, Yellow). The study
reveals the potential for accurately estimating object depth by optimizing the ROA area, a
characteristic not exhibited by conventional FEM.
Figure 2.10 Depth estimation using 3D-mDOI and the significance of optimal Region of Analysis
Selection.
The sampling area size significantly influences the accuracy of the 3D-mDOI reconstructions. We acquire
images with a large Field of View of a physical phantom containing surface-level features with optical
properties different from the bulk. We select (A) two distinctive features (Feature 1= 3mm, Feature
2=5mm deep) for this analysis. The fidelity of reconstructing the features at various depths is analyzed
utilizing (B) multiple Regions of Analysis (ROA). These ROAs restrict the surface pixels utilized for analysis,
subsequently defining the reconstruction depth. Based on the diffusion model in this work
(Supplementary Note 1), the depth of 3D-mDOI reconstruction is limited to a quarter of the length of the
ROA side. The depth map (C) shows the relationship of the depth constraints with the Features’ depth.
(D) Analysis results for Feature 1 (depth: 3mm). A shallow depth constraint, shorter than the actual depth
of the feature (D, Navy, and Blue), leads to reconstructed objects that extend to the phantom’s base.
Conversely, when the depth constraint exceeds the feature’s actual depth (in this case 3mm) (D, Magenta,
and Yellow), the depth estimation of the reconstructed object improves. (E) Reconstructions of Feature
2 (depth: 5mm) present a similar trend, where a marked shift in the depth profile is observed as the ROA-induced depth constraint changes from 4mm to 6mm. The figure underscores 3D-mDOI's inherent ability
to discern object depths by strategically adjusting the analysis ROA, a capability absent in the conventional
Finite Element Method.
2.5.4 3D-mDOI Computational Efficiency
To evaluate the computational efficiency of 3D-mDOI compared to FEM, we conducted
partial tests on Feature 2, as illustrated in Figure 2.10, using varied numbers of illumination
points: 60, 400, and 1000. Evaluating the computational memory efficiency of 3D-mDOI reveals
significant advantages in both memory usage and processing speed during data analysis. Unlike
the FEM, where memory consumption scales linearly with both the number of voxels and of
light source-detector pairs, 3D-mDOI’s memory requirements scale linearly only with the
number of light source-detector pairs (Figure 2.11). For example, FEM-based reconstruction of
Feature 2 (Figure 2.10) with 400 illumination points necessitates a substantial 70GB of virtual
memory, while the comparable process in 3D-mDOI requires just 9GB. The traditional FEM
model can easily run into memory overflow on a standard research workstation when
incorporating numerous light source and detector pairs due to the method's extensive memory
requirements. In contrast, 3D-mDOI accommodates large sets of light source and detector
pairs without sacrificing computational efficiency.
In terms of computational speed, 3D-mDOI is marginally outpaced by FEM in the context
of synthetic phantom reconstructions. Specifically, a numerical phantom reconstruction
(Figure 2.4 and Figure 2.5) is accomplished within 200 seconds utilizing an optimized FEM
package, whereas 3D-mDOI requires approximately 300 seconds. This trend is inverted with
physical phantoms: 3D-mDOI reconstructs Feature 2 (Figure 2.10) with 400 illumination points
in approximately 7 minutes, a stark contrast to FEM's 5-hour requirement (Figure 2.11). This
extended time is caused by the higher number of light source-detector pairs in physical
phantom reconstruction, which increases the iterations needed for FEM's accurate fitting,
prolonging its computational time. It should also be noted that 3D-mDOI offers potential for
further optimization. Implementing a lookup table [106] and introducing parallel processing
[107], could notably reduce its reconstruction time.
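A back-of-envelope scaling model implied by these measurements is sketched below; the slopes and intercepts are rough fits to the reported figures and are for illustration only.

```python
# Rough linear scaling model for the reported costs (illustrative fits only).
def mdoi_cost(n_points):
    mem_gb = 9.0                       # ~flat across 60-1000 points
    time_min = 0.03 * n_points         # ~3 min per 100 extra points
    return mem_gb, time_min

def fem_cost(n_points):
    mem_gb = 0.16 * n_points + 5.0     # ~15 GB @ 60 pts, ~70 GB @ 400 pts
    time_min = 0.52 * n_points         # ~52 min per 100 extra points
    return mem_gb, time_min

print(mdoi_cost(400))   # (9.0, 12.0): coarse fit; measured runtime was ~7 min
print(fem_cost(400))    # (69.0, 208.0): coarse fit; measured runtime was ~5 h
print(fem_cost(1000))   # ~165 GB: exceeds the 128 GB workstation limit
```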
Figure 2.11 Comparative Computational Efficiency – memory usage and time - of 3D-mDOI vs. FEM.
We evaluate the computational efficiency of 3D-mDOI and Finite Elements Method (FEM) in
reconstructing Feature 2 (Figure 2.10), utilizing 60, 400, and 1000 illumination points. (A) The
memory usage of 3D-mDOI remains relatively steady at 9GB across all tests, irrespective of the number
of illumination points. In contrast, FEM's memory usage scales with the illumination points, utilizing
15GB and 70GB for 60 and 400 points respectively. Notably, a computer with 128GB of memory failed to
process the FEM reconstruction using 1000 illumination points. (B) Plot of the computational time
shows an increase of 3 minutes for 3D-mDOI and 52 minutes for FEM for every additional 100
illumination points. 3D-mDOI requires just 7 minutes to process Feature 2, while FEM takes 5 hours,
underscoring 3D-mDOI's marked advantage in computational speed. This up to 60-fold speed
enhancement makes 3D-mDOI apt for many medical applications.
2.6 Limitations and future work for 3D-mDOI
The innovative non-contact design of the OPTIMAP system (Figure 2.1) brings a transformative
approach to the acquisition of diffuse optical imaging data of physical phantoms, despite a
decrease of the signal-to-noise ratio (SNR). This reduction in SNR presents a manageable
challenge within the robust framework of RTE nonlinear fitting. The framework navigates
complexities such as the intrinsic non-uniqueness of the inverse problem and the
reconstruction crosstalk between the two optical coefficients. It is noteworthy that the μₐ
values in the phantoms are substantially lower (roughly 100-fold) than the μₛ values, a factor
that could amplify the likelihood of estimation inaccuracies for μₐ. These challenges increase
variability in the optical coefficients of the physical features and produce grid artifacts related
to the illumination pattern when employing pixel-level 2D RTE fitting. A predictive
normalization strategy for the fitting model could address the crosstalk problem. In addition,
postprocessing methods like object segmentation could be applied to alleviate noise or grid
artifacts if more information about the features is known a priori. Simpler, small clusters at the
top layer can be removed by setting a proper visualization threshold.
The OPTIMAP system, with its low-cost and non-contact design, introduces an
innovative approach to optical mapping. This design choice, while facilitating accessibility and
ease of use, presents a nuanced challenge in achieving high-quality estimations at greater
depths. The system's sensitivity to signal-to-noise ratios is a critical aspect of its operation.
Specifically, the re-emitted light's low signal-to-noise intensities, compounded by instrumental
noise, naturally confine the system's effective reconstruction depth to approximately 5mm.
This limitation is not a flaw but rather an area for future enhancement.
The reconstruction reliability subtly diminishes with increasing depth (Figure 2.7). This
phenomenon is attributed to the intrinsic behavior of photon migration, wherein photons
traveling to deeper regions of a sample encounter greater energy loss. Consequently, the
likelihood of capturing re-emitted light from these regions decreases, leading to a shift in
optical values as a function of depth. This effect inherently challenges the 3D-mDOI's capability
to accurately estimate finer feature depths.
However, it is important to recognize that the system's current limitations open avenues for
methodological innovations. For instance, identifying and targeting the optimal region for
analysis could significantly enhance object segmentation, effectively mitigating some of the
outlined challenges. The pursuit of advancements such as depth-based correction methods
presents a promising direction for future research. These improvements hold the potential to
refine the system's accuracy in finer depth estimation, further solidifying the OPTIMAP system's
position as a valuable tool in the field of optical mapping. This perspective underscores the
system's current achievements while highlighting the constructive path toward its technological
evolution.
The reconstruction process offers opportunities for improvement stemming from solving
the Radiative Transfer Equation (RTE) inverse problem. The intrinsic non-uniqueness nature of
the RTE problem leads to reconstruction crosstalk between the absorption coefficient μₐ and
the scattering coefficient μₛ (Figure 2.6 D). Notably, the μₐ values in the tissue are significantly
lower, approximately 100-fold less, than the μₛ values [108]. This physical discrepancy
emphasizes the potential for estimation inaccuracies for μₐ, underscoring the need for
improved solutions. A promising approach to mitigating the crosstalk issue involves the
implementation of a predictive normalization strategy within the fitting model. Such a strategy
could harmonize the interplay between μₐ and μₛ, enhancing the accuracy of estimations. An
additional challenge relates to the system’s reliance on pixel-level fitting that, while
instrumental in extracting subsurface information, concurrently introduces noise and grid
artifacts. These artifacts, though manageable, necessitate proper adjustment of visualization
thresholds to omit simple, small clusters at the sample’s top layer.
The 3D-mDOI approach offers a promising avenue for improving the quality of diffuse photon
information by harmonizing advanced hardware with innovative computational
algorithms. Our multisite imaging platform is designed with two essential features in mind:
cost-effectiveness, achieved by using consumer-grade electronics, and patient comfort, ensured
by limiting data collection to manageable durations. These choices, however, entail a lower
signal-to-noise ratio for the reflectance data in the deeper sections of the testing samples,
limiting the accuracy of the reconstruction depth-wise to approximately 5mm. Improvements
in detection depths for 3D-mDOI are intrinsically tied to higher illumination power of the
projector or to a higher sensitivity of the camera instrument. While these technical
specifications generally correlate with the device costs, it should be noted that recent
advancements in manufacturing have introduced high-sensitivity tools in commonly used
consumer technologies [109]. One key potential upgrade is the transition from LEDs to laser
diodes with narrow emission spectra within the projector assembly. An equally transformative
enhancement would be the adoption of Short-Wave Infrared (SWIR) camera sensors. These
updates would significantly enhance the signal-to-noise ratio, improving both the quality of
captured images and the reliability of feature extraction from deeper tissue layers. In the
context of retaining a budget-friendly hardware architecture, Deep Learning algorithms have
proven valuable in bridging the gap between lower cost detectors and high quality data to
harness valuable scientific insights, for example through denoising [110]. Such algorithms,
trained on large datasets to discern and eliminate noise from the images, augment the SNR,
improving the clarity of image reconstructions. Integration of these Deep Learning techniques
is particularly advantageous in scenarios involving hardware limitations or external noise
sources, extending 3D-mDOI's robustness and applicability in diverse conditions.
2.7 Summary
3D-Multisite Diffused Optical Imaging (3D-mDOI) delivers subsurface tissue imaging by
synergistically integrating a non-invasive, consumer-friendly imaging system with an advanced
analytical method. The Optical Properties Tissue Imaging Multisite Acquisition Platform
(OPTIMAP) integrates a digital micromirror device (DMD) for pinpoint illumination and a CMOS
camera for non-contact detection, increasing the richness of data. Analytically, 3D-mDOI
merges the Radiative Transfer Equation (RTE) with Monte Carlo simulations, balancing model
accuracy with computational efficiency. Its structured illumination approach, coupled with
advanced analysis, effectively leverages photon distribution oversampling to ensure
comprehensive information on each voxel, reducing the likelihood of reconstructive errors.
Rigorous calibration protocols address potential pitfalls such as inconsistent background light,
intrinsic acquisition noise and uneven illumination. Together, these features enable 3D-mDOI to
deliver consistent and reliable 3D analyses of tissue composition.
Our approach outperforms the conventional modeling techniques in the reconstruction of
simulated tissue phantoms, an advancement quantified by both visual and numerical
assessments. The 3D-mDOI achieves up to 40 times greater image contrast, reduces RMSE by
60%, decreases the Bhattacharyya distance of histogram distribution by 97%, and enhances the
reconstruction accuracy of subcutaneous structures by 85%. This heightened performance is
attributable to our approach of treating each pixel as an independent detector, combined with
the utilization of a large set of spatially constrained measurements which are guided by a 3D
probability distribution function. As a result, the reconstructed voxels are informed from
multiple angles and multiple measurements, significantly elevating the level of precision.
Despite the challenges posed by this subsurface feature, especially the reduced photon
reflection, 3D-mDOI consistently outperforms FEM across most evaluation metrics in simulated
tissue phantoms.
The performances of both the conventional and 3D-mDOI methods tend to diminish for test
cases featuring lower photon information, particularly in simulated tissue phantoms with
subsurface features. Certain performance metrics, specifically RMSE and Reconstructed Depth,
indicate a potential underperformance of 3D-mDOI compared to FEM when assessing some
subsurface features (Figure 2.5). The elevated RMSE for 3D-mDOI, for this type of feature,
stems from a mismatch between the depth estimated by 3D-mDOI (3.7-5.8 mm) and the actual
depth range (1-3 mm). While the FEM analysis boasts an RMSE that is 33% lower than that of
3D-mDOI—attributed to its depth estimation (0-4.5mm) aligning more closely with the true
range—it falters in accurate XY positioning, resulting in null values for both the Dice coefficient
and MSSSIM.
The OPTIMAP system, with its innovative non-contact design, brings a transformative
approach to the acquisition of diffuse optical imaging data of physical phantoms, despite a
decrease of the signal-to-noise ratio (SNR). This reduction in SNR presents a manageable
challenge within the robust framework of RTE nonlinear fitting, manifesting for example as increased
reconstruction crosstalk between the two optical coefficients. However, this aspect has been
effectively addressed in the majority of our 3D-mDOI reconstructions, exhibiting robust
performance, accurately reflecting the expected μₛ values and capturing 66% of the expected
μₐ values. Rendered in 3D, the reconstruction successfully highlights tissue phantom features,
showcasing the method's potential despite the variability introduced by experimental noise and
consumer-level instrumentation. In contrast, the FEM yields consistently uniform values,
offering limited detail and failing to distinguish between different features effectively. 3D-mDOI
further delivers an 8-fold improvement in computational memory efficiency and a 43-fold reduction in
computational time when processing physical phantoms.
Our processing pipeline incorporates additional constraints to mitigate the challenges of
reconstruction's fidelity as depth increases in physical phantoms. By imaging a uniform
phantom, we experimentally determine the absolute maximum reconstruction depth of the
OPTIMAP system. Analytically, we constrain the region of analysis to understand the effects on
reconstruction quality. Our results suggest that features extending deeper than the OPTIMAP
sampling will appear as extending to the phantom’s bottom edge. This is an expected result, as
3D-mDOI in this case would not have the information necessary for determining the deepest
point of such a feature. However, this observation also unveils the promising prospect of honing
the depth estimation of biological features through a specific selection of the region of analysis,
an attribute not paralleled in traditional FEM techniques. Refining our methodology to achieve
precise depth estimation of features represents a key direction for our future research.
3D-mDOI emerges as a formidable potential solution for diverse scientific and medical
applications, notably for low-cost optical biopsy and melanoma detection. Achieving critical
benchmarks like sub-millimeter spatial resolution and a reconstruction depth of 5mm, the
system is well poised to quantify the Breslow depth of melanoma, a crucial parameter for
assessing skin cancer severity. In the current healthcare ecosystem, the imperatives extend
beyond diagnostic accuracy alone. There is an escalating need for economically viable imaging
solutions, faster acquisition times, and enhanced patient quality of care. In these domains, 3D-mDOI demonstrates distinct advantages. Its imaging acquisition framework not
only ensures cost-effectiveness but also boasts an intuitive design conducive to seamless
integration with various light-based imaging modalities, such as spectral imaging. As the
system continues to refine its reconstruction accuracy in future iterations, its scope is poised to
expand, potentially catering to the diagnostic requirements of multiple pathologies in a broader
array of tissues beyond the skin, such as dental, breast, and brain.
Chapter 3 Multisite diffused optical imaging network
(mDOI-Net) for Sub-surface 3D Imaging of Tissues
3.1 Introduction
Machine learning techniques, particularly Deep Learning methods, have achieved
proficiency in identifying anomalies within 2D skin structures [77, 80, 82], demonstrating
diagnostic accuracy for melanoma that rivals experienced dermatologists [80]. However,
despite substantial research and literature on melanoma diagnostics, the actual use of ML
systems in clinical settings remains low. This disparity might be caused by two major factors:
the lack of interpretability of Deep Learning networks and a narrow focus on melanoma classification in previous research.
First, the “black box” nature of neural networks causes opacity in their decision-making
processes, which in turn diminishes their effectiveness as diagnostic aids in medical settings.
This opacity becomes particularly problematic when a Deep Learning model classifies a skin
lesion as malignant; clinicians are frequently left without insights into the factors driving this
conclusion. Such opaqueness impedes dermatologists’ ability to understand and trust the
reasoning behind a diagnosis, which is crucial for effective medical practice. Consequently, this
lack of transparency represents a missed opportunity for dermatologists to gain valuable
educational feedback from interpretable network features, which could otherwise significantly
enhance their diagnostic skills over time. This challenge highlights the critical need for
advancements in neural network interpretability within medical diagnostics.
Second, current studies prioritize the pure distinction of melanoma from benign skin lesions
instead of providing auxiliary information for doctors, such as the suspiciousness of a mole or other enhanced information. While diagnosing melanoma is crucial, primary care doctors face
broader challenges. These include making decisions about patient management post-diagnosis,
such as deciding on the necessity of biopsies and planning appropriate follow-up actions, which
current ML tools do not sufficiently support [77]. Thus, the failure of ML tools to address the
real-world, practical needs of primary care is a likely reason for the disparity between research
topic and clinical needs.
The estimation of the tumor depth, or thickness, is a crucial metric in the diagnosis and
staging of melanoma in primary care, yet it has been relatively overlooked in Deep Learning research in this field. Traditional 3D imaging modalities, such as reflectance- or fluorescence-confocal microscopy and optical coherence microscopy, can directly estimate mole depth and generate 3D volumetric results but are infrequently used in routine clinical practice due to financial constraints and limited equipment availability. There have been several attempts to develop cost-effective algorithmic approaches that derive a defocus depth map from a 2D dermoscopic image for reconstructing the 3D surface of a skin lesion in dermatology. One such method [9] involves estimating the blur of a Gaussian point spread function at the edges of a lesion image to obtain this depth map. Subsequently, this depth information is used to reconstruct the 3D surface of the lesion through a fitting optimization (Chapter 1.1.3). While an increase in estimated
depth correlates with an increase in Breslow depth, it is crucial to note that the actual Breslow
depth, typically determined through invasive biopsy, cannot be precisely calculated with this
method. However, incorporating estimated depth into Deep Learning models for skin lesion
classification has significantly improved the results, underscoring the importance of tumor
depth information in melanoma diagnostics. Leveraging Deep Learning to estimate tumor thickness could exemplify how research can align more closely with clinical needs, offering a
valuable tool for melanoma care.
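To illustrate the defocus-based idea behind the cited method [9], the following is a minimal sketch that estimates a per-pixel Gaussian blur scale along the lesion boundary from the ratio of gradient magnitudes at two smoothing scales. This is a hedged illustration, not the cited algorithm: the function name `estimate_edge_sigma`, the parameter values, and the gradient-ratio estimator are assumptions for exposition.

```python
import numpy as np
from scipy import ndimage

def estimate_edge_sigma(gray, mask, sigma1=1.0, sigma2=2.0):
    """Per-pixel blur estimate along the lesion contour.

    Assumes `gray` is a 2D float image and `mask` a binary lesion mask.
    Re-blurring an edge with a known Gaussian changes its gradient magnitude
    in a way that depends on the original blur, so the ratio of gradient
    magnitudes at two scales yields a closed-form blur estimate.
    """
    mask = mask.astype(bool)
    g1 = ndimage.gaussian_gradient_magnitude(gray, sigma1)
    g2 = ndimage.gaussian_gradient_magnitude(gray, sigma2)
    edge = mask ^ ndimage.binary_erosion(mask)      # one-pixel lesion contour
    ratio = g1[edge] / np.maximum(g2[edge], 1e-8)
    # for a step edge blurred by sigma_b:
    #   ratio^2 = (sigma_b^2 + sigma2^2) / (sigma_b^2 + sigma1^2)
    r2 = np.clip(ratio ** 2, 1.0 + 1e-6, None)
    sigma_b2 = (sigma2 ** 2 - r2 * sigma1 ** 2) / (r2 - 1.0)
    return np.sqrt(np.clip(sigma_b2, 0.0, None))    # blur map on the contour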
To address the above challenges, we introduce the Multisite Diffused Optical Imaging
Network (mDOI-Net), an interpretable Deep Learning approach, designed to estimate tissue
lesion depth. Building upon the foundational work of the 3D-mDOI pipeline (Chapter 2), mDOI-Net offers a cost-effective, interpretable, and adaptable solution for estimating 3D tissue structures, bridging the gap between advanced research and practical clinical application.
The mDOI-Net approach builds upon three main components.
Synthetic dataset generation: the network initially trains on a tailored 3D synthetic
dermatology dataset derived from the HAM10000 dataset, to address the challenges of
acquiring clinical Diffuse Optical Imaging (DOI) and corresponding 3D lesion structures.
The process begins by generating numerous 3D synthetic lesion phantoms that replicate
the structural complexity observed in 2D dermoscopy images. Subsequently, Monte
Carlo simulations are employed to generate 2D reflectance images from these 3D
volumes, emulating the DOI process. This approach enables cost-efficient network
training and validation, allowing us to conduct extensive feasibility tests without the
financial and logistical burdens typically associated with collecting and annotating large
clinical datasets.
Physics-modeling: mDOI-Net incorporates physics-based models of steady-state diffuse
optical imaging within its Deep Learning architecture, enhancing the network’s utility
beyond a conventional “black box”. By integrating interpretable stochastic photon
behavior, the network offers insights into the physical interactions underlying the
imaging process. Additionally, its modular design, which organizes the architecture into
sections dedicated to specific subtasks, aids in simplifying debugging or performing corrections to the network when needed. This approach not only simplifies maintenance but
also improves the model's interpretability. This thoughtful design is likely to enhance
confidence and trust among medical practitioners and their patients in its future clinical
usages.
Domain adaptation: the network’s capabilities are further validated through its
adaptability to both an advanced 3D synthetic dermatology dataset and physical
phantom data captured from a low-cost multisite acquisition platform (described in
Chapter 2.2). Refining the model to handle complex datasets that include unpredictable
elements, absent in synthetic data, is critical for expanding the use cases of the
network. Successful validation tests demonstrate the network’s scalability to new data
types, facilitating a smoother transition from synthetic to real-world skin data analytics.
mDOI-Net's innovative approach, harnessing synthetic dataset generation, physics-based
modeling, and domain adaptation, converges to create a more robust model capable of
accurately estimating 3D tissue structures in real usage scenarios. We not only ensure effectiveness in interpretable model training and validation, but also set the stage for a seamless shift from synthetic to real-world skin data analytics. Our model is poised to be more flexible, interpretable, and predictable than current end-to-end, black-box neural network benchmarks, giving it high potential for integration into the primary care setting.
3.2 Inspirations from Neural Volumetric Rendering
Reconstructing an object's intricate three-dimensional (3D) geometry based on its 2D
optical measurements is valuable in diverse fields outside biomedicine, ranging from virtual and
augmented reality to 3D graphics and robotics [111-113]. This remains a challenging task because the partial 2D observations of an object can theoretically align with countless potential 3D reconstructions. The domain of neural rendering offers a proposed solution to this
challenge, merging insights from computer graphics with the innovations of Deep Learning.
Neural networks have the capability to diminish the computational burdens associated with
conventional algorithmic-based computer graphics techniques. When optimized, these
networks are proficient in surpassing the speeds of traditional computer graphics techniques,
especially for complex scenes or when only approximate results are needed [112]. By
embedding computer graphics elements—such as camera models or ray tracing principles—as
regularization factors, network outputs are more physically plausible. This fusion enhances the
interpretability of the network and ensures that generated images align more closely with the
physical laws describing the real world.
3.2.1 Introduction of neural rendering techniques
Several neural rendering techniques with different characteristics have proven successful in
reconstructing the 3D geometry of an object from multiple 2D images, often captured by a camera
from various viewpoints. For instance, 3D-RecGAN [111] utilizes generative adversarial
networks to reconstruct the comprehensive 3D geometry of a specified object from a solitary
arbitrary depth view. 3D-RecGAN, however, requires the complete 3D shape ground truth
during training. PLATONICGAN [114], on the other hand, enables the generation of plausible 3D
shape models without the need for explicit 3D supervision. The network begins by converting a
variety of unstructured 2D images, each depicting different items from the same category, into
a compressed form, known as latent space, through an Encoder. Following this, a Generator
creates a 3D representation of the object's shape. This 3D model is then translated into a 2D
image by a rule-based, non-trainable, rendering layer, which is further evaluated by a 2D
Discriminator. This pipeline allows the network parameters to be updated based on
discriminating 2D images, circumventing the need for annotated 3D datasets. Inverse Graphics
GAN [115] enhances the 2D rendering process in the 3D reconstruction pipeline similar to
PLATONICGAN. The network employs a trainable proxy neural renderer that approximates the
results of a non-differentiable renderer. This enables the network to capture complex
photorealistic rendering patterns employed by the non-differentiable renderer and, as a result,
allows for the precise 3D reconstruction of an object's geometry.
Another interesting research field in neural rendering pertains to 2D-to-2D mappings and is called “view synthesis”: the generation of observations of the object at new viewpoints.
Some of these view synthesis networks integrate 3D structure encoding as an intermediate step
in the process, to enhance the accuracy of the newly generated view angles, enabling realistic
scene rendering from multiple perspectives. One example is DeepVoxels [116], which
introduces a persistent 3D-structured scene representation in its generative neural network for
view synthesis. Notably, the DeepVoxels model achieves this result without the need to access
a real 3D reconstruction of the scene during training. This is a significant advantage as it
reduces the dependency on complex and often hard-to-acquire 3D annotated data. Owing to
those advantages, the backbone structure of the mDOI-Net is primarily inspired by the design
of DeepVoxels.
DeepVoxels requires complex training to understand and infer observed 3D environments from a provided 2D viewing scene (a conceptual sketch follows the list below):
1. A 2D U-Net feature extraction network initiates DeepVoxels’ training process, capturing
2D feature maps from the original 2D views.
2. A non-trainable lifting layer transforms the 2D feature maps into temporary 3D feature
volumes, each corresponding to a single 2D observation. The lifting layer employs the
World-to-View Transformation, where 2D features in the image space (the digital image
coordinate system) are extended to 3D feature volumes in the world space (the
coordinate system of the three-dimensional real world) following the corresponding
geometric rules.
3. A gated recurrent unit (GRU) fusion step integrates the above temporary 3D feature volumes from
multiple observations to a persistent 3D volumetric feature volume, summarizing the
object's essence in world space as learned across the entire dataset.
4. A non-trainable projection layer then maps the persistent 3D volumetric features to the
Canonical View Grid using a perspective transform tailored to the novel viewpoint. This
process standardizes different views of an object into a common coordinate system,
facilitating subsequent rendering steps.
5. An occlusion network subsequently predicts the voxel boundary visibility under the depth
rays within the Canonical View Grid. By compressing the grid along the depth
dimension, a 2D depth map is generated, representing the object's visibility from the
new viewpoint.
6. Finally, a 2D U-Net rendering network takes the occlusion network's output to generate
2D scenes that align with the target views.
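To make these stages concrete, here is a conceptual PyTorch sketch of the non-trainable lifting layer (step 2), the component most relevant to the DOI adaptation that follows. The index buffer `voxel_to_pixel` is an assumed precomputed mapping from each voxel in world space to the image pixel it projects to; the actual DeepVoxels implementation differs in detail, and the fusion, projection, occlusion, and rendering stages would follow analogously.

```python
import torch
import torch.nn as nn

class LiftingLayer(nn.Module):
    """Non-trainable lifting (step 2): scatter 2D features into a 3D volume
    using a precomputed voxel -> image-pixel correspondence."""
    def __init__(self, voxel_to_pixel):        # LongTensor, one entry per voxel
        super().__init__()
        self.register_buffer("v2p", voxel_to_pixel)

    def forward(self, feat2d, d, h, w):        # feat2d: (B, C, H_img, W_img)
        b, c, hi, wi = feat2d.shape
        flat = feat2d.reshape(b, c, hi * wi)   # flatten the image plane
        lifted = flat[:, :, self.v2p]          # gather pixel features per voxel
        return lifted.reshape(b, c, d, h, w)   # temporary 3D feature volume

# toy usage: an 8x16x16 feature volume lifted from a 32x32 feature map
v2p = torch.randint(0, 32 * 32, (8 * 16 * 16,))   # placeholder geometry
volume = LiftingLayer(v2p)(torch.rand(1, 64, 32, 32), 8, 16, 16)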
3.2.2 Adaptation to 3D Reconstruction of DOI
Investigating 3D structures through 2D imaging is crucial in both computer graphics and
biomedical research, particularly for Diffuse Optical Imaging (DOI) [87, 88, 116]. This
exploration is fundamental as it bridges methodologies and applications across these
disciplines. In computer graphics, acquiring multiple perspective images from various camera
angles is akin to the 2D reflectance images obtained at different incident light positions on a
sample's surface in DOI. These practices highlight the parallel objectives in both fields:
constructing a three-dimensional understanding from 2D representations.
In the field of computer graphics, the tasks of 3D reconstruction and view synthesis aim to
generate a 3D volumetric occupancy of the object from 2D images and create new viewpoints
scenes of an object, respectively. These tasks correspond closely with the inverse
reconstruction process in the DOI pipeline, which seeks to determine the three-dimensional
optical properties of a sample, and forward modeling, which predicts tissue reflectance under
varying structural illumination conditions. This synergy underscores a shared goal of
synthesizing and interpreting complex visual data to achieve a three-dimensional insight from
two-dimensional inputs, as shown in Figure 3.1.
Translating neural volumetric rendering techniques to address the DOI inverse problem has demonstrated considerable potential in enhancing model interpretability and mitigating the training challenges posed by the scarcity of annotated data, as discussed in Chapter 1.3.2. Models like PLATONICGAN and DeepVoxels incorporate traditional rendering methods, including ray casting and world-to-image space mapping. These techniques boost the interpretability of the models and facilitate their troubleshooting, offering advantages over more opaque end-to-end Deep Learning architectures, which perform multiple layers of neural network transformation from raw 2D images directly to the 3D surface. Increased interpretability facilitates the inference of 3D structures from 2D images without the need for volumetric ground truth data. This feature is advantageous in contexts where clinical data are sparse, such as in the accurate measurement of pathological Breslow depth.
While the adaptation of neural rendering to DOI’s inverse problem is theoretically sound,
practical challenges arise from the complex nature of tissue-photon interactions, which are
inherently more intricate than ray casting in computer graphics. Addressing these challenges
necessitates the generation of robust synthetic datasets, the modification of network
architecture, and the development of appropriate testing protocols and metrics that represent
photon migration. To this end, we have developed a 3D synthetic dermatology dataset derived
from dermoscopy images, encapsulating the structural details of skin lesions and simulating
their 2D reflectance under structured illumination. Integrating this dataset with the 3D-mDOI
model, which incorporates the physics of photon migration (Chapter 2.3), paves the way for an
interpretable and efficient reconstruction process. We then devise multiple evaluations to
assess the network’s efficacy in overcoming the unique hurdles of diffuse optical imaging and
achieving domain-specific accuracy. The following sections will explore the intricacies of
dataset composition, network architecture, and performance evaluations, elucidating the
comprehensive approach undertaken in this research.
Figure 3.1 Exploring 3D structures via 2D imaging with Deep Learning: indirect reconstruction pipeline
sharing in Computer Graphics and Diffuse Optical Imaging (DOI) research.
The illustration outlines a cohesive workflow shared by DOI with Computer Graphics, detailing the
conversion of 2D images into 3D shapes and then back to 2D representations. Utilizing Deep Learning,
DeepVoxels, a neural volumetric rendering technique [116], creates diverse 2D views without relying on
pre-existing 3D models. Drawing inspiration from DeepVoxels, the mDOI-Net framework reconstructs 3D skin
models from 2D reflectance imagery, reducing the dependency on 3D data in training. The process is twofold: initial 3D reconstruction is achieved through the optimization of 2D imagery alongside relevant
illumination data, such as camera positioning or structured light from a Digital Micromirror Device (DMD)
projector-camera setup. Subsequently, 2D projection techniques synthesize novel visual perspectives or
calculate 2D reflectance images. This convergence emphasizes a core concept common to both domains:
extracting intricate 3D details from straightforward 2D information, thereby deepening insights without
the necessity for 3D data throughout the model's training phase.
3.3 Synthetic lesion datasets
In the domain of Deep Learning (DL), the quality of the dataset substantially influences the
performance of DL architectures. Recent advancements in photon migration and DL research
[87-89, 116] have adopted predominantly a conventional approach to training data, often
employing simulations or fabrications of phantoms with uniform shapes or 3D Gaussian
geometries to represent anomalies. These conventional methods, however, are constrained by
their limited flexibility, leading to the generation of phantoms devoid of complex geometries.
This oversimplification is a significant impediment in the context of skin cancer diagnosis, where
lesions may exhibit complex spatial patterns. The aspect of asymmetry, crucial for malignancy
assessment, is inadequately represented by these uniformly shaped phantoms. It is critical to develop a dataset that is not only diverse but also accurately reflects the real-world complexity of lesions, to facilitate effective translation to clinical applications.
The acquisition of clinical DOI images alongside their corresponding 3D lesion structures
presents significant challenges. To overcome these hurdles, we have devised a simulation
pipeline, shown in Figure 3.2, leveraging clinical dermoscopic imagery, which is vital for training
robust Deep Learning networks. A notable example is the HAM10000 dataset [117], which
comprises 10,000 two-dimensional dermoscopic images of pigmented skin lesions. Our dataset
generation pipeline initiates with the transformation of 2D dermoscopic images into 3D lesion
volumes, creating 3D synthetic lesion phantoms that imbue the 2D lesions with three-dimensional shapes and optical properties, guided by the established growth patterns of
pigmentation. Subsequently, the process entails converting these 3D lesion volumes back into
2D reflectance images, emulating the OPTIMAP acquisition technique discussed in Chapter 2.2.
This step involves Monte Carlo simulations to synthesize 2D reflectance images under
structured illumination, ensuring the synthetic dataset’s comprehensiveness and its accurate
representation of the lesions’ optical properties.
Figure 3.2 Overview of synthetic lesion dataset generation from 2D dermoscopic image.
This pipeline aims at training Deep Learning networks and portrays a simulation process that converts 2D
dermoscopic images into 3D synthetic lesion phantoms and subsequently estimates the 2D reflectance
images. Starting with (A) images from the HAM10000 dataset, the method involves simulating 3D shapes
that mirror the natural growth patterns of pigmented skin lesions and determining the lesion’s optical
parameters by estimating the melanin ratio from the 2D dermoscopic inputs. (B) The 3D synthetic lesion
phantoms are thoroughly detailed in their optical properties of absorption and scattering. Monte Carlo
simulations yield 2D reflectance images that replicate the OPTIMAP acquisition method, conducted under
structured illumination. This enhancement significantly boosts the probability that the (C) synthetic 2D
reflectance dataset is comprehensive and provides an accurate representation of the optical
characteristics of the 3D synthetic lesions. These synthetic 2D reflectance images then serve as inputs for
subsequent network training phases. Meanwhile, the optical characteristics of the 3D synthetic lesions
act as the 3D ground truth, essential for evaluating the performance of Deep Learning models. This
synthetic dataset creation strategy improves the robustness of Deep Learning models by furnishing them
with comprehensive and diverse training data, encapsulating the intricate interaction of shape, texture,
and optical properties typical of skin lesions.
3.3.1 From 2D to 3D: 3D synthetic lesion phantom generation
The generation of a synthetic lesion dataset commences with the creation of a detailed 3D
lesion phantom, derived from a 2D dermatological dataset. This process transitions from 2D
dermoscopic images to 3D synthetic lesions through two primary steps: generating the lesion's
3D morphology and defining its optical properties. Initially, the lesion’s 3D shape is inferred
from the observation of relevant clinical features in the 2D image. Subsequently, the colors and
patterns identified in the 2D dermoscopic image are translated into optical properties in the 3D
model, which represents the concentration and distribution of melanin within the area
observed.
This comprehensive approach results in a tissue phantom that encompasses a 3D synthetic
lesion, closely replicating the physical and optical properties of human tissue affected by a
lesion. This tissue phantom fulfills a dual role. First, it acts as a 3D ground truth for the Deep Learning network’s training process, providing a benchmark for the accuracy and efficacy of the network. Second, it serves as a foundation for generating 2D diffuse reflectance images, which
are essential for network training input, facilitating a more robust and accurate training regime.
Through this integrated approach, the synthetic lesion dataset effectively bridges the gap
between 2D observations and 3D analytical requirements, enhancing the Deep Learning
model’s ability to reconstruct complex 3D dermatological conditions.
3.3.1.1 Exploiting 2D Dermoscopic images for simulating 3D lesions
In traditional DOI, testing samples are often simplified to uniform spheres to facilitate
the process, but this simplicity compromises data diversity and fails to represent the
asymmetrical nature of tumors found in clinical settings. A more effective approach involves
designing testing samples that reflect the complex structural composition of target skin lesions,
as identified in 2D dermoscopy images and subsequently extrapolating these observations into
three-dimensional tissue constructs. The selection of 2D dermoscopy images for this purpose
should emphasize data quality, diversity, and quantity, with a preference for images that
include confirmed Breslow depth measurements to enrich the 3D modeling process.
The HAM10000 dataset [117], an acronym for "Human Against Machine with 10,000
training images," provides an extensive collection of over 10,000 dermoscopic images featuring
pigmented skin lesions. This dataset is particularly valuable because it encompasses a wide
range of dermatological conditions with more than 95% of the images representing the top
seven diagnostic categories in dermatology, including both melanocytic and non-melanocytic
pigment lesions. Sourced from a diverse patient demographic and captured by different
imaging modalities, the images in the HAM10000 dataset have been subjected to strict quality
control measures to ensure their fidelity. While the dataset's extensive size and diversity make
it an invaluable asset for benchmarking the efficacy of Deep Learning algorithms against
dermatological expertise, it lacks information on tumor thickness, a critical parameter for
comprehensive analysis.
By leveraging the detailed surface and shape information present in the HAM10000 two-dimensional images, we can construct a robust foundation for modeling 3D synthetic lesions. This approach not only addresses the uniformity limitations of traditional DOI synthetic samples but also paves the way for more accurate and realistic skin data analysis in Computer-Assisted Detection and Computer-Assisted Diagnosis.
3.3.1.2 Referencing simulation dataset with Breslow depth-verified dermoscopic images
To establish a simulation dataset enriched with Breslow depth information, it is crucial to
utilize data that include verified depth measurements. Integrating 2D dermoscopic images with
verified Breslow depth enhances the accuracy of synthetic 3D lesion phantom modeling. The
HAM10000 dataset, despite its extensive 2D data, does not provide Breslow depth, highlighting
a gap that needs addressing. By integrating dermoscopic images with established Breslow
depth, we pave the way for more accurate synthetic 3D lesion modeling, which in turn validates
the efficacy of our 3D reconstruction techniques.
The International Skin Imaging Collaboration (ISIC) Archive [9, 118] serves as a vital resource
in this context. As the principal publication platform for the HAM10000 dataset, ISIC Archive
offers a broad and publicly available collection of skin images. With over 70,000 images and
more than 250 annotated for melanoma thickness, the ISIC Archive becomes instrumental in
supplementing the Breslow depth dimension. However, it is important to note the variability in
image quality within ISIC, given its community-driven nature, in contrast to the consistently
high-quality standard of HAM10000.
To bridge the information gap in HAM10000, we incorporated in our work a select group of
22 high-quality dermoscopic images from the ISIC Archive (Figure 3.3), each verified for Breslow
depth through biopsy. The data has also been employed to assess prior Breslow depth
estimation algorithms referenced in Chapter 3.1. Although it provides corresponding Breslow
depths with 2D dermoscopic images, there are two notable limitations. First, the Breslow
depth range of 0-1 mm categorizes all samples as in situ melanoma, as discussed in Chapter 1.1.1, leading to a lack of diversity in melanoma samples across different depth categories.
Second, a single depth value corresponds to the entire 2D lesion area, which may not reflect
the depth variability across the lesion. Despite these limitations, the dataset is instrumental in
integrating biopsy-confirmed biological data into the generation of synthetic datasets. A subset
of these ISIC-derived lesions, notable for their verified Breslow depth and limited number, is
employed to evaluate the performance of mDOI-Net. This targeted use ensures efficient
utilization during the validation process of the 3D reconstruction.
Figure 3.3 ISIC archive samples with corresponding biopsy-verified Breslow depth
This figure demonstrates the distribution of biopsy-confirmed Breslow depths within the range of 0 to
1 mm for melanoma samples from the International Skin Imaging Collaboration (ISIC) Archive. There are 22 image samples in total; each is annotated with its respective depth. The integration of the
high-quality dermoscopic images in combination with the depth information serves as a crucial resource
for generating synthetic 3D lesion phantoms and is imperative for the validation of the performance of
3D depth estimation via mDOI-Net.
3.3.1.3 3D simulation shape generation
Melanomas, with their complex 3D morphologies, pose a significant challenge in
morphological modeling from 2D to 3D. The inherent heterogeneity and irregularity of
melanomas demand robust and adaptable modeling techniques for accurate representation.
Recognizing the critical role of precise shape generation in diagnosis and treatment, we have
developed three distinct approaches for this morphological transformation, each increasing in
complexity to cater to different modeling requirements (Figure 3.4).
The initial step in constructing a 3D lesion’s model, common across all methods, involves
identifying the lesion’s location on the skin surface using a 2D dermoscopy image. For the
images in HAM10000 dataset, we employ the clinically validated binary segmentation masks
provided by the dataset to directly locate the lesion. For the images in the ISIC Archive, we
derive binary segmentation masks via image processing techniques. The process begins by
converting the RGB dermoscopic images into the Hue, Saturation, Value (HSV) color model,
focusing on the Value channel, where skin lesions typically appear darker due to higher melanin
concentration. Using Otsu's thresholding [119], we identify candidate lesion regions by finding an optimal threshold that minimizes intra-class variance within the Value channel. Morphological closing is then applied to the labeled lesions to fill in small gaps and connect disjointed areas.
Following this, labeled lesion regions are evaluated based on their area, and those below a size
threshold are discarded. This method effectively creates a binary lesion mask for analysis.
Geometric Ellipsoidal Model is the first approach and simplifies melanoma representation to a
semi-ellipsoid shape, fitting an ellipse to the 2D binary mask and assigning a randomized
depth to the lesion. This method’s primary advantage is its simplicity, offering a clear
equation-based representation that eases the optimization process in the inverse problem.
However, this comes at the cost of potentially neglecting detailed morphological features,
which may impact accurate tumor staging and treatment planning. This model is ideally
suited for scenarios where primary lesion attributes, such as spatial localization, diameter, and depth, are the focus of analysis.
Depth-Stratified Erosion Technique provides a more comprehensive representation by
systematic erosion of the lesion area based on a depth function. This approach generates
the lesion’s structure layer by layer, with each stratum undergoing binary erosion to reflect
its depth, culminating in a 3D binary matrix that captures the melanoma's spatial complexity. This method is advantageous for its ability to mirror the genuine heterogeneity and irregularities typical of melanomas; a minimal sketch of this technique follows the list below.
Spherical Harmonics-based Morphological Rendering is the most complex method and utilizes spherical harmonics to depict the melanoma's morphology [120-122]. This approach maps the lesion's shape onto a spherical surface using orthogonal functions and excels at representing complex shape distortions: after identifying the spherical harmonic coefficients that optimally represent the 2D lesion mask's shape on the surface, it produces a detailed 3D model from a 2D contour. This technique allows for increased variability in the 3D tumor model, as the same 2D surface contour can give rise to multiple 3D shapes. This approach, however, requires considerably more computational resources than the methods previously described.
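As a minimal sketch of the Depth-Stratified Erosion Technique referenced above, the layer count and the per-layer erosion step below are illustrative parameters, not the values used in our pipeline.

```python
import numpy as np
from scipy import ndimage

def depth_stratified_volume(mask2d, n_layers=10, erode_px=2):
    """Stack progressively eroded copies of the 2D mask into a 3D volume."""
    layers, current = [], mask2d.astype(bool)
    for _ in range(n_layers):
        layers.append(current.copy())
        current = ndimage.binary_erosion(current, iterations=erode_px)
    return np.stack(layers, axis=-1)           # (H, W, n_layers) binary matrix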
Figure 3.4 Increasing diversity of 3D Morphological Modeling of Melanomas
We develop a series of techniques for modeling 3D morphological characteristics of melanomas
representing a progression in modeling complexity, from simple geometric representations to
sophisticated morphological renderings. Each technique begins by identifying the lesion with a 2D
segmentation mask from dermoscopic images. (A) The Geometric Ellipsoidal Model simplifies this
segmentation to an ellipsoidal shape that fits the binary mask's contour, ideal for basic shape analysis.
Growing in sophistication, the (B) Depth-Stratified Erosion Technique incrementally erodes the binary
mask as it delves deeper, extending the lesion's 2D heterogeneity into 3D. The most advanced technique,
(C) Spherical Harmonics-based Morphological Rendering, utilizes orthogonal functions to depict complex shapes, capturing detailed distortions in the 3D structure of melanomas. Each method escalates in sophistication to mirror the varying complexity and heterogeneity inherent in simulated melanoma lesions, at the price of increasing computational cost.
For 2D images with annotated Breslow depth, we align the synthetic 3D lesion’s maximum
depth with the Breslow measurement. In the ellipsoidal model, the ellipsoid’s semi-axes are
adjusted to match the Breslow depth. In the erosion technique, the depth is controlled by
modifying the erosion rate. In the spherical harmonics approach, a depth constraint is
incorporated during the optimization of the coefficients. This detailed approach ensures that the
synthetic 3D models accurately reflect the physical depth of the actual melanomas, enhancing
the realism and applicability of the simulation in our analysis and Deep Learning training.
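For the erosion technique, for example, the alignment reduces to choosing how many occupied layers the volume should have; the following one-liner is a minimal sketch under an assumed layer thickness.

```python
def layers_for_breslow(breslow_mm, layer_thickness_mm=0.1):
    # e.g. a 0.8 mm Breslow depth with 0.1 mm layers -> 8 occupied layers
    return max(1, round(breslow_mm / layer_thickness_mm))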
3.3.1.4 3D Optical parameter estimation and assignment
Traditional approaches often involve uniformly assigning optical properties across the entire
lesion, a method that simplifies the model but at the cost of neglecting lesion irregularities. To
preserve the inherent diversity observed in 2D dermatological images, we have developed a
series of techniques that enhance the detail and accuracy of optical parameter assignments in
lesions (Figure 3.5).
Skin Tone Melanin Mapper is a color matching technique that translates skin tone into a
melanin percentage map, which then informs the estimation of the optical properties in the
affected region. Initially, we gather skin color (RGB) images representing a range of skin
tones [123], each correlating with different melanin percentages. We convert the image
into Hue, Saturation and Value (HSV) color space. Leveraging the Value channel from these skin images in correlation with their melanin percentages, we create a precomputed skin-melanin lookup map. This map is utilized to transform the Value channel of each dermoscopy image into a melanin percentage map tailored to the lesion area, guiding the calculation of optical properties based on the equation established by Jacques et al. for tissue simulation [73]. The 2D optical properties determined for the lesion's surface are
then used to compute optical properties at various depths.
Depth-dependent Degradation is another technique we applied in the assignment of apparent optical properties in lesion modeling, drawing inspiration from the Beer-Lambert law. Under the Beer-Lambert law, when light passes through an absorbent medium, its intensity is reduced exponentially as a function of the distance traveled within the medium, following the equation $I = I_0 \exp(-\mu_a \cdot \text{depth})$, with $I$ representing the final intensity, $I_0$ the intensity at the initial position, and $\mu_a$ the absorption coefficient. We apply this principle to model the depth-related parameters of melanoma, which typically starts as a superficial growth and invades deeper into the skin. We hypothesize that as depth increases, the measured absorption coefficient of the lesion will decrease, owing both to the Beer-Lambert law and to the decreasing density of cancer cells. First, while the absolute optical properties of the lesion might not be affected by depth [124], our calculations will show an apparent change in optical properties because the measured signals deteriorate with depth. As a consequence, we will observe an apparent change in the estimated optical coefficients, following the equation $\mu_a' = \mu_a \exp(-\mu_a \cdot d)$, where $\mu_a'$ represents the absorption coefficient of the layer immediately beneath the one with coefficient $\mu_a$, and $d$ is the layer thickness. Second, it is reasonable to hypothesize that deeper layers of a lesion may have a lower density of cancer cells: as cells proliferate, they may be more concentrated near the surface, where they have better access to nutrients and oxygen. Therefore, with the density of cancer cells decreasing with depth, the actual absorption is expected to drop. This model adapts these assumptions not just for absorption but also for scattering, offering a comprehensive depiction of how the empirical measurements of a lesion's optical properties change at varying depths.
3D Perlin Noise is used to introduce a degree of randomness into the phantom generation
process, simulating the unpredictable growth patterns of cancers [125, 126]. Originating
from computer graphics, Perlin noise — a form of gradient noise — is widely recognized for
its ability to produce procedural textures. In a three-dimensional representation, the
resulting patterns of Perlin noise closely mimic the erratic growth observed in tumors,
making it suitable for emulating structures similar to breast lesions and capturing the
complex anatomy of breast tissue. We adjust the scale and frequency of the noise to craft
formations that resemble the asymmetrical appearance of skin lesions. Perlin noise is
generated five times in both the x and y dimensions and once in the z dimension, ensuring
that the noise integrates seamlessly with the previously generated 3D phantom.
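A minimal sketch combining the assignment steps described above follows. It assumes a surface absorption map obtained from the melanin lookup, applies the layer-wise Beer-Lambert-style degradation, and optionally modulates the result with a 3D Perlin-noise texture; the `perlin3d` helper (e.g., from a gradient-noise library) and all parameter values are assumptions for illustration.

```python
import numpy as np

def assign_absorption(mu_a0, lesion3d, dz_cm=0.05, perlin3d=None):
    """mu_a0: (H, W) surface absorption map in 1/cm from the melanin lookup;
    lesion3d: (H, W, D) binary lesion volume; dz_cm: assumed layer thickness."""
    H, W, D = lesion3d.shape
    vol = np.zeros((H, W, D))
    layer = mu_a0.astype(float).copy()
    for z in range(D):
        vol[..., z] = layer * lesion3d[..., z]
        # apparent coefficient of the next layer: mu_a' = mu_a * exp(-mu_a * dz)
        layer = layer * np.exp(-layer * dz_cm)
    if perlin3d is not None:                   # optional procedural texture
        vol *= 1.0 + 0.2 * perlin3d((H, W, D)) # roughly +/-20% modulation
    return vol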
Figure 3.5 Increasing diversity of 3D optical parameter assignment
This figure illustrates the different stages of complexity in simulating optical properties of melanoma
lesions. (A) The Skin Tone Melanin Mapper translates 2D dermoscopic imagery into precise lesion melanin
maps using precomputed skin tone-melanin lookup maps, guiding accurate optical property estimation at
the lesion's surface. (B) Depth-Dependent Degradation, inspired by the Beer-Lambert principle, models
the reduced absorption coefficient and scattering with increasing depth, reflecting the lesion's
progression with depth. (C) 3D Perlin Noise introduces procedural randomness to replicate the
unpredictable morphology of cancerous growth, ensuring our 3D phantoms embody the heterogeneous
anatomy observed in actual melanoma lesions. The incremental increase in methodological
sophistication ensures that our simulation captures the nuanced variation and complexity of melanoma lesions, mirroring the dynamic and multifaceted nature of their progression in the assigned optical parameters.
3.3.1.5 Overview of diverse synthetic lesion datasets
Based on the previous techniques, we designed a series of simulations that introduce various levels of complexity to the generation of 3D synthetic lesion phantoms. These simulations are tailored to meet the diverse evaluation needs outlined in Chapter 3.6.4, aiming to test the
network across a wide range of scenarios. The complexity of these simulations is achieved by
employing different rules for assigning 3D shapes when generating synthetic lesion phantoms,
enhancing the dataset’s versatility, and enabling comprehensive network testing against a
broad spectrum of scenarios. Concurrently, it permits the application of different dataset
versions to assess the unsupervised network's domain adaptation capabilities.
To enhance the generalization capabilities, we have developed two main types of 3D synthetic lesion phantoms: synthetic derm cones and uniform cylinders. Figure 3.6 and Table 3.1 (columns 1 to 3) feature the three Synthetic Derm Cone types: Standard, Constrained (-), and Enhanced (+), each depicting varying degrees of lesion complexity. The
simulation of all phantoms in this series initiates with lesion localization from 2D dermoscopic
images, then converts the lesion’s asymmetry in the image into 3D cone-like lesion structures.
The standard Synthetic Derm Cone, the first implemented in this series, utilizes the images in the HAM10000 dataset. Its generation pipeline employs the Depth-Stratified Erosion Technique for 3D shaping and further adds the depth-related optical property degradation.
The Constrained version modifies this approach and utilizes the ISIC Archive images with
known Breslow depths to tailor the 3D lesion depth.
The Enhanced version progresses the generation pipeline further with Spherical Harmonics-based Morphological Rendering and 3D Perlin noise to achieve more realistic lesion textures and complexities. Subsequently, 2D reflectance images for these synthetic
phantoms are generated through a high-speed Monte Carlo simulation process.
The Uniform Cylinder series in Figure 3.7 and Table 3.1 (columns 4 to 5) complements the synthetic derm cone series by simplifying the lesion model to match the physical phantom in Table 2.1.
This series has two types of phantoms:
The OPTIMAP version of the dataset facilitates physical reflectance measurements and offers a verified 3D ground truth for the physical phantom. Given the challenge of producing complex shapes with heterogeneous optical properties, we utilize a cylindrical object with uniform optical coefficients to represent the lesion.
The Synthetic version mathematically replicates the physical phantom, utilizing a Geometric Ellipsoidal Model to ensure both the consistency of the 3D shape and the homogeneity of the parameters.
We further employ the OPTIMAP system (Chapter 2.2.2) for physical 2D reflectance measurements and Monte Carlo simulations for image estimation, respectively. Notably, the 2D reflectance images obtained by the OPTIMAP system inherently include experimental noise, adding a layer of variability absent from the Synthetic Uniform Cylinder data.
| Type of 3D synthetic lesion phantom | Synthetic Derm Cone Standard | Synthetic Derm Cone Constrained (-) | Synthetic Derm Cone Enhanced (+) | Synthetic Uniform Cylinder | OPTIMAP Uniform Cylinder |
|---|---|---|---|---|---|
| 2D dataset expansion | HAM10000 | ISIC Archive | HAM10000 | NA | NA |
| 3D shape assignment | Depth-Stratified Erosion Technique | Depth-Stratified Erosion Technique + Breslow depth constraints | Spherical Harmonics-based Morphological Rendering | Geometric Ellipsoidal Model | Physical phantom design |
| Parameter assignment | Depth-dependent degradation | Depth-dependent degradation | 3D Perlin noise | Uniform | Uniform |
| Training size | 2000 | NA | 200 | 500 | NA |
| Testing size | 20 | 22 | 20 | 20 | 2 |
| Usage | Network selection & network training | Network selection & domain adaptation | Domain adaptation | Network training & domain adaptation | Domain adaptation |
Table 3.1 Overview of 3D synthetic lesion phantom series for enhanced generalization
This table delineates two distinct series of 3D synthetic lesion phantoms: the Synthetic Derm Cone and
the Uniform Cylinder. Each series is crafted from specific clinical image datasets, 3D shape assignment
methods, and parameter assignment strategies, facilitating network selection, network training and
domain adaptation. Columns 1 to 3 elaborate on the Synthetic Derm Cone series variants: Standard,
Constrained (-), and Enhanced (+). These variants are developed using the 2D clinical image datasets HAM10000 and ISIC Archive and incorporate advanced techniques such as Depth-Stratified Erosion
and Spherical Harmonics-based Morphological Rendering for 3D modeling. Columns 4 and 5 introduce
the Uniform Cylinder series. The Synthetic Uniform Cylinder uses Geometric Ellipsoidal Models to achieve
consistent shapes and parameters. The OPTIMAP Uniform Cylinder provides a physical phantom's 3D
ground truth with OPTIMAP-obtained 2D reflectance images that include inherent experimental noise.
The table also specifies the usage context for each phantom type, addressing network selection and
domain adaptation, and lists the dataset sizes available for training and testing purposes.
Figure 3.6 Comparison of Synthetic Derm Cone series—Standard, constrained (-), and Enhanced (+)
This figure presents three 3D synthetic lesion phantoms generated from the same 2D dermoscopic image
utilizing the Synthetic Derm Cone model, demonstrating varying degrees of lesion complexity. (A) The left
column shows the initial dermoscopic image obtained from existing clinical image databases, the labeled
lesion candidate, and the computed binary lesion mask used for localization. (B) The middle and (C) right
columns display $\mu_a$ (the absorption coefficient) at the first XY slice and the 3D rendering of $\mu_a$ for the
Standard, Constrained (-), and Enhanced (+) Synthetic Derm Cone phantoms. The standard version uses
HAM10000 dataset images and the Depth-Stratified Erosion Technique to shape the lesion in 3D, while
the Constrained version incorporates Breslow depth from ISIC Archive images to refine the lesion's depth.
The Enhanced version employs Spherical Harmonics-based Morphological Rendering and 3D Perlin noise
to add realistic textures. These phantoms are instrumental in refining Deep Learning algorithms and
evaluating the network’s domain adaptation capability.
It is important to emphasize that while our synthetic lesion generation pipeline effectively
simulates the visual aspects of tumors, it does not replicate the exact biological and physical
characteristics inherent to actual tumors. Our primary aim remains to develop a 3D lesion
model that reflects the properties typically associated with melanoma. This 3D lesion modeling
process is not intended for data inference or actual lesion modeling. Instead, its primary role is
to serve as a benchmark — the ground truth — against which we can assess our network and
validate its performance.
Figure 3.7 Synthetic and physical uniform cylinder lesion phantom
3D representations of uniform cylinder lesion phantoms from both Synthetic and OPTIMAP datasets. (A)
The first column displays the absorption coefficient ($\mu_a$) at the initial XY plane: both the synthetic lesion and the OPTIMAP-imaged physical phantom are depicted as cylinders with a uniform, constant value for each of the optical coefficients. (B) The second column provides a 3D rendering of $\mu_a$, demonstrating the
uniform shape and parameter consistency for both the synthetic and expected physical phantoms. While
actual optical coefficients for the physical phantom may vary due to bubble formation during its creation,
this detail is omitted in our analysis. This juxtaposition highlights the utility of employing simplified
geometric models to replicate the complexities found in physical lesions.
The culmination of the above techniques results in a dynamic 3D tissue phantom structure.
This structure serves as the foundational ground truth for subsequent network training,
providing a critical tool for testing the network’s capabilities and its ability to adapt to different
domains. This approach not only supports the development of our network but also
contributes significantly to its potential application in real-world clinical settings, where diverse
and complex data scenarios are commonplace.
3.3.2 From 3D to 2D: 2D reflectance images estimation
Monte Carlo simulations are a cornerstone for modeling light-tissue interactions in Diffuse
Optical Imaging, as they effectively capture the complex and stochastic photon migration
patterns within biological tissues. By applying Monte Carlo simulation tools [73] to the
simulated 3D lesion volumes, we generate 2D reflectance images as they would appear under
structured illumination, emulating the OPTIMAP acquisition technique (Chapter 2.2.2). These
generated images then serve as inputs for the inverse reconstruction network, facilitating the
translation from 3D models to 2D imaging data.
However, employing Monte Carlo simulations to generate large-scale datasets presents two significant challenges: considerable computational time and inconsistencies across programming languages.
Monte Carlo simulations are computationally intensive. For instance, generating a single
reflectance image with one illumination setting using conventional tools can take
upwards of 5 minutes. Extrapolating this timing, producing a dataset of a thousand
phantoms, each with 21 different illumination settings, would require an impractical
duration of approximately 2.5 months.
The coding language inconsistencies further complicate the process. In our simulation
setup, illumination settings are configured in MATLAB, while the well-developed Monte
Carlo simulation repositories [73] are written in C, and Deep Learning tasks are
executed in Python [127]. This diversity necessitates complex integrations and results
in additional time overhead for ensuring proper communication between the different
programming environments.
To address these hurdles, we have undertaken significant code optimization and integration
efforts. Both the illumination settings and the Monte Carlo simulation code have been
rewritten in Python, and the simulation has been enhanced with CUDA acceleration using
the Numba Python package. This strategic approach drastically reduces the computational time,
condensing what would have required months of computation into approximately 15 hours.
This unification simplifies the workflow, enhancing compatibility across the different platforms
and substantially bridging the gap between simulation and Deep Learning environments. This
optimization not only accelerates the dataset production process but also improves the
efficiency and feasibility of integrating Monte Carlo simulations into our imaging and analysis
pipeline.
3.3.2.1 CUDA-Accelerated Monte Carlo Simulations Using Numba
For rapid generation of 2D reflectance images for the datasets, we leverage the capabilities
of the Numba CUDA package, significantly expediting the photon diffusion Monte Carlo
simulations that are based on 3D tissue sample data. Numba is a just-in-time (JIT) compiler that
is particularly effective at converting Python and NumPy code into optimized machine or GPU
instructions. This conversion substantially increases the execution speed of numerical
operations within Python, eliminating the need to resort to traditionally faster programming
languages like C for computational tasks.
In our application, we employ the “cuda.jit” decorator from Numba to enable the
compilation of Python routines for Monte Carlo simulations on NVIDIA CUDA-equipped GPUs.
A decorator can be thought of as a bridge that adds functionality to an existing Python function without altering its core definition. In this specific application, the decorator wraps and extends the behavior of the Python Monte Carlo simulation function without permanently
modifying it. This modification allows for an efficient emulation of photon trajectories within a
3D lesion phantom. We adapted a 3D Monte Carlo simulation tool [73], integrating it with
Python using the Numba decorator to achieve these improvements.
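To illustrate the pattern (and not our exact simulator), the following is a minimal, runnable sketch of a `cuda.jit` photon-migration kernel for a homogeneous semi-infinite medium with isotropic scattering. Real Monte Carlo codes such as [73] additionally model anisotropic (Henyey-Greenstein) scattering, Fresnel boundaries, and roulette termination; all parameter values below are illustrative assumptions.

```python
import math
import numpy as np
from numba import cuda
from numba.cuda.random import (create_xoroshiro128p_states,
                               xoroshiro128p_uniform_float32)

@cuda.jit
def photon_kernel(rng_states, mu_a, mu_s, pix_cm, refl):
    i = cuda.grid(1)                          # one thread per photon
    if i >= rng_states.shape[0]:
        return
    x = 0.0; y = 0.0; z = 0.0                 # launch at the surface origin
    ux = 0.0; uy = 0.0; uz = 1.0              # pointing straight into the tissue
    w = 1.0                                   # photon weight
    mu_t = mu_a + mu_s
    while w > 1e-4:                           # (roulette omitted for brevity)
        # sample a free path length from the exponential distribution
        s = -math.log(xoroshiro128p_uniform_float32(rng_states, i) + 1e-12) / mu_t
        x += ux * s; y += uy * s; z += uz * s
        if z < 0.0:                           # photon escaped the top surface
            px = int(x / pix_cm) + refl.shape[0] // 2
            py = int(y / pix_cm) + refl.shape[1] // 2
            if 0 <= px < refl.shape[0] and 0 <= py < refl.shape[1]:
                cuda.atomic.add(refl, (px, py), w)
            return
        w *= mu_s / mu_t                      # deposit the absorbed fraction
        # isotropic scattering: new direction uniform on the unit sphere
        cos_t = 2.0 * xoroshiro128p_uniform_float32(rng_states, i) - 1.0
        sin_t = math.sqrt(1.0 - cos_t * cos_t)
        phi = 2.0 * math.pi * xoroshiro128p_uniform_float32(rng_states, i)
        ux = sin_t * math.cos(phi); uy = sin_t * math.sin(phi); uz = cos_t

n_photons = 1000
rng = create_xoroshiro128p_states(n_photons, seed=1)
refl = cuda.to_device(np.zeros((64, 64)))     # 2D diffuse reflectance grid
photon_kernel[(n_photons + 127) // 128, 128](rng, 0.1, 10.0, 0.05, refl)
reflectance_map = refl.copy_to_host()         # e.g. mu_a=0.1/cm, mu_s=10/cm
```

The decorator compiles the Python function body to GPU machine code on first launch, which is what yields the speedups reported below without rewriting the simulator in C.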
A comparative assessment of computational times across four different computational
approaches—pure Python, Python with multiprocessing, Numba CPU, and Numba CUDA—
demonstrates significant efficiency improvements (Table 3.2) with the same Monte Carlo
simulation. The simulation models the illumination of a uniform dermis phantom with a beam
composed of 1000 photons, directed at the phantom's center. The resulting 2D reflectance
image, as shown in the Figure 3.8, features a distinct pattern: a diffused halo at the center that
fades progressively towards the image's edges. The total reflectance ratio, denoting the
amount of light reflected from the surface relative to the incident light, consistently
approximates 0.92 across all settings, as shown in the Table 3.2. The consistency of the total
reflectance ratio across multiple approaches confirms that significant efficiency gains achieved
with the Numba CUDA implementation do not compromise the functional accuracy of Monte
Carlo simulations.
Remarkably, the Numba CUDA implementation finishes in 0.033 minutes, roughly 100 times faster than the multiprocessing Python version, which takes 3.12 minutes. This marked improvement in processing speed greatly enhances our ability to generate datasets swiftly, thereby streamlining the entire simulation and analysis pipeline and facilitating faster advancements in network training.
Table 3.2 Comparison of computational times and total reflectance ratios across different
computational approaches
This table presents a comparative assessment of computational times for a Monte Carlo simulation using
four different methods: pure Python, Python with multiprocessing, Numba CPU, and Numba CUDA. Each
method was tested on a Monte Carlo simulation of 1000 photons illuminating a uniform dermis phantom,
and the times recorded demonstrate a substantial increase in efficiency with the Numba CUDA method.
Furthermore, the total reflectance ratio, indicative of the amount of light reflected from the surface
compared to the incident light, remains relatively stable. These results underscore the significant
efficiency gains achievable with Numba CUDA, enhancing dataset generation and accelerating the
simulation-analysis pipeline for neural network training.
Figure 3.8 Visualization of 2D reflectance patterns generated with Monte Carlo across different
computational approaches
The figure illustrates the uniformity of 2D reflectance patterns generated under four different
computational frameworks: pure Python, Python with multiprocessing, Numba CPU, and Numba CUDA.
We display the 2D reflectance results on a logarithmic scale, resulting in negative values as reflectance is
below 1. Each image represents the distribution of reflected photons from a uniform dermis phantom
irradiated with a beam of 1000 photons, color coded by the number of photons. Independently from the
computational method employed, each reflectance image exhibits a central diffused halo, indicative of
photon dispersion, which diminishes in intensity towards the periphery of the frame.
3.3.2.2 Structural illumination setting for 2D reflectance image
We have transitioned the illumination setting code from MATLAB to Python and refined this
processing pipeline to enhance the effectiveness of the reflectance simulation. Before
generating the 2D reflectance estimates from the HAM10000 images, several preprocessing
steps are applied to the synthetic 3D lesion phantom to optimize the quality and computational
efficiency of the resultant 2D images. The synthetic 3D lesion phantom is resized into a 3D
matrix of dimensions 45x60x10 pixels, with each pixel representing a resolution of 0.5mm. This
specific configuration strikes a balance between computational efficiency and the spatial
resolution of the 3D phantom.
Lesions are then cropped to a volume of 15x30x10 pixels and repositioned to
the central region of the phantom. This centralization ensures optimal illumination quality by
enhancing the effective capture of photons regardless of the illumination angle or detector
position. These adjustments, in clinical settings, can be physically applied by adjusting the
camera's focal length and ensuring the area of interest is centered.
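A minimal sketch of this preprocessing is shown below, assuming the lesion occupies the nonzero voxels of the input volume; the interpolation order and the clamping logic are illustrative choices rather than our exact implementation.

```python
import numpy as np
from scipy import ndimage

def preprocess_phantom(vol):
    """vol: 3D lesion phantom (nonzero voxels mark the lesion)."""
    target = (45, 60, 10)                      # 0.5 mm per voxel
    crop = (15, 30, 10)
    factors = [t / s for t, s in zip(target, vol.shape)]
    vol = ndimage.zoom(vol, factors, order=1)  # resample to the target grid
    # crop a lesion-centered box, clamped to the volume bounds
    c = [int(round(v)) for v in ndimage.center_of_mass(vol > 0)]
    s0, s1, s2 = [min(max(ci - cs // 2, 0), t - cs)
                  for ci, cs, t in zip(c, crop, target)]
    box = vol[s0:s0 + 15, s1:s1 + 30, s2:s2 + 10]
    out = np.zeros(target)                     # empty phantom volume
    out[15:30, 15:45, 0:10] = box              # recenter the lesion
    return out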
To simulate the actual OPTIMAP system, we establish appropriate illumination conditions
using CUDA-accelerated high-speed Monte Carlo simulations. This involves simulating a grid
pattern of 21 point-like illuminating beams. These beams are spaced 4 pixels apart (2mm) and
positioned at least 8mm from the phantom’s corner to ensure a maximal intended depth of
sampling within the tissue of 4mm. By structurally shifting the illumination pattern's position,
different sample areas are illuminated. Here we do not simulate the entire illumination pattern
at once but rather each single beam sequentially to avoid spatial overlap and unnecessarily
complex Monte Carlo illumination settings. These 2D reflectance images emulate the data that
would be captured by a camera lens during non-contact sample collection, serving as the
primary input for the mDOI-Net network.
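The beam grid itself can be sketched as below; the 3x7 layout is an assumption used only to reach 21 beams, while the spacing and margin follow the values stated above (4 voxels, or 2 mm, between beams; 16 voxels, or 8 mm, from the corner).

```python
import numpy as np

def beam_positions(rows=3, cols=7, spacing_px=4, margin_px=16):
    """Grid of rows*cols point-beam positions, as voxel indices."""
    ys, xs = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    return np.stack([ys.ravel(), xs.ravel()], axis=1) * spacing_px + margin_px

# one Monte Carlo run per beam avoids spatial overlap between beams
for row, col in beam_positions():
    pass  # launch the CUDA-accelerated simulation with the beam at (row, col)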
3.4 Multisite diffused optical imaging network (mDOI-Net)
We propose mDOI-Net (Figure 3.9), an advanced Deep Learning framework merging the
flexibility of Deep Learning with the robust 3D-mDOI pipeline (Chapter 2). Inspired by the view synthesis method known as DeepVoxels (Chapter 3.2), which
blends rule-based rendering into neural networks, mDOI-Net employs a two-pronged neural
network:
Reconstruction module, which focuses on inverse reconstruction to decipher 2D optical
information from DOI reflectance images and convert it to 3D data.
Projection module, which handles projection, translating the 3D model back into a 2D
format to simulate DOI measurements.
This versatile design allows each module segment to operate independently or in
conjunction, addressing a wide array of challenges in Diffuse Optical Imaging.
By cascading the two neural networks, mDOI-Net facilitates 3D reconstruction during training without the need for 3D ground truth data. In Figure 3.9, the mDOI-Net pipeline initiates with the reconstruction network, which discerns the 3D optical properties from the DOI reflectance measurements obtained from test samples. These deduced properties are then leveraged by the Projection module to simulate new reflectance data from the estimated 3D optical properties. This step essentially predicts how the surface observation of this virtual 3D sample would appear, based upon an estimation of the interaction of light with tissue given the reconstructed 3D optical properties.
The efficacy of this two-pronged model is not contingent upon the availability of large-scale 3D tissue models, thus eliminating a significant obstacle encountered in practical applications. Through a methodical process of iterative comparisons between the estimated synthetic reflectance and actual measurements, the networks engage in continuous self-optimization, demonstrating a capacity for domain adaptation. This dynamic refinement mechanism significantly enhances both the accuracy and speed of DOI inverse reconstructions, offering substantial improvements over traditional methodologies.
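The cascade itself reduces to a small amount of glue code. The PyTorch sketch below shows the idea under the assumption that both modules are standard `nn.Module`s; class and function names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MDOINet(nn.Module):
    """Cascade: 2D reflectance -> 3D optical volume -> synthetic 2D reflectance."""
    def __init__(self, reconstruction: nn.Module, projection: nn.Module):
        super().__init__()
        self.reconstruction = reconstruction  # inverse problem (Chapter 3.4.2)
        self.projection = projection          # forward problem (Chapter 3.4.3)

    def forward(self, reflectance_2d):
        volume_3d = self.reconstruction(reflectance_2d)
        synthetic_2d = self.projection(volume_3d)
        return volume_3d, synthetic_2d

def self_consistency_loss(net, captured_2d):
    """Training needs only 2D data: compare synthetic to captured reflectance."""
    _, synthetic_2d = net(captured_2d)
    return F.l1_loss(synthetic_2d, captured_2d)
```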
Figure 3.9 Schematic representation of the mDOI-Net framework, an advanced Deep Learning model
for Diffuse Optical Imaging (DOI).
We capture 2D reflectance images (A) using a Digital Micromirror Device (DMD) projector and camera
setup. The input reflectance images are then processed by the mDOI-Net (B), which comprises two main
neural network modules: the Reconstruction module and the Projection module. The Reconstruction
module converts 2D DOI reflectance images into a reconstructed 3D lesion matrix. This 3D output is
subsequently transformed back into 2D reflectance images by the Projection module. These computed
2D reflectance images (D) are compared with the captured 2D reflectance image for loss computation and
further updating of the network parameters. This framework enhances DOI reconstruction by allowing
iterative comparisons and adjustments without the need for 3D ground truth data, significantly improving
accuracy and speed in DOI reconstructions when 3D clinical data are limited.
3.4.1 Limitations of 3D-mDOI and motivation for a novel approach
This thesis work focused on developing innovative reconstruction methods that enhance the accuracy and efficiency of solving inverse problems within the Diffuse Optical Imaging (DOI) domain. 3D-mDOI (Chapter 2.4) is an algorithmic solution that derives three-dimensional structural information from two-dimensional reflectance images obtained under structured illumination. 3D-mDOI integrates analytical solutions of the Radiative Transfer Equation (RTE) with spatially restricted, banana-shaped mapping for 3D reconstruction from the 3D photon trajectory distributions (Chapter 1.2.4). In comparison to the Finite Element Method used as a numerical benchmark (Chapter 1.2.3.2), the 3D-mDOI approach excels in multiple metrics: it provides higher visualization fidelity, enables more robust quantitative analysis, and improves computational efficiency and memory usage. These advancements suggest that 3D-mDOI holds significant promise for estimating melanoma Breslow's depth.
Despite its progress, the 3D-mDOI pipeline has some limitations:
(1) It generates relative, rather than absolute, light coefficient matrices, capturing interrelationships among coefficients but failing to correlate their magnitudes to a known standard reference.
(2) Grid-like artifacts on the reconstructed surface can compromise both visual and
analytical quality, necessitating manual segmentation that is both labor-intensive and
prone to operator variability.
(3) Despite computational speed improvements, further advancements are essential to
meet the rapid pace demanded by real-time clinical applications.
These limitations are a consequence of the simplifications required to model complex real-world phenomena into algorithmic code, which is subject to a balance between computational tractability and accuracy. While the simplifications make mathematical solutions feasible, they fail to capture the complexity of real-life applications, opening the results to artifacts.
Deep Learning models offer a promising alternative as they are known for their flexibility
and adaptability, as well as their ability to learn from data and adjust to new conditions.
However, these models also present challenges and limitations owing to issues such as overfitting, data quality, and the interpretability of the models themselves. These issues, together with the complexity of obtaining 3D dermatology clinical data, should be weighed while developing an alternative Deep Learning solution for Diffuse Optical Imaging.
3.4.2 Reconstruction module: an architecture for solving inverse problems
We propose the Reconstruction module, an advanced AI-powered solution that enhances
the 3D-mDOI algorithmic pipeline with Deep Learning. Similar to 3D-mDOI, the input of the
Reconstruction module is a series of 2D diffuse reflectance images, each captured with
different structured light patterns shining upon a test sample. To reduce complexity and
mitigate interference from multiple light sources, these images are segmented into discrete
patches. Each patch isolates a single light source and its immediate area, allowing for targeted
analysis.
The Reconstruction module reinterprets the three crucial steps of the traditional 3D-mDOI pipeline (2D nonlinear fitting, 3D reconstruction, and uniformity calibration) into three corresponding parts (Figure 3.10):
1. 2D Light Properties Extraction Net,
2. Photon Banana Lifting Layer,
3. 3D Correction Net.
The three parts of the reconstruction module can be summarized as follows:
2D Light Properties Extraction Net replaces the matrix inversion via non-linear fitting to the Radiative Transfer Equation (RTE) in 3D-mDOI with a regression model implemented as a Multilayer Perceptron (MLP). The MLP's ability to identify complex, non-linear patterns enables an accurate extraction of the 2D light coefficients, providing a robust foundation for subsequent three-dimensional mapping.
Photon Banana Lifting Layer replaces the 3D photon distribution map of 3D-mDOI (Chapter
2.4), lifting the derived 2D optical properties into a consolidated 3D volume. For each
source-detector pair, an appropriate 3D photon distribution is selected based on the pair's distance and is used to scale the derived 2D optical properties to the 3D domain. After
aggregating these transformations for each illumination point across all source-detector
pairs, the resulting 3D absorption and scattering coefficients volume is normalized, forming
the foundation for further refinement. This layer is designed to be differentiable, meaning the model can compute gradients and update its parameters through training, which are cornerstones of Deep Learning paradigms.
3D Correction Net focuses on enhancing the uniformity calibration. In the 3D-mDOI version of this module, 3D relative light coefficient matrices are carefully denoised to correct any non-uniformities or noise artifacts resulting from 3D-mDOI analysis. In this Reconstruction module, the calibration is handled through a synergistic combination of MLP and 3D U-Net.
The MLP serves as a global scale corrector, adjusting the scales of absolute light coefficients
across the volumetric data, while the 3D U-Net excels in removing artifacts and noise,
significantly improving the fidelity of the reconstructed images.
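To make the lifting idea concrete, the sketch below scatters each pair's 2D coefficient into 3D using a bank of precomputed banana templates indexed by source-detector distance. The per-pair spatial shift of each template is omitted for brevity, and all names are illustrative.

```python
import torch

def banana_lift(coeffs_2d, banana_bank, pair_dist_bin, eps=1e-8):
    """Differentiable lifting of per-pair 2D coefficients into a 3D volume.
    coeffs_2d:     (P,) one coefficient per source-detector pair
    banana_bank:   (D, Z, Y, X) photon-distribution templates per distance bin
    pair_dist_bin: (P,) long tensor, distance bin of each pair
    Every operation is a tensor op, so gradients flow back to coeffs_2d."""
    volume = torch.zeros_like(banana_bank[0])
    weight = torch.zeros_like(banana_bank[0])
    for c, d in zip(coeffs_2d, pair_dist_bin):
        banana = banana_bank[d]        # template selected by pair distance
        volume = volume + c * banana   # scale the 2D value into 3D
        weight = weight + banana       # accumulate aggregation weights
    return volume / (weight + eps)     # normalized 3D coefficient estimate
```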
Figure 3.10 Reconstruction evolution: from 3D-mDOI to Reconstruction module
The figure shows an overview of how we incorporate Deep Learning to modify the 3D-mDOI algorithmic
pipeline, achieving a more precise and efficient solution for the inverse problem in Diffuse Optical Imaging
(DOI). (A) 3D-mDOI consists of sequential steps including 2D nonlinear fitting, 3D reconstruction, and
uniformity calibration. For each step, the proposed Reconstruction module (B) implements corresponding
functions with Deep Learning to enhance computational performance and accuracy. The Reconstruction
module begins with the 2D Light Properties Extraction Net, which computes optimal light coefficients for each illumination patch in the captured 2D reflectance images. This is followed by a Photon Banana
Lifting Layer that transfers these coefficients into 3D. The final step involves the 3D Correction Net to
enhance image uniformity and clarity. The Reconstruction module outputs the 3D volume of absorption
and scattering coefficients, addressing the limitations of traditional 3D-mDOI approaches by reducing
artifacts. Additionally, the Reconstruction module offers improved network interpretability compared to
standard Deep Learning approaches.
The final output from the Reconstruction module is a detailed volumetric matrix that maps
the tissue's absorption and scattering properties.
In Chapter 3.5, we will provide an in-depth exploration of the implementation of each step
within the module. This discussion will demonstrate how we seamlessly integrate the
interpretability of the 3D-mDOI pipeline with the adaptive capabilities of Deep Learning.
3.4.3 Projection module: an architecture for solving the forward problem
We present the Projection module (Figure 3.11), an innovative solution tailored to enhance forward modeling processes within the DOI pipeline, creating a 2D reflectance image from a 3D optical properties dataset. This module has been developed with a modular architecture that complements the Reconstruction module, ensuring seamless integration and functionality.
Figure 3.11 Workflow of the Projection module in mDOI-Net
Overview of the sequential operations within the Projection module, which contains two major components: the Photon Banana Projecting Layer and the 2D Rendering Net. Based on the structured illumination (magenta dots) on the 3D volumes (A), the Photon Banana Projecting Layer (B) transforms the 3D optical volume into a projected 2D absorption (μ_a) map and a 2D scattering (μ_s) map. These maps are then used to compute a Radiative Transfer Equation (RTE) based reflectance image, simulating how light behaves within the sample based on its 2D optical properties. Both the 2D projected optical map and the computed reflectance image provide spatial cues on the location of the lesion (magenta arrow). The 2D Rendering Net (C) ensures the reflectance images accurately represent photon diffusion patterns. The refined 2D reflectance output (D) is enhanced by reducing noise and artifacts.
The core function of the Projection module is to process a 3D matrix that encapsulates the sample's optical properties, matching this process to the structural illumination settings used in data acquisition with the OPTIMAP. This methodological consistency is important as it aligns
with the data formats output by the Reconstruction module. The initial phase of the Projection
module involves the Photon Banana Projecting Layer, which translates the 3D optical
information into a 2D reflectance image. This transformation is guided by the principles of
photon migration and the Radiative Transfer Equation, ensuring that the conversion retains the
integrity of the optical data.
Following this initial conversion, the resulting 2D image undergoes refinement through a 2D Rendering Net that is tasked with denoising and removing any artifacts present in the preliminary output. This net is architecturally similar to the 3D Correction Net used in the Reconstruction module and integrates a multilayer perceptron with a 2D U-Net. This design choice not only enhances the quality of the reflectance images but also maintains consistency across the DOI analysis pipeline.
Photon Banana Projecting Layer operates in two main stages. Initially, it condenses the 3D
optical data into a 2D plane by integrating the 3D information within a predetermined
photon diffusion pattern, ensuring that the dimensional reduction preserves the critical
data. Subsequently, these condensed 2D optical coefficient maps are utilized to generate
reflectance images, again applying the principles of the Radiative Transfer Equation (RTE) (Chapter 1.2.3.1).
2D Rendering Net processes the preliminary 2D reflectance images and creates clean 2D reflectance images. This network is specifically designed to emulate the network architecture of the 3D Correction Net from the Reconstruction module, except that it processes 2D images rather than 3D matrices. It employs a combination of a multilayer perceptron and a 2D U-Net to refine and enhance the quality of these reflectance images effectively. The output of this network consists of a series of refined 2D reflectance images, each delineating the light scattering patterns at discrete points of light introduction.
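A minimal sketch of the projecting stage is given below, assuming a single precomputed banana weighting per source-detector geometry; the reflectance expression is a simplified diffusion-style stand-in for the RTE formula used in the pipeline, not the exact form.

```python
import torch

def banana_project(volume_3d, banana_3d, eps=1e-8):
    """Stage 1: condense the 3D optical volume to 2D by a weighted
    integration under the photon-diffusion (banana) pattern."""
    num = (volume_3d * banana_3d).sum(dim=0)   # weighted sum over depth
    den = banana_3d.sum(dim=0).clamp_min(eps)
    return num / den                           # (Y, X) coefficient map

def reflectance_from_maps(mu_a, mu_s, rho, eps=1e-8):
    """Stage 2: reflectance at source-detector distance rho from the
    projected maps; exp(-mu_eff * rho) / rho^2 is an assumed simplified
    diffusion-approximation form."""
    mu_eff = torch.sqrt(3.0 * mu_a * (mu_a + mu_s))
    return torch.exp(-mu_eff * rho) / (rho ** 2 + eps)
```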
The design of the Projection module is crafted to enhance the fidelity of forward modeling
while maintaining operational congruence with the Reconstruction module. This design
promotes a coherent and seamless workflow within the DOI analysis process. Such
harmonization fosters an integrated approach where the 3D reconstruction and 2D projection
processes support and reinforce one another, enhancing the overall efficacy and depth of
understanding of the sample’s internal structure. The following sections will provide a detailed
exploration of each step within this module.
3.4.4 The challenge of dermatology 3D clinical data
There are significant challenges involved in collecting 3D imaging data, primarily due to the
limited availability of equipment and substantial costs, both labor- and equipment-related.
There are three main methods for obtaining 3D dermatology data: physical phantoms to
emulate lesions, serial tissue section reconstructions, and optical biopsy imaging. Each method
has inherent cost implications and technical complexities, and only the latter two provide representative clinical data points.
Physical phantoms are designed to mimic the optical coefficients of human tissue to test
imaging systems (Chapter 2.5.1.2). While useful for controlled experiments, these phantoms
often do not fully represent the complexity of human tissue structures and are expensive to
fabricate with precise optical properties.
Serial section reconstructions, a method involving the layer-by-layer slicing and imaging of
tissue samples, are utilized to explore fine tissue details. This approach offers detailed insights
into tissue architecture but is labor-intensive and costly due to the need for meticulous tissue
handling and processing.
Optical biopsy utilizes advanced imaging technologies such as reflectance microscopes to
provide non-invasive 3D imaging. However, owing to its requirement for expensive equipment
and specialized operational expertise, optical biopsy remains uncommon in clinical use. These
cost barriers significantly restrict the availability of 2D reflectance images alongside their 3D
skin lesion data.
In the development of the mDOI-Net, the Reconstruction module plays a critical role but
encounters significant challenges due to the prerequisites for a large quantity of 3D ground
truth data. The 3D data is essential for effectively training and evaluating the module.
However, as discussed in this section, obtaining such data in sufficient quantities proves to be a
formidable challenge. The scarcity of viable data severely limits the practical deployment of the
Reconstruction module in current clinical environments, restricting its use and integration into
routine clinical practice.
3.4.5 mDOI-Net: a two-pronged model for training with limited 3D ground
truth
The mDOI-Net leverages a two-pronged model to enable 3D reconstruction in training with
limited 3D ground truth data. The process begins with the Reconstruction module, which
extracts 3D optical properties from captured 2D reflectance images. The output of this network
is then utilized by the Projection module to generate a new reflectance image. This model is
trained using the 2D L1 loss between the captured and computed reflectance images. Through
a systematic process of minimizing discrepancies between ground truth and computed 2D
reflectance images by the Projection module, the network aims to achieve a globally optimized
3D reconstruction of the tissue volume.
Besides the integration of the two modules, further optimization is made in the network architecture when varied datasets are available. We implemented a GAN (Generative Adversarial Network) structure [128], which consists of two main components: a Generator and a Discriminator. In simple terms, the Generator creates data intended to be indistinguishable from ground truth data, attempting to fool the Discriminator. The Discriminator, on the other hand, works to distinguish between the ground truth data and the data produced by the Generator. The Generator is trained with an additional GAN adversarial loss computed from the Discriminator. In our configuration (Figure 3.9), the basic mDOI-Net functions as the Generator within the GAN structure. The design of the discriminators in mDOI-Net adapts according to different learning paradigms, specifically unsupervised learning and semi-supervised learning.
Unsupervised learning is utilized in cases where no 3D ground truth is available (Figure 3.12) [89, 129]. It allows the mDOI-Net to infer the underlying structure of the skin lesion without computing a loss between the 3D ground truth and the 3D reconstruction computed by the Reconstruction module. A 2D Discriminator, constructed using a fully convolutional 2D patch-based architecture [130], is responsible for evaluating the structural fidelity of reflectance images generated by the Projection module, thereby refining the overall reconstruction process.
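A compact sketch of this unsupervised variant follows, assuming a PatchGAN-style 2D discriminator and an L1 weighting `lam` that we pick arbitrarily here; the layer depths and channel counts are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchDiscriminator2D(nn.Module):
    """Fully convolutional, patch-based 2D discriminator (sketch)."""
    def __init__(self, ch_in=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(ch_in, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 1, 4, padding=1))  # per-patch real/fake logits

    def forward(self, x):
        return self.net(x)

def generator_loss(disc, captured_2d, synthetic_2d, lam=10.0):
    """Hybrid loss: weighted 2D L1 plus Vanilla-GAN adversarial term."""
    logits = disc(synthetic_2d)
    adv = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    return lam * F.l1_loss(synthetic_2d, captured_2d) + adv
```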
Figure 3.12 Unsupervised learning framework of mDOI-Net utilizing GAN architecture
This diagram illustrates the mDOI-Net architecture, which integrates a Generative Adversarial Network
(GAN) framework to train the network in the absence of 3D ground truth data. The network consists of
two primary modules: the mDOI-Net and the discriminator. The mDOI-Net contains both the
Reconstruction and Projection modules. During the training phase, the captured 2D reflectance images
are processed through the (A) Reconstruction module to reconstruct absorption and scattering
coefficients within 3D volumes. The output of the reconstruction, integrated with structured illumination,
is then processed by the (B) Projection module to simulate the 2D reflectance images. Serving as a critical
component of the GAN framework, the (C) Discriminator evaluates the fidelity of the 2D images produced
by the Projection module by comparing them with the captured reflectance images. The network employs
a hybrid loss function that combines 2D reflectance L1 loss with adversarial loss, which allows mDOI-Net
to effectively train without accessing 3D ground truth data. During the deployment phase, we directly output the 3D volume computed by the Reconstruction module, ensuring an efficient reconstruction process.
Semi-supervised learning is utilized when the dataset contains independently obtained 3D ground truth data that does not have a direct one-to-one correspondence with the 2D input images (Figure 3.13) [131]. In addition to the 2D Discriminator, a 3D Discriminator is specifically incorporated to enhance the accuracy of 3D volume reconstructions. This approach allows the mDOI-Net network to learn from the data distribution of unpaired 3D ground truth, significantly relaxing the data pairing requirements of supervised learning.
Leveraging unsupervised or semi-supervised learning methods is beneficial in the context of
clinical adoption, as it reduces the training dependency on 3D ground truth data. The
enhancement in network architecture improves the potential applicability of our approach to real-world clinical use.
Figure 3.13 Semi-supervised learning framework of mDOI-Net utilizing GAN architecture
This schematic illustrates the mDOI-Net's semi-supervised learning framework for enhanced 3D volume
reconstruction from 2D reflectance images. Semi-supervised learning utilizes unpaired datasets where
input reflectance and output reconstruction 3D optical coefficients do not have a one-to-one
correspondence. Like the Unsupervised Learning Framework, the network consists of two primary
modules: the mDOI-Net and the discriminators. The mDOI-Net contains both the Reconstruction and
Projection modules. During the training phase, the captured 2D reflectance images are processed through
the (A) Reconstruction module to reconstruct absorption and scattering coefficients within 3D volumes.
(B) The 3D Discriminator, a novel addition to the unsupervised learning framework, evaluates the
structural integrity of the 3D volumes against unpaired 3D ground truth. The 3D Discriminator allows the
network to learn the unique spatial patterns and distributions inherent in realistic 3D skin structures
without the necessity for directly paired datasets. Furthermore, the output of the reconstruction,
integrated with structured illumination, is then processed by the (C) Projection module to simulate the 2D
reflectance images. Serving as a critical component of the GAN framework, the (D) 2D Discriminator
evaluates the fidelity of the 2D images produced by the Projection module by comparing them with the
captured reflectance images. This innovative semi-supervised approach allows mDOI-Net to bridge the
gap between the high accuracy typically associated with supervised learning and the flexibility of
unsupervised learning, making it highly effective for clinical applications where acquiring perfectly
matched data pairs is not practical.
3.5 Implementation of mDOI-Net Components
This section of the thesis introduces the detailed design and implementation of various
components within mDOI-Net, focusing on selecting optimal model architectures and training
loss functions to enhance the network's performance and applicability. We use the Synthetic
Derm Cone Standard Dataset (Chapter 3.3.1.5) as the base dataset for performing the above
steps. Our goal is to determine the optimal network implementation with the standard design
of the synthetic dataset to achieve the most precise and reliable reconstructions and
projections under general use circumstances.
In the Reconstruction module, we begin by selecting a suitable training loss function for the 2D Light Properties Extraction Net with or without 3D ground truth. The training loss design is crucial for accurately capturing the 2D tissue optical coefficients from the input reflectance images. Subsequently, we move to the 3D Correction Net's architecture design, focusing on effectively denoising the intermediate 3D volume processed by the Photon Banana Lifting Layer. Moreover, we assess the performance of different aggregation techniques, comparing the Gated Recurrent Unit (GRU) [132, 133] with a weighted average approach, to determine the most efficient method for integrating multiple illumination patterns within the reconstruction process.
In the Projection Module, we evaluate the different architectures of the 2D Rendering Net,
which is designed to optimize the generation of reflectance images under various structured
illumination conditions.
We combine the reconstruction and projection modules to test and verify the mDOI-Net's potential for clinical transferability. We test which network architectures [134] of mDOI-Net are more efficient in transferring the training knowledge from the standard pretrained dataset to the more advanced dataset. We explore the various training paradigms within the mDOI-Net structure to evaluate the performance of domain adaptation with various levels of data availability. The effectiveness of pretrained, unsupervised, and semi-supervised learning approaches is evaluated by comparing their impact on the test synthetic sample by calculating weighted Mean Squared Error (MSE) loss, 3D Structural Similarity Index Measure (3D-SSIM) loss, and processing time across different network configurations.
3.5.1 2D Light Properties Extraction Net in Reconstruction module
The 2D Light Properties Extraction Net utilizes a Multilayer Perceptron (MLP) to improve the
accuracy of extracting 2D light coefficients by deciphering the complex and non-linear
relationships present in the data. To optimize the performance of the MLP, we have introduced
three specially tailored training loss functions: L1 Reflectance loss, Project L1 loss and
Smoothness loss.
L1 Reflectance loss, conceptually similar to the fitting residual used in non-linear fitting, is the most crucial for extracting optimal sample light properties that accurately predict 2D reflectance values from the Radiative Transfer Equation. It measures the fidelity between the
predicted and actual measurements. We adaptively assign varying weights to the L1
Reflectance loss to finely tune the model’s sensitivity to photon migration within the tissue.
This weighted approach prioritizes the precision of pixel values near the light source,
recognizing that detectors closer to the light source capture a higher quantity of re-emitted
photons, providing more reliable data for fitting.
Project L1 loss refers to the difference between the network's estimated 2D light
coefficients and the 2D projection of the 3D ground truth coefficients. It is important to note
that we do not have 2D light coefficient ground truth during the training. Instead, we estimate
the pseudo 2D light coefficient map from the projection of the 3D light coefficients ground
truth for loss computation.
Smoothness loss penalizes variations in reconstructed light coefficient values across adjacent pixels, promoting a smoother gradient in the resulting image, which is a critical quality
in optical imaging. In our data, both lesions and background areas tend to have similar light
properties to their neighboring pixels except in boundary cases. Smoothness loss serves as an
important geometric constraint in the loss function, helping other loss functions optimize
performance.
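The three losses translate directly into a few lines of PyTorch; the inverse-distance weighting used in the first function is an assumed form for the distance-dependent weights described above.

```python
import torch.nn.functional as F

def weighted_l1_reflectance(pred_ref, meas_ref, src_dist):
    """L1 Reflectance loss: pixels nearer the source capture more
    re-emitted photons and therefore receive higher weights."""
    w = 1.0 / (1.0 + src_dist)              # assumed weighting form
    return (w * (pred_ref - meas_ref).abs()).sum() / w.sum()

def project_l1(pred_coeffs, pseudo_gt_coeffs):
    """Project L1 loss against the pseudo 2D map projected from 3D GT."""
    return F.l1_loss(pred_coeffs, pseudo_gt_coeffs)

def smoothness(coeffs):
    """Penalize differences between adjacent pixels (total-variation style)."""
    dy = (coeffs[..., 1:, :] - coeffs[..., :-1, :]).abs().mean()
    dx = (coeffs[..., :, 1:] - coeffs[..., :, :-1]).abs().mean()
    return dy + dx
```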
3.5.1.1 Feasibility analysis of 2D Light Properties Extraction Net
We analyze the optimal training methods, considering various training losses, proper computation settings, and the performance of extracting 2D light properties for the 2D Light Properties Extraction Net (2D LPE Net). The comparison of performance between the various training methods is conducted using the L1 Ref. loss computed on the test dataset, alongside the computation time required to obtain the resulting 2D optical coefficient map (Table 3.3). The lower the test loss, the better the network performs. Results generated with the optimal network training method are further compared visually with those extracted using the 2D Nonlinear Fitting in the 3D-mDOI pipeline to validate the accuracy and efficiency of the network fitting approach (Figure 3.14).
According to our analysis, training a network for 2D optical property extraction is more
effective than applying 2D Nonlinear Fitting in the 3D-mDOI pipeline. In Table 3.3, the test L1
Ref. loss for 2D Nonlinear Fitting is relatively high at 0.65 compared to other network methods
with different training losses, indicating that its estimation of optical properties is less optimal
when it is used for computing reflectance to match the captured reflectance. Additionally, we
compare the computational time to extract optical properties using the reflectance data from a
single illumination between the 3D-mDOI and mDOI-Net pipeline. 2D Nonlinear Fitting requires
a considerably longer computational time (6.4s) than the general network approach, being
nearly 10,000 times slower when compared to 2D LPE Net’s test processing time (0.6ms) in
Table 3.3.
Table 3.3 Comparison of L1 Reference loss and computational times for different 2D light properties
extraction methods
The table showcases the performance and computational efficiency of estimating the 2D optical
coefficient map using various methods in both the 3D-mDOI and mDOI-Net pipelines. The 3D-mDOI
employs 2D Nonlinear Fitting for extracting 2D light properties, which results in a higher L1 Reference
(Ref.) Loss for the test dataset and longer computational times compared to various implementations of
the 2D Light Property Extraction Net (2D LPE Net) in mDOI-Net. We further compare the performance of
different training losses applied in network training. The table illustrates that training with L1 Ref. Loss
combined with Project L1 Loss (with 3D ground truth) or Smooth Loss (without 3D ground truth) results
in significantly lower L1 Ref. Loss, indicating superior performance in optical property extraction when
compared to the traditional 2D Nonlinear Fitting approach.
When considering the network's optimal training loss, we need two training methods depending on the availability of 3D ground truth during network training. Based on the results in Table 3.3, training the 2D Light Property Extraction Net with L1 Ref. Loss and Project L1 Loss achieves the lowest test L1 Reference Loss when 3D ground truth is available. We apply this combination as the training loss for further training and evaluation of the Reconstruction module. Conversely, when 3D ground truth is absent in the domain adaptation tasks, training with L1 Ref. Loss and Smooth Loss results in the lowest test L1 Reference Loss.
We then visually compare the performance of two optical property extraction methods, the 2D LPE Net trained with L1 Ref. Loss and Project L1 Loss versus 2D Nonlinear Fitting, in estimating absorption (μ_a) and scattering (μ_s) coefficients, as well as the computed 2D reflectance via the Radiative Transfer Equation. It is noteworthy that the projected μ_a and μ_s in the ground truth data (Figure 3.14 A) have a cross-shaped artifact at the center due to the rule-based projection from the 3D ground truth. The projected optical coefficients in the ground truth still provide visual insight for evaluating the accuracy of the computational methods below.
Generally, the 2D LPE Net provides a more precise and reliable estimation of optical properties than 2D Nonlinear Fitting (Figure 3.14 B). For optical coefficients, the μ_a and μ_s computed by the 2D LPE Net align more closely with the ground truth, exhibiting sharper lesion contours and less spatial dispersion. This is likely due to better handling of the underlying model assumptions and noise reduction in the 2D LPE Net approach. In contrast, the results from 2D Nonlinear Fitting are blurred. Moreover, computational artifacts are evident around the corners of the result images obtained using 2D Nonlinear Fitting, which could further impair the performance of subsequent 3D reconstructions in 3D-mDOI (Figure 3.19).
Figure 3.14 Comparison of two optical property extraction methods: the 2D Light Property Extraction
Net (2D LPE Net) and 2D Nonlinear Fitting
(A) Ground truth data displays the projected μ_a and μ_s, and the 2D reflectance calculated via the Radiative Transfer Equation. They serve as a benchmark for evaluating the accuracy of the computational methods. The performance of the 2D Light Property Extraction Net (2D LPE Net) versus 2D Nonlinear Fitting in estimating optical properties is compared in (B). The 2D LPE Net consistently provides more precise and reliable estimations of μ_a and μ_s, aligning more closely with the ground truth, with sharper features and without the corner artifacts seen in the 2D Nonlinear Fitting results.
3.5.2 3D Correction Net in Reconstruction module
The 3D Correction Net serves as a denoising step within our advanced Reconstruction
module, designed to rectify distortions or noise artifacts that could undermine the accuracy of
the optical data interpretation. Here, we discuss the innovations in the 3D Correction Net’s
input design and the test loss functions to evaluate 3D reconstruction performance.
For the input of the 3D Correction Net, we innovatively incorporate the 2D grayscale images
of skin lesions as an additional input channel to the 3D intermediate matrix output by the Photon Banana Lifting Layer. The grayscale image aids the network with extra 2D edge contour
information of the lesion when it deciphers the complex geometry of the lesions through the
diffuse structures in reflectance images. This strategy is highly adaptable for clinical
applications, as the required reference surface image of the skin lesion is easily obtainable,
facilitating a seamless transition from image acquisition to detailed reconstruction.
To evaluate the fidelity of the 3D Correction Net, we deploy two evaluation loss functions for the test dataset. (i) The 3D Structural Similarity (3D-SSIM) loss measures the structural similarity between the predicted and true 3D optical coefficient volumes. For the SSIM index, a maximum value of 1 indicates that the two volumes are perfectly structurally similar, while a value of 0 indicates no structural similarity. We use one minus the SSIM index as the SSIM loss. (ii) The weighted Mean Squared Error (MSE) loss measures the voxel-wise deviation of the 3D reconstruction volume from the ground truth volume.
In the computation of the weighted MSE loss, we use the same normalization weight as in the Photon Banana Lifting Layer. This weight reflects the 3D aggregation of photon distributions among multiple partial reconstructions, each associated with distinct illumination data. If a voxel within the 3D reconstructed matrix is computed using multiple illumination data, its value should exhibit a higher level of robustness. Consequently, such a voxel should be assigned a higher weight in the loss function. Thus, the adoption of this normalization weight encodes the information from structured illumination into the loss function.
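In code, the weighted MSE is a one-liner once the normalization weight is available; the 3D-SSIM term is shown via an assumed off-the-shelf dependency.

```python
import torch

def weighted_mse(pred_vol, gt_vol, norm_weight, eps=1e-8):
    """Voxel-wise MSE weighted by the same normalization weight used in
    the Photon Banana Lifting Layer, so voxels covered by more
    illuminations count more toward the loss."""
    w = norm_weight / norm_weight.sum().clamp_min(eps)
    return (w * (pred_vol - gt_vol) ** 2).sum()

# 3D-SSIM loss as (1 - SSIM index); e.g. with pytorch_msssim, an assumed
# third-party dependency that supports 5D (volumetric) inputs:
#   from pytorch_msssim import ssim
#   loss = 1 - ssim(pred_vol[None, None], gt_vol[None, None], data_range=1.0)
```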
3.5.2.1 Feasibility analysis of 3D Correction Net
We conducted a comparative analysis of the performance of different network
architectures for the 3D Correction Net (Table 3.4), using weighted MSE loss and 3D SSIM loss
as evaluation metrics for the test dataset. Generally, the lower the loss, the better the
reconstruction result is. We apply these crucial metrics to select the optimal network
architecture due to their ability to gauge reconstruction quality.
Table 3.4 Performance comparison of different network architectures for the 3D Correction Net
The table displays the results of a comparative analysis of various network architectures for the 3D
Correction Net, evaluated using weighted Mean Squared Error (MSE) loss and 3D Structural Similarity
(SSIM) loss on a test dataset. The No Architecture (NA) baseline, which lacks a 3D Correction Net, exhibits
the highest losses, underscoring the importance of utilizing a structured network for improved
reconstruction quality. The Multilayer Perceptron (MLP) and 3D U-Net architectures demonstrate
progressively better performance, with the combined MLP and 3D U-Net configuration showing the most
favorable results, achieving the lowest losses in both metrics.
In Table 3.4, the No Architecture (NA) baseline, which does not utilize a 3D Correction Net,
yields the least favorable results, with a weighted MSE loss of 0.13 and an SSIM loss of 0.11.
This result highlights the critical need for including a 3D Correction Net in the Reconstruction
module. The Multilayer Perceptron (MLP) architecture improves performance with a lower MSE
loss but at the cost of structural accuracy, as seen by an increased SSIM loss. The 3D U-Net
architecture significantly enhances both metrics, demonstrating a strong balance between error
reduction and detail preservation.
The best results are seen with the combined MLP + 3D U-Net configuration, which achieves the lowest weighted MSE loss of 0.0061 and an SSIM loss of 0.0066, indicating superior reconstruction quality. These findings demonstrate that our design of integrating the MLP with a 3D U-Net is highly effective for denoising 3D optical coefficient reconstructions.
3.5.3 Gated Recurrent Unit Aggregation in Reconstruction module
We explored the integration of a data-driven Integration Net, specifically a Gated Recurrent Unit (GRU), to combine the 3D light coefficient matrices from different illumination patterns. It is an alternative to the weighted average used in both the Photon Banana Lifting Layer (Chapter 2.4.2) and 3D-mDOI (Chapter 2.4.2), which integrates data from multiple intermediate reconstructions, each derived from an individual illumination. While effective and efficient, the weighted average is a rule-based approach lacking the adaptive learning capabilities necessary for handling diverse datasets without manual recalibration.
GRUs are a type of neural network architecture renowned for their efficacy in processing
sequential data [132, 133]. In the GRU-enhanced Reconstruction module, input reflectance
data from a single illumination source are processed sequentially. First, 2D light coefficients are
extracted. These coefficients are then transformed into 3D intermediate coefficient matrices.
Finally, they are integrated using the Integration Net, which is implemented with a GRU. The
Integration Net receives the current timestep's lifted 3D intermediate coefficient matrix from a
single light source, while the GRU’s hidden state vector progressively integrates these matrices
from all preceding timesteps. This model allows for iterative fusion of the current 3D
coefficients with the hidden state during training.
Drawing inspiration from the DeepVoxels [116] approach (Chapter 3.1), the specific
equations governing the GRU operation are as follows:

\begin{aligned}
u_t &= \sigma(W_u x_t + U_u h_{t-1} + b_u) \\
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) \\
s_t &= \mathrm{ReLU}(W_s x_t + U_s (r_t \circ h_{t-1}) + b_s) \\
h_t &= (1 - u_t) \circ h_{t-1} + u_t \circ s_t
\end{aligned}

Eq. ( 3.1 )
Here, the update gate u_t and the reset gate r_t in the GRU model play pivotal roles in modulating the information flow within the hidden layers of the network. u_t determines the degree to which new data should influence the current hidden state, whereas r_t assesses the extent to which past information remains relevant. The state proposal s_t, formulated through the Rectified Linear Unit (ReLU) activation function, presents a new candidate for the hidden state that is informed by the latest input as well as the past state, adjusted by the reset gate. Subsequently, the hidden state h_t is updated, synthesizing the existing state with the new candidate proposal through a weighted combination. The GRU's one-step update mechanism for each input allows the reconstruction system to learn how to group illumination data directly. This adaptability improves the handling of diverse and complex datasets with various structured illuminations, eliminating the need for constant manual recalibration.
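A sketch of one aggregation step implementing Eq. (3.1) follows; realizing the W and U maps as 3D convolutions over the voxel grid is an assumption here, and the biases b are folded into the convolution layers.

```python
import torch
import torch.nn as nn

class BananaGRUCell(nn.Module):
    """One GRU update over per-illumination 3D volumes, following Eq. (3.1)."""
    def __init__(self, ch):
        super().__init__()
        self.Wu = nn.Conv3d(ch, ch, 3, padding=1)  # W_u (bias = b_u)
        self.Uu = nn.Conv3d(ch, ch, 3, padding=1, bias=False)  # U_u
        self.Wr = nn.Conv3d(ch, ch, 3, padding=1)  # W_r (bias = b_r)
        self.Ur = nn.Conv3d(ch, ch, 3, padding=1, bias=False)  # U_r
        self.Ws = nn.Conv3d(ch, ch, 3, padding=1)  # W_s (bias = b_s)
        self.Us = nn.Conv3d(ch, ch, 3, padding=1, bias=False)  # U_s

    def forward(self, x_t, h_prev):
        u = torch.sigmoid(self.Wu(x_t) + self.Uu(h_prev))   # update gate
        r = torch.sigmoid(self.Wr(x_t) + self.Ur(h_prev))   # reset gate
        s = torch.relu(self.Ws(x_t) + self.Us(r * h_prev))  # state proposal
        return (1 - u) * h_prev + u * s                     # new hidden state
```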
3.5.3.1 Feasibility analysis of Gated Recurrent Unit Aggregation
The comparison (Table 3.5) between the standard Reconstruction module and the GRU-enhanced version reveals significant differences in network performance and processing time during network testing. The standard Reconstruction module shows a lower weighted MSE loss of 0.0015 and a lower SSIM loss of 0.0080, alongside a swift processing time of just 1.1 seconds. Conversely, integrating the GRU into the network architecture increases both the weighted MSE loss, to 0.0022, and the SSIM loss, to 0.011. This indicates a reduction in reconstruction quality, with the GRU introducing more artifacts around the lesion areas, potentially complicating the visualization and interpretation of finer details. Additionally, the processing time for the GRU variant increases to 9.8 seconds, indicating a considerable escalation in computational demand.
Table 3.5 Performance and processing time comparison between standard and GRU-enhanced
Reconstruction modules
The table illustrates the performance differences between the standard Reconstruction module, which uses weighted averaging to integrate data from multiple intermediate reconstructions, and the Gated Recurrent Unit (GRU) enhanced version. The standard module exhibits superior performance with a lower weighted Mean Squared Error (MSE) loss and a lower 3D Structural Similarity (SSIM) loss, along with a rapid processing time of 1.1 seconds. In contrast, the GRU version increases both loss metrics (weighted MSE to 0.0022 and SSIM to 0.011) and extends the processing time to 9.8 seconds. These results suggest the standard module is the more efficient choice for applications needing faster processing and clearer outputs.
These results highlight a critical trade-off when incorporating GRU into the reconstruction
process: while it may enhance certain modeling capabilities for learning structured illumination,
it does so at the expense of increased computational time and potentially decreased clarity in
the reconstructed images. The significant increase in processing time and loss metrics with the GRU configuration suggests that the standard Reconstruction module may be better suited for applications requiring faster processing times and cleaner reconstruction outputs on datasets generated with a fixed structured illumination pattern. Further visual and quantitative evaluations of the GRU-enhanced Reconstruction module are included in Chapter 3.6.2.
3.5.4 2D Rendering Net in Projection module
The 2D Rendering Net is responsible for denoising the intermediate reflectance from the Photon Banana Projecting Layer in the Projection module. The network structure of the 2D Rendering Net resembles that of the 3D Correction Net, and it likewise includes 2D grayscale images of skin lesions as an additional input channel to the 2D U-Net. This setup provides the 2D lesion surface contour needed to generate high-quality reflectance images depicting the diffuse patterns of the lesion under various points of illumination. We train the 2D Rendering Net using the L1 loss between the computed reflectance and the ground truth to optimize the network parameters.
Distinct from the 3D Correction Net, we employ two test loss functions for the 2D Rendering
Net: (i) L1 loss between the computed reflectance and the ground truth reflectance, and (ii) L1
loss after average pooling of these reflectance images to reduce the stochastic effect of photon
migration in the network evaluation.
When photons enter tissue, they scatter in ways that cannot be precisely predicted due to
the complex nature of biological structures they encounter. This randomness introduces 'noise'
into the imaging data, which can manifest as rapid fluctuations or variations—often referred to
as high-frequency noise—in the pixel values.
Average pooling, a down-sampling technique used in neural networks, calculates the
average value of a group of pixels in a feature map. Employing average pooling with a window
size of 3 extracts a broader scale of topological and statistical information between the
computed and captured 2D reflectance images. This evaluation method ensures a more
reliable assessment of photon migration patterns in reflectance images, accommodating their
inherent randomness and unpredictability.
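As a sketch, the pooled comparison is a single call pair, assuming reflectance tensors in (batch, channel, H, W) layout:

```python
import torch.nn.functional as F

def pooled_l1(pred_ref, gt_ref, k=3):
    """L1 after k x k average pooling: compares local means, damping the
    high-frequency noise introduced by stochastic photon migration."""
    return F.l1_loss(F.avg_pool2d(pred_ref, k), F.avg_pool2d(gt_ref, k))
```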
3.5.4.1 Feasibility analysis of 2D Rendering Net
We compare the performance of various network architectures using L1 loss and L1 loss
after average pooling for the test dataset (Table 3.6). The No Architecture (NA) method, which
does not use a 2D Rendering Net in the Projection module, serves as a baseline with an L1 loss of 0.57, slightly reducing to 0.53 after average pooling.
Table 3.6 Comparison of L1 Loss before and after average pooling across different network
architectures
The table evaluates the performance of various network architectures using L1 loss metrics, both before and after average pooling, on a test dataset. The No Architecture (NA) method, which does not employ a 2D Rendering Net, serves as a baseline. The Multilayer Perceptron (MLP) configuration results in higher losses both before and after pooling, struggling with the stochastic nature of light-tissue interactions. Conversely, the 2D U-Net significantly reduces the L1 loss before and after pooling, indicating its superior ability to capture complex light scatter within the tissue. The best performance is demonstrated by the combined MLP and 2D U-Net approach, which not only lowers the initial L1 loss to 0.19 but also further reduces it to 0.16 after pooling, effectively rendering 2D reflectance images with detailed photon migration patterns.
The Multilayer Perceptron (MLP) configuration significantly increases the L1 loss, both
before and after average pooling, indicating it may not effectively handle the stochastic nature
of light-tissue interactions. In contrast, the 2D U-Net architecture reduces the L1 loss to 0.22
and further to 0.18 after pooling, suggesting its design better captures complex light scatter
patterns within tissue.
The proposed network architecture, the combination of MLP and 2D U-Net, achieves the
best results, with the lowest L1 loss of 0.19, further reduced to 0.16 after pooling. This hybrid
approach combines the MLP’s global correction with the 2D U-Net’s spatial denoising
capabilities, efficiently rendering 2D reflectance images with detailed photon migration
patterns.
3.5.5 mDOI-Net for Domain Adaptation
We evaluate the performance of mDOI-Net in the context of domain adaptation [134]. Domain adaptation is a Deep Learning technique that progressively improves model performance from a basic synthetic dataset (commonly called the source domain) to increasingly complex, advanced synthetic datasets (called target domains). Transitioning from simpler to more complex synthetic datasets highlights the potential to gradually increase the diversity of the synthetic data. This approach can ultimately facilitate the transfer to real clinical data, which includes unpredictable elements that were not present in the initial synthetic dataset used for pretraining.
In the mDOI-Net framework for domain adaptation, we employ a Generative Adversarial Network (GAN) structure (Chapter 3.4.5). Initially, the Reconstruction and Projection modules are pretrained using the Synthetic Derm Cone Standard dataset. Subsequently, these modules are combined and serve as the Generator. The performance of the Generator on the Synthetic Derm Cone Constrained dataset is assessed in this section to gauge the effectiveness of the domain adaptation.
This adaptation process incorporates both unsupervised and semi-supervised learning paradigms by integrating distinct Discriminators into the network (Chapter 3.4.5). Under both learning paradigms, the Generator is trained using a hybrid loss function that incorporates the 2D projection L1 loss and the Vanilla GAN adversarial loss. The combination of these losses drives the Generator to create 3D reconstructions and 2D reflectance images indistinguishable from ground truth data.
The design of the discriminators varies across the learning paradigms. In unsupervised learning, a 2D Discriminator differentiates between computed and ground truth reflectance images, while semi-supervised learning also employs a 3D Discriminator to distinguish between computed and ground truth 3D reconstructions that may not be matched to each other.
3.5.5.1 Teacher-Student framework for better GAN training
Although the enhanced GAN structure of mDOI-Net alleviates challenges in dataset preparation by enabling unsupervised and semi-supervised learning, training a GAN is difficult, primarily due to the need to balance the Generator and Discriminator during training. This balance is more challenging with the complex design of the Reconstruction module, often resulting in compromised performance in domain adaptation tasks.
We propose a Teacher-Student framework [135, 136], in which a simpler, less complex "student" network learns to emulate a more complex "teacher" model. Specifically, in our case, the Reconstruction module acts as the teacher, training on the Synthetic Derm Cone
Standard dataset. Meanwhile, the student, which employs an encoder-decoder architecture,
trains on the Synthetic Derm Cone Constrained dataset. Knowledge transfer is implemented by
adding an extra Mean Squared Error loss during training. This additional loss function works
alongside other losses to ensure that the student's reconstruction outputs increasingly
resemble those of the teacher. This method effectively transfers the teacher's knowledge of
photon migration to the student, enabling the student to perform its tasks with enhanced
accuracy and efficiency.
The standard encoder-decoder architecture used for the student model is easier to train via backpropagation than the Reconstruction module, owing to its simplicity. Starting with an untrained student model as the Generator can facilitate achieving a better balance with the untrained Discriminators, making it easier for the training process to converge.
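The knowledge transfer reduces to one extra term in the student's objective; `alpha` is an assumed weighting, and the teacher's output is detached so that gradients never reach the frozen teacher.

```python
import torch.nn.functional as F

def student_objective(student_vol, teacher_vol, task_loss, alpha=1.0):
    """Teacher-Student distillation: task losses plus an MSE pulling the
    student's 3D reconstruction toward the frozen teacher's output."""
    distill = F.mse_loss(student_vol, teacher_vol.detach())
    return task_loss + alpha * distill
```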
3.5.5.2 Feasibility analysis of mDOI-Net for Domain Adaptation
We conduct a comprehensive performance evaluation across different training paradigms
within two mDOI-Net frameworks for domain adaptation (Table 3.7). Similar to the evaluation
of the 3D Correction Net, here we utilize weighted MSE loss and 3D-SSIM loss as evaluation
metrics for the test dataset.
The Reconstruction module pretrained on the Synthetic Derm Cone Standard dataset displays the baseline performance when tested directly on the Synthetic Derm Cone Constrained dataset, with a weighted MSE loss of 0.011 and a 3D SSIM loss of 0.020. The processing time of the pretrained module is 1.1 seconds.
Within the standard mDOI-Net, both unsupervised and semi-supervised learning show a noticeable performance decline, recording significantly higher MSE and SSIM losses. These increased losses suggest difficulties in managing the balance between labeled and unlabeled data, potentially compounded by the complexities of training a GAN.
Remarkably, the teacher-student mDOI-Net under semi-supervised learning conditions excels, achieving a weighted MSE loss of 0.010 and a 3D SSIM loss of 0.007 on the test dataset. The minimal processing time of 0.003 seconds is particularly impressive, underscoring the efficacy of the teacher-student model.
The teacher-student framework not only significantly outperforms the standard
Reconstruction module but also has a faster processing speed. The exceptional performance of
this model in a semi-supervised setting demonstrates that integrating a teacher-student
structure in GANs can markedly enhance domain adaptation capabilities, offering a viable path forward for clinical translation via domain adaptation.
Table 3.7 Performance Evaluation of mDOI-Net Across Different Training Paradigms for Domain
Adaptation
The table summarizes the results from a comprehensive evaluation of the mDOI-Net framework, comparing the performance of different network architectures under various training paradigms. Utilizing weighted MSE loss and 3D SSIM loss as key metrics, the table demonstrates how different configurations, from pretrained networks through unsupervised and semi-supervised learning to the advanced teacher-student approach, influence the effectiveness and speed of domain adaptation in clinical imaging applications. Notably, the teacher-student mDOI-Net shows exceptional performance and deployment efficiency, highlighting its potential for enhancing domain adaptation capabilities.
3.6 Performance of mDOI-Net Components
After analyzing the optimal implementation of each of mDOI-Net's components in Chapter 3.5, we evaluate their performance from a general usage perspective. Our analysis is divided into four key areas: dataset generation diversity and quality, the performance and interpretability of the reconstruction and projection modules, each compared independently against benchmark methods, and the system's adaptability to the more advanced dataset.
Dataset Generation: The primary focus here is on two tasks: analyzing the diversity of the synthetic dataset and comparing the pretrained data to the more advanced data. Checking the synthetic dataset's diversity assesses whether the designed 3D synthetic lesions match common presentations of melanoma. Comparing the pretrained and advanced datasets, along with the performance analysis for domain adaptation in the later section, helps us understand the relationship between the differences in the datasets and the corresponding network adaptability.
Reconstruction module: this module is evaluated by testing a spectrum of models, each varying in complexity, to ascertain the quality of reconstructions within the 3D Synthetic Dermatology Dataset. This evaluation addresses two fundamental questions: whether a hybrid approach that integrates physics with Deep Learning outperforms a purely Deep Learning-based approach for inverse modeling, and what the optimal level of network complexity might be to achieve the most accurate reconstructions.
Projection Module: the effectiveness of the Projection Module is measured by its potential
to serve as a viable alternative to traditional Monte Carlo simulations. Key evaluation metrics
for this module include computational speed and the accuracy of spatial attributes in the
simulations. These metrics are critical for assessing the module’s efficiency and its ability to
produce spatially accurate reflectance images.
System’s Adaptability: the final area of assessment examines the network's ability to adapt
from a pretrained dataset to a more advanced dataset. We assess the network's adaptability at
different levels of difference using multiple test cases. This adaptability is pivotal for ensuring
that the network can effectively transition from synthetic training environments to real-world
clinical settings.
Each of these segments collectively informs the overall performance and utility of the mDOI-Net, providing a comprehensive understanding of its strengths and areas for improvement in the context of medical imaging and diagnostics. This structured evaluation ensures that the network is not only theoretically sound but also practically effective in real-world applications.
3.6.1 Dataset generation
Evaluating the efficacy of synthetic datasets for training Deep Learning architectures is crucial for validating their potential applicability in clinical settings. Our assessment involves a comprehensive examination of methods and metrics, measuring the diversity and quality from the synthetic phantom properties to their associated 2D reflectance images.
3.6.1.1 Diversity of the synthetic datasets
The diversity of the four datasets (Synthetic Derm Cone Standard, Synthetic Derm Cone Enhanced+, Synthetic Derm Cone Constrained-, and Synthetic Uniform Cylinder) is illustrated by the assignment of the shape and absorption coefficients (μ_a) of the 3D lesions. For shape analysis, we compute histograms of the volume and thickness of the synthetic lesions, as sketched below. For optical properties analysis, we examine the statistical parameters of the lesions' μ_a values. These tests ensure a comprehensive understanding of the synthetic dataset's ability to generate complex morphology and optical properties of skin lesions.
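The two shape metrics can be computed directly from a binary lesion mask; the sketch below assumes axis 0 is the depth axis and a 0.5 mm voxel pitch.

```python
import numpy as np

def lesion_metrics(mask, pix_mm=0.5):
    """Volume (mm^3) and thickness (mm) of a binary 3D lesion mask.
    Thickness is taken as the lesion's extent along the depth axis."""
    volume = mask.sum() * pix_mm ** 3
    thickness = np.any(mask, axis=(1, 2)).sum() * pix_mm
    return float(volume), float(thickness)
```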
Figure 3.15 illustrates the different distributions of the 3D morphology of synthetic lesions across the four datasets. The volume histogram for the Synthetic Derm Cone Standard dataset (Figure 3.15 A, blue) shows a diverse range of lesion sizes, with most lesions in the lower volume range and a few outliers at higher volumes. The Enhanced (+) dataset (Figure 3.15 A, pink) displays a similar trend but with a more dispersed grouping of data points around the central mean. The Synthetic Derm Cone- dataset (Figure 3.15 A, orange) exhibits a volume histogram with fewer entries and a narrower distribution, indicating fewer variations in lesion size within this category. Lastly, the Synthetic Uniform Cylinder dataset presents a decreasing volume histogram (Figure 3.15 A, green), where more lesions have smaller volumes.
The thickness distribution across the four datasets reveals distinct characteristics (Figure
3.15 B). For the Synthetic Derm Cone dataset, the thickness histogram (Figure 3.15 B, blue)
shows a normal distribution with lesion thicknesses up to 4mm. This histogram peaks at the
thickness of 4.5mm, indicating that the dataset simulates a large portion of lesions in the
advanced metastatic stage. In the Synthetic Derm Cone+ dataset, the thickness histogram
(Figure 3.15 B, pink) has a narrower normal distribution with a few distinct peaks, with most
lesions falling within a specific thickness range of Stage I and II melanoma. The Synthetic Derm
Cone- dataset exhibits a thickness histogram (Figure 3.15 B, orange) with very limited
variability, implying that only Stage I lesions are simulated in this dataset. Finally, the Synthetic
Uniform Cylinder dataset displays a relatively even thickness histogram (Figure 3.15 B, green),
indicating a more uniform thickness across lesions, which is expected for cylindrical shapes.
Figure 3.15 Histogram of lesion volume and thickness across synthetic datasets
We present histograms depicting the volume and thickness of synthetic lesions across four datasets:
Synthetic Derm Cone Standard (blue), Synthetic Derm Cone Enhanced+ (pink), Synthetic Derm Cone
Constrained- (orange), and Synthetic Uniform Cylinder (green). The volume histogram (A) generally shows
a wide dynamic range, with most lesions concentrated at lower volumes and a few outliers at higher
volumes. Notably, the Constrained- dataset (A, orange) exhibits a much narrower distribution, indicating
less variability. As for lesion thickness (B), distributions vary significantly across datasets. The Synthetic
Derm Cone Standard dataset (B, blue) displays a normal distribution extending beyond 4 mm with a peak at
4.5 mm. The Enhanced+ dataset (B, pink) also shows a normal distribution, while the Constrained- dataset
(B, orange) features minimal variability in thickness. The Uniform Cylinder dataset (B, green) maintains a
relatively uniform thickness distribution, consistent with the cylindrical shape of the lesions.
Figure 3.16 compares the statistical analysis of the lesions' μa values among the four datasets.
For each synthetic phantom, we compute feature parameters including the mean, maximum,
minimum, and standard deviation of the 3D lesion phantom's μa values. For each dataset, we
then use a box plot to show the distribution of these feature parameters across the various 3D
lesion phantoms. The Synthetic Derm Cone Standard dataset (Figure 3.16 A) shows a wide range
of absorption values, with higher variability in the distribution of the lesion's maximum μa
value. This suggests a diverse range of optical properties, which could represent varying stages
or types of lesions. The Synthetic Derm Cone+ dataset (Figure 3.16 B) has statistics similar to
the Standard dataset, though its feature parameters of the μa values are generally larger. The
Synthetic Derm Cone- dataset (Figure 3.16 C) demonstrates a much narrower range of
absorption values, implying less variability and a more uniform lesion representation. Lastly,
the Synthetic Uniform Cylinder dataset (Figure 3.16 D) shows relatively uniform absorption
values with some variability of μa values among different phantoms, highlighting its tendency
to produce uniform lesions with consistent optical properties.
Figure 3.16 Statistical analysis of lesion absorption coefficients (μa) across four synthetic datasets
This figure presents a box plot analysis of the μa values for lesions across four datasets. The feature
parameters, including the mean, maximum, minimum, and standard deviation of the 3D lesion phantom's
μa values, are computed for each synthetic phantom. Box plots summarize the distribution of these
parameters among the various 3D lesion phantoms, showing the first and third quartiles (top and
bottom boundary lines of the box), the median (orange line in the center of the box), and outliers (scattered
dots outside the box). Synthetic Derm Cone Standard (A) exhibits a wide range of absorption values, especially in
the maximum value of the phantom's μa. Synthetic Derm Cone+ (B) shows statistics similar to the
Standard dataset, albeit with generally larger μa values. Synthetic Derm Cone- (C) displays a narrower
range of μa values, indicating less variability and a more uniform lesion representation. Synthetic Uniform
Cylinder (D) presents relatively uniform absorption values with some variability of μa values among
different phantoms.
3.6.1.2 Differences between pretrained and advanced datasets
For the domain adaptation tests in Chapter 3.6.4, we pretrain the network on a basic
dataset and transfer its knowledge to better learn the more advanced synthetic
datasets. Here, we compare the 2D reflectance images under different light-illumination
locations for two cases of domain adaptation. Specifically:
(i) between synthetic lesion datasets starting from the same 2D lesion dermoscopic image,
and
(ii) between synthetic and physical cylinder datasets to observe the differences between the
2D reflectance generated by Monte Carlo simulation and OPTIMAP acquisition.
For case (i) (Figure 3.17 A), the 3D renderings of μa for the Synthetic Derm Cone Standard,
Derm Cone+, and Derm Cone- datasets are shown, with their generation details discussed in
Figure 3.6. Each 3D rendering provides a visual representation of the lesion phantoms' internal
optical properties, demonstrating variations that correspond to the different dataset generation
approaches.
The 2D reflectance images are presented for the three synthetic datasets (Figure 3.17 B)
under different light-illumination locations. These images are plotted on a shifted
logarithmic scale so that the dynamic range is compressed between 0 and 2; this
adjustment standardizes the input values and makes the data range more suitable for training
neural networks. In each column, the incident light position is varied, as indicated by
the arrows, to show how different illuminations affect the reflectance image. Generally, the
reflectance images change significantly as the light source position varies, indicating their
sensitivity to different illumination conditions. The reflectance images also reveal the location
of the lesion, indicated by a dramatic decay of the reflectance value (central blue areas). This is
expected because the synthetic lesion has a larger μa than the surrounding tissue: with more
light being absorbed, less reflectance remains in the region above the lesion.
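A minimal sketch of such a shifted logarithmic rescaling is shown below; the epsilon and the exact shift/scale constants are illustrative assumptions rather than the values used in this work.

import numpy as np

def shifted_log_scale(reflectance, eps=1e-6, out_max=2.0):
    log_r = np.log10(reflectance + eps)                # compress the dynamic range
    log_r -= log_r.min()                               # shift so the minimum is 0
    return log_r * (out_max / max(log_r.max(), eps))   # rescale into [0, out_max]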
Figure 3.17 Visual comparison of reflectance images under various illuminations in different synthetic
lesion datasets
We display the 3D renderings of μa (A) for three variations of the Synthetic Derm Cone dataset: Standard,
Enhanced (+), and Constrained (-). Each rendering illustrates the internal optical properties of the lesion
phantoms, highlighting the variations resulting from different dataset generation approaches. The 2D
reflectance images for these datasets (B) are adjusted and displayed under various illumination positions
(indicated by arrows). The reflectance images change significantly with the position of the light source,
underscoring their sensitivity to different lighting conditions. The difference in reflectance among
datasets is visible, especially when comparing the reflectance images in column 4. In the Enhanced dataset
(row 2), when the light directly hits the lesion, a distinct diffuse pattern is evident with concentrated
reflectance at the center. This contrasts with the more dispersed patterns in the other datasets. The Standard
dataset (row 1) shows lower reflectance around the illumination point compared to the Constrained
dataset (row 3).
Among the datasets, the reflectance images from the Synthetic Derm Cone Enhanced dataset
display a distinct diffuse pattern, especially when the illumination point is directly over the
lesion area (Figure 3.17 B, row 2, column 4). The diffused reflectance concentrates at the
center, while in the corresponding reflectance images from the other datasets the patterns are
more dispersed. In contrast, the reflectance images from the Standard and Constrained
datasets show less variation. A more apparent difference is observed in column 4, where the
result from the Standard dataset exhibits lower reflectance in the areas surrounding the
illumination point compared to the Constrained dataset.
We further evaluate the differences between the 2D reflectance images generated by
Monte Carlo simulations and those acquired via OPTIMAP. The aim is to understand how
synthetic data compare with captured data under similar experimental conditions, highlighting
the efficacy and limitations of simulated data in mimicking real-world scenarios.
We design the synthetic phantom with a μa distribution and cylindrical lesion shape similar
to those of the physical phantom (Figure 3.18 A). A further comparison can be found in
Figure 3.7.
We present the 2D reflectance under various incident light positions. The images are
adjusted to a shifted logarithmic scale for better use as inputs to the Deep Learning
network. In the synthetic dataset (Figure 3.18 B, row 1), the reflectance patterns are relatively
consistent, with changes primarily influenced by the position of the light source. The
reflectance decreases significantly at the center where the lesion is located, depicted by darker
blue hues, indicating higher absorption due to the lesion's higher μa values.
Conversely, the physical dataset displays segmented intensity images revealing the
reflectance data (Figure 3.18 B, row 2). Since multiple light dots are illuminated simultaneously
in the OPTIMAP acquisition, we crop the captured images so that each segment is influenced by
only one illumination point. Another notable difference is that the captured data are not
reflectance values like those in the synthetic dataset, but rather intensity images that correlate
with reflectance values. In this analysis, we directly compare the synthetic reflectance images
with the captured intensity images. The comparison of the resulting images from both datasets
suggests that an intensity-to-reflectance correction is necessary to improve the similarity between
synthetic and captured data. This adjustment will be explored in future studies.
Figure 3.18 Comparative analysis of synthetic and physical reflectance data in cylinder datasets
(A) 3D renderings of μa showing the similar distribution and cylindrical lesion shape in both synthetic
(top) and physical (bottom) phantoms. (B) 2D reflectance images are adjusted to a shifted logarithmic
scale under various incident light positions, enhancing their suitability as inputs for a Deep Learning
network. In the synthetic dataset (row 1), reflectance patterns remain consistent, with changes influenced
primarily by the light source position. Darker blue hues at the center indicate higher absorption where
the lesion is located, due to its elevated μa values. Conversely, the physical dataset (row 2) displays
segmented intensity images reflecting the reflectance data. These intensity images, while related, are
generally larger in value than the reflectance values in the synthetic dataset. This analysis sets the stage
for future studies focused on intensity-to-reflectance correction to enhance the similarity between synthetic
and physical data.
3.6.2 Reconstruction module
We thoroughly evaluate the Reconstruction module, a key component of mDOI-Net,
focusing on its ability to reconstruct 3D optical coefficients from 2D reflectance images. We
assess its performance on the Synthetic Derm Cone Standard dataset both visually and
quantitatively, in comparison to various baseline approaches. Additionally, we demonstrate the
interpretability of the network by showing how the quality of the reconstructions from the
Reconstruction module correlates with the 2D lesion images, a trend not observed in the
baseline approaches.
We benchmark the performance of the mDOI-Net against the algorithmic method 3D-mDOI
and two prevalent Deep Learning networks: the Encoder-decoder and U-Net. The
algorithmic 3D-mDOI (introduced in Chapter 2) is the foundational approach,
lending its photon migration principles to the design of the Reconstruction module. This
baseline approach is labeled 3D-mDOI* because it is a version reimplemented with the PyTorch
library, which better adapts to the input data and facilitates comparison with the other
network approaches.
The Encoder-decoder and U-Net architectures are the networks most frequently utilized for
tackling inverse problems in Diffuse Optical Imaging. The Encoder-decoder network is designed
to compress data into a lower-dimensional space and subsequently reconstruct it, making it
adept at tasks requiring feature extraction and regeneration. The U-Net, on the other hand,
extends this architecture with a symmetric expanding path, which enables the precise localization
essential for high-resolution image segmentation tasks in DOI. The architectures of both the
Encoder-decoder and the U-Net are delineated in the Appendix.
We train the networks to convergence with a weighted L2 loss function between the 3D
ground truth and the computed 3D reconstruction. The network models are trained until
convergence using the commonly used ADAM optimizer with a learning rate of 10⁻⁵.
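For concreteness, a training loop of this kind might look as follows in PyTorch; model, loader, and the weights tensor are hypothetical placeholders, and the exact weighting scheme used in this work is not reproduced here.

import torch

def weighted_l2(pred, target, weights):
    # Weighted mean squared error between the computed 3D reconstruction
    # and the 3D ground truth.
    return (weights * (pred - target) ** 2).mean()

def train(model, loader, weights, epochs=100, device="cuda"):
    opt = torch.optim.Adam(model.parameters(), lr=1e-5)
    model.to(device).train()
    for _ in range(epochs):
        for reflectance, gt_volume in loader:
            loss = weighted_l2(model(reflectance.to(device)),
                               gt_volume.to(device), weights.to(device))
            opt.zero_grad()
            loss.backward()
            opt.step()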
3.6.2.1 Quantitative comparison of reconstructions
The performance of various network architectures is evaluated based on test weighted
mean squared error (MSE) loss, test 3D Structural Similarity Index Measure (SSIM) loss, and
processing time for reconstructing the 3D optical coefficients matrix (Table 3.8).
The 3D-mDOI model is the algorithmic baseline, with a test weighted MSE loss of 0.030
and a 3D SSIM loss of 0.42, alongside a relatively high processing time of 117 seconds (Table
3.8). This combination of moderate accuracy and longer processing time suggests that while
the model is capable of decent performance, its efficiency may be limited for applications
requiring rapid outputs.
Table 3.8 Performance comparison of 3D reconstruction from various approaches
The table evaluates performance based on test weighted Mean Squared Error (MSE) loss, test 3D
Structural Similarity Index Measure (SSIM) loss, and processing time. The PyTorch-reimplemented
3D-mDOI model, acting as the algorithmic baseline, achieves moderate MSE and SSIM losses but has a
lengthy processing time of 117 seconds. The U-Net architecture records the highest MSE of 0.19 and an
SSIM loss of 0.59, showing the poorest performance among the models. The Encoder-decoder, however,
exhibits lower MSE and SSIM losses, demonstrating higher precision with a very fast processing time of
0.004 seconds. The Reconstruction Module, with the lowest MSE of 0.0022 and SSIM loss of 0.0054, shows
the highest accuracy and the strongest capability to replicate structural details. It processes within 1.1 seconds,
balancing speed and detailed performance, and highlighting its superior reconstruction capabilities and
interpretability despite being slower than the standard Deep Learning baselines.
Among the standard Deep Learning baselines, the U-Net architecture shows the poorest
performance, with a relatively high MSE loss of 0.19 and an SSIM loss of 0.59 (Table 3.8). In
contrast, the Encoder-decoder exhibits lower MSE and SSIM losses, indicating higher precision
in its predictions. Speed-wise, these standard Deep Learning baselines process data
exceptionally quickly, typically within around 0.003 seconds, making them well-suited for real-time
applications where speed is essential. However, as a trade-off for this rapid processing,
these standard networks lack interpretability.
The Reconstruction Module records the lowest MSE loss at 0.0022 and SSIM loss at 0.0054
(Table 3.8), demonstrating the highest accuracy and ability to replicate structural details among
the evaluated models. With a processing time of 1.1 seconds, the Reconstruction Module
operates at a reasonable speed for many practical applications. Its superior reconstruction
performance and network interpretability ensure that the slower processing time does not
overshadow its overall advantages compared to the other models.
3.6.2.2 Visual comparison of reconstructions
We visually compare the performance of reconstructions from multiple approaches. Each
example is assessed through both a 2D surface slice and a 3D rendering of μa, with the
synthetic ground truth shown in Figure 3.19 A.
Our proposed Reconstruction Module visually outperforms the baseline models, delivering
3D lesion reconstructions that are closer to the ground truth (Figure 3.19 B). Additionally, we
compare the performance of the Gated Recurrent Unit (GRU) Enhanced Reconstruction module
with the Standard Reconstruction module (Chapter 3.5.3). The integration of the GRU proves a
double-edged sword, introducing extra noise that undermines the clarity and precision
essential for accurate 3D lesion interpretation (Figure 3.19 C); this finding aligns with our
quantitative measurements presented in Chapter 3.5.3.
The baseline 3D-mDOI approach provides a broad understanding of the overall shape and
trend within the tested lesion sample, but it falls short of accurately detailing finer aspects,
especially in the rendered outputs (Figure 3.19 D). The U-Net, while proficient at
preserving the shape in the XY slice, struggles with denoising the top surface and exhibits dot
artifacts (Figure 3.19 E). Additionally, its results fail to capture the decreasing trend in the Z
direction, highlighting its limitations in trend analysis across different axes. In contrast, the
Encoder-decoder network reliably identifies trends along the Z-axis but falters in accurately
delineating the shape in the XY plane, suggesting a potential trade-off between trend accuracy
and spatial resolution (Figure 3.19 F).
Figure 3.19 Visual comparison of reconstruction output for various approaches
We present a visual comparison of different reconstruction methods for 3D lesion modeling, evaluated
through 2D surface slices and 3D renderings of μa. The synthetic ground truth (A) is shown for reference.
We illustrate the output from our Reconstruction Module (B), which closely mirrors the ground truth,
demonstrating its superior performance over the baseline models. (C) displays the results from the Gated
Recurrent Unit (GRU) Enhanced Reconstruction module, which introduces extra noise and artifacts into the
reconstruction. The output of the PyTorch version 3D-mDOI* (D) provides a broad overview but lacks fine
detail. (E) depicts the Encoder-decoder's output, which successfully identifies trends along the Z-axis but
struggles with shape fidelity in the XY plane. The U-Net's output (F) maintains the XY shape but suffers
from noise issues and fails to capture the Z-direction trend, revealing the limitations of its skip-layer
design. This comprehensive visual analysis underscores the capabilities and limitations of each approach,
highlighting the superior accuracy and efficiency of our proposed Reconstruction Module.
The improved visual performance of the Encoder-decoder compared to the U-Net also stems
from its network design, which first encodes the information from the reflectance images into a
1D vector and then generates the 3D information from this vector. This frees the network
from the negative effects of architectural features such as skip layers. However, the encoded
vector is also the bottleneck of this network structure: we observe a lack of detail and of 2D
representativeness in the reconstructed results from the Encoder-decoder network.
The poor visual performance of the U-Net may be due to the skip-layer design of its
architecture, which transmits the structural information of the reflectance image directly to the
later stages of the network. While this design is highly beneficial in denoising tasks, where the
input and output images are similar in structure, it introduces dot artifacts in our specific use
case, hampering the performance of the reconstruction.
Our proposed Reconstruction Module, built upon the foundational principles of 3D-mDOI
and its robust understanding of photon migration, consistently outperforms the other baseline
models in producing more accurate results, showing its potential to establish a new
standard in imaging reconstruction efficiency.
3.6.2.3 Interpretability of Reconstruction module
We further utilize the quantitative metrics (Appendix) to analyze the performance of the
reconstruction. We choose the Dice coefficient, a commonly used metric to quantify the
performance of segmented reconstruction from the network output. It measures the similarity
between computed segmentation and the ground truth, providing a numerical value that
ranges from 0 (no overlap) to 1 (perfect overlap).
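A minimal sketch of the Dice computation is given below, assuming binary masks as NumPy arrays; the thresholding that converts the network's continuous output into a mask is not shown.

import numpy as np

def dice_coefficient(pred_mask, gt_mask, eps=1e-8):
    pred, gt = pred_mask.astype(bool), gt_mask.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    # 0 = no overlap, 1 = perfect overlap; eps avoids division by zero.
    return 2.0 * intersection / (pred.sum() + gt.sum() + eps)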
In Figure 3.20, we illustrate the relationship between the dynamic range of μa in the ground
truth phantoms and the Dice coefficient scores of the segmented reconstructions
generated by four different network architectures: the Reconstruction Module, Encoder-Decoder,
U-Net, and 3D-mDOI*.
Figure 3.20 Relationship between the dynamic range of the absorption coefficient μa in the synthetic dataset
and Dice coefficient scores for various reconstruction approaches
The figure demonstrates how the dynamic range of the absorption coefficient μa in ground truth phantoms
correlates with Dice coefficient scores across four different reconstruction approaches: the
Reconstruction Module, Encoder-Decoder, U-Net, and 3D-mDOI*. The Dice coefficient, ranging from 0 to
1, is a standard metric for evaluating segmentation performance against the ground truth. The
Reconstruction Module (magenta points) consistently achieves the highest Dice scores across nearly all
μa ranges, showcasing its superior ability to accurately reconstruct and segment synthetic lesions.
Meanwhile, the U-Net (yellow points) and Encoder-Decoder (cyan points) display moderate performance,
and the 3D-mDOI* (green points) shows the lowest overall performance, highlighting challenges in precise
segmentation of complex lesion shapes.
The Reconstruction Module (Figure 3.20, magenta points) consistently achieves the highest
Dice coefficients across nearly all dynamic ranges of μa, indicating its superior ability to
accurately reconstruct and segment synthetic lesions. In contrast, the U-Net (Figure
3.20, yellow points) and Encoder-Decoder (Figure 3.20, cyan points) show moderate Dice
coefficients. The 3D-mDOI* (Figure 3.20, green points) has the lowest overall performance, with
consistently lower Dice coefficients, indicating difficulties in precise segmentation on a
synthetic dataset with complex lesion shapes.
We also observe that the Reconstruction Module performs increasingly well as the dynamic
range of μa values expands, demonstrating robustness against variations in these values. This
behavior is understandable because a large dynamic range in μa typically signifies a clearer
distinction between the synthetic lesion and the surrounding tissue.
We can leverage the interpretable findings from the Reconstruction Module's outcomes to
boost network performance. By exposing the network to more cases with lower dynamic
ranges through data augmentation, we can enhance the effectiveness of the Reconstruction
Module.
3.6.3 Projection module
The Projection module stands as a pivotal component of the mDOI-Net framework,
translating the complex 3D optical characteristics of tissue into 2D reflectance images. We
compare the performance of the Projection module on the Synthetic Derm Cone Standard dataset
against the baseline method, both visually and quantitatively. The interpretability of the network's
test loss is also discussed in this section.
We choose the Encoder-decoder architecture (Appendix) as the baseline model. This model
excels at compressing high-dimensional input into a compact latent space, which efficiently
captures essential hierarchical features. This latent space is then expanded to render the 2D
reflectance image, utilizing the captured features.
To ensure a fair comparison, both the Projection module and the Encoder-decoder are
trained using the L1 loss between the computed reflectance and the ground truth. They are trained
with the ADAM optimizer until convergence, with a learning rate of 10⁻⁵.
3.6.3.1 Quantitative comparison of reflectance estimation
Similar to network architecture selection in Chapter 3.5.4 the evaluation of network
performance involves three metrics (Table 3.9): (i) the L1 loss between the computed
reflectance and the ground truth reflectance; (ii) the L1 loss calculated after applying average
pooling to these reflectance images, which reduce the stochastic effects of photon migration
and provides a more stable basis for network evaluation. (iii)the processing time for generating
a reflectance image via network for comparing the efficiency of the two network approaches.
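The sketch below assumes reflectance images stored as 4D PyTorch tensors (batch, channel, height, width); the pooling kernel size is an illustrative assumption, not the value used in this work.

import torch
import torch.nn.functional as F

def l1_metrics(pred, target, pool_kernel=4):
    raw_l1 = (pred - target).abs().mean()
    # Average pooling smooths the stochastic photon-migration speckle
    # before comparison, giving a more stable basis for evaluation.
    pooled_l1 = (F.avg_pool2d(pred, pool_kernel)
                 - F.avg_pool2d(target, pool_kernel)).abs().mean()
    return raw_l1.item(), pooled_l1.item()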
Table 3.9 Test performance and processing times for the Encoder-decoder and Projection module
The table summarizes the performance of two network architectures in terms of test L1 loss and
processing times. The networks are assessed on two test metrics: the L1 loss between the computed
reflectance and the ground truth, and the L1 loss after average pooling, which mitigates the stochastic
effects of photon migration, enhancing stability in network evaluations. On both metrics, the
Projection Module demonstrates superior accuracy with a lower L1 loss than the Encoder-decoder.
Despite its higher accuracy, the Projection Module requires more processing time, at 0.18
seconds, compared to the Encoder-decoder's rapid 0.0026 seconds, underscoring the trade-offs
between accuracy and efficiency when selecting network architectures for specific applications.
In general, the Projection Module outperforms the Encoder-decoder in accuracy,
maintaining lower L1 losses both before and after average pooling. Before pooling, the
Projection Module's test L1 loss (0.15) is lower than that of the Encoder-decoder (0.22),
a 31.8% improvement in approximation accuracy. The trend continues after average pooling:
the Projection Module's loss drops to 0.11, a 42.1% reduction relative to the
Encoder-decoder's 0.19.
However, the Projection Module's superior accuracy in reflectance estimation requires
more processing time than the Encoder-decoder. The Encoder-decoder processes at a notably
rapid pace of 0.0026 seconds, much faster than the Projection Module's 0.18 seconds.
3.6.3.2 Visual comparison of 2D reflectance images
The performance of the Projection module is then visually assessed by comparing the
generated 2D reflectance images with the ground truth. The visual comparison (Figure 3.21),
reveals the Projection module's capacity to replicate the diffuse patterns of light interaction
with the tissue. We include 4 cases from the testing data with its L1 loss listed in Table 3.10.
Lower the loss, better the performance is.
Case 1 (Figure 3.21) shows the resulting reflectance for which the Encoder-decoder obtains
the highest L1 loss (1.4) among all test samples (Table 3.10), revealing a stark contrast in
performance between the Projection module and the Encoder-decoder. The Projection module
displays a 2D Gaussian-like distribution of reflectance values centered around the middle,
closely resembling the Ground Truth. In contrast, the Encoder-decoder produces a highly
structured, grid-like artifact. This case suggests that the Encoder-decoder, often considered a
black box, tends to generate results that lack physical meaning, as we cannot control what it
learns.
Figure 3.21 Visual comparison of the Projection Module and Encoder-decoder in estimating 2D
reflectance images
The figure visually assesses the performance of the Projection module and Encoder-decoder by comparing
their generated 2D reflectance images with the Ground Truth across four distinct cases. Case 1 highlights
the Encoder-decoder displaying highly structured, grid-like artifacts, unlike the more accurate Gaussian-like
distribution produced by the Projection module. In Cases 2 and 3, the results from both models
reflect the diffuse patterns seen in the Ground Truth; the Projection module reproduces more of the
diffuse distribution, while the Encoder-decoder captures the location of the lesion more clearly. Case 4
showcases the best reflectance generation, with the Projection module slightly outperforming the
Encoder-decoder. Overall, the Projection module consistently demonstrates a closer resemblance to the
Ground Truth and avoids producing the non-physical artifacts typical of the Encoder-decoder, confirming
its advantage in estimating 2D reflectance more accurately.
Case 2 shows the resulting reflectance for which the Projection module performs the worst
among all test samples, reaching a test L1 loss of 0.9 (Table 3.10). Though both networks
capture the general location of the lesion, represented by the blue hole in the reflectance
images (Figure 3.21, Case 2), the Encoder-decoder outputs a clearer and more defined pattern
than the Projection module. The lack of clarity of the lesion in the Projection module is caused
by its intent to learn the stochastic diffuse pattern. Neither network fully captures the
significant degradation in the ground truth reflectance.
In Case 3, the test L1 losses for both approaches are the same, each with a value of 0.12
(Table 3.10). The results from the two models capture a general diffuse pattern and closely mirror
the Ground Truth. The Projection module reproduces more of the diffuse distribution, while
the Encoder-decoder captures the location of the lesion more clearly.
Case 4 displays the best reflectance generations from both models, revealing the complex,
diffuse distribution seen in the Ground Truth (Figure 3.21, Case 4). The Projection module's
output slightly outperforms the Encoder-decoder's, with losses of 0.029 versus 0.042
(Table 3.10).
Table 3.10 Comparative L1 Loss for test dataset between Encoder-decoder and Projection module
across four test cases
The table displays the L1 loss values for the Encoder-decoder and Projection module across four distinct
test cases. The data illustrates the differences in performance between the two models, with the
Projection module generally achieving lower L1 loss values, indicative of closer adherence to the ground
truth. Particularly noteworthy are the results from Case 1, where the Projection module significantly
outperforms the Encoder-decoder, underscoring its efficiency in generating more accurate 2D reflectance
images without the non-physical artifacts typical of the Encoder-decoder.
In conclusion, the comparative analysis across multiple test cases shows that the Projection
module consistently outperforms the Encoder-decoder in estimating 2D reflectance. The
Projection module not only maintains a closer resemblance to the ground truth with its diffused
distribution but also, unlike the Encoder-decoder, avoids generating non-physical, structured
artifacts.
3.6.3.3 Interpretability of Projection module
We further analyze the interpretability of test L1 loss of two network architectures—the
Projection Module and the Encoder-decoder—to determine if it correlates with the physical
properties of the test scenarios. This analysis involves examining how different illumination
locations affect the normalized L1 loss of the 2D reflectance generation. In Figure 3.22, the
average of the L1 loss (shown as color dots) among the test samples with the same illumination
is computed and normalized. This loss is further color-mapped, with hues closer to magenta
indicating higher losses and those closer to cyan indicating lower losses. The X and Y axes
represent the pixel coordinates of the illumination in the reflectance image. A red diamond
marks the center coordinates of the reflectance image, pinpointing the lesion's location.
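A sketch of this per-illumination aggregation is given below, assuming a list of (x, y, L1 loss) records per test sample; the normalization and color map follow the description above, while the function and variable names are hypothetical.

import numpy as np
import matplotlib.pyplot as plt

def illumination_loss_map(records):
    """records: iterable of (x_pixel, y_pixel, l1_loss) for the test samples."""
    sums, counts = {}, {}
    for x, y, loss in records:
        sums[(x, y)] = sums.get((x, y), 0.0) + loss
        counts[(x, y)] = counts.get((x, y), 0) + 1
    coords = np.array(list(sums))
    means = np.array([sums[c] / counts[c] for c in sums])
    # Normalize the per-location average loss into [0, 1] for color mapping.
    norm = (means - means.min()) / max(np.ptp(means), 1e-12)
    plt.scatter(coords[:, 0], coords[:, 1], c=norm, cmap="cool")  # cyan -> magenta
    plt.colorbar(label="normalized L1 loss")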
For both network architectures, the L1 loss generally increases as the illumination point
moves closer to the image center (more magenta hues). This pattern is logical, since
illumination closer to the lesion tends to produce reflectance images that deviate from a 2D
Gaussian-like diffusion pattern, making them more complex and challenging to model. This
scenario is exemplified by Cases 2 and 3 of Figure 3.21, which are notably more difficult for the
networks to learn than Cases 1 and 4.
Interestingly, while the L1 loss in the Projection Module consistently adheres to this trend,
the Encoder-decoder exhibits inconsistencies, particularly in the third row of Figure 3.22 B.
The discrepancies observed in the Encoder-decoder could stem from its propensity to generate
highly structured, grid-like artifacts that lack physical plausibility. This finding suggests that the
L1 loss from the Projection Module is more interpretable and predictable than that from the
Encoder-decoder, indicating more reliable performance when transitioning to clinical use.
Figure 3.22 Effect of illumination location on normalized L1 loss for two network architectures
The figure analyzes the normalized L1 loss of the Projection Module (A) and the Encoder-decoder (B) as
influenced by varying illumination locations in 2D reflectance generation. We compute the average
normalized L1 loss (colored dots) among the test cases sharing the same pixel coordinates of the illumination
within the reflectance image, with magenta indicating higher losses and cyan representing lower losses.
As the illumination point moves closer to the image center (red diamond), the L1 loss typically increases,
reflecting the greater complexity and challenge of modeling deviations from a 2D Gaussian-like
diffusion pattern caused by the lesion. Notably, while the Projection Module consistently follows this
trend, the Encoder-decoder shows inconsistencies, especially in the third row of Panel B, possibly due to
its tendency to generate unphysical, structured artifacts. These findings highlight the Projection Module's
superior interpretability and predictability of L1 loss, suggesting its potential for clinical applications.
3.6.4 Domain adaptation
Domain adaptation employs techniques that enhance a network's performance on
advanced datasets following initial pretraining on a basic synthetic dataset. The Reconstruction
Module, pretrained on this basic dataset, serves as the baseline approach. The
Teacher-student mDOI-Net is trained via semi-supervised learning on more complex datasets,
with its performance evaluated both visually and quantitatively. As we move from simpler to
more complex synthetic datasets, the network's capability to handle diverse datasets can
potentially increase. These tests help validate the feasibility of a seamless
transition from synthetic to real clinical datasets.
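One common form of teacher-student semi-supervised learning is sketched below. It assumes the teacher is a slowly updated copy of the pretrained Reconstruction Module that generates pseudo-labels for the unlabeled advanced dataset; the EMA rate and loss choice are illustrative assumptions, not the exact mDOI-Net configuration.

import copy
import torch
import torch.nn.functional as F

def teacher_student_step(student, teacher, reflectance, optimizer):
    with torch.no_grad():
        pseudo_target = teacher(reflectance)      # teacher's pseudo-label
    loss = F.mse_loss(student(reflectance), pseudo_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    with torch.no_grad():                         # slow EMA update of the teacher
        for t_p, s_p in zip(teacher.parameters(), student.parameters()):
            t_p.mul_(0.999).add_(s_p, alpha=0.001)
    return loss.item()

# e.g. teacher = copy.deepcopy(pretrained_reconstruction_module).eval()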
We test three different domain adaptation tasks: (i) Adaptation from Synthetic Derm Cone
Standard to Synthetic Derm Cone Constrained. (ii) Adaptation from Synthetic Derm Cone
Standard to Synthetic Derm Cone Enhanced. (iii) Adaptation from Synthetic Uniform Cylinder to
OPTIMAP Uniform Cylinder.
Task (i) is considered the simplest of the three. It involves transitioning between two
datasets that use the same generation process for their 3D synthetic phantoms. The only
difference is the depth of the synthetic lesion in the Constrained dataset, which is adjusted
based on biopsy-verified Breslow depth. The primary adaptation challenge here is
accommodating the variations in lesion thickness between the two datasets.
Task (ii) also utilizes synthetic datasets but introduces slightly different 3D shapes and μa
values for the lesions in the Enhanced dataset. Here, the network needs to adapt to variations
in both the shape and the optical coefficients of the lesions.
Task (iii) involves a more complex scenario, using both synthetic and OPTIMAP datasets.
This task requires the network to tackle multiple subproblems, including learning the diffuse
patterns of reflectance images under varied structured illumination, converting captured
intensity into reflectance, and denoising data from experimental noise. It represents the first
trial to validate the potential of the proposed network to handle real imaging data.
3.6.4.1 Quantitative Comparison of mDOI-Net
We compare the performance of two network architectures, the Reconstruction Module
and the Teacher-student mDOI-Net, in transferring between basic and more advanced datasets,
using metrics such as the test weighted Mean Squared Error (MSE) loss and the test 3D Structural
Similarity Index Measure (SSIM) loss introduced in Chapter 3.5.2.
Table 3.11 Comparison of the Reconstruction Module's and Teacher-student mDOI-Net's performance
across various cases of domain adaptation
We list the test weighted Mean Squared Error (MSE) loss and test 3D Structural Similarity Index Measure
(SSIM) loss for transitions between different pretrained datasets and more advanced datasets. Notably,
the Teacher-student mDOI-Net consistently shows lower MSE and SSIM losses than the
Reconstruction Module, especially in the transition from Synthetic Uniform Cylinder to OPTIMAP Uniform
Cylinder. This result indicates the Teacher-student mDOI-Net's superior performance in adapting to
real-world data with better accuracy and structural integrity.
In the adaptation from Synthetic Derm Cone Standard to Synthetic Derm Cone Constrained,
the Teacher-student mDOI-Net slightly outperforms the Reconstruction Module, achieving
lower MSE and SSIM losses (Table 3.11). This trend continues in the transition from
Synthetic Derm Cone Standard to Synthetic Derm Cone Enhanced, where the Teacher-student
mDOI-Net again shows better accuracy with a lower MSE loss of 0.012, although its SSIM loss
of 0.022 is comparable to the Reconstruction Module's 0.021.
The most notable difference is observed in the transition from Synthetic Uniform Cylinder
to OPTIMAP Uniform Cylinder. Here, the Teacher-student mDOI-Net demonstrates markedly
better performance with an MSE loss of 0.034 and an SSIM loss of 0.18, compared to the
Reconstruction Module's 0.063 and 0.52, respectively. This indicates that the Teacher-student
mDOI-Net transitions to real-world data more effectively, maintaining better accuracy and
structural integrity.
Overall, while both architectures show strengths, the Teacher-student mDOI-Net
consistently records lower MSE Loss across all test cases, suggesting it offers more robust
performance, particularly in scenarios involving substantial variations in data characteristics.
3.6.4.2 Visual Comparison of Reconstructions
We showcase the performance of the first two domain adaptation scenarios: transitioning
from the Synthetic Derm Cone Standard dataset to the Synthetic Derm Cone Enhanced and Constrained
datasets (Figure 3.23). The test sample from the Standard dataset is reconstructed using the
Reconstruction Module, while the samples from the Enhanced and Constrained datasets are
processed using mDOI-Net. The effectiveness of the reconstructed 3D μa matrices across these
datasets is evaluated against the synthetic lesion ground truth, with each example
featuring both a 2D surface slice and a 3D rendering of μa.
Figure 3.23 Comparative analysis of domain adaptation in Synthetic Derm Cone datasets: ground
truth versus computed reconstructions
This figure illustrates the performance of domain adaptation scenarios in transitioning from the Synthetic
Derm Cone Standard dataset to the Synthetic Derm Cone Enhanced and Constrained datasets.
The Reconstruction Module is applied to the Standard dataset (row 1), and its result demonstrates high
fidelity in both 2D and 3D representations. mDOI-Net is applied to the Enhanced and Constrained
datasets. While the Enhanced dataset reconstruction shows reasonable visual similarity with some
deviations in shape and intensity (row 2), the Constrained dataset reconstruction effectively captures
variations in lesion thickness (row 3). This visual comparison underscores the effectiveness of the network
architectures in handling complex reconstructions and highlights the potential of domain adaptation
techniques in transitioning from basic to more advanced datasets.
All the reconstructed outputs successfully capture the general shape properties of the
lesions, closely aligning with the respective ground truths across the different datasets. The result
from the Reconstruction module (Figure 3.23, row 1) demonstrates high fidelity in replicating the
ground truth in both 2D and 3D representations, indicative of the module's effective learning
capability. The reconstruction of the Enhanced dataset by mDOI-Net
(Figure 3.23, row 2) shows reasonable visual similarity but exhibits some deviations in shape and
intensity, especially in the 3D renderings. Moreover, the reconstruction of the Constrained
dataset (Figure 3.23, row 3) effectively captures variations in lesion thickness, a key
characteristic distinguishing the Standard and Constrained datasets. The visual analysis of the
mDOI-Net results highlights the capabilities of its network architectures in managing
reconstructions from multiple complex datasets and underscores the potential of domain
adaptation in transitioning from basic to more advanced datasets.
Figure 3.24 Visual comparison of domain adaptation outcomes in the Reconstruction module and mDOI-Net
The figure illustrates the effectiveness of domain adaptation techniques through the reconstructed μa
outputs from the Reconstruction Module and the Teacher-student mDOI-Net. The ground truth (first row)
shows the expected lesion shape in both a 2D slice and a 3D rendering, serving as a benchmark for the
subsequent reconstructions. The output from the Reconstruction Module (second row) indicates the
lesion's location but lacks precision in reproducing the μa values. The output from the Teacher-student
mDOI-Net (third row) is enhanced through semi-supervised learning on the OPTIMAP dataset, with
noticeable improvements evident in both the 2D slice and the 3D rendering. Though not flawless, this
comparison highlights the advantages of advanced domain adaptation techniques, particularly in
real-data scenarios.
Figure 3.24 illustrates the effectiveness of domain adaptation techniques in improving
network performance through a visual comparison of the surface slices and 3D renderings of
the reconstructed μa matrix from both the Reconstruction Module and the Teacher-student
mDOI-Net. The first row of Figure 3.24, labeled "Ground truth," shows the expected shape of
the lesion in both 2D slice and 3D rendering forms, providing a reference for evaluating the
subsequent reconstructions.
The Reconstruction Module's output (row 2) is pretrained on a basic synthetic dataset and
then tested directly on real data. While the output indicates the lesion's location, with a higher
spatial frequency of dot artifacts in the center than in the surrounding areas, it lacks the
granularity and precision to accurately reproduce the μa values. This visually confirms the
limitations of training exclusively on simpler synthetic datasets.
The output of the Teacher-student mDOI-Net, which has been enhanced through semi-supervised
learning on the OPTIMAP dataset, shows marked improvements, evident
from the more detailed reconstructions in both the 2D slice and the 3D rendering. Although not
perfect, the comparative visualization of the Teacher-student mDOI-Net's output alongside the
Reconstruction Module's output highlights the potential benefits of advanced domain
adaptation techniques, particularly in scenarios involving real datasets.
3.7 mDOI-Net improvements over 3D-mDOI and standard ML
We perform a comprehensive assessment of the performance improvements and clinical
adoption potential achieved by our physics-enhanced neural networks against the traditional
3D-mDOI and Deep Learning baseline approaches. This section provides a detailed comparison of
four methods for reconstructing 3D information from multiplexed Diffuse Optical Imaging data:
Algorithmic (3D-mDOI), ML Supervised (DL baselines), ML Supervised (Reconstruction module), and ML
Unsupervised (mDOI-NET).
3.7.1 Evaluation for reconstruction performance
Based on the reconstruction results shown in Chapter 3.6, we summarize the improvements
of mDOI-Net reconstruction in two general areas: performance/accuracy and efficiency/speed.
Performance and accuracy concern the type and accuracy of the 3D reconstruction provided
by the various approaches, as well as their effectiveness in handling artifacts and noise. The
ability to deliver precise and reliable images is fundamental for any technology aimed at clinical
use.
Efficiency and speed concern how quickly a model can be trained and deployed, which directly
impacts its eventual utility in a clinical environment. Metrics such as training convergence and
deployment efficiency are critical, especially in settings where timely results can significantly
influence clinical decisions and patient outcomes.
3.7.1.1 mDOI-Net improvements in reconstruction performance
For performance analysis (Table 3.12), 3D-mDOI is characterized as a baseline method for
comparing the loss improvement among Deep Learning approaches. Since 3D-mDOI is an
algorithmic reconstruction approach, its handling of artifacts and noise is considered weak,
often leaving grid artifacts on the surface of the reconstruction. Though U-Net shows no
improvement over baseline, the Encoder-decoder network demonstrate an improvement in
loss, having 93% lower loss comparing to 3D-mDOI. It manages artifacts and noise to a degree,
172
removing the grid artifacts found in 3D-mDOI, but it does introduce its own artifacts. The
Reconstruction module excels beyond the Encoder-decoder, achieving a relative improvement
of 95% superior to 3D-mDOI. This model not only enhances the precision of reconstructions but
also boasts strong artifact and noise management, crucial for producing clear reconstruction
results without artifacts. mDOI-NET, with its Semi supervised learning capabilities, further
demonstrates a remarkable 37.05% improvement in loss over its supervised counterparts in
tasks of domain adaptation. This enhancement is particularly noteworthy as it achieves these
results without relying on extensive labeled datasets, thereby broadening its applicability in
diverse clinical settings where such data may be scarce.
Table 3.12 Performance and processing time comparison of different approaches for 3D
reconstruction
This table illustrates the performance of various Deep Learning approaches in comparison to
the baseline 3D-mDOI, which is used primarily to measure loss improvement. Notably, the Reconstruction
module surpasses the Encoder-decoder with a 95.69% improvement in testing loss. This method processes
a sample reconstruction in 1.1 seconds, effectively balancing speed and performance. In contrast,
mDOI-NET shows a 37.05% improvement in the domain adaptation task, which is significant given its ability to
perform without extensive labeled datasets. mDOI-NET matches the deployment speeds of the fastest DL
baselines at 0.0032 seconds, making it viable for high-speed clinical applications.
Though 3D-mDOI achieves a remarkable 94% reduction in computation time compared to
the Finite Element Method (Chapter 2.5.4), it still requires 117 seconds for a reconstruction,
making it the slowest option among all the Deep Learning-based methods we considered. The DL
baselines, by contrast, are exceptionally fast, deploying in around 0.003 seconds,
making them ideally suited for high-speed clinical applications. The Reconstruction module
takes slightly longer than the DL baselines to reach training convergence, due to the addition of
both the photon-banana lifting and projecting layers (Chapter 3.4). Nevertheless, it boasts a swift
processing time of 1.1 seconds, effectively balancing speed and performance and bridging the
gap between traditional methods and advanced Deep Learning techniques.
Shifting focus to mDOI-NET, especially its GAN structure, this method encounters more
substantial challenges in training convergence due to the need to balance the generator and
discriminator dynamics. This complexity might hinder its adaptability in scenarios where rapid
deployment is essential. Despite taking longer to train, mDOI-NET's processing time is 0.0032
seconds, similar to that of the DL baselines. The advantages of mDOI-NET suggest that the
trade-off of a longer training time is a small price to pay for significant gains in diagnostic
capability and computational speed.
In order to comprehensively assess the potential for clinical adoption of various medical
imaging technologies, it is essential to employ a multidimensional set of evaluation metrics.
These metrics go beyond mere reconstruction performance and efficiency, encompassing a
range of criteria that together provide a holistic view of a method's applicability in clinical
settings. This section introduces and explains the following key metrics used to evaluate clinical
adoption potential.
Data and Training Requirements: The scalability of a model in clinical settings often
depends on its data requirements. This includes the necessity for large datasets and whether
the technology can function only with the aid of 3D ground truth data, which may not always be
readily available in clinical environments.
Interpretability and Usability: For medical technologies to be effectively integrated into
clinical workflows, they must not only be robust and flexible but also interpretable and easy to
use. Factors such as model and data bias, along with the integration of bio-photonics theories,
play a crucial role in ensuring that the outputs are understandable and actionable by medical
professionals.
Flexibility and Robustness: Technologies that adapt well to different clinical conditions and
variable data quality are more likely to be successfully integrated into clinical practice. Metrics
for this category include clinical translatability and adaptability to variability, assessing how well
the technology performs under diverse operational conditions.
By evaluating medical imaging technologies across these metrics, researchers and clinicians
can better understand their potential impact and limitations, facilitating more informed
decisions regarding their adoption and implementation in real-world clinical settings.
3.7.1.2 mDOI-Net improvements in clinical adoption potential
3D-mDOI is an algorithmic method that does not require any training process. To enhance
the generalizability of 3D-mDOI, we proposed the Reconstruction module by incorporating
Deep Learning components into the 3D-mDOI pipeline. While this integration significantly
improves the reconstruction performance of 3D-mDOI, it necessitates a training process, which
raises a concern for clinical usage, as clinical datasets are hard to access. Unlike the Deep Learning
baselines and the Reconstruction module, which require large datasets and depend on 3D
ground truth data when transitioning from synthetic to clinical datasets, mDOI-NET significantly
mitigates this dependency. By leveraging reconstruction and projection modules pretrained
on synthetic datasets, mDOI-NET achieves high accuracy in domain adaptation with
fewer real-world examples. This innovative approach not only lowers the barriers associated
with the extensive data gathering often encountered in medical fields but also reduces overall
training time and resource consumption, streamlining the path from research to clinical
application.
As for interpretability, 3D-mDOI strictly adheres to bio-photonics theory, which provides a
solid theoretical foundation. However, its generalizability is limited by the model's
simplicity, as not all biophysical rules can be easily modeled analytically. At the other extreme,
reconstruction using the Deep Learning baselines is a purely data-driven method. Although this
approach avoids model bias, it still suffers from data bias, as there is little explanation of how
the dataset guides the network's reconstruction. The physics-based
Reconstruction module and mDOI-NET are positioned in the middle of this spectrum, enhancing
the interpretability of Deep Learning networks. This ease of interpretation is crucial for clinical
adoption, ensuring that practitioners can trust and effectively utilize the models. Additionally,
the design of these models facilitates easy adjustment and fine-tuning, allowing rapid
corrections and updates based on ongoing clinical feedback and evolving medical knowledge.
We position mDOI-NET as the most adaptable and clinically applicable of the
technologies evaluated. While the ML Supervised methods, including the Deep Learning baselines and
the Reconstruction module, excel in data-rich environments, they require substantial resources
to train the network. 3D-mDOI, with its lengthy computation times and limited model
generalizability, is better suited for research than for clinical applications. In contrast, the
flexibility of mDOI-NET is demonstrated by its robust domain adaptation capabilities, which
facilitate a seamless transition from experimental to clinical settings. This adaptability is
essential for managing unexpected cases and variations in experimental conditions, such as
fluctuating Signal-to-Noise Ratios (SNRs) and diverse levels of empirical noise. The versatility of
mDOI-NET enables it to perform reliably across a wide range of scenarios, making it an
invaluable tool in the rapidly evolving field of medical diagnostics.
3.7.2 Specialized adoption scenarios among various methods
We argue against the one-size-fits-all solution in the development of medical imaging
technologies. Recognizing that each method has its distinct advantages and disadvantages, it is
crucial to match these capabilities with appropriate clinical applications.
Algorithmic (3D-mDOI)
• Best used for obtaining a coarse reconstruction of the detected object without
requiring any training process.
• Ideal for delivering explainable reconstructions underpinned by a strong theoretical
foundation in bio-photonics, when computational speed is not a concern.
ML Supervised (DL baselines)
• Optimal for data-driven, large-scale medical studies that can supply extensive
high-quality data.
• Recommended for environments where fast computational speed is a critical
priority.
• 3D ground truth data is needed in the training process.
• Suitable only when network interpretability is not required.
ML Supervised (Reconstruction module)
• The high interpretability of the Reconstruction module makes it a promising method for
enhancing patient outcomes and supporting medical decision-making.
• Suitable for settings where rapid deployment is necessary but some complexity
and higher training computational resources are manageable.
• 3D ground truth data is needed in the training process.
ML Unsupervised (mDOI-Net)
• The preferred method for domain adaptation with limited access to data.
• With its strong performance in clinical translatability and adaptability to variability, it
is well-suited for exploratory and adaptive clinical environments.
In general, each method presents unique benefits and trade-offs among accuracy, noise
handling, training convergence, and deployment efficiency. The choice among these models
depends on specific clinical requirements, such as the absence of training (3D-mDOI), reduced
time-to-results (ML Supervised DL baselines), high accuracy (ML Supervised Reconstruction
module), or adaptability in unsupervised settings (mDOI-NET). This analysis helps in selecting
the appropriate technology for the operational demands and specific clinical scenarios
at hand.
3.8 Limitations and Future Work for mDOI-Net
Despite the promising capabilities of mDOI-Net, our approach to data generation
encounters some limitations, specifically in the large-scale diversity of the training dataset
and its inherent training costs. The current framework of mDOI-Net depends heavily on
synthetic datasets, primarily derived from the HAM10000 dataset, to train its Deep Learning
architecture. While this has enabled the initial development of a robust network capable of 3D
tissue structure estimation, it also challenges the system's ability to adapt to real-world
scenarios. The synthetic nature of the data may not capture all the variability and
complexity of physical lesion samples, which can limit the model's predictive accuracy in clinical
applications.
To address the constraints in data generation and improve the model's robustness, we
could explore more advanced data-synthesis techniques to extend the complexity of the skin
model in the future. By augmenting our dataset, we aim to introduce a wider array of skin and
lesion characteristics, thereby enhancing the network's ability to learn more diverse patterns
and features. Firstly, the current 3D synthetic phantom uses a uniform dermis to cover the
synthetic lesion. While this serves basic modeling purposes, it lacks the complexity needed to
fully mimic the heterogeneous nature of human skin: physical skin samples exhibit varying
layers, melanin and collagen levels, and features such as freckles, scar tissue, hair, or tattoos
(a minimal noise-based sketch follows this paragraph). Secondly, future iterations of the 3D
lesion generation are expected to represent the intricate structures of melanoma more
accurately, including detailed vasculature and the varying thicknesses and patterns of cancer
invasion across the skin layers. Such advancements would significantly improve the model's
usefulness in medical research, potentially aiding early detection and treatment planning of
melanoma by providing a more precise and detailed representation of the disease's
manifestation in the skin.
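As a concrete illustration of the first point, the short sketch below perturbs a uniform dermis absorption map with multi-octave value noise, a simple stand-in for the Perlin-style heterogeneity used in related augmentation work [125, 126]. Every function name, grid size, and coefficient here is an illustrative assumption rather than a setting from our pipeline.

import numpy as np
from scipy.ndimage import zoom

def heterogeneous_dermis(shape=(64, 64, 32), mu_a_base=0.03,
                         variation=0.3, octaves=3, seed=0):
    # Sum several upsampled random grids; finer octaves contribute
    # progressively smaller amplitudes, giving smooth heterogeneity.
    rng = np.random.default_rng(seed)
    noise = np.zeros(shape)
    for o in range(octaves):
        res = 2 ** (o + 2)
        coarse = rng.standard_normal((res, res, max(res // 2, 2)))
        factors = [s / c for s, c in zip(shape, coarse.shape)]
        noise += zoom(coarse, factors, order=3) * (0.5 ** o)
    noise = (noise - noise.min()) / (np.ptp(noise) + 1e-12)  # map to [0, 1]
    # Absorption varies within +/- `variation` of the baseline value.
    return mu_a_base * (1.0 + variation * (2.0 * noise - 1.0))

The same noise volume could equally modulate scattering, collagen density, or layer boundaries to emulate freckles, scars, and other local structures.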
These enhancements in the variety of data simulation come at the cost of extended training
times and increased computational resources. Even though mDOI-Net surpasses the
algorithmic 3D-mDOI in deployment time, its training and testing phases are longer than those
of end-to-end baseline solutions. mDOI-Net's integration of photon migration in the network
further requires additional memory and computational power compared with the end-to-end
baselines. These extensive training-time and computational demands become a non-trivial
issue when we attempt to scale up the diversity of the training dataset, and the substantial
computational cost may become a bottleneck for a medical device aiming to support
techniques such as online training and edge computing. Recognizing this, one of our primary
goals for future work is to improve the system's scalability and generalizability without further
inflating the network's training time and computational requirements.
This thesis presents the results of preclinical research and development of optical biopsy
using mDOI-Net, which holds significant potential for impactful applications, especially in
primary care settings. mDOI-Net's capability to quickly produce 3D reconstructions of tissue
lesions from 2D reflectance data has important potential for early diagnosis and treatment
planning.
As we transition from preclinical to clinical phases, we are meticulously planning clinical
trials to assess the efficacy and safety of mDOI-Net in real-world medical settings. Robust
prototypes will be developed for these trials to ensure that mDOI-Net delivers consistent and
repeatable results across various scenarios. We will establish partnerships with clinical
stakeholders to obtain melanoma pathology slides as benchmarks to calibrate and refine
mDOI-Net's diagnostic capabilities. These steps are designed to continuously evolve mDOI-Net
into a clinical tool that will eventually become integral to the diagnostic process. The
cost-efficiency, flexibility, and interpretability of mDOI-Net make it a promising method for
enhancing patient outcomes and supporting medical decision-making.
3.9 Summary
Throughout this chapter, we introduce mDOI-Net, an innovative Deep Learning framework
that refines the conventional workflow of 3D-mDOI. The dual-module neural network, with its
reconstruction and projection nets, represents a significant leap forward in the field. Here, we
reflect on the advantages of mDOI-Net, the trade-off between computational time and
interpretability, and its potential usage.
mDOI-Net brings multiple advantages to the reconstruction of DOI, chief among them being
its ability to perform Deep Learning training workflows while circumventing the need for
extensive 3D tissue models during training. This innovation not only simplifies the process but
also reduces the dependency on vast datasets that are often a challenge to procure.
Additionally, the framework's versatility allows for independent or combined operation of its
networks, ensuring adaptability to a wide range of diagnostic challenges.
An inherent trade-off within mDOI-Net lies in balancing computational efficiency with the
depth of interpretability. While the system is designed to provide a detailed understanding of
its processes, this comes at the cost of increased computational time. However, this trade-off is
justified by the value of interpretability in medical contexts, where understanding the 'why'
behind a diagnosis is as crucial as the diagnosis itself. Enhancing the trust and confidence of
medical practitioners and their patients in DOI technology is a priority, and mDOI-Net's
interpretable nature contributes significantly to this end.
The potential usage of mDOI-Net extends into various realms of clinical diagnostics, with a
particularly profound impact anticipated in dermatological imaging. Its ability to rapidly and
accurately reconstruct tissue optical properties has the potential to facilitate early detection of
pathologies such as melanoma. The flexibility of mDOI-Net suggests its applicability beyond
dermatology, into any medical field where DOI can be leveraged for non-invasive diagnostics.
With continued refinement, particularly in computational optimization, mDOI-Net can
become an indispensable tool for clinicians, enhancing diagnostic workflows and contributing
to improved patient outcomes. The implications of this work are vast, signaling a new era
where advanced technology and clinical practice converge to push the boundaries of what is
possible in medical diagnostics.
Chapter 4 Conclusions
This thesis explores the innovative imaging techniques of multisite diffuse optical imaging
(mDOI) as a transformative approach for diagnosing melanoma. In this chapter, we will first
summarize the findings from our current research. Next, we will elaborate on why the method
is innovative and beneficial for melanoma diagnosis and beyond. Lastly, we will discuss the
future directions for transitioning the technology from its current version to a clinical device.
4.1 Summary of current mDOI findings
Melanoma, a type of skin cancer, has been a growing concern in the United States. Tumor
depth is a crucial diagnostic and staging parameter because it significantly influences prognosis
and treatment strategies. However, the assessment of 3D melanoma insights is often not
included in initial dermatological examinations. mDOI technologies provide a potential solution
for assessing 3D insights based on subsurface composition differences with a versatile,
non-invasive, and cost-effective hardware setup.
In Chapter 1, we discussed traditional optical biopsy, which provides immediate, in situ
evaluations without tissue removal, aiding in the early detection of diseases like cancer. We
also explored its limitations, including limited availability and high costs. Additionally, we
introduced mathematical models describing how light interacts with tissue and how diffuse
optical imaging techniques apply these models to reconstruct a tissue’s optical properties and
differentiate lesions from surrounding tissue. The chapter further detailed how machine
learning, particularly Deep Learning, is being applied to enhance skin diagnosis and improve the
reconstruction processes in diffuse optical imaging.
Following the background information and motivation of the project, in Chapter 2 we
introduced the 3D-mDOI pipeline. Here, we presented the Tissue Imaging Multisite Acquisition
Platform (OPTIMAP), a high-efficiency, contactless, low-cost automated image acquisition
system. This system is an adaptation of the traditional diffuse optical imaging device, enhanced
through structured illumination. We developed a customized algorithmic solution for
reconstructing a 3D volumetric matrix of a lesion's optical attributes. Our analytical solution
offers higher reconstruction fidelity and efficiency with reduced computational load compared
to the gold standard Finite Element Method (FEM), as it is better suited to OPTIMAP's design,
which emphasizes low cost and high signal-to-noise ratio. Additionally, we developed a
comprehensive quantitative evaluation pipeline that enables assessment of various aspects of
reconstruction quality and verified the performance of 3D-mDOI using simulated and physical
phantoms. Overall, 3D-mDOI holds significant potential for accessing subsurface information in
biological tissues.
Chapter 3 introduces a Deep Learning alternative to address the challenge of enhancing
adaptability and computational efficiency. Due to the limited availability of both tools capable
of collecting 3D data and volumetric datasets in dermatology, we created a simulated lesion
starting from a 2D dermoscopic lesion image and computed how simulated 3D lesions and the
corresponding reemitted 2D light would appear when the 3D lesion phantom is imaged with
our OPTIMAP instrument. The simulation dataset created allows for large-scale data generation
at low cost while maintaining good biological diversity. Besides the simulated training dataset,
we developed an interpretable Deep Learning network, mDOI-Net, for 3D optical coefficients
reconstruction to optimize our 3D-mDOI pipeline. The mDOI-Net combines a Reconstruction
module, which estimates 3D information from a 2D acquisition, with a Projection module,
which estimates our instrument's acquisition from the 3D reconstruction. Our network achieves
superior speed and enhanced quality, allowing network transferability with limited 3D data.
Overall, it has higher potential for clinical transition than 3D-mDOI.
4.2 Advancing clinical practice with mDOI
In this section, we highlight mDOI's strengths in delivering accessible, user-friendly, and widely
applicable 3D optical-property insights for skin cancer diagnostics and beyond.
mDOI techniques introduce a low-cost, accessible imaging solution that can potentially be
integrated into primary care settings and even developed into clinical devices. We utilized
off-the-shelf components such as a Digital Light Processor and a CMOS camera to build our
OPTIMAP prototype. The total cost of the prototype is around $3,000, significantly lower than
traditional optical biopsy equipment, which generally starts around $60,000. By adapting the
customized design of OPTIMAP for clinical devices, the cost of the imaging system could be
further reduced to an estimated $200, a price affordable for family doctors and even patients.
In the mDOI pipeline, data captured by a low-cost imaging system undergoes analysis to
provide both reference and quantitative measurements, demonstrating the method's efficacy.
In particular, the visualization of 3D reconstructed outputs of optical properties effectively
accentuates differences in lesion characteristics compared to surrounding tissues. A
comprehensive quantitative analysis is developed, providing insights into the 3D structure of
lesions that go beyond the traditional 2D ABCDE (Asymmetry, Border, Color, Diameter,
Evolving) criteria. The combination of visualization and quantitative analysis enables better
estimation of lesion traits for further diagnosis.
mDOI's imaging system and quantitative analysis align with dermatologists' preferences owing
to their user-friendly design, as indicated by our customer discovery. The imaging system does not
require direct viewing through an instrument, unlike standard dermoscopy. This allows the
doctor to maintain a comfortable distance from the patient, improving the imaging experience
for both parties. The quantitative measurements provide additional information, particularly 3D
insights, to assist doctors during the diagnosis process rather than delivering a diagnosis result
directly. This approach allows doctors to make more informed decisions, ensuring they
maintain responsibility for the diagnosis, which enhances their comfort and confidence in the
accuracy of their assessments. Furthermore, the consistency of the analysis process across
different doctors ensures the reliability of the diagnostic data. In practical use, imaging data can
be captured in primary care settings, allowing dermatologists to review the reconstructions for
further diagnosis. This process facilitates the seamless integration of mDOI technologies into
routine patient care and enhances collaborative treatment planning.
The reconstructed 3D optical properties can be further processed to provide either a more
detailed 3D tumor contour or lesion composition, depending on the application. For instance,
predicting the finer 3D lesion contour aids in assessing melanoma progression and planning
surgical interventions [137], such as excisional biopsies and Mohs surgery, with a reduced risk
of over-cutting and potential nerve damage or facial scarring [138]. Additionally, the
reconstructed 3D optical properties can further unmix tissue compositions, estimating the
percentage of melanin, oxyhemoglobin, deoxyhemoglobin, water, and lipids [139, 140]. This
post-processing not only improves melanoma detection but also opens new avenues for
studying skin conditions, such as tracking changes in lipid content and assessing hydration
by examining the percentage of water content.
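The unmixing step itself is commonly posed as a non-negative least-squares problem under the Beer-Lambert assumption that each voxel's absorption spectrum is a linear combination of chromophore extinction spectra. The sketch below illustrates only this generic step; the extinction matrix must be populated from published chromophore spectra [139, 140], and the names are our own placeholders rather than part of the mDOI pipeline.

import numpy as np
from scipy.optimize import nnls

def unmix_chromophores(mu_a, extinction):
    # mu_a:       (n_wavelengths,) reconstructed absorption for one voxel
    # extinction: (n_wavelengths, n_chromophores) known spectra, e.g.
    #             columns for melanin, HbO2, Hb, water, and lipid
    # Returns non-negative concentrations and the fit residual.
    concentrations, residual = nnls(extinction, mu_a)
    return concentrations, residual

Applying this voxel by voxel over the reconstructed volume yields the composition maps described above.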
4.3 Transitioning current mDOI to clinical devices
The transition of mDOI technology from its current research-based implementation to a
clinical-ready device involves several critical steps, generally encompassing two key aspects: AI
training with clinical data, and clinical device design and manufacturing. This section outlines the
necessary resources, processes, and considerations required to achieve this transformation,
ensuring that the technology is accessible, reliable, and user-friendly for widespread clinical and
personal use.
4.3.1 mDOI-Net: from simulation to clinical data
We have verified the promising capabilities of the mDOI-Net using our synthetic dataset.
Although we have considered various aspects of lesions during data generation, the clinical
data is much more complex than our model. We plan to use more advanced simulations to
pretrain the mDOI-Net, ensuring it reaches a high level of proficiency with simulated data
before applying it to real clinical data. This section details the steps required to enhance
mDOI-Net's performance through complex simulations, transfer its knowledge to more
clinically relevant datasets, and subsequently apply it to clinical samples.
4.3.1.1 Pretrain mDOI-Net with the advanced simulation dataset
The initial phase of training mDOI-Net involves using millions of basic simulated samples to
train the Reconstruction and Projection modules. The foundation of our simulations is the
HAM10000 dataset, containing more than 10,000 dermoscopic images from different
populations acquired and stored by different modalities. Over time, we can explore more
sophisticated simulation techniques, incorporating a wider array of lesion stages and tissue
compositions to better mimic clinical data.
For each lesion, we plan to generate five different 3D shapes and three different levels of
melanin concentration to account for the variability in lesion presentation, yielding roughly
150,000 lesion variations (10,000 lesions × 5 shapes × 3 melanin levels). Each variation can be
subjected to additional tissue modifications, such as variations in collagen levels and other
extracellular-matrix components, to simulate different skin types and aging effects.
Furthermore, anatomical characteristics such as proximity to blood vessels and bone, and
surrounding tissue heterogeneity, are introduced to enhance the realism of the simulations.
By applying around 10 such variations to each lesion case, we aim to reach approximately
1.5 million simulations in total; the enumeration sketch below makes this bookkeeping explicit.
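The following enumeration is illustrative; the identifiers are placeholders, but the counts match the plan above.

from itertools import product

N_LESIONS = 10_000            # HAM10000-derived base lesions
SHAPES = range(5)             # five 3D shape variants per lesion
MELANIN = ("low", "medium", "high")
TISSUE_VARIANTS = range(10)   # collagen / vasculature / anatomy tweaks

def simulation_grid():
    # Yields one parameter tuple per planned simulation run.
    yield from product(range(N_LESIONS), SHAPES, MELANIN, TISSUE_VARIANTS)

total = N_LESIONS * len(SHAPES) * len(MELANIN) * len(TISSUE_VARIANTS)
assert total == 1_500_000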
These realistic simulations in the dataset are crucial for building a robust foundation in the
network, ensuring it comprehends the vast biological diversity of skin and lesions. This
thorough pretraining effectively prepares mDOI-Net to handle the complexities of real-world
data.
4.3.1.2 Clinical transition
While simulations provide a foundational dataset, validation with real patient samples is
crucial for ensuring the technology’s clinical relevance and accuracy. Clinical data samples are
expected to be gathered under a standardized imaging protocol to maintain consistency and
accuracy. This involves the use of techniques such as reflectance confocal microscopy to collect
3D ground truth and an mDOI clinical-level imaging prototype to capture the 2D reflectance data via
structured illumination. Standardization is key to ensuring that the training data closely
resembles the conditions under which the device will be used in real clinical settings.
Acquiring a diverse set of real samples is crucial for testing and refining the mDOI system, as
it helps identify and address unpredictable elements not present in the initial simulations. This
step involves collaborating with clinical partners to access a variety of skin types and lesion
samples, thereby enhancing the robustness of the AI models and the overall system. It’s
noteworthy that skin type and condition vary significantly across different regions, populations,
and even cultures [141]. For instance (Figure 4.1), the rate of melanoma in Australia is among
the highest in the world due to the predominantly fair-skinned population and high UV
exposure. In contrast, countries like China have a much lower melanoma incidence, influenced
by genetic differences and lower UV exposure. From a cultural perspective, traditional Chinese
culture, for example, places a strong emphasis on physical sun protection, with many people
wearing long sleeves and hats to reduce their overall UV exposure. Thus, it is essential
to collect regional samples that represent a wide distribution of these variations.
Figure 4.1 Diverse Age-Standardized Incidence Rates of Melanoma of the Skin per 100,000 in Different
Regions
The figure illustrates the age-standardized incidence rate (ASR) of melanoma of the skin per 100,000
individuals, for both sexes, across various countries in 2022 [142]. The ASR values are color-coded, with
darker shades representing higher incidence rates. Regions with no applicable data or unavailable data
are also indicated. This visualization highlights the geographical disparities in the incidence of melanoma,
with notably higher rates in regions such as North America, Europe, and Oceania, and lower rates in parts
of Africa and Asia. The data underscores the need to prepare regional clinical datasets for fine-tuning
mDOI-Net, which will enable more accurate melanoma diagnosis tailored to the specific characteristics
and incidence rates of different regions. This approach aims to improve the diagnostic performance and
reliability of mDOI-Net across diverse populations, addressing the unique challenges and variations in
melanoma presentation globally.
Transitioning from simulations to real clinical data is a critical step in training the network. It
is important to recognize that a single network solution will not fit all scenarios due to the
diversity of the regional clinical samples. Training different AI networks for various regional
samples ensures that the mDOI technology can provide accurate diagnostics for users of
different demographics. We estimate a subdivision of the global population into two classes
based on the regional age-standardized incidence rate (ASR) of melanoma per 100,000 in Figure
4.1. The first class includes regions with higher ASR rates, such as North America, Europe, and
Oceania, while the second class encompasses regions with lower ASR rates, such as parts of
Africa and Asia.
By starting with a targeted collection of around 500 real samples for each class, covering
various skin and lesion conditions, we aim to fine-tune mDOI-Net for clinical applications. In
many clinical trials and medical studies, a sample size of several hundred samples is often
sufficient to capture a wide range of variability within a population and achieve statistically
significant results [143]. For example, to enable fast, high-quality reconstruction of clinically
accelerated multi-coil Magnetic Resonance data, Hammernik et al. [144] combined the
mathematical structure of variational models with Deep Learning. Their dataset included 20
image slices from each of 10 patients, totaling 200 training images. This well-targeted dataset
effectively represented the clinical setting, validating the reconstruction performance of the
proposed models. Another clinical example uses the Messidor dataset [145],
which consists of 550 images, to apply transfer learning for optimizing Deep Learning models in
the automated detection of diabetic retinopathy. These studies demonstrated the sufficiency
and effectiveness of using datasets with a sample size of several hundred samples for clinical
applications, further supporting the use of compact yet comprehensive datasets for robust
model training.
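In practice, such regional fine-tuning can follow a standard transfer-learning recipe: start from the simulation-pretrained weights, update the backbone gently, and let the task head adapt faster on the few hundred clinical samples. The PyTorch sketch below shows the idea; the layer-name convention and learning rates are illustrative assumptions, not our trained configuration.

import torch

def prepare_finetuning(model, head_name="reconstruction_head",
                       lr_head=1e-4, lr_backbone=1e-5):
    # Split parameters by a (hypothetical) head-layer naming convention
    # and assign the backbone a much smaller learning rate.
    head_params, backbone_params = [], []
    for name, p in model.named_parameters():
        (head_params if head_name in name else backbone_params).append(p)
    return torch.optim.Adam([
        {"params": backbone_params, "lr": lr_backbone},
        {"params": head_params, "lr": lr_head},
    ])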
When preparing the clinical datasets, we will ensure that the data diversity accurately
reflects the regional distribution of skin tones, types, and lesion conditions. This phase involves
validating the model's performance on real data and making necessary adjustments to ensure
reliability and accuracy in a clinical environment. By doing so, the model will be robust and
generalizable across different populations, ultimately improving its clinical applicability and
effectiveness.
4.3.1.3 Potential challenges
The development of robust mDOI performance requires significant computational
resources, involving the creation and pretraining of potentially millions of large-scale
simulations. This extensive process is necessary for the AI to accurately interpret and analyze
optical data under various conditions. However, these advancements lead to longer training
times and higher computational demands, so investing in high-performance computing
infrastructure is crucial to achieving the desired AI performance and reliability. Such extensive
resources are generally not available in academic settings, highlighting the need for
industry-level support and facilities.
Another significant challenge is acquiring a relatively small but diverse clinical dataset,
consisting of about 500 samples, that accurately represents various skin types and lesion
conditions. Establishing partnerships with dermatology clinics and research centers worldwide
is essential to ensure the collected data encompasses a wide range of skin tones, conditions,
and cultural practices. However, this process can be demanding due to varying regulations and
rules in each region. Each country may have different ethical guidelines, data protection laws,
and approval processes for clinical research. Navigating these regulatory landscapes requires
careful planning and collaboration with local institutions. Overcoming these hurdles is crucial to
ensure the comprehensive and diverse dataset needed for robust AI model training.
4.3.2 Imaging System: From Prototype to Clinical Device
Technological advancements have continually driven devices to become smaller, more
efficient, and cost-effective over time. This trend is crucial for the transition of mDOI
technology from a research prototype to a widely accessible clinical device. The journey
involves several key improvements and considerations to ensure the device is practical for
everyday use while maintaining high diagnostic accuracy.
4.3.2.1 Device Improvement
To make mDOI technology viable for clinical use, both hardware and preprocessing
components need significant enhancements. Here we list three areas that need to be improved
during the transition from prototype to clinical device.
Signal Quality Improvement:
• Enhanced Detection: Use higher-quality detectors and stronger light sources to
improve the signal-to-noise ratio (SNR) of each acquisition.
• Shortened Imaging Times: Reduce imaging durations to approximately 10 seconds
while still capturing sufficient reflectance data, balancing the need for precision with
practical usability.
• Cost vs. Precision Trade-off: Carefully consider the trade-offs between the cost of
components and the precision of the reconstruction. Identifying the optimal balance
is essential for creating an affordable yet highly functional clinical device.
Preprocessing Improvements:
• Background Light Removal: Implement algorithms to eliminate background light
interference, ensuring clearer imaging results.
• Uneven Illumination Correction: Develop techniques to correct for uneven lighting
across the imaging field, which will enhance the accuracy of the reconstructed
images; a minimal correction sketch follows this list.
• Intensity-to-Reflectance Correction: Optimize the intensity-to-reflectance correction
computation, which is crucial for downstream 3D reconstruction.
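The first two items are often combined as dark-frame subtraction followed by flat-field normalization. The sketch below is a minimal, generic version of that preprocessing step, assuming a dark frame (source off) and a flat frame (source on, uniform reference target) are available as NumPy arrays; it is not the exact correction implemented in OPTIMAP.

import numpy as np

def correct_frame(raw, dark, flat):
    # Remove background light, then divide out the illumination pattern.
    signal = raw.astype(float) - dark
    gain = flat.astype(float) - dark
    gain /= max(gain.mean(), 1e-12)            # normalize mean gain to 1
    return np.clip(signal / np.maximum(gain, 1e-6), 0, None)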
Standardized Imaging Protocol:
• SNR and Precision Characterization: Characterize the signal-to-noise ratio and
reconstruction precision across different imaging settings to establish consistent
performance standards.
• Lookup Table Development: Develop a comprehensive lookup table for various
imaging parameters, such as exposure time, field of view, and imaging distance, to
guide users toward optimal imaging conditions; a sketch of the intended structure
appears after this list.
• Data Standardization: Ensure that data collection processes are standardized into a
uniform format suitable for testing with mDOI-Net. Consistency in data format is
crucial for reliable AI-based analysis.
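Structurally, such a lookup table can be as simple as a mapping from characterized acquisition settings to a recommended exposure, as sketched below. Every key and value here is a placeholder; the real table would be populated from the SNR characterization described above.

# (field of view in cm, imaging distance in cm) -> exposure time in ms
EXPOSURE_LUT = {
    (2, 10): 40,
    (2, 15): 60,
    (4, 10): 55,
    (4, 15): 85,
}

def recommended_exposure(fov_cm, distance_cm):
    key = (fov_cm, distance_cm)
    if key not in EXPOSURE_LUT:
        raise KeyError(f"No characterization entry for {key}; "
                       f"acquire calibration data for this setting.")
    return EXPOSURE_LUT[key]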
By addressing these areas, we can significantly enhance the performance and usability of
the mDOI imaging system, paving the way for its adoption in primary care settings and even as
a clinical device.
4.3.2.2 Manufacture and quality control
To ensure the mDOI technology meets the necessary standards for widespread clinical use,
the manufacturing process must incorporate rigorous quality control measures. This section
outlines the general steps involved in manufacturing and quality control, focusing on
maintaining quality at a larger scale, device calibration, handling manufacturing tolerances, and
ensuring device reliability.
Calibrate the Device Before Usage
Calibration is critical to ensure that each mDOI device performs accurately and reliably.
Before devices are initially used, they will undergo a thorough calibration process as outlined in
Chapter 2.4.2, using a standard reference. The standard reference object will be a uniform
phantom with well-known optical properties that mimic those of human tissue. By using a
phantom with homogeneous characteristics, the calibration process can accurately set the
device's baseline performance, ensuring that any variations in measurements are due to actual
differences in tissue properties rather than device inconsistencies.
The user calibration process will be designed to provide easy-to-follow procedures for end-users to fine-tune the device before each use. This ensures that the mDOI device can adapt
to the specific environment in which it will be used, optimizing its performance for accurate and
consistent results. Environmental factors such as temperature, humidity, and the amount and
type of background light can vary significantly, all of which can interfere with imaging results.
Therefore, the advanced calibration and standardization procedure (Chapter 4.3.2.1) involves
taking a few initial readings with the device in the intended operating environment and
adjusting its settings to compensate for any potential interference. This process ensures that
the images captured are clear and accurate, guaranteeing that the mDOI technology delivers
reliable results in real-world applications.
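Conceptually, the per-device calibration reduces to deriving a correction map from an acquisition of the reference phantom, as in the hedged sketch below; the function and its inputs are illustrative, and the actual procedure follows Chapter 2.4.2.

import numpy as np

def calibration_map(phantom_frame, phantom_reflectance):
    # Per-pixel factor that maps raw counts from a uniform phantom of
    # known reflectance onto that reflectance; subsequent measurements
    # are multiplied by this map so device- and environment-specific
    # gain variations cancel out.
    frame = np.asarray(phantom_frame, dtype=float)
    return phantom_reflectance / np.maximum(frame, 1e-9)

# Usage: reflectance = raw_measurement * calibration_map(frame, 0.5)
# where 0.5 stands in for the phantom's certified reflectance value.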
Potential challenges
Transitioning mDOI technology to a clinical device presents several design challenges. One
of the primary goals is to develop a compact, user-friendly, and affordable device without
compromising the quality of the imaging system. This involves overcoming various obstacles,
such as miniaturizing hardware components while ensuring their efficiency and reliability. For
instance, integrating more efficient Digital Light Processors and CMOS cameras is necessary to
maintain high performance, but this must be balanced against cost constraints. Solutions could
include the use of advanced materials and manufacturing techniques to reduce component
sizes and costs without sacrificing quality.
Another challenge lies in ensuring that the device can operate consistently across different
environments and conditions during large-scale production. This consistency is difficult to
achieve due to several factors. First, environmental variables such as temperature, humidity,
and room lighting conditions can significantly impact the performance of optical devices.
Designing a device that can automatically calibrate and adjust for these variables requires
advanced algorithms and robust hardware components. Second, scaling up production
introduces variability in manufacturing processes. Even minor inconsistencies in component
quality or assembly can lead to significant differences in device performance. Implementing
stringent quality control protocols, such as automated testing and inspection processes, is
essential to detect and address these variances. Overall, addressing these design challenges
with innovative solutions is essential to make mDOI technology viable for widespread clinical
use.
4.4 Summary
The transition of mDOI technology from research to a clinical-ready device involves several
critical steps, focusing primarily on two main aspects: AI training with clinical data and the
design and manufacturing of the clinical device. The adoption of mDOI technology from
simulations to clinical applications is a multifaceted process requiring extensive computational
resources and a carefully planned approach to data collection. The initial training of mDOI-Net
using approximately 1.5 million simulated samples leverages the comprehensive HAM10000
dataset and introduces extensive variations in lesion shapes, melanin concentrations, tissue
modifications, and anatomical characteristics to ensure a robust foundation. For clinical
validation, we estimate the need for around 500 real samples per regional class, supported by
empirical evidence from similar studies. These samples must reflect the diverse regional
distribution of skin tones, types, and lesion conditions to ensure accurate diagnostics across
different demographics. This thorough and detailed approach ensures that mDOI-Net is wellprepared to handle the complexities of real-world data, ultimately aiming to enhance
diagnostic accuracy and improve patient outcomes in skin cancer diagnostics.
Transitioning mDOI technology from a prototype to a clinical device requires significant
hardware and preprocessing improvements. Enhancing signal quality through better detection
and reduced imaging times is essential for balancing precision and usability. Effective
preprocessing algorithms for background light removal and uneven illumination correction will
ensure clear images. Standardized imaging protocols will guide optimal data collection and
analysis, ensuring consistency. Rigorous calibration processes will help adapt the device to
various environments, maintaining accuracy. These steps will make mDOI technology a reliable
and accessible tool for skin cancer diagnostics, improving patient outcomes and reducing
costs.
mDOI techniques represent a significant step forward in the non-invasive diagnosis of skin
cancer and potentially other skin-related conditions. mDOI’s development reflects a
convergence of various fields, including optical engineering, dermatology, bio-photonics
modeling, and AI. It also points towards a future where advanced imaging tools are routinely
accessible in primary care settings. This thesis contributes to the foundational knowledge
necessary to drive further innovation and adoption of mDOI in clinical practice, ultimately
aiming to improve patient outcomes and reduce the global cost burden of skin cancer.
References
1. American Cancer Society. Key Statistics for Melanoma Skin Cancer. 2022; Available from:
https://www.cancer.org/cancer/types/melanoma-skin-cancer/about/key-statistics.html.
2. SEER*Explorer. Five-Year Survival Rates.
3. Arnold, M., et al., Global burden of cutaneous melanoma in 2020 and projections to
2040. JAMA dermatology, 2022. 158(5): p. 495-503.
4. Cancer Research UK. Stages and types, melanoma skin cancer.
5. Dana-Farber Cancer Institute. How We Diagnose Melanoma.
6. Melanoma Research Alliance. Melanoma Statistics.
7. Harken, A.H. and E.E. Moore, Abernathy's surgical secrets e-book. 2017: Elsevier Health
Sciences.
8. Mitchell, T.C., G. Karakousis, and L. Schuchter, Melanoma, in Abeloff's Clinical Oncology.
2020, Elsevier. p. 1034-1051. e2.
9. Satheesha, T., et al., Melanoma is skin deep: a 3D reconstruction technique for
computerized dermoscopic skin lesion classification. IEEE journal of translational
engineering in health and medicine, 2017. 5: p. 1-17.
10. Wang, T.D. and J. Van Dam, Optical biopsy: a new frontier in endoscopic detection and
diagnosis. Clinical gastroenterology and hepatology, 2004. 2(9): p. 744-753.
11. Skaggs, R. and B. Coldiron, Skin biopsy and skin cancer treatment use in the Medicare
population, 1993 to 2016. Journal of the American Academy of Dermatology, 2021.
84(1): p. 53-59.
12. Weinstein, D.A., S. Konda, and B.M. Coldiron, Use of skin biopsies among
dermatologists. Dermatologic Surgery, 2017. 43(11): p. 1348-1357.
13. Lott, J.P., et al., Population-based analysis of histologically confirmed melanocytic
proliferations using natural language processing. JAMA dermatology, 2018. 154(1): p.
24-29.
14. Esserman, L.J., I.M. Thompson, and B. Reid, Overdiagnosis and overtreatment in cancer:
an opportunity for improvement. Jama, 2013. 310(8): p. 797-798.
15. Davidovits, P. and M.D. Egger, Scanning laser microscope. Nature, 1969. 223(5208): p.
831-831.
16. Huang, D., et al., Optical coherence tomography. science, 1991. 254(5035): p. 1178-
1181.
17. Rajadhyaksha, M., et al., In vivo confocal scanning laser microscopy of human skin:
melanin provides strong contrast. Journal of investigative dermatology, 1995. 104(6): p.
946-952.
18. Pellacani, G., et al., Reflectance confocal microscopy as a second-level examination in
skin oncology improves diagnostic accuracy and saves unnecessary excisions: a
longitudinal prospective study. British Journal of Dermatology, 2014. 171(5): p. 1044-
1051.
19. Glinos, G.D., et al., Optical coherence tomography for assessment of epithelialization in a
human ex vivo wound model. Wound Repair and Regeneration, 2017. 25(6): p. 1017-
1026.
20. Masters, B., P. So, and E. Gratton, Optical biopsy of in vivo human skin: multi-photon
excitation microscopy. Lasers in Medical Science, 1998. 13(3): p. 196-203.
21. Ilie, M.A., et al., In vivo confocal laser scanning microscopy imaging of skin
inflammation: Clinical applications and research directions. Experimental and
therapeutic medicine, 2019. 17(2): p. 1004-1011.
22. Ferrante di Ruffano, L., et al., Optical coherence tomography for diagnosing skin cancer
in adults. Cochrane Database of Systematic Reviews, 2018(12).
23. Edwards, S.J., et al., VivaScope® 1500 and 3000 systems for detecting and monitoring
skin lesions: a systematic review and economic evaluation. Health Technology
Assessment, 2016. 20(58): p. 1-259.
24. Pereira, P.M., et al., Melanoma classification using light-Fields with morlet scattering
transform and CNN: Surface depth as a valuable tool to increase detection rate. Medical
Image Analysis, 2022. 75: p. 102254.
25. Smith, L., et al., Machine vision 3D skin texture analysis for detection of melanoma.
Sensor Review, 2011. 31(2): p. 111-119.
26. Hoshi, Y. and Y. Yamada, Overview of diffuse optical tomography and its clinical
applications. Journal of biomedical optics, 2016. 21(9): p. 091312.
27. Jiang, H., Diffuse optical tomography: principles and applications. 2018: CRC press.
28. Hielscher, A., et al., Near-infrared diffuse optical tomography. Disease markers, 2002.
18(5, 6): p. 313-337.
29. Boas, D.A., et al., Imaging the body with diffuse optical tomography. IEEE signal
processing magazine, 2001. 18(6): p. 57-75.
30. Tromberg, B.J., et al., Non–invasive measurements of breast tissue optical properties
using frequency–domain photon migration. Philosophical Transactions of the Royal
Society of London. Series B: Biological Sciences, 1997. 352(1354): p. 661-668.
31. Choe, R., et al., Diffuse optical tomography of breast cancer during neoadjuvant
chemotherapy: a case study with comparison to MRI. Medical physics, 2005. 32(4): p.
1128-1139.
32. Corlu, A., et al., Three-dimensional in vivo fluorescence diffuse optical tomography of
breast cancer in humans. Optics express, 2007. 15(11): p. 6696-6716.
33. Eggebrecht, A.T., et al., Mapping distributed brain function and networks with diffuse
optical tomography. Nature photonics, 2014. 8(6): p. 448-454.
34. Zeff, B.W., et al., Retinotopic mapping of adult human visual cortex with high-density
diffuse optical tomography. Proceedings of the National Academy of Sciences, 2007.
104(29): p. 12169-12174.
35. Culver, J.P., et al., Volumetric diffuse optical tomography of brain activity. Optics letters,
2003. 28(21): p. 2061-2063.
36. Cerussi, A.E., et al., In vivo absorption, scattering, and physiologic properties of 58
malignant breast tumors determined by broadband diffuse optical spectroscopy. Journal
of biomedical optics, 2006. 11(4): p. 044005.
37. Jacques, S.L., Optical properties of biological tissues: a review. Physics in Medicine &
Biology, 2013. 58(11): p. R37.
38. Chance, B., Photon migration in tissues. 1989: Springer.
39. Zonios, G., et al., Melanin absorption spectroscopy: new method for noninvasive skin
investigation and melanoma detection. Journal of biomedical optics, 2008. 13(1): p.
014017-014017-8.
40. Garcia-Uribe, A., et al., In-vivo characterization of optical properties of pigmented skin
lesions including melanoma using oblique incidence diffuse reflectance spectrometry.
Journal of biomedical optics, 2011. 16(2): p. 020501-020501-3.
41. Farrell, T.J., M.S. Patterson, and B. Wilson, A diffusion theory model of spatially resolved,
steady-state diffuse reflectance for the noninvasive determination of tissue optical
properties in vivo. Medical physics, 1992. 19(4): p. 879-888.
42. Nichols, M.G., E.L. Hull, and T.H. Foster, Design and testing of a white-light, steady-state
diffuse reflectance spectrometer for determination of optical properties of highly
scattering systems. Applied optics, 1997. 36(1): p. 93-104.
43. Tseng, S.-H., A. Grant, and A.J. Durkin, In vivo determination of skin near-infrared optical
properties using diffuse optical spectroscopy. Journal of biomedical optics, 2008. 13(1):
p. 014016.
44. Arridge, S.R. and W.R. Lionheart, Nonuniqueness in diffusion-based optical tomography.
Optics letters, 1998. 23(11): p. 882-884.
45. Harrach, B., On uniqueness in diffuse optical tomography. Inverse problems, 2009. 25(5):
p. 055010.
46. Culver, J., et al., Three-dimensional diffuse optical tomography in the parallel plane
transmission geometry: Evaluation of a hybrid frequency domain/continuous wave
clinical system for breast imaging. Medical physics, 2003. 30(2): p. 235-247.
47. Pogue, B.W., et al., Instrumentation and design of a frequency-domain diffuse optical
tomography imager for breast cancer detection. Optics express, 1997. 1(13): p. 391-403.
48. Doulgerakis, M., A.T. Eggebrecht, and H. Dehghani, High-density functional diffuse
optical tomography based on frequency-domain measurements improves image quality
and spatial resolution. Neurophotonics, 2019. 6(3): p. 035007.
49. Torricelli, A., et al., Time-resolved reflectance at null source-detector separation:
improving contrast and resolution in diffuse optical imaging. Physical review letters,
2005. 95(7): p. 078101.
50. Cuccia, D.J., et al., Modulated imaging: quantitative analysis and tomography of turbid
media in the spatial-frequency domain. Optics letters, 2005. 30(11): p. 1354-1356.
51. O'Sullivan, T.D., et al., Diffuse optical imaging using spatially and temporally modulated
light. Journal of biomedical optics, 2012. 17(7): p. 071311.
52. van de Giessen, M., J.P. Angelo, and S. Gioux, Real-time, profile-corrected single
snapshot imaging of optical properties. Biomedical optics express, 2015. 6(10): p. 4051-
4062.
53. Gioux, S., A. Mazhar, and D.J. Cuccia, Spatial frequency domain imaging in 2019:
principles, applications, and perspectives. Journal of biomedical optics, 2019. 24(7): p.
071613.
54. Lyons, A., et al., Computational time-of-flight diffuse optical tomography. Nature
Photonics, 2019. 13(8): p. 575-579.
55. Satat, G., et al., All photons imaging through volumetric scattering. Scientific reports,
2016. 6(1): p. 1-8.
56. Konecky, S.D., et al., Imaging complex structures with diffuse light. Optics express, 2008.
16(7): p. 5048-5060.
57. Bélanger, S., et al., Real-time diffuse optical tomography based on structured
illumination. Journal of biomedical optics, 2010. 15(1): p. 016006.
58. Pogue, B.W., et al., Implicit and explicit prior information in near-infrared spectral
imaging: accuracy, quantification and diagnostic value. Philosophical Transactions of the
Royal Society A: Mathematical, Physical and Engineering Sciences, 2011. 369(1955): p.
4531-4557.
59. Liu, J., et al., Parametric diffuse optical imaging in reflectance geometry. Ieee Journal of
Selected Topics in Quantum Electronics, 2010. 16(3): p. 555-564.
60. Chen, J. and X. Intes, Time-gated perturbation Monte Carlo for whole body functional
imaging in small animals. Optics express, 2009. 17(22): p. 19566-19579.
61. Schweiger, M., S.R. Arridge, and I. Nissilä, Gauss–Newton method for image
reconstruction in diffuse optical tomography. Physics in Medicine & Biology, 2005.
50(10): p. 2365.
62. Bonner, R., et al., Model for photon migration in turbid biological media. JOSA A, 1987.
4(3): p. 423-432.
63. Dehghani, H., et al., Numerical modelling and image reconstruction in diffuse optical
tomography. Philosophical Transactions of the Royal Society A: Mathematical, Physical
and Engineering Sciences, 2009. 367(1900): p. 3073-3093.
64. Arridge, S., et al., Approximation errors and model reduction with an application in
optical diffusion tomography. Inverse problems, 2006. 22(1): p. 175.
65. Dehghani, H., et al., Near infrared optical tomography using NIRFAST: Algorithm for
numerical model and image reconstruction. Communications in numerical methods in
engineering, 2009. 25(6): p. 711-732.
66. Pierrat, R., J.-J. Greffet, and R. Carminati, Photon diffusion coefficient in scattering and
absorbing media. JOSA A, 2006. 23(5): p. 1106-1110.
67. Schmitt, J., et al., Multilayer model of photon diffusion in skin. JOSA A, 1990. 7(11): p.
2141-2153.
68. Tromberg, B.J., et al., Non-invasive in vivo characterization of breast tumors using
photon migration spectroscopy. Neoplasia, 2000. 2(1-2): p. 26-40.
69. Prahl, S.A., Light transport in tissue. 1988, The University of Texas at Austin.
70. Wang, L. and S.L. Jacques, Monte Carlo modeling of light transport in multi-layered
tissues in standard C. The University of Texas, MD Anderson Cancer Center, Houston,
1992: p. 4-11.
71. Sharma, M., et al., Verification of a two-layer inverse Monte Carlo absorption model
using multiple source-detector separation diffuse reflectance spectroscopy. Biomedical
optics express, 2014. 5(1): p. 40-53.
72. Wang, L., S.L. Jacques, and L. Zheng, MCML--Monte Carlo modeling of light transport in
multi-layered tissues. Comput Methods Programs Biomed, 1995. 47(2): p. 131-46.
73. Jacques, S., T. Li, and S. Prahl. mcxyz.c, a 3D Monte Carlo simulation of heterogeneous
tissues. 2019; Available from: https://omlc.org/software/mc/mcxyz/index.html.
74. Feng, S., F.-A. Zeng, and B. Chance, Photon migration in the presence of a single defect: a
perturbation analysis. Applied optics, 1995. 34(19): p. 3826-3837.
75. Ahishakiye, E., et al., A survey on deep learning in medical image reconstruction.
Intelligent Medicine, 2021. 1(03): p. 118-127.
76. Varoquaux, G. and V. Cheplygina, Machine learning for medical imaging: methodological
failures and recommendations for the future. NPJ digital medicine, 2022. 5(1): p. 48.
77. Jones, O., et al., Artificial intelligence and machine learning algorithms for early
detection of skin cancer in community and primary care settings: a systematic review.
The Lancet Digital Health, 2022. 4(6): p. e466-e476.
78. Wu, Y., et al., Skin Cancer Classification With Deep Learning: A Systematic Review.
Frontiers in Oncology, 2022. 12.
79. Bhatt, H., et al., State-of-the-art machine learning techniques for melanoma skin cancer
detection and classification: a comprehensive review. Intelligent Medicine, 2022.
80. Dildar, M., et al., Skin cancer detection: a review using deep learning techniques.
International journal of environmental research and public health, 2021. 18(10): p.
5479.
81. Jojoa Acosta, M.F., et al., Melanoma diagnosis using deep learning techniques on
dermatoscopic images. BMC Medical Imaging, 2021. 21(1): p. 1-11.
82. Li, S., et al., Distinguish the Value of the Benign Nevus and Melanomas Using Machine
Learning: A Meta-Analysis and Systematic Review. Mediators of Inflammation, 2022.
2022.
83. Farrell, T.J., B.C. Wilson, and M.S. Patterson, The use of a neural network to determine
tissue optical properties from spatially resolved diffuse reflectance measurements.
Physics in medicine & biology, 1992. 37(12): p. 2281.
84. Smith, J.T., et al., Deep learning in macroscopic diffuse optical imaging. Journal of
Biomedical Optics, 2022. 27(2): p. 020901-020901.
85. Zhang, L. and G. Zhang, Brief review on learning-based methods for optical tomography.
Journal of Innovative Optical Health Sciences, 2019. 12(06): p. 1930011.
86. Jalalimanesh, M.H. and M.A. Ansari, Deep learning based image reconstruction for
sparse-view diffuse optical tomography. Waves in Random and Complex Media, 2021: p.
1-17.
87. Yoo, J., et al., Deep learning diffuse optical tomography. IEEE transactions on medical
imaging, 2019. 39(4): p. 877-887.
88. Yoo, J., et al., Deep Learning Can Reverse Photon Migration for Diffuse Optical
Tomography.
89. Zou, Y., et al., Machine learning model with physical constraints for diffuse optical
tomography. Biomedical Optics Express, 2021. 12(9): p. 5720-5735.
90. Cerussi, A.E., et al., Diffuse optical spectroscopic imaging correlates with final
pathological response in breast cancer neoadjuvant chemotherapy. Philosophical
transactions of the royal society A: mathematical, physical and engineering sciences,
2011. 369(1955): p. 4512-4530.
91. Streeter, S.S., S.L. Jacques, and B.W. Pogue, Perspective on diffuse light in tissue:
subsampling photon populations. Journal of Biomedical Optics, 2021. 26(7): p. 070601-
070601.
92. Doornbos, R., et al., The determination of in vivo human tissue optical properties and
absolute chromophore concentrations using spatially resolved steady-state diffuse
reflectance spectroscopy. Physics in Medicine & Biology, 1999. 44(4): p. 967.
93. Fan, Y., J. An, and L. Ying, Fast algorithms for integral formulations of steady-state
radiative transfer equation. Journal of Computational Physics, 2019. 380: p. 191-211.
94. Arridge, S.R. and J.C. Hebden, Optical imaging in medicine: II. Modelling and
reconstruction. Physics in Medicine & Biology, 1997. 42(5): p. 841.
95. Schweiger, M., GPU-accelerated finite element method for modelling light transport in
diffuse optical tomography. Journal of Biomedical Imaging, 2011. 2011: p. 10-10.
96. Gkioulekas, I., A. Levin, and T. Zickler. An evaluation of computational imaging
techniques for heterogeneous inverse scattering. in European Conference on Computer
Vision. 2016. Springer.
97. Kavuri, V.C., et al., Sparsity enhanced spatial resolution and depth localization in diffuse
optical tomography. Biomedical Optics Express, 2012. 3(5): p. 943-957.
98. Hayakawa, C.K. and J. Spanier, Perturbation Monte Carlo methods for the solution of
inverse problems, in Monte Carlo and Quasi-Monte Carlo Methods 2002. 2004, Springer.
p. 227-241.
99. Alerstam, E., S. Andersson-Engels, and T. Svensson, White Monte Carlo for time-resolved
photon migration. Journal of biomedical optics, 2008. 13(4): p. 041304.
100. Wang, L.V. and S.L. Jacques, Source of error in calculation of optical diffuse reflectance
from turbid media using diffusion theory. Computer methods and programs in
biomedicine, 2000. 61(3): p. 163-170.
101. Wang, L. and S.L. Jacques, Hybrid model of Monte Carlo simulation and diffusion theory
for light reflectance by turbid media. JOSA A, 1993. 10(8): p. 1746-1752.
102. Welch, A.J. and M.J. Van Gemert, Optical-thermal response of laser-irradiated tissue.
Vol. 2. 2011: Springer.
103. Fisher, N.M., et al., Breslow depth of cutaneous melanoma: impact of factors related to
surveillance of the skin, including prior skin biopsies and family history of melanoma.
Journal of the American Academy of Dermatology, 2005. 53(3): p. 393-406.
104. Ayers, F., et al. Fabrication and characterization of silicone-based tissue phantoms with
tunable optical properties in the visible and near infrared domain. in Design and
Performance Validation of Phantoms Used in Conjunction with Optical Measurements of
Tissue. 2008. International Society for Optics and Photonics.
105. Wang, Z., E.P. Simoncelli, and A.C. Bovik. Multiscale structural similarity for image
quality assessment. in The Thrity-Seventh Asilomar Conference on Signals, Systems &
Computers, 2003. 2003. Ieee.
106. Zhong, X., X. Wen, and D. Zhu, Lookup-table-based inverse model for human skin
reflectance spectroscopy: two-layered Monte Carlo simulations and experiments. Optics
express, 2014. 22(2): p. 1852-1864.
107. Doulgerakis-Kontoudis, M., et al., Toward real-time diffuse optical tomography:
accelerating light propagation modeling employing parallel computing on GPU and CPU.
Journal of biomedical optics, 2017. 22(12): p. 125001.
108. Pei, Y., H.L. Graber, and R.L. Barbour, Normalized–constraint algorithm for minimizing
inter–parameter crosstalk in DC optical tomography. Optics express, 2001. 9(2): p. 97-
109.
109. Bogoch, I.I., et al., Mobile phone and handheld microscopes for public health
applications. The Lancet Public Health, 2017. 2(8): p. e355.
110. Tian, C., et al., Deep learning on image denoising: An overview. Neural Networks, 2020.
131: p. 251-275.
111. Yang, B., et al. 3d object reconstruction from a single depth view with adversarial
learning. in Proceedings of the IEEE international conference on computer vision
workshops. 2017.
112. Samavati, T. and M. Soryani, Deep learning-based 3D reconstruction: a survey. Artificial
Intelligence Review, 2023. 56(9): p. 9175-9219.
113. Jin, Y., D. Jiang, and M. Cai, 3d reconstruction using deep learning: a survey.
Communications in Information and Systems, 2020. 20(4): p. 389-413.
114. Henzler, P., N.J. Mitra, and T. Ritschel. Escaping plato's cave: 3d shape from adversarial
rendering. in Proceedings of the IEEE/CVF International Conference on Computer Vision.
2019.
115. Lunz, S., et al., Inverse graphics gan: Learning to generate 3d shapes from unstructured
2d data. arXiv preprint arXiv:2002.12674, 2020.
116. Sitzmann, V., et al. Deepvoxels: Learning persistent 3d feature embeddings. in
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
2019.
117. Tschandl, P., C. Rosendahl, and H. Kittler, The HAM10000 dataset, a large collection of
multi-source dermatoscopic images of common pigmented skin lesions. Scientific data,
2018. 5(1): p. 1-9.
118. Gutman, D., et al., Skin lesion analysis toward melanoma detection: A challenge at the
international symposium on biomedical imaging (ISBI) 2016, hosted by the international
skin imaging collaboration (ISIC). arXiv preprint arXiv:1605.01397, 2016.
119. Yousefi, J., Image binarization using Otsu thresholding algorithm. Ontario, Canada:
University of Guelph, 2011. 10.
120. El-Shenawee, M. and E.L. Miller, Spherical harmonics microwave algorithm for shape
and location reconstruction of breast cancer tumor. IEEE Transactions on Medical
Imaging, 2006. 25(10): p. 1258-1271.
121. Limkin, E.J., et al., The complexity of tumor shape, spiculatedness, correlates with tumor
radiomic shape features. Scientific reports, 2019. 9(1): p. 4329.
122. Goldberg-Zimring, D., et al., Statistical validation of brain tumor shape approximation
via spherical harmonics for image-guided neurosurgery. Academic radiology, 2005.
12(4): p. 459-466.
123. Petrov, G.I., et al., Human tissue color as viewed in high dynamic range optical spectral
transmission measurements. Biomedical optics express, 2012. 3(9): p. 2154-2161.
124. Paulsen, K.D. and H. Jiang, Spatially varying optical property reconstruction using a finite
element diffusion equation approximation. Medical Physics, 1995. 22(6): p. 691-701.
125. Bae, H.-J., et al., A Perlin noise-based augmentation strategy for deep learning with
small data samples of HRCT images. Scientific reports, 2018. 8(1): p. 17687.
126. Barufaldi, B., et al., Computational breast anatomy simulation using multi-scale Perlin
noise. IEEE transactions on medical imaging, 2021. 40(12): p. 3436-3445.
127. Imambi, S., K.B. Prakash, and G. Kanagachidambaresan, PyTorch. Programming with
TensorFlow: Solution for Edge Computing Applications, 2021: p. 87-104.
128. Karras, T., et al., Progressive growing of gans for improved quality, stability, and
variation. arXiv preprint arXiv:1710.10196, 2017.
129. Kim, W., A. Kanezaki, and M. Tanaka, Unsupervised learning of image segmentation
based on differentiable feature clustering. IEEE Transactions on Image Processing, 2020.
29: p. 8055-8068.
130. Hou, L., et al. Patch-based convolutional neural network for whole slide tissue image
classification. in Proceedings of the IEEE conference on computer vision and pattern
recognition. 2016.
131. Zhu, J.-Y., et al. Unpaired image-to-image translation using cycle-consistent adversarial
networks. in Proceedings of the IEEE international conference on computer vision. 2017.
132. Ikuta, M. and J. Zhang, A deep convolutional gated recurrent unit for CT image
reconstruction. IEEE Transactions on Neural Networks and Learning Systems, 2022.
133. Sun, J., et al. Neuralrecon: Real-time coherent 3d reconstruction from monocular video.
in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition.
2021.
134. Shim, D. and H.J. Kim, CD-Depth: Unsupervised Domain Adaptation for Depth Estimation
via Cross Domain Integration. 2022.
135. Menghani, G. and S. Ravi, Learning from a teacher using unlabeled data. arXiv preprint
arXiv:1911.05275, 2019.
136. Chen, H., B. Lagadec, and F. Bremond. Enhancing diversity in teacher-student networks
via asymmetric branches for unsupervised person re-identification. in Proceedings of the
IEEE/CVF winter conference on applications of computer vision. 2021.
137. Slutsky, J.B. and S.W. Fosko, Complications in Mohs surgery. Mohs and Cutaneous
Surgery: Maximizing Aesthetic Outcomes, 2014: p. 55-85.
138. Flores, E.S., et al., Intraoperative imaging during Mohs surgery with reflectance confocal
microscopy: initial clinical experience. Journal of biomedical optics, 2015. 20(6): p.
061103-061103.
139. Mobley, J., T. Vo-Dinh, and V.V. Tuchin, Optical properties of tissue. Biomedical
photonics handbook, 2003. 2: p. 1-2.
140. Dhawan, A.P., B. D'Alessandro, and X. Fu, Optical imaging modalities for biomedical
applications. IEEE Reviews in Biomedical Engineering, 2010. 3: p. 69-92.
141. Marugame, T. and M.-J. Zhang, Comparison of time trends in melanoma of skin cancer
mortality (1990–2006) between countries based on the WHO mortality database.
Japanese journal of clinical oncology, 2010. 40(7): p. 710-710.
142. International Agency for Research on Cancer. Global Cancer Observatory: Cancer Today -
Data Visualization Maps. 2023; Available from:
https://gco.iarc.who.int/today/en/dataviz/maps-prevalence?mode=population&options_indicator=%5Bobject%20Object%5D_%5Bobject%20Object%5D&types=2&cancers=16.
143. U.S. Food and Drug Administration. Design Considerations for Pivotal Clinical
Investigations for Medical Devices: Guidance for Industry, Clinical Investigators,
Institutional Review Boards, and FDA Staff. 2013.
144. Hammernik, K., et al., Learning a variational network for reconstruction of accelerated
MRI data. Magnetic resonance in medicine, 2018. 79(6): p. 3055-3071.
145. Lam, C., et al., Automated detection of diabetic retinopathy using deep learning. AMIA
summits on translational science proceedings, 2018. 2018: p. 147.
146. Dubuisson, S. The computation of the Bhattacharyya distance between histograms
without histograms. in 2010 2nd International Conference on Image Processing Theory,
Tools and Applications. 2010. IEEE.
147. Wang, Z., et al., Image quality assessment: from error visibility to structural similarity.
IEEE transactions on image processing, 2004. 13(4): p. 600-612.
Appendices
Quantitative measurement for the synthetic phantoms
The normalization of reconstructed results from both 3D-mDOI and FEM is paramount to
ensure consistent and unbiased quantitative measurements when comparing intensity-based
distances and overall image quality against a ground truth. To enable an impartial
comparison across the various test scenarios, the reconstructions were scaled to the [0,1] range,
while the ground truth was rendered as a binary function: the feature area was assigned a
value of 1 and the background a value of 0. From these standardized outputs, several
metrics were computed: the Root Mean Square Error, which quantifies the deviation between a
model's reconstruction and the simulated ground truth; the Bhattacharyya distance, which
measures the similarity between two histograms' distributions; image contrast, which evaluates
the disparity in optical coefficients that makes features distinguishable from the background
region; and the Multi-scale Structural Similarity index, which evaluates image similarity in a
manner aligned with human visual perception.
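For concreteness, a minimal sketch of this standardization step in Python/NumPy is given below; the names standardize, reconstruction, and gt_labels are illustrative placeholders rather than names from the original analysis code, and a non-constant reconstruction is assumed.

import numpy as np

def standardize(reconstruction, gt_labels):
    # Scale the reconstruction to the [0, 1] range (assumes max > min).
    rec = reconstruction.astype(float)
    rec = (rec - rec.min()) / (rec.max() - rec.min())
    # Render the ground truth as a binary function: feature = 1, background = 0.
    gt = (gt_labels > 0).astype(float)
    return rec, gt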
A meticulous segmentation process is crucial for an accurate and consistent assessment of
segmentation errors between 3D-mDOI and FEM. To this end, features were manually
segmented from the reconstructions according to specific guidelines. First, if a feature was
perceptible from the surface, its shape in the initial reconstruction layer was expected to
correspond to that of the ground truth. This criterion reflects the understanding that an imaging
modality other than the ground truth can provide insight into the feature's surface structure,
thereby setting a standard for manual segmentation. Second, any chosen feature had to be
clearly distinguishable within the phantom, and its total volume could not exceed half of the
phantom's entirety. This requirement not only bounds the reconstructed volume of the feature
but also highlights the importance of establishing an appropriate scale for the multisite image
acquisition platform: a feature should be surrounded by ample volume on all sides, so that the
3D photon distribution effectively envelops the feature from every direction and supports a
high-quality reconstruction. Finally, the manually segmented features were converted to a
binary format, mirroring the processing of the ground truth. This binary representation sets the
stage for calculating metrics such as the depth of the reconstructed feature, segmentation
specificity, sensitivity, and the Dice coefficient, which gauges the agreement between
segmented features from the reconstructions and the ground truth.
Root Mean Square Error (RMSE)
RMSE is a frequently used metric to quantify the difference between a set of predicted
values and the actual values they are meant to predict. Here, $I_{i,\mathrm{reconst}}$ and $I_{i,\mathrm{gt}}$ denote the
value of pixel $i$ in the reconstruction and in the ground truth, respectively, and $N$ is the total
number of pixels. RMSE measures the average pixel-wise error:

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i}^{N} \left( I_{i,\mathrm{reconst}} - I_{i,\mathrm{gt}} \right)^{2}}{N}} \qquad (1)$$
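A minimal NumPy sketch of Eq. (1) follows; the function name rmse is illustrative, and both arrays are assumed to share the same shape.

import numpy as np

def rmse(reconstruction, ground_truth):
    # Pixel-wise squared error, averaged over all N pixels, then square-rooted (Eq. 1).
    diff = reconstruction.astype(float) - ground_truth.astype(float)
    return float(np.sqrt(np.mean(diff ** 2)))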
Bhattacharyya Distance
The Bhattacharyya distance measures the similarity between two probability distributions,
and is especially useful for classes with similar mean values but different standard deviations.
We computed the normalized histograms of the reconstruction and of the ground truth, and then
calculated the Bhattacharyya distance [146] between the two histograms. The Bhattacharyya
distance of two histograms $p$ and $q$ over the same intensity range $X$ is defined as follows:

$$D_B(p, q) = -\ln\left( \sum_{x \in X} \sqrt{p(x)\, q(x)} \right) \qquad (2)$$

There is no ideal range for the Bhattacharyya distance; in general, the lower the value, the
higher the similarity of the histograms.
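The sketch below illustrates Eq. (2) in NumPy, assuming both inputs have already been scaled to [0, 1]; the bin count and the epsilon guarding log(0) are illustrative choices, not values from the original analysis.

import numpy as np

def bhattacharyya_distance(a, b, bins=256):
    # Normalized histograms of the two images over the shared intensity range X = [0, 1].
    p, _ = np.histogram(a, bins=bins, range=(0.0, 1.0))
    q, _ = np.histogram(b, bins=bins, range=(0.0, 1.0))
    p = p / p.sum()
    q = q / q.sum()
    # Eq. 2: negative log of the Bhattacharyya coefficient.
    bc = np.sum(np.sqrt(p * q))
    return float(-np.log(bc + 1e-12))  # epsilon avoids log(0) for disjoint histograms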
Image Contrast
While there are numerous methods to quantify the image contrast of a feature, for this
study we adopted a percentage-contrast strategy, calculated from the disparity between the
highest and lowest intensity values present in the reconstruction. This choice stemmed from an
observation: the optical coefficients of feature reconstructions in 3D-mDOI exhibited a
variance (Fig. 3C), an artifact attributed to the illumination patterns. As a result, the
distributions of optical coefficients for both the features and the background closely followed
normal distributions in the reconstruction's histogram. Recognizing this, we refrained from
merely contrasting the single peak intensity value against the lowest. Aiming for a more
reliable and representative metric, we used a customized approach: the average of the top 20%
of intensities ($I_{\mathrm{feat}}$) for the feature and the average of the bottom 20% of intensities ($I_{b}$) for the
background noise:

$$\mathrm{Contrast} = \frac{I_{\mathrm{feat}} - I_{b}}{I_{b}} \qquad (3)$$
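A simplified NumPy sketch of Eq. (3) is shown below. It pools the top and bottom 20% of intensities over the whole reconstruction; the original analysis may instead draw the top 20% from the feature region and the bottom 20% from the background, and a nonzero background mean is assumed.

import numpy as np

def percentage_contrast(volume):
    v = np.sort(volume.ravel())
    k = max(1, int(0.2 * v.size))
    i_b = v[:k].mean()      # average of the bottom 20% of intensities (background)
    i_feat = v[-k:].mean()  # average of the top 20% of intensities (feature)
    return float((i_feat - i_b) / i_b)  # Eq. 3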
Feature’s Reconstructed Depth
The depth of the reconstructed feature was determined from the binary function obtained
from the segmented results. Given that the synthetic features were shaped as half circles, we
averaged the deepest 1% of pixel depths within the feature, providing a robust estimate of its
size. To validate this depth, the segmented feature matrix was visualized for verification.
Notably, each computed depth was based solely on a single manually segmented feature, so
the resulting depth can vary with changes in the segmentation.
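A minimal sketch of this estimate, assuming a binary 3D mask whose last axis increases with depth and a known voxel spacing, is given below; reconstructed_depth and z_spacing are illustrative names.

import numpy as np

def reconstructed_depth(feature_mask, z_spacing=1.0):
    # z indices of all feature voxels (last axis assumed to increase with depth)
    zz = np.nonzero(feature_mask)[-1]
    k = max(1, int(0.01 * zz.size))   # deepest 1% of pixel depths
    deepest = np.sort(zz)[-k:]
    return float(deepest.mean() * z_spacing)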
Specificity, Sensitivity and Dice coefficient
Specificity, sensitivity, and the Dice coefficient are commonly used to quantify segmentation
performance. Specificity is the true negative rate for a feature, while sensitivity is the true
positive rate:

$$\mathrm{Specificity} = \frac{V_{\mathrm{gt\_bg}} \cap V_{\mathrm{reconst\_bg}}}{V_{\mathrm{gt\_bg}}} \qquad (4)$$

$$\mathrm{Sensitivity} = \frac{V_{\mathrm{gt\_feat}} \cap V_{\mathrm{reconst\_feat}}}{V_{\mathrm{gt\_feat}}} \qquad (5)$$

where $V_{\mathrm{gt\_bg}}$, $V_{\mathrm{reconst\_bg}}$, $V_{\mathrm{gt\_feat}}$, and $V_{\mathrm{reconst\_feat}}$ denote the volume of the
background in the ground truth, the volume of the background in the reconstructed
phantom, the volume of the feature in the ground truth, and the volume of the feature in the
reconstructed phantom, respectively. The Dice coefficient jointly gauges the similarity of the two
samples:

$$\mathrm{Dice} = \frac{2 \left( V_{\mathrm{gt\_feat}} \cap V_{\mathrm{reconst\_feat}} \right)}{V_{\mathrm{gt\_feat}} + V_{\mathrm{reconst\_feat}}} \qquad (6)$$

Reconstruction quality is better when all three parameters are closer to 1.
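The sketch below computes Eqs. (4)-(6) from two binary masks; it is a minimal illustration with placeholder names, treating voxel counts as volumes.

import numpy as np

def segmentation_metrics(gt_mask, reconst_mask):
    gt = gt_mask.astype(bool)
    rec = reconst_mask.astype(bool)
    tp = np.logical_and(gt, rec).sum()      # feature volume overlap
    tn = np.logical_and(~gt, ~rec).sum()    # background volume overlap
    specificity = tn / (~gt).sum()          # Eq. 4
    sensitivity = tp / gt.sum()             # Eq. 5
    dice = 2 * tp / (gt.sum() + rec.sum())  # Eq. 6
    return specificity, sensitivity, dice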
Multi-scale Structural Similarity (MSSSIM)
The Structural Similarity Index (SSIM) is widely used to evaluate image quality,
particularly when comparing an original image to a degraded counterpart. The metric
encompasses three primary attributes: luminance, which assesses brightness; contrast, which
highlights variations in luminance or color that create distinctions in an image; and structure,
which captures the inherent patterns and textures that give an image its unique character. These
elements collectively influence the viewer's perception and interpretation of an image.

In our study, we observed that the shape of the 3D photon distribution, which effectively acts
as the point spread function of the system, caused the 3D-mDOI segmentation to cover a more
expansive area than the ground truth. This discrepancy led to a marked reduction in SSIM
scores. To address this challenge, we turned to the Multi-scale SSIM (MSSSIM), a refined
version of SSIM that introduces multiple levels of sub-sampling, allowing a better
representation of how the human brain perceives image quality [147]. Adapting techniques
from the literature [105], we applied MSSSIM to each z slice, comparing the ground truth with
the reconstructed phantom. The MSSSIM values averaged over the z slices are presented in
Table 2.2, showcasing reconstructions that are not only precise but also visually authentic.
Consider two signals $x$ and $y$ arising from the same 3D coordinate of the ground truth
volume and of a 3D reconstruction, in this work from 3D-mDOI or FEM. Let $\mu_x$ and $\mu_y$
represent the means of $x$ and $y$, respectively, $\sigma_x^2$ and $\sigma_y^2$ their variances, and $\sigma_{xy}$
the covariance between $x$ and $y$. MSSSIM is fundamentally based on the comparison of
measures of luminance $l(x, y)$, contrast $c(x, y)$, and structure $s(x, y)$:

$$\mathrm{MSSSIM}(x, y) = \left[ l_M(x, y) \right]^{\alpha_M} \cdot \prod_{j=1}^{M} \left[ c_j(x, y) \right]^{\beta_j} \left[ s_j(x, y) \right]^{\gamma_j} \qquad (7)$$

where

$$l(x, y) = \frac{2\mu_x \mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}, \quad c(x, y) = \frac{2\sigma_x \sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}, \quad s(x, y) = \frac{\sigma_{xy} + C_3}{\sigma_x \sigma_y + C_3}$$

are formulated from the luminance estimate $\mu_x$, the luminance contrast $\sigma_x^2$, the indication of
structural similarity $\sigma_{xy}$, and three small constants $C_1 = 6.5$, $C_2 = 58.5$, $C_3 = 29.2$. The
parameters $\alpha$, $\beta$, $\gamma$ define the relative importance of luminance, contrast, and structure.
In this work, we assigned a value of 1 to the relative importance parameters $\alpha_M$, $\beta_j$, and $\gamma_j$
at all scales. Contrast and structure are compared between $x$ and $y$ at each scale $j$ from 1 to
$M$; for the luminance, we consider only the similarity at the coarsest scale $M$.
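For illustration, the sketch below averages a multi-scale similarity score over z slices using scikit-image. Note that it is a simplification of Eq. (7): rather than forming the exponent-weighted product with luminance only at the coarsest scale, it averages single-scale SSIM over dyadic sub-samplings, and it assumes [0, 1]-normalized volumes whose slices stay larger than the default SSIM window at every level.

import numpy as np
from skimage.metrics import structural_similarity
from skimage.transform import downscale_local_mean

def mean_msssim_over_z(gt_vol, rec_vol, levels=3):
    scores = []
    for z in range(gt_vol.shape[-1]):  # last axis assumed to be z
        x, y = gt_vol[..., z], rec_vol[..., z]
        per_level = []
        for _ in range(levels):
            per_level.append(structural_similarity(x, y, data_range=1.0))
            x = downscale_local_mean(x, (2, 2))  # dyadic sub-sampling
            y = downscale_local_mean(y, (2, 2))
        scores.append(np.mean(per_level))
    return float(np.mean(scores))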
Network architectures and parameter summary
Reconstruction Module: U-Net
U-Net Architecture (
(c1): Conv3d(22, 8, kernel_size=(4, 4, 4), stride=(2, 2, 2), padding=(2, 2, 2))
(b1): BatchNorm3d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(n1): LeakyReLU(negative_slope=0.2)
(c2): Conv3d(8, 16, kernel_size=(4, 4, 4), stride=(2, 2, 2), padding=(2, 2, 2))
(b2): BatchNorm3d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(n2): LeakyReLU(negative_slope=0.2)
(c3): ConvTranspose3d(16, 8, kernel_size=(4, 4, 4), stride=(2, 2, 2), padding=(2, 2, 2), output_padding=(1, 1, 0))
(b3): BatchNorm3d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(n3): LeakyReLU(negative_slope=0.2)
(c4): ConvTranspose3d(16, 2, kernel_size=(4, 4, 4), stride=(2, 2, 2), padding=(2, 2, 2))
(b4): BatchNorm3d(2, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(n4): ReLU()
)
U-Net Summary (
Trainable parameters: 34,566
)
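The trainable-parameter totals reported in these summaries can be reproduced with the standard PyTorch idiom below; model stands for any of the modules listed in this section.

import torch.nn as nn

def count_trainable(model: nn.Module) -> int:
    # Sum the element counts of every parameter that requires gradients.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)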
Reconstruction Module: Encoder-decoder
Encoder Architecture (
(0): Conv2d(22, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): LeakyReLU(negative_slope=0.2)
(3): Conv2d(64, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
(4): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): LeakyReLU(negative_slope=0.2)
(6): Conv2d(128, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
(7): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(8): LeakyReLU(negative_slope=0.2)
(9): Flatten(start_dim=1, end_dim=-1)
(10): Linear(in_features=8960, out_features=200, bias=True)
)
Encoder Summary (
Trainable parameters: 2,471,432
)
Decoder Architecture (
(0): ConvTranspose3d(200, 256, kernel_size=(5, 7, 1), stride=(1, 1, 1))
(1): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
(3): ConvTranspose3d(256, 128, kernel_size=(4, 4, 4), stride=(2, 2, 2), padding=(1, 1, 1), output_padding=(1, 1, 0))
(4): BatchNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU()
(6): ConvTranspose3d(128, 64, kernel_size=(4, 4, 4), stride=(2, 2, 2), padding=(1, 1, 1), output_padding=(0, 0, 1))
(7): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(8): ReLU()
(9): ConvTranspose3d(64, 2, kernel_size=(4, 4, 4), stride=(2, 2, 2), padding=(1, 1, 1))
(10): ReLU()
)
Decoder Summary (
Trainable parameters: 4,422,978
)
Reconstruction Module: mDOI-Net
RTE1DNN(
(linear1): Linear(in_features=28, out_features=14, bias=True)
(act1): ReLU()
(linear2): Linear(in_features=14, out_features=7, bias=True)
(act2): ReLU()
(linear3): Linear(in_features=7, out_features=2, bias=True)
(act3): Softmax(dim=None)
)
RTE1DNN Summary (
Trainable parameters: 527
)
RTEVoxel_Reconst Architecture (
(Segmentation): Sequential (
(conv1): Conv3d(2, 2, kernel_size=(1, 1, 1), stride=(1, 1, 1))
(bn1): BatchNorm3d(2, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(softmax): Softmax(dim=1)
)
(MLP): Sequential(
(0): Linear(in_features=5, out_features=64, bias=True)
(1): ReLU()
(2): Linear(in_features=64, out_features=128, bias=True)
(3): ReLU()
(4): Linear(in_features=128, out_features=64, bias=True)
(5): ReLU()
(6): Linear(in_features=64, out_features=16, bias=True)
(7): ReLU()
(8): Linear(in_features=16, out_features=2, bias=True)
(9): Softmax(dim=None)
)
(3D-Unet): Sequential (
(c1): Conv3d(5, 8, kernel_size=(4, 4, 4), stride=(2, 2, 2), padding=(2, 2, 2))
(b1): BatchNorm3d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(n1): LeakyReLU(negative_slope=0.2)
(c2): Conv3d(8, 16, kernel_size=(4, 4, 4), stride=(2, 2, 2), padding=(2, 2, 2))
(b2): BatchNorm3d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(n2): LeakyReLU(negative_slope=0.2)
(c3): ConvTranspose3d(16, 8, kernel_size=(4, 4, 4), stride=(2, 2, 2), padding=(2, 2, 2), output_padding=(1, 1, 0))
(b3): BatchNorm3d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(n3): LeakyReLU(negative_slope=0.2)
(c4): ConvTranspose3d(16, 2, kernel_size=(4, 4, 4), stride=(2, 2, 2), padding=(2, 2, 2))
(n4): ReLU()
))
RTEVoxel_Reconst Summary (
Trainable parameters: 39,134
)
RTEVoxel_GRU Architecture (
IntegrationNet(
(new_integration): Sequential(
(0): ReplicationPad3d((1, 1, 1, 1, 1, 1))
(1): Conv3d(5, 2, kernel_size=(3, 3, 3), stride=(1, 1, 1))
(2): Dropout2d(p=0.2, inplace=False)
)
(old_integration): Sequential(
(0): ReplicationPad3d((1, 1, 1, 1, 1, 1))
(1): Conv3d(2, 2, kernel_size=(3, 3, 3), stride=(1, 1, 1), bias=False)
(2): Dropout2d(p=0.2, inplace=False)
)
(update_old_net): Sequential(
(0): ReplicationPad3d((1, 1, 1, 1, 1, 1))
(1): Conv3d(2, 2, kernel_size=(3, 3, 3), stride=(1, 1, 1))
)
(update_new_net): Sequential(
(0): ReplicationPad3d((1, 1, 1, 1, 1, 1))
(1): Conv3d(5, 2, kernel_size=(3, 3, 3), stride=(1, 1, 1), bias=False)
)
(reset_old_net): Sequential(
(0): ReplicationPad3d((1, 1, 1, 1, 1, 1))
(1): Conv3d(2, 2, kernel_size=(3, 3, 3), stride=(1, 1, 1))
)
(reset_new_net): Sequential(
(0): ReplicationPad3d((1, 1, 1, 1, 1, 1))
(1): Conv3d(5, 2, kernel_size=(3, 3, 3), stride=(1, 1, 1), bias=False)
)
(sigmoid): Sigmoid()
(relu): ReLU()
))
RTEVoxel_GRU Summary (
Trainable parameters: 1,140
)
Projection Module: Encoder-decoder
Encoder Architecture (
(0): Conv3d(2, 64, kernel_size=(4, 4, 4), stride=(2, 2, 2), padding=(1, 1, 1))
(1): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): LeakyReLU(negative_slope=0.2)
(3): Conv3d(64, 128, kernel_size=(4, 4, 4), stride=(2, 2, 2), padding=(1, 1, 1))
(4): BatchNorm3d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): LeakyReLU(negative_slope=0.2)
(6): Conv3d(128, 256, kernel_size=(4, 4, 4), stride=(2, 2, 2), padding=(1, 1, 1))
(7): BatchNorm3d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(8): LeakyReLU(negative_slope=0.2)
(9): Flatten(start_dim=1, end_dim=-1)
(10): Linear(in_features=8960, out_features=200, bias=True)
)
Encoder Summary (
Trainable parameters: 4,423,176
)
Decoder Architecture (
(0): ConvTranspose2d(200, 256, kernel_size=(4, 4), stride=(1, 1))
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
(3): ConvTranspose2d(256, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
(4): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU()
(6): ConvTranspose2d(128, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
(7): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(8): ReLU()
(9): ConvTranspose2d(64, 21, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
(10): ReLU()
)
Decoder Summary (
Trainable parameters: 1,497,429
)
Projection Module: mDOI-Net
RTEVoxel_Project Architecture (
(MLP): Sequential(
(0): Linear(in_features=4, out_features=64, bias=True)
(1): ReLU()
(2): Linear(in_features=64, out_features=16, bias=True)
(3): ReLU()
(4): Linear(in_features=16, out_features=2, bias=True)
(5): ReLU()
)
(2D U-Net): Sequential(
(c1): Conv2d(3, 4, kernel_size=(4, 4), stride=(2, 2), padding=(2, 2))
(b1): BatchNorm2d(4, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(n1): LeakyReLU(negative_slope=0.2)
(c2): Conv2d(4, 8, kernel_size=(4, 4), stride=(2, 2), padding=(2, 2))
(b2): BatchNorm2d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(n2): LeakyReLU(negative_slope=0.2)
(c3): ConvTranspose2d(8, 4, kernel_size=(4, 4), stride=(2, 2), padding=(2, 2), output_padding=(1, 1))
(b3): BatchNorm2d(4, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(n3): LeakyReLU(negative_slope=0.2)
(c4): ConvTranspose2d(8, 1, kernel_size=(4, 4), stride=(2, 2), padding=(2, 2), output_padding=(1, 1))
(n4): ReLU()
))
RTEVoxel_Project Summary (
Trainable parameters: 2,787
)
GAN Module: 2D Discriminator
2D Discriminator Architecture (
(model): Sequential(
(0): Conv2d(21, 8, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
(1): BatchNorm2d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): LeakyReLU(negative_slope=0.2)
(3): Conv2d(8, 16, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
(4): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): LeakyReLU(negative_slope=0.2)
(6): Conv2d(16, 32, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
(7): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(8): LeakyReLU(negative_slope=0.2)
(9): Conv2d(32, 1, kernel_size=(4, 4), stride=(1, 1))
))
2D Discriminator Summary (
Trainable parameters: 13,609
)
GAN Module: 3D Discriminator
3D Discriminator Architecture (
(model): Sequential(
(0): Conv3d(2, 8, kernel_size=(4, 4, 4), stride=(2, 2, 2), padding=(1, 1, 1))
(1): BatchNorm3d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): LeakyReLU(negative_slope=0.2)
(3): Conv3d(8, 16, kernel_size=(4, 4, 4), stride=(2, 2, 2), padding=(1, 1, 1))
(4): BatchNorm3d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): LeakyReLU(negative_slope=0.2)
(6): Conv3d(16, 32, kernel_size=(4, 4, 4), stride=(2, 2, 2), padding=(1, 1, 1))
(7): BatchNorm3d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(8): LeakyReLU(negative_slope=0.2)
(9): Flatten(start_dim=1, end_dim=-1)
(10): Linear(in_features=8960, out_features=128, bias=True)
(11): Linear(in_features=128, out_features=1, bias=True)
))
3D Discriminator Summary (
Trainable parameters: 1,189,289
)
Abstract
Conventional light imaging in living tissues is limited to depths under 100 µm by significant tissue scattering. Consequently, few commercial imaging devices can image tissue lesions beneath the surface or measure their invasion depth, which is critical in dermatology. We present 3D-Multisite Diffused Optical Imaging (3D-mDOI), a novel approach that combines photon migration techniques from diffuse optical tomography with automated controls and image analysis techniques for estimating a lesion's depth via its optical coefficients. 3D-mDOI is a non-invasive, low-cost, fast, and contact-free instrument capable of estimating subcutaneous tissue structure volumes through multisite acquisition of re-emitted light diffusion on the sample surface. It offers rapid estimation of Breslow depth, essential for staging melanoma.
Building upon 3D-mDOI, we designed an improved machine learning solution, the multiplexed Diffused Optical Imaging Network (mDOI-Net), to estimate the depth of tissue lesions. mDOI-Net has an interpretable structure and provides estimates of tissue lesion depth in diverse circumstances. Its training is performed with customized synthetic dermatology datasets that we generate from publicly available datasets, ensuring data diversity and adaptability. The network reconstructs the 3D optical properties of the tissue from 2D diffuse images by introducing the physical modeling of steady-state diffuse optical imaging into the network. Our solution is poised to be more flexible, interpretable, and predictable than current end-to-end, black-box neural network benchmarks.