NEURAL NETWORK INTEGRATION OF MULTISCALE AND MULTIMODAL CELL
IMAGING USING SEMANTIC PARTS
by
John Paul Francis
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)
December 2023
TABLE OF CONTENTS
List of Tables
List of Figures
Chapter One: Introduction
    Thesis Statement
    Motivation
    Approach
    Contributions of this Work
    Dissertation Organization
Chapter Two: Related Work
    Introduction
    Integrative Modeling Definition and Parameterization in Biomedical Imaging
    Nonparametric Biomedical Image Reconstruction
    Multimodal Deep Learning
    Nonparametric Biomedical Image Registration
    Limitations of Nonparametric Techniques for Biomedical Image Integration
    Semantic Segmentation in Biomedical Imaging
    Semantic Image Synthesis
    Summary
Chapter Three: Expanded Problem Statement
    The Integrative Multiscale Cell Modeling Problem
Chapter Four: Application and Analysis of Results
    Computing Semantic Feature Maps
    Isometric Alignment of Semantic Feature Maps
    Integrated Modeling Results
    Summary
Chapter Five: Workflow Implementation
    Introduction
    Related Work
    Approach
    Implementation
    Application
    Summary
Chapter Six: Limitations and Future Work
    Contributions
    Application
    Future Work
    Summary
References
LIST OF TABLES
Table 1: Fluorescence Microscopy Auto-segmentation Results
Table 2: X-ray Tomography Baseline Results
Table 3: Fluorescence Microscopy Baseline Results
Table 4: Multimodal Transfer Results
Table 5: Resampled Multimodal Integration Results
LIST OF FIGURES
Figure 1: Deep Generative Model for Single Cells
Figure 2: A joint deep learning model to recover information and reduce artifacts in missing-wedge sinograms for electron tomography and beyond
Figure 3: Playing Games in the Dark: An approach for cross-modality transfer in reinforcement learning - same latent space for different modalities
Figure 4: Multi-task multi-sensor fusion for 3D object detection
Figure 5: Multimodal registration images
Figure 6: Neural Network Segmentation of Cell Ultrastructure Using Incomplete Annotation
Figure 7: You Only Need Adversarial Supervision For Semantic Image Synthesis
Figure 8: Semantic Cell, Integrated Multiscale Cell Modeling Framework
Figure 9: X-ray Tomography Auto-segmentation Results
Figure 10: Fluorescence Microscopy Input Image Examples
Figure 11: Fluorescence Microscopy Auto-segmentation Results
Figure 12: Instance segmentation illustration
Figure 13: Fluorescence Microscopy Instance Segmentation Example
Figure 14: Multimodal Integration Results
Figure 15: Resampled Multimodal Integration Results
Figure 16: FAIR workflow diagram
Figure 17: Workflow data structure
Figure 18: DERIVA Ecosystem
Figure 19: Variational structural prediction results based on treatment condition
CHAPTER ONE: INTRODUCTION
A. THESIS STATEMENT
The structural modeling of cells can be accomplished by integrating images of cellular morphology from multiple scales and modalities using a parts-based approach. In this thesis, we demonstrate a method for combining the statistical distribution of structures from x-ray tomography and fluorescence microscopy using neural networks to predict the localization of high-resolution components in low-resolution modalities, using the single cell as a shared unit of transfer.
B. MOTIVATION
The field of neural networks as applied to statistical modeling has advanced rapidly in the last decade, but there have always been problem domains in which information spanning multiple scales and modalities is not easily combined and modeled through traditional input/output machines. Neural network designs that allow the inference of one modality from another (such as in image captioning) and the extrapolation of high resolution from low resolution have been developed, but there have been few experiments that seek to infer a high-resolution representation of one modality using another modality of a different scale.
Such a description may challenge our imagination, but it occurs frequently in the charting of unsurveyed or unsurveyable areas. For instance, many of us are familiar with the experience of looking at a map, reconciling it with our view of the street in front of us, and imagining the rest of the city we have yet to observe. This representation of the whole city simultaneously from one's own perspective can, of course, never be observed in fact, but it can be envisioned through complex methods, and we often do perform such thought experiments, with varying success, to surmise which blocks might have more people on the street, for instance [Chen, B.].
The experience of biomedical imaging is no different.
In biomedical image visualization, the ultimate goal is delivery of the highest
resolution image possible. Unfortunately, that objective is physically limited by the
occlusion of large structures in dense microscopic space and the fragility of smaller scale
biomedical structures under high energy photon exposure. The observational
requirements of practitioners are not satisfied currently, nor does it appear they will be for
some time. It is therefore of utmost importance that information captured from different
imaging scales be integrated effectively. What is more, it is desirable that this integration
results in objective multi-scale and multi-modal representations that can be shared and
validated against other experimental results. Such a method of integration is the dual of multi-scale simulation, which is a primary concern in the design of therapeutic treatments.
The desire for multi-scale and multi-modal biomedical models has led to the
development of several integrative approaches to combine biomedical imaging
information. However, these methods have so far been limited to idealizations of large
scale containers and even manual illustration. General techniques for modeling complex
multiscale structures are needed.
Imaging across scales has been described as a grand challenge. The most effective
answer thus far has been to improve the fidelity of single modality techniques or to
correlate multi-modality experiments so that images and structural components directly
overlap. In biomedical imaging experiments where components in images are
semantically labeled for identification, most analysts reduce these labeled representations
to single measurements such as overall volume. After correcting for affine scale, however, these label maps take an identical form across modalities and contain some shared semantic parts, a fact
which can be exploited to relate the modalities for convolutional neural network
processing and synthesize high-resolution structural representations and components
across scales.
C. APPROACH
The above efforts to integrate biomedical imaging modalities across scales have,
until now, relied on highly manual and parameterized differential equation models. The
research described in this dissertation investigated the development of a more automated
neural network method for general-purpose multiscale and multimodal image integration.
The results show it is possible to integrate multiple modalities using
representations of semantic parts, and to predict the locations of these semantic parts
across modalities, using neural networks. These neural networks can produce high
resolution localization predictions across modalities and are capable of modeling more
complex distributions than parameterized methods.
The integration was performed by abstracting the imaging data from each modality
into semantic pixel maps which were effectively identical. Similarly, differences in scale
were unified through the shared abstraction of single cells which could be correlated as
separate instances. The prediction of parts was then achieved by filtering these single cell
semantic label maps to only their shared parts and holding out the target predictions for
validation and testing.
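To make this concrete, the following is a minimal sketch of the filtering and hold-out step, assuming NumPy integer label maps; the label IDs and array shape are hypothetical placeholders rather than the values used in the actual pipeline.

```python
import numpy as np

# Hypothetical label IDs; the real schema is defined per experiment.
SHARED_LABELS = [1, 2]   # e.g. labels present in both modalities (membrane, nucleus)
TARGET_LABELS = [3, 4]   # e.g. structures held out as the prediction target

def make_training_pair(label_map: np.ndarray):
    """Split one single-cell semantic label map into a shared-parts input
    and a held-out target map for cross-modality prediction."""
    shared = np.where(np.isin(label_map, SHARED_LABELS), label_map, 0)
    target = np.where(np.isin(label_map, TARGET_LABELS), label_map, 0)
    return shared, target

# Toy example: a random 2D label map standing in for one cell instance.
toy = np.random.randint(0, 5, size=(64, 64))
shared_input, held_out_target = make_training_pair(toy)
```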
Previous forays into multimodal and multiscale integration have been designed for
data which is directly aligned. No methods anticipate samples from different metric
spaces nor allow for partial or missing information between modality spectra. This
problem is further complicated by the amount of data necessary for deep learning
methods and the need to create many sample inputs in an automated fashion.
Accordingly, a primary effort of this thesis was the creation of automated methods
to produce semantic segmentation maps that would be consistent across modalities and
scales. These methods included semantic segmentation methods for both modalities
individually as well as instance segmentation methods which were all accomplished
through the training and deployment of neural networks across a curated data collection.
Given these methods for generating consistent semantic representations across scales and
modalities, it then becomes possible to formulate the task of predicting features between
them. One goal of this dissertation was the use of 3D neural network methods, but such
methods required larger data samples than we could attain. This encouraged the
development of methods for high-dimensional, low-sample size regimes that often
characterize deep learning problems in the biomedical domain.
While this method is still manual, requiring a tailored pipeline to create
semantically aligned input samples, it is general enough to be applied to other domains,
especially in biomedical imaging where semantic segmentation is common in practice.
The effectiveness of this integration method is demonstrated by a proof-of-concept
implementation.
D. CONTRIBUTIONS OF THIS WORK
This dissertation claims the following contributions to the state of the art:
- computational composition of high-resolution multiscale biomedical images
- prediction of components from unseen spectra across imaging modalities
- method for the combination of heterogeneous label sets in neural network training
- general framework for the integration of data sources from different resolutions
and scales using shared semantic labels
- curated dataset of x-ray tomography and fluorescence microscopy images for the
INS-1E cell type under different glucose treatments and timepoints
E. DISSERTATION ORGANIZATION
The remainder of this dissertation is organized as follows:
Chapter II: To provide the reader with a background in the technical areas of
this dissertation, this introduction chapter is immediately followed by a summary analysis
of related work.
Chapter III: This chapter presents a more formal statement of the multi-scale,
multi-modal integration problem. This includes the description of a typical instance of the
single cell modeling problem, and defines a novel technique for reaching the solution of
that problem: the integrated semantic cell. That single cell modeling problem is then
extended by a variety of complicating factors to achieve the necessary inputs, in order to
demonstrate the general applicability of the method.
Chapter IV: This chapter details the technical application and results of the semantic cell
method for the multiscale cell modeling problem. We will discuss how semantic maps are
computed and provided as input to the integration method, as well as the dramatic effect
of automated segmentation methods in completing this step. With the computation of
semantic inputs achieved, we will discuss the application of instance segmentation and
the design and deployment of transfer functions to achieve isometric alignment based
upon the single cell. Chapter IV will then finish with a description of the integrated
multiscale model training process and an analysis of its results on the structural
prediction task.
Chapter V: Chapter V demonstrates an automated system in which FAIR data inputs can
be processed via FAIR processes into FAIR data outputs in a “full journey”. This system
introduces the concept of FAIR workflows which are dynamically configurable to operate
on given sample data using given computational processes and execute automatically in a
hosted Google Colab environment for both validation and deployment. This workflow
framework leverages Deriva, BDBags, and MiniDs for data and process retrieval and is
demonstrated in the semantic cell application of deep learning across imaging domains.
Chapter VI: The dissertation concludes with a summary of contributions, limitations, and
suggestions for applications which can be derived from the semantic cell method. A
number of promising avenues are provided for follow-on research.
CHAPTER TWO: RELATED WORK
A. INTRODUCTION
Structural cell modeling efforts have employed highly niche and disparate
approaches to the multiscale and multimodal integration problem. To some extent, the
sparsity of these techniques reflects the relative lack of interdisciplinary teams that are
positioned to address this class of applications as well as the recency of the opportunity to
address the challenge using more general and modern computational techniques. Almost
all modern statistical modeling approaches to date from the computer science community
as applied to biomedical imaging have addressed relatively limited problems within a
single imaging modality or have addressed multimodal problems in other domains with
methods that do not directly fit the more limited observational paradigm of biomedical
imaging. Rarely does machine learning address multi-scale data modeling challenges in
biomedical imaging.
The work described draws heavily upon previous research results in computational
modeling for structural biology as well as emerging approaches to data integration in the
subfield of deep learning at large. This chapter documents significant research literature
in each area, with special attention paid to those that are particularly relevant or
considered ground-breaking. Where appropriate, the discussion includes a comparison
with this work, so as to demonstrate its contribution to the state of the art.
B. INTEGRATIVE MODELING DEFINITION AND
PARAMETERIZATION IN BIOMEDICAL IMAGING
Integrating knowledge from multiple imaging modalities and scales is of utmost
importance in biomedical discovery, but rarely is a formal model for this integration used.
Generally biomedical imaging researchers focus on techniques to maximize resolution
and throughput for their own specialty while maintaining only foundational knowledge of
the surrounding context in their discovery process. However, the variety of existing
imaging techniques and inability of any one imaging modality to answer all questions
implies that some gains may be had in advanced methods to help bridge the scales and
integrate information from multiple techniques more formally, rather than advancing the
performance of any one technique alone.
In the past, integrative modeling approaches in structural biology have been highly
parametric, with some modelers using images as reference to illustrate idealized
containers of known protein compositions [Goodsell], or representing cell structures
mathematically as explicit and discrete geometric forms [Murphy]. Parameterization in
the semantic cell method is abstracted to operate on high dimensional semantic maps,
allowing complex statistical modeling on whole image inputs of various scales and
modalities whose semantic schemas are aligned. There is surprisingly little supporting
research in the area of abstracting the connections between imaging scales and modalities
using semantic maps. However, there are many techniques which use so-called
“nonparametric” neural network techniques to model structure in whole image inputs and
produce whole image outputs.
C. NONPARAMETRIC BIOMEDICAL IMAGE RECONSTRUCTION
While parametric modeling and simulation is still an active area of structural
biology research, modeling of larger biological phenomena is increasingly practiced
through the nonparametric statistical modeling of imaging data. The Allen Institute’s
Integrated Cell [Donovan-Maiye] is a great example of a high resolution 3D cell model
which is achieved by the direct nonparametric probabilistic modeling of images using
neural networks. The integrated cell method operates on multichannel fluorescence
microscopy images to predict the location of “target structure” image channels from
“reference structure” image channels and then uses this system to amalgamate high
resolution 3D image representations of many target image channels simultaneously which
would not be possible to acquire experimentally (see Figure 1).
Figure 1. From “Deep Generative Model for Single Cells” (2022)
[Donovan-Maiye]
The authors suggest that nonparametric approaches can generalize well to model a
wide variety of structural localization patterns, and are more convenient to employ than
traditional parametric models. Some areas they suggest for applying and extending this
method of structural prediction are: time-series imaging, perturbations from drug
treatments, and cell state differentiation. Since then, each of these areas of structural
prediction have been approached with similar neural network techniques, specifically
using fluorescence microscopy data and generative adversarial network (or GAN) based
architectures [Wei][Goldsborough][Yuan]. The transition of structural cell modeling
practices from parametric to nonparametric methods has furthermore been fairly
complete, as many students of leaders in classical parametric modeling have led the
progress in nonparametric neural networks as co-authors.
The methods in the niche subfield of nonparametric structural cell modeling are
designed to apply specifically to fluorescence microscopy data, for which structural
subcomponents are easily identifiable and thresholded as separate image channels.
Unsurprisingly, this advantage has made fluorescence microscopy the modality of choice
for parametric and nonparametric computational cell modeling dating back to Murphy’s
CellOrganizer work in 2004.
The identification and separation of cell structure in images is itself a complex
process. In small scale modalities such as electron microscopy and tomography as well as
large scale modalities such as MRI and ultrasound, images are grayscale with many
structures of interest in close proximity and obscured by noise. Even in fluorescence
microscopy the need to deconvolve overlapping fluorescent tags limits the number of
structures which can be imaged simultaneously, which is why predicting many target
structures as opposed to imaging them directly was interesting in the Allen Cell work to
begin with.
Further complicating matters, the imaging process can often damage structure in
ways which prevent full sampling. For instance, high resolution tomography requires
cells to be frozen and effectively killed to acquire a snapshot, preventing time series
imaging at that resolution. Tomography can also destroy sections of the subjects during
the imaging process, resulting in artifacts such as the “missing wedge” in single cell
samples. In fluorescence microscopy, too much imaging at a high dose can result in
photobleaching and cell death. Similarly, many of us are familiar with the risks of
repeated MRI and X-ray exposure in human imaging which limit the radiation dose and
image quality which can be achieved.
Many challenges in structural identification for biomedical images are addressed
through semantic segmentation, which we will cover in Section G. However, there are
also some methods in the category of “augmented microscopy” [Wang] or “image
reconstruction” that are relevant to this section and attempt to impute observational gaps
in biomedical imaging samples.
These methods use nonparametric structural modeling methods to infer high
resolution images from low dose/low resolution imaging or fill in the missing wedge
[Ding] for single modalities (see Figure 2). In this case the models are designed to predict
the complete statistical distribution of an instance by training on randomly held-out
sample areas and downsampled pairs without any reference to explicit structural parts or
channels.
Figure 2. “A joint deep learning model to recover information and reduce artifacts in
missing-wedge sinograms for electron tomography and beyond” (2019) [Ding]
The nonparametric reconstruction methods presented in this section operate within
only a single modality and do not attempt any integration of structural information
between imaging modalities or scales. For examples of data integration, we look next to
examples of nonparametric multimodal modeling methods in biomedical and other
domains.
D. MULTIMODAL DEEP LEARNING
The semantic cell method requires that information from multiple imaging
modalities are combined to generate multiscale representations. The notion of combining
information from multiple modalities is very common in nonparametric modeling, which
we will call “deep learning” now that we have moved into that subfield more explicitly.
Multimodal deep learning methods use encoding and decoding neural network architectures similar to those used in single-modality methods, except that they encode multiple inputs of different modalities into vectors and concatenate or combine them in an encoded latent space before decoding the final result (see Figure 3).
Figure 3. "Playing Games in the Dark: An approach for cross-modality transfer in reinforcement learning - same latent space for different modalities" 2019. [Silva]
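As a rough illustration of this encode-concatenate-decode pattern, the following is a minimal PyTorch sketch with illustrative layer sizes; it is a generic fusion model, not any of the cited architectures.

```python
import torch
import torch.nn as nn

class MultimodalFusion(nn.Module):
    """Encode two modalities separately, concatenate the latent vectors,
    and decode a single joint output (layer sizes are illustrative)."""
    def __init__(self, dim_a=256, dim_b=256, latent=128, out_dim=64):
        super().__init__()
        self.enc_a = nn.Sequential(nn.Linear(dim_a, latent), nn.ReLU())
        self.enc_b = nn.Sequential(nn.Linear(dim_b, latent), nn.ReLU())
        self.dec = nn.Sequential(
            nn.Linear(2 * latent, latent), nn.ReLU(), nn.Linear(latent, out_dim))

    def forward(self, x_a, x_b):
        z = torch.cat([self.enc_a(x_a), self.enc_b(x_b)], dim=-1)  # fused latent
        return self.dec(z)

# Example: a batch of 8 paired feature vectors, one per modality.
out = MultimodalFusion()(torch.randn(8, 256), torch.randn(8, 256))
```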
Multimodal methods have commonly been used to combine images and text to
make discriminative predictions about sentiment in robotics [Tzirakis] as well as to
combine visual representations of different types, for instance LIDAR and RGBD, or
satellite and GIS, to perform more accurate and comprehensive object classification for
“scene understanding” [Li][Chen][Gomez].
Recently, multimodal methods using “transformers” have achieved breakthrough
performance on generative modeling applications, with networks such as Dall-e which
are capable of creating images from user-generated text-captions [Ramesh]. As this work
has developed in the community, the encoded latent space terminology has been replaced
with the concept of “tokens” which emphasize the adaptability of these symbolic
representations in a variety of decoding and translation tasks across modalities.
So far, the cell modeling field has mostly taken note of the text and image applications, where multimodal architectures and techniques are now commonplace.
Most often these multimodal biomedical modeling experiments combine “-omics”
information such as gene sequences with imaging data to improve the performance of
discriminative tasks such as classifying subpopulations of cells [Yang] [Wang, X.]
[Uhlerand] [Tang] [Cao] [Chessel]. To date, there have been no attempts to use multimodal architectures for generative tasks of the kind performed by Dall-e, either for translating images to omics or omics to images.
There are some applications of “image fusion” in which two biomedical imaging
modalities applied to the same sample can be combined to assist with tasks such as
semantic segmentation [Zhang] [Wang, Y.]. (See Figure 4). Additionally, neural networks
can be used to resolve superficial differences between two imaging datasets of the same
modality for domain adaptation [Chen, C.]. Image to image applications have also been
used to translate imaging modalities from grayscale to RGB or RGB to RGBD in the case
of natural images.
Figure 4. “Multi-task multi-sensor fusion for 3D object detection” 2019. [Liang]
In biomedical imaging, there have not been many studies which aim to generate or
translate between two different modalities using deep learning methods. There are,
however, many methods which have been designed to create overlays of multiple image
modalities geometrically, which we explore next.
E. NONPARAMETRIC BIOMEDICAL IMAGE REGISTRATION
The integration of imaging information from multiple modalities and scales is not
a new concept in biomedical imaging. Past approaches however have focused on the
basic problem of imaging the same sample with two different modalities, which is
nontrivial, as well as aligning these specific multimodal images of the sample using
“image registration” techniques.
Multimodal biomedical image acquisition can be performed in both optical
microscopy and medical imaging contexts. In optical microscopy, the most common
modality pairing is correlated light and electron microscopy (CLEM) in which the same
sample is prepared for optical microscopy and then imaged again using higher resolution
destructive techniques. In medical imaging, there is PET/CT, as well as correlated MRI, CT, and/or ultrasound studies, particularly in cardiology. The technical difficulty of these
experiments is mostly in sample preparation. However, once the image modalities are
captured, experimenters must then reconcile the slightly different orientations, pixel
scales, and geometric properties of each. The task of resolving these differences to create
one-to-one pixel overlays is called “image registration”.
For some simple cases, image registration can be performed through basic affine
transformations such as rotation, rescaling, and shearing in three dimensions. There are,
however, some “non-rigid” image registration methods in which feature similarity
between the two modalities is encoded and correlated with neural networks to handle
more complex nonlinear deformation and alignment functions [Zampieri].
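For illustration, below is a minimal sketch of applying such an affine transform (a rotation plus an isotropic rescale) to a 3D volume with SciPy; the angle and scale are arbitrary placeholders, and no registration optimizer is shown.

```python
import numpy as np
from scipy.ndimage import affine_transform

def apply_affine(volume, rotation_deg=10.0, scale=1.2):
    """Rotate (about the first axis) and isotropically rescale a 3D volume,
    the kind of rigid/affine transform used in simple image registration."""
    t = np.deg2rad(rotation_deg)
    rot = np.array([[1, 0, 0],
                    [0, np.cos(t), -np.sin(t)],
                    [0, np.sin(t),  np.cos(t)]])
    forward = rot * scale
    # affine_transform maps output coordinates to input coordinates, so the
    # matrix is the inverse of the desired forward transform, applied about
    # the volume centre.
    matrix = np.linalg.inv(forward)
    center = (np.array(volume.shape) - 1) / 2.0
    offset = center - matrix @ center
    return affine_transform(volume, matrix, offset=offset, order=1)

aligned = apply_affine(np.random.rand(64, 64, 64))
```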
While the process of aligned multimodal imaging data acquisition is laborious to
perform, multimodal image registration is generally a solved problem. Multimodal
overlaid images and videos from top researchers such as Eric Betzig and others are very
popular [Hoffman] and have even been augmented through manual illustration by some
researchers such as Gael McGill [Jenkinson] to create some of the most acclaimed
multiscale representations in the biomedical modeling field today. (see Figure 5).
Figure 5. Multimodal registration images of Eric Betzig (left) [Hoffman] and Gael McGill (right) [Jenkinson].
In principle, these aligned multimodal image samples could be used to train
generative models which could translate between the two modalities, but the samples are
so high-dimensional and few in number that learning this transformation would be
infeasible using basic deep learning methods.
F. LIMITATIONS OF NONPARAMETRIC TECHNIQUES FOR
BIOMEDICAL IMAGE INTEGRATION
The limitation of the nonparametric techniques discussed so far is that they require
many completely aligned input/output sample pairs. For example, in image
reconstruction, the same sample image must be provided with and without the target area
of interest. In multimodal deep learning and image registration, the same sample must be
provided under the exposure of both modalities.
The technical difficulty in applying multimodal methods to biomedical image
integration is that aligned multimodal biomedical imaging data are few in number and
difficult to attain. Without tens of thousands of aligned multimodal image samples, there
are simply not enough direct sample pairs on which to train a nonparametric algorithm.
Given how valuable these integrated biomedical imaging models are, there is therefore a great opportunity to develop hybrid methods in which unaligned images from different
samples can be represented and correlated based on the relationships between abstract
parts.
G. SEMANTIC SEGMENTATION IN BIOMEDICAL IMAGING
As mentioned in section C, the identification of structures in noisy biomedical
images is often accomplished through the process of semantic segmentation. Semantic
segmentation can be a manual or automated process where each pixel in an image is
assigned membership to a specific class of structure. Through this process, a grayscale
image block can be converted into a 3D reconstruction of multicolored structures and
volumes whose quantitative attributes can be measured and compared. (see Figure 6)
Figure 6. "Neural Network Segmentation of Cell Ultrastructure Using Incomplete
Annotation," 2020 [Francis]
Semantic segmentation is a time consuming but critical research process for many
imaging labs. Using deep learning to automate semantic segmentation has therefore been
widely popular across many modalities [Ronneberger] [Isensee]. Large numbers of
aligned image/label pairs make the automation process fairly straightforward and
semantic segmentation methods continue to improve in sophistication and accuracy.
Many leading methods now incorporate GAN-like architectures to ensure that results not only minimize pixel-wise label differences but also maximize overall likeness to real segmentation examples [Luc] [Kamran].
In the biomedical domain, semantic segmentation maps are treated as final
imaging results which are then reduced and analyzed as numerical volumes or distances
or visualized as surfaces. Much of this work began with that outlook. However, there is
also a subfield of deep learning which treats semantic segmentation maps as data which
can be modeled and synthesized as if they were images themselves.
H. SEMANTIC IMAGE SYNTHESIS
The field of semantic image synthesis is mostly concerned with how semantic
label maps can be used to improve generative imaging tasks. Some of this work called
“image inpainting” attempts to complete missing areas of images. Often images
generated by deep learning lack clear components or distinctions between them and can
benefit from using segmentation boundaries as prior information [Liao][Liu].
Going further, some methods try to predict entire images from their semantic
labels. While this problem is underdetermined, in that one semantic label map can correspond to infinitely many images, these methods can generate realistic samples and
have led to many popular applications in which detailed landscape images can be
generated from rudimentary sketches of mountains, rivers, and streams [Sushko] (see
Figure 7).
Figure 7. “You Only Need Adversarial Supervision For Semantic Image Synthesis”
(2021) [Sushko].
I. SUMMARY
This chapter presented work related to the design and implementation of schemes
for biomedical image modeling and integration. Overview summaries were provided for
parametric integration methods as well as nonparametric methods such as image
reconstruction, multimodal modeling, and image registration, as well as their limitations.
We reviewed semantic segmentation methods and their role in traditional structural
analysis as well as modern applications for semantic image synthesis with deep learning.
This review is not exhaustive; broader surveys are available in [Vasan] [Shamsolmoali] [Uhlerand].
The review presented in this chapter shows that the creation of nonparametric
generative models which combine biomedical images from multiple scales and modalities
has not been previously attempted or proposed. However, many previous efforts in
classical biomedical modeling show the merit of this problem and position this research
in a growing field of deep learning where data integration and complex problems in the
biomedical imaging domain have been a frequent subject of difficulty as well as
inspiration. The chapters that follow show how these previous efforts can be integrated
and extended into the larger scope of this dissertation through the nonparametric
modeling of aligned semantic parts.
CHAPTER THREE: EXPANDED PROBLEM
STATEMENT
In order to present a general-form method for integrative multiscale cell modeling,
it is necessary to characterize a generic form of the integrative modeling problem:
“Integrative modeling is characterized by defining a unified model representation
for many multimodal inputs, such that the result can be scored based on its
derivative input information.”
Though the terms of this statement are familiar, their usage bears definition:
● model representation: An explicit system of features and constraints that
determines an acceptable range of output results.
● multimodal inputs: Complementary information of different types which
are provided to a model. Multimodal inputs can contain different subsets of
features describing the same underlying domain.
● score: A single metric describing a sample’s overall quality. Scores can be
calculated using a loss function, for instance, describing how closely a
result aligns with input information.
Mathematically, this optimization problem can be illustrated as follows.
Let there be two modalities, modality A and modality B, each containing a subset of observable features. That is, modality A gives a set of features $S_A = \{s_{a1}, s_{a2}, s_{a3}, \ldots, s_{aM}\}$ and modality B gives a set of features $S_B = \{s_{b1}, s_{b2}, s_{b3}, \ldots, s_{bN}\}$. Each input instance $x_i$ is drawn from only one modality and can contain features $S_A$ or $S_B$, but never both.

The model's output $z$ can then be scored with respect to the particular input information from $x$ that it seeks to represent: $p(z \mid x, \theta)$. In the multimodal case, this score must be calculated with respect to the input features of each modality, as given below:

$$\log p(z_{S_A}, z_{S_B} \mid x_{S_A}, x_{S_B}, \theta)$$

where the subscripts $S_A$ and $S_B$ denote the constituent components of $z$ such that $z_{S_A} + z_{S_B} = z$, and likewise for $x$.

The optimization problem is then to choose some model with parameters $\theta$ such that this score is maximized, or the negative score is minimized, across all modality input samples:

$$\forall x \in A, B: \quad \arg\min_{\theta} \; -\log p(z_{S_A}, z_{S_B} \mid x_{S_A}, x_{S_B}, \theta)$$

or

$$\arg\min_{\theta} \; -\sum_i \log p(z_{i,S_A}, z_{i,S_B} \mid x_{i,S_A}, x_{i,S_B}, \theta)$$
This dissertation postulates that although there exist no samples $x_i$ containing both feature sets $S_A$ and $S_B$ to optimize against, a neural network model can be constructed to generate whole representations $z = (z_{i,S_A}, z_{i,S_B})$ such that a union of features $S_A$ and $S_B$ can be evaluated, provided that there exist some common features $S_{common}$ between $S_A$ and $S_B$ that are isometric. In this manner, $S_A$ and $S_B$ are rewritten as:

$$S_A = f(S_a + S_{common}), \qquad S_B = g(S_b + S_{common})$$

where $f$ and $g$ are operations designed to achieve isometric uniformity. We can then formulate our optimization function as:

$$\arg\min_{\theta} \; -\sum_i \log p(z_{i,S_A}, z_{i,S_B} \mid x_{i,S_{common}}, \theta)$$

which we can solve given a combination of our partial solutions:

$$(x \in A): \; p(z_{S_a} \mid x_{S_{common}}, \theta) \quad + \quad (x \in B): \; p(z_{S_b} \mid x_{S_{common}}, \theta)$$

where $S_{common}$ can, again, be derived across all inputs $x \in A, B$ as follows:

$$x \in A: \; S_{common} = f^{-1}(S_A) - S_a, \qquad x \in B: \; S_{common} = g^{-1}(S_B) - S_b$$
It is therefore possible to optimize the multimodal integration problem with
heterogeneous inputs given the following information:
● Features SA for x ∈ A
● Features SB for x ∈ B
● Functions f(x) and g(x) which can derive common isometric features from
SA and SB
Hereafter, this general framework is referred to as the semantic cell model, where
the single cell is the common isometric unit with which semantic features can be aligned.
This relationship implies that heterogeneous inputs always contain common semantic features that can be extracted and transferred between metric spaces. In practice, however, shared isometric features may not always exist, either due to a lack of coordination in measurement or to domain misalignment between the sample spaces of modalities A and B. It is therefore of the utmost importance that the domains be aligned and the semantic feature measurements standardized between the two modalities in order for any of these points to hold.
A. THE INTEGRATIVE MULTISCALE CELL MODELING
PROBLEM
The semantic cell framework is best explained by describing its application to a
specific problem type. The problem we will use is the integrative multiscale cell
modeling problem, with the following characteristics:
● a neural network model representation accepting whole image inputs of a
fixed dimension and producing whole image outputs of a fixed dimension.
● x-ray tomography and fluorescence microscopy inputs of different
resolutions, color spaces, and metric dimensions.
● a pixel-wise cross-entropy scoring function where image outputs can be
evaluated based on the overlap between predicted structural label maps and
ground truth structural label maps.
Implementing a neural network scoring function for partial inputs is itself a
complex process, which will be covered in Chapter IV. Constructing the optimization
model is straightforward, given the feature inputs SA, SB, and the process for deriving
S_common between the two. However, defining and generating the feature inputs and
transfer process is non-trivial. The generation of each of these three functional inputs (SA, SB, and S_common) is discussed in turn below, with special attention to the integrative
multiscale cell modeling problem stated above.
1. Features SA for x ∈ A and SB for x ∈ B
The features chosen for each modality are subjective and can vary significantly
between applications. It is possible to record for each modality all pertinent information
about the experimental result: date captured, imaging device used, treatment condition,
cell type, and, of course, the pixel values of the resulting image, to name a few. The
choice of which features to use depends upon which information we are interested in
representing in our final result. For the integrative multiscale cell modeling problem we
are most interested in representing subcellular structure.
Capturing features for subcellular structure can be as basic as measuring the
volumes of compartments, pixel intensity values for areas of interest, and so on. In the
multiscale integrative cell modeling problem, we are ultimately interested in creating
whole image outputs from whole image inputs and therefore we require a more detailed
representation of every pixel in the input image as a structural feature. This process of
converting an image to a pixel-wise feature map is referred to as semantic segmentation.
Semantic segmentation can be done manually, but it is time-intensive. Therefore, the construction of features SA and SB for this problem is achieved through semantic segmentation models for p(x_SA | x) for x ∈ A and p(x_SB | x) for x ∈ B, which are detailed in Chapter IV.
2. Functions f(x) and g(x) which can derive common isometric features from SA and SB
In order to predict a union of input features, the semantic cell model requires that
there exists some common subset of features between SA and SB which can be
isometrically aligned. In the integrative multiscale cell modeling problem, both the x-ray tomography and fluorescence microscopy modalities capture images of subcellular structure, and the commonality can be established explicitly by enforcing a unified labeling scheme for features in SA and SB: for example, pixel labels 1 and 2 are assigned to the cell membrane and nucleus, respectively, in both the x-ray tomography and the fluorescence microscopy images. In order for these features to be utilized, however, there is an additional step of making these features isometric between the two modalities using the transformation functions f(x) and g(x).
The functions defined to achieve S_common need not be continuous and are highly contingent upon the native perspective and attributes of the modalities we seek to align. For instance, in the integrative multiscale cell case, the x-ray tomography image is captured at one voxel resolution with a single cell in the field of view, while the fluorescence microscopy image is captured at a voxel size roughly an order of magnitude larger with many cells in the field of view. Correcting for this metric discrepancy can be achieved by treating each sample as a collection of 2D slices, for instance, and applying affine transformations to rescale the smaller image. Additionally, a discrete instance segmentation operation can be performed to separate the fluorescence image into multiple cell instances, which can then be cropped and padded to achieve uniformity with the x-ray tomography inputs. This process will be further detailed in Chapter IV.
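A minimal sketch of this rescale-crop-pad step is given below, assuming 2D NumPy label maps, a precomputed instance mask, and a roughly tenfold difference in pixel spacing consistent with the voxel sizes reported in Chapter IV; the function names, output size, and scale factor are illustrative placeholders, not the implementation used in this work.

```python
import numpy as np
from scipy.ndimage import zoom

def extract_cell(label_map, instance_mask, cell_id, out_size=512, scale=10.0):
    """Crop one cell instance from a fluorescence label map, rescale it toward
    the tomography pixel spacing, and pad/crop to a fixed output size."""
    ys, xs = np.where(instance_mask == cell_id)
    crop = label_map[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    # Nearest-neighbor zoom keeps label values discrete.
    crop = zoom(crop, scale, order=0)
    out = np.zeros((out_size, out_size), dtype=crop.dtype)
    h, w = min(out_size, crop.shape[0]), min(out_size, crop.shape[1])
    out[:h, :w] = crop[:h, :w]
    return out
```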
The design choices for these modality transfer functions f(x) and g(x) as well as
the features SA and SB which they operate upon are the crux of the semantic cell
framework. In practice, we find that semantic labels provide an apt unit for aligning
features between multiple perspectives while the single cell provides a similar unit for
translating between metric spaces. While this example is highly specific, we posit that
these core assumptions can extend to a number of applications, especially in the
biomedical space, in which semantic parts can be identified as landmarks across a
number of multimodal experiments on the same subject.
The framework for the integrative multiscale cell modeling problem is illustrated
in the figure below:
Figure 8. Semantic Cell, Integrated Multiscale Cell Modeling Framework
In the next chapter, we will detail the technical application and results of the
semantic cell method to the multiscale cell modeling problem. We will discuss how
semantic maps are computed and provided as input to the integration method, as well as
the use of automated segmentation methods in completing this step. With the
computation of semantic inputs achieved, we will discuss the application of instance
segmentation and the design and deployment of transfer functions to achieve isometric
alignment based upon the single cell. Chapter IV will then finish with a description of the
integrated multiscale model training process and an analysis of its results on the structural
prediction task.
CHAPTER FOUR: APPLICATION AND ANALYSIS
OF RESULTS
This section looks at the effectiveness of the semantic cell method for predicting
structural features across imaging modalities. It includes evaluation with respect to “held
out” ground truth structures in the unimodal and multimodal context as well as examples
of purely generative results for unseen structures as described in our initial motivations in
Chapter I. The discussion begins with the formulation of a rudimentary neural network
computation scheme, first applied for the creation of semantic feature maps from raw
images for both the fluorescence and tomography modalities, which then provides a basis
for more advanced methods in utilizing heterogeneous feature sets in training, and finally
computing combined feature maps from multiple modalities. A description of the
structure of these optimization techniques follows for each step of the process. Finally,
this section draws upon these evaluations to provide an analysis of the comparative
effectiveness and merit of the semantic cell method, particularly with respect to the quantifiable impact of isometric alignment between multimodal inputs.
A. COMPUTING SEMANTIC FEATURE MAPS
Computing semantic feature maps is the first step in the semantic cell framework.
This operation must be performed to identify common features between input modalities
and can be achieved as a traditional semantic segmentation task as described in Chapter
II. In our application the two input modalities are the Soft X-ray Tomography and
Fluorescence Microscopy imaging techniques.
1. Inputs and Dataset Creation
a. Tomography
The soft X-ray tomography technique allows for the reconstruction of 3D grayscale images at a voxel size of approximately 25 nm, which is sufficient to reveal large- and small-scale ultrastructural components in vitro [White].
In our work, soft x-ray tomography was performed on 216 INS-1E cells. These
cells were split evenly among three different treatment conditions: 25mM glucose, 25mM
glucose + Ex-4, and unstimulated. Each of these subsets was also split into capture times
1 minute, 5 minutes, and 30 minutes after treatment. The process of identifying and
segmenting individual components is not trivial; manually segmenting just one tomogram
for membrane, nucleus, mitochondria, and insulin vesicles, for instance, takes experts up
to 8 hours and requires specialized software. Due to its high cost, manual annotation is
prohibitive for a large-scale study such as the semantic cell. Automating this segmentation process is therefore of high interest for our method as well as for other downstream analysis work. In order to produce training data, as well as to perform initial analysis, a group of experts manually segmented 27 of the resulting tomograms for membrane, nucleus, mitochondria, and insulin vesicles. In addition, soft x-ray tomography was performed on HEK and 1.1B4 cells, 12 of which were segmented by experts as part of a negative control group.
b. Fluorescence
The fluorescence microscopy technique allows for the reconstruction of 2D grayscale images at a pixel size of approximately 250 nm in x/y. In our work we captured ‘X’ images of the INS-1E cell line in the same treatment conditions we investigated for tomography above. Fluorescence required less manual labeling due to the ability to use fluorescent dyes. Dyes were used to highlight the membrane, nucleus, and mitochondria (but not the insulin vesicles). These dyes could not all be used at once, but by splitting the sample into groups we were able to obtain a sufficient number of image/label pairs for each feature to train a neural network.
2. Neural Network Architecture and Formatting
For our neural network semantic segmentation task, we chose a standard 2D U-net
architecture [Ronneberger] as described in Chapter II. Our choice was based on the
success of this architecture in a variety of biomedical imaging domains and particularly
its multiscale architecture which could conceivably handle large structures such as the
nucleus as well as insulin vesicles that were orders of magnitude smaller. Our 2D
network consisted of a contracting path and an expanding path. The contracting path was
5 levels deep with 2x2 pooling between each level, while the expanding path was
connected by an upsampling filter with a 2x2 kernel. All convolutions had a kernel size
of 3x3, stride=1, pad=1, followed by rectified linear units (ReLU). Each level was
composed of 2 convolutions back to back. The number of filters at the top level was 32
and doubled at each level of depth in the network. The last layer contained 1x1
convolution followed by a softmax, which gave the pixel-wise probabilities of each
segmentation label.
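A minimal PyTorch sketch of a U-Net with the stated hyperparameters (5 levels, 32 base filters doubling with depth, paired 3x3 convolutions with ReLU, 2x2 pooling and 2x2 upsampling, skip connections, and a final 1x1 convolution with softmax) might look as follows; it is an illustration of the architecture described, not the exact network used in this work.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    # Two 3x3 convolutions (stride 1, pad 1), each followed by ReLU.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, stride=1, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, stride=1, padding=1), nn.ReLU(inplace=True))

class UNet2D(nn.Module):
    """Contracting/expanding 2D U-Net with skip connections."""
    def __init__(self, in_ch=1, n_classes=5, base=32, depth=5):
        super().__init__()
        chans = [base * 2 ** i for i in range(depth)]          # 32, 64, ..., 512
        self.down = nn.ModuleList()
        c_prev = in_ch
        for c in chans:
            self.down.append(conv_block(c_prev, c))
            c_prev = c
        self.pool = nn.MaxPool2d(2)
        self.up, self.dec = nn.ModuleList(), nn.ModuleList()
        for c in reversed(chans[:-1]):
            self.up.append(nn.ConvTranspose2d(c_prev, c, kernel_size=2, stride=2))
            self.dec.append(conv_block(2 * c, c))
            c_prev = c
        self.head = nn.Conv2d(c_prev, n_classes, kernel_size=1)

    def forward(self, x):
        skips = []
        for i, block in enumerate(self.down):
            x = block(x)
            if i < len(self.down) - 1:
                skips.append(x)
                x = self.pool(x)
        for up, dec, skip in zip(self.up, self.dec, reversed(skips)):
            x = dec(torch.cat([up(x), skip], dim=1))
        return torch.softmax(self.head(x), dim=1)   # pixel-wise class probabilities

probs = UNet2D()(torch.randn(1, 1, 512, 512))       # -> (1, n_classes, 512, 512)
```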
We chose to use a 2D U-net as opposed to a 3D U-net for multiple reasons. First,
using a 3D network would have required us to downsample or chunk our 3D volumes
into smaller sub-blocks to fit GPU memory. We felt downsampling would discard
necessary detail and compromise segmentation performance for small structures while
chunking would deprive the model of larger scale context for prediction. Additionally,
using a 2D network effectively allowed us to multiply our tomography training set size
by 512 as each slice could be considered a separate instance. Furthermore, using the same
2D network architecture for 3D tomography and 2D microscopy allowed us to combine
our work and later present opportunities to repurpose the network to easily accept both
inputs together.
The original image and manual segmentation for each annotated tomogram were
resized to a standard 512x512x512 voxels using linear and nearest-neighbor
interpolation, respectively.
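A minimal sketch of this resizing step, assuming SciPy is available: interpolation order 1 corresponds to linear interpolation for the image volume and order 0 to nearest-neighbor interpolation for the label volume.

```python
import numpy as np
from scipy.ndimage import zoom

def resize_volume(volume, labels, target=(512, 512, 512)):
    """Resize an image volume with linear interpolation (order=1) and its
    label volume with nearest-neighbor interpolation (order=0)."""
    factors = [t / s for t, s in zip(target, volume.shape)]
    return zoom(volume, factors, order=1), zoom(labels, factors, order=0)
```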
3. Training Methods for Heterogeneous Label Sets
Segmentation by neural networks relies on the availability of a large amount of
fully annotated training data. As a rule of thumb, this requirement is often set at 10,000
instances per class to prevent overfitting and produce a generalizable result. From the 27
fully annotated INS-1E tomograms, we randomly selected 12 for training, 5 for validation, and 10 for testing. As stated above, using a 2D network architecture effectively increased our training set from 12 3D instances to 6144 2D instances. However, as this was still on the low end, we were interested in investigating neural network segmentation training that includes incompletely annotated data from our control group, in order to increase the size of our training set and improve overall annotation performance.
In the control group, twelve 1.1B4 human beta cells and human embryonic kidney
cells (HEK) had been annotated with only the membrane and nucleus label. Therefore,
including these structures in training with the INS-1E cells was problematic not only
because they contained no ground truth for mitochondria and insulin segmentation
optimization, but also because their "membrane" label has a different semantic meaning, since unannotated mitochondria were included in it. Combining these two datasets in a
classical learning scheme would therefore pollute the learning signal and undermine
performance for those labels which were sometimes missing and those which had
inconsistent meanings. Rather than discard incompletely labeled data, we proposed to
address this problem with a method that can learn effectively with such incomplete
annotations.
Our method was based on the observation that partially annotated data can be derived from fully annotated data by merging labels. In our application in particular,
partially labeled data for 1.1B4 and HEK cells can be derived from fully annotated data
by merging insulin vesicle and mitochondria labels with the membrane label. Following
this derivation, any fully annotated data output by our network could then be compared
with partially labeled manual annotation in training. The mathematical formulation is as
follows:
Let $L$ be a complete label set. Without loss of generality, we assume that each voxel in an image is assigned a single label. Let $I_F = \{(I_i, S_i)\}_{i=1}^{n}$ be $n$ fully labeled training images, where $I_i$ and $S_i$ are the image and ground truth segmentation, respectively. Let $I_P = \{(I_j, S_j^P)\}_{j=1}^{m}$ be $m$ partially labeled training images, with label set $L_P$ and $|L_P| < |L|$.

Let $T_P: L \rightarrow L_P$ be a mapping (label merging) function that maps each label from the full label set $L$ to one label in $L_P$. Note that a label in $L_P$ may have a different semantic meaning, i.e. a union of multiple labels, from the same label in $L$. In our application, $T_P$ is an identity function except that $T_P(\text{mitochondria}) = \text{membrane}$ and $T_P(\text{insulin vesicle}) = \text{membrane}$. Let $\theta$ be the parameters of a convolutional neural network, and let $M_\theta(I)$ be the result produced by the model for image $I$.
We proposed the following objective function for combining fully and partially
labeled data in training:

$$\mathcal{L}(\theta) \;=\; \sum_{i=1}^{n} L_F\big(M_\theta(I_i),\, S_i\big) \;+\; \sum_{j=1}^{m} L_P\big(M_\theta(I_j),\, S_j^P\big)$$

where $L_F$ is the standard loss term applied to fully labeled training data and $L_P$ is the
term applied to partially labeled data.
For training data with full annotation, we apply the cross-entropy function:

$$L_F\big(M_\theta(I),\, S\big) \;=\; \sum_{l \in L} \; \sum_{x \in r(S,\, l)} -\log p_l(x, \theta)$$

where $x$ indexes image voxels, $p_l$ is the model-predicted probability for label $l$ from the
full label set $L$, and $r(S, l)$ is the region assigned to $l$ in ground truth $S$.
For partially labeled training data, we apply a modified cross-entropy function by
transforming model predictions for label set $L$ into model predictions for label set $L_P$:

$$L_P\big(M_\theta(I),\, S^P\big) \;=\; \sum_{l \in L_P} \; \sum_{x \in r(S^P,\, l)} -\log p_l^P(x, \theta)$$
where $p_l^P$ is the model-predicted probability for label $l$ in label set $L_P$, which is derived
from the model-predicted probabilities for $L$ by:

$$p_l^P(x, \theta) \;=\; \sum_{l' :\, T_P(l') = l} p_{l'}(x, \theta)$$
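A minimal PyTorch sketch of this partial-label loss is given below, assuming the network's softmax output covers the full label set $L$ and that $T_P$ is supplied as a dictionary; the label indices and function names are illustrative rather than the exact training code.

```python
import torch
import torch.nn.functional as F

# Illustrative full-label indices: 0 background, 1 membrane, 2 nucleus,
# 3 mitochondria, 4 insulin vesicle. For the 1.1B4/HEK data, T_P folds the
# mitochondria and insulin vesicle labels into the membrane label.
T_P = {0: 0, 1: 1, 2: 2, 3: 1, 4: 1}

def merge_probabilities(probs, t_p):
    """Collapse full-label probabilities (B, |L|, H, W) onto the partial label
    set L_P by summing every channel that T_P maps to the same partial label."""
    n_partial = max(t_p.values()) + 1
    merged = probs.new_zeros(probs.shape[0], n_partial, *probs.shape[2:])
    for full_label, partial_label in t_p.items():
        merged[:, partial_label] += probs[:, full_label]
    return merged

def partial_cross_entropy(probs, partial_target, t_p, eps=1e-7):
    """Cross entropy of merged predictions against a partially labeled ground
    truth map (B, H, W) whose values come from the partial label set L_P."""
    merged = merge_probabilities(probs, t_p)
    return F.nll_loss(torch.log(merged.clamp_min(eps)), partial_target)
```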
Note that learning with incomplete annotation is different from learning with
sparse annotation [Cicek], where only part of the training data is fully annotated and the
rest is completely unlabeled, and from learning with ambiguous annotation [Cour], where a set
of candidate labels is given for each instance but only one label is correct. Learning with
incomplete annotation addresses the distinct situation in which annotations contain only a
subset of the labels of interest, with coarser labels subsuming the missing, more detailed
labels of greater semantic depth.
It should be noted that while this inconsistency in our manually segmented data
was an artifact of our particular project, diverse data sets from similar imaging conditions
are common in general, and the ability to leverage heterogeneous data in neural network
training presents many opportunities, especially when labeled data is scarce (see
Discussion in Section 4).
Using this training method for heterogeneous label sets, all 12 partially annotated
1.1B4/HEK cells were included in model training as well. The method was also
employed for the slightly simpler case of the fluorescence modality, in which each
sample had been stained for only one of the three target labels.
4. Auto-segmentation Results
a. Tomography
In experimental validation, combining the 12 partially annotated tomograms with the 12
fully annotated tomograms for CNN training produced substantial improvement for every
anatomical structure over using the fully annotated data alone. This was a success both for the
heterogeneous training method in general and for our specific aim of accurately
automating the generation of semantic feature maps.
For comparison, we tested the heterogeneous training method against a baseline
method which was trained using only fully annotated training data. In both cases, data
augmentation was implemented by applying a randomly generated affine transform, with
(-180, 180) rotation and (-20%, 20%) scale variation, to the training data on-the-fly. Both
methods were run for 200 epochs. For quantitative evaluation, the Dice similarity
coefficient (DSC) [Dice] was used to measure agreement between automatic and
manual segmentation.
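For reference, a small sketch of the Dice coefficient and of the on-the-fly affine augmentation described above is shown below; it assumes 2D NumPy slices and omits the crop or pad back to the network input size for brevity.

```python
import numpy as np
from scipy.ndimage import rotate, zoom

def dice_coefficient(pred, truth, label):
    """Dice similarity coefficient for one label between two label maps."""
    p, t = (pred == label), (truth == label)
    denom = p.sum() + t.sum()
    return 2.0 * np.logical_and(p, t).sum() / denom if denom else 1.0

def random_affine(image, labels, rng=np.random):
    """Random rotation in (-180, 180) degrees and (-20%, +20%) isotropic scale,
    applied identically to the image slice and its label map on the fly."""
    angle = rng.uniform(-180, 180)
    scale = rng.uniform(0.8, 1.2)
    image = rotate(image, angle, reshape=False, order=1)
    labels = rotate(labels, angle, reshape=False, order=0)
    # zoom changes the array size; a crop/pad back to 512x512 is omitted here.
    return zoom(image, scale, order=1), zoom(labels, scale, order=0)
```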
The figure below shows example segmentation results produced for testing data
for each segmentation method. The table below summarizes the segmentation
performance on validation and testing data, respectively. As expected, including partially
annotated training data for training substantially improved segmentation performance for
membrane and nucleus, which are labeled in both fully and partially annotated data.
Interestingly, improvement was also observed for insulin vesicles and mitochondria,
which were not labeled in the partially labeled training set. One potential explanation for
this result is that including partially labeled data for training simply increased the size of
our training set, which is critical for learning a more robust overall representation for
better segmentation. The improvement for these labels also suggests, however, that a
more robust knowledge of the distribution of one set of structures in the cell can inform
the inference of others. Specifically, both the insulin vesicles and mitochondria are
located within the membrane and if a segmentation method is better able to discern
membrane regions from other structures, it has a better chance of correctly segmenting
insulin vesicles and mitochondria as well.
Overall, the segmentation performance was improved from 0.639 DSC to 0.706
DSC, a 10.5% gain over the baseline performance.
Figure 9. X-ray Tomography Auto-segmentation Results
b. Fluorescence
The fluorescence auto-segmentation network similarly leveraged a mix of partially
labeled images to produce a combined label map. The inputs were slightly different in
that the raw image input was itself a 3-channel image of NADH, brightfield, and
fluorescence lifetime signals. Also, training slices were of dimension 256x256 at
150 nm² resolution, meaning one fewer layer of U-net convolutions was required compared
to the 512x512 inputs.
Figure 10. Fluorescence Microscopy Input Image Examples
A total of 936 slice instances were provided for training, 408 for validation, and 408 for
testing. In each partition, one third of the slices contained each of the three labels. Using
the method outlined in the section above, the network was trained by predicting all three
labels together, but only optimizing against the label which existed in the ground truth.
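One plausible way to realize this per-slice optimization is sketched below, assuming the network emits one logit channel per structure and each training slice carries the index of its single annotated channel; this is an illustration of the idea, not the exact loss used.

```python
import torch
import torch.nn.functional as F

def single_label_loss(logits, target_mask, label_index):
    """Optimize only the channel whose ground truth exists for each slice.

    logits:       (B, 3, H, W) raw network outputs, one channel per structure
                  (membrane, nucleus, mitochondria).
    target_mask:  (B, H, W) binary ground truth for the annotated structure.
    label_index:  (B,) long tensor saying which channel is annotated per slice.
    """
    batch = torch.arange(logits.shape[0], device=logits.device)
    selected = logits[batch, label_index]          # (B, H, W)
    return F.binary_cross_entropy_with_logits(selected, target_mask.float())
```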
Figure 11. Fluorescence Microscopy Auto-segmentation Results
Ultimately, network performance averaged 83% DSC across the three labels with
results shown in the table below:
              Membrane   Nucleus   Mitochondria   Average
Testing set    0.898      0.870       0.724        0.830
Table 1. Fluorescence Microscopy Auto-segmentation Results
B. ISOMETRIC ALIGNMENT OF SEMANTIC FEATURE MAPS
The next requirement for the semantic cell method is to achieve isometric
alignment between the feature maps of the two modalities. In our case, the biggest
complication toward this end was that the fluorescence imaging method captures plate
wells with 8–10 cells in each image, while the tomography imaging method captures
single cells. Accordingly, we had to develop methods to automate the segmentation of
single cell instances from the multi-cell fluorescence feature maps.
1. Single Cell Instance Segmentation
Instance segmentation differs from semantic segmentation in that instead of
labeling all pixels of a class uniformly, pixel groups must be separated based on some
concept of distinct class objects or “instances” (see figure below).
Figure 12. Instance segmentation illustration
In our work we adapted a method to perform distance map computation based on the
boundary-aware instance segmentation approach of Hayder, He, and Salzmann
(“Boundary-aware instance segmentation,” CVPR 2017). Using this method, a U-net-type
architecture can be optimized to predict not whether a pixel is a member of the cell
membrane class, but how far each pixel is from the background class.
Figure 13. Fluorescence Microscopy Instance Segmentation Example
From this method it is then possible to threshold the predicted distance map at a
chosen value and, in most cases, achieve fairly clean separation between cell membrane
labels. We can then produce a 256x256xN binary image, where N is the number of cells or
islands in the image.
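A compact sketch of this thresholding and island-separation step is shown below, using scipy.ndimage.label for connected components; the threshold value and function name are illustrative.

```python
import numpy as np
from scipy.ndimage import label as connected_components

def instances_from_distance_map(distance_map, threshold=2.0):
    """Split a predicted distance-to-background map into cell instances.

    Pixels farther than `threshold` from the background are kept and grouped
    into connected "islands". Returns an (H, W, N) binary stack with one
    channel per detected cell.
    """
    foreground = distance_map > threshold
    labeled, n_cells = connected_components(foreground)
    if n_cells == 0:
        return np.zeros(distance_map.shape + (0,), dtype=bool)
    return np.stack([labeled == i for i in range(1, n_cells + 1)], axis=-1)
```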
2. Isometric Transformation Function Design and Deployment
After computing the instance segmentation map, the remaining aspect of the
transform function is seemingly straightforward. For each cell instance value, we apply
the instance segmentation as a mask on the feature map and crop a square area with a
small border around the non-zero pixels. We then resize the image to 512x512 to achieve
uniform scale.
Lastly, for our case, the tomography segmentation values were adjusted so that the
mitochondria took the third label position in order to align with the fluorescence values
such that the shared semantic features of membrane, nucleus, and mitochondria
corresponded to channels 1, 2, and 3 in each modality. This final transformation of the
tomography is computationally inexpensive and was therefore performed dynamically at
runtime, whereas the instance segmentation was bulk-processed in advance.
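The crop-and-resize portion of the transform and the tomography label reordering might look roughly like the sketch below; the border width, label codes, and function names are assumptions for illustration, not the production scripts.

```python
import numpy as np
from scipy.ndimage import zoom

def isometric_crop(feature_map, instance_mask, border=8, out_size=512):
    """Mask one cell instance out of a multi-cell feature map, crop a square
    region with a small border around its non-zero pixels, and resize the crop
    to a uniform out_size x out_size grid (nearest-neighbor preserves labels).
    Output may need a one-pixel pad due to rounding inside zoom."""
    masked = np.where(instance_mask, feature_map, 0)
    ys, xs = np.nonzero(instance_mask)
    y0, y1 = max(ys.min() - border, 0), min(ys.max() + border, masked.shape[0])
    x0, x1 = max(xs.min() - border, 0), min(xs.max() + border, masked.shape[1])
    side = max(y1 - y0, x1 - x0)
    square = np.zeros((side, side), dtype=masked.dtype)
    square[: y1 - y0, : x1 - x0] = masked[y0:y1, x0:x1]
    return zoom(square, out_size / side, order=0)

# Assumed tomography label codes, remapped so that membrane, nucleus, and
# mitochondria occupy values 1, 2, 3, matching the fluorescence channels.
TOMO_TO_SHARED = {1: 1, 2: 2, 4: 3, 3: 4}

def align_tomography_labels(seg):
    aligned = np.zeros_like(seg)
    for src, dst in TOMO_TO_SHARED.items():
        aligned[seg == src] = dst
    return aligned
```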
C. INTEGRATED MODELING RESULTS
1. Model Architecture
The final step in the semantic cell method requires a neural network architecture
where feature map outputs can be evaluated against ground truth feature map inputs from
multiple modalities. In Section B we went to lengths to design functions that align the
input feature maps into an identical and interchangeable format. This in effect makes the
downstream optimization no different from the method we developed for training with
heterogeneous labels within a single modality in Section A. The only difference is the
application of the isometric transform functions in the training data generator, after which
each aligned instance can effectively be treated as carrying a partial label set determined
by its original modality. Accordingly, we again
use a U-net architecture as a generic 2D encoder/decoder network.
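For concreteness, a much-reduced sketch of such a generic 2D encoder/decoder with skip connections is shown below; the depth and channel widths are illustrative and smaller than the networks actually trained.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
    )

class SmallUNet(nn.Module):
    """Toy 2D U-net-style encoder/decoder producing per-pixel probabilities."""

    def __init__(self, in_channels=3, n_labels=5, base=32):
        super().__init__()
        self.enc1 = conv_block(in_channels, base)
        self.enc2 = conv_block(base, base * 2)
        self.bottleneck = conv_block(base * 2, base * 4)
        self.pool = nn.MaxPool2d(2)
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, 2, stride=2)
        self.dec2 = conv_block(base * 4, base * 2)
        self.up1 = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = conv_block(base * 2, base)
        self.head = nn.Conv2d(base, n_labels, 1)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))   # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection
        return torch.softmax(self.head(d1), dim=1)
```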
2. Experimental Design
The experimental design was challenging by nature as the intended effect of the
semantic cell method is to generate statistically reasonable predictions of the localization
of unseen structures for which no ground truth exists in the native modality. In generative
modeling work, this statistical reasonableness is typically quantified by metrics such as
the inception score, in which generated examples are evaluated for their quality with
respect to the original training distribution and their diversity with respect to each other. In
related work on structural auto-encoding of cell images, however, it is more
common to use the Dice metric, in which the ground truth is held out and evaluated
against the structural localization prediction on a pixel-wise basis. Again, we are by
nature lacking this ground truth data for unseen structures across modalities, for instance
for the insulin granules in fluorescence microscopy. However, we surmised that we could
treat the mitochondria, shared between each modality, as if it were an unseen structure
and therefore evaluate how effective the network was at predicting this unseen structure
in the multimodal context.
In order to fairly measure the performance of integrated multimodal prediction, we
had to first evaluate how well the mitochondrial and insulin vesicle structures could be
predicted from base reference structures in their native modality. At that point we could
then measure how integrating this information from a multimodal source compared to
performing the inference within a single modality as a baseline. Each experiment was run
for 200 epochs, often over the course of a week. The results are presented in the
following subsection.
3. Structural Prediction Results
Our first baseline experiment was to determine how well the mitochondria feature
and insulin feature could be predicted from the membrane and nucleus features in the
unimodal context. The results in the x-ray tomography modality are given below.
mem-T    nuc-T    mito-T    granule-T
given    given    0.098     -
given    given    -         0.02
given    given    given     0.0868
given    given    0.094     0.018
Table 2. X-ray Tomography Baseline Results
In these experiments it was noteworthy that although having the true mitochondria
label did improve the predictive accuracy of the insulin label by 4.5x, predicting the two
labels jointly, even in the unimodal case, was slightly less accurate than predicting each
alone. This suggested early on that predicting many structures together would not
necessarily be beneficial, particularly when statistical correlations between base
structures and target structures were weak. Nevertheless, these values were orders of
magnitude better than random and set the basis for what to expect in multimodal
prediction.
Our second baseline experiment performed the analogous prediction in the
fluorescence microscopy modality; the results are given below:
mem-F    nuc-F    mito-F
given    given    0.55
Table 3. Fluorescence Microscopy Baseline Results
Next, we evaluated the combined integrative model where mitochondria and
insulin were predicted from membrane and nucleus input for both modalities, but
optimized only against the mitochondria and insulin vesicle ground truth in the
fluorescence and tomography modalities, respectively. The results for the transfer of
structures in the tomography evaluation were as follows:
mem-T    mem-F    nuc-T    nuc-F    mito-F    granule-T    mito-T
given    -        given    -        -         0.02         -
given    -        given    -        -         -            0.098
-        given    -        given    0.55      -            -
given    given    given    given    0.602     0.013        0.122 (holdout)
Table 4. Multimodal Integration Results
Figure 14. Multimodal Integration Results (panels: input, granule baseline, mito baseline, joint prediction, ground truth)
In this case, the predictive accuracy of the insulin in its native modality held
mostly steady, with the same slight dropoff in performance witnessed in the joint
optimization in Table 2. However, the prediction of mitochondria in tomography as
transferred from the fluorescence microscopy modality (0.033) was 66% worse than its
performance when optimized in its native modality (0.098). The insulin vesicle prediction
accuracy within the fluorescence context by nature could not be evaluated.
Analyzing this information, we suspected that some aspect of the isometric
relationship was not being satisfied, given how much worse the accuracy of the mitochondria
feature prediction was. Perhaps the distribution of mitochondria with respect to the
membrane and nucleus in fluorescence was skewed relative to its distribution in
tomography, which could be possible due to biological factors. More likely, we
hypothesized that the shared membrane and nucleus features were not entirely
isomorphic, due to segmentation artifacts in the tomography as well as a vertical shear in
the morphology of the tomogram cells resulting from plunge freezing. As a result,
we re-ran the experiments with the x-ray tomography data manually segmented
and sampled orthogonally to the shear direction to produce images in which no shear was
apparent. Ground truth for mitochondrial prediction was re-evaluated on this new x-ray
tomography set and the results are shown below.
mem-T    mem-F    nuc-T    nuc-F    mito-F    granule-T    mito-T
given    -        given    -        -         0.02         -
given    -        given    -        -         -            0.098
-        given    -        given    0.55      -            -
given    given    given    given    0.592     0.019        0.132 (holdout)
Table 5. Resampled Multimodal Integration Results
Figure 15. Resampled Multimodal Integration Results (panels: input, granule baseline, mito baseline, joint prediction, ground truth)
From these results, note that in the unimodal context insulin vesicle predictions
were 50% worse and mitochondria predictions were 30% better. The reasons for this were
not investigated further but are likely due to the improved segmentation accuracy of the
mitochondria in the manually labeled data and to the different treatment conditions, which
made the insulin less predictable (particularly the absence of the 5-minute time point and
drug treatment cases, in which the insulin is empirically easier to predict in our
experience). More significantly for our investigation, however, after taking care to
provide semantic feature maps that were more isometrically aligned, the prediction of
mitochondria as transferred from the fluorescence context was within 10% of its
unimodal predictive accuracy. These results validated the significance of the isometric
condition in performing integrative modeling and also suggested that by using the
semantic cell method, target structures can be predicted across modalities from shared
features roughly as well as they could be predicted within their own modality.
D. SUMMARY
In this section we described the application of the semantic cell method in each of
its three stages (semantic feature map creation, isometric alignment, and integrated
optimization). We showed how neural networks could be used to help perform each of
these tasks as a general purpose encoder/decoder. Ultimately, we found that while the
prediction of our target features from shared features was a hard problem in the x-ray
tomography context, the distribution of mitochondria relative to the membrane and
nucleus could be learned from the fluorescence context and applied to the tomography
context with as good performance as if the distribution had been learned from within the
tomography context. This suggests that the method is viable for multimodal structural
prediction to the extent that structural prediction of target structures from reference
structures is viable itself.
Furthermore, we are inspired by the secondary role of the semantic cell method as
a direct means of demonstrating or validating isomorphism between multimodal
experiments that were coordinated with the intent of integration. Bringing these parts
together, particularly with neural networks, requires significant attention to detail to
ensure that formatting and quality standards are met and that data is well organized. As
such, in the next chapter we describe methods that we developed for database and
workflow organization over the course of this dissertation to enforce these
standards and allow for further exploration in multimodal integration through this method
and others.
CHAPTER FIVE: WORKFLOW IMPLEMENTATION
The semantic cell method contains multiple processes. Raw input data is fed
through a series of three neural networks and multiple formatting scripts. The semantic
cell method also deals with complex data inputs coming from two different imaging
modalities which must be organized so they and their intermediary results can be ingested
into these various processes. Processes in turn must be configured to recognize data
inputs and in some cases perform varying optimization functions based on alternating
modality samples during runtime.
We initially implemented the semantic cell method described in Chapter IV on a
custom multi-GPU lab server. Processes were configured manually and performed
asynchronously. Intermediate results and their improvements were archived and named in
folders with evolving conventions along with many backups and lab notes. In this
implementation, the method could hardly be reproduced, reused, or improved with new
data inputs. In this sense, the implementation, while fine for iterative method
development, left the system isolated, many layers removed from the raw data inputs it
seeks to bring together. As such, we were interested in developing a more automated and
interoperable workflow from which our method development and other integrated
modeling development in the future might stand to benefit, particularly with respect to
practical experimental configuration time, validation costs, and fundamental scientific
reproducibility.
This chapter describes a findable, accessible, interoperable, and reusable workflow
framework for deep learning processes and its application for the semantic cell method.
A. INTRODUCTION
Reproducibility – the ability to repeat the computational steps used to obtain a
research result – is central to the scientific method, both as a way to verify results and as
a way to disseminate knowledge and techniques throughout the community. The steps to
reproduce a research result can be complicated. Published manuscripts contain
explanatory prose in addition to tables, diagrams, equations, and supplementary URLs to
database and code repositories in order to provide the information needed. The burden
on publishing researchers to represent this information, as well as on outside
researchers to assemble it, is significant: it has been found that more than 70 percent of
researchers have tried and failed to reproduce other researchers’ experiments [Baker].
In recent years, reduced costs and increasing performance of computational
processing has led to the development of increasingly complex computational techniques.
Deep learning, in particular, has come to dominate the field of scientific computing due to
its ability to model complex data distributions and make accurate predictions across a
variety of domains [b19]. In order to arrive at these results, deep learning methods require
graphics cards with gigabytes of RAM, neural network models with millions of parameters,
and terabyte-scale datasets containing 10,000-plus example instances to produce a result. Each
of these aspects: the high-performance computing environment, model complexity, and
high-volume of structured data, make deep learning techniques more challenging to
reproduce and implement than techniques which have come before.
Deep learning reproducibility has been recognized as an area of need and some of
the above aspects of the problem have received significant attention. Neural network
parameters such as model architecture, training hyperparameters, and loss functions are
commonly reported and shared in code as primary subjects of investigation. Also,
computational environments are often containerized to ensure stable execution. Similarly,
training datasets are often fixed to rudimentary sample sets with benchmarks and
attributes known throughout the community. Still, the simplified treatment of input data
as a fixed parameter systematically disregards the fact that input data is the principal and
most highly variable input to the entire deep learning process in application. As a result,
deep learning techniques are not just difficult to validate but also often fail to generalize
for new datasets, new domains, and new classes of problems. Better systems which
catalog deep learning models, computational environments, and datasets, ease the process
of execution, and treat data as a dynamic input and output are needed. Accordingly, we
propose the following requirements for reproducible deep learning systems:
● Catalog deep learning model parameters, computational environments, and data
inputs
● Combine and execute the cataloged elements above for automatic validation
● Interoperably execute across datasets and catalog data outputs as experimental
results
While methods for cataloging model parameters and reproducing computational
environments are relatively developed for deep learning techniques, methods for
cataloging inputs and output data as a dynamic experimental variable in the process are
notably missing. There is a strong body of work focusing on managing data results for
reproducibility in other disciplines. FAIR practices, in particular, are widely adopted to
help make published data findable, accessible, interoperable, and reusable across
applications throughout the research community.
Due to the primacy of structured data in both deep learning method development
and execution, we propose that integrating deep learning techniques with FAIR data
management systems and standards would not only ease the burden of reproducibility, but
also make deep learning techniques more easily interoperable across FAIR datasets in
practice. In order to realize this objective, we introduce the concept of FAIR workflows
which execute FAIR deep learning processes on FAIR input data and upload the output
results back as new FAIR data entities. These shared standards in cataloging ultimately
allow for an end-to-end system which can be automatically executed with dynamic
data inputs and outputs as specified in our requirements.
Ultimately, putting data management front and center and working from robust,
live data catalogs advances the positioning of deep learning models as practical,
verifiable tools that can be used to process research data and not simply ends in
themselves. More concretely, the FAIR workflow framework speeds up the validation and
execution of deep learning models, which allows for a more rapid distribution and
utilization of the computational technique.
In the following section, we will review related work in deep learning
reproducibility and FAIR principles which we later leverage in the FAIR workflow
approach described in Section III. In Section IV, we will describe the specifics of our
FAIR workflow implementation in software and its application with respect to a deep
learning pipeline example in Section V. Finally, we will end with a discussion and
conclusion in Section VI.
B. RELATED WORK
1. Reproducibility in Deep Learning
Deep learning experiments are complex processes and reproducing them involves
many parameters and sub-methods that are tracked and recapitulated using a variety of
methods. These include:
● 10,000+ instances of raw input data
● Preprocessing and augmentation functions performed on the input data
● Training, validation, and testing data partitions
● Neural network model architectures with specific numbers of layers, convolutions,
and loss functions
● Training hyperparameters (learning rate, epoch iterations, random seeds, etc.)
● Hardware (GPU/CPU, Nvidia/AMD models)
● Training results and metrics
● Optimized model weights
● Deployment inputs
● Deployment outputs
These parameters can be split largely into the aforementioned categories of model
parameters, computational environment parameters, and data parameters, each of which
has been addressed in its own way with respect to reproducibility in prior work.
2. Model Parameters
Out of the three categories, the reproducibility of model parameters typically gets
the most attention, and most deep learning research concerns the study of alterations to
neural network architectures and loss functions to see which model values lead to the best
performance. This research is carried out by freezing all other variables to default values
and benchmarking experiments with adjusted architectures and loss functions against
baseline using common metrics such as precision, accuracy, recall, or the Dice
coefficient [Jean-Paul]. It is common to share architectures using block
diagrams and prose in publications, and many of the biggest inflection points in deep
learning research have come from neural network architecture improvements, such as the
transformer, which was shown to deliver higher performance than other
architectures operating under the same conditions.
In order to reproduce these results, some research includes links to code
repositories on source control platforms such as GitHub showing the implementation.
Otherwise, readers are left to attempt the implementation themselves using block
diagrams and model parameters described in prose as a guide. Many such code
implementations are shared by researchers other than the author on GitHub with varying
fidelity to the original method.
Even when the implementation is shared by the original author, it is not always
guaranteed that running the program will lead to the reported metric results due to
inconsistencies between the "clean" code that is shared and the code that was used during
research. Additionally, most of these repositories do not share the trained network
weights, in part due to their large file size that is not supported on source control, or due
to their desire to retain these weights as proprietary knowledge. Recently, the weights for
Facebook's published LLaMA language model were shared broadly only as a result of an
unauthorized leak, for instance [Vice]. As a result, even if model architectures, loss functions,
hyperparameters, and even the code implementation are shared, researchers still do not
have the information they need to validate a result without first recapitulating the
computational environment and data for training.
3. Computational Parameters
Deep learning techniques, although conceived in the mid-1900s, were only made
practical by the use of graphics processing units in 2012 [Krizhevsky]. Accordingly, the
proper configuration and utilization of these graphics processing units (GPUs) is critical
to producing or utilizing the technique. Today, a few deep learning code
libraries, most notably TensorFlow and PyTorch, allow the same deep learning code to run
interoperably across different GPUs and even on standard central processing units, albeit
more slowly [Abadi].
Still, there is often significant overhead to configure these hardware units,
specifically finding the right recipe of hardware driver, code library, and even Python
version in order to execute a method on one's own hardware. This is a process
complicated enough to require interactive tables with hundreds of rows on both
TensorFlow's and Nvidia's (a GPU manufacturer) websites [Tensorflow]. In some cases with
old hardware and new code, or new hardware and old code, finding the right
configuration to execute a script is simply not possible and requires an update or
downgrade to one or many of these components.
As the field has matured, these compatibility issues have noticeably improved and
there has also been an uptake in the use of a system called Docker [Merkel] in order to
reproduce the deep learning computing environment as a fixed variable. Docker presents
the notion of a container, which is similar to a virtual machine but without a separate
operating system. This allows lightweight applications to run in a fixed and transportable
computational environment. Still, even Docker requires a researcher to own a
GPU, or a server with a GPU, and to set up the container manually, which is not always
straightforward.
There are some solutions which provide server-hosted deep learning environments
as a service. Google Colab is one of the most accessible; it comes with preinstalled
deep learning libraries and allows users to write and execute code collaboratively in an
interface quite similar to the Jupyter notebook interface known throughout the
community [Carneiro]. Colab provides a free GPU which can be operated for a few hours
and is a paid service beyond that point. It should also be noted that the costs to reproduce
training and arrive at weights can now reach up to 10 million dollars for state of the art
foundation models which poses logistical challenges to reproducibility [Jonathanvanian].
These costs, however, decrease with the creation of more advanced and lightweight
methods. Many deep learning GitHub repositories now provide links to Colab
implementations which go much of the way to enabling a reproducible model and
computational environment linkage. Reproducibility and interoperability with respect to
data, however, remains an afterthought as all data must be pulled from the server or
mounted on Google Drive in the proper format for ingestion.
4. Data Parameters
Although deep learning is a "data-driven" technique, data is most often treated as a
fixed variable in the deep learning research community. Best practices commonly state to
use standardized datasets in the same way they advocate for using containerized
computing environments [Coakley]. Fixed datasets are indeed advantageous for studying
changes in model parameters as it allows researchers to evaluate alterations to
architectures and benchmark performance metrics with respect to the same training
datasets whose attributes are widely known. However, while this is a desirable situation
for validation and reproducibility of results created with those particular datasets, it
creates a deficit with respect to interoperability of models on new datasets which is
critical to the utility of the technique.
The rather myopic focus on a few datasets such as MNIST, ImageNet, COCO, and
so on is largely the product of how difficult it is to create good training sets. There are
cautionary tales of networks that have learned to predict sunny and cloudy days rather
than the presence of objects of interest due to artifacts of weather in training data
collection [Wilson], for instance, as well as medical networks that have learned to read
imposed metadata in corners of images as opposed to pathology within the anatomy itself
[Jansen]. In order to be valid for use in deep learning, training data sets must satisfy a
number of qualities including having sufficient size, high-quality class annotations,
balanced classes, intra-class diversity, and standardized/pre-processed format. Collecting
a training dataset is something that can take a deep learning researcher years and
accordingly the creation and management of training data is generally seen as an
auxiliary effort to be undertaken by specialized outside entities such as NIST, for instance
[Deng]. Nonetheless, the dearth of datasets with which to run experiments has been cited
as an area of need in the deep learning community, and is often included as a suggested
contribution in calls for papers at deep learning conferences [paperswithcode].
In practice, the results of this focus on standardized datasets can be seen in deep learning
libraries which include built-in functions such as data.loadmnist() in order to mount
common standard datasets for training. For applied deployment or training beyond these
standard datasets, some of the more widely used models such as YOLO [Redmon] and
MONAI [Cardoso] do have modules which aim to automatically convert datasets into
proper input format for interoperability of methods for training and deployment.
However, these methods are highly manual, operating one by one, coupled to those
specific model frameworks, and do not seek to leverage existing datasets in bulk using
reproducible frameworks.
In the biomedical community there has been some effort to publish deep learning
research datasets such as the Head and Neck Dataset from the Cancer Imaging Archive
[Clark]. There has been work such as the BioImaging Model Zoo [Ouyang] which aims
to make existing pre-trained models for various more specialized biomedical applications
findable for reuse. In general, however, there are not easy methods to mount and switch
between these datasets during deep learning development and deployment. Nor are the
outputs of deep learning execution cataloged as experimental results with references to
the processes used to generate them.
Despite these deficiencies in existing frameworks, deep learning is an area that
uniquely stands to benefit from leveraging better data reproducibility practices due to its
direct reliance on structured input data and its ability, at least in theory, to generalize
across applications and domains [Alberti].
5. FAIR Data Management
There have been many efforts and initiatives to address the various aspects of data
management, sharing, and reproducibility in the scientific community. These include the
Open Data movement [Huston], data citation standards such as DOI [Chandrakar],
centralized data repositories and archives such as the PDB [Bank], metadata ontologies
such as DICOM and MeSH [Mustra], and mandates for data management plans across
government-funded research [Kozlov]. Common to all these efforts is the understanding
that better efforts are needed to maximize the value and impact of research results across
the scientific community and deep learning stands to benefit from its integration with
these efforts in data management, in particular.
“FAIR” data is a more recent development (2016) recognizing that data sources
are growing rapidly in volume and complexity in a way that outstrips our current ability
to leverage them. Accordingly, FAIR presents a set of unifying principles to improve data
discoverability and specifically focuses on four tenets: findability, accessibility,
interoperability, and reusability. Adhering to these principles involves the use of many
techniques such as assigning persistent and unique identifiers to datasets [Wilkinson],
using open access databases to store datasets [Mons], and structuring data using common
schemas [Sansone].
Together, adopting these tenets and practices can provide rigorous and complete
documentation of the data inputs used to achieve a result. However, these standards alone
can be challenging to implement and do not necessarily ensure that someone with access
to the data will be able to execute them effectively in combination with methods.
Furthermore, the fact that FAIR principles are meant to apply only to published data
isolates the documentation and reproducibility considerations as an afterthought, separate
from the development process taken to achieve the results. This ordering makes
FAIRness difficult to achieve and prone to being incomplete if, for instance, metadata has
not been recorded over the course of a project. For this reason our approach focuses
on an extended philosophy of “continuous and ubiquitous FAIRness” [Foster] in which
FAIR principles are applied throughout all components of the research process over time
as a matter of working practice. The adoption of FAIRness during working practice
ensures that experimental results are compliant with a reproducibility framework from
the beginning, and of course, that any results that were derived from reproducible
processes during development will be reproducible upon publication. Ultimately, the
more comprehensively FAIRness is integrated in a practical context, the more its benefits
will transfer to others in practical use.
As data-driven science and complex statistical processing methods become
increasingly common, it is necessary to represent results as derived entities from specific
groups of inputs, methods, and computational environments, whose inputs in turn may
have been generated from other methods of their own. We argue that the integration of
FAIR data management systems provides the solution and it is from these continuous
linkages that we can not only keep pace with validating our own results, but also realize
the full promise of deep learning in which linkages and processes between new data and
new methods can be more readily configured and applied. Accordingly, in our approach
we aim to develop a framework which positions deep learning techniques as a process in
a FAIR research journey and not an end in itself. Our work is ultimately unique in
encapsulating a deep learning process as an element of a “full journey” including not only
a FAIRly cataloged deep learning model and compute environment, but also FAIRly
cataloged input data and FAIRly cataloged output data which are connected in an
automatic process.
C. APPROACH
In order to solve the needs of deep learning reproducibility our work aims to create
a framework that can:
● Catalog deep learning model parameters, computational environments, and data
inputs
● Combine and execute the cataloged elements above for automatic validation
● Interoperably execute across datasets and catalog data outputs as experimental
results
For our solution we choose to leverage continuous FAIR methodology in which
deep learning processes can be linked and cataloged alongside FAIR data inputs. Our
central contribution in this work is then to introduce the design and implementation of a
“FAIR workflow”, in which these linked and cataloged inputs and processes can be
executed together in a high performance computational environment. Finally, the outputs
of this execution can then be automatically cataloged back into the data repository in a
"full-journey" with links to the inputs and processes from which they were derived. In
this manner, the advantages of deep learning reproducibility can be actualized and this
framework has the particular benefits of speeding up the validation and execution of deep
learning models for outside investigators, and allowing for a more rapid deployment and
distribution of the computational technique to generate research results internally for
practical use.
Figure 16. FAIR workflow diagram
The FAIR workflow is centered around a “Workflow” data structure which
contains references to the input data, in our case given from a “Biosample” object, as
well as a deep learning “Process”. In our case, “Biosamples” are a previously existing
element of the database we use, containing rich metadata such as a “Specimen” and the
“Protocol” performed on that Specimen, with links to the “Image Data”, “Preprocessed
Data”, and “Derived Data” associated with the “Biosample”. Finally, the “Process” has a
“Process Files” field containing one or more compressed folders of scripts.
Figure 17. Workflow data structure
These data structures are interlinked and ultimately a workflow is created by first
linking a Biosample and specifying a subset of its “Preprocessed Data”, and then
linking a Process and specifying a “Process File” zip from its linked “Process”.
In order to execute the process, we then need only to specify a Workflow ID in an
open-access server-based high performance computing environment and the
“Preprocessed Data” will be automatically pulled from the data catalog and automatically
referenced by the unzipped process scripts. Upon completion, the output of the workflow
execution is uploaded automatically as “Derived Data” for that “Biosample” with a
reference to the “Process” by which it was derived. As such, we are able to use FAIR
principles to create a “full-journey” in which all aspects of deep learning reproducibility
are cataloged and linked together and executed in a way that positions deep learning as
an experimental method with archived results and dynamically configurable inputs.
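The control flow of such a run is sketched below as a stand-in that uses the local file system in place of the DERIVA, MiniD/BDBag, and Globus calls; the directory layout, field names, and the predict.py entry point are assumptions, not the production schema or API.

```python
import json
import subprocess
import zipfile
from pathlib import Path

def execute_fair_workflow(workflow_dir: Path) -> None:
    """Local-filesystem stand-in for one FAIR workflow run.

    In the real framework the workflow record is fetched from the catalog by
    Workflow ID, Preprocessed Data is pulled by resolving the BDBag MiniD, and
    the outputs are uploaded back as Derived Data linked to the Biosample and
    the Process; here those transfers are replaced by local paths.
    """
    record = json.loads((workflow_dir / "workflow.json").read_text())
    inputs = workflow_dir / "preprocessed_data"
    with zipfile.ZipFile(workflow_dir / record["process_file"]) as zf:
        zf.extractall(workflow_dir / "process")          # unzip Process Files
    outputs = workflow_dir / "derived_data"
    outputs.mkdir(exist_ok=True)
    subprocess.run(
        ["python", str(workflow_dir / "process" / "predict.py"),
         "--input", str(inputs), "--output", str(outputs)],
        check=True,
    )
    # Upload of `outputs` as Derived Data would happen here in the real run.
```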
D. IMPLEMENTATION
The implementation of the FAIR workflow framework is centered around the
creation of the workflow data structure in our FAIR data repository in which all necessary
fields can be linked and referenced. From that point, our implementation brings together
a variety of existing software tools to fetch referenced inputs, execute referenced
processes, and catalog results to their respective locations within the stored data structure.
1. Deriva
DERIVA is a data management platform designed specifically for organizing and
sharing data for scientific research organizations and collaborations. It allows for
flexibility in creating data structures with custom fields, maintaining rich metadata
associated with the stored data, and establishing relationships between different data
entities. DERIVA is also highly configurable. The system can adapt to changes in the
underlying data model, and the interfaces also allow customization of style,
documentation links, navigation menus and more. DERIVA is also an asset management
platform designed to support scientific collaboration through the full lifecycle of data -
from early experiment design, to data acquisition, analyses and ultimately publication.
DERIVA offers flexibility, specificity and security for consuming practically any
kind of data and connecting the dots between them to make them more discoverable and,
ultimately, more useful to scientists [Bugacov]. DERIVA was used to create the data
structure described in Section A above.
Figure 18. DERIVA Ecosystem
2. MiniD
MiniD, short for Minimal Identifiers, is a system designed to provide lightweight
and decentralized persistent identifiers for data objects [Chard]. MiniD enables identifiers
to be generated locally and computed by any user with access to the data object. For our
purposes, we used MiniD in order to create a
persistent identifier for our workflow that can be referenced and downloaded in our
execution environment. The "minting" of the MiniD identifier is performed by the Deriva
catalog server during the export of the Workflow BDBag.
3. BDBags
BDBags, or Big Data Bags, are a specification for bundling and transferring big
scientific datasets and their associated metadata. BDBags extend the functionality of the
BagIt specification, which defines a format for creating hierarchical directory structures
and a manifest file that lists all the files and their checksums. BDBags also support data
provenance, versioning, and reproducibility [Madduri] by including comprehensive
metadata and enabling the capture of dataset changes over time.
Ultimately, the design of BDBags allows us to package our workflow to reference
data files stored in remote locations, enabling the bag to be shared without physically
moving all the data files. This feature is particularly useful for deep learning applications
in which large input datasets can be expected [Chard].
4. Globus
Globus is a research data management platform designed to facilitate secure and
efficient data transfer, sharing, and collaboration among researchers and institutions
[Foster]. It offers a web-based interface and command-line tools that allow users to easily
transfer data between various endpoints, including local storage, institutional storage
systems, and cloud-based platforms such as deriva. Globus utilizes high-performance
network infrastructure to optimize data transfer speeds and supports parallel transfers for
large-scale datasets. In this project we use Globus as our authentication interface with
deriva from within our hosted computing environment.
5. Colab
Finally, Google Colab, short for Google Colaboratory, is a cloud-based platform
that provides a semi-free and collaborative environment for writing, executing, and
sharing Python code [Carneiro]. Google Colab provides access to preset hardware
resources, including CPUs, GPUs, and TPUs, which allows users to execute
high-performance computing tasks remotely from their local machines. In this work we use
Google Colab as our execution environment, in which the entire end-to-end FAIR data
workflow can ultimately be executed from a single Jupyter notebook by simply
pressing play.
E. APPLICATION
In order to demonstrate the FAIR workflow method, our first use case is a deep
learning deployment for a network called the “Semantic Cell” network. The “Semantic
Cell” network is interesting because it is intended to be interoperable between multiple
biomedical imaging modalities and was developed using datasets cataloged over time
using FAIR methods on the deriva platform.
1. Semantic Cell Overview
The objective of the “Semantic Cell” network is to predict unseen subcellular
components between multiple imaging modalities using statistical correlations between
reference structures that are visible in one or more modalities and target structures that
are visible to only one modality or another. Training this network requires semantically
labeled data from multiple modalities as well as a means of achieving isometric
uniformity between images from each modality.
As such, the Semantic Cell process is multistage. It requires input data to be fed to
autosegmentation processes and then fed through isometric transform and filtering
processes that are unique to each modality before data can be fed as input data for
training or testing the module which finally predicts the unseen structures. Accordingly,
each of these intermediary processes and inputs must be cataloged in order to execute
the pipeline end to end (see figure below).
2. Data Management
Organizing image data for use in the “Semantic Cell” network was achieved using
active data repositories and was made possible by virtue of the rich metadata captured by
the working system. Input data to the method involves imaging data from both the X-ray
tomography and fluorescence microscopy modalities, which were labeled as such.
Each record also contains rich metadata such as the specimen cell type and protocol treatments.
These data standards have been used not only for all the input data but also for the
derived autosegmentation data that has been generated historically. Accordingly, we are
able to leverage this persistent metadata to include samples from similar conditions to
minimize structural variance between the two domains and also automate the processing
of the data based on the modality it came from.
Perhaps most importantly of all, the use of a data management system allowed for
the creation and organization of these data collections by investigators other than the deep
learning model developers and spurred the development of novel multi-modal machine
learning methods based on pre-existing data collection efforts that would not have been
conceived of otherwise.
3. Workflow setup and execution
As a proof of concept we specifically focus on the workflow process for the
execution of the final structural prediction module p(zSa|xScommon).
This model was trained on a dataset of auto-segmented X-ray tomography images
with collection ID 1-B4AC. The model weights and associated scripts were zipped
and uploaded to the FAIR data repository as a process file entity with ID 1-DHTC. We
then created the workflow entity 1-DHRT, linked it to the created process, and linked it to
a preexisting fluorescence microscopy biosample 1-DEZJ containing preprocessed image
files, each with their own IDs. Next, we exported the workflow as a BDBag with MiniD
minid:gnMpiji976D5 and set that location and the workflow ID as parameters in a .ipynb
script on Google Colab. Next we sequentially executed the script in Google Colab,
stopping only to authenticate ourselves to the deriva dev.pbcconsortium.org database
using Globus. The remainder of the script materialized the BDBag, executed the model
with GPU computing resources, and uploaded the generated derived image data back to
deriva for biosample 1-DEZJ automatically in less than a minute. The reader can
reproduce this workflow here:
https://colab.research.google.com/drive/1KhQr-sFm89lo7ddxRS2ihXNUsu-RLpf_?usp=s
haring
Outfitting the model prediction script to run within the FAIR workflow
method required only two one-line alterations: reading from the standard schema of the
dynamically generated Workflow.csv file in local storage to determine the input and
output directories for the input and output data.
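For illustration, the read might look like the snippet below, though the actual column names in Workflow.csv are defined by the workflow export and are assumed here.

```python
import csv

# Hypothetical column names; the real Workflow.csv schema comes from the export.
with open("Workflow.csv", newline="") as f:
    row = next(csv.DictReader(f))
input_dir = row["input_directory"]
output_dir = row["output_directory"]
```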
This example highlights a level of function that can be reached by incorporating
FAIR data practices with deep learning pipelines: in this case, all elements of deep
learning reproducibility can be cataloged, linked, and executed automatically, with the
results archived in a way that is amenable to the use of deep learning as a research tool.
The fact that the model can be deployed equivalently on inputs from multiple modalities
given certain metadata in the catalog further exemplifies how the integration of deep
learning techniques with FAIR data management stands to extend the capabilities of deep
learning techniques for new applications.
F. SUMMARY
In this chapter we outlined the need for scientific reproducibility and current
challenges in reproducibility for deep learning techniques. We discussed these challenges
with respect to cataloging model parameters, reproducing computational environments,
and representing data as a variable input for deep learning processes, and resolved that
a deep learning reproducibility system must have the capacity to catalog deep learning
model parameters, computational environments, and data inputs; combine and execute
these elements for automatic validation; interoperably execute across datasets and catalog
data outputs as experimental results. We proposed the integration of deep learning
methods with FAIR data management systems and the use of continuous and ubiquitous
FAIRness towards this end, ultimately arriving at the design of a FAIR "Full-journey"
workflow. We described the implementation of this FAIR workflow in software as well as
its application to a novel deep learning example which ultimately delivered to some
extent on our initial aims, specifically speeding up the time to validate deep learning
models and execute on new data, while allowing for its distribution and utilization to
produce research-grade data as a computational technique.
FAIR workflows are an extension of a philosophy asserting that the more
comprehensively FAIRness is integrated in a practical context, the more its benefits will
transfer to practical use. The ability to represent all steps necessary to achieve a scientific
result using FAIR principles ultimately allows us to create a simple Jupyter script of no
more than a few lines in which a deep learning method can be validated, deployed on
existing FAIR datasets, and its FAIR results uploaded by anyone with access to the internet. This
framework sets minimum requirements in order for the same FAIR, “full-journey”
workflow to be achieved for any other computational process and inputs using all
open-source tools. Putting data management front and center and working from robust,
live data catalogs also advances the positioning of deep learning models as practical,
verifiable tools that can be used to process research data and not simply ends in
themselves.
The realization of automatic FAIR workflows makes both the benefits of
FAIRness and the potential of deep learning systems more apparent as an end-to-end
system linking data, processes, and computing environments to automatically generate
results. We believe this development goes some of the way to actually speeding up the
process of attaining scientific results and delivering on the promise of FAIRness to help
establish virtuous overlapping cycles in data and method development and bring the
scientific process full-circle. It also advances the promise of deep learning methods to be
applied more readily across many different types of FAIR datasets and domains.
While the challenges of implementing continuous FAIR data standards are still
significant, our hope is that the utility and automated nature of FAIR data workflows will
provide benefits which outweigh these costs, not only for use by future generations of
researchers, but also for those researchers who implement FAIR data practices and seek
to deploy their computational processes for internal use today.
The ability to automatically upload workflow results additionally opens the door
for development of the FAIR workflow method to string together many computational
processes and represent, for instance, the full semantic cell method as an automatic
pipeline. From a usability standpoint, there is also exciting potential in the direct use of
these deployed workflows. There are also opportunities to test the performance of deep
learning models in practice for convergence or acceptability with respect to certain tasks.
Perhaps the biggest potential of integration with FAIR data management is to help
diversify deep learning method development to address more expansive classes of
practical problems and to confront what may be the single most important area of
investigation for deep learning performance: the treatment of the data variable to create
more robust awareness for complex tasks.
CHAPTER SIX: LIMITATIONS AND FUTURE WORK
This final chapter provides a summary of the findings presented in this
dissertation. The first section highlights the major contributions of this work, with special
attention to results and implications relevant to other integrative modeling and neural
network systems. This is followed by a discussion of the practical impact of this
dissertation, and strategies for how these techniques might be applied in biomedical
imaging and cell modeling research more generally. The worth of a research effort of any
magnitude can be judged both by the problems it solves and the new questions it raises.
Accordingly, this chapter concludes with an annotated list of recommended extensions
and avenues of further inquiry.
A. CONTRIBUTIONS
The semantic cell framework offers a fundamentally new approach to using
multiple imaging modalities to model cell structure. The underlying concept is simple: to
predict unseen features across modalities based on their shared features using neural
networks. Allowing the use of neural networks to perform this prediction expands on
related work and presents a framework for applying ever-improving neural network
architectures and methods to the multiscale and multimodal integrated modeling
problem.
It is uninteresting, however, to claim universality of an integration solution by
virtue of a single custom example. We go further to also define the FAIR workflow
framework in which our method and other multimodal neural network methods could be
applied on multimodal data catalogs to enable future work in integrative modeling. The
annotations recommended herein are practical and demonstrated, and are needed to
prepare and execute the semantic cell method.
Traditional integrative cell modeling attempts to support a single overriding
application purpose: to combine information from heterogeneous experimental sources in
a unified structural representation. The semantic cell method allows for the mixing of
heterogeneous label sets in neural network training, thereby allowing for the joint
prediction of many label sets from heterogeneous sources. Similarly, the semantic cell
method introduces a framework in which shared features from multimodal/multiscale
imaging can be identified and aligned using isometric transformation functions, so that
these heterogeneous label prediction methods can be applied across modalities. The
combination of the
two yields a new class of structural prediction techniques for integrated cell modeling in
which target structures across many modalities can be represented. The semantic cell
method defines requirements for shared features between modalities as well as general
methods for computing feature maps using neural network techniques and automated
workflows.
The more multimodal features available, and the closer their shared features’
isometric alignment, the higher the diversity and accuracy of features represented in the
integrated result. The closest predecessor neural network system, the Integrated Cell,
does not work with multimodal or multiscale data inputs and features. The closest
multiscale/multimodal data integration predecessors are manual and parametric. The
semantic cell is the only integrative cell modeling method so far which allows for both
multimodal/multiscale inputs and nonparametric neural network techniques.
For a large portion of the problem space (generally, any tasks in which multimodal
images are combined in modeling), the semantic cell is the only viable neural network
method available. This dissertation includes the description of a framework, and
associated implementation, for the computational composition of high-resolution
multiscale biomedical images as well as the prediction of components from unseen
spectra across imaging modalities using neural networks. It includes a method for the
combination of heterogeneous label sets in neural network training, as well as a general
framework for the integration of data sources from different resolutions and scales using
shared semantic labels. This initial implementation was intended for experimentation
with new types of learning tasks, label sets, and optimization algorithms with multiple
imaging modalities. It is therefore hoped that presentation of the FAIR workflow
computational framework and the public release of our own workflow implementation
including the processes and associated datasets of x-ray tomography and fluorescence
microscopy images for the INS-1E cell type will spur follow-on research.
B. APPLICATION
Throughout the history of biology, the investigation of the relationship between
structure and function has often involved interpretation and measurement across
experiments from many scales and modalities. It is not always easy to determine how and
to what extent features at different scales are related. Over the course of this dissertation
effort, the ability of neural network statistical modeling techniques to predict phenomena
such as human disease markers from raw genetic information and images increased
steadily beyond what many could have foreseen, often without a traditional
understanding of the chemical or physiological processes involved. Some deep learning
researchers argue that understanding intermediary concepts, such as the organizational
structure and mechanics of the cell, is not necessary to determine functional results.
Indeed, this line of thinking has proven true in the case of state-of-the-art text generation
systems which have no formal understanding of grammar or syntax.
Even within imaging, the utility of the semantic cell method for creating
multimodal and multiscale models of physical structure will likely diminish as the
imaging technology for single modalities improves. In human imaging, for instance,
modalities such as CT are now being performed with lower radiation, higher resolution,
and faster frame rates, steadily consolidating the advantages that modalities such as
ultrasound and MRI had as complements. Similarly, during the course of this project,
CRISPR methods allowed for insulin to be endogenously tagged for fluorescence
microscopy imaging, raising the question of what advantage x-ray tomography will
continue to have, particularly as the resolution and hyperspectral capabilities of
modalities such as fluorescence imaging improve. Today, however, no single modality in
-omics or imaging can yet capture all relevant biomedical features at the cell or human
scale. As various modalities improve their specificity
and advance their strengths, the semantic cell is expected to remain a useful framework
for combining and predicting target information across multiple modalities, particularly
in niche research investigations at the cutting edge of imaging capabilities.
For integrating and inferring multimodal target information, the primary hurdle
remains the organization and annotation of heterogeneous biomedical training data with
aligned semantic features. Chapters IV and V explained how the semantic cell method
could be executed directly from a structured database including such heterogeneous data.
However, the creation of shared databases and experimental protocols in academic
research and even the medical industry remains a moving target. Multiscale, multimodal,
and multi-institutional imaging datasets in the biomedical domain are not commonplace,
and standardized annotation protocols or automatic methods for such annotation are not
always reliable. More work to create large databases of aligned multimodal datasets
will be very helpful towards this end.
It is also worth noting that for most biomedical imaging experiments, a sample can
be subjected to measurement by many different modalities in sequence without damage,
making the inference of multimodal information unnecessary. The problems of
photobleaching and tomographic destruction in cell imaging, therefore, make it a
somewhat special case in this respect. In domains where multimodal imaging can, in fact,
be performed on the same sample, inference could still be preferable if it allows
experimentalists to forgo the cost of multi-modal imaging by inferring certain results.
However, this would only make sense if the cost of multimodal acquisition outweighed
the uncertainty and noise of the prediction, which in our application is nowhere close to
100% accurate. Attempting to improve accuracy via improved network architectures or
by targeting more highly correlated structural features is warranted to demonstrate more
general utility of the technique outside the domain of cell imaging.
Towards this end, improving the accuracy of the semantic cell method for
structural prediction would be possible using improved network architectures and
optimization functions. However, this dissertation does not recommend the general use of
the semantic cell method to attempt to predict the localization of insulin vesicle structures
in fluorescence microscopy images of INS-1E cells. Rather, the next generation of
INS-1E cell modeling using the semantic cell method or otherwise should seek to
establish further shared features across modalities and enforce their isometric
correspondence. Then, by identifying those modality-specific features that are both
medically significant and highly correlated to features which are present across
modalities, the true value of multimodal experimentation can be realized. By modeling
and validating these relationships through the semantic cell method, single-modality
experiments could be increasingly bolstered by databases of integrated multimodal
studies to reveal useful correlations and produce accurate predictions for features that
were not directly observed.
C. FUTURE WORK
As with most dissertation efforts, the original expectations for this project were
higher than was realistic for timely completion. Also, issues arose during this research
that were outside the project scope but merited further exploration. This section lists both
suggestions and plans for future efforts in this area.
1. Improvements in accuracy
a. Optimization functions
The semantic cell method describes a general optimization function of the form
$\arg\min \, -\sum_i \log\left(z_{i,S_A}, z_{i,S_B} \mid x_{i,S_A}, x_{i,S_B}\right)$.
There are questions, however, about how features
should be weighted. For instance, if we are most interested in the unseen target structures,
it could make more sense to evaluate loss only with respect to the decoded
modality-specific targets and not for the shared structures as well. Similarly, when
deciding the “best model”, the metric could be with respect only to the DICE score of the
target structures as opposed to all cell structures. In the reverse approach, we could
disregard the DICE score entirely (and even pixel-wise cross entropy in the loss function)
and simply evaluate terms such as the KL-divergence or other structural similarity
metrics between the generated images and the sample data based on overall likeness.
In this work, the results reported in our tables might have benefitted from these changes
to the loss function and validation metrics; we kept to the most common neural network
loss function as a proof of concept for the framework in general.
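As a concrete illustration of these alternatives, the sketch below evaluates pixel-wise
cross entropy only over the modality-specific target classes and computes a DICE metric
restricted to those classes for selecting the best model. This is a minimal sketch assuming
a PyTorch-style implementation; the tensor shapes, class indices, and function names are
illustrative and not those of the code used elsewhere in this dissertation.

# Minimal sketch, assuming PyTorch; tensor shapes and class indices are
# illustrative assumptions.
import torch
import torch.nn.functional as F

def masked_cross_entropy(logits, labels, target_classes):
    # Pixel-wise cross entropy evaluated only where the ground truth belongs
    # to one of the modality-specific target classes.
    # logits: (N, C, H, W) raw scores; labels: (N, H, W) integer class ids.
    per_pixel = F.cross_entropy(logits, labels, reduction="none")
    mask = torch.zeros_like(labels, dtype=torch.bool)
    for c in target_classes:
        mask |= labels == c
    return per_pixel[mask].mean() if mask.any() else per_pixel.sum() * 0.0

def dice_on_targets(logits, labels, target_classes, eps=1e-6):
    # DICE restricted to the target structures, usable as the validation
    # metric for choosing the "best model".
    preds = logits.argmax(dim=1)
    scores = []
    for c in target_classes:
        p = (preds == c).float()
        g = (labels == c).float()
        intersection = (p * g).sum()
        scores.append((2 * intersection + eps) / (p.sum() + g.sum() + eps))
    return torch.stack(scores).mean()

Substituting a purely distributional term, such as a KL-divergence between generated and
sample feature maps, would follow the same pattern.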
b. Improved segmentation
As evidenced by our results in section 4.C.3, the quality of the autosegmentation
significantly undermined the isometric relationship between the fluorescence microscopy
and the x-ray tomography. Gaps and distortions in the x-ray tomography segmentation
maps skewed its distribution of shared structures relative to that of the fluorescence
microscopy; accordingly, it was more difficult to transfer the learned mitochondrial
correlation from fluorescence to tomography for autosegmented data than for manually
segmented data.
The advantage of autosegmentation is still that it allows for the generation of more
training data. This advantage was shown by the 2x higher accuracy of granule prediction
in the larger autosegmented dataset over granule prediction in the manually segmented
dataset. However, this advantage is moot in the multimodal context if autosegmented
feature maps do not closely resemble each other due to artifacts. Ideally, we would have
an autosegmentation method that averaged closer to 95% DICE (as opposed to 75%
DICE in our example) so that high-quality segmentation maps could be generated for
each modality and structural prediction results could benefit from a larger and
better-conditioned dataset.
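One way to check whether autosegmented feature maps are close enough across
modalities to support transfer is to compare the class frequency distributions of the
shared structures in each modality, for instance with a symmetric KL-divergence. The
sketch below is a minimal, plain-NumPy illustration; the label identifiers, loader
functions, and threshold are assumptions rather than values used in our experiments.

# Minimal sketch, assuming integer-labeled semantic maps; label ids and the
# divergence threshold are illustrative assumptions.
import numpy as np

def class_distribution(label_map, shared_classes):
    # Fraction of voxels assigned to each shared structure.
    counts = np.array([(label_map == c).sum() for c in shared_classes], dtype=float)
    total = counts.sum()
    return counts / total if total > 0 else counts

def symmetric_kl(p, q, eps=1e-9):
    p, q = p + eps, q + eps
    p, q = p / p.sum(), q / q.sum()
    return 0.5 * (np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

# Hypothetical usage: flag modality pairs whose shared-structure
# distributions diverge too much to support cross-modality transfer.
# fm_map, xrt_map = load_semantic_maps(...)   # hypothetical loaders
# d = symmetric_kl(class_distribution(fm_map, [1, 2, 3]),
#                  class_distribution(xrt_map, [1, 2, 3]))
# if d > 0.1:
#     print("shared-structure distributions are skewed; review segmentation")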
c. Larger sets of shared features
Perhaps most importantly, more shared features would certainly improve the
accuracy of the method. We saw this empirically in the experiments in which including
mitochondria ground truth improved the insulin prediction in the x-ray tomography
context by nearly 5x. In some respects, the experiments we performed to predict
structures from just the membrane and nucleus were absurd. Including more structures,
such as microtubules or the Golgi apparatus, would allow for the learning of more complex
dependencies and the uncovering of the truly deterministic hierarchical relationships
between the parts of the cell structure, which is the premise of the semantic cell method's
viability and of integrative cell modeling in general.
2. Other feature candidates for alignment and prediction
a. Treatment conditions
One advantage of our database was that we were able to track the treatment
conditions of cells within our testing set. One phenomenon we found was that the
prediction of insulin based on membrane, nucleus, and mitochondria within xray
tomography had high variability with respect to treatment condition as shown in the
figure below:
Figure 19. Variational structural prediction results based on treatment condition
Specifically, 5 minutes after glucose stimulation, the localization of insulin was
10x more predictable than it was 30 minutes after stimulation. This result suggests, for
one, that isometric alignment and optimization could be conditioned on treatment
condition, for example through variational autoencoding, to deliver better results. In a larger sense,
these results also suggest a higher statistical correlation between parts of the cell during
certain phases of the cell cycle, which could merit further investigation.
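One possible mechanism for such conditioning is sketched below: a one-hot
treatment-condition vector is broadcast over the spatial grid and concatenated with the
feature map channels before encoding (and with the latent code before decoding), so that
reconstructions become condition-aware. This is a minimal PyTorch-style sketch under
assumed tensor shapes; it is not the architecture used in our experiments.

# Minimal sketch, assuming PyTorch; tensor shapes and the one-hot encoding
# of treatment condition are illustrative assumptions.
import torch
import torch.nn as nn

class ConditionBroadcast(nn.Module):
    # Broadcasts a one-hot treatment condition over the spatial grid so it
    # can be concatenated with image or feature-map channels.
    def __init__(self, n_conditions):
        super().__init__()
        self.n_conditions = n_conditions

    def forward(self, x, condition_onehot):
        b, _, h, w = x.shape
        cond_map = condition_onehot.view(b, self.n_conditions, 1, 1)
        cond_map = cond_map.expand(b, self.n_conditions, h, w)
        return torch.cat([x, cond_map], dim=1)

# Hypothetical usage: an encoder would then accept in_channels + n_conditions
# input channels, and the decoder would receive the latent vector
# concatenated with the same one-hot vector.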
b. Timepoints
Additionally, one original goal which proved out of scope for this project was to
combine modalities that support time-series imaging (such as fluorescence microscopy)
with modalities that do not (such as x-ray tomography) in order to generate time-series
data for snapshot imaging modalities. Scientifically, this effort had limited merit due to
the difficulty of validating future timepoints for cells which had been plunge-frozen. The
implementation, however, would be as simple as training a network to predict the
semantic feature map at one timepoint from the timepoint before it, using a modality such
as fluorescence microscopy. Time-series prediction could then be performed in any other
modality by applying the trained network to an isometrically aligned semantic feature
map. Perhaps someone in the future may be able to better convince their advisor to
pursue this.
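The training setup described above might look like the following minimal sketch, in
which a network is fit to map the semantic feature map at time t to the map at time t+1;
the stand-in network, class count, and data interface are assumptions for illustration
rather than the architecture used elsewhere in this work.

# Minimal sketch, assuming PyTorch; the stand-in network and class count are
# illustrative assumptions.
import torch
import torch.nn as nn

n_classes = 4  # e.g. background, membrane, nucleus, mitochondria (illustrative)
net = nn.Sequential(  # stand-in for a U-Net style segmentation network
    nn.Conv2d(n_classes, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, n_classes, 3, padding=1))
optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

def train_step(map_t, map_t_plus_1):
    # map_t: one-hot feature map at time t, shape (N, n_classes, H, W);
    # map_t_plus_1: integer label map at time t+1, shape (N, H, W).
    optimizer.zero_grad()
    loss = loss_fn(net(map_t), map_t_plus_1)
    loss.backward()
    optimizer.step()
    return loss.item()

# Once trained on fluorescence time series, the same network could be applied
# to an isometrically aligned x-ray tomography feature map to roll a static
# snapshot forward in predicted time.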
c. Raw images
Lastly, there was an unexplored potential of translating not just from the raw
images to semantic feature maps, but from semantic feature maps back to raw images.
This concept would position the isometrically aligned semantic feature map as a sort of
Rosetta Stone from which an image from any modality could be translated to appear as an
image of any other modality. In this manner, the video described above could be not just a
hallucinated video of an x-ray tomography semantic map, but a hallucinated video
impossibly resembling the x-ray tomography itself. There are a number of ways this
method could be validated, for instance by trying to segment new structures out of the
hallucinated raw image and comparing them to ground truth. Such a decoding scheme
would bring the process full circle and could potentially shed light on the limitations of
semantic feature maps as a representation of the raw image data, if not cells themselves.
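As a minimal sketch of such a decoding scheme, the example below trains a simple L1
decoder from one-hot semantic feature maps back to raw intensity images; a full semantic
image synthesis model (for example, a conditional GAN) would replace this stand-in, and
the network, loss, and tensor shapes are illustrative assumptions.

# Minimal sketch, assuming PyTorch; the decoder, loss, and shapes are
# illustrative stand-ins for a full semantic image synthesis model.
import torch
import torch.nn as nn
import torch.nn.functional as F

n_classes = 4
decoder = nn.Sequential(
    nn.Conv2d(n_classes, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 3, padding=1))  # single-channel raw intensity image
optimizer = torch.optim.Adam(decoder.parameters(), lr=1e-4)

def train_step(semantic_onehot, raw_image):
    # semantic_onehot: (N, n_classes, H, W); raw_image: (N, 1, H, W).
    optimizer.zero_grad()
    loss = F.l1_loss(decoder(semantic_onehot), raw_image)
    loss.backward()
    optimizer.step()
    return loss.item()

# One validation route noted above: segment new structures out of the decoded
# ("hallucinated") image with an existing segmentation network and compare
# them against ground truth.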
3. Practical auto-segmentation
Of all the methods described in this dissertation and its recommended extensions,
the area with the most to add to the practice of biomedical imaging analysis is automatic
segmentation. At the beginning of this work, the task of auto-segmentation was
considered an essentially solved problem in computer science that was not worthy of a
computer science dissertation topic. It is true that autosegmentation is more of a data
collection and systems engineering problem than a scientific research question, and that
it is a solved problem for common structures in common domains such as CT. However,
the process is critical for deriving metrics for phenotypic study, which will invariably
require hundreds of hours of manual segmentation to reach sufficiently high sample sizes
if an autosegmentation method does not exist.
modalities studying unusual structures, this will certainly be the case. Even in medical
practice, niche domains such as pediatric cardiology often have in-house segmentation
specialists and many hospitals do not adopt even the most basic autosegmentation
algorithms due to a lack of expertise or ability, even though the value of segmented
images for quantitative and qualitative measurements is clear. It is worthwhile to explore
the practical implementation of autosegmentation as a research question because it saves
enormous amounts of skilled labor and enables the progress of downstream scientific
research in general.
D. SUMMARY
This chapter highlights the major contributions of this work: a method for
predicting heterogeneous label sets with neural networks, a framework for applying this
to the multimodal cell modeling problem, and an implementation of neural network
pipelines for multimodal biomedical imaging data using FAIR databases and workflows.
Significant opportunities for future work remain—both for the practical application of
this method, and for the extension of its detail and scope.
REFERENCES
Abadi, M., Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, …
Xiaoqiang Zheng. (2015). TensorFlow: Large-Scale Machine Learning on Heterogeneous
Systems. Retrieved from https://www.tensorflow.org/
Alberti, M., Pondenkandath, V., Vogtlin, L., Wursch, M., Ingold, R., & Liwicki, M.
(2019). Improving Reproducible Deep Learning Workflows with DeepDIVA. In 2019 6th
Swiss Conference on Data Science (SDS). 2019 6th Swiss Conference on Data Science
(SDS). IEEE. https://doi.org/10.1109/sds.2019.00-14
Baker, M. (2016). 1,500 scientists lift the lid on reproducibility. In Nature (Vol. 533, Issue
7604, pp. 452–454). Springer Science and Business Media LLC.
https://doi.org/10.1038/533452a
Protein Data Bank. (1971). Nature New Biology, 233, 223.
Bugacov, A., Czajkowski, K., Kesselman, C., Kumar, A., Schuler, R. E., &
Tangmunarunkit, H. (2017). Experiences with DERIVA: An Asset Management Platform
for Accelerating eScience. In 2017 IEEE 13th International Conference on e-Science
(e-Science). 2017 IEEE 13th International Conference on e-Science (e-Science). IEEE.
https://doi.org/10.1109/escience.2017.20
Cao, K. et al. 2022. uniPort: a unified computational framework for single-cell data
integration with optimal transport. Cold Spring Harbor Laboratory.
Cardoso, M. J., Li, W., Brown, R., Ma, N., Kerfoot, E., Wang, Y., Murrey, B.,
Myronenko, A., Zhao, C., Yang, D., Nath, V., He, Y., Xu, Z., Hatamizadeh, A.,
Myronenko, A., Zhu, W., Liu, Y., Zheng, M., Tang, Y., … Feng, A. (2022). MONAI: An
open-source framework for deep learning in healthcare (Version 1). arXiv.
https://doi.org/10.48550/ARXIV.2211.02701
Carneiro, T., Medeiros Da Nobrega, R. V., Nepomuceno, T., Bian, G.-B., De
Albuquerque, V. H. C., & Filho, P. P. R. (2018). Performance Analysis of Google
Colaboratory as a Tool for Accelerating Deep Learning Applications. In IEEE Access
(Vol. 6, pp. 61677–61685). Institute of Electrical and Electronics Engineers (IEEE).
https://doi.org/10.1109/access.2018.2874767
Chandrakar, R. (2006). Digital object identifier system: an overview. The Electronic
Library, 24(4), 445-452.
Chard, K., D’Arcy, M., Heavner, B., Foster, I., Kesselman, C., Madduri, R., Rodriguez,
A., Soiland-Reyes, S., Goble, C., Clark, K., Deutsch, E. W., Dinov, I., Price, N., & Toga,
A. (2016). I’ll take that to go: Big data bags and minimal identifiers for exchange of
large, complex datasets. In 2016 IEEE International Conference on Big Data (Big Data).
2016 IEEE International Conference on Big Data (Big Data). IEEE.
https://doi.org/10.1109/bigdata.2016.7840618
Chessel, A. and Carazo Salas, R.E. 2019. From observing to predicting single-cell
structure and function with high-throughput/high-content microscopy. Essays in
Biochemistry. Portland Press Ltd.
Chen, B. et al. 2022. Multi-modal fusion of satellite and street-view images for urban
village classification based on a dual-branch deep neural network. International Journal
of Applied Earth Observation and Geoinformation. Elsevier BV.
Chen, C. et al. 2018. Semantic-Aware Generative Adversarial Nets for Unsupervised
Domain Adaptation in Chest X-Ray Segmentation. Machine Learning in Medical
Imaging. Springer International Publishing.
Çiçek, Ö., Abdulkadir, A., Lienkamp, S. S., Brox, T., & Ronneberger, O. (2016). 3D
U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation. In Medical
Image Computing and Computer-Assisted Intervention – MICCAI 2016 (pp. 424–432).
Springer International Publishing. https://doi.org/10.1007/978-3-319-46723-8_49
Clark, K., Vendt, B., Smith, K. et al. The Cancer Imaging Archive (TCIA): Maintaining
and Operating a Public Information Repository. J Digit Imaging 26, 1045–1057 (2013).
https://doi.org/10.1007/s10278-013-9622-7
Coakley, K., Kirkpatrick, C. R., & Gundersen, O. E. (2022). Examining the Effect of
Implementation Factors on Deep Learning Reproducibility. In 2022 IEEE 18th
International Conference on e-Science (e-Science). 2022 IEEE 18th International
Conference on e-Science (e-Science). IEEE.
https://doi.org/10.1109/escience55777.2022.00056
Cour, T., Sapp, B., & Taskar, B. (2011). Learning from Partial Labels. Journal of Machine
Learning Research, 12(42), 1501–1536. Retrieved from
http://jmlr.org/papers/v12/cour11a.html
Deng, L. (2012). The mnist database of handwritten digit images for machine learning
research. IEEE Signal Processing Magazine, 29(6), 141–142.
Dice, L. R. (1945). Measures of the Amount of Ecologic Association Between Species. In
Ecology (Vol. 26, Issue 3, pp. 297–302). Wiley. https://doi.org/10.2307/1932409
Ding, G. et al. 2019. A joint deep learning model to recover information and reduce
artifacts in missing-wedge sinograms for electron tomography and beyond. Scientific
Reports. Springer Science and Business Media LLC.
Donovan-Maiye, R.M. et al. 2022. A deep generative model of 3D single-cell
organization. PLOS Computational Biology. Public Library of Science (PLoS).
Foster, I., & Kesselman, C. (2022). CUF-Links: Continuous and Ubiquitous FAIRness
Linkages for Reproducible Research. In Computer (Vol. 55, Issue 8, pp. 20–30). Institute
of Electrical and Electronics Engineers (IEEE). https://doi.org/10.1109/mc.2022.3160876
Foster, I. Globus Toolkit Version 4: Software for Service-Oriented Systems. J Comput Sci
Technol 21, 513–520 (2006). https://doi.org/10.1007/s11390-006-0513-y
J. P. Francis, H. Wang, K. White, T. Syeda-Mahmood and R. Stevens, "Neural Network
Segmentation of Cell Ultrastructure Using Incomplete Annotation," 2020 IEEE 17th
International Symposium on Biomedical Imaging (ISBI), 2020, pp. 1183-1187, doi:
10.1109/ISBI45749.2020.9098739.
Goldsborough, P. et al. 2017. CytoGAN: Generative Modeling of Cell Images. Cold
Spring Harbor Laboratory.
Gomez, C. et al. 2010. Use of high-resolution satellite imagery in an integrated model to
predict the distribution of shade coffee tree hybrid zones. Remote Sensing of
Environment. Elsevier BV.
Goodsell, David S. "The machinery of life." (2009): 371-402.
Hoffman, D. P., Shtengel, G., Xu, C. S., Campbell, K. R., Freeman, M., Wang, L., Milkie,
D. E., Pasolli, H. A., Iyer, N., Bogovic, J. A., Stabley, D. R., Shirinifard, A., Pang, S.,
Peale, D., Schaefer, K., Pomp, W., Chang, C.-L., Lippincott-Schwartz, J., Kirchhausen,
T., … Hess, H. F. (2020). Correlative three-dimensional super-resolution and block-face
electron microscopy of whole vitreously frozen cells. In Science (Vol. 367, Issue 6475).
American Association for the Advancement of Science (AAAS).
https://doi.org/10.1126/science.aaz5357
Huston, P., Edge, V., & Bernier, E. (2019). Reaping the benefits of Open Data in public
health. In Canada Communicable Disease Report (Vol. 45, Issue 10, pp. 252–256).
Infectious Disease and Control Branch (IDPCB) - Public Health Agency of Canada.
https://doi.org/10.14745/ccdr.v45i10a01
Isensee, F. et al. 2018. No New-Net. arXiv.
Jansen, C., Schilling, B., Strohmenger, K., Witt, M., Annuscheit, J., & Krefting, D.
(2019). Reproducibility and Performance of Deep Learning Applications for Cancer
Detection in Pathological Images. In 2019 19th IEEE/ACM International Symposium on
Cluster, Cloud and Grid Computing (CCGRID). 2019 19th IEEE/ACM International
Symposium on Cluster, Cloud and Grid Computing (CCGRID). IEEE.
https://doi.org/10.1109/ccgrid.2019.00080
Jean-Paul, S., Elseify, T., Obeid, I., & Picone, J. (2019). Issues in the Reproducibility of
Deep Learning Results. In 2019 IEEE Signal Processing in Medicine and Biology
Symposium (SPMB). 2019 IEEE Signal Processing in Medicine and Biology Symposium
(SPMB). IEEE. https://doi.org/10.1109/spmb47826.2019.9037840
Jenkinson, J., & McGill, G. (2012). Visualizing Protein Interactions and Dynamics:
Evolving a Visual Language for Molecular Animation. In D. W. C. Liu (Ed.), CBE—Life
Sciences Education (Vol. 11, Issue 1, pp. 103–110). American Society for Cell Biology
(ASCB). https://doi.org/10.1187/cbe.11-08-0071
Vanian, J. (2023, April). ChatGPT and generative AI are booming, but the costs can be
extraordinary. CNBC. Retrieved from
https://www.cnbc.com/2023/03/13/chatgpt-and-generative-ai-are-booming-but-at-a-veryexpensive-price.html
Kamran, S.A. et al. 2021. RV-GAN: Segmenting Retinal Vascular Structure in Fundus
Photographs Using a Novel Multi-scale Generative Adversarial Network. Medical Image
Computing and Computer Assisted Intervention – MICCAI 2021. Springer International
Publishing.
Kozlov, M. (2022). NIH issues a seismic mandate: share data publicly. Nature,
602(7898), 558-559.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet Classification with Deep
Convolutional Neural Networks. In F. Pereira, C. J. Burges, L. Bottou, & K. Q.
Weinberger (Eds.), Advances in Neural Information Processing Systems (Vol. 25).
Retrieved from
https://proceedings.neurips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf
Li, W. et al. 2019. Semantic Segmentation-Based Building Footprint Extraction Using
Very High-Resolution Satellite Images and Multi-Source GIS Data. Remote Sensing.
MDPI AG.
Liang, M. et al. 2020. Multi-Task Multi-Sensor Fusion for 3D Object Detection. arXiv.
Liao, L. et al. 2020. Guidance and Evaluation: Semantic-Aware Image Inpainting for
Mixed Scenes. Computer Vision – ECCV 2020. Springer International Publishing.
Xihui Liu, Guojun Yin, Jing Shao, Xiaogang Wang, and Hongsheng Li. 2019. Learning to
predict layout-to-image conditional convolutions for semantic image synthesis.
Proceedings of the 33rd International Conference on Neural Information Processing
Systems. Curran Associates Inc., Red Hook, NY, USA, Article 52, 570–580.
Luc, P. et al. 2016. Semantic Segmentation using Adversarial Networks. arXiv. (2016).
DOI:https://doi.org/10.48550/ARXIV.1611.08408.
Madduri, R., Chard, K., D’ Arcy, M., Jung, S. C., Rodriguez, A., Sulakhe, D., Deutsch, E.
W., Funk, C., Heavner, B., Richards, M., Shannon, P., Glusman, G., Price, N., Kesselman,
C., & Foster, I. (2018). Reproducible big data science: A case study in continuous
FAIRness. Cold Spring Harbor Laboratory. https://doi.org/10.1101/268755
Merkel, D. (2014). Docker: lightweight linux containers for consistent development and
deployment. Linux Journal, 2014(239), 2.
Murphy, R.F. 2012. CellOrganizer: Image-Derived Models of Subcellular Organization
and Protein Distribution. Methods in Cell Biology. Elsevier.
Ouyang, W., Beuttenmueller, F., Gómez-de-Mariscal, E., Pape, C., Burke, T.,
Garcia-López-de-Haro, C., Russell, C., Moya-Sans, L., de-la-Torre-Gutiérrez, C.,
Schmidt, D., Kutra, D., Novikov, M., Weigert, M., Schmidt, U., Bankhead, P., Jacquemet,
G., Sage, D., Henriques, R., Muñoz-Barrutia, A., … Kreshuk, A. (2022). BioImage
Model Zoo: A Community-Driven Resource for Accessible Deep Learning in BioImage
Analysis. Cold Spring Harbor Laboratory. https://doi.org/10.1101/2022.06.07.495102
Mons, B., Neylon, C., Velterop, J., Dumontier, M., da Silva Santos, L. O. B., &
Wilkinson, M. D. (2017). Cloudy, increasingly FAIR; revisiting the FAIR Data guiding
principles for the European Open Science Cloud. In Information Services & Use
(Vol. 37, Issue 1, pp. 49–56). IOS Press. https://doi.org/10.3233/isu-170824
Mustra, M., Delac, K., & Grgic, M. (2008, September). Overview of the DICOM
standard. In 2008 50th International Symposium ELMAR (Vol. 1, pp. 39-44). IEEE.
Machine Learning Datasets | Papers With Code. Retrieved from
https://paperswithcode.com/datasets
Ramesh, A. et al. 2021. Zero-Shot Text-to-Image Generation. arXiv.
Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2015). You Only Look Once:
Unified, Real-Time Object Detection (Version 5). arXiv.
https://doi.org/10.48550/ARXIV.1506.02640
Ronneberger, O. et al. 2015. U-Net: Convolutional Networks for Biomedical Image
Segmentation. Lecture Notes in Computer Science. Springer International Publishing.
Sansone, S.-A., Rocca-Serra, P., Field, D., Maguire, E., Taylor, C., Hofmann, O., Fang,
H., Neumann, S., Tong, W., Amaral-Zettler, L., Begley, K., Booth, T., Bougueleret, L.,
Burns, G., Chapman, B., Clark, T., Coleman, L.-A., Copeland, J., Das, S., … Hide, W.
(2012). Toward interoperable bioscience data. In Nature Genetics (Vol. 44, Issue 2, pp.
121–126). Springer Science and Business Media LLC. https://doi.org/10.1038/ng.1054
Shamsolmoali, P. et al. 2021. Image synthesis with adversarial networks: A
comprehensive survey and case studies. Information Fusion. Elsevier BV.
Silva, R. et al. 2019. Playing Games in the Dark: An approach for cross-modality transfer
in reinforcement learning. arXiv.
Sushko, V. et al. 2020. You Only Need Adversarial Supervision for Semantic Image
Synthesis. arXiv.
Tang, X. et al. 2022. Multi-task learning for single-cell multi-modality biology. Cold
Spring Harbor Laboratory.
TensorFlow. Retrieved from https://www.tensorflow.org/install/source#gpu
P. Tzirakis, G. Trigeorgis, M. A. Nicolaou, B. W. Schuller and S. Zafeiriou, "End-to-End
Multimodal Emotion Recognition Using Deep Neural Networks," in IEEE Journal of
Selected Topics in Signal Processing, vol. 11, no. 8, pp. 1301-1309, Dec. 2017, doi:
10.1109/JSTSP.2017.2764438.
C. Uhler and G. V. Shivashankar, "Machine Learning Approaches to Single-Cell Data
Integration and Translation," in Proceedings of the IEEE, vol. 110, no. 5, pp. 557-576,
May 2022, doi: 10.1109/JPROC.2022.3166132.
Vasan, R. et al. 2020. Applications and Challenges of Machine Learning to Enable
Realistic Cellular Simulations. Frontiers in Physics. Frontiers Media SA.
VICE. Retrieved from
https://www.vice.com/en/article/xgwqgw/facebooks-powerful-large-language-model-leaks-online-4chan-llama
Wang, Z. et al. 2021. Global voxel transformer networks for augmented microscopy.
Nature Machine Intelligence. Springer Science and Business Media LLC.
Wang, X. et al. 2021. Contrastive Cycle Adversarial Autoencoders for Single-cell
Multi-omics Alignment and Integration. arXiv.
Y. Wang, Z. Sun and W. Zhao, "Encoder- and Decoder-Based Networks Using Multiscale
Feature Fusion and Nonlocal Block for Remote Sensing Image Semantic Segmentation,"
in IEEE Geoscience and Remote Sensing Letters, vol. 18, no. 7, pp. 1159-1163, July
2021, doi: 10.1109/LGRS.2020.2998680.
Wei, X. et al. 2022. scPreGAN, a deep generative model for predicting the response of
single-cell expression to perturbation. Bioinformatics. Oxford University Press (OUP).
White, K. L., Singla, J., Loconte, V., Chen, J.-H., Ekman, A., Sun, L., Zhang, X., Francis,
J. P., Li, A., Lin, W., Tseng, K., McDermott, G., Alber, F., Sali, A., Larabell, C., &
Stevens, R. C. (2020). Visualizing subcellular rearrangements in intact β cells using soft
x-ray tomography. In Science Advances (Vol. 6, Issue 50). American Association for the
Advancement of Science (AAAS). https://doi.org/10.1126/sciadv.abc8262
Wilkinson, M. D., Dumontier, M., Aalbersberg, Ij. J., Appleton, G., Axton, M., Baak, A.,
Blomberg, N., Boiten, J.-W., da Silva Santos, L. B., Bourne, P. E., Bouwman, J., Brookes,
A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R.,
… Mons, B. (2016). The FAIR Guiding Principles for scientific data management and
stewardship. In Scientific Data (Vol. 3, Issue 1). Springer Science and Business Media
LLC. https://doi.org/10.1038/sdata.2016.18
Wilson, G., & Cook, D. J. (2020). A Survey of Unsupervised Deep Domain Adaptation.
In ACM Transactions on Intelligent Systems and Technology (Vol. 11, Issue 5, pp. 1–46).
Association for Computing Machinery (ACM). https://doi.org/10.1145/3400066
Yang, K.D. et al. 2021. Multi-domain translation between single-cell imaging and
sequencing data using autoencoders. Nature Communications. Springer Science and
Business Media LLC.
Yuan, H. et al. 2018. Computational modeling of cellular structures using conditional
deep generative networks. Bioinformatics. Oxford University Press (OUP).
Armand Zampieri, Guillaume Charpiat, Nicolas Girard, Yuliya Tarabalka; Proceedings of
the European Conference on Computer Vision (ECCV), 2018, pp. 657-673
Zhang, Y. et al. 2021. Deep multimodal fusion for semantic image segmentation: A
survey. Image and Vision Computing. Elsevier BV.
ABSTRACT
The structural modeling of cells can be accomplished by integrating images of cellular morphology from multiple scales and modalities using a parts-based approach. In this thesis, we demonstrate a method for combining the statistical distribution of structures from x-ray tomography and fluorescence microscopy using neural networks to predict the localization of high-resolution components in low-resolution modalities, using the single cell as a shared unit of transfer.