Transparent and Lightweight Medical Image Analysis Techniques:
Algorithms and Applications
by
Vasileios Magoulianitis
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL ENGINEERING)
December 2024
Copyright 2024 Vasileios Magoulianitis
Acknowledgments
At this point, I feel glad to express my sincere gratitude to my advisor, Professor C.-C. Jay Kuo, who trusted and supported me throughout this work. His technical background and in-depth knowledge of the subject provided valuable feedback for this thesis. I would also like to thank him for the discussions we had and for his advice, which helped me grow both as a researcher and as a person. I would also like to thank my lab mates for helping me throughout the PhD journey with their ideas and insights. I am also grateful to my closest friends for helping and supporting me all those years, standing by me through the hardships of the PhD. Last but not least, I would like to thank the members of my family for everything they have offered me in all aspects of my life and for their support during my academic experience.
Table of Contents
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
Chapter 1:
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Significance of the Research . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Nuclei Segmentation in Histology Images . . . . . . . . . . . . . . . . 4
1.1.2 Predicting Prostate Cancer From MRI Images . . . . . . . . . . . . . 6
1.2 Prior And Current Status . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.2.1 Nuclei Segmentation in Histology Images . . . . . . . . . . . . . . . . 8
1.2.2 Predicting Prostate Cancer From MRI Images . . . . . . . . . . . . . 10
1.3 Contributions of the Research . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.3.1 Nuclei Segmentation in Histology Images . . . . . . . . . . . . . . . . 13
1.3.2 Predicting Prostate Cancer From MRI Images . . . . . . . . . . . . . 15
1.4 Organization of the Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Chapter 2:
Background Review in Nuclei Segmentation . . . . . . . . . . . . . . . . . . 17
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2 Staining in Digital Pathology . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.2.1 Staining Procedures And Color Variations . . . . . . . . . . . . . . . 21
2.2.2 WSI Scanners & Image Digitization . . . . . . . . . . . . . . . . . . 22
2.2.3 WSI Variability Implications And Challenges . . . . . . . . . . . . . . 23
2.3 Methods for Nuclei Segmentation . . . . . . . . . . . . . . . . . . . . . . . . 24
2.3.1 Unsupervised . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.3.2 Supervised . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
2.4 Evaluation and Performance Benchmarking . . . . . . . . . . . . . . . . . . . 82
2.4.1 Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
2.4.2 Evaluation Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
2.4.3 Performance Comparison . . . . . . . . . . . . . . . . . . . . . . . . . 90
2.4.4 Discussion And Conclusions . . . . . . . . . . . . . . . . . . . . . . . 102
2.5 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
Chapter 3:
Unsupervised Nuclei Segmentation using Thresholding Operations . . . . 109
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
3.2 Proposed CBM Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
3.2.1 Data-Driven Color Transform . . . . . . . . . . . . . . . . . . . . . . 112
3.2.2 Data-Driven Binarization . . . . . . . . . . . . . . . . . . . . . . . . . 114
3.2.3 Morphological Processing . . . . . . . . . . . . . . . . . . . . . . . . 115
3.3 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
3.4 Conclusion and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
Chapter 4:
Unsupervised Nuclei Segmentation using Adaptive Thresholding and Self-Supervision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
4.2 Review of Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
4.3 Proposed HUNIS Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
4.3.1 First-Stage Processing . . . . . . . . . . . . . . . . . . . . . . . . . . 123
4.3.2 Second-Stage Processing . . . . . . . . . . . . . . . . . . . . . . . . . 127
4.4 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
4.5 Conclusion and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
Chapter 5:
LG-NuSegHop: A Local-to-Global Self-Supervised Pipeline For Nuclei
Instance Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
5.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
5.2.1 Traditional Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
5.2.2 Learning-based Methods . . . . . . . . . . . . . . . . . . . . . . . . . 136
5.2.3 Green Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
5.3 Materials And Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
5.3.1 Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
5.3.2 Local Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
5.3.3 NuSegHop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
5.3.4 Global Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
5.4 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
5.4.1 Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
5.4.2 Evaluation Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
5.4.3 Results & Comparisons . . . . . . . . . . . . . . . . . . . . . . . . . . 155
5.4.4 Ablation Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
5.4.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
5.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
Chapter 6:
Clinically Significant Prostate Cancer Detection . . . . . . . . . . . . . . . . 165
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
6.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
6.2.1 PCa Detection & Lesion Segmentation . . . . . . . . . . . . . . . . . 170
6.2.2 Lesion Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
6.2.3 Successive subspace learning methodology . . . . . . . . . . . . . . . 173
6.3 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
6.3.1 Anomaly Detections - Stage-1 . . . . . . . . . . . . . . . . . . . . . . 175
6.3.2 Subspace Approximation in Green Learning . . . . . . . . . . . . . . 176
6.3.3 Anomaly Map Calculation . . . . . . . . . . . . . . . . . . . . . . . . 183
6.3.4 Anomaly Candidates Classification - Stage-2 . . . . . . . . . . . . . . 184
6.3.5 Interface Application . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
6.4 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
6.4.1 Database and pre-processing . . . . . . . . . . . . . . . . . . . . . . . 187
6.4.2 Evaluation Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
6.5 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
6.5.1 Results on PI-CAI Dataset . . . . . . . . . . . . . . . . . . . . . . . . 189
6.5.2 Benchmarking & Ablation Study . . . . . . . . . . . . . . . . . . . . 191
6.5.3 Model Size and Complexity Comparison . . . . . . . . . . . . . . . . 192
6.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
Chapter 7:
Conclusion and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
7.1 Summary Of The Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
7.2 Future Research Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
7.2.1 Nuclei Segmentation in Histological Images . . . . . . . . . . . . . . . 198
7.2.2 Prostate Cancer Detection from MRI . . . . . . . . . . . . . . . . . . 200
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
List of Tables
2.1 A summary of traditional methods for nuclei segmentation. . . . . . . . . . . 29
2.2 A summary of Self-Supervised nuclei segmentation methods. . . . . . . . . . 46
2.3 A summary of fully supervised nuclei segmentation methods. . . . . . . . . . 59
2.4 A summary of weakly supervised nuclei segmentation methods. . . . . . . . . 76
2.5 A summary of publicly available nuclei segmentation datasets. . . . . . . . . 85
2.6 A summary of the current evaluation metrics. . . . . . . . . . . . . . . . . . 87
2.7 Performance Comparison on the MoNuSeg dataset with the AJI metric. . . . 92
2.8 Weak supervision comparison between full supervision and point-wise at different training point ratios (results from [24]). . . . . . . . . . . . . . . . . . 97
2.9 Weak supervision comparison between full supervision and point-wise at different training point ratios (results from [24]). . . . . . . . . . . . . . . . . . 97
3.1 Energy distribution of three channels of three color spaces in a representative
histology image. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
3.2 Comparative results among different unsupervised and supervised methods
using the AJI metric on the MoNuSeg [101] testing data, where the best performance is shown in boldface. CBM outperforms all the other unsupervised
approaches. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
3.3 Comparative results of different methods using the AJI metric on the Test-2
set (unseen organ). CBM shows a competitive standing among other DL-based supervised approaches. . . . . . . . . . . . . . . . . . . . . . . . . . . 119
3.4 AJI performance comparison between P and L channels. . . . . . . . . . . . 119
4.1 Quantitative results and performance comparison on Test-1 using AJI metric. 130
4.2 Quantitative results and performance comparison on Test-2 using the AJI
metric. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
4.3 Segmentation improvement over different stages of our pipeline in Test-2 set 131
5.1 Architecture of the proposed NuSegHop. . . . . . . . . . . . . . . . . . . . . 148
5.2 Summary of hyperparameters configuration in LG-NuSegHop, finetuned on a
small subset of training images from MoNuSeg. . . . . . . . . . . . . . . . . 153
5.3 Performance benchmarking with self, weakly and fully supervised methods in
the MoNuSeg dataset. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
5.4 Performance benchmarking with weakly and fully supervised methods in the
CryoNuSeg dataset. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
5.5 Performance benchmarking with weakly and fully supervised methods in the
CoNSeP dataset. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
5.6 Ablation study on the MoNuSeg dataset with combinations of preprocessing
and local processing modules. . . . . . . . . . . . . . . . . . . . . . . . . . . 159
5.7 Ablation study on the MoNuSeg dataset with different global processing modules. NuSegHop data-driven features are also compared with a set of handcrafted features for nuclei segmentation. All the pre-processing and local
processing operations are kept to their best configuration. . . . . . . . . . . . 160
6.1 Architecture of the Proposed RADHop Unit . . . . . . . . . . . . . . . . . . 183
6.2 Radiomics features categories . . . . . . . . . . . . . . . . . . . . . . . . . . 183
6.3 Summary of hyperparameters configuration in PCa-RadHop . . . . . . . . . 190
6.4 Performance benchmarking with selected DL-based models based on 1,000
testing patients from PICAI challenge. . . . . . . . . . . . . . . . . . . . . . 191
6.5 Results from PCa-RadHop on a local testing set (300 patients) from PICAI
and performance benchmarking with two traditional pipelines that use handcrafted radiomics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
6.6 Model size and complexity comparison. . . . . . . . . . . . . . . . . . . . . . 195
List of Figures
1.1 Visual cues, such as nuclei topology arrangement, their size and count play
an important role in digital pathology for measuring cancer cellularity which
in turn leads to cancer detection, staging and assessment (figure from [9]). . 5
1.2 Same tissue digitized by the Aperio XT (a) and Hamamatsu (b) scanners.
One can realize the color variations because of the different devices (figure
from [11]). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Typical pipeline for analyzing prostate MRI images from the human lower abdomen. Prostate segmentation is performed initially to separate the prostate from other organs. Then the lesion segmentation module tries to detect areas suspicious of harboring cancer and assigns a probability of being clinically significant to each of them. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1 Nuclei topology in forming glands is important for tumor grading. Nuclei
segmentation task aims at detecting and highlighting nuclei over other areas,
thus giving rise to visual patterns that are important for pathologists, such as
assessing the cancer gleason grade in prostate tissue (GG) (a) or the cellularity
percent (b) (figures from [44, 45]). . . . . . . . . . . . . . . . . . . . . . . . . 20
2.2 Top row: different slice thickness results in very different nuclei color and
appearance. Bottom row: Under-staining (left) and over staining (right) can
change nuclei and background color [10, 49] . . . . . . . . . . . . . . . . . . 22
2.3 Same tissue sample digitized under two different scanners. On the left, the whole-slide image is acquired using the Aperio XT scanner, while the right one using the Hamamatsu scanner. The large color and texture variation across different devices is a
challenge for nuclei segmentation [11]. . . . . . . . . . . . . . . . . . . . . . . 23
2.4 Outline of the existing nuclei segmentation methods. . . . . . . . . . . . . . 24
2.5 An example thresholding pipeline from [60]. . . . . . . . . . . . . . . . . . . 30
2.6 High Performance Unsupervised Nuclei Instance Segmentation (HUNIS) pipeline
from [61]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.7 An example marker controlled watershed method from [63] in conjunction
with thresholding. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.8 An example active contour based pipeline from [69]. . . . . . . . . . . . . . . 38
2.9 An example graph cuts based pipeline from [73]. . . . . . . . . . . . . . . . . 41
2.10 An example K-Means Clustering based pipeline from [77]. . . . . . . . . . . 43
2.11 An example image from [85] illustrating the idea of domain adaptation. The
source data can vary from a common image set to labelled biomedical images. 47
2.12 An example image from [86] showing the use of pseudolabels generated from
the first stage to train a classifier in the second stage. . . . . . . . . . . . . 50
2.13 A contrastive learning example from [91] using three different patches generated from an image. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
2.14 An example Mask RCNN based nuclei segmentation framework from [94]
called NucleiNet. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
2.15 An example U-Net architecture from [96] with an atrous spatial pyramid pooling bottleneck block. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
2.16 An example of the contour aware CIA-Net from [21] that uses a U-Net architecture with two decoding paths dedicated to nuclei and contour decoding
respectively. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
2.17 An example of architecture using gated attention for U-Net from [104] . . . 71
2.18 An example of a two stage point wise label propagation from [113]. . . . . . 77
2.19 An example framework using GAN to generate a nuclei centroid likelihood
map from [114]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
2.20 Cumulative distribution of the loss value over the ratio of foreground examples
seen by the model. Given the very skewed distribution, a small percentage of examples actually contributes to the optimization of the model (results from
[21]). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
2.21 (a) Perturbations in point-wise annotations. Yellow points represent nuclei
center, while red and blue are points offset by four and eight pixels, respectively. (b) Object-wise metrics for nuclei segmentation over different amounts
of perturbations measured in pixel distance from nuclei center. (results from
[111]). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
2.22 In the middle and bottom rows, ground truth (annotated) boundaries are red,
detected are blue, and the overlap between the two is yellow. Segmentation
comparisons are shown with the CNN2 baseline model. In bottom row yellow
is more prevalent, thus indicating more precise boundaries detection with
respect to ground truth. (figure from [101]). . . . . . . . . . . . . . . . . . . 99
2.23 Comparison of CIA-Net without the Information Aggregation Module (IAM)
and BES-Net. CIA-Net can more accurately identify connected nuclei that should be split, even when the ground truth is noisy or mislabelled (figure from [21]). 100
2.24 Horizontal and vertical prediction maps are shown on areas prone to overlapping nuclei, along with their corresponding ground truth. Distance information alleviates over-segmentation and nuclei splitting phenomena (adjusted
figure from [95]). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
2.25 A challenging nuclei segmentation comparison among several models, demonstrating the gated attention mechanism. (a) GT, (b) Baseline U-Net [18],
(c) CNN2 [107], (d) CNN3 [101], (e) Hover-Net [95], (f) NucleiSegNet [105]
(adjusted figure from [105]). . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
2.26 Demonstration of weak supervision from point labels, using different ratios of points for training. Starting from the top row, odd rows display results from the LC dataset, while even rows from the MO dataset. Upper examples show typical segmentation cases, while bottom examples show more challenging ones. GT points refer to the full set of available annotations (combined figures from [24]). . . 103
2.27 The envisioned workflow for CAD-assisted annotations in nuclei segmentation
from an unsupervised model to identify areas that need annotation. Two modes of annotations may be required from future models to achieve a high performance: (a) point-wise and (b) mask annotations. A hybrid model that can
best leverage this information will benefit from a minimum set of annotations
to achieve high performance. Certainly, the quality of training depends on
the unsupervised model that guides the annotation process. . . . . . . . . . . 107
3.1 Illustration of nuclei cell appearance in histology images. . . . . . . . . . . . 111
3.2 An overview of the proposed CBM method. . . . . . . . . . . . . . . . . . . 111
3.3 Comparison of two representations in a block image: (a) R/G/B color and
(b) P value in gray and (c) its corresponding bi-modal histogram. . . . . . . 114
3.4 Visualization of the morphological processing effect: input block (upper left),
noisy binarized output (upper right), an improved result by splitting two
distinct nuclei (bottom left), and segmentation ground-truth (bottom right). 116
4.1 An overview of the proposed HUNIS method, where the first stage provides
an initial nuclei segmentation result and yields pseudo-labels to guide the
segmentation in the second stage. . . . . . . . . . . . . . . . . . . . . . . . . 122
4.2 Two threshold adjustment scenarios: (top) a histogram of two imbalanced
modalities and its corresponding block image and (bottom) a histogram of
three modalities and its corresponding block image. . . . . . . . . . . . . . . 124
4.3 Illustration of the effect of the false positive removal module, where some
small-sized instances (in red) are compared with larger instances (in green)
that are more likely to be actual nuclei. The marked instances in grey (right
sub-figure) have a similarity score below Ts and are eliminated. . . . . . . . . 128
5.1 Nuclei segmentation to provide input and assist pathologists or AI tools to
diagnose and grade cancer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
5.2 An overview of LG-NuSegHop pipeline. Pre-processing applies image enhancement to prepare the image for the local processing modules. NuSegHop receives as input the pseudolabel and predicts a heatmap. In the last step, global
processing modules increase the nuclei detection rate using information across
the entire input image. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
5.3 An illustration of the main preprocessing steps, involving stain separation and
the PQR method to convert color into grayscale. . . . . . . . . . . . . . . . . 142
5.4 Demonstration of the P-value distribution in a local patch where the bimodal
assumption holds. The auxiliary lines to calculate the adapted threshold Tˆ
are also depicted. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
5.5 Graphical overview of the proposed NuSegHop for feature extraction. It consists of two layers that operate in two different scales. From both layers two
types of features are extracted: (1) spatial and (2) spectral. With red we
depict the extracted feature maps that have low energy and will be discarded.
All the spatial feature maps (green color) are concatenated to extract the
spectral ones (gray color). . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
5.6 An overview of the global processing pipeline. The Laplacian of Gaussian
filter detects local minima from the NuSegHop heatmap to decrease the false
negative rate. Watershed and probability thresholding binarize the image
and delineate nuclei boundary from the heatmap. Candidate instances are
classified in a self-supervised manner to detect any false positives. . . . . . . 150
5.7 Illustrative examples of the adaptive filtering from the local processing module. To is the intermediate point between the two peaks of the bi-modal
distribution and the Tˆ the adapted threshold about To. The input patch is
shown after the staining normalization. . . . . . . . . . . . . . . . . . . . . . 161
5.8 Visualization examples of the nuclei segmentation performance in three datasets.
The performance of the instance ROI classification within the global processing module is also compared. The true positive areas are marked in white,
the false positives with yellow and the false negatives with blue. Red boxes
highlight areas where the false positive removal is successful. . . . . . . . . . 164
6.1 A typical pipeline for cancer detection and classification is shown. Zonal
segmentation identifies the prostate gland and further divides it into the Peripheral and Transitional Zones. The lesion segmentation algorithm identifies and segments ROIs which may harbor PCa with some probability. The lesion classification module operates on a per-ROI level and aims at classifying each ROI
with respect to pathology (i.e. clinically significant or Gleason score). . . . . 166
6.2 The overall PCa-RadHop pipeline is illustrated. In stage-1 the per-voxel feature extraction and selection process are independent for each of the three
sequences. Then, selected features are concatenated before the classifier’s input. In doing so, a probability heatmap is obtained and further ROIs can be
identified based on their anomaly score. In stage-2, two modes are possible;
either extracting anomaly features (i.e. probability based) by expanding the
local neighborhood of an ROI, or visually based by extracting radiomics features for each ROI and further combine them with features from RadHop. In
stage-2 it is possible to reduce the probability of some false positive ROIs. . 170
6.3 Unsupervised feature extraction with RadHop pipeline. A patch centered at
certain location is fed in RadHop. Two concatenated Hop units bring multiscale properties in the final feature representation. Local features correspond
to certain locations, while global ones refer to the overall feature map for each
spectral dimension. The output feature is the concatenation from the local
and global features of the two layers. . . . . . . . . . . . . . . . . . . . . . . 174
6.4 Illustration of the feature decomposition into spectral components using the
Saab transform in SSL. PCA is used to identify the orthogonal subspaces.
Different subspaces have different energy (shown with different color size).
This is the core module for feature extraction in Hop unit employed from
RadHop. Malignant areas may reflect differently on certain subspaces and
hence RadHop feature representation can help the classifier to detect them. . 178
6.5 Distribution of dense predictions of stage-1 on prostate gland after one training
of XGB. It is highly skewed towards easy negative samples with “soft” label
close to zero. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
6.6 Anomaly map (Λ) calculation from stage-1 predictions. Nine neighboring
voxel predictions are averaged using 24 × 24 patches with stride 8 on stage-1
output. Then, in stage-2 the anomaly map is used to find statistical features
from the neighboring anomaly score distribution. . . . . . . . . . . . . . . . 184
6.7 Demonstration of the interface tool for visualizing the csPCa predictions for
a given bpMRI input. The user can scroll through the slices using the scroll
bar on the right hand side of the window. . . . . . . . . . . . . . . . . . . . . 188
6.8 Visualizations from csPCa-RadHop intermediate predictions in stage-1 and
the false positive reduction in stage-2. In the three first rows (a)-(c) three
negative cases are shown with some false positive areas and how their probability is decreased after stage-2. In the last row (d) a positive case is displayed.
The true positive ROI (green) is retained after stage-1, while the probability
of the other false positive ROIs is reduced. . . . . . . . . . . . . . . . . . . . 192
6.9 Demonstration of the hard sample mining technique effectiveness on four negative examples from different patients with areas prone to false positives. The
predicted heatmaps from two differently trained classifiers are shown for each case. "Without hard sample mining" means that the classifier is trained on
randomly chosen patches across patients. The green arrows point to specific
FP regions where hard sample mining increases the robustness of the classifier
and reduces the probability of certain FPs. . . . . . . . . . . . . . . . . . . . 193
6.10 ROC curves and AUC comparison of three different pipelines, showing the
effectiveness of the hard sample mining technique and stage-2 in reducing the
false positive rate. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
7.1 Our envisioned future pipeline for annotation tools guided by an unsupervised model that can extract point-wise and full nuclei annotations. Then, a hybrid model should leverage this hybrid annotation form in the best way to efficiently learn the nuclei appearance distribution. . . . . . . . . . . . . . . 199
7.2 Hybrid CNN Reg-RadHop-NN for false positives correction. The first layers
that learn low level features close to voxel space are replaced with radiomics
features from RadHop. The later layers are used to expand the context about
the ROI. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
Abstract
Artificial Intelligence (AI) for healthcare systems is a rapidly growing field that has received
an unprecedented number of contributions during the last decade. Modern AI and Deep
Learning (DL) have enabled the automation of many tasks within the clinical diagnosis
pipeline, paving the way for AI-powered computer-aided diagnosis (CAD) tools that can be
used as physicians’ assistants. This is expected to reduce the required time for diagnosis,
which in turn will decrease clinical costs on the patients' end. Moreover, AI CAD tools provide a more objective decision-making process, thus reducing the inter-reader disagreement rate. Cancer detection is of paramount importance within healthcare and clinical diagnosis.
AI is anticipated to play an instrumental role in the future of cancer diagnosis. It will assist
physicians in accurately diagnosing cancer in its early stages, a pivotal step for saving lives.
In medical image analysis, there are different types of images that need to be analyzed.
Magnetic Resonance Imaging (MRI) depicts the clinical picture of several organs in a non-invasive way. Thereby, modern AI and computer vision can automate some of the tasks in the radiologists' pipeline and expedite their everyday workflow. After the radiology report,
the final step that follows for cancer detection and staging is the histopathological image
analysis. Digital pathology analyzes images from biopsies to reveal nuclei patterns indicative of cancer. Towards this end, nuclei cell segmentation is an important, yet laborious and time-consuming, task for pathologists. All in all, the thesis's proposed solutions can be divided into two parts, histological and MRI image analysis, which together aim to provide automation for different types of images across the different stages of the cancer diagnosis pipeline.
For the histological image analysis part, this thesis includes three self-supervised pipelines
proposed for nuclei segmentation, which is a key task in cancer grading. At first, a novel
preprocessing technique is proposed to enhance the appearance of nuclei cells over background. A set of local processing techniques is also proposed to predict a pseudolabel in
an unsupervised way (named CBM and HUNIS methods), based on a new adaptive thresholding technique and a novel anomalous instance removal module. In turn, a novel feature
extraction module is proposed, named NuSegHop, to learn the local texture of nuclei, based
on the generated pseudolabel. Furthermore, a set of post-processing techniques are applied
globally on the predicted heatmap of NuSegHop, to improve the detection rate of nuclei.
Extensive experiments on three publicly available datasets demonstrate the effectiveness of
the proposed methods, where they have a competitive standing among other self-supervised
methods, as well as a high generalization ability to unseen domains. On the other hand, the
MRI part of this work is on prostate cancer, which is the second most frequently occurring cancer in men. The PCa-RadHop model is proposed to automate prostate cancer detection and includes two stages. Stage-1 predicts a heatmap using a novel feature extraction method, named RadHop, and generates candidate Regions of Interest (ROIs) for stage-2. RadHop
is a linear model meant to learn data-driven radiomics that help to classify cancerous regions. Stage-2 has been devised to re-classify the candidate ROIs from stage-1, by including
more context information surrounding them. The goal of stage-2 is to reduce the assigned
probability of the false positives, thus increasing the detection performance. PCa-RadHop
has achieved a competitive performance in the PI-CAI challenge, with a model size orders of magnitude smaller than other state-of-the-art DL-based works, while also maintaining a more transparent pipeline than other DL-based solutions.
Chapter 1
Introduction
In this chapter, we provide the main points that motivate our research and basic background
information. This thesis touches upon two main areas within the field of artificial intelligence
(AI) and computer-aided diagnosis (CAD) in healthcare. Therefore, each section is divided
into subsections for the two different problems discussed.
In Section 1.1, the importance of our work for healthcare applications is discussed. Section 1.2 provides a brief overview of the field, reviewing how the research has evolved over the years and what the current challenges are. In Section 1.3, we outline the
contributions of this work as a teaser to the next chapters. The organization of the thesis is
provided in Section 1.4.
1.1 Significance of the Research
Cancer diagnosis is one of the cornerstone processes within medical diagnosis. Cancer is the second leading cause of mortality in the U.S. according to the Centers for Disease Control and Prevention (CDC) [1]. Oncologists use different approaches to determine the likelihood that a patient suffers from cancer. On the one hand, a non-invasive way is via a radiology examination, where radiologists look for specific visual cues indicative of cancer. Magnetic Resonance Imaging (MRI) and computerized tomography (CT scan) produce images of certain parts of the human body using scanner devices. On the other
hand, an invasive way for cancer diagnosis is through the pathology report. This is used
toward the final diagnosis steps, to confirm and stage cancer using histopathological images. This process relies on detecting molecular patterns of cancer under the biological prism. For the pathology report, a biopsy is the main procedure that starts the examination. This is a small-scale surgery to extract tissue specimens from the targeted organ. Therefore, the
pathology report requires invasive techniques and it is the final stage for cancer diagnosis.
In clinical practice, the radiology report usually precedes that of histology and in certain
cases determines whether a biopsy is needed.
In both types of clinical diagnosis –radiology and pathology– the detection accuracy is
critical. On the one hand, physicians need to achieve a high true positive rate (TPR) (i.e.
recall or sensitivity). The lower the TPR, the more likely a patient’s cancer will be undetected
and consequently untreated, possibly resulting in detrimental health consequences. On the
other hand, the true negative rate (TNR) (i.e., specificity) is also quite important for
patients. The lower the TNR, the more likely a healthy patient will be mistakenly diagnosed
with cancer. As a consequence, unnecessary tests and procedures may be ordered, increasing
patient discomfort and incurring higher costs for the healthcare system. Therefore, the higher
the TNR and TPR are, the more reliable and cost-effective the patient care is.
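To make the two rates concrete, the following minimal Python sketch (our own illustration with made-up labels, not data from this thesis) computes the sensitivity (TPR) and specificity (TNR) from binary ground-truth and predicted labels:

    import numpy as np

    def sensitivity_specificity(y_true, y_pred):
        """Compute TPR (sensitivity/recall) and TNR (specificity) for binary labels."""
        y_true = np.asarray(y_true).astype(bool)
        y_pred = np.asarray(y_pred).astype(bool)
        tp = np.sum(y_true & y_pred)    # cancer correctly detected
        fn = np.sum(y_true & ~y_pred)   # cancer missed
        tn = np.sum(~y_true & ~y_pred)  # healthy correctly cleared
        fp = np.sum(~y_true & y_pred)   # healthy flagged as cancer
        tpr = tp / (tp + fn)            # sensitivity: fraction of cancers found
        tnr = tn / (tn + fp)            # specificity: fraction of healthy cleared
        return tpr, tnr

    # Hypothetical screening outcome for ten patients (1 = cancer present).
    y_true = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 0, 0, 1, 0, 0, 1, 0]
    print(sensitivity_specificity(y_true, y_pred))  # (0.75, 0.833...)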
Currently, cancer diagnosis is carried out by expert physicians, trained to read images either from the radiology or pathology report, looking for certain visual characteristics according to the acquired image. In general, medical image reading is a laborious and time-consuming task for physicians, and it is the main reason behind the high cost of diagnosis, since it takes up several hours of expert time. Reading medical images also involves a high degree of subjectivity, because of the visual nuances in the cancer reading process. Reportedly, the inter-observer discordance rate is high, as is the rate of intra-observer discrepancies [2] (the same physician may read the same images differently after some time). As such, there exists a large variability within the clinical diagnosis pipeline, which is definitely not desired. Moreover, there is always a possibility of human error in diagnosis, stemming either from lack of
experience or fatigue [3] due to physicians’ burnout effects [4].
Computer-aided diagnosis (CAD) tools have attracted a lot of interest in the past decade.
They aim to increase objectivity in clinical decision making and also to expedite the diagnosis process, since a model needs only a few seconds for image readout, while the same task usually takes physicians hours. Early research in CAD tools had relied on traditional
–non data-driven– image processing techniques, where hand-crafted features were used for
pattern recognition. With the advent of Deep Learning (DL), AI has been reshaped and
enables the automation of several tasks within the clinical workflow.
Data-driven models, such as Deep Neural Networks (DNNs), have dominated the research
field, since they can achieve performance close to that of humans [5]. Therefore, AI-powered CAD tools can work as physicians' assistants, helping to speed up their routine tasks. The biggest benefit is that doctors can focus more on the ambiguous cases that require human intervention.
Another benefit is that CAD tools can also improve the performance of physicians with
less expertise, using them also for training purposes [6]. Hence, AI CAD systems in the
foreseeable future could be used to complement physicians' reading, helping them to minimize reading errors and reduce results turnaround time.
Although modern AI and DL-based models achieve a high performance and enable the
automation of many tasks in healthcare, they are criticized by the research community and physicians as "black-box" solutions [7]. Deep features are hard to interpret, having
no physical meaning as their derivation process is unfathomable, stemming from a nonlinear optimization process. Transparent and interpretable features are of high importance
in computer vision tasks, especially those involving high stakes decisions, such as medical
imaging. Furthermore, DNN models require large datasets for stable training, which is usually not the case in medical problems. Models that are apt to derive robust
features with high generalization ability, even if the training data are scarce, are preferred in
medical applications. Our research is particularly motivated by this, aiming to offer more transparent CAD solutions and an explainable feature extraction process that can be robust
with less or no training data and still maintain good generalization performance.
Beyond transparency, it is our firm belief that future CAD tools should also emphasize energy efficiency and ease of deployment on any platform. DNN models bear a huge number of parameters that require large memory and special equipment (i.e., graphics processing units). Training and validating those models also takes unreasonable amounts of energy, which constrains sustainable and green-aware research [8].
Having provided a high-level motivation behind our work with regard to the cancer detection pipeline using AI, in the next subsections we discuss more specifically the research importance and motivating facts for the problems within healthcare that this
thesis addresses. That is, nuclei segmentation in histology images (in 1.1.1) and prostate
cancer detection from MRI images (in 1.1.2).
1.1.1 Nuclei Segmentation in Histology Images
Pathological images are acquired under the microscope at a certain magnification level and further digitized by scanner devices. Those images usually come at a very high resolution to depict a tiny tissue specimen at the microscopy level. Thus, it takes a lot of time for pathologists to process and read them. In pathology image reading, nuclei segmentation is an indispensable task, meant to detect and segment the nuclei cells, thus revealing the underlying molecular pattern. This is viewed as an instance-level segmentation task, meaning that one first needs to identify the nuclei cells and then accurately segment each of them. The output of this process is a binary image, with pixels belonging to the foreground marked in white. Nuclei segmentation gives rise to the topology of nuclei and how glands are formed,
as well as the nuclei size and count. These are factors that pathologists look into for cancer
detection and staging.
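As a minimal illustration of this output format (a toy example assuming scikit-image is available; the hand-made mask stands in for a real segmentation result), connected-component labelling turns the binary foreground map into nuclei instances, from which the count and per-nucleus size follow directly:

    import numpy as np
    from skimage import measure

    # Toy binary mask (1 = nucleus pixel); real masks come from H&E image analysis.
    mask = np.zeros((12, 12), dtype=np.uint8)
    mask[2:5, 2:5] = 1      # first nucleus
    mask[7:11, 6:10] = 1    # second nucleus

    # Connected-component labelling converts the binary map into instances.
    labels = measure.label(mask, connectivity=1)

    # Per-instance measurements that pathologists care about: count, size, position.
    print("nuclei count:", labels.max())
    for region in measure.regionprops(labels):
        print(f"nucleus {region.label}: area={region.area} px, centroid={region.centroid}")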
Besides the large size of the images, this task is even more difficult and time-consuming because of the large variations in image appearance due to a number of reasons. The Hematoxylin & Eosin (H&E) staining process has been used for many years to stain the tissue and
Figure 1.1: Visual cues, such as nuclei topology arrangement, their size and count play an
important role in digital pathology for measuring cancer cellularity which in turn leads to
cancer detection, staging and assessment (figure from [9]).
accentuate the nuclei over the background. This process involves several steps, each of which is prone to inducing artifacts and noise [10]. Therefore, nuclei appearance may vary
across images because of the many factors that cannot be controlled. Other than the staining
process, the acquisition scanner and its setup can largely affect the output and the visual
appearance as we can see in Figure 2.3.
Considering all the previous factors, one can realize that the examination of each patient
may take several hours –several cores are sampled for examination from each patient–, with
a very high subjectivity in the segmentation results. CAD tools are expected to shorten
the turnaround time of the results from the laboratories and also increase objectivity when
defining nuclei.
It should be established at this point that nuclei segmentation is one of the most time-consuming parts of the pathological reading pipeline. As such, the annotation process is fairly
expensive, as it takes a lot of time and expertise. This is the main reason that publicly
available datasets consist of a relatively small number of samples, which challenges DNN training and good generalization performance. Taking into account the lack of large annotated datasets, unsupervised or weakly supervised methods are much favored for nuclei
segmentation. On top of that, training with fewer data poses challenges to the generalization
performance of DNNs in unseen domains.
Figure 1.2: Same tissue digitized by the Aperio XT (a) and Hamamatsu (b) scanners. One can notice the color variations caused by the different devices (figure from [11]).
1.1.2 Predicting Prostate Cancer From MRI Images
Figure 1.3: Typical pipeline for analyzing prostate MRI images from the human lower abdomen. Prostate segmentation is performed initially to separate the prostate from other organs. Then the lesion segmentation module tries to detect areas suspicious of harboring cancer and assigns a probability of being clinically significant to each of them.
Prostate cancer (PCa) is widely known as one of the most frequently diagnosed cancers
in men, accounting for more than 1.4 million diagnosed cases in a single year. Among cancers, it was the fifth leading cause of death in that same year [12]. One key fact about prostate cancer is that, if diagnosed early, the 5-year survival rate is close to 100%. Yet, in the case of metastasis, the number drops dramatically to almost one third [13].
In the diagnostic process, a patient with elevated Prostate-Specific Antigen (PSA) or a positive digital rectal examination (DRE) usually undergoes an MRI to shed light on more aspects of the prostate gland. Radiologists read the MRI, trying to identify suspicious-looking Regions of Interest (ROIs) and assigning a PIRADS score to each of them [14]. The relevant output from MRI scanners for PCa detection consists of three modalities, namely T2-weighted (T2w), Apparent Diffusion Coefficient (ADC) and Diffusion-Weighted Imaging (DWI), where radiologists look for certain visual marks on each of them. The PIRADS score denotes the probability that an identified ROI is clinically significant and is based on certain visual cues that need to be detected by radiologists. This implies that not all suspicious ROIs harbor PCa. Moreover, some ROIs can be lesion areas but harbor indolent cancer, which has a different clinical treatment from clinically significant cancer. If an ROI is assigned a PIRADS score of 3 or more, then it is considered suspicious for clinically significant PCa (csPCa) and the patient undergoes a biopsy to confirm or rule out the presence of cancerous PCa cells and to report their extent and staging.
In this workflow, there is a reportedly high number of false positives [15]. That is, many patients who are advised to undergo a biopsy either do not have PCa or have it at a clinically insignificant stage. Biopsy largely increases the overall diagnostic costs, coming also at the expense of patient discomfort from the procedure itself and any repercussions [16]. Therefore, CAD tools aim at reducing the false positive rate and identifying more accurately the patients with no clinically significant cancer. A complete pipeline of a fully automated CAD tool for
PCa detection is shown in Fig. 1.3.
The motivation and goal of our work is to automate the MRI reading process for prostate cancer detection and to develop a radiologist's assistant tool powered by AI. Expediting this process
saves radiologists time and lets them focus on the areas with a higher probability of PCa. In turn, that can cut down the diagnostic costs for PCa, since the time for generating a radiology report can be significantly reduced. Finally, detecting and further segmenting an ROI with a high probability of being csPCa can also help in focal therapy [17] when treating csPCa. This is a very promising direction, with the benefit that patients can overcome cancer without prostatectomy, which is not a desirable solution from the patients' end.
1.2 Prior And Current Status
1.2.1 Nuclei Segmentation in Histology Images
Early approaches to nuclei segmentation used traditional, non-learning-based methods. They relied more on clinical prior information about nuclei appearance and shape. Proposed techniques included active contours, level sets, graph cuts, thresholding, the watershed algorithm and k-means clustering. Each of these methods has its own advantages and shortcomings. Thresholding performs pixel-level classification and hence is challenged when staining variations are large and nuclei boundaries are
blurry. Watershed algorithms can perform well in separating connected nuclei, yet they are
prone to over- or under-segmenting regions that are weakly stained. Active contours are more
robust to noise and stain variations. However, they rely on an efficient initialization for
the contours and do not perform well on overlapping nuclei. In a similar manner, level sets
can capture topological changes and fit nuclei boundaries, but they come at a much higher
complexity than other techniques. Graph cuts can efficiently find the global optimum point
for binary pixel classification; nevertheless, they are prone to over-segmenting nuclei. Among the early proposed traditional methods, the watershed algorithm is still relevant, being used as a
post-processing step in some methods.
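For concreteness, the sketch below outlines a generic member of this traditional family, combining Otsu thresholding with a marker-controlled watershed via scikit-image and SciPy; it is an illustration written for this overview, not the exact pipeline of any specific cited method:

    import numpy as np
    from scipy import ndimage as ndi
    from skimage import feature, filters, segmentation

    def threshold_watershed(gray):
        """Toy traditional pipeline: global Otsu thresholding followed by a
        marker-controlled watershed. `gray` is a 2-D array where nuclei are
        darker than the background, as in a typical H&E grayscale projection."""
        # 1. Otsu threshold; dark nuclei become the foreground.
        binary = gray < filters.threshold_otsu(gray)
        # 2. Distance-transform peaks act as one marker per nucleus.
        distance = ndi.distance_transform_edt(binary)
        peaks = feature.peak_local_max(distance, labels=binary, min_distance=5)
        markers = np.zeros(gray.shape, dtype=int)
        markers[tuple(peaks.T)] = np.arange(1, len(peaks) + 1)
        # 3. Watershed splits touching nuclei along distance-transform ridges.
        return segmentation.watershed(-distance, markers, mask=binary)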
The research in image segmentation problems has been drastically reshaped during the
past decade. Contemporary research in nuclei segmentation has adopted the DL paradigm
and DNN-based architectures for segmentation. The most popular and effective choice is
U-Net [18]. It is a Fully Convolutional Network (FCN) with two symmetrical branches. The down-streaming branch (encoder) decreases the feature map resolution using convolutional operations, whereas the up-streaming branch (decoder) increases the feature scale at each layer using de-convolution operations. In general, the U-Net architecture is prominent in different medical image segmentation problems. To further increase the segmentation accuracy of U-Net, different methods have been proposed to address common problems such as overlapping nuclei and noisy labels. Some methods try to incorporate gated attention on nuclei
boundaries to suppress noisy gradient propagation from areas that can mislead the learning
process [19, 20]. Other methods try to emphasize learning on the nuclei boundary via loss
functions, combined with post-processing techniques tailored to nuclei separation [21].
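The following minimal PyTorch sketch illustrates the encoder-decoder idea with a single skip connection; it is a didactic toy written for this review rather than the actual configuration of the cited works, which are deeper and add normalization or attention blocks:

    import torch
    import torch.nn as nn

    def conv_block(c_in, c_out):
        return nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True))

    class TinyUNet(nn.Module):
        """Two-level encoder-decoder with one skip connection, in the spirit of U-Net [18]."""
        def __init__(self, in_ch=3, n_classes=1):
            super().__init__()
            self.enc1 = conv_block(in_ch, 16)
            self.enc2 = conv_block(16, 32)
            self.pool = nn.MaxPool2d(2)
            self.up = nn.ConvTranspose2d(32, 16, kernel_size=2, stride=2)  # "de-convolution"
            self.dec1 = conv_block(32, 16)           # 16 (skip) + 16 (upsampled) channels
            self.head = nn.Conv2d(16, n_classes, 1)  # per-pixel nuclei logits

        def forward(self, x):
            s1 = self.enc1(x)                      # encoder output kept for the skip path
            bottleneck = self.enc2(self.pool(s1))  # lower-resolution features
            d1 = self.dec1(torch.cat([self.up(bottleneck), s1], dim=1))
            return self.head(d1)                   # becomes a foreground probability after a sigmoid

    # TinyUNet()(torch.randn(1, 3, 64, 64)).shape -> torch.Size([1, 1, 64, 64])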
Recently, another line of research follows a weak supervision approach, either using point-level annotations (instead of pixel-level ones) [22] or fewer annotated data [23]. This mitigates the burden of requiring pixel-wise nuclei annotations and enables
the pathologists to just click on the nuclei –rather than segmenting each of them. This can
offer huge savings in annotation time and thus larger datasets can be collected and annotated
at a lower cost. This research shows a promising direction, since the performance gap with methods that use full annotations is not very large [24].
On the one hand, fully supervised DL-based methods have a hard time generalizing well to different types of tissue, because of the small amount of annotated data and the difficulty of the problem (i.e., color variations, overlapping nuclei, several types of tissue, etc.). Point-wise annotations yield somewhat inferior performance, although they enable the acquisition of larger datasets that will help in training DCNNs in the future. In either case, current models are perceived as "black boxes" where the segmentation process is not fully understood. For example, most DNN models need to be pre-trained on large datasets of natural images (e.g., ImageNet). From a human's perspective, it is counterintuitive that a model should first train on natural images in order to efficiently perform a segmentation task on
pathology images that entails knowledge of molecular biology. In other words, a question arises as to how DNNs adapt from the natural image domain to histology images and how learnable patterns within neurons are adapted across tasks during the learning process.
Another limitation of the current state of the art is that it tries to solve the problem using very large models with millions of parameters, which increase the cost of deployment. From an energy efficiency standpoint, another question is whether so many parameters are really needed for this task, given that images in molecular biology have simpler and more standard patterns than natural images, where the basis for pathologists is prior knowledge of nuclei shape and texture. We contend that by incorporating more prior knowledge from molecular biology and relying less on the training data and the specific domain, we can devise methods with high generalization performance to unseen domains.
All in all, this thesis's contributions to the nuclei segmentation problem lie in proposing transparent and explainable approaches –decoupled from DL techniques–, maintaining a
lightweight model, as well as solving the problem with an unsupervised pipeline that can
generalize well to unseen domains.
1.2.2 Predicting Prostate Cancer From MRI Images
PCa detection from MRI images is a relatively new field that has attracted a lot of attention within the last ten years. Prior to that, MRI was not part of the diagnostic pipeline for PCa. Early methods in this field were not fully automated, but relied on pre-processing by experts (semi-automated). That is, radiologists would read the MRI input and identify suspicious ROIs for further examination. They would then manually delineate the detected ROIs and feed the masks, along with their corresponding MRI input, to the machine learning system. From this point on, the process was automated to classify the ROIs on
whether they correspond to PCa or not. For feature extraction, radiomics [25] were used
to quantify different aspects of the ROIs, such as intensity value, first and second order
statistics, shape-based features, as well as other features tailored to capture local texture
patterns related to how cancer reflects on MRI. Handcrafted radiomics are not derived in a
data-driven way and they have a closed-form formula for their extraction. So, there are no
parameters or model to be trained.
In this semi-automated prostate cancer detection setting, the pipeline was common among all methods. At first, radiomic features are extracted from each ROI. Because not all the features are relevant to this task, a feature selection step (e.g., ANOVA) takes place to quantify the relevance of each feature to the targeted task and its discriminant ability. The final step is to train one or multiple classifiers –such as a Support Vector Machine (SVM) or Random Forest (RF)– and keep whichever model achieves the higher performance. The obvious limitation of this approach is the need for human intervention in detecting the ROIs, as handcrafted
radiomics cannot be extracted without specifying an ROI upfront.
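A minimal sketch of this semi-automated recipe is given below under simplifying assumptions: the ROIs and labels are synthetic, and a handful of first-order intensity statistics stand in for a full hand-crafted radiomics library; feature selection uses an ANOVA F-test and the classifier is an SVM, as in the common pipeline described above:

    import numpy as np
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    def first_order_radiomics(roi_voxels):
        """A few hand-crafted first-order statistics of an ROI's intensities."""
        v = np.asarray(roi_voxels, dtype=float).ravel()
        skew = ((v - v.mean()) ** 3).mean() / (v.std() ** 3 + 1e-8)
        return np.array([v.mean(), v.std(), np.percentile(v, 10),
                         np.percentile(v, 90), skew])

    # Synthetic ROIs standing in for expert-delineated lesion candidates.
    rng = np.random.default_rng(0)
    labels = rng.integers(0, 2, size=60)                 # 0 = benign, 1 = cancer (toy)
    rois = [rng.normal(loc=lab, scale=1.0, size=(8, 8, 4)) for lab in labels]
    X = np.stack([first_order_radiomics(r) for r in rois])

    # ANOVA F-test keeps the most discriminant features, then an SVM classifies each ROI.
    clf = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=3), SVC(probability=True))
    clf.fit(X, labels)
    print("cancer probability of the first ROI:", clf.predict_proba(X[:1])[0, 1])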
The advent of DNN models made a leap in the field by enabling fully automated prostate cancer detection and lesion segmentation. Some works do not perform lesion segmentation (pixel-wise predictions), but instead use DCNNs on large patches of the prostate to detect cancer [26]. FCN architectures were originally proposed to classify the MRI input at the pixel level and predict a probability map for PCa. With an FCN architecture, such as U-Net, it is possible to achieve lesion segmentation and classification in an end-to-end manner [27], leveraging the complementary diagnostic information among the bi-parametric
MRI input to detect PCa.
A commonly identified issue in PCa detection is that the number of positive voxels is much lower than that of negative ones. Therefore, focal loss helps to learn from these imbalanced data and to leverage the complementary information among the input modalities received from the MRI scanner. Huang et al. [28] use a U-Net to output a weight map and
propose a novel fusion mechanism for T2-w and ADC sequences based on Gaussian and
Laplacian pyramid decomposition. Moving a step forward, attention mechanisms are proven to achieve a better performance [29, 30] because they help boost the features that are
more discriminant for the task, by reducing the influence of other noisy areas. Moreover,
via attention mechanisms, it is possible to incorporate strong priors, for instance, focusing
the learning on certain areas of the prostate –the peripheral zone– where csPCa is more likely to develop [31]. As previously mentioned, a common issue with most methods and prostate cancer detection tools is the false positive rate. Several works treat false positive reduction as a post-processing task, trying to suppress false positives at a second stage. Duran et al. [32]
show that adding multi-scale contextual information surrounding the detected ROIs can help
to reduce the number of false positives.
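As a reference for the focal loss mentioned above, the sketch below shows a common binary (sigmoid) formulation applied to a voxel-wise prediction map; it illustrates how the easy, mostly negative voxels are down-weighted, and it is not claimed to be the exact loss used in any of the cited works:

    import torch
    import torch.nn.functional as F

    def binary_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
        """Focal loss for voxel-wise labels: down-weights easy (mostly negative) voxels
        so that the scarce positive ones dominate the gradient. `logits` and `targets`
        share the same shape; `targets` holds 0/1 ground truth."""
        prob = torch.sigmoid(logits)
        ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
        p_t = targets * prob + (1 - targets) * (1 - prob)        # prob. of the true class
        alpha_t = targets * alpha + (1 - targets) * (1 - alpha)  # class re-weighting
        return (alpha_t * (1 - p_t) ** gamma * ce).mean()

    # Toy 4x4 prediction map with a single positive voxel.
    logits = torch.randn(1, 1, 4, 4)
    targets = torch.zeros(1, 1, 4, 4)
    targets[0, 0, 2, 2] = 1.0
    print(binary_focal_loss(logits, targets))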
A current limitation of state-of-the-art methods is the small size of publicly available annotated datasets, which limits the learning performance of large DNNs and increases bias. It is hard to find within the medical field such a large number of MRIs, with sufficient samples of csPCa, as required to train DNNs. Therefore, models that can be trained with fewer training data are preferred for this task. Another challenge is the generalization ability to MRI images acquired from external clinics (i.e., a different scanner device or parameter setup). All of that leads to an increased number of false positives, mainly due to other prostate diseases (i.e., prostatitis or indolent cancer) that give rise to areas visually similar to csPCa ones. On the flip side, actual csPCa lesion areas with very small extent can be missed or confused
with clinically insignificant cancer.
Interpretable and transparent feature extraction is also an important property in the MRI analysis field. It is of high importance that radiologists trust the decisions coming from an AI-powered tool, so as to use it effectively in their clinical workflow. Physicians need to understand the "logic" or underlying procedure behind feature extraction and classification before trusting the decision making of an AI tool. Radiomics features are extracted using certain formulae devised by scientists, and thereby physicians can understand their physical meaning. Deep features are hard to interpret, and hence it is less likely that a physician will understand the underlying patterns. It is still unclear why, for an MRI signal with certain patterns and textures, so many parameters from heavy DNNs are needed to learn cancer representations. Sustainable and energy-efficient solutions are preferred, since
they can reduce the cost of research and maintenance of the application.
For the aforementioned reasons, our method decouples from the DL paradigm and proposes an unsupervised, data-driven feature extraction approach that is linear and thus transparent. Additionally, it does not require very large datasets for training; since it is based on statistics, it yields a more stable feature distribution even with fewer data. Also, the proposed model size is orders of magnitude smaller than that of the DL-based state-of-the-art.
1.3 Contributions of the Research
Following up on the previous sections' discussion of the challenges and issues with current methods in medical image analysis, this thesis provides lightweight models and energy-efficient (green) solutions. Moreover, standing by physicians' concerns, the proposed methodologies offer a more transparent and explainable feature extraction process, without neglecting feature interpretability. Overall, the proposed methodologies revolve around three pillars: (1) robust performance, (2) small model size and (3) transparent and explainable decision making.
1.3.1 Nuclei Segmentation in Histology Images
Our contributions in the problem of nuclei segmentation are summarized below:
• An unsupervised method, named CBM (Color Transform - Binarization - Morphology), is proposed based on simple image processing techniques. Given the lack of annotated data for this problem, unsupervised solutions are needed, either to provide an unbiased segmentation output or to serve as input to self-supervision methods.
• A novel color transform is introduced for this problem, in lieu of other color spaces, that projects the true color onto the axis of highest variance. This is shown to help highlight nuclei over background regions and to aid the local thresholding operations.
• An efficient post-processing technique for splitting overlapping nuclei is proposed, using
morphological operations.
• CBM offers a parameter-free pipeline that achieves competitive performance against supervised DL-based solutions, after benchmarking on the MoNuSeg dataset with histology images from various organs.
• An adaptive local thresholding technique is proposed based on the bi-modal histogram distribution. It aims at reducing over- and under-segmentation problems in local thresholding, and is based on observations and priors about the local nuclei intensity distribution.
• A novel false positive reduction module is proposed, based on the prior distribution of nuclei size and feature-based comparisons with surrounding nuclei. Nuclei that fall in the ambiguous range in terms of their size are compared with other nuclei that are more likely to be true positives.
• Putting together the aforementioned techniques and modules, we propose HUNIS, an unsupervised pipeline with two stages. Stage-1 yields a segmentation output in an unsupervised way. Stage-2 receives the original segmentation mask to train a classifier in a self-supervised way. In both stages, we adopt post-processing techniques to mitigate the overlapping nuclei problem. HUNIS requires very few parameters to be stored, while achieving very high performance among other unsupervised and supervised deep learning methods.
• A novel feature extraction module, named NuSegHop, is proposed to learn the local texture of histology images and detect nuclei within them.
• Local-to-Global NuSegHop (LG-NuSegHop) is proposed, a self-supervised pipeline for nuclei segmentation. At first, modules from HUNIS are combined to create a set of local processing operations that generate a pseudolabel in an unsupervised manner. NuSegHop is then trained using the generated pseudolabel to predict a heatmap of nuclei presence. At the last stage, a set of global processing operations is proposed to post-process NuSegHop's heatmap and further increase the detection accuracy of the pipeline.
• LG-NuSegHop outperforms most of the unsupervised and weakly supervised pipelines
and has a competitive standing among other state-of-the-art fully supervised solutions.
• Extensive experiments conducted on three diverse datasets show that LG-NuSegHop has an impressive generalization performance to other datasets, without requiring any domain adaptation.
1.3.2 Predicting Prostate Cancer From MRI Images
Our contributions in the problem of Prostate Cancer detection from MRI are summarized
below:
• An unsupervised feature extraction model, named RadHop, is proposed to learn radiomics-like features for MRI analysis. It is a novel model, proposed for the first time in a cancer detection problem. The extracted features have certain dimensions that can discriminate between cancer and non-cancer regions.
• To address the class imbalance problem and train our model in a way that is more robust and less prone to false positives, we propose a hard negative mining technique using exponential patch sampling from the negative class distribution, which is shown to increase the detector's performance.
• A novel two-stage pipeline, named PCa-RadHop, is proposed, integrating both RadHop features and radiomics to reduce the false positive rate. We implement two ways of identifying false positives in stage-2. The first is probability-based, relying on the heatmap predicted in stage-1, while the second is imaging-based, utilizing handcrafted radiomic features.
• PCa-RadHop has a very small model size compared to other DL-based pipelines. We train and validate our method on the currently largest publicly available dataset, PI-CAI, and compare it with other DL-based baseline models. PCa-RadHop has a competitive standing among them, at orders of magnitude lower complexity and model size.
• PCa-RadHop offers a transparent pipeline, where the features are extracted in an intuitive way without the use of back-propagation and each module can be fully explained. Therefore, our solution is more trustworthy to physicians and can be favored for future deployment in a real clinical setting.
1.4 Organization of the Thesis
The rest of the thesis is organized as follows. In Chapter 2, we provide an extensive overview of the nuclei segmentation problem, comparing different models and discussing their pros and cons. In Chapter 3, we present our proposed method for unsupervised nuclei segmentation using the parameter-free CBM pipeline. Chapter 4 further extends the work of Chapter 3, where we propose the HUNIS method with adaptive thresholding and self-supervised segmentation correction. In Chapter 5, the LG-NuSegHop pipeline is presented, which is a more complete work incorporating modules from CBM and HUNIS. In Chapter 6, we propose the PCa-RadHop pipeline for prostate cancer detection from MRI. Finally, in Chapter 7 we draw conclusions and remarks, as well as future research directions for extending our work.
Chapter 2
Background Review in Nuclei Segmentation
2.1 Introduction
According to the Centers for Disease Control and Prevention (CDC), cancer is the second
leading cause of mortality in the U.S. after cardiovascular diseases [1]. The toll accounts for
more than 140 deaths per 100,000 population in the U.S. An instrumental step for cancer
diagnosis, tumor grading, and staging evaluation is a biopsy. It is still the standard way
for confirming cancer in patients. Tissue specimens are extracted from the suspicious areas,
usually identified by radiologists, to reflect whether cancerous cells are present. Biopsy cores
are processed onto slides and further stained using the popular Hematoxylin & Eosin (H&E)
method to give rise to nuclei and cytoplasm. Before digital evolution, pathologists used
to read the slides manually under microscopes, which was a time-consuming and laborious
task, increasing the diagnostic expenditures while delaying the result turnaround time [33]. Nuclei segmentation is a pivotal task towards cancer reading on histology images. The relative
topology, size, and shape of nuclei can characterize cancer’s development in a certain area,
depending on the tissue type (see Fig. 2.1). With the advent of whole slide image (WSI)
scanners, it is possible to digitize the slides in a high resolution under a certain magnification level, thus enabling the pathologists to inspect the slides on the monitor using dedicated
software [34]. Yet, the very large size of each digitized image still requires much reading time from expert pathologists. Furthermore, there is reportedly high inter-reader variability
because of different expertise levels [35].
In the last decade, a lot of research has been conducted in developing accurate Computer-Aided Diagnosis (CAD) tools that can perform at the same level as pathologists and provide
more objective decisions with no variability. These tools are meant to help pathologists in
routinely performed tasks by taking on trivial cases. Several million biopsies are performed
in the U.S. alone, across several types of tissues. Given the high false positive rate of people
sent to biopsy [36, 37], pathologists tend to spend most of their time reviewing benign tissues.
CAD tools can help to more efficiently screen trivial cases, which enables pathologists to focus
on more ambiguous cases that need human expertise. After recent AI advancements, it is now possible to develop CAD tools able to fully automate tasks in the diagnostic pipeline,
hence easing the tissue reading process. Nuclei segmentation highlights the topology of
nuclei, as well as their shape, so its output can help pathologists to review the slides much
faster. Also, a nuclei segmentation module can feed another AI-based module that predicts whether a slide is cancerous based on the nuclei segmentation output.
Following the last decade's research in AI and Deep Learning (DL), last year the first ever
FDA approval was given to the PAIGE tool [38] meant for AI-assisted pathology reading.
This brings modern pathology into a new era and paves the way for more approvals in
the future for AI-powered tools in digital pathology. Yet, there are still commonly identified
challenges [39] for the nuclei segmentation task that need to be addressed. Noise and artifacts
during the staining process increase the intra-nuclei variance of appearance, which also varies
across tissue from different organs. Another factor is the small amount of annotated data.
Nuclei segmented images are hard and expensive to obtain since they require a considerable
amount of time and expertise. Hence, publicly available nuclei segmentation datasets for
research are scarce. Modern AI and DL-based solutions require large amounts of data for training, and thereby they face the challenge of generalizing to new images, especially from unseen organs [21].
A couple of surveys have been conducted about nuclei segmentation and histological
image reading. Nasir et al. [40] have conducted an extensive review of several methods,
comparing the nuclei- and gland-based segmentation methods. They provide various statistics and charts about methods and datasets accepted in journals, paper acceptance per year
in the area, and analyze publicly available datasets. Various quantitative and qualitative
analysis facts are derived, as well as comparisons between nuclei and gland segmentation
problems. Hayakawa et al. [41] provide a brief survey on existing nuclei segmentation solutions, beginning from earlier approaches that used traditional methods and reaching up to
recent state-of-the-art (SOTA) methods. That survey gives a quick overview of the whole field, focusing only on the nuclei segmentation problem, and is a useful guide for quickly navigating the different categories of existing methods. Irshad et al. [42] carried out a broader survey on nuclei segmentation and detection, as well as general classification in digital pathology images. They provide a thorough technical review of earlier works before the advent of DL. In their work, different standard pre-processing methods are discussed. Moreover, there are extensive explanations and formulas for earlier techniques used for nuclei segmentation, such as Gaussian Mixture Models (GMMs), clustering, active contour models and level sets, graph cuts, and morphological operations. Other surveys [39] focus their review on DL-based methods for nuclei segmentation but target only breast tissue cancer, which is a popular area for nuclei segmentation. Yet, breast nuclei segmentation methods are intertwined to some extent with generic nuclei segmentation ones, and thereby this survey provides a good overview of nuclei segmentation methods with Deep
Learning and relevant datasets that comprise breast tissue. As nuclei segmentation may
touch upon the general histology image classification problem, there are other surveys pertaining to the classification task that show how nuclei segmentation could couple with the
WSI classification task [43].
Figure 2.1: Nuclei topology in forming glands is important for tumor grading. Nuclei segmentation task aims at detecting and highlighting nuclei over other areas, thus giving rise
to visual patterns that are important for pathologists, such as assessing the cancer Gleason
grade in prostate tissue (GG) (a) or the cellularity percent (b) (figures from [44, 45]).
Our survey focuses on the nuclei segmentation problem, regardless of the type of tissue. It
pertains to datasets with multiple organs, and we are looking into the generalization ability
of methods in unseen organs as well. Complementing recent surveys, this overview aims at providing a comprehensive technical review of the existing literature in Section 2.3, starting from methods using traditional pipelines and reaching today's DL-based models with supervision, as well as self-supervised methods. After the extended review of proposed methods, our discussion focuses on how supervision may help improve nuclei segmentation to the degree that is needed. Motivated by the scarcity of annotated datasets, we delve deeper into weak supervision or self-supervision in order to answer two questions: (1) how much supervision do we need to solve the problem, and (2) in which areas of the problem does supervision help reach a higher performance. After comparing and identifying current issues and challenges in existing works in Section 2.4, using both quantitative and qualitative results, we provide our thoughts on future directions (Section 2.5), including ideas on how supervision can help and how to identify the minimum set of areas that need supervision.
2.2 Staining in Digital Pathology
For over a century, the dominant technique for staining histopathological specimens has been Hematoxylin & Eosin (H&E). Biopsy tissue specimens extracted from organ regions suspected of having developed a tumor are stained to reveal the nuclei over other structures. In particular, Hematoxylin (H) highlights nuclei with a dark purple color, while Eosin (E) stains other structures that are not diagnostically relevant to cancer reading, such as stroma and cytoplasm, in a light pink color [46, 47].
After staining the biopsy cores from glass slides, pathologists analyze the revealed nuclei
cell structure microscopically to study the cellular morphology for cancer diagnosis. However,
staining is a chemical procedure subject to high variations among different laboratories
and organ tissues [11]. Moreover, different microscope scanners are tuned under different
parameters that cause more variation in the process. Hence, with regard to the automated
nuclei segmentation process, noise can be induced at different stages before image acquisition.
This is a challenge for existing systems, which need to cope with the noise and defects that occur during the acquisition process.
2.2.1 Staining Procedures And Color Variations
The procedure for staining a histological specimen requires multiple steps, until the slide is
ready for examination under the microscope. The concrete steps where color artifacts can
occur are: (1) collection, (2) fixation, (3) dehydration and clearing, (4) paraffin embedding,
(5) microtomy, (6) staining, and (7) mounting [48].
Concretely, the fixation time can alter the colorization results. Also, imperfect dehydration can leave behind water drops that obscure certain slide regions under the microscope. The thickness of slices after microtomy also strongly affects the nuclei appearance, as thinner slices may reveal more nuclear detail [49]. Finally, the staining process entails chemical solutions, and thus different factors (e.g., staining time, solution pH) can affect the tissue appearance
Figure 2.2: Top row: different slice thickness results in very different nuclei color and appearance. Bottom row: under-staining (left) and over-staining (right) can change nuclei and background color [10, 49].
[10]. Image artifacts can also be introduced while mounting the stained core onto the coverslip (e.g., bubbles or dust). Fig. 2.2 visualizes some common color artifacts during staining.
2.2.2 WSI Scanners & Image Digitization
The next critical step after staining is image acquisition at the microscopic level. The type of lens and magnification level, the camera chip type, as well as the illumination system within the scanner, can largely affect the image output for the same stained tissue. There are several scanners on the market for digitizing WSIs. Hamamatsu, Aperio XT, Olympus, Philips, Huron, Leica, and others are some vendors selling digital scanners for pathological slides. In Fig. 2.3, one can observe the color shift across different scanners.
Moreover, the device parameterization is not standardized, and thus the digitized image may
vary significantly across different laboratories using different scanner parameter adjustments.
Figure 2.3: Same tissue sample digitized under two different scanners. On the left, the whole-slide image is acquired using an Aperio XT scanner, while the right one using a Hamamatsu scanner. The
large color and texture variation across different devices is a challenge for nuclei segmentation
[11].
2.2.3 WSI Variability Implications And Challenges
Nuclei color and texture can be affected due to a number of imperfections. Artifacts and
noise that may occur during the staining process, as well as the image digitization process
(e.g., scanner’s device systematic and random noise) can alter the color and nuclei texture.
Unclear nuclei boundaries, overlapped nuclei, and variations in their color and texture are
quite challenging for a generic nuclei segmentation pipeline. Mitosis also can look quite
different on the digitized image due to staining and scanner variations [50, 51].
Different pre-processing methods have been proposed in the past to normalize the color
components [52, 53] and mitigate the large color and nuclei appearance variations. Yet, it still remains a challenge today and is the main reason modern AI nuclei segmentation models lack good generalization ability. The aforementioned challenges, caused by the large variations during WSI acquisition, remain a subject for future research and are the area where different proposed methods contribute by handling certain aspects of the nuclei segmentation problem.
Figure 2.4: Outline of the existing nuclei segmentation methods.
2.3 Methods for Nuclei Segmentation
This section outlines the spectrum of nuclei segmentation methods as shown in Fig. 2.4 and
presents a comprehensive review of significant approaches. They are broadly classified into
unsupervised and supervised learning methods.
2.3.1 Unsupervised
Unsupervised methods perform segmentation without the help of any annotations. These
methods are divided into two categories: traditional methods, which do not use any learning,
and self-supervised learning-based approaches.
2.3.1.1 Traditional Methods
Traditional nuclei segmentation methods were predominantly adopted before the deep learning era. They focused on applying different image processing techniques to obtain reasonable
segmentation maps. A summary of the unsupervised traditional methods is listed in Table
2.1.
Ref. | Dataset | Methods | Pre-Processing | Post-Processing
[54] | 20 slides of neuroblasts | Hysteresis thresholding | Color space decomposition | Hole filling, smoothing, removal of false positives, watershed
[55] | 30 cutaneous H&E stained images | Local region adaptive thresholding | Hybrid morphological reconstructions | Opening
[56] | Real and synthetic microscopic images | Triclass thresholding | - | -
[57] | Gold Standard Dataset | Hierarchical multilevel thresholding | Color deconvolution, opening | Dilation
[58] | 20 leukocyte images | Otsu's thresholding | Contrast stretching, histogram equalization | Closing
[59] | 30 cytology pleural fluid images | Otsu's thresholding | Median filtering, conversion into LAB color space | Opening
[60] | MoNuSeg | Adaptive thresholding | Data-driven color transform | Convex hull algorithm, nuclei area priors based thresholding, hole filling
[61] | MoNuSeg | Local modified adaptive thresholding, self-supervised classification for uncertain pixels | H component extraction, data-driven color transform | Convex hull algorithm, morphological operations
[62] | 19 H&E stained breast cancer images | Radial Symmetry Transform, marker-controlled watershed | Color deconvolution, morphological filtering | Size and solidity based refinement, ellipse approximation
[63] | 119 H&E breast, gastrointestinal, and Feulgen prostate images | Seeded watershed based on image-driven markers | Gaussian smoothing, morphological operations | Morphological operations
[64] | 52 DAB stained colorectal cancer images | Region growing based seeded watershed | Global and local thresholding for foreground extraction | Intensity based auto thresholding, ellipse fitting
[65] | Custom breast cancer H&E dataset | Skeleton model based marker-controlled watershed | Color deconvolution, Otsu's thresholding, morphological operations | Size based false positive removal, morphological operations
[66] | 34 fluorescence microscopy hepatocellular carcinoma images | Iterative marker-controlled watershed | Gradient map and distance transform of binary map | -
[67] | 120 H&E breast cancer images | Circular Hough Transform based modified marker-controlled watershed | Denoising, CLAHE, morphological operations | -
[68] | H&E esophageal images | Improved active contour with growing energy | Iterative dual thresholding, ultimate erosion | -
[69] | 100 H&E breast cancer images | Geodesic active contour | Expectation-Minimization algorithm | Overlap resolution
[70] | 20 H&E breast cancer images | DoG filtering and thresholding followed by level set | Bilateral filtering, gamma correction, morphological operations | -
[71] | MITOS dataset | Localized, region-based level set | Stain normalization, color deconvolution, filtering | -
[72] | KMC, BreakHis datasets (breast cancer images) | Modified Chan-Vese model using multi-channel color data | Color normalization, color channel selection | Morphological operations, area based false positive removal
[73] | 25 in-vivo and in-vitro breast images | Multiscale LoG filter | Graph cuts | Graph cuts with region adjacency coloring and alpha expansions
[74] | 40 promyelocytic leukemia and 10 lung epithelial cell images | Distance transform, graph cuts | Maximum likelihood estimation | -
[75] | 51 H&E cervical cell images | Adaptive and localized graph cuts | Conversion to HSV, V channel extraction, linear stretching, median filtering | Morphological and gradient features, concave points, and constrained ellipse fitting to split touching cells
[76] | Blood smear microscopic images | K-means clustering | Median filtering, LAB color space conversion | Erosion based region growing mechanism for nucleus splitting
[77] | 35 pleural effusion cytology images | K-means clustering | Median filtering, CLAHE, LAB color space conversion | Distance transform based watershed, ellipse fitting
[78] | Custom H&E tumor samples | K-means clustering | Feature extraction with Gabor filters | Elimination of false positives using cytological profile
[79] | 45 synthetic cervical cytology images | Fuzzy C-means clustering with spatial shape constraint | Complementing, histogram based binarization | False positive removal using area and shape priors, closing
Table 2.1: A summary of traditional methods for nuclei segmentation.
2.3.1.1.1 Thresholding
Thresholding is one of the fundamental traditional segmentation algorithms. Different thresholding methods vary in how the threshold value for segmentation is computed. One widely adopted automatic thresholding algorithm for bimodal images is Otsu's thresholding [56, 58, 80, 57, 59]. Here, the image histogram is separated into two clusters based on a threshold, decided either by minimizing the intra-class variance or maximizing the inter-class variance. Other approaches use global [54] or local thresholding [55, 60, 61] in addition to morphological operations to refine the segmentation maps, as in Fig. 2.5.
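As a concrete illustration, the following sketch applies Otsu's thresholding to a grayscale H&E patch with scikit-image. The file name, the assumption that nuclei form the darker intensity mode, and the post-processing choices are illustrative, not settings from any cited work.

```python
# Minimal Otsu thresholding sketch on a grayscale H&E patch using scikit-image.
import numpy as np
from skimage import io, color, filters, morphology

rgb = io.imread("patch.png")                  # hypothetical H&E patch
gray = color.rgb2gray(rgb)
t = filters.threshold_otsu(gray)              # maximizes inter-class variance
mask = gray < t                               # nuclei assumed to be the darker mode
mask = morphology.remove_small_objects(mask, min_size=30)   # simple refinement
mask = morphology.binary_closing(mask, morphology.disk(2))
```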
Figure 2.5: An example thresholding pipeline from [60].
Gurcan et al. [54] proposed hysteresis thresholding based on morphological operations.
They utilize the R component (from RGB) due to its high contrast and apply Top Hat
transform to detect the high-intensity regions from the morphologically reconstructed complementary R component. Hysteresis thresholding is then used to remove tiny high-intensity
regions near actual nuclei. It was observed that a few weakly stained nuclei were not detected through this method, thus reducing the true positive rate. Such global thresholding
methods fail to account for the variations in staining intensities across the image and within
the nuclei. To address this issue, local or regional thresholding was proposed. Lu et al. [55]
incorporated a two-module approach, with the first module performing Hybrid Morphological Reconstructions on the complementary image to reduce noise and intensity variations.
The second module applied a local, regional adaptive threshold, classifying all pixels with
intensity lower than the mean intensity of each block as nuclei. A refinement phase follows, leveraging nuclei's elliptical shape and size distribution to correct under-segmentation
issues due to local intensity variations. The opening operation was performed as the final
step to remove ghost nuclei and smooth the segmentation map. However, a few nuclei with significant intensity variations are missed in this process.
Cai et al. [56] propose an iterative thresholding method using Otsu’s threshold and classify
the pixels into three classes. The first iteration applies Otsu’s algorithm to determine the
threshold and the means of the two classes the threshold divides them into. These two
means help classify the pixels into three different categories, with the nuclei being pixels
with intensities greater than the largest mean and the background with pixel intensities
lower than the smallest mean. The pixels with intensities between these two means are
considered a separate class and subject to the same procedure, classifying the pixels into
three classes again. This procedure is repeated on the third class of pixels between the two
class means until a preset condition is reached when Otsu’s threshold divides them into the
nuclei and background. The nuclei pixels identified from each iteration are combined in a
logical union, and a similar operation is performed on the background pixels to obtain the
segmentation map. This approach is based on thresholding and helps recall some fine and
weakly stained nuclei, which the original Otsu’s algorithm may miss.
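A simplified sketch of this iterative triclass idea is given below. It assumes an intensity image in which nuclei form the bright mode (e.g., a complemented grayscale or Hematoxylin channel), and the stopping conditions are illustrative rather than those of the cited work.

```python
# Simplified sketch of iterative triclass thresholding: Otsu splits the image,
# the two class means define an ambiguous band, and the procedure recurses on
# that band until it becomes negligible.
import numpy as np
from skimage import filters

def iterative_triclass(img, min_pixels=100, max_iter=20):
    """img: float image where nuclei form the BRIGHT mode (e.g., complemented grayscale)."""
    nuclei = np.zeros(img.shape, dtype=bool)
    undecided = np.ones(img.shape, dtype=bool)
    for _ in range(max_iter):
        vals = img[undecided]
        if vals.size < min_pixels or np.ptp(vals) == 0:
            break
        t = filters.threshold_otsu(vals)
        mu_low, mu_high = vals[vals <= t].mean(), vals[vals > t].mean()
        nuclei |= undecided & (img >= mu_high)          # clearly nuclei
        undecided &= (img > mu_low) & (img < mu_high)   # ambiguous band, revisited
    return nuclei
```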
Using the Beer-Lambert Law, Phoulady et al. [57] first extract the Hematoxylin component from the H&E stained image. They propose an iterative multilevel thresholding scheme,
with each threshold determined using Otsu’s method, separating the pixel intensities into
several classes. The regions created in the initial stage are either shrunk or split into two or
more smaller areas in the further steps. Morphological operations were then performed to
remove any artifacts and improve the accuracy of the nuclei boundaries. Gautam et al. [58]
propose a similar scheme, where the H&E stained image is first converted into grayscale, and
a copy is made. One copy is histogram equalized, and the other copy is contrast stretched.
Addition and subtraction are performed on these two copies resulting in minimum distortion
in the nuclei. Otsu’s thresholding is then applied to the entire preprocessed image, followed
31
by the closing operation (combination of dilation and erosion) to fill in holes and remove
false nuclei. Win et al. [59] apply the median filter to each component R, G, and B to remove noise in the image. They then convert the images into the LAB color space due to the
dependency of the R, G, and B components on each other. Otsu’s thresholding is performed
on the grayscale adjusted and equalized image to binarize the image. The binarized image
is then subject to the morphological opening operation to remove false nuclei. However, this
method fails to segment overlapping or clustered nuclei.
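Hematoxylin-channel extraction via color deconvolution, which follows the Beer-Lambert absorption model mentioned above, can be sketched with scikit-image's rgb2hed as follows. The file name and the Otsu-based binarization step are illustrative assumptions.

```python
# Sketch of Hematoxylin-channel extraction via H&E color deconvolution.
from skimage import io, color, filters

rgb = io.imread("he_patch.png")               # hypothetical H&E patch
hed = color.rgb2hed(rgb)                      # channels: Hematoxylin, Eosin, DAB
hema = hed[..., 0]                            # nuclei respond strongly in this channel
mask = hema > filters.threshold_otsu(hema)    # stronger H absorption -> nuclei
```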
Figure 2.6: High Performance Unsupervised Nuclei Instance Segmentation (HUNIS) pipeline
from [61].
In the past year, Magoulianitis et al. [60] proposed a pipeline consisting of three modules:
(1) Color transform, (2) Binarization, and (3) Morphological Processing called the CBM
pipeline. They first split the image into 50x50-sized blocks comparable to the size of nuclei
in the H&E stained images. The strong correlation between the R, G, and B channels of an
RGB image is exploited to reduce data dimensions. PCA is performed on the RGB image
to minimize the three attributes to a single high energy attribute for further processing.
A histogram of the normalized energy values from this attribute is plotted to visualize the
distribution of values within each block. This histogram displays a bimodal distribution, the
first peak representing the nuclei and the second peak representing the background. The
valley between the two modalities may be used to determine the threshold to classify the
pixels as nuclei or background. If this bimodal assumption does not hold, and there are more
than two peaks in the histogram, or one peak is more prominent, the block is reduced to four
smaller blocks or four blocks are merged to form one large block, respectively. Thresholding
is applied individually to these new reduced or combined blocks. In the final stage, large
connected nuclei are split using the convex hull algorithm, nuclei size priors are used to
eliminate small erroneous nuclei, and false negatives in the image are removed using the
hole filling filters. A follow-up work of CBM, namely HUNIS [61], proposes a two-stage
approach, where stage-1 creates an initial segmentation output, and then stage-2 uses the
output to train a pixel-wise classifier in a self-supervised manner (See Fig. 2.6). In stage-1,
PCA is performed on the Hematoxylin component extracted from the H&E stained image
to obtain a monochrome attribute to work with. The adaptive thresholding algorithm is
slightly modified by considering two cases. When one peak in the histogram of the block is
more prominent than the other, or there are more than two peaks in the image, the threshold
for the block is adjusted adaptively based on the magnitude and direction of the slope of the
line connecting the two peaks in the histogram for more precise segmentation. Unusually
small nuclei are identified from a nuclei size distribution and eliminated. This is followed
by using shape priors and morphological processing to split large nuclei. A false positive
reduction module works on a larger tile to capture more nuclei instances. A global nuclei
size threshold is used to group the instances of reasonable size as ground truth and the
remaining as another set. Each element in the latter set is compared with the former set,
and elements with a very low similarity are considered false positives and eliminated. The
procedure is extended to a second stage, training a pixel-wise classifier on the pseudo labels
obtained in the first stage. Utilizing the confidence scores from the classifier, pixels with
high uncertainties which lie close to the boundary are reclassified into their correct classes.
Final morphological processing steps are included to refine the segmentation maps further.
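A rough sketch of the CBM-style color transform and block-wise bimodal thresholding described above is given below. The block size, the histogram smoothing, and the crude valley search are illustrative simplifications of the published pipelines rather than their exact settings.

```python
# Sketch: project RGB pixels onto their first principal component, then
# threshold each block at the valley of its (assumed) bimodal histogram.
import numpy as np
from sklearn.decomposition import PCA
from scipy.ndimage import gaussian_filter1d

def pca_intensity(rgb):
    """Map HxWx3 RGB to a single channel along the axis of highest variance."""
    flat = rgb.reshape(-1, 3).astype(float)
    return PCA(n_components=1).fit_transform(flat).reshape(rgb.shape[:2])

def block_bimodal_threshold(channel, block=50):
    mask = np.zeros(channel.shape, dtype=bool)
    for i in range(0, channel.shape[0], block):
        for j in range(0, channel.shape[1], block):
            patch = channel[i:i + block, j:j + block]
            hist, edges = np.histogram(patch, bins=64)
            hist = gaussian_filter1d(hist.astype(float), sigma=2)  # smooth the histogram
            valley = np.argmin(hist[8:-8]) + 8                     # crude valley between the modes
            # sign of the PCA axis is arbitrary; flip the comparison if nuclei
            # end up as the bright mode after the projection
            mask[i:i + block, j:j + block] = patch < edges[valley]
    return mask
```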
Some slight modifications in applying the thresholding help alleviate the errors due to staining variations. However, thresholding on its own struggles to segment overlapping or clustered nuclei. Several approaches therefore combine thresholding with other methods to account for clustered and overlapping nuclei, as in Fig. 2.7.
2.3.1.1.2 Watershed
The watershed algorithm [81] employs topological information to segment an image into different regions called catchment basins. The original version of the algorithm starts by finding the local minima in the image as the centers (seeds) of the catchment basins. Different colors are then flooded, beginning from the minima markers, until they reach the boundaries of each catchment basin to form the watershed lines. The boundaries of each region distinguish one part from the other, resulting in a segmented image.
When applied to nuclei segmentation, this process sometimes resulted in over-segmentation,
meaning single objects were segmented into several regions, or under-segmentation, where
multiple regions were combined into a single region. Marker-controlled approaches with novel
marker selection methods were developed to overcome this difficulty. An example method
using a marker-controlled watershed is illustrated in Fig. 2.7.
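A minimal marker-controlled watershed sketch is shown below: seeds are taken as local maxima of the distance transform of a binary nuclei mask, a common choice that reduces the over-segmentation produced by raw regional minima. The minimum seed distance is an illustrative parameter.

```python
# Minimal marker-controlled watershed: distance-transform peaks as seeds.
import numpy as np
from scipy import ndimage as ndi
from skimage.feature import peak_local_max
from skimage.segmentation import watershed

def split_nuclei(binary_mask, min_distance=10):
    distance = ndi.distance_transform_edt(binary_mask)
    lbl, _ = ndi.label(binary_mask)
    peaks = peak_local_max(distance, min_distance=min_distance, labels=lbl)
    markers = np.zeros(binary_mask.shape, dtype=int)
    markers[tuple(peaks.T)] = np.arange(1, len(peaks) + 1)   # one seed per nucleus
    return watershed(-distance, markers, mask=binary_mask)   # flood from the seeds
```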
Veta et al. proposed a marker-controlled watershed segmentation [62] using the Fast Radial Symmetry Transform (FRST). To remove irrelevant structures, color deconvolution (to
extract the Hematoxylin component) and morphological operations are performed. FRST is
used to detect the foreground and background markers based on the radial shape assumption
for nuclei. Watershed segmentation is then applied using these regional markers. Regions of
low solidity and size are removed to refine the segmentation further. Finally, they use the ellipse approximation to generate regular contours. This study observed that the FRST-based
markers reduce the oversegmentation observed with regional minima markers. However, one
drawback of this approach is the selection of background markers as everything around the nucleus within a specific area. This may not be true for all nuclei, resulting in a few errors.
Figure 2.7: An example marker controlled watershed method from [63] in conjunction with
thresholding.
To overcome this drawback, Vahadane et al. [63] suggested a background marker generation method.
The image is first preprocessed using Gaussian smoothening and morphological processing
to remove noise and enhance the foreground and background while preserving the edges. To
obtain the background markers, the enhanced image is thresholded using Otsu’s method,
generating a tentative foreground, followed by inversion, dilation, and skeletonization. The
nuclei markers are generated using FRST and refined using the tentative foreground. The
nuclei and background markers thus obtained are employed in the watershed segmentation.
Post processing through morphological processes reduces some false positives and splits connected nuclei.
Shu et al. [64] proposed a method combining thresholding and marker controlled watershed segmentation with a two-step approach to seed detection. Since thresholding often
misclassifies nuclei boundaries, they generate two masks, one with global thresholding and
another with global and local thresholding. For the first step of seed detection, the latter is
converted into a Euclidean Distance Map (EDM), and seeds for watershed are determined
through Ultimate Eroded Points (UEP). In the second step, to account for the weakly stained
nuclei, the remaining particles from this mask are considered seeds for a region growing process based on the global thresholded mask. This generates ”necks,” which helps watershed
merge oversegmented regions and separate clustered nuclei. Seeds for watershed are obtained
from the EDM, and segmentation is performed. Post processing consists of another round
of local auto thresholding and ellipse fitting to refine the maps. The requirement of some
empirically determined parameters restricts it to good performance on a single tissue type.
To eliminate the requirement of empirical parameter determination, Cui et al. [65] apply an
ellipse detection algorithm to estimate nuclei size. Otsu’s thresholding and morphological
operations are performed on the Hematoxylin component before ellipse detection. From this
binary image, connected components are identified. Based on the estimated nuclei sizes,
these components are classified as noise, single nuclei, or multi nuclei regions. Skeletonization is performed on the multi-nuclei region to identify seeds for watershed segmentation.
Once individual nuclei are identified, they are then post processed along with the components classified as single nuclei regions. However, this approach tends to mark some large
nuclei as a multi nuclei region, causing oversegmentation.
Seeds determined from regional minima approaches often include spurious markers in
the form of noise which may degrade the performance of the segmentation algorithm. The
H-minima transform is applied to suppress minima below a value h. Selecting this value is critical, as low values may lead to oversegmentation, and higher values may cause under-segmentation. Koyuncu et al. [66] proposed an iterative H-minima based method for efficient
selection of h. They first generate a gradient map and distance map from the image. The h
value is varied with each iteration to generate markers from the gradient map, and regions
with an area less than a specified threshold are eliminated. The markers obtained from
each iteration are then combined, and the marker controlled watershed is used to grow
regions identified from these markers on the distance map. To prevent oversegmentation,
the region growing process is constrained to pixels that have not been previously classified
as background or another nucleus. This process improves the segmentation of non-circular
nuclei.
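In the spirit of this iterative scheme, the sketch below merges markers extracted at several h values (here applied as an H-maxima transform on the distance map) and grows them with a marker-controlled watershed. The h values and the minimum marker size are illustrative and are not the settings of [66].

```python
# Hedged sketch of iterative H-(minima/maxima) marker generation and watershed.
import numpy as np
from scipy import ndimage as ndi
from skimage.morphology import h_maxima, remove_small_objects
from skimage.measure import label
from skimage.segmentation import watershed

def iterative_h_markers(binary_mask, h_values=(2, 4, 6), min_area=5):
    distance = ndi.distance_transform_edt(binary_mask)
    markers = np.zeros(binary_mask.shape, dtype=bool)
    for h in h_values:                                   # suppress maxima shallower than h
        peaks = h_maxima(distance, h).astype(bool)
        markers |= remove_small_objects(peaks, min_size=min_area)  # drop spurious markers
    return watershed(-distance, label(markers), mask=binary_mask)
```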
Rajyalakshmi et al. [67] propose a modified marker-controlled watershed. The H&E
stained image is preprocessed by applying Contrast Limited Adaptive Histogram Equalization (CLAHE) and morphological processes to remove noise. Local maximum points are
identified with the help of Corner Detection Techniques, which are then used to detect nuclei
circles using the Circular Hough Transform. Otsu’s thresholding is applied to remove false
positives. To eliminate certain dark and bright regions, a map of variable size structured
elements is constructed and applied to the thresholded image. Marker-controlled watershed
is applied to the resulting image to extract the boundaries of overlapped nuclei.
2.3.1.1.3 Active Contours & Level Sets
Active contours, also known as snakes, start with an initial contour of points and evolve to fit the points on the object by energy minimization. The initial contour is often generated from a parametric representation or a formulation. The contours evolve iteratively until a potential minimum-energy boundary is obtained.
However, this method is highly sensitive to initial contour placement. One of the early
implementations of the snakes algorithm by Hu et al. [68] detects the nuclei centers using
a dual threshold algorithm followed by ultimate erosion. To overcome the issue of initial
contour placement, they propose an improved snake energy minimization function by adding
a region similarity based growing energy function. This algorithm also restricts the movement
of the contour along radial directions, which reduces the computational time and broadens
the boundary attraction range.
Active contours are also limited in their ability to segment overlapping objects. Fatakdawala et al. [69] employ an expectation-minimization (EM) algorithm with four classes to
Figure 2.8: An example active contour based pipeline from [69].
initialize a geodesic active contour as depicted in Fig. 2.8. The EM step generates an initial
segmentation map, which helps reduce the impact of dataset variability. A swatch color template selects the target initial segmentation map representing the nuclei. A Magnetostatic
Active Contour model is applied to the chosen map using a bidirectional force to evolve the
nuclei boundaries. A final step to resolve overlap between nuclei contours is implemented
by identifying points of high concavity in multi nuclei regions, followed by an edge path
algorithm to split the contours using the edge information and a size heuristic. Nonetheless,
good performance from this approach is subject to well-stained and low noise samples, as
the EM algorithm depends on the R, G, and B values.
Level sets are an alternative mathematical implementation of active contours, in which the boundary is viewed as the zero level set of a function ϕ (i.e., ϕ = 0). Starting from an initial contour represented by a level set function, this contour is evolved to fit the object's boundary.
Faridi et al. [70] proposed a level set based technique to detect and segment cancerous
nuclei. A bilateral filter is applied to the image to smoothen the image while preserving the
edges. The green channel indicates a higher probability of cancerous nuclei, hence chosen for
further processing. Gamma correction is applied to this channel and thresholded to obtain
a binary image. The difference of Gaussian (DoG) filter is applied to the morphologically
processed image from the previous step and thresholded to obtain nuclei regions. Initial
contours for the level set algorithm are generated by dilating the detected nuclei centers
(regions). Nuclei with smooth contours are obtained as an output of the level set algorithm.
False positives may still occur due to staining variations, and not all critical cancerous nuclei
may be detected.
Beevi et al. [71] implemented an approach combining the Krill Herd Algorithm (KHA)
based multilevel thresholding and localized Level Set. Upon stain normalization, the R
component of the image is chosen for further processing due to the high contrast between
nuclei and background. A Wiener filter is applied to the image to enhance the weakly stained
nuclei and edges. Initial contours are obtained by employing the KHA optimized multilevel
thresholding, where out of the three thresholds obtained, the lower threshold values are
detected as nuclei. Localized level set algorithm works on the principle of maximizing the
mean intensity difference between the foreground and background along the contour. In
addition, energy minimization is done by region based techniques and local information
accounting for the intensity variations. KHA exhibited fast convergence, eliminated the risk
of oversegmentation, and improved the segmentation of overlapping and touching nuclei.
The Chan-Vese model [82] for image segmentation based on the Level Set implementation
has been a widely adopted method. Instead of the regular image gradient based stopping
criterion in regular active contours, this model applies a stopping criterion based on the
Mumford-Shah segmentation algorithm [83], thus detecting even irregular boundaries.
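The sketch below runs scikit-image's morphological variant of the Chan-Vese model on a grayscale patch; the file name, iteration count, and smoothing factor are illustrative.

```python
# Sketch of a Chan-Vese-style level set segmentation (morphological variant).
from skimage import io, color
from skimage.segmentation import morphological_chan_vese

gray = color.rgb2gray(io.imread("patch.png"))     # hypothetical H&E patch
# Evolves a level set from a checkerboard initialization; the stopping behavior
# is driven by region statistics (Mumford-Shah style) rather than image gradients.
mask = morphological_chan_vese(gray, 200, init_level_set="checkerboard", smoothing=3)
```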
Rashmi et al. [72] apply a multichannel Chan-Vese model to perform nuclei segmentation. The Green channel (from the RGB color space) and the inverted S channel (from the HSI
color space) are obtained from the color normalized H&E stained images due to their ability
to distinguish weakly stained nuclei. The energy function in this Chan-Vese implementation
utilizes both channels to get the zero level set contour. The output of this step is postprocessed by filling the holes using the closing operation and thresholded to remove false
positives. The complementary information supplied by the Green and Saturation Channels
contributes to improving the segmentation performance.
2.3.1.1.4 Graph Cuts
In graph-based approaches, each pixel is considered a node in a graph, and each edge is weighted based on the degree of similarity between its connecting nodes. A cut in the graph partitions the graph into two disjoint sets of nodes. The best cut will have minimum cost or energy.
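A toy illustration of this formulation is given below using networkx's minimum cut: each pixel is connected to source ("nuclei") and sink ("background") terminals by intensity-based unary capacities, and to its 4-neighbors by similarity-based capacities. The capacity choices are simple illustrative ones, not those of the cited works.

```python
# Toy graph-cut segmentation on a small intensity patch using networkx min-cut.
import numpy as np
import networkx as nx

def graph_cut_segment(gray, beta=20.0):
    h, w = gray.shape
    G = nx.DiGraph()
    for i in range(h):
        for j in range(w):
            p = (i, j)
            G.add_edge("src", p, capacity=1.0 - gray[i, j])   # dark pixel -> likely nuclei
            G.add_edge(p, "sink", capacity=gray[i, j])        # bright pixel -> background
            for q in [(i + 1, j), (i, j + 1)]:                # 4-neighborhood smoothness
                if q[0] < h and q[1] < w:
                    wgt = float(np.exp(-beta * (gray[i, j] - gray[q]) ** 2))
                    G.add_edge(p, q, capacity=wgt)
                    G.add_edge(q, p, capacity=wgt)
    _, (reachable, _) = nx.minimum_cut(G, "src", "sink")
    mask = np.zeros((h, w), dtype=bool)
    for node in reachable:
        if node != "src":
            mask[node] = True
    return mask

gray = np.random.rand(32, 32)      # placeholder intensity patch in [0, 1]
print(graph_cut_segment(gray).sum())
```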
Al-Kofahi et al. [73] proposed graph based methods for initial binarization and further
refinement. The normalized image histogram is computed and a minimum error thresholding based on Poisson distribution is performed before applying the fast maxflow Graph cut
algorithm to obtain initial segmentation. To detect seeds, the scale normalized Laplacian of
Gaussian (LoG) filter response at multiple scales is computed. To overcome undersegmentation, maximum scale values are constrained using the Euclidean Distance map. They use
these seeds as nuclei centers in a local maximum clustering algorithm, and foreground pixels
are assigned to these centers forming clusters. A region adjacency graph coloring method
to divide large clusters in the initial segmentation into smaller groups of nuclei precedes the
α-expansion (algorithm to obtain multi-way cuts in a graph) to delineate nuclei boundaries
in clusters. This pipeline is illustrated in Fig. 2.9.
Daněk et al. [74] proposed a two-stage graph cut model where the first stage distinguishes
the foreground and background, and the second stage segments touching nuclei. Bilevel
histogram analysis is employed to generate initial weights to the edges of the graph. The
Figure 2.9: An example graph cuts based pipeline from [73].
centers of the two peaks depicting the nuclei and the background are considered as thresholds.
Weights of links of background voxels less than the threshold and foreground greater than
the threshold are given the value ∞. The rest of the voxels are not given any weight. The
background segmentation is obtained by performing the two terminal graph cut algorithm.
In the second stage, centers are identified from the nuclei clusters by calculating a distance
transform inside the cluster and using the maxima transform to find the peaks. The graph
weights in this stage are determined by the Euclidean distance from the nuclei centers to
the current voxel. Since the standard maxflow algorithms may not be used with multiple
nuclei, an iterative algorithm to find the best cut for label pairs is implemented. Strong
gradients in the nuclei centers are ignored by including nuclei shape a priori information
while performing graph cuts.
Zhang et al. [75] use an adaptive and local graph cuts approach. Preprocessing involves
converting the image from RGB to HSV space and extracting the V component, enhancing
it via linear stretching and removing noise using a median filter. They employ an adaptive
thresholding algorithm suggested by Sauvola et al. [84] using textural and intensity information to detect approximate nucleus regions. Each of these regions is refined using a Poisson
distribution based localized Graph Cuts with the help of boundary and regional information.
This approach improves the performance in the case of non-uniform chromatin distribution and
low contrast difference between nuclei and background. The segments with maximum overlap with the region obtained in the adaptive thresholding are retained, and an empirically
determined condition on roundness reduces computational time by determining the need for
further refinement.
2.3.1.1.5 K-Means Clustering
In K-means clustering based approaches, a value K is selected as the number of clusters, and cluster centers are chosen randomly or heuristically. Each pixel is assigned a cluster label based on the minimum distance between the pixel and the cluster centers. New cluster centers are computed using the new labels. This process is repeated until it converges or no changes occur.
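The following sketch clusters the pixels of an H&E patch in the a*, b* chromaticity plane with K-means and selects the darkest cluster as nuclei, along the lines of the clustering pipelines reviewed below; the file name and the choice of three clusters are illustrative.

```python
# Sketch of K-means color clustering for nuclei/background separation.
import numpy as np
from skimage import io, color
from sklearn.cluster import KMeans

rgb = io.imread("patch.png")                        # hypothetical H&E patch
lab = color.rgb2lab(rgb)
ab = lab[..., 1:].reshape(-1, 2)                    # chromaticity only, ignore lightness
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(ab)
labels = labels.reshape(rgb.shape[:2])
# pick the cluster with the lowest mean lightness (L*) as the nuclei cluster
nuclei_cluster = min(range(3), key=lambda k: lab[..., 0][labels == k].mean())
mask = labels == nuclei_cluster
```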
Sarrafzadeh et al. [76] proposed a K-means clustering algorithm integrated with a region
growing mechanism to segment nuclei. They apply a median filter to each of the image’s R,
G, and B components to remove noise and preserve the edges. The filtered image is converted
to the LAB color space to decouple the intensity and color bands. K-means clustering is
applied to the a and b channels, creating three clusters based on Euclidean distance. The
cluster with the minimum mean of RGB bands is detected as the nuclei. The opening and
hole filling operations follow to smoothen boundaries, eliminate false segments, and complete
segmented regions. To separate touching nuclei, connected components are identified as those
regions with an area greater than the area of cells. With the green component being the most
suitable for edge detection, the Sobel filter is applied to this component, and the detected
edges are superimposed on the binary map obtained from the previous step. Seed points for
region growing are determined by computing the center of mass after applying erosion on
the edge included map. From these seeds, the regions are expanded or grown until there are
two or more edge points in an 8-connected neighborhood of the central pixel. This results
in splitting connected nuclei, giving distinct nuclei in the final map.
Figure 2.10: An example K-Means Clustering based pipeline from [77].
Win et al. [77] proposed an approach along the same lines as [76]. To preprocess the images, in addition to median filtering, CLAHE is applied to enhance the contrast. They use the same K-means procedure with k set to three, followed by morphological processing to improve
the nuclei boundaries. Fig. 2.10 illustrates the flow chart of the adopted method. These two
methods differ in the procedure adopted to split overlapping nuclei. Distance transform is
applied, where the value of each pixel is replaced by its distance from the closest background
pixel. The seeds are assumed to be the darkest parts of each object, and the watershed is
performed using these seeds. Finally, ellipse fitting is done to generate smooth contours.
Chang et al. [78] proposed K-means clustering using morphological features. They extracted features from H&E stained images using Gabor filters of different orientations
and frequencies. The impulse responses from Gabor filters and other features, like intensities,
are stacked to form an n-dimensional feature space. Each pixel in the image is mapped to
a point in this feature space. The pixels are then enhanced using chosen features, and pixels with similar features are clustered using the K-means clustering algorithm. Cytological
profile, including features like nuclei shape, size, intensities, etc., may be used to eliminate
false positives and improve segmentation.
Fuzzy C-means (FCM) clustering is an extended, robust clustering-based segmentation
algorithm among all the fuzzy clustering methods. It offers the flexibility of allowing partial
memberships in clusters. The standard algorithm utilizes only intensity information and
hence is susceptible to artifacts. Including spatial information, however, tends to result in
poor segmentation. Saha et al. [79] proposed a spatial shape constrained FCM to segment nuclei. The input image is first complemented and binarized by subtracting the background, using the second peak from the image histogram as the threshold. A fuzzy partition matrix is
initialized before applying the spatial shape constraints. The circular shape function (CSF)
is defined from seeds identified using an adjacency graph. In each connected component,
nodes with maximum intensity are considered seed points. CSF is calculated for each pixel
in the image as a function of its spatial coordinates with respect to other seed points. This
value is then used to modify the pixel’s intensity, thus moderating which target cluster the
pixel belongs to. This CSF based FCM procedure is repeated until convergence. CSF thus
helps in differentiating pixels in spatially different locations with similar intensities. Features
like area, eccentricity, and circularity were used to remove irrelevant areas from the output,
followed by morphological processing to smoothen boundaries.
2.3.1.2 Self-Supervised
Although results from deep learning based methods were favorable, the requirement of large
amounts of data and their annotation efforts are a matter of concern. This led to the development of self-supervised learning based techniques. This class of methods is categorized
under unsupervised learning, considering these methods require no labels. All approaches in
this section have a two-stage pipeline consisting of a pretext task (pre-training) and a downstream task (fine-tuning). They are further classified into Domain Adaptation, Predictive
Learning, and Contrastive Learning. Table 2.2 summarizes the self-supervised methods.
Ref. | Dataset | Methods | Pre-Processing | Post-Processing
[85] | COCO, BBBC, Kumar, TNBC | Mask RCNN based domain adaptation using pseudo labeling | DARCNN pretrained with source dataset | -
[86] | 400 single WBC images split into two datasets of 300 and 100 images | Unsupervised initial segmentation with K-means followed by supervised refinement using an SVM classifier | Conversion to HSI color space | -
[87] | MoNuSeg | Attention network for scale identification with segmentation maps as auxiliary outputs | Tile extraction, stain normalization | Opening, closing, distance transform
[88] | MoNuSeg, TNBC | Multi-scale representation based self-supervised learning using U-Net | Cropping, resizing, ResNet-18 pretrained with zoomed-in and zoomed-out tiles | -
[89] | Kaggle DSB18, BUSIS, ISIC18, BraTS18 | Redundancy reduction based Barlow Twins U-Net | U-Net encoder pretrained with the Barlow Twins approach (Siamese net) | -
[90] | BBBC039V1, Kumar, TNBC | Cycle Consistency Panoptic Domain Adaptive Mask RCNN | Normalization, random sample cropping, removal of samples with fewer than 3 objects, complementing | -
[91] | MoNuSeg | Contrastive learning using scale-wise triplet loss and count ranking to pretrain the U-Net encoder | Anchor, positive and negative tile sampling | -
[92] | MoNuSeg, CoNSeP | Positive and negative patch based contrastive learning using FCN | - | -
Table 2.2: A summary of Self-Supervised nuclei segmentation methods.
2.3.1.2.1 Domain Adaptation
Publicly available fully annotated biomedical datasets are few, but there are plenty of completely labeled general-purpose datasets. As illustrated in Fig. 2.11, domain adaptation takes advantage of these large volumes of labeled data, called the source, in the pre-training stage to learn features. The relevant biomedical datasets, called the target, are then used in the second stage to fine-tune the performance of the model.
Figure 2.11: An example image from [85] illustrating the idea of domain adaptation. The
source data can vary from a common image set to labelled biomedical images.
Hsu et al. [85] proposed a Domain Adaptive Region-based CNN (DARCNN) that learns
object definition from a large annotated vision dataset COCO and adapts it to biomedical
datasets. The pipeline consists of two stages of feature level adaptation and pseudo labelling
at the image level. DARCNN utilizes source dataset weights for pretraining, and trains on
combined batches of the source and target datasets. The two step framework of the Mask
RCNN along with a domain separation module form the main structure of the DARCNN. The
large domain shift between the general vision dataset to the biomedical dataset is handled by
the domain separation module, which learns domain specific and domain invariant features
that are input to the mask segmentation and regional proposal networks respectively. The
domain specific features contain unconstrained embedding space in addition to information
about the discriminability of the source and target domains. On the other hand, the domain
invariant features contain information on objectness in the inputs from both domains. The
loss function used in DARCNN consists of four losses: Lsim representing domain invariant features, Ldiff representing domain specific features, Lsource representing the Mask RCNN losses used to train on the source dataset, and Ltarget representing the self supervised consistency
loss. This approach gives space for background variation within the dataset by assuming an
independent background consistency in each image. This is done with the help of the region
proposal network, which minimizes the variations in the representation of the background.
Ltarget, also known as the self supervised representation consistency loss, is responsible for
this minimization. The output of the first stage gives a coarse segmentation map, which
needs to be further refined to obtain image level supervised results. Image level supervision
is achieved in the second stage of the DARCNN, which trains only the target branch on the
pseudo labels with high confidence generated from the first stage. These pseudolabels are
strengthened with the help of augmentations, accounting for variations in illumination and
image quality. They also show the generalizability of DARCNN by adapting the model to
three diverse biomedical image datasets.
In [90], Liu et al. proposed a Cycle Consistent Panoptic Domain Adaptive Mask RCNN
(CyC-PDAM). They choose fluorescence microscopy images as their source domain and
synthesize H&E stained images using CycleGAN. The fluorescence microscopy images are
preprocessed to generate square patches of size 256x256. With CycleGAN on its own, the
synthesized images appear to have some undesirable nuclei that, in further tasks, tend to
be marked as background. An auxiliary task of nuclei inpainting is presented to remove
these unlabeled nuclei. From the synthesized image and its mask, an auxiliary mask with
all the unlabelled nuclei is generated. Using this, a fast marching based nuclei inpainting is
applied to replace these nuclei with unlabelled background pixels, thus eliminating all the
undesirable nuclei. Mask RCNN, used as the baseline model, is built using ResNet-101 and a
feature pyramid network (FPN). This Mask RCNN has domain bias in the semantic features,
as it focuses mainly on local features and lacks a global view. To introduce this panoptic
view, a semantic branch including a domain discriminator is appended to the FPN. This branch, in addition to the instance level segmentation branch, helps reduce cross domain discrepancies and produce domain invariant features. Finally, to decrease the bias towards
the source domain, a reweighted task specific loss is introduced. This network performs
better than its fully supervised equivalent on unseen datasets, proving the efficacy of the
domain invariant features that prevent the network from being influenced by dataset bias.
2.3.1.2.2 Predictive Learning This class of methods generates pseudo labels for the training set that are further refined in the next stage. Such an example is
shown in Fig. 2.12. If a deep learning based model is used, the weights obtained from the
pretraining stage are transferred to the second stage to finetune the segmentation model.
Zheng et al. [86] proposed a self supervised method with an unsupervised initial segmentation to generate pseudo labels, which are later used in a supervised refinement phase. The
first module in the pipeline performs background separation in the HSI color space with
the help of K-means clustering. Several oversegmented regions at the border of the image are removed. Regions with colors similar to the removed regions are also eliminated. The remaining regions are merged into the foreground, leading to touching or overlapping clumps.
In order to split these connected nuclei components, concave points are identified on the
contour of the clumps. The clumps are then iteratively split by connecting pairs of concave
points on the contour. The second module is responsible for fine tuning the results from the
first module, by a supervised classification approach. Features like RGB colors, topological
structure, and HSV based weak edge enhancement operator values are extracted for each
pixel. To speed up the classifier training, a cluster sampling technique, selecting representative pixels from the oversegmented regions in the background separation step, is applied.
The final step is to train an SVM classifier using the selected representative points.
Figure 2.12: An example image from [86] showing the use of pseudolabels generated from the first stage to train a classifier in the second stage.
This
trained classifier can then be used to classify the pixels of an image as the required region
of interest or the background.
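Because the pipeline of [86] is built from standard clustering and classification components, the two-stage idea can be sketched compactly. The snippet below is a minimal illustration only, not the authors' code: it assumes an RGB input, uses HSV as a stand-in for the HSI color space, and relies on scikit-learn for K-means and the SVM, omitting the concave-point splitting and the handcrafted edge features described above.

```python
# Minimal sketch (assumed illustration): unsupervised K-means separation of nuclei
# from background, followed by an SVM refined on the resulting pseudo labels.
import numpy as np
from skimage import color
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def kmeans_then_svm(rgb, n_train=5000, seed=0):
    hsv = color.rgb2hsv(rgb)                       # HSV used here in place of HSI
    pixels = hsv.reshape(-1, 3)
    km = KMeans(n_clusters=2, n_init=10, random_state=seed).fit(pixels)
    # Assume the cluster with the lower mean intensity corresponds to nuclei.
    nuclei_cluster = np.argmin(km.cluster_centers_[:, 2])
    pseudo = (km.labels_ == nuclei_cluster).astype(int)
    # Sample a subset of representative pixels to keep SVM training fast.
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(pixels), size=min(n_train, len(pixels)), replace=False)
    svm = SVC(kernel="rbf").fit(pixels[idx], pseudo[idx])
    return svm.predict(pixels).reshape(rgb.shape[:2])   # refined nuclei mask
```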
Sahasrabudhe et al. [87] proposed a method based on the assumption that the magnification level of an image can be determined by the texture and size of nuclei. This approach
shows that identifying the magnification or scale of the image acts as a self supervised signal
for nuclei location. The required segmentation maps are obtained as an auxiliary output in
this scale classification network. They use the concept that if a tile of nuclei can determine
the magnification level, its element wise multiplication with an attention map representing
the corresponding nuclei will also be able to determine the magnification level. A confidence
map is generated using a fully convolutional feature extractor, which is then activated by a
sigmoid function to generate the attention map. A sparsity regularizer is applied to this map
to focus attention on the input patch. Now, the elementwise multiplication of this attention
map and the original image tile is input to a scale classification network built on ResNet-34, which gives the scale classification probability. The entire network is trained end to end,
with the auxiliary output generating the nuclei segmentation map. A smoothness regularizer is applied to the attention maps to remove high frequency noise. In addition, to impose
semantic consistency on the feature extractor, transformation equivariance is described by
applying transformations like rotation, transpose, and flips. To obtain the final segmentation
output, the attention maps are subject to opening and closing operations. Distance transform is computed from this image, and local maxima are identified to locate seeds for the
marker controlled watershed that followed. From their observations, this model generalizes
well on unseen organs.
Based on a similar scale based approach, Ali et al. [88] proposed a multi-scale self-supervised model. Small patches from the whole slide images are extracted, and each patch
is again cropped and resized. These patches have images that are zoomed in or zoomed
out. The initial self supervised stage uses a ResNet-18 model, trained to classify these pairs
of patches as zoomed in or zoomed out. The second stage employs a U-Net architecture
with a ResNet-18 encoder and a Feature Pyramid Network decoder. The U-Net model is
trained with the weights transferred from the first stage to perform the actual segmentation
task. This model was fine tuned using the Adam optimizer with the Cross Entropy Loss.
The results of this approach support the effectiveness of transferred weights from the same
domain, as opposed to domain adaptation from general datasets like ImageNet.
Punn et al. [89] proposed a self-supervised framework, known as the BT-UNet, employing
the redundancy reduction based Barlow Twins approach to pretrain the encoder in the U-Net.
Two distorted images are generated from an image by introducing distortions like cropping or
rotation. The first stage comprises pretraining the U-Net encoder with the help of the Barlow
Twins strategy, followed by a projection network to obtain encoded feature representations.
The Barlow Twins approach uses a twin encoder and projector based Siamese net sharing
similar weights and parameters.
Figure 2.13: A contrastive learning example from [91] using three different patches generated from an image.
The feature maps generated by the encoder network lead to
feature representations by passing through blocks of global average pooling, fully connected
(FC) layers, ReLU activation, batch normalization, and a final FC layer. A cross correlation
matrix is computed from these representations. The model is then refined to make the cross
correlation matrix similar to an identity matrix with a Barlow Twins Loss function. In the
second stage, the weights learned by the twin encoders are transferred to the U-Net model
initializing the encoder, while the decoder is initialized with default weights. A limited
number of annotated samples are utilized to train this U-Net segmentation model, which
uses the average of the binary cross entropy loss and dice coefficient loss as the loss function.
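The core of this pretext stage is the Barlow Twins objective itself, which is publicly documented and compact enough to sketch. The PyTorch snippet below is illustrative; the weighting factor lambd and the normalization details are assumptions rather than values taken from [89].

```python
# Minimal sketch of the Barlow Twins loss used to pretrain a U-Net encoder:
# drive the cross-correlation matrix of two distorted views toward the identity.
import torch

def barlow_twins_loss(z1, z2, lambd=5e-3, eps=1e-6):
    # z1, z2: (N, D) projected features of two distorted views of the same batch
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + eps)
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + eps)
    n = z1.shape[0]
    c = (z1.T @ z2) / n                              # (D, D) cross-correlation matrix
    diag = torch.diagonal(c)
    on_diag = (diag - 1).pow(2).sum()                # pull diagonal entries toward 1
    off_diag = (c - torch.diag(diag)).pow(2).sum()   # push off-diagonal entries toward 0
    return on_diag + lambd * off_diag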
2.3.1.2.3 Contrastive Learning Contrastive learning creates positive patches and negative patches from an image, and a model learns attributes by contrasting the patches against
each other. This helps the model find the similarities and dissimilarities among the image
patches.
Xie et al. [91] proposed an instance-aware self supervised method involving scalewise
triplet learning and count ranking. This implicitly helps the network learn the nuclei size
and quantity information from the raw data. In triplet learning, three samples are generated
from the original input image. A random sample of a specific dimension is cropped from
the input image, called the anchor. Another sample of the same size is cropped from the
same image, called the positive image. These two samples will have identical nuclei sizes as
they are same-sized samples. To include nuclei size information, a negative patch, a sub-patch of the positive patch, is sampled and resized to the size of the anchor and positive images. This sub-patch is randomly sampled from a set of three sizes to introduce diversity.
The anchor, positive and negative samples form a triplet, used in this self supervised proxy
task shown in Fig. 2.13. While these patches implicitly account for the nuclei size, they
also account for nuclei quantity, as negative patches will always have a smaller number of
nuclei than the positive and anchor patches. This introduces another metric, the pairwise
count ranking for self supervised learning. The proxy task comprises three encoders with
shared weights trained on the count ranking loss and scalewise triplet loss. These encoders
aim to embed the features into a 128 dimensional feature space. Triplet learning focuses
on narrowing the distance between samples with similarly sized nuclei and enhances the
dissimilarity between the samples with differently sized nuclei. Count ranking enables the
network to identify large crowds of nuclei. Fine tuning for the segmentation task is done
using a U-Net with a three way classification, including the nuclei, nuclei boundary, and
background. ResNet-101 is the backbone of the encoder and is pretrained with the proxy
task. These weights are transferred to the U-Net encoder, while the decoder weights are
randomly initialized. Joint training on the two proxy tasks appears to substantially improve
the segmentation performance, compared to employing just one of the two.
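Both proxy objectives of [91] reduce to standard ranking-style losses, so they can be sketched in a few lines. The combination below is an assumed illustration: f_a, f_p and f_n denote embeddings of the anchor, positive, and resized negative patch, and c_pos and c_neg are scalar count predictions for the positive and negative patches.

```python
# Minimal sketch of the scale-wise triplet and count-ranking proxy losses.
import torch
import torch.nn.functional as F

def proxy_losses(f_a, f_p, f_n, c_pos, c_neg, margin=1.0):
    # Pull anchor and positive together, push the resized negative away.
    triplet = F.triplet_margin_loss(f_a, f_p, f_n, margin=margin)
    # The positive patch always contains more nuclei than the negative sub-patch,
    # so its count score should rank higher.
    count_rank = F.margin_ranking_loss(c_pos, c_neg,
                                       target=torch.ones_like(c_pos),
                                       margin=margin)
    return triplet + count_rank
```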
Boserup et al. [92] proposed a patch based contrastive learning network. A confidence network is used to predict a set of confidence maps for each image, representing the
confidence level that each pixel belongs to a particular class ’k’. The high confidence level
of an image representing class ’k’ implies that the image contains objects of a particular
class. Such a confidence map, implemented using a fully convolutional neural network, is
trained to distinguish between objects of different classes by contrastive learning. The selection of positive and negative patches is highly critical as they determine the performance
of the confidence network. Positive patches are those which are believed to have objects of
a specific class, and negative patches are those which do not have the object of that class.
To obtain positive and negative patches, an entropy based patch sampler is put to use. The
average patch entropy is defined as a function of the Bernoulli Random Variable of the confidence value of a pixel belonging to a particular class. Ideal choices of these patches would
correspond to higher certainty from the confidence network. From a set of patches sampled
from an unnormalized Bernoulli distribution for each class, positive and negative samples are
partitioned based on their confidence scores. The similarity between patches is calculated
from the pixelwise product of an image and its confidence map. Mean squared error and
mean cross entropy are the pixel based similarity measures employed for this purpose. This
scaling with the confidence map connects the gradients between the confidence network and
the sampling process, which aids in backpropagation for end-to-end training of the model.
This approach uses a combined loss, including the inter-class and intra-class contrastive
losses, to ensure distinct features are identified for each class, in addition to maximizing and
minimizing the similarity among positive and negative patches, respectively. The required
segmentation maps are obtained from the confidence maps of the network after convergence.
2.3.2 Supervised
Supervised learning methods require labels to train the model. Depending on the level of
supervision required, the approaches are classified into Full Supervision and Weak Supervision.
2.3.2.1 Full Supervision
Full Supervision refers to deep learning models that require labels for 100% of the training set to achieve good performance. A summary of the fully supervised methods is shown in Table 2.3.
Ref. | Dataset | Methods | Pre-Processing | Post-Processing
[93] | DSB2018, MoNuSeg | Region based Mask RCNN with Guided Anchor RPN | Resizing | Soft Non-Maximum Suppression
[94] | DSB2018 | Mask RCNN with ResNet-101 backbone | Pretrained with weights of COCO dataset | Clump identification followed by marker controlled watershed
[95] | CoNSeP, Kumar, TNBC, CPM-15, CPM-17, CRCHisto | HoVerNet: three branch U-Net with horizontal and vertical distance maps to separate nuclei clusters | Patch extraction | Gradient based marker controlled watershed from distance map
[96] | Kidney dataset, TNBC, MoNuSeg | High resolution wide and deep transferred ASPPU-Net | Patch extraction, data augmentation | -
[97] | MoNuSeg, TNBC, CryoNuSeg, BBBC039V1 | Dense ResU-Net with residual connections of atrous blocks | Color normalization, patch extraction, data augmentation | -
[98] | Post-NAT-BRCA, MoNuSeg | Cascaded U-Net framework (U-Net with weighted pixel loss followed by vanilla U-Net with a soft Dice loss) | Zero padding, patch extraction, weighted mask generation | Erosion, dilation, reconstruction
[99] | MoNuSeg | Enhanced lightweight U-Net with generalized Dice loss | Stain normalization, resizing, patch extraction, data augmentation | Opening
[100] | DSB2018, BBBC006v1, BBBC039, PanNuke | CPP-Net with Context Enhancement, Confidence Based Weighting and Shape Aware Perceptual Loss | - | Semantic segmentation decoder, NMS, reassignment of pixels to correct categories
[101] | Kumar dataset | Ternary CNN with boundary class | Color normalization, (boundary annotation, pixel mapping in training stage), patch extraction | Seed detection by thresholding followed by region growing using boundary class
[19] | 224 H&E stained images of ganglion cells from pediatric intestine | Boundary Enhanced U-Net with two decoders | Downsampling, random cropping, data augmentation | -
[21] | MoNuSeg | Contour Aware Information Aggregation Network with information aggregation modules between two decoders | Stain normalization, data augmentation | Nuclei and contour outputs subtracted, connected component identification
[102] | Kumar, CPM-17 | Boundary assisted Region Proposal Network | Stain normalization, normalization, data augmentation | -
[103] | HUSTS, MoNuSeg, CoNSeP, CPM-17 | Region Enhanced multitask U-Net with auxiliary tasks of rough segmentation and contour extraction | Patch extraction, data augmentation | Marker controlled watershed
[104] | 150 3D abdominal CT scans, 82 3D pancreatic CT scans | Attention Gated U-Net | Downsampling, data augmentation | -
[105] | KMC Liver dataset, Kumar dataset | NucleiSegNet (U-Net based) with robust residual blocks in encoder, bottleneck block, attention decoder block | Resizing, patch extraction, no pretraining | -
[106] | ClusteredCell, MoNuSeg, CoNSeP, CPM-17 | Gating Context Aware Pooling integrated modified U-Net | Resizing, patch extraction, data augmentation, ImageNet pretrained ResNet-34 | -
[20] | DSB2018, MoNuSeg | U-Net based Convolutional Blur Attention Network | Patch extraction, training data generation, biorthogonal wavelet denoising | -
Table 2.3: A summary of fully supervised nuclei segmentation methods.
Figure 2.14: An example Mask RCNN based nuclei segmentation framework from [94] called NucleiNet.
2.3.2.1.1 CNN Based Methods This section discusses some landmark CNN based
works. In addition to the conventional CNN, the advent of Region based CNNs led to the
Mask RCNN, designed to predict object masks in addition to bounding boxes. Such a Mask
RCNN based framework is shown in Fig. 2.14.
One of the initial CNN based nuclei segmentation methods was proposed by Xing et
al. [107]. They propose a supervised deep CNN to generate a probability map that assigns
each pixel a probability of how close it is to the nucleus center. The CNN is trained with
images in the YUV space. Each image is manually annotated for nuclei centers, and patches whose centers lie within a radius of 4 pixels from these annotated centers are considered positive, and the others negative. Rotation invariance is achieved by rotating the positive patches before
training, thereby augmenting the training data. The CNN uses softmax with two neurons as
its final layer to generate the probability of each patch being positive or negative. Patches
with a positive probability of less than 0.5 are eliminated from further processing. Additionally, a region size threshold is employed to eliminate patches with small areas that could
indicate noise. From the probability maps, the distance transform and H-minima transform
are applied to generate minima as markers for an iterative region growing algorithm. A
smoothing operation is performed to preserve the shapes of nuclei for the next step. These
initial shapes are used in a selection based dictionary learning to generate a shape repository
representing a subset of the nuclei. This method works by minimizing the ISE (integrated
square error) based on a sparse constraint. To account for the wide variability in shapes, the
different shapes are clustered into groups using K-means, and a shape-prior model is learned
for each of these groups. They then alternate between shape deformation and shape inference algorithms to perform the segmentation. The shape deformation is implemented
using a Chan-Vese model [82] incorporated with an edge detector and a repulsive term to introduce robustness and split touching nuclei. The shape prior model is used to perform shape
inferencing, thus allowing the contours to iteratively evolve towards the nuclei boundaries.
This approach ensures that the contours don’t split or merge due to any heterogeneities in
intensities as opposed to the level sets in Sec. 2.3.1.1.3.
Liang et al. [93] proposed a region based CNN, employing a guided anchored region
proposal network (RPN). This network uses a Mask RCNN and FPN as its baseline. Rather
than applying the conventional, dense, predefined anchors, guided anchoring is used in the
dynamic prediction of anchors with different shapes and sizes. The GA-RPN module consists
of two branches responsible for location and shape prediction, respectively. This module
generates multi level anchors, collecting anchors from multiple feature maps generated at
different levels by the FPN. An Intersection over Union (IoU) branch is designed to regress the IoU between the ground truth and the predicted bounding box. Generally, Mask RCNN uses non-maximum suppression (NMS) to order the boxes by classification score. However,
this approach may eliminate certain boxes of low classification scores with high quality while
preserving some false positives. The IoU module overcomes this by introducing the IoU
regression score. A new metric called the Fusioned Box Score (FBS), the geometric mean
of the classification score and the IoU score, is used to classify the boxes into their correct
classes. NMS with a low threshold may cause an increase in the miss-detection rate as it
can classify clustered nuclei as one object. To overcome this limitation, they propose a soft
NMS that decays the FBS of boxes that have a significant overlap with the box with the
highest FBS. This would penalize boxes close to one with the largest FBS more than the
boxes farther away. The results from this method show fewer undetected nuclei compared
to other SOTA nuclei segmentation methods.
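Since the Fusioned Box Score is defined as the geometric mean of the classification and IoU scores, the scoring and the soft-NMS decay can be written in a few lines. The Gaussian decay form below is an assumption for illustration; the description above only states that overlapping boxes have their FBS decayed.

```python
# Minimal sketch of the Fusioned Box Score (FBS) and a soft-NMS style decay.
import numpy as np

def fusioned_box_score(cls_score, iou_score):
    # Geometric mean of classification confidence and regressed IoU.
    return np.sqrt(cls_score * iou_score)

def soft_nms_decay(fbs, iou_with_top, sigma=0.5):
    # Boxes overlapping heavily with the current top-scoring box are penalized most.
    return fbs * np.exp(-(iou_with_top ** 2) / sigma)
```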
Roy et al. [94] proposed the Nuclei-Net, a Mask RCNN based multistage network. The
first stage employs a Mask RCNN with a ResNet-101 backbone, transfer learned from the
COCO dataset. An RPN follows the backbone network, using the sliding window method
to scan the feature maps from the previous step to generate anchor boxes of reasonable
size and aspect ratios. On extracting the proposals after the application of ROI Align,
an FCN is applied to calculate the offsets in each box. These offsets are used to refine
the originally obtained proposals, which are then classified as nuclei or background using a
classification head. Four consecutive convolution layers are applied to each feature map from
a bounding box to generate the binary feature maps. The model is trained using two losses:
a classification loss based on the cross entropy loss and a mask loss based on binary cross
entropy loss. Most of the regions in this coarsely refined segmentation map have well defined
boundaries, except for a few complex clumps. Such regions are identified using an area based
threshold to perform further refinement. In the second stage, marker controlled watershed is
performed to retrieve individual nuclei from the clumps.
Figure 2.15: An example U-Net architecture from [96] with an atrous spatial pyramid pooling bottleneck block.
They propose to generate markers
from these clumps by first identifying boundary points in each clump and initializing them
as potential markers. An iterative algorithm is used to reach the markers from the boundary
points by removing points with a distance less than a heuristically determined threshold from
a specific boundary point. This stage splits any connected nuclei clumps from the previous
stage to obtain individual nuclei.
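The clump-splitting step is a fairly standard marker-controlled watershed, sketched below with scikit-image. The marker selection here uses distance-transform peaks rather than the boundary-point iteration described above, so it should be read as an illustrative stand-in, not the authors' exact procedure.

```python
# Minimal sketch: split a binary nuclei clump into instances with a
# marker-controlled watershed on the distance transform.
import numpy as np
from scipy import ndimage as ndi
from skimage.feature import peak_local_max
from skimage.segmentation import watershed

def split_clump(binary_mask, min_distance=5):
    binary_mask = binary_mask.astype(bool)
    dist = ndi.distance_transform_edt(binary_mask)
    peaks = peak_local_max(dist, min_distance=min_distance,
                           labels=binary_mask.astype(int))
    markers = np.zeros(binary_mask.shape, dtype=int)
    markers[tuple(peaks.T)] = np.arange(1, len(peaks) + 1)   # one seed per nucleus
    return watershed(-dist, markers, mask=binary_mask)
```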
2.3.2.1.2 U-Net like Methods U-Nets were initially designed for biomedical image
segmentation [18]. They are similar to encoder-decoder architectures, except that there are
skip connections between the encoder and decoder, allowing coupling between the two. This
allows the transfer of some low level features from the encoder to the high level stages of the
decoder, enriching the segmentation result.
Graham et al. [95] proposed a novel encoder-decoder framework called HoVerNet, with
two decoder branches for nuclei segmentation leveraging features encoded by the horizontal
and vertical distances from the centers of mass. This network employs the preactivated
ResNet-50 to extract a powerful feature set. To ensure minimum loss of information in
the initial stages, the downsampling factor is reduced to 8 from 32. Following the feature
extraction are the two nearest neighbor upsampling branches, the nuclear pixel and HoVer
branches. The nuclear pixel branch determines whether a pixel represents the nuclei or the
background. The HoVer branch is responsible for determining the horizontal and vertical
distances from the centers of mass of nuclei, and hence splits touching nuclei. These branches
consist of upsampling units followed by multiple stacked dense units. Convolution layers in
between the upsampling stages help in improving predictions at the boundaries. The loss
function concerned with each branch is a combination of two individual losses. For the
nuclear pixel branch, the cross entropy loss and dice loss are added. The HoVer branch
loss comprises the mean squared error between the ground truth and the predicted horizontal and
vertical distances. In addition, it includes the mean squared error of the gradients from the
horizontal and vertical maps and their respective ground truths. From the horizontal and
vertical distance maps, a significant difference was observed between the pixels of different
instances. Computing gradients can shed light on where the nuclei boundaries exist. Finally,
a marker controlled watershed, with the help of the calculated gradient information, can help
split the touching or overlapping nuclei. Though trained on images from a single tissue, this
network exhibits generalizability when tested on diverse tissue samples.
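The regression targets of the HoVer branch can be generated directly from an instance label map. The sketch below follows the published description (per-nucleus horizontal and vertical offsets from the center of mass, normalized per instance); the exact normalization used in the released code may differ.

```python
# Minimal sketch of HoVer-style horizontal/vertical distance-map targets.
import numpy as np
from scipy import ndimage as ndi

def hover_maps(instance_map):
    h_map = np.zeros(instance_map.shape, dtype=np.float32)
    v_map = np.zeros(instance_map.shape, dtype=np.float32)
    for inst_id in np.unique(instance_map):
        if inst_id == 0:                      # skip background
            continue
        mask = instance_map == inst_id
        cy, cx = ndi.center_of_mass(mask)
        ys, xs = np.nonzero(mask)
        dx, dy = xs - cx, ys - cy
        h_map[ys, xs] = dx / (np.abs(dx).max() + 1e-6)   # horizontal offsets in [-1, 1]
        v_map[ys, xs] = dy / (np.abs(dy).max() + 1e-6)   # vertical offsets in [-1, 1]
    return h_map, v_map
```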
Chanchal et al. [96] proposed a high resolution deep and wide transferred ASPPU-Net, consisting of an atrous spatial pyramid pooling (ASPP) bottleneck module amidst
an encoder-decoder architecture (see Fig. 2.15). The high resolution encoder has four levels
of convolution layers followed by max pooling layers. A residual connection from each layer
to its corresponding layer in the main network minimizes losses occurring during pooling.
The powerful decoder concatenates features at a similar level to extract residual information.
The performance of the network is improved by introducing the ASPP bottleneck with a
multiple dilation rate CNN. Dilation rate helps visualize larger areas, and applying multiple
dilation rates in one layer extracts multilevel features. The addition of the ASPP bottleneck aids in extracting more relevant features. This model achieves excellent performance
by not producing any false positives and extracting maximum information. Nevertheless,
overlapping nuclei and blurry boundaries still pose a challenge.
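The ASPP bottleneck itself is a standard building block; a minimal PyTorch version is given below for orientation. The dilation rates and channel widths are assumptions, not the values used in [96].

```python
# Minimal sketch of an atrous spatial pyramid pooling (ASPP) bottleneck.
import torch
import torch.nn as nn

class ASPP(nn.Module):
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, x):
        # Parallel dilated convolutions see progressively larger contexts;
        # their multi-scale features are then fused by a 1x1 convolution.
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))
```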
In [97], Kiran et al. proposed the DenseResU-Net that employs dense units in the higher
layers of the encoder focusing on relevant features from the previous layers. The input
H&E stained images are preprocessed using color deconvolution. This helps reduce the
detection of false positives and gives a better picture of nuclei and boundaries for further
segmentation. Cropping, flipping, rotation, and other basic augmentation operations are
performed to enhance the generalizability of the model on unseen organs. Distance mapping
is applied to detect nuclei and binary thresholding at the value of 0.4 is used to obtain contour
information. DenseResU-Net comprises a five stage architecture, with five dense blocks in
the final layer of the contracting path, which helps in preventing the model from learning
redundant features. These dense blocks give rise to high computational efficiency. Since
skip connections between the encoder and decoder cause semantic gaps, residual connections
using the atrous block and non-linear operations are implemented similar to [96], extracting
spatial features. The decoder retrieves the segmented output by reconstructing the feature
maps through upsampling. This model shows excellent performance on images from different
organs, proving its robust nature and generalizing ability.
Saednia et al. [98] proposed a cascaded U-Net framework, with a weighted U-Net followed
by a vanilla U-Net with the VGG-16 baseline trained using the soft Dice loss. The model
was pretrained using the public Post-NAT-BRCA dataset, prior to training on the MoNuSeg
dataset (elaborated in Sec. 5.4.1). Weighted masks are generated from each image using
the binary mask such that pixels between adjacent nuclei are given larger weights. These
weights ensure the model learns the separations between nuclei in such regions when the weighted loss function is applied. The weight maps are employed while calculating the loss to penalize errors at boundary regions between touching or overlapping
nuclei. A weighted cross entropy loss is used to train the weighted U-Net model. This model
was trained with the input images, their binary and weighted masks to generate an output
probability map. These probability masks and binary masks are input to the vanilla U-Net.
The second stage is implemented with a VGG-16 backbone to reduce the number of parameters, thus promoting generalizability. The soft Dice loss function is selected to penalize the
network for predicting nuclei with a low confidence level. The final segmentation maps from
the cascaded network were post processed using morphological operations to remove small
noisy structures. The second stage accounts for some parts of nuclei that were missed in the
first stage, especially small nuclei and centers of large nuclei. This cascaded model performs
on par with most deep learning based segmentation models. Accurate boundary detection
still remains a challenge.
Hancer et al. [99] proposed an imbalance aware method for nuclei segmentation using a
lightweight enhanced U-Net model. They apply Macenko’s stain normalization technique by
obtaining optical density vectors from the RGB image and performing singular value decomposition to get accurate stain vectors. To resize the high resolution images without losing
any pixels, nearest neighbor interpolation technique is implemented. The next step is data
augmentation, where techniques like rotation, reflection, and translation are used to increase
the number of training samples. Class imbalance may impact the model training, leading
to a biased model. Loss functions have been the common solution to such challenges and
in this work, they incorporate the generalized Dice loss to account for the class imbalance.
This loss involves per-class weight as the inverse square of the class volume. The generalized
Dice loss is used to train a lightweight U-Net model, with a depth of three layers as opposed
to the original four layer U-Net. In addition, the final layer of the network uses a Dice
pixel classification layer that assigns a categorical label to each pixel. Finally, morphological
operations are performed to refine the segmentation maps.
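The generalized Dice loss with inverse-square class-volume weights has a standard closed form, sketched below in PyTorch for a one-hot target; the epsilon terms are numerical-stability assumptions.

```python
# Minimal sketch of the generalized Dice loss with inverse-square class weights.
import torch

def generalized_dice_loss(probs, target_onehot, eps=1e-6):
    # probs, target_onehot: (N, C, H, W) probabilities and one-hot ground truth
    dims = (0, 2, 3)
    w = 1.0 / (target_onehot.sum(dims) ** 2 + eps)       # (C,) per-class weights
    intersect = (probs * target_onehot).sum(dims)
    denom = (probs + target_onehot).sum(dims)
    return 1.0 - 2.0 * (w * intersect).sum() / ((w * denom).sum() + eps)
```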
Chen et al. [100] proposed the Context-Aware Polygon Proposal Network (CPP-Net) with
a U-Net backbone. They make use of polygons to represent nuclei, which can help the task
of differentiating between nuclei that touch or overlap. Following the U-Net are three unit
sized convolutional layers to predict the distance maps, confidence maps, and the centroid
probability map. The next step is the Context Enhancement Module that samples a set of
points from an initial point toward its predicted boundary. The multiple predicted distances
are then merged to update the distance between the pixel and its boundary. To perform
this merging adaptively, they use a Confidence Based Weighting Scheme with the help of
confidence maps. In addition to the BCE loss for centroid probability and weighted L1 loss
for the distance regression, CPP-Net includes a Shape Aware Perceptual loss that penalizes
the difference in shape between predicted and ground truth instances. As a part of the fine
grained post processing, a semantic segmentation decoder attached to CPP-Net’s encoder
identifies the foreground pixels. NMS converts each polygon to a mask and certain pixels are
reassigned to their correct categories. This step helps refine boundaries, thus improving the
quality of segmentation. One limitation of this approach is that irregularly shaped nuclei
may not be efficiently represented by a polygon, and CPP-Net may fail in such cases.
2.3.2.1.3 Contour Aware From error analysis, it was observed that a majority of the
misclassifications were from regions around the boundary of nuclei. This was due to the
presence of overlapping, touching nuclei in dense clusters. With the idea of giving importance
to contours, several approaches introduced a way to focus especially on the nuclei boundaries
as shown in Fig. 2.16.
Kumar et al. [101] proposed a CNN that produced a ternary map, as opposed to the
binary map distinguishing between nuclei and background. This method classifies each
pixel into three categories, namely, nuclei, nuclei boundary, and background. The three-way classification is designed to identify nuclear boundaries even among dense clusters and
chromatin sparse nuclei. The optical density vector is first obtained from the H&E stained
image by applying Beer Lambert Transform. Sparse non-negative matrix factorization or
SNMF is applied to this vector to generate two sets of matrices, one contains stain density
maps, and the other contains the optical density components of each prototype, i.e., the H&E prototypes.
Figure 2.16: An example of the contour aware CIA-Net from [21] that uses a U-Net architecture with two decoding paths dedicated to nuclei and contour decoding respectively.
To obtain the color normalized image, the stain density map is multiplied by
the basis matrix for its corresponding color prototype and inverse Beer Lambert Transform
is performed. The normalization procedure helps in reducing the contrast among nuclei in
different images while preserving the nuclei to background contrast. The constructed CNN
architecture consisted of three convolution layers, with max pooling layers between them and
activated by ReLU activation. These layers were followed by two FC layers and an output
layer with three nodes and softmax activation. To obtain the final segmentation map, the
nuclei body probability map was thresholded at 0.5. This operation provided seeds for a
region growing mechanism. As the seeds are grown, the average boundary class probability
increases, and the average nuclei probability decreases. The regions are grown to obtain the
segmentation map until the boundary class probability reaches a local maxima as long as
it doesn’t interfere with other nuclei or boundaries. This gives rise to an anisotropic region
growing method, with the regions growing at different rates and directions, leading to non-circular shapes. Including boundary supervision is shown to improve detection accuracy
and has a slight edge with segmenting chromatin sparse nuclei.
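Generating a ternary training target of this kind is straightforward when an instance label map is available; a sketch with scikit-image is given below. The boundary thickness is an assumption, and the original work derives its labels from annotated boundaries rather than from such a derived map.

```python
# Minimal sketch: derive a three-class (background / nucleus / boundary) target
# from an instance label map.
import numpy as np
from scipy import ndimage as ndi
from skimage.segmentation import find_boundaries

def ternary_map(instance_map, boundary_width=2):
    boundary = find_boundaries(instance_map, mode="inner")
    if boundary_width > 1:
        boundary = ndi.binary_dilation(boundary, iterations=boundary_width - 1)
    ternary = np.zeros(instance_map.shape, dtype=np.uint8)   # 0 = background
    ternary[instance_map > 0] = 1                            # 1 = nucleus body
    ternary[boundary] = 2                                    # 2 = nucleus boundary
    return ternary
```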
Oda et al. [19] proposed a U-net based network with two decoding paths, and special
emphasis on the boundaries called Boundary Enhanced Segmentation Net or BESNet. The
first decoding path focuses on boundary prediction and is trained on boundary labels. The
second path (main decoding path - MDP) utilizes the responses from the boundary decoding
path to weigh the training loss for segmentation adaptively. Specifically, information on the
difficulty of determining boundaries is combined with the MDP. The input image is fed into
the encoder path to generate feature maps. Feature maps from the boundary and main
decoding paths are concatenated. The boundary decoding path is trained using the cross
entropy loss. The output in this path will be high at boundary regions but deteriorate at
unclear regions, meaning that such regions have a higher training difficulty. These regions
are given more importance in the MDP by the adaptively weighted Boundary Enhanced
Cross Entropy (BECE) Loss. The additional decoding path, however, adds computational
burden on the system compared to other U-Net based networks. Though this network
uses boundary information to gain insight into the entire nuclei body, it doesn’t leverage
the nuclei information to learn about the boundary. Since contours have greater intra-variability, networks may benefit from mutual information between the nuclei and contours,
thus improving the prediction performance.
To leverage the advantages of mutual dependencies between nuclei and their boundaries,
Zhou et al. [21] proposed the fully convolution Contour Aware Information Aggregation Net
(CIA Net). The U-Net based design has a densely connected encoder using the FPN for feature extraction. The encoder is built using four dense modules stacked hierarchically, with
transition modules following each dense module. To take advantage of multi scale features
as in an FPN, CIA-Net proposes lateral connections at each level between the encoder and
decoders. Local and textural information from the initial layers is summed with the more
robust semantic features from the upsampled layers in the decoder. This network uses two
decoders, one for nuclei and the other for contours, with multilevel information aggregation
modules (IAMs) between them. The IAM helps in bidirectional task specific feature aggregation, taking important features from both decoders as cues to refine segmentation details
in the nuclei and contours. In the decoders, bilinear interpolation is used to upsample the
feature maps and add to the feature maps from the lateral connections of the encoder. The
IAM smoothens these maps and eliminates the grid effect. These features are then passed
on to the classifier to determine score maps. The complementary task specific features are
concatenated for refinement in the subsequent iteration. Noisy and inaccurate labels can
lead to an overfitted model and prevent the learning of essential features. Generally, such
outliers tend to have a low prediction probability and result in large errors. A Smooth Truncated Loss is proposed, which reduces the effect of outliers with greater impact for a lower
prediction probability. This helps alleviate oversegmentation, helping the network focus on
areas with high confidence scores, thus learning more informative features. A Soft Dice Loss
is also included in the total loss function along with the Smooth Truncated Loss and a weight
decay term to incorporate shape similarity among the nuclei regions. Exploiting the high
relevance between the nuclei and contours improves the generalizing ability of the model to
unseen data. However, CIA-Net suffers from false negatives in cases of low contrast between
the nuclei and background.
Chen et al. [102] proposed a two-stage boundary-assisted region proposal network (BRP-Net), with the first stage proposing possible instances based on boundary detection and
the second stage performing proposal-wise segmentation. The first stage consists of the
Task Aware Feature Encoder (TAFE), which extracts high-quality features for semantic
segmentation and instance contour detection. In this stage, a backbone encoder extracts
feature maps of four different sizes from the original input image. These maps are split
into segmentation features and boundary features and input to two task-specific encoders
that are deeply supervised, similar to CIA-Net [21]. Feature Fusion Models (FFM) based
on the IAMs from CIA-Net are devised to aggregate the features from both the task specific
encoders. The outputs of the FFM are fed into decoders to perform semantic segmentation
and instance boundary detection, respectively. Since TAFE requires postprocessing based on
handcrafted hyperparameters, a second stage was introduced to make BRP-Net more robust.
A square patch containing the region proposals of different sizes is cropped and classified
into two groups using a length threshold. The images in these two groups are resized to
a specific size and trained separately using similar networks with dense blocks, which take
the image patch and the two probability maps from the previous stage as input. IoU scores
are used to train the networks by comparing the image patch with its corresponding ground
truth. Patches with an IoU score lower than a threshold are considered to be false positives
and inferred as background.
In a similar approach leveraging boundary and nuclei features, Qin et al. [103] proposed
a region enhanced segmentation network by combining three U-Nets in serial and parallel
to form a multi-task architecture (REU-Net). The model uses the attention U-Net as the
baseline, with three U-Net like branches to perform the auxiliary tasks of contour extraction
and rough segmentation and the main task of fine nuclei segmentation. The predicted results
of the auxiliary branches are integrated and multiplied by elements enhancing the saliency
of nuclei along with their contours. This region enhances the image, and the original image
is fed into the encoder of the fine segmentation branch. The encoded features from all three
branches are concatenated and input to the fine segmentation decoder through attention
gates to aggregate the spatial and textural features of nuclei and contours. The attention
gates diminish the semantic gap between the three branches, removing background elements
and providing essential target features to the network. An atrous spatial pooling pyramid
(ASPP) structure is used to retrieve a rich set of spatial features by capturing multiscale
information to prevent the loss of vital spatial information while encoding. The loss for each
branch is computed as a combination of the dice and cross entropy losses. The loss function
for the entire network is calculated to be the weighted sum of the losses from each branch,
with the fine segmentation branch having double the weight of the auxiliary branches.
Figure 2.17: An example of an architecture using gated attention for U-Net, from [104].
The
results of this approach serve as evidence for the mutual contribution of contour and nuclei
information toward segmentation performance.
2.3.2.1.4 Attention Gated Attention gates are used in segmentation architectures to
focus on the important features and suppress irrelevant features like the background being
included in the training.
Schlemper et al. [104] proposed an attention gated U-Net model (see Fig. 2.17). False
positive removal has always required a post processing or multi-stage segmentation approach
in many methods. Attention gating attempts to avoid the need for separate removal methods
by eliminating the sources of such errors while training. Since additive attention has been
found to perform well on high dimensional inputs, the gating coefficient is obtained using
this approach. Each global feature vector and activation map are considered at each level
to identify the most relevant features for the specific task. Though the attention mechanism
doesn’t require a separate loss function for optimization, deep supervision appears to allow
the feature maps to be discriminative at different scales. In a U-Net model, gating is implemented before concatenating the features between the encoder and decoder paths as a
part of the skip connections to combine only the important activations. The attention gates
filter activations in forward and backward passes and suppress the irrelevant background
information in the backward pass. They use the sigmoid activation function to normalize
the attention gated features, resulting in improved training convergence. To account for
class imbalance, this method uses the Sorensen Dice loss. They also use a loss term for each
scale to ensure the model attends at each scale.
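An additive attention gate of this kind can be written as a small module; the PyTorch sketch below is illustrative only, with channel sizes left as parameters and the gating signal assumed to be resized to the skip connection beforehand.

```python
# Minimal sketch of an additive attention gate applied to a skip connection x
# under a coarser gating signal g.
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    def __init__(self, x_ch, g_ch, inter_ch):
        super().__init__()
        self.theta_x = nn.Conv2d(x_ch, inter_ch, kernel_size=1)
        self.phi_g = nn.Conv2d(g_ch, inter_ch, kernel_size=1)
        self.psi = nn.Conv2d(inter_ch, 1, kernel_size=1)

    def forward(self, x, g):
        # g is assumed to already match x's spatial size.
        att = torch.sigmoid(self.psi(torch.relu(self.theta_x(x) + self.phi_g(g))))
        return x * att      # suppress activations outside the attended regions
```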
A three block NucleiSegNet was proposed by Lal et al. [105], containing a robust residual
encoder, an attention based decoder, and a bottleneck block. High level semantic feature
extraction is performed by the robust residual block with depth wise and point wise convolution in separable blocks. Four such blocks constitute the encoder. The bottleneck block,
with three convolution layers, followed the encoder and contributed toward achieving the
best training loss. This block ensured the compressed encoding of the global information
from all relevant regions, simplifying the job of the decoder. Features from the encoder and
the compressed features from the bottleneck block were merged and input into attention
gates in the decoder. In the attention gate, the gating signal was upsampled, as opposed
to downsampling the skip connections. They applied a multiplicative attention gate, owing
to its memory efficiency and faster computations. The decoder has four upsampling stages
with transpose convolutions within each attention block to decrease the model parameters
without affecting accuracy. Such an attention mechanism helps in reorganizing fine features,
and removing background information. A combined loss function of dice loss and Jaccard
loss was implemented to train the model. Overall, this method performs on par with SOTA
methods, with fewer network parameters.
Yang et al. [106] proposed a U-Net based gating context aware pooling network (GCP-Net). GCP-Net uses an ImageNet pretrained ResNet-34 encoder block. This is followed
by the GCP module that functions as a context extractor, generating high-level semantic
features. The GCP module consists of Multi scale Context Gating Residual (MCGR) block,
Global Context Attention (GCA) block, and the Multikernel Maxpooling Residual (MMR)
block. Context gating (CG) transforms the input feature representation into a new representation with a powerful discriminant capability. To improve on the limited receptive field of
CG, they propose the MCGR block with three parallel branches of depth wise
convolution, producing weighted feature maps of different resolutions. The input feature
map and the weighted feature maps are merged to retrieve multiscale information. Contextual information is proven to improve segmentation results by increasing the size of the
receptive field and using attention blocks. The GCA block reweights features to enhance
the network’s sensitivity to essential information, thus improving performance. In contrast
to a single pooling kernel in maxpooling, the MMR block incorporates pooling kernels with
four different sizes to capture features with a range of receptive fields, as it can influence
the amount of context information used. The decoder module includes four decoder blocks
to retrieve the features extracted in the previous stages. Each decoder block consists of two
GCA residual blocks, after which the features are concatenated with the information from
the skip connection. To overcome the vanishing gradient problem in deep networks, the
GCA residual block uses a shortcut connection between layers, forcing the network to learn
essential features in each feature map and suppress the irrelevant ones. The performance of
this network is comparable to SOTA deep learning frameworks.
Thi Le et al. [20] proposed the Convolutional Blur Attention (CBA) network, a SOTA
approach, and the best performing so far. CBA net is pretrained first, followed by finetuning
on the training set. The network consists of the blur attention module and blur pooling
operation to retain important features and prevent the addition of noise in the encoding
or downsampling process. Initially, the RGB images are converted to grayscale and scaled
down. Traditional downsampling algorithms use max pooling, which can cause a loss of
features in the early stages. Max pooling is replaced by the blur attention module and blur
pooling to improve the segmentation output. This network uses blur convolutional layers
with stride 1. The blur pooling operation was an attempt to enforce shift invariance, where
any shifts in the input have minimum impact on the output. The blur attention module
consists of channel and spatial blur attention. Channel blur attention accumulates spatial
information by average and blur pooling to learn spatial statistics and object features. In
contrast, spatial blur attention gets channel information from average and blur pooling on
the channel axis. Spatially vital regions are identified with the help of a convolution layer
that filters the pooling results. Due to the loss of information in the upsampling stage,
they propose auxiliary connections between the downsampling and upsampling stages to
obtain input features by performing convolutions of different strides while upsampling. The
original size features are concatenated with these convolved features to provide generous
features to the decoder. In addition, they employ a pyramid blur pooling module to extract
multiscale information and identify highly correlated neighboring features. This model is
computationally efficient, with parameters multiple times fewer than several SOTA models.
2.3.2.2 Weak Supervision
Deep learning methods require the availability of large annotated datasets for rich performance. However, with biomedical datasets, it is highly painstaking to manually label such
large amounts of data, in addition to inter-observer variability, leading to inaccuracies in
certain areas. To alleviate such a tedious task, certain weakly supervised methods have been
proposed that require only a subset of the training data to train the model. This subset
can refer to only point annotations, i.e., one label per nucleus, or a minimal percentage of the
training set. A brief summary of the weakly supervised methods is presented in Table 2.4.
Ref. | Dataset | Methods | Pre-Processing | Post-Processing
[108] | MoNuSeg, TNBC | ResNet-50 backbone segmentation net supervised by auxiliary PseudoEdgeNet | Label assignment - Voronoi and distance transform | Thresholding
[24] | Lung Cancer Dataset, Kumar Dataset | Semi-supervised nuclei detection followed by weakly supervised segmentation (using ResNet backbone U-Net) | Color normalization, patch extraction, data augmentation, ResNet-34 encoder pretrained, Voronoi and K-means cluster labeling | -
[109] | Lung Cancer Dataset, Kumar Dataset | Uncertainty prediction from Bayesian CNN followed by normal CNN trained with partial points and mask labels | Color normalization, patch extraction, data augmentation | -
[110] | MoNuSeg, TNBC | Coarse segmentation using self supervision followed by fine segmentation with contour sensitive constraint | Point distance map and Voronoi edge distance map generation | -
[111] | MoNuSeg, CPM-17 | Co-trained U-Net based Segmentation-Colorization Network | Patch extraction, data augmentation, H-component extraction, Voronoi and K-means cluster labeling | -
[68] | MoNuSeg | GAN based nuclei centroid detection followed by peak region backpropagation | Stain normalization, patch extraction | Graph cuts
[112] | Kumar, TNBC, MoNuSeg | Conditional SinGAN based training data augmentation from selected patches followed by Mask RCNN for semi-supervised segmentation | Patch extraction, data augmentation | -
Table 2.4: A summary of weakly supervised nuclei segmentation methods.
2.3.2.2.1 Point-wise Label Propagation This category of weakly supervised methods
begins with point annotations. These annotations are extended to generate coarse pixel wise
labels that are used to train a nuclei segmentation network.
Figure 2.18: An example of a two stage point wise label propagation from [113].
Yoo et al. [108] proposed one of the initial weakly supervised segmentation networks
for fine nuclei segmentation, with an auxiliary network for edge detection. To train the
segmentation network with point labels, a label assignment scheme assigns positive values to
the point annotations and negative values to pixels on the Voronoi boundaries. The binary
cross entropy loss is utilized to train the network. Though this network generates nuclei blobs,
it lacks information about the boundaries. This led to the design of the auxiliary network
for edge detection called the PseudoEdgeNet, a shallow CNN extracting edge information.
PseudoEdgeNet is trained with the original image and the point annotations generated by
the segmentation network and acts as its supervisory signal. In addition, a large attention
module in the PseudoEdgeNet guides it on where to extract edges, thus improving the
quality of edge maps generated. The Sobel-filtered result of the segmentation net is used as
a reference in calculating the edge loss. Both the networks are trained jointly with a cross
entropy based segmentation loss and the edge loss. Though bounded by the performance of
fully supervised networks, this approach was a promising starting point for weakly supervised nuclei segmentation.
Qu et al. [24] proposed a two-stage weakly supervised technique from partial point annotations as shown in Fig. 2.18. The first stage performs semi supervised nuclei detection. An
extended Gaussian mask is generated from the available labeled points based on the distance
of a pixel from its nearest labeled point. The background is defined at regions greater than
a certain distance, nuclei are identified with an exponential function of the distance, and
the remaining pixels are unlabeled and ignored while training. A regression based detection
model is trained using the obtained extended Gaussian masks, using the mean square error. The U-Net like model uses ResNet-34 as its encoder. A background map is generated
from the first step by thresholding. The second part of the first stage uses an iterative self
training method to improve the detection performance by combining information from the
initial points and the generated background maps. The background map is updated in each
iteration based on an intensity and area threshold. At the end of this stage, the background
regions grow, and potential nuclei locations are identified. The second stage requires coarse
pixel-level labels to train a CNN. Using the results from the previous stage, Voronoi labeling
is used to obtain regions called Voronoi cells that give essential information about the central parts of the nuclei. To understand more about the shapes and boundaries of the nuclei,
K-means clustering based on color and spatial information is implemented, obtaining coarse
pixel wise labels. A network similar to the one used in the first stage is trained using the
generated labels and a weighted loss function based on the cluster labels and Voronoi labels.
A dense CRF loss is used to further improve the nuclei boundaries. However, this approach
faces difficulties with non-uniform staining.
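The label-propagation step of this pipeline is built largely from standard tools, so a rough sketch is given below. It assumes point annotations given as (row, col) coordinates and uses color plus distance-to-point features for K-means; the exact features and the handling of ignored pixels in [24] differ in detail.

```python
# Minimal sketch: coarse pixel-wise labels from point annotations via a Voronoi
# partition and K-means clustering.
import numpy as np
from scipy import ndimage as ndi
from sklearn.cluster import KMeans

def coarse_labels_from_points(rgb, points, n_clusters=3, seed=0):
    h, w = rgb.shape[:2]
    seed_map = np.zeros((h, w), dtype=int)
    for i, (r, c) in enumerate(points, start=1):
        seed_map[r, c] = i
    # Voronoi partition: each pixel takes the label of its nearest annotated point.
    _, (ir, ic) = ndi.distance_transform_edt(seed_map == 0, return_indices=True)
    voronoi = seed_map[ir, ic]
    # K-means on color plus distance-to-nearest-point features gives rough
    # nuclei / background / ignored clusters.
    dist = ndi.distance_transform_edt(seed_map == 0)
    feats = np.concatenate([rgb.reshape(-1, 3).astype(float),
                            dist.reshape(-1, 1)], axis=1)
    cluster = KMeans(n_clusters=n_clusters, n_init=10,
                     random_state=seed).fit_predict(feats)
    return voronoi, cluster.reshape(h, w)
```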
In their follow-up paper [109], Qu et al. proposed using a combination of points and masks
to enhance the performance of the weakly supervised approach. The first stage comprises
an uncertainty prediction task that finds representative complex nuclei to be supervised by
annotation masks. Uncertainty maps are generated by a Bayesian CNN built by adding
a Gaussian distribution prior to the softmax layer. A probability map is generated on the
application of the softmax function to the output of the network. To supervise the training of
this model, in addition to the point annotations, proxy labels are generated pixel-wise similar
to [24] from cluster and Voronoi labels. This network is trained using the cross entropy loss
derived from the two proxy labels. From the generated uncertainty maps, area wise average
uncertainty is computed to identify the top 5% of highly uncertain nuclei predictions that
will require mask annotations. In the second stage, the masks for the selected representative
nuclei are integrated with the cluster and Voronoi labels, and some background pixels are added
to aid training with the combined masks. These updated labels train a normal CNN model
with the same loss function as in the first stage. This modified approach achieves a slightly
improved performance compared to their previous method.
Tian et al. [110] proposed a coarse to fine weakly supervised learning strategy using point
annotations. The first stage generates coarse segmentation maps through distance mapping
and self supervised learning. To train an FCN, the point annotation map must be transformed into a supervision map. They propose two maps for supervision, the point distance
map focusing on highly reliable positive points obtained by dilating the point annotations,
and the Voronoi edge distance map focusing on highly reliable negative points indicating
non-nuclei pixels. With these generated supervision maps and the original image, an FCN is
trained end to end using a polarization loss. In addition, sparsely calculated losses on the point labels and Voronoi labels are also included so as to focus only on the partial labels
with high confidence. This coarse segmentation network is trained for about three iterations
to obtain reliable masks. For each iteration, while the Voronoi edge map remains the same,
the point distance map is updated with the result from the previous segmentation round. The
second stage focuses on contour refinement. Edge maps are generated from the original
image and the coarse masks by applying Sobel filtering. A sparse contour map is obtained
by a pixel-wise AND operation between the edge maps. This auxiliary boundary supervision
is implemented using the sparse contour map for supervision and a contour sensitive loss
to refine the contours. The second stage, fine tuning, significantly improves the model’s
performance.
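Since the sparse contour supervision map only combines Sobel edges of the image and of the coarse mask, it can be approximated in a few lines. The snippet below is an assumed illustration; the threshold value is not taken from [110].

```python
# Minimal sketch of a sparse contour map: Sobel edges of the image and of the
# coarse mask, combined pixel-wise.
import numpy as np
from skimage import color, filters

def sparse_contour_map(rgb, coarse_mask, edge_thresh=0.1):
    img_edges = filters.sobel(color.rgb2gray(rgb)) > edge_thresh
    mask_edges = filters.sobel(coarse_mask.astype(float)) > 0
    return img_edges & mask_edges        # contours supported by both sources
```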
Lin et al. [111] proposed an alternate approach to learning contour information in weakly
supervised methods by means of a sequential Segmentation and Colorization Net called the
SC-Net. The initial step generates coarse pixel labels from the point annotations using
Voronoi labeling and K-means clustering. Voronoi labeling generates convex polygons with
point annotations as the center. To perform K-means clustering, the original image and
a distance transformed image are used to cluster the pixels into three categories, nuclei,
background, and ignored pixels. They extract the H component from the H&E stained
images, enhancing the contrast between nuclei and non-nuclei regions. A ResU-Net based
Segmentation Network is trained with the generated coarse labels and cross entropy losses
from the cluster and Voronoi labels. The Voronoi labels help split overlapping nuclei, while
the cluster labels provide contour and shape information. To minimize the effect of incorrect
cluster labels, they propose a co-training framework with a pair of segmentation networks
trained by two non-overlapping sets of data. The training of one network will be supervised
by pseudolabels generated by the other network along with the coarse labels. Accurate cross
supervision is achieved by using EMA to average the pseudolabels periodically. They also
proposed an auxiliary colorization task to obtain precise nuclei contours. The combined SC-Net
consists of a sequence of two U-Nets, with the first generating probability maps from
the H-component and the second reconstructing the H&E image from the probability map.
With the help of the colorization network, the segmentation network gathers more low-level
features and captures the nuclei-cytoplasm relationship as well. As the training progresses,
the segmentation task is given more importance than the colorization task. However, this
framework fails if not all nuclei are labeled with points, since missing points generate erroneous coarse
labels.
2.3.2.2.2 GAN With a limited number of annotated training samples, researchers use
these masks to generate more samples to be used for training, or to predict the certainty
of the annotations. This section describes the use of GANs in a weakly supervised nuclei
segmentation framework.
Hu et al. [114] proposed a GAN based weakly supervised nuclei segmentation approach
depicted in Fig. 2.19. They first perform stain normalization on the images by decomposing
Figure 2.19: An example framework using GAN to generate a nuclei centroid likelihood map
from [114].
them into stain density maps and obtaining the component distribution. Since not all the
point annotations are the exact centroids of the nuclei, they propose using a nuclei
centroid likelihood map as the training target. For this purpose, they use the conditional GAN
based pix2pix network for centroid detection. Here, a detection network detects nuclei centroids and generates the likelihood map as the output. A discriminator network distinguishes
between true and generated centroids from the paired input training data consisting of the
original image and the likelihood map. The GAN is trained with a combined loss function
of the confrontation loss and an L1 loss. An area threshold is used to detect the central
areas in the nuclei from the generated likelihood maps. In addition to these areas, regions
surrounding these central areas also contribute to nuclei detection. They use a region guided
backpropagation from the central regions to visualize pixels contributing to the centroid detection to obtain a contribution graph for each nucleus. Finally, the graph cuts algorithm
is implemented to get the segmented nucleus by considering the contribution graph of each
nucleus as the foreground and the remaining as the background. All such fragments are
merged to obtain a complete segmentation map. This approach helps in identifying rough
nuclei boundaries.
Lou et al. [112] proposed a selection-based approach to weakly supervised nuclei segmentation. The annotated image patches to be used for training are determined by two features called representativeness (inter-patch attribute) and consistency (intra-patch attribute).
When a patch has the smallest distance from other patches in a cluster, it is considered representative of that cluster. Patches with high similarity within themselves are considered
consistent. Such patches with high representativeness and consistency are chosen as the
training set. Several similar-sized patches are sampled from each training image. The inter
and intra-patch attributes are computed by performing a dual level clustering, which first
groups the image patches into several clusters, and each image is split into four equal sub
regions. Based on the coarse and fine level representativeness and intra-patch consistency,
one image patch from each cluster is chosen to be annotated. Augmentation is performed
on each of these masks by random cropping, flipping, and rotation. For each of these image-mask pairs, a mask synthesis algorithm is employed to generate masks. With the original
image-mask pairs and these synthetic masks, a conditional SinGAN is trained to generate the
nuclei images corresponding to the masks. This network comprises a multi-scale conditional
generator and a component-wise discriminator. For each scale, the training loss combines an
adversarial loss and reconstruction loss. To perform segmentation, a Mask RCNN is trained
on the real and synthetic image pairs. The trained model then predicts masks for the original
training set and these masks act as pseudo labels. The model is finetuned for another 2-3
iterations, including the original training set and the pseudo labels. The model after the
final iteration is considered to be the final model.
2.4 Evaluation and Performance Benchmarking
In this section, we start by briefly reviewing the available datasets for training the nuclei
segmentation models in Sec. 2.4.1. The currently proposed metrics are presented in Sec. 2.4.2, since
the nuclei segmentation task can be evaluated in different ways. In Sec. 2.4.3, we conduct
an extensive performance analysis and comparisons of recent methods, both quantitatively
and qualitatively. Finally, in Sec. 2.4.4 we summarize findings drawn from previous comparisons,
stressing more the weakly supervised results.
2.4.1 Datasets
Several fully annotated nuclei segmentation datasets have been made publicly available. A
summary of a few of the widely used datasets is presented in Table 2.5. These datasets
contain exhaustive annotations of all nuclei in the images.
Dataset | Training Images | Testing Images | Nuclei Count | Organs Included | Magnification Level
MoNuSeg [115] | 30 | 14 | 21,623 | 7 (Breast, Liver, Kidney, Prostate, Bladder, Colon, Stomach) | 40x
Kumar [101] | 16 | 14 | 21,623 | 7 (Breast, Liver, Kidney, Prostate, Bladder, Colon, Stomach) | 40x
CPM-15 | 15 | - | 2,905 | 2 | 40x, 20x
CPM-17 [116] | 32 | 32 | 7,570 | 4 | 40x, 20x
CoNSeP [95] | 27 | 14 | 24,319 | 1 (Colorectal Adenocarcinoma) | 40x
TNBC [117] | 50 | - | 4,056 | 1 (Breast) | 40x
CRCHisto [118] | 50 | 50 | 29,756 | 1 (Colon) | 20x
Data Science Bowl 2018 [119] | 536 | 134 | 37,333 | Tissue from humans, mice and flies | Mixed
Lung Cancer Dataset [24] | 24 | 16 | 24,401 | Lung Adenocarcinoma | 20x

Table 2.5: A summary of publicly available nuclei segmentation datasets.
The Multi-Organ Nuclei Segmentation (MoNuSeg) Dataset was originally released for
MICCAI 2018 challenge. This dataset comprises 30 training images of size 1000x1000 and
magnification 40x and 14 testing images with similar specifications. These images cover
samples from Breast, Liver, Kidney, Prostate, Bladder, Colon, and Stomach cells collected
from The Cancer Genome Atlas(TCGA). The Kumar dataset is a subset of the MoNuSeg
dataset, with its 30 training images split into 16 training and 14 testing samples. With
diverse H&E stained histology images from 7 different organs, good performance on this
dataset indicates a high generalization ability.
The Triple Negative Breast Cancer(TNBC) dataset has 50 H&E stained images from 11
TNBC patients, annotated by an expert pathologist and research fellows. It has 512x512-
sized images of breast cancer tissues at a magnification of 40x. These images include a
variety of nuclei annotations, including normal epithelial cells, inflammatory cells, fibroblasts,
macrophages, adipocytes, invasive carcinoma cells, and myoepithelial cells.
Graham et al. introduced the Colorectal Nuclei Segmentation and Phenotype (CoNSeP)
dataset with 41 H&E stained images obtained from 16 colorectal adenocarcinoma patients.
These images are of size 1000x1000 at a 40x magnification and contain a diverse set of tissue
components and nuclei types. Two expert pathologists exhaustively annotated each nucleus
within every tile with consensus. This dataset displays wide variations within the colorectal
adenocarcinoma images, which helps trained models generalize to unseen images.
Another colon cancer dataset, called CRCHisto, contains 100 H&E stained images of
size 500x500 at a 20x magnification. The nuclei, however, are not exhaustively annotated
with class labels. These images were cropped from 10 whole slide images of 9 patients and
annotated by an expert pathologist and a graduate student.
The Data Science Bowl 2018 (DSB 2018) dataset consists of 670 images with different
tissue types, staining modalities, magnification, etc. The nuclei masks have been annotated
by a team of experts. This diversity helps in nuclei detection from a wide variety of images
and helps in generalization.
CPM-17 dataset was made publicly available during the MICCAI 2017 Digital Pathology
Challenge. The 64 images were extracted from TCGA and contained 16 tiles from four
different cancer types each, at magnifications of 20x and 40x. The nuclei were annotated
by students and reviewed by pathologists. CPM-15, on the other hand, contains 15 images
from two different cancer types. Both these datasets contain images of different sizes.
The Lung Cancer dataset was generated by Qu et al. in [24]. It contains 40 H&E stained
lung adenocarcinoma and lung squamous cell cancer images. These images are extracted at a
magnification of 20x at a size of 900 × 900. Each image was annotated by an expert pathologist
with bounding boxes, points, and full masks for the experiment.
2.4.2 Evaluation Metrics
For the task of nuclei segmentation, various evaluation metrics have been used over time.
Earlier methods calculated the accuracy of the algorithm based on the number of pixels
detected as the nucleus within a region of interest (ROI). However, in medical images, there
exists a large class imbalance, often with more background information compared to the
relevant object of interest in a single image tile. This imbalance can create a bias in metrics
like accuracy, leading to an illusion of excellent performance.
Segmentation models require measures that can assess localization correctness in addition to classification accuracy. Modified metrics measuring the similarity between the
ground truth and the predicted values appear to be a better approach to evaluating performance. The most commonly used evaluation metrics are the F1-score, Dice Similarity
Coefficient (DSC), the Jaccard Index (JI) or Intersection over Union score (IoU), the Aggregated
Jaccard Index (AJI) and Panoptic Quality (PQ). The DSC and IoU are pixel-level metrics,
while the AJI is an object-level metric. These metrics are often defined by four values: true
positives (TP: predicted true, actual true), true negatives (TN: predicted false, actual
false), false positives (FP: predicted true, actual false), and false negatives (FN: predicted
false, actual true). The equations in this section denote X as the set of ground truths and
Y as the set of predicted instances corresponding to the ground truth. Table 2.6 presents a
comparison of the available evaluation metrics.
Metric | Advantage | Disadvantage
F1-score | Evaluates the presence of a predicted object corresponding to each ground-truth object | Does not account for pixel-level errors
IoU (Jaccard Index) | Measures the conformance of shape between ground truth and prediction | Does not account for object-level errors
Dice Similarity Coefficient (DSC) | Measures pixel-wise agreement between ground truth and prediction | Does not penalize detection errors
Aggregated Jaccard Index (AJI) | Penalizes both object-level and pixel-level errors | Over-penalization owing to failed detections
Panoptic Quality (PQ) | Unified scoring of detection and segmentation | Dependent on IoU with a strict threshold and hence may result in a lower score

Table 2.6: A summary of the current evaluation metrics.
2.4.2.1 F1-score
The F1-score is defined as the harmonic mean of the precision and recall, calculated from
the values of TP, TN, FP, and FN. Precision is defined as the percentage of correct positive
predictions of all the positive predictions, while recall is defined as the percentage of actual
positives that were correctly predicted. This metric is very commonly used in the field of
medical image segmentation and considers each instance as an object, giving a per-object
evaluation metric.
$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}$$

$$F1\text{-score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} = \frac{2\,TP}{2\,TP + FP + FN}$$
2.4.2.2 Intersection over Union score or Jaccard Index
The IoU score or Jaccard Index (JI) is also defined in terms of TP, FP, FN, and TN. One
important difference between this index and the DSC is that the JI penalizes undersegmentation
and oversegmentation more than the DSC. Higher rates of oversegmentation and undersegmentation lead to a lower JI. This measure also accounts for the level of shape concordance
between the ground truth and the predicted map. In general, the IoU score or JI is the
ratio of the common elements between the ground truth and predicted map to the union of
elements in the ground truth and the predicted map.
$$\mathrm{IoU} = \frac{TP}{TP + FP + FN} = \frac{|X \cap Y|}{|X \cup Y|}$$
Both the Jaccard Index and the Dice Similarity Coefficient account only for pixel-level
errors, and do not account for any object-level errors. A suitable metric for a segmentation
algorithm must penalize the model for any missed objects and false detections in addition
to oversegmentation and undersegmentation errors. The Aggregated Jaccard Index (see
Sec. 2.4.2.4) was proposed in [101] to account for pixel level and object level errors.
2.4.2.3 Dice Similarity Coefficient
The Dice Similarity Coefficient (DSC) is computed as twice the set of common elements
between the ground truth and predictions divided by the total number of elements in each
set. This measure provides an overall score for the quality of instance level segmentation. It
gives the level of similarity between the ground truth and the predictions.
$$DSC = \frac{2\,|X \cap Y|}{|X| + |Y|}$$
2.4.2.4 Aggregated Jaccard Index
The Aggregated Jaccard Index (AJI) is an extension of the Jaccard Index that computes
an aggregated intersection cardinality in its numerator and an aggregated union cardinality
in the denominator for the predicted nuclei and its ground truth. For each nucleus in a
ground truth, the AJI is calculated by adding the pixel count of the intersection between the
ground truth and predicted segments to the numerator and adding the pixel count of their
union to the denominator. This process aggregates the false positives and false negatives
in the denominator. Hence, the AJI ensures that all missed detections, false detections,
oversegmentation and undersegmentation are accounted for. In the equation below, N refers
to the set of false positives from the prediction set.
$$AJI = \frac{\sum_{i=1}^{n} |X_i \cap Y_i|}{\sum_{i=1}^{n} |X_i \cup Y_i| + \sum_{k \in N} |Y_k|}$$
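A minimal sketch of this computation is given below, assuming instance maps in which 0 denotes background and each nucleus carries a unique positive id; it follows the aggregation rule above but is not taken from any reference implementation.

```python
# Minimal sketch of the Aggregated Jaccard Index (AJI) for two instance maps.
import numpy as np

def aggregated_jaccard_index(gt, pred):
    gt_ids = [i for i in np.unique(gt) if i != 0]
    pred_ids = [j for j in np.unique(pred) if j != 0]
    used, inter_sum, union_sum = set(), 0, 0
    for i in gt_ids:
        g = gt == i
        best_iou, best_j = 0.0, None
        for j in pred_ids:                       # match the prediction with highest IoU
            p = pred == j
            inter = np.logical_and(g, p).sum()
            if inter == 0:
                continue
            iou = inter / np.logical_or(g, p).sum()
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j is None:
            union_sum += g.sum()                 # missed nucleus: counted in the union only
        else:
            p = pred == best_j
            inter_sum += np.logical_and(g, p).sum()
            union_sum += np.logical_or(g, p).sum()
            used.add(best_j)
    for j in pred_ids:                           # unmatched predictions (false positives)
        if j not in used:
            union_sum += (pred == j).sum()
    return inter_sum / union_sum if union_sum else 0.0
```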
2.4.2.5 Panoptic Quality
The Panoptic Quality, proposed by [120], is a metric that assesses the combined detection
and segmentation quality. The F1 score measures the detection quality (DQ), and the
segmentation quality (SQ) is a measure of similarity between the predicted instance and its
ground truth. This metric provides a good evaluation of the detection of individual nuclei
instances and their segmentation, and overcomes some limitations of the AJI and DSC. In
the equation below, x is the ground truth segment, and y is the predicted segment. Each
pair (x,y) is unique if its IoU is greater than 0.5, and this matching splits the segments into
TP, FP, and FN.
$$PQ = DQ \times SQ = \frac{|TP|}{|TP| + \frac{1}{2}|FP| + \frac{1}{2}|FN|} \times \frac{\sum_{(x,y) \in TP} IoU(x, y)}{|TP|}$$
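The sketch below illustrates this matching and scoring rule on instance maps (0 = background, unique positive ids per nucleus); it is a simplified illustration of the definition above, not a reference implementation.

```python
# Minimal sketch of Panoptic Quality with the IoU > 0.5 unique-matching rule.
import numpy as np

def panoptic_quality(gt, pred, iou_thr=0.5):
    gt_ids = [i for i in np.unique(gt) if i != 0]
    pred_ids = [j for j in np.unique(pred) if j != 0]
    matched, tp_ious = set(), []
    for i in gt_ids:
        g = gt == i
        for j in pred_ids:
            if j in matched:
                continue
            p = pred == j
            inter = np.logical_and(g, p).sum()
            if inter == 0:
                continue
            iou = inter / np.logical_or(g, p).sum()
            if iou > iou_thr:                    # IoU > 0.5 makes the (x, y) pair unique
                tp_ious.append(iou)
                matched.add(j)
                break
    tp = len(tp_ious)
    fp, fn = len(pred_ids) - tp, len(gt_ids) - tp
    dq = tp / (tp + 0.5 * fp + 0.5 * fn) if (tp + fp + fn) else 0.0   # detection quality
    sq = float(np.mean(tp_ious)) if tp_ious else 0.0                  # segmentation quality
    return dq * sq
```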
2.4.3 Performance Comparison
2.4.3.1 Quantitative Comparison
In this subsection we carry out a performance comparison among different models to deduce
how different methods and techniques can enhance the segmentation quality. Table 2.7 shows
a quantitative performance comparison of a few state-of-the-art segmentation approaches on
the MoNuSeg dataset with AJI, which is the commonly used metric for comparisons and
more suitable for the instance segmentation problem. In the table, Test 1 and Test 2 refer to
the data splitting proposed in [101], with Test 1 consisting of 16 images only from breast,
liver, kidney and prostate, and Test 2 consisting of 14 images from all the seven organs (see
Table 2.5). Combined Test Sets refer to the 30 images, including Test Set 1 and 2, while the
MoNuSeg Test set refers to the 14 images used for the challenge.
Method | Test 1 | Test 2 | Test 1 & 2 (comb.) | MoNuSeg Test Set

Unsupervised
Cell Profiler [121, 21] | 0.1549 | 0.0809 | - | -
Fiji [122, 21] | 0.2508 | 0.3030 | - | -
DDMRL [123] | - | - | - | 0.4860
Scale-Supervised Attention Net [87] | - | - | - | 0.5354
CyC-PDAM [90] | 0.5432 | 0.5848 | 0.5610 | -
CBM [60] | - | 0.5808 | - | 0.6142
HUNIS [61] | - | 0.6548 | - | 0.6387

Supervised
CNN2 [107, 101] | 0.3558 | 0.3354 | - | -
U-Net (ResNet-50) [124, 39] | - | - | - | 0.4882
U-Net (VGG-16) [125, 39] | - | - | - | 0.4925
U-Net (DenseNet-201) [126, 39] | - | - | - | 0.5083
CNN3 [101] | 0.5154 | 0.4989 | 0.5083 [102] | -
Mask R-CNN [127] | 0.5978 | 0.5531 | 0.5786 [102] | 0.5282 [39]
DCAN [128] | 0.6082 | 0.5449 [21] | - | 0.557
PA-Net [129, 21] | 0.6011 | 0.5608 | - | -
BES-Net [19, 21] | 0.5906 | 0.5823 | - | -
HoVer-Net [95] | - | - | 0.618 | -
CIA-Net [21] | 0.6129 | 0.6306 | 0.6205 [102] | -
REU-Net [103] | - | - | 0.636 | -
BRP-Net [102] | 0.6196 | 0.6384 | 0.6422 | -
GCP-Net [106] | - | - | 0.651 | -
Enhanced lightweight U-Net [99] | - | - | - | 0.6895
SSL [91] | - | - | - | 0.7063
Region Based CNN [93] | - | - | - | 0.73
DenseResU-Net [97] | 0.7998 | 0.7684 | 0.7861 | -
CBA-Net [20] | - | - | - | 0.7985

Table 2.7: Performance comparison on the MoNuSeg dataset with the AJI metric.
Cell Profiler and Fiji are conventional approaches developed for biomedical image analysis. Cell Profiler applies an intensity threshold, while Fiji performs a watershed-based
nuclear segmentation. Their results on the MoNuSeg dataset are very poor. Domain Diversification and Multi-Domain Invariant Representation Learning (DDMRL) is one
of the initial deep learning based unsupervised approaches that set the performance standard for domain adaptive methods. It is seen to perform much better on the MoNuSeg
dataset, with an increase of almost 0.18 AJI compared to the conventional methods. The
CyC-PDAM, with its nuclei inpainting mechanism and panoptic level adaptation, achieves
an AJI of 0.5610 overall on the MoNuSeg dataset. On the same lines as self supervision,
the attention-based scale prediction network with segmentation as an auxiliary task [87]
performs even better than the supervised CNN based algorithms, with an AJI of 0.5354.
This class of self supervision works on the relevant histology dataset, and doesn’t require
any labeled data, unlike domain adaptive methods. With the trend of deep learning based
unsupervised and self supervised methods producing comparable results to supervised methods, the CBM [60] achieves a high performance with a well-designed unsupervised method
and a very small computational complexity using simple image processing techniques. The
large performance gap between the deep learning based methods and this method can be
attributed to the significant domain gap in biomedical images and inherent intensity variations,
challenged more by the relatively small amount of training data. The HUNIS paper [61] further improves upon the performance of CBM by introducing a second stage self supervised
refinement on the adaptively thresholded result and obtains a substantial improvement in
performance, especially in Test 1. The selective self supervision in the second stage based
on the confidence scores of the predicted pixels proposes a condition based supervision that
can be further explored as an alternative approach to fully supervised segmentation which
suffers from a large memory requirement. Among the unsupervised methods, the HUNIS
approach achieves the highest AJI of 0.6548 on Test 2 and 0.6387 on the MoNuSeg Test Set.
It outperforms all unsupervised and self supervised deep learning methods and requires only
a negligible number of parameters compared to the millions of parameters of deep learning
networks. In addition, this method performs on par with U-Net based REU-Net, attention
gated GCP-Net, and the enhanced lightweight U-Net.
CNNs and FCNs (Fully Convolutional Networks) formed the beginning of deep learning
for nuclei segmentation. CNN2 [107], classifying pixels as nuclei or background, performed
better than the conventional methods, but is particularly challenged in segmenting dense
nuclei clusters. As seen in Table 2.7, CNN2 obtains an AJI of 0.3558 on Test Set 1 and
0.3354 on Test 2. CNN3 [101] included a boundary class that showed an improvement in
the segmentation of nuclei with diffused chromatin and forming dense clusters. This helped
increase the AJI by about 0.16, reaching an AJI of 0.5083. The multi-tasking FCN used in
DCAN provided encouraging results for deep learning based methods, with an AJI close to
0.60 in some test sets.
The concept of upsampling to obtain a pixel-wise segmentation prediction gave rise to
the U-Net, an encoder-decoder architecture that performs downsampling to obtain features
and builds on these features through upsampling to obtain a segmentation map with similar
dimensions as the input. Different configurations in the encoder architecture were developed
to extract efficient and representative features. Among the U-Net implementations, the deep
DenseNet-201 outperforms networks with other backbones like VGG-16 or the ResNet, by
achieving an AJI of 0.5083. This promising approach led to advanced U-Net based architectures like BES-Net [19], CIA-Net [21], REU-Net [103], and BRP-Net [102] that incorporated
boundary information to refine the segmentation masks in an end-to-end manner. From the
table, we find that boundary supervision contributes to a notable improvement of around
0.09 - 0.14 in AJI, and most of these methods yield an AJI of about 0.60-0.64. HoVer-Net [95]
displays similar performance with an AJI of 0.618 with the help of horizontal-vertical distance maps to separate clustered nuclei. GCP-Net is an attention-gated network that achieves
an AJI of about 0.65. The inclusion of the attention gates suppresses irrelevant pixels from
further processing, thereby focusing more on the nuclei regions. Also, adding more context
from coarser layers is proven to help the segmentation quality. This approach contributes to
improved performance even without contour awareness.
Histology images often have a class imbalance issue between the nuclei and background
pixels. Such an imbalance may introduce a bias in the network. The instance aware self
supervised network [91] based on contrastive learning achieves an AJI of 0.70. Though SSL
uses nuclei size and quantity priors as the self-supervised pretraining, the best performance
is achieved by finetuning the network with 100% labeled data. An imbalance aware network
proposed by Hancer et al. [99] uses an enhanced lightweight U-Net supervised by the generalized Dice Loss, with an AJI of 0.6895. The DenseResU-Net shows a leap in performance
among the U-net methods with AJIs greater than 0.76 on different test sets. Its wise use of
atrous blocks in a dense network with residual connections between the encoder and decoder
helps reduce the semantic gap. The CBA network [20] obtains the highest AJI of 0.7985
among all the supervised deep learning methods. The integration of the attention mechanism and the blur pooling operations overcomes the challenges of variations in staining,
while the low pass filtering allows the extraction of enhanced features, thus contributing to
SOTA performance.
In addition to the U-Net, region-based CNNs like the Mask RCNN have also shown
favorable results. PA-Net and the region-based CNN [93] employ improved Mask RCNN
architectures. While PA-Net achieves an AJI of 0.60, the region-based CNN with a guided
anchor RPN and Fusioned Box Score raises the performance by another 0.10, giving a 0.73
AJI. However, it should be noted that Mask RCNN suffers from slow speed, especially with
large images, and requires an enormous number of parameters for training.
After comparing results from different SOTA methods, we compile some findings with
more emphasis on the weakly supervised aspect of the problem. Zhou et al. [21] observed
that noise in the staining and digitization process gives rise to ambiguous instances, thus
leading to more noise in labels from pathologists’ subjective annotations. In Fig. 2.20, we
can see that only the top 10% of samples contribute 80% of the overall
cross-entropy loss value, and the truly informative regions are scarce.
From the weak supervision and point-wise annotations standpoint, it is interesting to
analyze how performance is affected by using different fractions of annotated points. Looking
Figure 2.20: Cumulative distribution of the loss value over the ratio of foreground examples
seen by the model. Given the very skewed distribution, a small percentage of examples
actually contributes to the optimization of the model (results from [21]).
into the study of Qu et al. [24], we can see that the more supervision is added, the better the
results are. As an observation, pixel-level metrics, such as the reported accuracy and F1
score (at pixel level), improve marginally by adding more supervision (see Table 2.8). Yet,
the performance improvement is more accentuated when using the object-wise Dice and AJI
metrics. Another observation from weak supervision is that, on the Lung Cancer dataset, the
performance drops significantly if we reduce the number of training points by half. On the
other hand, for the MoNuSeg, the performance is very close in terms of AJI and Dice. Even
using 10% of the points, the AJI performance gap is still small.
Another question that may arise is how supervision helps the generalization in other
datasets. In Table 2.9, one can see that the AJI score is not affected when using much
less training data (even 5%) while training using MoNuSeg and testing on the Lung Cancer
dataset. Conversely, there is a small performance improvement when adding more training
points when we use the Lung Cancer for training and MoNuSeg for testing. This may be
attributed to the smaller size of the LC dataset that challenges the model when a very small
Table 2.8: Weak supervision comparison between full supervision and point-wise at different
training point ratios (results from [24]).
Dataset | Method | Acc_pixel | F1_pixel | Dice_obj | AJI_obj
Lung Cancer (LC) | Fully-sup | 0.9615 | 0.8771 | 0.8521 | 0.6979
Lung Cancer (LC) | GT points | 0.9427 | 0.8143 | 0.8021 | 0.6497
Lung Cancer (LC) | 5% | 0.9262 | 0.7612 | 0.7470 | 0.5742
Lung Cancer (LC) | 10% | 0.9312 | 0.7700 | 0.7574 | 0.5754
Lung Cancer (LC) | 25% | 0.9331 | 0.7768 | 0.7653 | 0.6003
Lung Cancer (LC) | 50% | 0.9332 | 0.7819 | 0.7704 | 0.6120
MoNuSeg (MO) | Fully-sup | 0.9194 | 0.8100 | 0.6763 | 0.3919
MoNuSeg (MO) | GT points | 0.9097 | 0.7716 | 0.7242 | 0.5174
MoNuSeg (MO) | 5% | 0.8951 | 0.7540 | 0.7015 | 0.4941
MoNuSeg (MO) | 10% | 0.8997 | 0.7490 | 0.7033 | 0.5031
MoNuSeg (MO) | 25% | 0.8966 | 0.7511 | 0.7087 | 0.5120
MoNuSeg (MO) | 50% | 0.8999 | 0.7566 | 0.7157 | 0.5160
fraction of points is included for training. In general, performance drops in both datasets,
where LC testing is affected more by the domain shift, while the performance difference
because of the training domain shift is less in MoNuSeg.
Table 2.9: Cross-dataset generalization under weak supervision at different
training point ratios (results from [24]).
Train → Test | Ratio | Acc_pixel | F1_pixel | Dice_obj | AJI_obj
MO → LC | 5% | 0.9271 | 0.7589 | 0.7418 | 0.5609
MO → LC | 10% | 0.9213 | 0.7518 | 0.7297 | 0.5555
MO → LC | 25% | 0.9222 | 0.7551 | 0.7320 | 0.5588
MO → LC | 50% | 0.9226 | 0.7579 | 0.7336 | 0.5608
LC → MO | 5% | 0.9004 | 0.7419 | 0.7028 | 0.4884
LC → MO | 10% | 0.8964 | 0.7338 | 0.6913 | 0.4971
LC → MO | 25% | 0.8974 | 0.7234 | 0.6886 | 0.4870
LC → MO | 50% | 0.8970 | 0.7232 | 0.6986 | 0.5030
Staying with point-wise annotations, annotation errors from pathologists can be
simulated as perturbation noise, with GT points shifted some pixels away from the actual
center. In Fig. 2.21, we can see how the noise during annotation can affect the segmentation
performance, especially when the points lie more than 8 pixels away from the nuclei
centers, reaching closer to the nuclei boundaries or even falling outside the nuclei.
Figure 2.21: (a) Perturbations in point-wise annotations. Yellow points represent nuclei
centers, while red and blue are points offset by four and eight pixels, respectively. (b) Object-wise metrics for nuclei segmentation over different amounts of perturbation, measured in
pixel distance from the nuclei center (results from [111]).
2.4.3.2 Qualitative Comparison
As has been stressed in many papers, the main source of errors lies in the contours of nuclei,
which compromises the segmentation performance. Overlapping or touching nuclei in dense
clusters and blurred boundaries can cause over- or under-segmentation problems. Contour-aware attention mechanisms in a Deep Neural Network (DNN) turn out to help the network
delineate the nuclei boundaries more accurately.
CNN3 [101] introduces a third class for detecting the boundaries of cells. The main
motivation is that, in a post-processing step, touching nuclei can be accurately separated. This is achieved by growing the nuclei areas in an iterative way, maximizing
the boundary probability without shrinking the nuclei regions of neighboring instances
(constraining invasions). They let the nuclei grow anisotropically until the boundary class reaches
Figure 2.22: In the middle and bottom rows, ground truth (annotated) boundaries are red,
detected boundaries are blue, and the overlap between the two is yellow. Segmentation comparisons
are shown with the CNN2 baseline model. In the bottom row, yellow is more prevalent, thus
indicating more precise boundary detection with respect to the ground truth (figure from
[101]).
a local minimum, while constraining the growth with the inside (nuclei) and outside (background) classes of the surrounding areas. In comparison with CNN2 [107], it helps to identify
boundaries and touching nuclei more precisely (see Fig. 2.22).
Subjective annotations result in mislabelled instances and inaccurate boundary delineations. The inter-annotator variance is even more evident due to blurred edges and staining
artifacts. Based on the observation that labeling noise tends to statistically
dominate the gradients and hence the loss calculation, CIA-Net [21] shows an improved performance over earlier methods by adding the “truncated loss” to diminish the influence in
Figure 2.23: Comparison of CIA-Net without the Information Aggregation Module (IAM)
and BES-Net. CIA-Net can more accurately identify connected nuclei that should be split,
even when the ground truth is noisy or mislabelled (figure from [21]).
the learning of the outlier regions with high confidence. Focusing on the more informative
regions in the training process helps mitigate the over-segmentation problem. In Fig. 2.23,
one can see the effectiveness of adding the IAM unit that focuses on the texture and spatial
dependence between nuclei and their boundaries, as well as that of truncated loss function
for more accurate boundary detection, despite the noisy labels.
As mentioned, another way to mitigate over-segmentation and resolve overlapping instances is proposed in Hover-Net [95]. Instance-wise horizontal and vertical distances from
their respective center of mass provide rich information to the encoder branch, on top of
textural features. In Fig. 2.24, we can see how the distance information helps in splitting
the nuclei apart.
Cropped image regions show horizontal and vertical map predictions, with corresponding
ground truth. Arrows highlight the strong instance information encoded within these maps,
Figure 2.24: Horizontal and vertical prediction maps are shown on areas prone to overlapping
nuclei, along with their corresponding ground truth. Distance information alleviates oversegmentation and nuclei splitting phenomena (adjusted figure from [95]).
where there is a significant difference in the pixel values.
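As an illustration of this idea, the sketch below computes per-instance horizontal and vertical distance maps of the kind HoVer-Net regresses as an auxiliary target; the normalization scheme here is a simplified assumption rather than the exact recipe of [95].

```python
# Minimal sketch: horizontal/vertical offsets of each nucleus pixel from the
# instance's center of mass, normalized per instance to [-1, 1].
import numpy as np

def hover_maps(instance_map):
    h_map = np.zeros(instance_map.shape, dtype=np.float32)
    v_map = np.zeros(instance_map.shape, dtype=np.float32)
    for inst_id in np.unique(instance_map):
        if inst_id == 0:
            continue                              # 0 is background
        rows, cols = np.nonzero(instance_map == inst_id)
        cr, cc = rows.mean(), cols.mean()         # center of mass of the instance
        dr, dc = rows - cr, cols - cc
        v_map[rows, cols] = dr / (np.abs(dr).max() + 1e-8)
        h_map[rows, cols] = dc / (np.abs(dc).max() + 1e-8)
    return h_map, v_map
```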
Besides overlapping nuclei, another major challenge in this area is nuclei size variability
among different organs, datasets, and scanning protocols. Attention mechanisms between
the encoder and decoder have proven efficient because they bring more contextual
information from the coarse feature maps with larger receptive fields. In [105], the coarser
features are used as a gating signal within the gated attention on the skip connections between
the encoder and decoder. In Fig. 2.25, we can see the effect of such a gated mechanism over
other SOTA models. It can identify and segment more accurately both small and large size
nuclei with different textures, thus decreasing the false negative rate.
Qu et al. [24] carry out a visualization comparison using certain fractions of point-wise
annotations to show how reducing the training points can affect the nuclei segmentation
performance. In Fig. 2.26, we can observe that for easier images without many
nuclei texture/color variations, the segmentation quality is good even when less than 50% of the point annotations are used.
On the other hand, for more challenging images where
nuclei texture varies significantly, the nuclei are more under-segmented, and only
when using 50% or the full set of point annotations is the segmentation result more accurate.
Figure 2.25: A challenging nuclei segmentation comparison among several models, demonstrating the gated attention mechanism. (a) GT, (b) Baseline U-Net [18], (c) CNN2 [107],
(d) CNN3 [101], (e) Hover-Net [95], (f) NucleiSegNet [105] (adjusted figure from [105]).
2.4.4 Discussion And Conclusions
Having reviewed a large body of papers pertinent to nuclei segmentation from different
categories and analyzed their quantitative and qualitative performance on public datasets,
it is time to discuss our observations and draw some conclusions.
2.4.4.1 General Remarks
As a first point, early generic cell image analysis tools used for nuclei segmentation
perform very poorly compared to dedicated models that target this very task. Unsupervised methods using domain adaptation or predictive learning seem to achieve a better
performance than methods that use contrastive learning, and they are dominant. Yet, all
DL-based methods with no supervision have a much inferior performance compared to the
fully or weakly supervised ones. It turns out that transferring meaningful features between
Figure 2.26: Demonstration of weak supervision from point labels, using different ratio of
points for training. Starting from top row, odd rows display results from LC dataset, while
even rows from the MO dataset. Upper examples show typical segmentation cases, while
bottom examples more challenging ones Gt points refer to the full set of available annotations.
(combined figures from [24])
two different domains, either from natural or medical images, is very challenging, especially
when presented with very few training images, which is the case for nuclei segmentation. On
the other hand, the two recent unsupervised methods of CBM and HUNIS, based on more
traditional segmentation ways and prior knowledge of the problem, achieve a competitive
performance even among supervised DL-based solutions. This is to say that still, traditional
techniques, when effectively applied, can provide a high performance solution with a small
number of parameters and in a more transparent way.
Early CNN architectures for binary pixel classification, before U-Net became the mainstream baseline for nuclei segmentation, definitely improved the performance over earlier
traditional methods. However, they are challenged by class imbalance problems and by segmenting dense nuclei clusters. FCNs further improved the segmentation performance, especially when coupled with multi-tasking branches that focus on the contours of nuclei.
In more recent approaches, the dominant baseline DNN architecture for nuclei segmentation is the U-Net. The best performance is yielded when DenseNet is used as a backbone
model. Also, additional extensions on top of the main model, such as attention mechanisms
that bring more contextual information at different scales and exploit the relevance between
nuclei and contour, help the classifier to detect and segment nuclei of different sizes. Drawing
evidence from CIA-Net and BRP-Net, this not only increases the segmentation
performance but also improves the generalization ability on an unseen organ, since those two
models perform better on Test-2 (unseen organ) than Test-1 set in MoNuSeg. Moreover,
boundary awareness during training definitely helps a model boost its segmentation performance. That can be achieved either using a separate class for boundaries, combined with
a post-processing method, or via gated mechanisms that learn more from the informative
regions (i.e. boundaries and nuclei), hence suppressing other background information that
is more noisy. Mask RCNN has been proven to be effective in other computer vision tasks,
albeit in nuclei segmentation its performance is not convincing compared to other baseline
models. It requires a large amount of annotated data for training and is thereby not very practical
for this task. Besides, its large model size and slow inference time make it less efficient for
deployment.
2.4.4.2 Weak Supervision Standpoint
One commonly faced challenge in this problem is noisy labels, because annotation is a laborious
and sensitive task, prone to errors and subjectivity. One question that arises is how models
can identify the erroneous labels and prevent their influence on the learning model (in terms
of gradient propagation). In Fig. 2.20, we can see that only the top 10% of samples contribute
80% of the overall cross-entropy loss. That is, the really
informative regions are scarce, most foreground examples have a minuscule contribution to the learning process, and wrong labels largely affect training. As a remedy, some
efficient techniques applied are gated attention for boosting contextual multi-scale information, contour-specific task learning to complement nuclei appearance, and loss functions that
can account for class imbalance and suppress noisy regions. The ultimate goal is to steer the
learning process toward more informative regions and rely less on labels that convey noise
and may mislead the gradient descent process. Models are better off trained by putting
more emphasis on the nuclei and their corresponding contours, mining the labels that
conform with the model’s predictions (more informative about nuclei shape, texture, and
boundaries).
Trying to ease the annotation process (saving almost 88% [113] of the full pixel annotation time) and enable access to larger annotated datasets, partial (or point-wise) annotation
is a recent line of research in nuclei segmentation that has attracted a lot of interest. Observing results that include certain fractions of the overall points, the more data points we
include, the higher the performance is. Yet, a remarkable note is that even using a very
small fraction of points –5 or 10%– the performance does not change much compared to
the full set of annotated points. Notably, the initial choice of the set of points seems not to
play a significant role in the model’s performance. Certainly, models trained on the full
pixel-wise annotations seem to have a better performance; nevertheless, the gap is relatively
small (about 7% when using point-wise annotations).
From these findings, it is evident that exhaustively annotated datasets with pixel-wise
labels are not strictly necessary. The tremendous savings in annotation time from the point-wise
labels –given the small performance gap with the full mask based methods– provide a better
trade-off for future research, as they can enable the acquisition of larger datasets at much less
cost. Moreover, weak supervision using a reduced training set of points does not much affect
the generalization performance across different datasets, although this needs to be validated
with more experiments in the future using larger testing datasets. Finally, perturbations
in point-wise annotations resulting from human error are tolerated by a certain amount
of pixels. Beyond some point, the performance drops significantly since the nuclei positions become
very misleading and the Voronoi labels too noisy. It is important, then, to identify the
amount of labels, their type, and the areas that need annotation, minimizing the labeling
requirement while still achieving a high performance that satisfies pathologists, so that they can
use future CAD tools in their everyday cancer diagnosis pipeline.
2.5 Future Work
All in all, supervision seems advantageous to the nuclei segmentation task, but the amount
and type of supervision is the key for foreseeing the future line of research. The impractical
and costly nature of pixel-wise annotations inhibits the rapid growth of the field, taking into
account that DL models require by their nature a large amount of data samples. Trying to
extrapolate from current methods, fully unsupervised methods can hardly achieve a high
performance. The nuclei color and texture variations from the staining and digitization process make domain adaptation very challenging for unsupervised methods, and hence their
generalization ability is poor. Still, strong prior knowledge and simple image processing techniques (e.g., the HUNIS method) can provide a competitive performance, although
there is still a gap compared to state-of-the-art DL-based fully supervised methods.
Future models in nuclei segmentation will be mainly trained using the weak supervision
paradigm. Point-wise annotations dramatically reduce the annotation time and costs. Taking into account that the performance gap with fully supervised methods is relatively small,
and that this research trend is more recent and has not been fully explored, one can argue
that point-wise based methods pave the way for future trends.
Taking it one step further, from our earlier observations, achieving a high performance
may not require all the nuclei points to be annotated. This shows a direction, where future
algorithms will be trained on points from a few nuclei that are representative of the overall
image distribution. This is expected to minimize pathologists’ labor and yield datasets with
Figure 2.27: The envisioned workflow for CAD-assisted annotations in nuclei segmentation
from an unsupervised model to identify areas need annotation. Two modes of annotations
may be required from future models to achieve a high performance: (a) point-wise and (b)
mask annotations. A hybrid model that can best leverage this information will benefit from
a minimum set of annotations to achieve high performance. Certainly, the quality of training
depends on the unsupervised model that guides the annotation process.
more annotations. Therefore, AI algorithms can also be part of the annotation tool, thus
making the nuclei annotation process more targeted. That is, we envision a system that
suggests which nuclei need supervision and guides the segmentation process by indicating to
pathologists which nuclei they need to pinpoint at their centers. This will reduce human
fatigue and hence the shifting errors in point-wise annotations that, as shown earlier, can impact
the performance.
The question that arises is for which nuclei we need human supervision. Furthermore, what
is the smallest and most representative subset of nuclei points that needs to be annotated, so that the
model under training sees a representative distribution of the nuclei in a histopathology
image and can potentially maximize the generalization ability to the other nuclei as well?
Easy-to-segment nuclei may not need supervision, since unsupervised methods can already perform
well on them. Also, nuclei with outlier appearance should not be included, so as to filter
out noisy annotations. To point the annotator to the pixels that need supervision, an
unsupervised model would preferably be deployed to identify the group of nuclei whose
appearance is more challenging. A supervised model pre-trained on another dataset could
also be an option, nevertheless, the bias that would be carried over may give a distorted
idea of the areas that need annotation. Hence, we argue that an unsupervised model is more
intuitive for identifying areas and nuclei that need supervision. The key to that is to find a
representation about nuclei where different appearance aspects can be encoded. Therefore,
it would be easy to identify inlier and outlier nuclei from a distribution based on Gaussian
distance criteria.
As another path, in some challenging images, certain groups of nuclei (e.g., those with high intra-nucleus
appearance variation or overlapping nuclei) may also need their full segmentation
masks to enrich the annotated data and help the model training. Thus, it would be
interesting in the future to develop hybrid models that could be trained from an annotated
dataset that comprises point-wise and fully annotated masks together in areas that models
need pixel-level supervision to achieve a very high performance. Fig. 2.27 illustrates the
workflow pipeline we imagine for future data annotation in nuclei segmentation.
With regard to transparent solutions for medical applications, interpretable models will
be favored in the future, so pathologists can understand the segmentation output that comes
out of an AI-assisted CAD tool and what factors (i.e., visual features) led to the result.
Therefore, apart from a good segmentation performance, explainability and transparency in
the nuclei segmentation pipeline are also of high importance, as it will make the future CAD
tools more trustworthy to pathologists and hence more easily integrated into their everyday
clinical diagnosis.
Chapter 3
Unsupervised Nuclei Segmentation using Thresholding Operations
3.1 Introduction
Histology images provide strong cues to pathologists in cancer diagnosis and treatment.
Automated nuclei instance-level segmentation for histology images provides not only the
number, density and sizes of nuclei but also morphological features, such as the magnitude
and the cytoplasmic ratio. This information facilitates cancer diagnosis and assessment of
the tumor aggressiveness rate. Nuclei segmentation tasks can be conducted in a supervised
or an unsupervised manner. For supervised methods, the annotation of high resolution histology images with pixel-level accuracy is a time-consuming job, carried out by expert
physicians. Other identified challenges include the variability of cell appearance across different organs, unclear boundaries, as well as color and intensity variations in stained images
from different laboratories. Furthermore, it is a subjective task and annotated labels tend
to vary from one person to the other. All the above factors challenge the practical generalizability of supervised segmentation methods, as histology images become larger in their
number and more diversified in content.
Earlier methods on nuclei segmentation were mainly unsupervised. They were based
on thresholding [130, 59], mathematical morphology for robust feature extraction [131], or
statistical modeling for segmentation and likelihood maximization for boundary refinement
[132]. Another popular tool was the watershed algorithm, which was combined with various
ways for extracting potential nuclei locations [133, 62]. Moreover, a work by Ali et al. [134]
proposed an adaptive active contour mechanism that takes boundary- and region-based
energy terms of the cell into account.
Recently, deep-learning-based (DL) methods [128, 101, 129, 21] have been applied to this
problem. In general, supervised methods provide significantly better performance than the
unsupervised ones. The motivation of our research stems from the fact that nuclei instance
labeling is a fairly laborious task, with highly subjective annotations and mislabeling rates
[135] that challenge supervised solutions. To give a rough idea of the workload,
annotating 50 image patches (about 12M pixels) reportedly requires an expert
pathologist to work for 120-230 hours [136]. Besides, there is a domain shift problem [137]
arising from stain and nuclei variations in histology images of different organs, patients and
acquisition protocols. As such, the large domain gap, given the small number of annotated
samples, impedes most of the DL models from achieving a high performance.
Several studies have been conducted to mitigate the labeling cost. For example, Qu et
al. [24] proposed a two-stage learning framework using coarse labels. Other researchers
[87, 91] have investigated the self-supervised DL approach to reduce the number of required
labeled data by exploiting the observation that nuclei size and texture can determine the
magnification scale.
Along the same line, a few unsupervised DL methods [85,
123] have lately been proposed, offering various domain adaptation techniques by learning representations from different
source domains. Also, other unsupervised approaches [136, 90] use Generative Adversarial
Networks (GANs) to synthesize histology images for the source domain, in lieu of labeled
data for the nuclei segmentation model.
In spite of their high performance, DL-based solutions have their own shortcomings. First,
Figure 3.1: Illustration of nuclei cell appearance in histology images.
Figure 3.2: An overview of the proposed CBM method.
they are perceived as a “black-box” approach. Interpretable solutions are highly desired in
medical applications, as they enable explainable decisions and the tools are more trustworthy
in clinical use. Second, they bear a high computational cost in training and testing [138]
because of the large number of parameters required to achieve a certain performance.
In this work, we propose an unsupervised data-driven method to solve the nuclei segmentation problem. The solution consists of three modules applied to each image block of size
50 × 50: 1) data-driven color transform for energy compaction and dimension reduction, 2)
data-driven binarization, and 3) incorporation of geometric priors with morphological image
processing. It is named “CBM” because of the first letter of the three modules – “Color
transform”, “Binarization” and “Morphological processing”.
We conduct experiments on the MoNuSeg dataset to demonstrate the effectiveness of the
proposed CBM method. It outperforms all other unsupervised approaches and stands in a
competitive position with supervised ones, based on the Aggregated Jaccard Index (AJI)
performance metric. Thus, the proposed CBM method is an attractive solution for practical
nuclei segmentation, since it yields comparable performance with state-of-the-art supervised
methods while requiring no training labels and being transparent about its segmentation result.
3.2 Proposed CBM Method
Each histology test image of the experimenting dataset has a spatial resolution of 1000×1000
pixels and each pixel has three color channels: R, G and B. We partition each image into
non-overlapping blocks of size 50 × 50 pixels and apply contrast enhancement to accentuate
the boundaries around nuclei as a pre-processing step. Then, CBM processes each block
independently with three modules as elaborated below.
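A minimal sketch of this pre-processing and tiling step is shown below. CLAHE is used here as one reasonable choice of contrast enhancement, since the method does not prescribe a specific operator; the block size follows the 50 × 50 setting above.

```python
# Minimal sketch: contrast enhancement followed by non-overlapping 50x50 tiling.
import numpy as np
from skimage import exposure

def preprocess_and_tile(image_rgb, block=50):
    enhanced = exposure.equalize_adapthist(image_rgb)   # CLAHE; output in [0, 1]
    h, w, _ = enhanced.shape
    blocks = []
    for r in range(0, h, block):
        for c in range(0, w, block):
            blocks.append(((r, c), enhanced[r:r + block, c:c + block]))
    return blocks   # list of ((row, col) offset, block) pairs
```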
3.2.1 Data-Driven Color Transform
Nuclei segmentation is a binary decision problem for each pixel; namely, each pixel belongs to either
the nucleus or the background region. We need pixel attributes to make this decision. A pixel
has R/G/B color values in raw histology images. There exist strong correlations between
RGB channels in histology images as shown in Fig. 3.3(a). We can exploit this property for
energy compaction.
There are many well-known color spaces such as YUV, LAB, HVS, etc. However, they are
not data-dependent color transforms. To achieve optimal energy compaction, we apply the
principal component analysis (PCA) to the RGB 3D color vector of pixels inside one block.
That is, we can determine the covariance matrix between the R/G/B color components
based on pixels in the region. Then, the three eigenvectors define the optimal transform and
their associated eigenvalues indicate the energy distributions among the three channels. The
transform output channels are named P, Q and R channels. They are channels of the first,
the second and the last components, respectively.
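The block-wise transform can be sketched as follows; this is a minimal illustration of PCA on the RGB vectors of one block, with the eigenvector sign fixed so that the P value increases with luminance, as described below.

```python
# Minimal sketch: data-driven PQR color transform of one block via PCA,
# returning the normalized P channel.
import numpy as np

def p_channel(block_rgb):
    pixels = block_rgb.reshape(-1, 3).astype(np.float64)
    mean = pixels.mean(axis=0)
    cov = np.cov((pixels - mean).T)                 # 3x3 covariance of R, G, B
    eigvals, eigvecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]               # P, Q, R = descending energy
    pqr = (pixels - mean) @ eigvecs[:, order]
    lum = pixels.mean(axis=1)                       # proxy for luminance
    if np.corrcoef(pqr[:, 0], lum)[0, 1] < 0:       # fix sign: background should be brighter
        pqr[:, 0] = -pqr[:, 0]
    p = pqr[:, 0].reshape(block_rgb.shape[:2])
    return (p - p.min()) / (p.max() - p.min() + 1e-8)   # linear scaling to [0, 1]
```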
The energy distributions of three color components of the RGB, LAB and PQR color
spaces for a representative histology image is shown in Table 3.1, where the P/Q/R channel
energy distribution is averaged over all blocks in one image. As shown in the table, the
first principal component, P, has an extremely high energy percentage while the remaining two
components have very limited energy. As a result, we can treat the latter two as background
noise and discard them. Instead of considering segmentation using three attributes, we
simplify it greatly using a single attribute. As the first principal subspace, P points at the
direction where the variance is maximized. It better captures the transition from background
to nuclei cell areas, thus leading to a more distinct local histogram.
We normalize raw P-channel values to the range of [0, 1] with linear scaling. Note that, if
x is an eigenvector of the covariance matrix of RGB color channels, −x is also an eigenvector.
We choose the one that maps a higher P value to background (i.e., brighter) and a lower
P value to nuclei (i.e., darker). This can be easily achieved by imposing the P value to be
consistent with the luminance value in the LAB color space, since the background luminance
is higher than those of nuclei. The original color and the normalized P-channel representations for a block are compared in Fig. 3.3. Performance using the P channel against the L
channel (in the LAB color space) will be compared in Sec. 4.4.
Table 3.1: Energy distribution of three channels of three color spaces in a representative
histology image.
RGB | LAB | PQR
38.1% (R) | 29.3% (L) | 97.7% (P)
27.2% (G) | 40.1% (A) | 1.9% (Q)
34.7% (B) | 30.6% (B) | 0.4% (R)
One can see that the energy compaction capability of P/Q/R depends on the color homogeneity in a region. In most images, as the block size grows larger, the P-channel energy
compaction property becomes weaker. This is attributed to the fact that the color distribution of a larger block is less homogeneous and the correlation structure is more complicated.
Figure 3.3: Comparison of two representations in a block image: (a) R/G/B color and (b)
P value in gray and (c) its corresponding bi-modal histogram.
3.2.2 Data-Driven Binarization
To conduct the binary classification of pixels, we study the histogram of the P value of each
pixel in a block. A representative histogram is shown in Fig. 3.3(c), which has a bi-modal
distribution. The modality in the left is contributed by the P values of pixels in the nuclei
region while that in the right comes from the P values of pixels in the background region.
There exists a valley between the two peaks, which comes from pixels lying on transitional
boundaries between nuclei and background. Thus, binarization can be achieved by identifying the intermediate point between the two modalities and using the associated P value
as an adaptive threshold, which is denoted by T. A pixel with value P is classified to the
background region if P > T and the nuclei region if P ≤ T. Since threshold T is determined
by the histogram of the P value of pixels in a block, the binarization process is fundamentally an adaptive thresholding method. That is, threshold T is automatically derived in each
block without any intervention. Whether it will yield a successful outcome depends on the
bi-modal histogram assumption.
The bi-modal histogram assumption holds under the following two conditions:
1. if the block size is not too large,
2. if the ratio of the nuclei pixel number and the background pixel number does not
deviate much from unity.
In case the first condition is not met, we may see K-modalities with K > 2. Then, there
are multiple valley points, which makes the threshold selection challenging. If the second
condition is not met, it means that the majority of pixels belong to one of two classes and we
may see one dominant modality while the second modality is weak. Because it is difficult to
choose a robust threshold under the first condition, we partition one block of size 50×50 into
four sub-blocks of size 25 ×25 and conduct the data-driven color transform and binarization
in each of the four sub-blocks. On the other hand, if the second condition happens, we
merge 4 blocks into a super-block of size 100 × 100 and conduct the same processing in the
super-block.
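For illustration, a minimal sketch of the bi-modal, histogram-driven threshold selection is given below; the bin count, smoothing window, and fallback behavior are assumptions rather than the exact settings of CBM:

```python
import numpy as np

def bimodal_threshold(p_block, bins=64):
    """Pick the valley between the two modes of a P-value histogram."""
    hist, edges = np.histogram(p_block.ravel(), bins=bins, range=(0.0, 1.0))
    # Light smoothing so spurious local minima do not masquerade as the valley.
    smooth = np.convolve(hist, np.ones(5) / 5.0, mode="same")
    peaks = [i for i in range(1, bins - 1)
             if smooth[i] >= smooth[i - 1] and smooth[i] >= smooth[i + 1]]
    if len(peaks) < 2:
        return None  # bi-modal assumption violated; caller may re-split or merge blocks
    # Keep the two strongest peaks and take the lowest point between them.
    p1, p2 = sorted(sorted(peaks, key=lambda i: smooth[i])[-2:])
    valley = p1 + int(np.argmin(smooth[p1:p2 + 1]))
    return 0.5 * (edges[valley] + edges[valley + 1])

# A pixel is labeled nuclei if its P value is <= the returned threshold T.
```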
3.2.3 Morphological Processing
We have so far concentrated on pixel classification based on its color attributes in the first
two modules of the CBM method. Now, we would like to take the neighborhood of a pixel
into account. Typically, nuclei appear in the form of rounded blobs (i.e., convex objects). Yet,
we observe the following three common errors after the second module:
(a) nuclei instances may be falsely connected,
(b) false positives may appear because of the staining process,
(c) holes exist inside the nuclei cell because of the inner intensity variations.
For (a), we can split the larger ones using the convex hull algorithm to find high convexity
areas that imply an underlying connectivity between two cells. For (b), we can filter out
abnormally small nuclei in the first place based on the prior knowledge on the average area of
a nucleus. For (c), the hole filling filter is used to correct the false negatives inside the nuclei
cell. For some images with dense cell areas and blurred boundaries, the connectivity issue may be more severe, with more than two cells bundled together. We found that applying the previous procedure in an iterative manner further improves the segmentation performance, since some connected nuclei may require more than one iteration to be split. The effect of
this processing is illustrated in an example as shown in Fig. 3.4.
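These three corrections map naturally onto standard morphological operations; the sketch below uses SciPy and scikit-image as stand-ins (the area and solidity thresholds are illustrative placeholders, and the actual splitting of flagged instances is omitted):

```python
from scipy import ndimage as ndi
from skimage.measure import label, regionprops
from skimage.morphology import remove_small_objects

def refine_mask(binary_mask, min_area=30, solidity_split=0.85):
    """Hole filling, small-object removal, and flagging of likely merged nuclei."""
    mask = ndi.binary_fill_holes(binary_mask)              # (c) fill holes inside nuclei
    mask = remove_small_objects(mask, min_size=min_area)   # (b) drop abnormally small FPs
    labeled = label(mask)
    merged_candidates = []
    for region in regionprops(labeled):
        # (a) low solidity (area / convex-hull area) hints at falsely connected
        # nuclei; such instances are candidates for convexity-based splitting.
        if region.solidity < solidity_split:
            merged_candidates.append(region.label)
    return mask, merged_candidates
```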
Figure 3.4: Visualization of the morphological processing effect: input block (upper left),
noisy binarized output (upper right), an improved result by splitting two distinct nuclei
(bottom left), and segmentation ground-truth (bottom right).
3.3 Experimental Results
The data from the 2018 MICCAI MoNuSeg Challenge (Kumar dataset) [101] is used to
evaluate the performance of our proposed CBM method. This is a popular dataset for
comparisons among other methods, offering also different testing protocols.
For performance benchmarking, we follow the data splitting of [101]. That is, we report the results on the MoNuSeg Challenge test set (14 images from various organs), as well as the set of 6 histology images from bladder, colon and stomach, referred to as Test-2 (unseen organ), in Tables 3.2 and 3.3, respectively. For evaluation purposes, the AJI metric [101] is adopted by most papers over the F-1 score or DICE coefficient. AJI is more accurate for an instance-level segmentation task such as nuclei segmentation, since it takes both nuclei-level detection and pixel-level error performance into account.
Since our pipeline is parameter-free, there are no hyper-parameters to be fine-tuned on each testing set on which we evaluate our method. Threshold T and the PQR color transform weights (described in Section 3.2) are automatically determined from the pixel distribution in local
blocks. Furthermore, the priors about nuclei shape and size for the morphological processing
module are common between the two testing sets.
As shown in Tables 3.2 and 3.3, our method outperforms all the other unsupervised ones by a large margin in both testing scenarios. Notably, beyond the Cell Profiler and Fiji methods, the performance gap is large even for the DL-based unsupervised approaches [136, 85, 123, 90, 87]. This is mainly because it is quite challenging for DL models to bridge the source and target domains with so few data, given the large domain discrepancies in histology images from various organs.
Comparing with supervised methods, CBM stands at a competitive position. It achieves
similar performance with sophisticated model architectures, such as BES-Net [19]. It is
also worthwhile to emphasize that most of the supervised methods use expensive models
as backbone architecture (in terms of the number of trainable parameters) such as Res-Net
Table 3.2: Comparative results among different unsupervised and supervised methods using
the AJI metric on the MoNuSeg [101] testing data, where the best performance is shown in
boldface. CBM outperforms all the other unsupervised approaches.
Method AJI
Unsupervised
Cell Profiler [121] 0.1232
Fiji [122] 0.2733
DARCNN [85] 0.4461
DDMRL [123] 0.4860
Hou et al. [136] 0.4980
Self-Supervised [87] 0.5354
Liu et al. [90] 0.5610
CBM (Ours) 0.6142
Supervised
CNN2 [101] 0.3482
CNN3 [101] 0.5083
CIA-Net [21] 0.6907
SSL [91] 0.7063
or Dense-Net. The number of parameters in these models ranges from 10M to 44M.
Instead, CBM is a parameter-free pipeline with simple image processing components and
hence the computational burden is significantly lower.
It is also evident from Table 3.2 that supervision may help to reach higher performance. Nevertheless, it is challenging for most supervised DL methods to generalize well in this problem, especially when annotated training data are scarce. For instance, the current state-of-the-art SSL method achieves the reported performance using the full training set. According to their weak-supervision analysis [91], SSL achieves roughly the same performance as CBM when trained with roughly 50% of the data.
Finally, to illustrate the energy compaction capability of the PQR channel decomposition,
we compare the AJI performance for the P channel and the L channel (in the LAB color
space) in Table 3.4. The advantage of using the P channel is clearly demonstrated.
Table 3.3: Comparative results of different methods using the AJI metric on the Test-2
set (unseen organ). CBM shows a competitive standing among other DL-based supervised
approaches.
Method AJI
Unsupervised
Cell Profiler [121] 0.0809
Fiji [122] 0.3030
CBM (Ours) 0.5808
Supervised
CNN3 [101] 0.4989
DCAN [128] 0.5449
PA-Net [129] 0.5608
BES-Net [19] 0.5823
CIA-Net [21] 0.6306
Table 3.4: AJI performance comparison between P and L channels.
Colorspace Test-2 Challenge Data
L 0.5414 0.5856
P 0.5808 0.6142
3.4 Conclusion and Future Work
Nuclei segmentation in histology images is a demanding and error-prone task for physicians, and its automation is of high importance for cancer assessment. The proposed CBM method offers a promising lightweight and parameter-free unsupervised direction for nuclei segmentation, requiring no labeled data. It addresses the problem with a data-driven color conversion and binarization, as well as a morphological module that takes into account priors about nuclei shape and size. In the future, we would like to further boost the segmentation performance by exploring weak or self-supervision approaches.
Chapter 4
Unsupervised Nuclei Segmentation using Adaptive Thresholding and Self Supervision
4.1 Introduction
Medical imaging is one of the fields that benefit greatly from the advancement of modern AI algorithms. They enable computer-aided diagnosis (CAD) tools to serve as physicians' assistants. In particular, CAD in digital pathology has become a fast-growing area since it is conducive to cancer diagnosis and assessment. As part of this process, nuclei segmentation provides important visual cues, such as molecular morphological information [33], to expert pathologists.
Generally speaking, nuclei instance segmentation is an indispensable task in histology image reading for cancer assessment, and its automation is of high significance for pathologists' reading process. Hematoxylin and Eosin (H&E) staining has been used for years in histology to reveal the underlying nuclei structure. Variations along this process, especially for images coming from different laboratories that use different protocols and scanners, may affect nuclei color and texture. Manual segmentation of histology images carried out by expert pathologists is a labor-intensive and time-consuming task, also subject to high inter-observer variability [119]. Thus, sufficiently large annotated datasets are scarce.
A high-performance unsupervised nuclei instance segmentation (HUNIS) method is proposed in this work. HUNIS consists of two-stage block-wise operations, where the first stage
provides an initial segmentation result and yields pixel-wise pseudo-labels for the second
stage. This self-supervision mechanism is novel and effective. It is shown by experimental
results that HUNIS outperforms other unsupervised methods by a large margin. Besides,
HUNIS is highly competitive with state-of-the-art supervised methods. The rest of this chapter is organized as follows. Related work is reviewed in Sec. 4.2. HUNIS is presented in Sec.
4.3. Experimental results from the MoNuSeg dataset are shown in Sec. 4.4. Concluding
remarks are given in Sec. 4.5.
4.2 Review of Related Work
Before the advent of the deep learning (DL) paradigm, earlier methods addressed this segmentation problem with no supervision. Examples include: Adaptive thresholding [55, 139,
140], clustering [141], active contours [142, 143], and graph cuts [144]. Another popular
method is the watershed algorithm [145, 62], which is often used as a post-processing step,
where research was mainly focused on finding proper markers to initialize the segmentation
process.
In recent years, DL solutions have become prevalent [95, 101, 105, 19, 104]. They attempt to handle multi-scale appearances of nuclei through separate network branches and the negative effects of hard samples through customized losses [95, 91, 21]. A self-supervised learning method was proposed by Sahasrabudhe et al. [87], which implicitly regularizes the encoder model with scale information. Despite their effectiveness, it is challenging for DL models to generalize and transfer learned models from training to testing domains. Given the small-sized publicly available datasets and nuclei variations across different organs, DL solutions have their limitations.
Figure 4.1: An overview of the proposed HUNIS method, where the first stage provides an
initial nuclei segmentation result and yields pseudo-labels to guide the segmentation in the
second stage.
Lately, unsupervised methods have shown promising performance on the nuclei instance
segmentation task [139]. Among them, [136, 90] are DL-based methods. [90] and [85]
adopt domain adaptation and model regularization, while [136] uses generative adversarial
networks (GANs) to synthesize histology images for the nuclei segmentation model. Yet,
their performance is far inferior to that of supervised DL methods. On the other hand,
[139] is a non-DL solution, offering a transparent pipeline for addressing the problem and
requiring no training data.
In this work, we devise a two-stage unsupervised processing pipeline, namely HUNIS.
The first stage consists of a novel adaptive thresholding operation and a false positive (FP)
nuclei removal module, to obtain an initial segmentation output. Then, the first stage’s
output is used to provide pixel-wise pseudo-labels to guide the second-stage processing for
a more accurate segmentation. This self-supervision mechanism is novel and effective as
demonstrated by experimental results in Sec. 4.4.
4.3 Proposed HUNIS Method
An overview of the proposed HUNIS method is shown in Fig. 4.1. It consists of a two-stage
block-wise operations pipeline. The first stage includes: 1) adaptive thresholding of pixel
intensity values, 2) incorporation of nuclei size/shape priors and 3) removal of false positive
instances. The first stage provides an initial segmentation result and yields the pseudo-labels
for self-supervising the second stage. The second stage exploits color and shape information
under the self-supervised setting for a more accurate segmentation.
4.3.1 First-Stage Processing
4.3.1.1 Adaptive Thresholding
A histology image of size 1000 ×1000 is first decomposed into non-overlapping blocks of size
50 × 50 to ensure homogeneity at the local level. Block boundaries do not pose an issue,
since any small inconsistencies between neighboring blocks from the adaptive thresholding
are alleviated by the subsequent morphological processing in module-2 of stage-1. As
a pre-processing step, we accentuate the foreground nuclei over background tissue, thus
enhancing the subsequent adaptive thresholding operation. In general, the nuclei chromatic
palette is mostly captured from Hematoxylin (H), in contrast with Eosin (E) that carries
more information about the background. To this end, the original color image is projected
on the H color basis using the approach described in [146]. A contrast enhancement is
applied to the H-image to further highlight the nuclei. Next, color pixels are converted
into monochrome ones within each block to facilitate the following thresholding operation.
The color transformation can be achieved by applying principal component analysis (PCA)
and retaining the first principal component, called the intensity value below. PCA removes
the correlation among RGB channels and achieves high energy compaction. The L value
in the LAB color space is used as reference to select the sign of the eigenvector at each
block uniquely. The color-to-intensity transformation simplifies a color-based segmentation
mechanism from 3D to 1D. While it works well for the majority of blocks, we consider color
attributes in stage-2 to further increase the segmentation performance.
We conduct local thresholding on pixel intensities in each block adaptively based on a
bi-modal assumption. That is, if the intensity histogram in a block has two main peaks, one
Figure 4.2: Two threshold adjustment scenarios: (top) a histogram of two imbalanced modalities and its corresponding block image and (bottom) a histogram of three modalities and
its corresponding block image.
corresponding to foreground and one to background, and the notch between the two peaks
is low enough, one can choose the intermediate point between the two peaks as a
binarization threshold. There are however challenging cases where the bi-modal assumption
is violated. Then, a mechanism is needed to adjust the threshold. Two such examples are
shown in Fig. 4.2. They occur because of ambiguous instances with mid-level intensities
(see the top case) or poor contrast between nuclei and background (see the bottom case).
The threshold is adjusted using the following algorithm. The peak points of the first
and second modalities are shown by solid dots in the left two sub-figures of Fig. 4.2. The
mid-point between the two is denoted by To. We draw a line that is perpendicular to the
line segment formed by the two dots and passing through To. Its intercept, Tc, with the
horizontal line of zero occurrence is used as a reference point for correcting the initial To.
Then, we have
T′ = To + λ(To − Tc), 0 < λ < 1, (4.1)
where T′ is the adjusted threshold value and the second term on the right-hand side is called
a correction term. As shown in the top case of Fig. 4.2, if the second peak (i.e., background)
is higher than the first one (i.e., nuclei), it is likely that the block has low contrast and
ambiguous regions and it is desired to decrease To to reduce the false positive rate. On the
other hand, as shown in the bottom case of Fig. 4.2, if the mid-peak is high but not higher
than the first peak (i.e., nuclei), it is likely that the mid-modality region corresponds to
nuclei boundaries (given the first peak being the nuclei) or due to its texture. Thus, we can
increase To to segment nuclei more precisely.
The direction and magnitude of the threshold adjustment are automatically determined from the slope of the line segment connecting the two dots. A positive slope gives an intercept, Tc, higher than To, so the correction term in Eq. (4.1) is negative. Conversely, a negative slope yields a positive correction. A weight, λ, is used to control the correction amount, whose value is obtained empirically from the data (small variations do not affect the overall performance much). Note that point T in Fig. 4.2 is the reflection of T′ about the vertical line with intensity equal to To. It does not appear in Eq. (4.1) and it is drawn
only for illustration purposes.
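In code, Eq. (4.1) reduces to a small geometric construction on the two histogram peaks; the sketch below assumes the peak positions and heights have already been found, and uses λ = 0.3 as in Sec. 4.4:

```python
def adjusted_threshold(i1, h1, i2, h2, lam=0.3):
    """Correct the mid-point threshold To using the slope of the peak-to-peak line.

    (i1, h1) and (i2, h2) are the intensity and histogram height of the nuclei
    and background peaks, respectively, with i1 < i2.
    """
    t_o = 0.5 * (i1 + i2)
    slope = (h2 - h1) / (i2 - i1)
    # Height of the peak-to-peak line at t_o.
    h_o = h1 + slope * (t_o - i1)
    # Intercept of the perpendicular through (t_o, h_o) with the zero-occurrence axis:
    # solving 0 - h_o = (-1 / slope) * (t_c - t_o) gives t_c = t_o + slope * h_o.
    t_c = t_o + slope * h_o
    return t_o + lam * (t_o - t_c)   # Eq. (4.1)
```

Consistent with the discussion above, a higher background peak (positive slope) pushes the intercept Tc above To and lowers the threshold, while a negative slope raises it.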
4.3.1.2 Incorporation of Size/Shape Priors
The output from the adaptive thresholding module is often noisy. Prior knowledge about nuclei size and shape can be incorporated to remove those noisy predictions at the instance level. To this end, we calculate the histogram of nuclei sizes over the dataset, which should vary little across datasets, since nuclei size follows a fairly consistent distribution. Unusually small nuclei instances from unsupervised thresholding are perceived as noise and can be filtered out. Moreover, two or more close nuclei can end up being connected after segmentation because of unclear boundaries. Shape priors (e.g., nuclei shape convexity) can help split those falsely connected nuclei. Algorithmically, the convex hull algorithm can detect abnormally steep curves along the nucleus boundary, indicating that the instance results from two or more bundled nuclei, which can then be split. Furthermore, hole filling is used to correct open areas in the interior of a nucleus and thus compensate for inner texture variations that challenge the thresholding operation.
4.3.1.3 Removal of False Positive Instances
Some false positive nuclei instances cannot be filtered out in the size/shape priors module.
They usually come from darker background areas (resulting from defects in the staining process)
or small nuclei with ambiguous texture. We propose a simple and efficient way to reduce
the false positive rate. In this module, we consider a larger local neighborhood, called a tile
(say, of size 200 × 200) to include more detected instances for consideration. Each tile is
processed independently. The idea is to compare instances that are more likely to be true
positives with other ambiguous instances that could potentially be false positives. The size
prior contributes here. Larger instances are less likely to be false positives while the chances
for smaller instances to be falsely marked are higher. This is mainly due to the combined operation of the adaptive thresholding and size/shape prior modules. Hence, we have a global nuclei
size threshold to deduce what instances will be used as “ground truth” in a tile. To be
more specific, we have two sets of instances: the reference instances set, R, and the query
instances set, Q. All instances with sizes larger than a threshold are assigned to set R while
the remaining ones to set Q. Each element in Q is compared against the mean ensemble
of elements in R. If their similarity is poor, it is likely to be a FP instance and can be
eliminated.
To evaluate the similarity and compare instances in R and Q, some attributes are extracted per instance. Since most instances from Q have lower contrast and poorer color saturation, we use the HSV colorspace channels and their corresponding contrast values (x ∈ R⁶) per instance as features to discern FP nuclei. For the R class, we aggregate all instance features
to yield one reference feature for comparison. That is, we extract the feature vector xR by averaging the features of all instances in R via
xR = (1/|R|) Σ_{i∈R} xi, (4.2)
where xi is the feature vector of the i-th instance in R. Then, we compare the similarity of
the same feature vector of a query sample against xR. The similarity metric is defined as:
S(xR, xj) = exp(−γ ||xj − xR||²₂), ∀j ∈ Q, (4.3)
where γ is a hyper-parameter. Clearly, 0 ≤ S ≤ 1. The higher the S value, the higher the
similarity. A query instance is removed if its S < Ts, where Ts is another hyperparameter.
The process of false-positive nuclei instance removal is illustrated in Fig. 4.3.
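A compact sketch of Eqs. (4.2)–(4.3) is given below; the six-dimensional per-instance feature extraction is abstracted away, and γ = 0.1 and Ts = 0.6 follow the values reported in Sec. 4.4:

```python
import numpy as np

def remove_false_positives(ref_feats, query_feats, gamma=0.1, t_s=0.6):
    """Keep query instances whose similarity to the reference ensemble exceeds Ts.

    ref_feats:   (|R|, 6) HSV means and contrasts of large (reference) instances
    query_feats: (|Q|, 6) the same features for small, ambiguous instances
    """
    x_r = ref_feats.mean(axis=0)                       # Eq. (4.2): ensemble reference
    d2 = np.sum((query_feats - x_r) ** 2, axis=1)
    similarity = np.exp(-gamma * d2)                   # Eq. (4.3): Gaussian kernel
    keep = similarity >= t_s                           # instances below Ts are removed
    return keep, similarity
```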
4.3.2 Second-Stage Processing
4.3.2.1 Self-Supervised Binary Classification for Uncertain Pixels
The nuclei color distribution in a tile is more stable than that in the entire image and gives
richer information about nuclei appearance over background. As such, unlike stage-1 that
carries out operations on monochrome patches, this module operates in tiles of 200 × 200
from the Hematoxylin image. We first train a classifier based on the pseudo-labels obtained
from stage-1 in a pixel-wise manner. Then, we conduct prediction for uncertain pixels, based
on the classifier’s confidence. Most of those pixels usually lie close to the nuclei boundaries.
As long as the majority of pixel labels in a tile are correct, some of the noise in pseudo-labels
can be removed and thus pixels are more likely to be assigned to their correct class. This
implies that large and solid nuclei are more likely to remain intact, while some of the smaller
instances or nuclei boundary areas can be corrected towards their correct class.
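One plausible realization of this module is sketched below; the choice of a random forest and the confidence-based relabeling rule are assumptions made for illustration, since the text does not prescribe a specific classifier:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def refine_uncertain_pixels(tile_h, pseudo_mask, conf_margin=0.25):
    """Refine a stage-1 mask by re-predicting pixels the classifier is confident about.

    tile_h:      (H, W, 3) Hematoxylin tile providing per-pixel color features
    pseudo_mask: (H, W) binary stage-1 segmentation serving as pseudo-labels
    """
    features = tile_h.reshape(-1, 3).astype(np.float64)
    labels = pseudo_mask.reshape(-1).astype(int)
    clf = RandomForestClassifier(n_estimators=50, max_depth=8, n_jobs=-1)
    clf.fit(features, labels)
    prob_fg = clf.predict_proba(features)[:, 1]
    pred = (prob_fg >= 0.5).astype(int)
    # Flip only pixels where the classifier confidently disagrees with the pseudo-label;
    # these are typically ambiguous boundary or small-instance pixels.
    confident = np.abs(prob_fg - 0.5) > conf_margin
    refined = labels.copy()
    flip = confident & (pred != labels)
    refined[flip] = pred[flip]
    return refined.reshape(pseudo_mask.shape)
```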
Figure 4.3: Illustration of the effect of the false positive removal module, where some small-sized instances (in red) are compared with larger instances (in green) that are more likely
to be actual nuclei. The marked instances in grey (right sub-figure) have a similarity score
below Ts and are eliminated.
4.3.2.2 Shape Refinement
In this module, we perform a final round of nuclei shape refinement using the same priors
and procedures as in its counterpart from Stage-1. Since pixel-wise classification is prone
to false predictions to some extent, this module refines the output to obtain better segmented nuclei instances. This refinement is necessary since the shapes of some nuclei could be distorted after splitting or due to unclear boundaries. Thus, we preserve convexity using the convex hull algorithm for nuclei that have abnormally steep contour regions. Other
morphological operations are also used to refine the nuclei shape.
4.4 Experimental Results
The proposed HUNIS method is evaluated on the 2018 MICCAI MoNuSeg dataset [101] to
demonstrate its effectiveness. The dataset offers different testing protocols. For performance
benchmarking, we follow the data splitting scheme as specified in [101]. That is, we report
results on two MoNuSeg Challenge test datasets:
• 14 images from various organs whose histology images are available in the training
dataset, here referred to as MoNuSeg Test-1.
• 6 histology images from three unseen organs (bladder, colon and stomach), referred to
as MoNuSeg Test-2.
The parameters in Eqs. (4.1) and (4.3) are set to λ = 0.3 and γ = 0.1, respectively. The
similarity threshold in Fig. 4.3 is set to Ts = 0.6. All other parameters are determined
automatically from the data. For evaluation purposes, it is common among other works to
use the Aggregated Jaccard Index (AJI) [101], rather than the F-1 score or DICE coefficient.
AJI is more suitable for instance-level segmentation tasks, since it considers both nuclei-level
detection and pixel-level error performance.
The results for MoNuSeg Test-1 and Test-2 are shown in Tables 4.1 and 4.2, respectively.
All benchmarking methods except CBM are DL-based. As shown in Table 4.1, one can see
that HUNIS outperforms all DL unsupervised approaches by large margins in Test-1. It
also outperforms CBM by 0.0245 in terms of the AJI score. Furthermore, HUNIS achieves
a competitive standing among supervised DL methods in Test-1. Its performance is close to
that of the 2nd best in the table, namely, UNet-Atten. [104].
The domain adaptation task in Test-2 is quite challenging for supervised methods since
they need to make decisions on data from unseen organs. As shown in Table 4.2, HUNIS
outperforms all benchmarking unsupervised and supervised methods, including sophisticated
DL models such as CIA-Net. Evidently, it is difficult for DL models to generalize well from
Table 4.1: Quantitative results and performance comparison on Test-1 using AJI metric.
Method AJI
Unsupervised
DARCNN [85] 0.4461
Hou et al. [136] 0.4980
Self-Supervised [87] 0.5354
Liu et al. [90] 0.5610
CBM [139] 0.6142
HUNIS (ours) 0.6387
Supervised
CNN3 [101] 0.5083
Hover-Net [95] 0.618
UNet-Atten. [104] 0.6498
NucleiSegNet [105] 0.688
training to testing when the amount of annotated data is scarce. In contrast, Test-1 and
Test-2 make little difference for unsupervised methods. We see that HUNIS can achieve an
even higher AJI score in Test-2 as compared with Test-1.
An ablation study that demonstrates the progressive improvement of segmentation results across the stages of HUNIS is given in Table 4.3. It shows the effectiveness of the false positive instance removal module (Module 3 in Stage 1) and the two modules in Stage 2. They
contribute to significant AJI score improvement from the output of the previous stage.
It is worthwhile to stress that HUNIS carries very few overall parameters. In contrast,
modern DL models typically contain millions of parameters. For example, the NucleiSegNet
model and the UNet-Atten model take up 15.7M and 32M parameters, respectively. They
need GPUs to conduct the training and testing tasks. HUNIS can be implemented in software on mobile/edge devices. Furthermore, it is a fully unsupervised solution requiring no training
data at all.
Table 4.2: Quantitative results and performance comparison on Test-2 using the AJI metric.
Method AJI
Unsupervised
Cell Profiler [121] 0.0809
Fiji [122] 0.3030
CBM [139] 0.5808
HUNIS (ours) 0.6548
Supervised
CNN3 [101] 0.4989
BES-Net [19] 0.5823
CIA-Net [21] 0.6306
Table 4.3: Segmentation improvement over different stages of our pipeline in Test-2 set
Stages Stage-1 (Modules 1&2) Stage-1 (Modules 1&2&3) Stages 1&2
AJI 0.6045 0.6377 0.6548
4.5 Conclusion and Future Work
An unsupervised nuclei instance segmentation method, namely HUNIS, was proposed in
this work. It contains several novel ideas, such as an advanced adaptive thresholding scheme
that can adjust the binarization threshold based on the local distribution automatically, an
efficient false positive nuclei removal technique that can eliminate ambiguous instances, and
a self-supervised learning mechanism that can finetune the segmentation results. HUNIS
outperforms other unsupervised methods and maintains competitive performance against
state-of-the-art supervised methods. It also comes with a very low overall computational
complexity, thus offering a green solution to the nuclei instance segmentation problem. It would be interesting to extend the developed methodology to other relevant medical segmentation
problems as well.
Chapter 5
LG-NuSegHop: A Local-to-Global Self-Supervised Pipeline For Nuclei Instance
Segmentation
5.1 Introduction
Cancer diagnosis from biopsy tissue specimens has been the standard way of detecting and grading tumors. Cancerous and healthy cells have distinct molecular profiles which can provide important visual cues to pathologists. Nuclei segmentation is a fundamental task within this diagnosis pipeline, since nuclei topology, size and shape play a crucial role in cancer grade reading. Hematoxylin and Eosin (H&E)-stained images give rise to this molecular profile by highlighting the nuclei cells, and H&E staining has been the cornerstone process for histopathological slide preparation [147].
Undoubtedly, histopathological image reading is a painstaking task. It relies on very subtle visual cues and requires high expertise. On top of this, digitized slides are usually captured under a high magnification level, typically ranging from 20x to 40x. That results in very high resolution images which pathologists need to examine thoroughly to recognize potentially cancerous regions. Given that multiple cores are usually sampled from each patient,
one can realize that analyzing histology slides is a fairly time consuming and laborious
task [33]. Computer-aided diagnosis (CAD) tools are meant to automate certain physicians’
tasks, offering also a more objective decision making process. Automated nuclei segmentation
can expedite the slide reading process by highlighting the molecular patterns and enhancing the pathologist's reading. Moreover, it can be used as an intermediate step toward whole-slide image (WSI) classification for models aiming to learn the pattern of clusters that nuclei form and map it to a cancer grade group [148, 149] (see Fig. 5.1).
Nuclei segmentation poses several challenges to models and algorithms. At first, the H&E
staining process [150] involves many steps carried out manually by humans, and thus it is
far from stable and noise-free. Staining artifacts can also increase the intra-class distance,
while during the image acquisition process, the type of scanner and its parameterization can
also affect the nuclei appearance [11]. Another challenge that modern Deep Learning (DL)
models are faced with is the lack of large annotated datasets, since it is a labor intensive
task that only expert pathologists can perform. Therefore, data annotation is expensive and
also subject to high inter-observer variability [35], which effectively turns this problem into learning from noisy labels.
There is a plethora of works in existing literature which approach the problem from
different angles. Prior to the DL-based methods, most of the works focused on unsupervised
methodologies. Examples include different variants of thresholding operations [59, 57], active contours [71] and level sets [70], the watershed algorithm [66, 151], graph cuts [73] and K-means clustering [77]. Those approaches mostly relied on biological priors of the problem, particularly about nuclei appearance, shape and size.
It has been almost a decade since the advent of DL in the medical imaging field. For segmentation tasks, fully convolutional pipelines, such as U-Net [18] are popular choices among
the researchers for semantic segmentation. Fully supervised methods use U-Net as backbone architecture [95, 97], also coupled with attention mechanisms tailored to focusing the
learning on the error-prone regions (i.e. nuclei boundaries) [21, 152]. Since fully supervised
Figure 5.1: Nuclei segmentation to provide input and assist pathologists or AI tools to
diagnose and grade cancer.
methods are heavily challenged by the lack of large annotated datasets, weakly supervised methods [23, 22] attempt to learn either using fewer labels or point-wise annotations [153, 24].
Furthermore, unsupervised learning methods use self-supervision and specifically employ
domain adaptation [85] and predictive learning [87] to transfer the nuclei appearance from
other domains. Nevertheless, they fail to achieve a competitive performance.
Despite their success in other computer vision problems, DL models are challenged in medical imaging tasks, mainly due to the lack of large datasets. More importantly, DL models are often criticized as "black boxes" by physicians [7], since their feature learning process is inherently intricate. Moreover, to achieve good performance, backbone models require pretraining on ImageNet. As such, it is unclear how the representations can be adapted from the natural imaging domain to the biological one. Furthermore, those models fail to explicitly incorporate human prior knowledge, which is important for transparent decision making by such tools.
All the mentioned reasons motivate this work to attempt a fully unsupervised pipeline
and also decouple from the DL paradigm. Instead, a novel data-driven feature extraction
model for histology images is introduced, namely NuSegHop. It is a linear, feedforward and
multi-scale model to learn the local texture from the histology images. Our approach is based
on the Green Learning (GL) [154] paradigm which offers a framework for feature learning at
a significantly lower complexity, where the features can be seamlessly interpreted [155]. The
proposed pipeline consists of three major modules, starting with a set of local processing
operations using priors of the task to generate a pseudolabel. Then, NuSegHop is used
in a self-supervised manner to predict a heatmap for nuclei presence. Finally, a set of
global processing operations takes place as a post-processing to decrease the false negative
and positive rates, also in a self-supervised manner. Overall, the full pipeline incorporates
self-supervised learning and prior insights, ranging from local areas (i.e., patches) up to
global image decisions. Therefore, the overall proposed pipeline is named Local-to-Global
NuSegHop (LG-NuSegHop).
The main contributions of this work are:
• NuSegHop as a data-driven feature extraction for learning the texture in histology
images
• Local image processing techniques to predict a pseudolabel in an unsupervised manner
• Global image processing techniques to post-process a heatmap with predictions and
increase the detection rate of nuclei
• Competitive performance in three diverse datasets among other DL-based supervised
and weakly supervised models.
• High generalization performance without any prior or training from the source domain.
5.2 Related Work
In this section we provide an overview of nuclei segmentation methods across different categories, beginning with the traditional pipelines from which our work draws elements, and then covering state-of-the-art DL methods.
5.2.1 Traditional Methods
Earlier works relied mostly on priors from the nuclei appearance and certain assumptions
to solve the problem. Thresholding was fundamental in early segmentation works, where different methods propose mechanisms for calculating the appropriate threshold to binarize the input image. A popular algorithm in many works is Otsu's thresholding [57, 56], which minimizes the intra-class variance (equivalently, maximizes the inter-class variance) to automatically discover the best threshold. Win et al. [59] apply a median filter on each color component
and then perform Otsu’s thresholding on the grayscaled image, followed by morphological
operations to refine the output. A locally adaptive thresholding mechanism on linear color
projections has been also proposed in [61].
The watershed algorithm [81] is another popular approach that uses topological information to segment an image into regions called catchment basins. This algorithm requires initial markers which act as the seeds of the catchment basins. It is a popular choice in many works as a post-processing step to find the nuclei boundaries [151, 156].
5.2.2 Learning-based Methods
5.2.2.1 Full Supervision
The initial DL-based works [107] relied fully on pathologists' annotations to learn the nuclei
color and texture variations. Region-proposal works employ Mask-RCNN to detect the nuclei
[93, 94]. One of the most popular architectures for medical image segmentation used as a
backbone in several works is the U-Net [18]. Kumar et al. [101] proposed a 3-way CNN
model that is explicitly supervised on the nuclei boundaries.
CIA-Net [21] leverages the mutual dependencies between nuclei and their boundaries
across different scales, proposing also the Truncated Loss for diminishing the influence of
outlier regions and mitigate the noisy labels effect. Graham et al. [95] introduce Hover-Net,
a multi-branch network that is trained on segmentation, classification and pixel distance
from the nuclear mass targets.
As emphasized, full label collection in this task is expensive and labels are not abundant. To
this end, point-wise labels [24, 108] can be used to learn the appearance of nuclei from partial
point annotations. Furthermore, it is possible to combine point annotations and a limited
number of full nuclei masks to enhance the learning process and improve the results [109].
5.2.2.2 Self-Supervision
Several methods have used self-supervision to learn from a different task domain and transfer the knowledge into nuclei segmentation. Domain adaptation is a popular self-supervised choice since it exploits the large volumes of labeled data available in other domains and then applies the learned knowledge to the target domain. The Domain Adaptive Region-based CNN (DARCNN) is proposed in [85], which learns a definition of objects from a generic natural object detection dataset [157] and adapts it to biomedical datasets. This is possible through a domain separation
module that learns domain specific and domain invariant features. Liu et al. [90] propose
the Cycle Consistent Panoptic Domain Adaptive Mask RCNN (CyC-PDAM) that learns
from fluorescence microscopy images and synthesizes H&E stained images using the CycleGAN [158]. Contrastive learning is another way for applying self-supervision. Xie et al. [91]
propose an instance aware self-supervised method which involves scale-wise triplet learning
and count ranking to implicitly learn nuclei from the different magnification levels.
Predictive learning is another alternative to learn representations implicitly from the
data. Sahasrabudhe et al. [87] have proposed a method on the assumption that the image
magnification level can be determined by the texture and size of nuclei. In turn, this can be
used as a self-supervised signal to detect nuclei locations and seed the watershed algorithm.
Zheng et al. [86] have proposed a method that generates pseudo labels obtained from an
unsupervised module using k-means clustering on the HSI colorspace. Then, an SVM classifier is trained on a feature vector with color and texture, as well as topological attributes.
Our work is conceptually similar to that work, since it creates a pseudolabel from local
thresholding operations and then uses self-supervision at a global level.
5.2.3 Green Learning
Green Learning (GL) has been recently introduced in [154], aiming to provide a more transparent feed-forward feature extraction process at a small complexity and model size [155, 159]. The feature extraction model is multi-scale and creates a rich spatial-spectral representation of the input image [160]. Instead of convolutional filters trained with backpropagation, principal component analysis (PCA) is used to learn the local subspace across different layers, where each feature has a larger receptive field at
deeper stages. Following GL’s terminology, each layer is called “Hop”, in which features are
learned in an unsupervised and data-driven way.
Within the medical imaging field, Liu et al. [161, 162] have proposed the first works on
segmentation and classification tasks. Also, GL has recently achieved competitive results
in prostate cancer detection from Magnetic Resonance Images [163]. Our core proposed
module, NuSegHop, uses the channel-wise Saab transform from GL for pixel-wise feature
extraction and classification in a self-supervised manner. To the best of our knowledge, this
is the first GL-based work in digital histology.
Figure 5.2: An overview of LG-NuSegHop pipeline. Pre-processing applies image enhancement to prepare the image for the local processing modules. NuSegHop receives as input the
pseudolabel and predicts a heatmap. In the last step, global processing modules increase the
nuclei detection rate using information across the entire input image.
5.3 Materials And Methods
The proposed pipeline comprises three distinct modules that operate successively. In Section
5.3.1, we describe the image preprocessing steps, meant to enhance the input image towards
the subsequent operations. In Section 5.3.2, the local pixel-wise operations that predict a pseudolabel for NuSegHop are described, while its architecture and processing are detailed in Section 5.3.3. The global processing modules are presented in Section 5.3.4. An overview of the
entire proposed LG-NuSegHop pipeline is illustrated in Fig. 5.2.
5.3.1 Preprocessing
The preprocessing modules aim at preparing the input image tile for the subsequent local
processing module which is mainly based on thresholding. Thus, the key goals are: (1)
highlight the nuclei over the background and (2) convert the color image into grayscale.
Prior to thresholding, the goal is to make nuclei more distinct over the background tissue.
From the theory of the H&E staining process, hematoxylin principally colors the nuclei cells
to a darker color (e.g. blue or dark-purple), while eosin mainly stains the cytoplasm and
other structures in the background area. There are several methods in literature for carrying
out this color conversion. We choose the work of Salvi et al. [164]. It is a Singular Value Decomposition (SVD)-geodesic based method for stain separation, applied after converting the input image from RGB to the optical density space, where SVD can be more effective. Another benefit of stain separation is that it helps mitigate the large stain variability across different images, which is one of the challenges in this task.
After separating the stain colors using the orthogonal spaces from SVD, we project all
pixels on the Hematoxylin’s subspace to create image H. To further enhance this separation
and also make nuclei boundaries more distinct (especially for images that suffer from blurry artifacts due to the staining or acquisition process), we apply histogram equalization.
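For illustration only, the sketch below substitutes scikit-image's fixed Ruifrok–Johnson deconvolution (rgb2hed) for the SVD-geodesic stain separation of [164] and returns a single-channel, contrast-enhanced Hematoxylin image; it is a simplified stand-in, not the LG-NuSegHop preprocessing itself:

```python
from skimage.color import rgb2hed
from skimage.exposure import equalize_hist, rescale_intensity

def hematoxylin_image(rgb_tile):
    """Return a contrast-enhanced, single-channel Hematoxylin image of an H&E tile."""
    hed = rgb2hed(rgb_tile)                                   # stain deconvolution (H, E, DAB)
    h = rescale_intensity(hed[..., 0], out_range=(0.0, 1.0))  # hematoxylin density in [0, 1]
    return equalize_hist(h)                                   # accentuate nuclei over background
```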
The last preprocessing step to prepare the input tile for the thresholding operation is to convert it to grayscale. Although other colorspaces (e.g., LAB) could be an option, our earlier work [60] showed a better way to convert the image into grayscale. Transformations across colorspaces use fixed formulas to map pixel values from one domain to the other. PQR, in contrast, is a data-driven color conversion, adapted to the input content. It uses SVD to calculate the color subspace direction that maximizes the data variance. One advantage is the better energy compaction in one channel, compared to a fixed colorspace conversion. Moreover, finding the color conversion that maximizes the
variance is particularly important for the subsequent thresholding operation, since we assume
that along the direction of that subspace, the separation between nuclei and background is
maximized. Therefore, after SVD, we linearly project patches Hp from the H image on its
first principal component P to convert from color to grayscale. Since, different areas of the
tile may have different statistics, we perform PQR independently after the image is split in
local patches which are subject to thresholding. An illustrative example of PQR is shown in
Fig. 5.3. The color conversion formulae are as follows:
Hp = U · S · V⊺ (5.1)
P ≜ V1,1:, Q ≜ V2,1:, R ≜ V3,1: (5.2)
5.3.2 Local Processing
The main purpose of this module is to create a pseudolabel for training NuSegHop. This
module uses simple, yet effective and intuitive image processing techniques at a local level.
It employs prior knowledge of the problem and self-supervision locally, to filter out erroneous predictions from the unsupervised local processing. To this end, certain assumptions are made to overcome the lack of supervision:
1. The bi-modal distribution assumption, according to which the histogram of a local area has two peaks, where the lower-intensity peak corresponds to nuclei and the brighter one to background; preprocessing is meant to accentuate this assumption.
2. Local similarity, whereby adjacent nuclei tend to have small color or texture variations.
3. Larger low-intensity components are less likely to be false positives than smaller instances.
Figure 5.3: An illustration of the main preprocessing steps, involving stain separation and
the PQR method to convert color into grayscale.
5.3.2.1 Adaptive Thresholding
The thresholding method we propose is adaptive in two ways: (1) scale-wise and (2) intensity-wise. The input image is split into patches P of size 50 × 50 and the process starts out with estimating the local distribution. If the bi-modal criteria are not met, the process also tries patches of 25 × 25 and 100 × 100. This is to adapt to different nuclei sizes or magnification levels. On the other hand, the threshold at each local patch is automatically
adjusted based on the local area statistics. One choice for threshold calculation is to simply
pick the intermediate value between the two peaks. Yet, we opt for a more adaptive way
Figure 5.4: Demonstration of the P-value distribution in a local patch where the bimodal
assumption holds. The auxiliary lines to calculate the adapted threshold Tˆ are also depicted.
to calculate the optimal threshold [61], thus reducing the under or over segmentation effects
and eventually creating a less noisy pseudolabel for self-supervision. Given the histogram and
the two main peaks T1 and T2 under the bi-modal assumption, we define L12 as the line passing through T1 and T2, and the intermediate point To = (T1 + T2)/2 about which the threshold correction is applied. Also, Tc is defined as the intercept point of the intensity value axis of the
histogram and the perpendicular line of L12, passing from To. A λ hyperparameter is used to
control the amount of correction about To. The adjusted threshold Tˆ formula is calculated
using Eq. 5.3.
Tˆ = To + λ(To − Tc) (5.3)
5.3.2.2 Morphological Instance Refinement
Although adaptive thresholding works well in areas with relatively low variation, there are
patches where the color variance is higher, thereby causing over or under segmentation
effects. Morphological processing has been widely used in literature for processing binarized
images. To refine the thresholding operation, we apply a set of morphological operations,
such as hole filling, small instance removal and nuclei splitting. Priors about the nuclei size
and shape are incorporated to apply simple morphological processing and filter out noisy
instances. For nuclei splitting, the convex hull algorithm is employed to detect abnormally deep curvatures that are not indicative of nuclei shape. This step is significant since subsequent operations work on a per-instance basis.
5.3.2.3 Locally Anomalous Instance Removal (LAIR)
To further filter out remaining instances that may be false positives, we carry out a simple local comparison among the detected instances. For this operation we need a larger patch, in order to include more nuclei instances and make the comparison effective. Hence, for this submodule a 200 × 200 patch is used. In each large patch, the first criterion for query instances q is the size. That is, if an instance has a small to medium size, it will be compared against the remaining larger instances r that provide a reference, according to assumption 3 (see Section 5.3.2). We create a reference representation by forming the ensemble of the non-query instances. This can be viewed as a first step of introducing self-supervision locally in our pipeline, coupled with priors from the task. Intuitively, abnormal-looking instances in a certain feature space can be regarded as anomalies and in turn eliminated. We define Q = {q1, q2, . . . , qN } as the instances being tested and R = {r1, r2, . . . , rM} as the reference instances.
Regarding the feature representation, we use the HSI colorspace from the H image, along
with the channel-wise contrast value. For similarity comparison, a Gaussian kernel is used
to measure the distance between each query instance xj and the ensemble reference xR
(see Eq. 5.4) and to determine the anomalous instances subject to removal, by obtaining a similarity score S (see Eq. 5.5). Instances q whose similarity to their local reference class R is lower than a predefined threshold Ts are removed from the foreground.
xR = (1/|R|) Σ_{i∈R} xi, (5.4)
Figure 5.5: Graphical overview of the proposed NuSegHop for feature extraction. It consists
of two layers that operate in two different scales. From both layers two types of features
are extracted: (1) spatial and (2) spectral. With red we depict the extracted feature maps
that have low energy and will be discarded. All the spatial feature maps (green color) are
concatenated to extract the spectral ones (gray color).
S(xR, xj) = exp(−γ ||xj − xR||²₂), ∀j ∈ Q, (5.5)
5.3.3 NuSegHop
After the local processing module operations, we have obtained an initial segmentation
output using no labels or training data. This output can be used as a pseudolabel to a
classifier. Aiming at obtaining a probability heatmap for nuclei segmentation, we propose
a novel and unsupervised feature extraction method, named NuSegHop, to learn the local
texture of nuclei for pixel-wise classification. Other methods in the past [86] have used
hand-crafted features or pure color-based methods which lack robustness. A data-driven and
multi-scale feature extraction is proposed in this work for pixel-wise nuclei segmentation using the Green Learning paradigm [155]. A key advantage of this method is its low
complexity and small model size, which is essential for fast inference. Also, the GL-based
feature extraction module is linear and hence more transparent and interpretable [159]. For
NuSegHop the H image is converted into the HSI colorspace before feature extraction. A
detailed architecture of NuSegHop is illustrated in Fig. 5.5.
Originally, a window area A of size S(1) × S(1) is considered to characterize the area about a pixel. For this problem, we choose S(1) = 9, since too small windows may not be able to learn the local texture, while larger ones may induce more noise into feature learning. NuSegHop learns the texture within the window area in two scales, S(1) and S(2), to give multi-scale properties in the feature space. The core operation in NuSegHop for texture learning is the Saab transform [155], which is based on the Principal Component Analysis (PCA), applied in two ways: (1) spatial and (2) spectral. The full feature extraction diagram of NuSegHop is shown in Figure 5.5.
5.3.3.1 Feature learning - Spatial Saab
To learn the texture across different local areas within a window, a neighborhood construction with filters of spatial size f(1) × f(1) is applied with stride 1 and equal padding. Since the input image has three color channels, each local neighborhood defines a cuboid C(1) of size K(1) = f(1) × f(1) × 3, which contains the local HSI color information. As a consequence, K(1) is the maximum number of subspaces the Saab transform can extract in layer 1. L(1) = S(1) × S(1) such cuboids can be extracted from a window at layer 1 using padding on A. By sampling across windows centered on the pixels of the original image, one can create a training tensor T(1) of size N × L(1) × K(1), where N is the number of pixels sampled from the input image to train NuSegHop. In turn, T(1) is used to train layer 1 and calculate the subspaces (Eq. 5.6) used for feature extraction. After SVD decomposition, the rows of V are the eigenvectors corresponding to the orthogonal subspaces of the signal (see Eq. 5.7).
T(1) = U · S · V⊺ (5.6)
W(1) = V1:M,1: (5.7)
In training, the Saab transform is applied on C(1) to extract K(1) orthogonal subspaces (i.e., principal components). Moreover, because many principal components may carry no significant energy (as dictated by their corresponding eigenvalues), they can be discarded to remove unnecessary complexity and noise. That is, the first M principal components are retained, with M < K(1).
We define the weight matrix W(1) ∈ R^(M×K(1)), which contains all the information needed to decompose the cuboids into their spectral representations by projecting onto the extracted principal components. After training, W includes the weights for feature extraction. For instance, to perform spatial feature extraction in layer 1, one takes the L(1) cuboids extracted within window A and projects them along the M principal components. To formulate this operation, we construct matrix C of size L(1) × K(1), where its rows contain the cuboids. By multiplying them, we calculate matrix F of size L(1) × M, which includes all the spatial features of layer 1 (see Eq. 5.8). Spectral maps F(1) in layer 1 are obtained from the matrix F by reshaping it back to size S(1) × S(1) × M. We view each principal component as a different spectral local representation of A.
F = C · W⊺ (5.8)
We also concatenate, as an additional feature (one more column in F), the mean of each local cuboid in both layers (Eq. 5.9), since apart from the texture, the local color is also important to differentiate nuclei from background. If we are to draw a parallel with circuit theory, the DC component is the mean color and the AC components are the textures derived from the Saab transform. Thus, F is now of size L(1) × (M + 1).
Table 5.1: Architecture of the proposed NuSegHop.
Layer 1: F resolution (9 × 9) × 3, filter size (3 × 3) × 1, stride (1 × 1) × 1
Max-pool 1: F resolution (5 × 5) × 1, filter size (2 × 2) × 1, stride (2 × 2) × 1
Layer 2: F resolution (5 × 5) × 1, filter size (3 × 3) × 1, stride (1 × 1) × 1
F = FDC ⊕ FAC (5.9)
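A minimal NumPy sketch of the layer-1 spatial Saab step (dense cuboid extraction, PCA-based filter learning with energy truncation, and DC concatenation) is given below; the window and filter sizes follow Table 5.1, while the implementation details are illustrative assumptions rather than the exact NuSegHop code:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def fit_saab_layer1(windows, f=3, energy_thr=1e-3):
    """Learn layer-1 Saab filters W(1) from training windows of shape (N, 9, 9, 3)."""
    # Cuboids of size f x f x 3 sampled densely (stride 1, zero padding).
    padded = np.pad(windows, ((0, 0), (1, 1), (1, 1), (0, 0)))
    cuboids = sliding_window_view(padded, (f, f, 3), axis=(1, 2, 3))
    cuboids = cuboids.reshape(-1, f * f * 3)            # rows of the training tensor T(1)
    dc = cuboids.mean(axis=1, keepdims=True)            # local mean (DC component)
    ac = cuboids - dc
    _, s, vt = np.linalg.svd(ac - ac.mean(axis=0), full_matrices=False)
    energy = (s ** 2) / np.sum(s ** 2)
    m = int(np.sum(energy > energy_thr))                 # keep the top-M subspaces
    return vt[:m]                                        # weight matrix W(1), shape (M, 27)

def saab_features_layer1(window, w1, f=3):
    """Spatial features F = [DC | C W^T] for a single 9 x 9 x 3 window (Eqs. 5.8-5.9)."""
    padded = np.pad(window, ((1, 1), (1, 1), (0, 0)))
    cuboids = sliding_window_view(padded, (f, f, 3)).reshape(-1, f * f * 3)
    dc = cuboids.mean(axis=1, keepdims=True)
    return np.hstack([dc, (cuboids - dc) @ w1.T])        # shape (81, M + 1)
```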
In layer 2, feature map F(1) is fed as input after a max-pooling layer. The spatial feature extraction process in layer 2 is similar to layer 1, with one difference: the Saab transform is applied independently on each of the M + 1 feature maps of layer 1 [160]. Therefore, after neighborhood construction each cuboid C(2) has a shape of K(2) = f(2) × f(2) × 1. Also, the spatial size becomes L(2) = S(2) × S(2) = ⌈S(1)/2⌉ × ⌈S(1)/2⌉ after max-pooling.
M + 1 independent Saab transforms are applied on tensors of size N × L(2) × K(2). After the channel-wise Saab transform in layer 2, each spectral map originally has a size of S(2) × S(2) × (M + 1) × K(2), after concatenating all the feature maps from the channel-wise Saab transforms. Energy-based spectral truncation is also applied in layer 2. Supposing that Q principal components are kept from each channel-wise Saab transform (Q < K(2)), the final layer-2 spectral maps have a shape of S(2) × S(2) × (M + 1) × (Q + 1), adding also the DC channels in the same way as in layer 1. The energy threshold for both layers is set to Te = 1e−03.
5.3.3.2 Feature learning - Spectral Saab
Spatial-wise Saab provides a spectral analysis of A across all its spatial regions at the scales S(1) and S(2). Therefore, each feature has a spatial correspondence. Yet, it is also required to extract features that have a global reference to A. Those features are dissociated from the spatial domain and are meant to capture different patterns within the area A, such as boundary transitions from nuclei to background. To this end, on each spectral map F(1) and F(2) from layers 1 and 2, we apply a PCA using the spectral maps' spatial components as features. By doing so, the transformed signal no longer has a spatial correspondence. This is performed independently for every one of the M + 1 and (M + 1) × (Q + 1) spectral maps of layers 1 and 2, respectively. The spectral features G(l)_s from each layer are simply the union of all PCA-transformed spatial features F(l)_s (see Eqs. 5.10 and 5.11). The same Te is used to filter out the principal components and reduce the dimensionality.
G(1)_s = ∪_{s=1…M+1} {PCA(F(1)_s)_Te} (5.10)
G(2)_s = ∪_{s=1…M+Q+2} {PCA(F(2)_s)_Te} (5.11)
The last step in NuSegHop is to concatenate all the spatial and spectral features from
both layers to form the final feature X that characterizes A. After concatenation the top
100 discriminant features are selected [165]. This provides a rich spatial-spectral feature
representation about the color and texture of the local neighborhood under A. Besides,
in the Saab feature space, the spectral dimensions are uncorrelated because the principal
components are orthogonal by definition.
NuSegHop enables fast pixel-wise predictions, requiring no supervision for its feature extraction part. Having extracted the features of each window, one can train a classifier using the pseudolabels from the local processing module. We train an eXtreme Gradient Boosting (XGB) classifier and use its probability predictions to generate a heatmap Pˆ. Each pixel contains the probability of belonging to the foreground.
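A short sketch of this self-supervised classification step using the xgboost Python package is shown below; the hyper-parameters are illustrative and not the tuned values of the thesis:

```python
from xgboost import XGBClassifier

def predict_heatmap(features, pseudo_labels, shape):
    """Train XGBoost on NuSegHop features with pseudo-labels and return a heatmap.

    features:      (H*W, D) spatial-spectral features, one row per pixel
    pseudo_labels: (H*W,)   binary labels from the local processing module
    shape:         (H, W)   spatial size of the tile
    """
    clf = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
    clf.fit(features, pseudo_labels)
    prob = clf.predict_proba(features)[:, 1]
    return prob.reshape(shape)   # per-pixel probability of foreground (nuclei)
```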
5.3.4 Global Processing
This module aims at integrating the locally made decisions, based on color and texture,
and performing a global post-processing. Most operations from the local processing group are
pixel-wise and carried out in local patches of the original image to reduce variability, whereas
Figure 5.6: An overview of the global processing pipeline. The Laplacian of Gaussian filter
detects local minima from the NuSegHop heatmap to decrease the false negative rate. Watershed and probability thresholding binarize the image and delineate nuclei boundary from
the heatmap. Candidate instances are classified in a self-supervised manner to detect any
false positives.
global processing has the entire information about the image.
As mentioned, locally-based decisions may miss faintly stained nuclei or misclassify background areas as nuclei. The goal of moving from local to global decisions is to decrease the false negative (FN) rate by including more instances as candidates, based on the probability areas from the local decisions. Since it is inevitable that this process gives rise to false positives (FP), self-supervision is also employed at the end of the global processing module to help discern potentially FP instances. A diagram of the global processing pipeline is shown in Fig. 5.6.
5.3.4.1 Heatmap Filtering
From our observations on the obtained probability heatmap P, most of the nuclei are predicted with high confidence by the NuSegHop module. This refers to solidly stained instances that can be easily recognized from their color and texture. As we want to decrease the complexity of the global processing unit, large instances are retained and we consider only the less confident ones for the subsequent module (i.e. local maxima detection). This helps both the complexity and the efficiency. To do so, the heatmap and the predicted mask from NuSegHop are used to calculate the per-instance confidence, by taking the average over all pixels belonging to the instance. Highly confident instances with an average probability above 0.95 are excluded from the local maxima detection submodule. After this submodule we obtain the filtered heatmap P′.
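The instance filtering can be sketched as follows, assuming the heatmap and its binarized mask are available as 2D arrays; the helper name and the use of scikit-image region properties are illustrative choices.

```python
# Illustrative sketch: per-instance confidence from the heatmap, excluding
# highly confident instances (mean probability > 0.95) from maxima detection.
import numpy as np
from skimage.measure import label, regionprops

def filter_confident_instances(heatmap, binary_mask, conf_thresh=0.95):
    labeled = label(binary_mask)
    filtered = heatmap.copy()
    confident = np.zeros(binary_mask.shape, dtype=bool)
    for region in regionprops(labeled, intensity_image=heatmap):
        if region.mean_intensity > conf_thresh:         # average probability of the instance
            confident[labeled == region.label] = True
            filtered[labeled == region.label] = 0.0      # exclude from local maxima detection
    return filtered, confident
```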
5.3.4.2 Local Maxima Detection
This submodule aims mainly at increasing the recall ratio of nuclei on the remaining areas after instance filtering, where the NuSegHop unit is not confident. Texture and color variations or faintly stained nuclei from the local processing module result in scattered high-probability areas that become isolated small instances during binarization. The goal is to detect those areas and create candidate ROIs as foreground. Given the filtered heatmap P′, this task boils down to local maxima detection. We apply a Laplacian of Gaussian (LoG) filter to detect "blob-like" regions which correspond to potential nuclei instances. The Gaussian filter is meant to smooth and unify the pixel-wise heatmap estimation, thus mitigating the color and texture variance.
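A minimal sketch of this detection step is given below, using an off-the-shelf LoG blob detector with the blob threshold of Table 5.2; the sigma range and helper name are assumptions.

```python
# Illustrative sketch: "blob-like" candidate detection on the filtered heatmap
# with a Laplacian of Gaussian filter.
from skimage.feature import blob_log

def detect_candidate_nuclei(filtered_heatmap, blob_thresh=0.05):
    # Each returned row is (row, col, sigma); sigma relates to the blob radius.
    blobs = blob_log(filtered_heatmap, min_sigma=2, max_sigma=10, threshold=blob_thresh)
    return [(int(r), int(c)) for r, c, _ in blobs]       # candidate centroids
```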
5.3.4.3 Watershed Post-processing
For the highly confident instances, their boundary is typically distinct and can be determined from the local processing modules and, in turn, from the heatmap. However, the low-confidence instances are hard to classify accurately based on the heatmap: we can detect the position of the nuclei, but it is hard to estimate their boundaries accurately, since those areas are outliers when training the classifier. Once the rough locations of candidates are obtained from the previous module, we use their centroids to seed the watershed algorithm and find the adjacent nuclei boundary lines. This helps areas where multiple less confident nuclei are located and boundary estimation is more challenging. As a last step, we binarize the filtered heatmap, appending back the confident instances, using a probability threshold to create instances for the subsequent classification and false positive reduction. As it is desired to include many candidates, so as to increase the recall rate, we choose a low probability threshold Tp = 0.35. This typically increases the instance-wise false positive rate, but the subsequent module will remove any instance that is not similar to nuclei in appearance.
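The seeded watershed and low-threshold binarization can be sketched as follows; the helper name and the use of the negative heatmap as the watershed landscape are illustrative choices.

```python
# Illustrative sketch: seeding the watershed with candidate centroids and
# binarizing the heatmap with a low threshold (Tp = 0.35) to favor recall.
import numpy as np
from skimage.segmentation import watershed

def delineate_candidates(heatmap, centroids, prob_thresh=0.35):
    markers = np.zeros(heatmap.shape, dtype=int)
    for idx, (r, c) in enumerate(centroids, start=1):
        markers[r, c] = idx                              # one seed per candidate
    foreground = heatmap > prob_thresh                   # low threshold -> high recall
    labels = watershed(-heatmap, markers=markers, mask=foreground)
    return labels                                        # instance map of candidates
```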
5.3.4.4 Self-Supervised Instance Classification
The final step of the proposed pipeline is a self-supervised instance-based classification, in order to detect instances whose representation falls outside that of normal nuclei and whose appearance is closer to the background.
This ROI instance-based classification is similar to the LAIR module, but with two main differences: (1) it is performed at a global level and (2) there are no size-based criteria to select instances.
The hypothesis here is the following: so long as the majority of instances are correctly classified, the minority of instances that are false positives do not affect the ensemble learning, as they are statistically less significant. Moreover, if their representation is closer to the background rather than the foreground, they are simply classified as false positives and removed from the final segmentation output.
Table 5.2: Summary of the hyperparameter configuration in LG-NuSegHop, finetuned on a small subset of training images from MoNuSeg.
Hyperparameter Value
Local Processing
λ (Adapt. Thresh.) 0.2
S (LAIR) 0.7
γ (LAIR) 0.1
NuSegHop
Energy Te 10e − 4
# Spectral Dimensions 10
XGB – # trees 100
XGB – Tree depth 4
XGB – Learning rate 0.075
Global Processing
Blob Threshold (LoG) 0.05
Tp 0.35
For feature extraction and classification, we use the H image and convert it to the HSI colorspace (as in NuSegHop). We apply feature extraction on each channel separately and concatenate the outputs before the classifier's input. For features, we opt for first-order statistics to learn the color characteristics, as well as the gray-level size zone matrix, which includes several features that capture the rough texture of the nuclei [166]. Here, we choose an SVM classifier with a radial basis function kernel to predict, for each instance, the probability of being a true positive.
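A simplified sketch of this instance classifier is given below; for brevity, a few first-order statistics per HSI channel stand in for the full feature set (first-order statistics plus gray-level size zone features), and all names are illustrative.

```python
# Illustrative sketch: self-supervised instance classification with an RBF-kernel
# SVM; simple first-order statistics stand in for the full feature set.
import numpy as np
from sklearn.svm import SVC

def instance_features(hsi_patch):
    """hsi_patch: (H, W, 3) HSI crop of one candidate instance."""
    feats = []
    for ch in range(3):
        vals = hsi_patch[:, :, ch].ravel()
        feats += [vals.mean(), vals.std(), np.percentile(vals, 10), np.percentile(vals, 90)]
    return np.array(feats)

def classify_instances(train_feats, train_labels, test_feats):
    clf = SVC(kernel="rbf", probability=True)
    clf.fit(train_feats, train_labels)                   # labels come from the majority of instances
    return clf.predict_proba(test_feats)[:, 1]           # P(true positive) per instance
```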
5.4 Experimental Results
This section includes details of the experiments conducted in this work to validate the efficiency of our method, as well as potential areas of improvement. The datasets used for experiments and comparisons are briefly introduced in subsection 5.4.1, while the metrics for the quantitative analysis are described in 5.4.2. Additionally, our method is compared against other state-of-the-art methods, from unsupervised to weakly and fully supervised methods, in 5.4.3. Furthermore, an ablation study is carried out in 5.4.4 to evaluate how different modules affect the performance, accompanied by visualization examples (5.4.4.1). In subsection 5.4.5 we discuss the findings and draw inferences from the comparisons with state-of-the-art works.
5.4.1 Datasets
Three publicly available datasets are chosen to evaluate and compare our proposed methodology.
MoNuSeg [167] The Multi-Organ Nuclei Segmentation (MoNuSeg) dataset includes 30 training image tiles of size 1000 × 1000 at a 40× magnification level, comprising 21,623 nuclei. Also, 14 testing image tiles are available for benchmarking. The extracted tiles come from histological slides of breast, liver, kidney, prostate, bladder, colon, and stomach collected from The Cancer Genome Atlas (TCGA) [168]. Also, the samples come from different hospitals and patients. Therefore, it is a fairly diverse dataset across different aspects that challenges the generalization ability of the segmentation model. The Aperio ImageScope was used for the digitization of the slides.
CryoNuSeg [169] This is the first H&E multi-organ dataset from frozen samples. Slide digitization from frozen samples involves a different process and therefore the nuclei appearance is different. This technique pertains to intra-operative surgical sessions and its major benefit is that it can be performed rapidly. Yet, the requirement for quick slide preparation, staining and digitization comes at the expense of image quality. The dataset provides 30 digitized images of size 512 × 512, acquired at a 40× magnification level. The slides come from 10 different organs (larynx gland, adrenal, lymph nodes, pancreas, skin, pleura, mediastinum, thyroid gland, thymus, testes) and there are 7,596 annotated nuclei.
CoNSeP [170] This dataset includes 41 image tiles from 16 slides of patients with colorectal adenocarcinoma; 27 tiles are used for training and 14 for testing. The extracted tiles are of size 1000 × 1000. The Omnyx VL120 scanner was used for the slide digitization at a 40× magnification level. Overall, 24,319 nuclei are annotated.
5.4.2 Evaluation Metrics
For performance evaluation, we use three different metrics that have been commonly used in the literature. It is worth noting that nuclei segmentation is an instance-level segmentation problem. That is, a nucleus instance needs to be detected and then segmented. The F1 score is the harmonic mean of precision and recall. The F1 score regards nuclei segmentation as an instance detection problem, without taking into account the segmentation aspect. To complete our metrics, we also include the Aggregated Jaccard Index (AJI) [101] and the Panoptic Quality (PQ) [120]. These two metrics are more suitable for instance-level segmentation problems as they take into account both aspects. In particular, PQ combines the detection quality (DQ) with the segmentation quality, measured as a similarity with the ground truth. The Dice similarity coefficient is also included in our comparisons to measure the segmentation performance.
5.4.3 Results & Comparisons
5.4.3.1 Experimental Setup
To have a thorough understanding of the advantages and weaknesses of our work, we compare it against several state-of-the-art works with different levels of supervision. At first, we compare our method with other self-supervised methods on the MoNuSeg dataset, which use no labels from the target datasets. Another category is the weakly supervised methods that either use fewer training samples from the annotation masks or point annotations. Moreover, we include in our analysis a few popular fully supervised methods, so as to provide a thorough comparison of our work.
Before we delve into the comparisons, one aspect we would like to stress is that our method does not use any training data for parameter learning. Yet, since there are several hyperparameters (see Table 5.2) that need adjustment, we use 6 of the 30 training images, picked randomly from the MoNuSeg dataset, to finetune LG-NuSegHop. This can be viewed as the validation set of our experiments.
Table 5.3: Performance benchmarking with self, weakly and fully supervised methods in the
MoNuSeg dataset.
AJI F1 Dice
Self Supervised
DARCNN [85] 0.446 0.5410 -
Self-Attention [87] 0.535 - 0.747
CyC-PDAM [90] 0.561 0.748 -
Nucleus-Aware [171] 0.593 0.759 -
Weakly Supervised
Partial Points [113] 0.543 0.776 0.732
Point Annotations [153] 0.562 0.776 0.744
BoNuS [22] 0.607 0.780 0.767
Cyclic Learning [172] 0.636 0.774 0.774
Fully Supervised
U-Net [18] 0.543 0.779 -
RCSAU-Net [173] 0.619 0.82 -
HoVer-Net [95] 0.618 0.826 -
CDNet [174] 0.633 0.831 -
TopoSeg [175] 0.643 - -
NucleiSegNet [105] 0.688 0.813 -
CIA-Net [21] 0.691 0.901 -
(Ours) 0.651 0.887 0.778
(Finetuned) 0.658 0.892 0.791
After the model is fixed, we test it on the three testing datasets. This aims at testing how well our model generalizes to data with inherent discrepancies. However, we also finetune the LG-NuSegHop hyperparameters individually on each dataset using a subset of its training data, in order to test the performance when LG-NuSegHop is adapted to a certain domain.
5.4.3.2 Performance benchmarking
At first glance, LG-NuSegHop has a competitive standing in comparison with other works, including the fully supervised ones (Tables 5.3, 5.4, 5.5). In the MoNuSeg dataset, it outperforms the self- and weakly supervised works by large margins in terms of all the reported metrics, with an AJI of 0.651.
Table 5.4: Performance benchmarking with weakly and fully supervised methods in the
CryoNuSeg dataset.
AJI Dice PQ
Weakly Supervised
BoNuS [22] 0.431 0.693 0.399
Partial Points [113] 0.410 0.682 0.357
DAWN [23] 0.508 0.804 0.476
Pseudoedgenet [108] 0.321 0.620 0.306
DoNuSeg [176] 0.441 0.672 0.306
Fully Supervised
U-Net [18] 0.469 0.697 0.403
HoVer-Net [95] 0.526 0.804 0.495
Swin-unet [177] 0.524 0.849 0.498
CDNet [174] 0.539 0.776 0.499
(Ours) 0.545 0.703 0.419
(Finetuned) 0.567 0.723 0.479
It also has an impressive performance standing among
the fully supervised works, including sophisticated models for nuclei segmentation, such as
HoVer-Net [95] and CDNet [174]. Other models such as NucleiSegNet [105] and CIA-Net [21] achieve a higher AJI, yet LG-NuSegHop maintains a competitive standing in detecting nuclei based on the F1 score (Table 5.3). Also, when finetuned using all the MoNuSeg training images, the performance increases slightly to 0.658 AJI and 0.892 F1 score.
In CryoNuSeg, our method surpasses all the weakly supervised methods by large margins, achieving an AJI of 0.545 (Table 5.4). Comparing the Dice coefficient and PQ, only the recently proposed DAWN [23] has a better performance. Besides, among the fully supervised category, LG-NuSegHop achieves a higher AJI score than the state-of-the-art. Remarkably, even when it is not finetuned on the CryoNuSeg data, LG-NuSegHop achieves a competitive performance in this dataset, where the acquisition process is considerably different from the standard H&E staining process. When finetuned on the same domain, the AJI performance increases further to 0.567.
Table 5.5: Performance benchmarking with weakly and fully supervised methods in the
CoNSeP dataset.
AJI Dice PQ
Weakly Supervised
Pseudoedgenet [108] 0.221 0.331 0.153
BoNuS [22] 0.354 0.651 0.380
Partial Points [113] 0.366 0.646 0.391
Point Annotations [110] 0.464 0.749 0.398
DAWN [23] 0.509 0.805 0.477
Fully Supervised
U-Net [18] 0.499 0.761 0.434
HoVer-Net [95] 0.513 0.837 0.492
CDNet [174] 0.541 0.835 0.514
Mulvernet [178] 0.515 0.833 0.482
(Ours) 0.422 0.654 0.407
(Ours) Finetuned 0.461 0.691 0.427
In the third dataset under comparison, CoNSeP, all methods achieve a lower performance, since it has a large intra-class variance, where nuclei have quite different textures. Hence, this dataset is challenging for most methods. LG-NuSegHop achieves an AJI of 0.422 without any finetuning (Table 5.5). Compared to the weakly supervised methods, it has a competitive performance across all metrics; only the point annotations [110] and DAWN [23] methods perform better. Compared to the fully supervised works, the performance gap is larger. Yet, if we finetune the hyperparameters on the training images, the AJI score increases significantly to 0.461, surpassing most of the weakly supervised works and narrowing the gap with the fully supervised ones, such as U-Net.
5.4.4 Ablation Study
As LG-NuSegHop is a pipeline consisting of multiple processing steps and modules, we conduct an ablation study on different modules to demonstrate their efficacy and importance within the overall pipeline.
Table 5.6: Ablation study on the MoNuSeg dataset with combinations of preprocessing and
local processing modules.
LAB PQR (To) Adaptive (Tˆ) Morph. Refin. LAIR AJI F1 DICE
✓ ✓ ✓ ✓ 0.605 0.763 0.732
✓ ✓ ✓ ✓ 0.611 0.779 0.738
✓ ✓ 0.583 0.745 0.705
✓ ✓ 0.595 0.749 0.721
✓ ✓ ✓ 0.608 0.774 0.728
Table 5.6 shows a few comparisons between the PQR and LAB color conversions as a pre-processing step, as well as the contribution of the local processing modules. Table 5.7 first compares a handcrafted feature extraction approach against NuSegHop; we then progressively add the global processing modules to test the performance improvement.
First of all, one can observe that our PQR pre-processing conversion helps both the detection and segmentation metrics. Also, the adaptive thresholding improves the AJI significantly over the non-adaptive one. Morphological post-processing is important to remove noisy instances and split nuclei, as reflected in the large improvement of the F1 score (see Table 5.6). The LAIR module also provides a small improvement in F1 score, by filtering some false positives in images where a high false positive rate is more likely.
On the other hand, by using handcrafted feature extraction in lieu of NuSegHop, all metrics drop significantly. Moreover, local maxima detection on NuSegHop's heatmap improves mainly the detection performance by recalling areas that indicate nuclei existence. It is also evident that both the watershed and the self-supervised instance classification help to improve mainly the detection aspect of the task, by reducing the false positive rate and delineating the nuclei more precisely (see Table 5.7).
Table 5.7: Ablation study on the MoNuSeg dataset with different global processing modules.
NuSegHop data-driven features are also compared with a set of hand-crafted features for
nuclei segmentation. All the pre-processing and local processing operations are kept to their
best configuration.
[86] NuSegHop Local Maxima Watershed ROI Self-Classif. AJI F1 DICE
✓ 0.622 0.813 0.740
✓ 0.641 0.836 0.763
✓ ✓ 0.647 0.875 0.769
✓ ✓ ✓ 0.651 0.883 0.772
✓ ✓ ✓ ✓ 0.658 0.892 0.778
5.4.4.1 Qualitative Analysis
In the last part of our analysis, two visualization comparisons are provided. In Fig. 5.7, one can observe how the adaptive thresholding can help in segmenting local patches where the bi-modal assumption is not very pronounced. The binarization performance is compared against a more conservative thresholding that simply chooses the intermediate point. It is demonstrated that adjusting the threshold about the intermediate point between the histogram peaks helps to reduce the false positive areas early in LG-NuSegHop and thereby provides a less noisy pseudolabel to NuSegHop. Moreover, in Fig. 5.8, we illustrate a few examples across the three datasets and compare the operation of the self-supervised instance classification module in removing false positive instances as the last post-processing step of LG-NuSegHop. In MoNuSeg, our method achieves better results, and in turn we can see that the false positive and false negative rates are lower. CryoNuSeg yields a higher rate of false positives, which is mitigated by the global processing module. Finally, the CoNSeP image example gives a higher false negative rate and hence the instance classification module does not have any effect.
5.4.5 Discussion
From the quantitative and qualitative analysis, the importance and role of each individual module is evident. In the LG-NuSegHop pipeline, some modules (e.g. instance classification) are meant to increase the detection accuracy and others the segmentation quality (i.e. NuSegHop).
Figure 5.7: Illustrative examples of the adaptive filtering from the local processing module. To is the intermediate point between the two peaks of the bi-modal distribution and T̂ is the adapted threshold about To. The input patch is shown after staining normalization.
Overall, our proposed methodology achieves a very competitive performance among other self-supervised, weakly supervised and fully supervised methods. It is worth noting that even without any finetuning on the target datasets, LG-NuSegHop outperforms most of the weakly supervised methods and is comparable to the state-of-the-art fully supervised ones. This seamless generalization ability is a major advantage of our method, since there is no need to form a training dataset or have pathologists annotate data. It can be deployed in a plug-and-play manner for nuclei segmentation and achieve a competitive performance on certain datasets (e.g. CryoNuSeg), or provide pseudolabels to another self-supervised model.
As already mentioned, LG-NuSegHop is a pipeline that relies on human prior knowledge in solving the problem and requires certain assumptions to hold. From experimenting with the three diverse datasets, we observe that our method shows a relatively lower performance on the CoNSeP dataset. This can be attributed to the fact that the
local similarity assumption is very weak in this dataset. The nuclei instances differ widely in shape and color, making it challenging for the local processing module to predict a less noisy pseudolabel for NuSegHop. In turn, it is also hard for the self-supervision to improve significantly over the pseudolabel prediction, resulting in a higher false negative rate.
From the experiments, fully supervised methods on the CoNSeP dataset show a larger performance gap over both the weakly supervised methods and LG-NuSegHop. Our method fits better the MoNuSeg and CryoNuSeg assumptions, where the intra-class distance is relatively smaller, and can yield a competitive performance even without any finetuning. One higher-level conclusion that can be drawn from this comparison is that the nuclei segmentation problem on certain histology images can be solved solely by relying on clinical and biological prior knowledge, using little or no supervision (Tables 5.3 and 5.4). Yet, when certain assumptions are not well met, full supervision is needed to learn the nuclei variability and achieve a higher performance (see Table 5.5). Another observation from the results and the comparison across different metrics is that our method is more effective on the detection aspect of the problem than on the segmentation one.
LG-NuSegHop can offer a high nuclei segmentation performance and generalize well with no specific domain adaptation. For example, although our method does not require any training on MoNuSeg, we use 6 images for hyperparameter finetuning, so LG-NuSegHop is inherently adapted to this domain to a certain extent. Two recent works that have carried out domain adaptation from MoNuSeg (train) to CryoNuSeg (test) achieved an AJI of 0.452 [179] and 0.484 [23]. For the same domain shift, LG-NuSegHop achieves an AJI of 0.545. Therefore, one can infer that supervision may not always be in favor of cross-domain generalization. We contend that for this problem, biological priors and human insights play a pivotal role in mitigating the domain adaptation requirement.
From a complexity standpoint, LG-NuSegHop uses simple image processing operations before and after the NuSegHop module, with very low complexity. NuSegHop has a total of 40K parameters for feature extraction. Notably, the local processing pipeline and NuSegHop can be implemented in parallel to achieve a very short inference time. On the other hand, state-of-the-art DL solutions require several millions of parameters and special equipment for model deployment. As a final remark, it is also important to emphasize that within our pipeline every module is intuitive and transparent. We underline this advantage as it is crucial for medical image solutions to be explainable to physicians, so that they can be effectively utilized in real clinical settings.
5.5 Conclusion
This work proposes the LG-NuSegHop pipeline for unsupervised nuclei segmentation from histology images. A novel feature extraction model named NuSegHop is introduced to learn the local texture. Around NuSegHop, several custom-made image processing modules are proposed to preprocess the input image, provide a pseudolabel, and post-process the predicted heatmap to increase the nuclei detection rate. Key advantages of our method are the generalization ability to unseen domains with inherent discrepancies and the small number of parameters. Every proposed module is intuitive and transparent, based on specific biological priors of the problem. In future work, we will investigate ways to focus NuSegHop feature extraction on the nuclei boundaries, aiming to improve its segmentation performance.
Figure 5.8: Visualization examples of the nuclei segmentation performance on three datasets. The performance of the instance ROI classification within the global processing module is also compared. True positive areas are marked in white, false positives in yellow and false negatives in blue. Red boxes highlight areas where the false positive removal is successful.
Chapter 6
Clinically Significant Prostate Cancer
Detection
6.1 Introduction
Prostate Cancer (PCa) is widely known as one of the most frequently diagnosed cancers in men. Reportedly, in 2020 it accounted for more than 1.4 million diagnosed cases (7.3% of new cancer cases) and 375,304 people died from PCa [12]. Among other cancers, it was the fifth leading cause of cancer death in men in the same year. An important fact according to the American Cancer Society [13] is that the five-year survival rate when diagnosed early reaches almost 100%, whereas at the metastasis stage that number plummets to 31%.
As a consequence, early diagnosis of clinically significant prostate cancer (csPCa) plays a vital role in patients' survival odds. Not all PCa cases affect the five-year survival rate: for example, indolent cancer (i.e. Gleason 3+3) [180] is not treated clinically, but rather the patient remains under active surveillance until the cancer aggressiveness grows. Hence, it is also of high importance to discern those cases.
Currently, abnormally elevated serum prostate-specific antigen (PSA) and a positive digital rectal examination (DRE) are two indicative factors for a patient to undergo biopsy for detecting csPCa. Yet, both methods tend toward overdiagnosis, incurring higher health costs and patient discomfort (such as erectile dysfunction and urinary incontinence).
Figure 6.1: A typical pipeline for cancer detection and classification. Zonal segmentation identifies the prostate gland and further divides it into the Peripheral and Transitional Zones. The lesion segmentation algorithm identifies and segments ROIs which harbor PCa with some probability. The lesion classification module operates on a per-ROI level and aims at classifying each ROI with respect to pathology (i.e. clinically significant or Gleason score).
As of 2019, to
increase the sensitivity and specificity of detecting csPCa, the European Association of Urology (EAU) guidelines recommend multiparametric magnetic resonance imaging (mpMRI) as the initial diagnostic test prior to biopsy. In this procedure, a radiologist examines the MRI sequences of patients with elevated PSA. According to specific visual features, each identified lesion is assigned a PIRADS score [181], indicating the probability of being PCa.
Radiological diagnosis is carried out using bi-parametric MRI (bp-MRI) or multi-parametric MRI (mp-MRI), the latter also including the Dynamic Contrast-Enhanced (DCE) sequence. However, reading the MRI and assigning PIRADS scores turns out to be a non-trivial task for radiologists. A high discordance rate has been reported across different studies [15, 182], along with a high false positive rate in detecting csPCa. An explanation for the low agreement rate among radiologists is the high level of expertise required to read mp-MRIs; different years of experience and training can affect the PIRADS reading. Moreover, reading mp-MRIs is a laborious task that entails careful examination of various MRI sequences, slice by slice, manually segmenting any suspicious region of interest (ROI) and finally assigning it a PIRADS score. Accurate segmentation of lesions is also significant, as it affects the accuracy of targeted biopsies, thus reducing the number of specimens sampled from the prostate. Lesion segmentation also touches upon focal therapy [17, 183], where the goal is to increase the therapy's effect on the tumor region while minimizing the radiation dose to noncancerous tissue.
In recent years, since the advent of Deep Neural Networks (DNNs), research in Computer-Aided Diagnosis (CAD) tools has attracted unprecedented interest. DNNs have enabled tasks and applications such as fully automated prostate gland and lesion segmentation, as well as PIRADS classification. In turn, the automation of those tasks gives rise to a fully automated prostate cancer detection pipeline, without any human intervention. Therefore, such a tool can take over most of these tasks from the radiologists' routine (e.g. prostate volume calculation, lesion segmentation, PIRADS reading). Also, such a tool can potentially provide a more objective and robust output for certain cases and overall expedite radiologists' reading through an AI-powered radiology assistant. Most common works for csPCa detection use a certain backbone architecture [184, 185] (e.g. U-Net, V-Net) trained on a number of lesions. Attention mechanisms [29, 32, 186] have also been employed to boost the regional features from the two main zones of the prostate, since they have different visual characteristics according to studies [187]. Some works target suspicious ROI detection (PIRADS-based) [188], while other methods try to detect csPCa [189, 190] or further stratify the associated Gleason score of each lesion [191]. Per-lesion diagnosis and risk-level classification are then combined to deduce the patient's csPCa probability.
Despite their high performance at different computer vision tasks, DNNs have been criticized as a non-explainable method, often termed in the literature a "black box". This is due to the non-linear architecture of those models and the end-to-end optimization process. After training, each neuron represents a learnt feature, looking for particular visual patterns. Yet, those features have no physical meaning to experts and it is hard to explain how they are derived. Future CAD tools should be trustworthy to physicians, so they can be efficiently employed in a real clinical setting. A key factor to that end is feature interpretability.
The Green Learning (GL) paradigm [154, 160] provides an alternative pattern recognition and feature learning framework, offering a mathematically transparent multi-scale feature extraction process. GL employs simple signal processing techniques to create a rich feature space, decomposing the input image into a different spatial-spectral representation [160, 192]. Besides, GL offers a very lightweight approach compared to DNNs, which usually require several millions of parameters to solve the problem, thereby restricting the platforms where CAD tools can be deployed. Also, DL models are notorious for the high carbon footprint they carry in solving tasks, both for training and for inference once deployed. GL provides a more sustainable framework for future implementations, requiring only a small fraction of a DL model's computational resources. Representative applications of SSL-based methods for medical image segmentation and classification can be found in [162, 161].
This work adopts GL for csPCa detection from bp-MRI, aiming to offer a transparent and explainable pipeline with orders of magnitude smaller computational complexity, while maintaining a high detection performance. The overall pipeline consists of two stages: Stage-1 yields an initial heatmap with the probability of each voxel harboring csPCa. Stage-2 grows a larger area about candidate detections based on Stage-1 predictions, since csPCa lesions must be viewed within a larger context according to Yu Xin et al. [193, 194]. The key aim of Stage-2 is to reduce the csPCa probability of false positive (FP) detections and increase that of true positives (TP). For Stage-1, a 24 × 24 block unit is used for feature extraction about a voxel. Then the RadHop unit, based upon the Saab transform [195], extracts global and local features at different spatial-spectral levels. A Discriminant Feature Test (DFT) is used to filter out noisy dimensions and retain the most discriminant subset of features. An Xtreme Gradient Boosting (XGBoost) classifier is chosen to perform the final per-voxel prediction. Finally, Stage-2 builds upon the Stage-1 probability predictions (heatmap) and utilizes a larger area surrounding a candidate lesion ROI. Another classifier is trained on probability-based or radiomics features to predict TPs and FPs.
In this work, we propose a novel prostate cancer detection method with a transparent feature extraction process and a tiny model size compared to other state-of-the-art works. The main contributions can be summarized as follows:
1. To the best of our knowledge, PCa-RadHop is the first work that applies the Green Learning paradigm and SSL methodology to cancer detection.
2. A novel two-stage pipeline is proposed, where each module is fully explainable to physicians.
3. A new unsupervised, data-driven, radiomics-like feature extraction method is proposed, named RadHop.
4. An efficient hard negative sample mining technique is proposed to make the classifier more robust to false positives and mitigate the class imbalance issue.
5. Benchmark experiments and ablation studies are conducted on the currently largest publicly available PI-CAI dataset.
The rest of this chapter is organized as follows. Related work is reviewed in Section 6.2. The proposed method is presented in Section 6.3. The experimental analysis is then discussed in Section 6.5. Conclusions are finally drawn in Section 6.6.
6.2 Related Work
The task of csPCa detection entails a couple of different tasks that need to be performed, combined or independently, within a pipeline. That is, segmentation pipelines try to jointly segment and classify lesion areas, while other works aim at classifying an ROI that has been identified automatically by another module or delineated by radiologists. In this section, a brief literature review is carried out on the various methods that touch upon the main parts of this work.
Figure 6.2: The overall PCa-RadHop pipeline is illustrated. In stage-1, the per-voxel feature extraction and selection processes are independent for each of the three sequences. The selected features are then concatenated before the classifier's input. In doing so, a probability heatmap is obtained and ROIs can further be identified based on their anomaly score. In stage-2, two modes are possible: either extracting anomaly features (i.e. probability based) by expanding the local neighborhood of an ROI, or extracting visually based radiomics features for each ROI and further combining them with features from RadHop. In stage-2 it is possible to reduce the probability of some false positive ROIs.
6.2.1 PCa Detection & Lesion Segmentation
Since lesion detection can also be viewed as a semantic segmentation task, most works use popular segmentation networks to identify and segment suspicious ROIs. U-Net [184, 196] is a popular choice in the literature for various semantic segmentation tasks and thereby can be used for identifying suspicious areas on the prostate gland [197, 28, 198, 199, 200, 201, 202]. Huang et al. [28] employ a U-Net to extract a weight map and propose a novel fusion mechanism for T2-w and ADC sequences based on Gaussian and Laplacian pyramid decomposition. Wong et al. [202] propose an ensemble of several U-Net based models with different architectures. Different loss functions are also tried, showing that ensembling different models and combining different loss functions result in a higher sensitivity and specificity. Another work [197] uses T2-w input only and a radiomics-based supervised U-Net for prostate gland and lesion segmentation, demonstrating that the radiomics-based pipeline improves segmentation accuracy over U-Net. Recently, the nnU-Net model was proposed [198], meant to offer a more generic solution for medical image segmentation tasks. This method is self-configured with respect to preprocessing, network architecture and training parameters. It achieves a high performance in various segmentation tasks, even though it is not specialized to any particular task.
A comparative study for detecting csPCa [203] compares U-Net predictions using bpMRI input with PI-RADS scoring, reaching the conclusion that automated AI-based predictions achieve similar performance to PI-RADS reading by radiologists. Similar to the U-Net architecture, Song et al. [30] recently proposed a 3D V-Net network, employed with a multi-scale attention mechanism to emphasize learning on the lesion ROI while suppressing redundant background areas.
Attention mechanisms are widely adopted in computer vision tasks and have been shown to increase performance. ProstAttention-Net, proposed by Duran et al. [29], utilizes U-Net as the backbone architecture and has two branches, one for gland and the other for lesion segmentation. The latter uses the predicted prostate mask as a prior through an attention mechanism for detecting and further grading lesions. A similar approach is also proposed in [32] for steering the network's attention on the Peripheral Zone of the prostate. On the same track, [193] uses ResNet-50 as the backbone and a feature pyramid network (FPN). The full network consists of two branches, the instance and semantic branches, coupled with an attention mechanism to efficiently combine global and local features.
Apart from U-Net-like architectures, other works [204, 205] use the Mask R-CNN pipeline to carry out joint prostate gland and lesion segmentation. Dai et al. [204] use a non-local Mask R-CNN and a self-training approach to train the model from a limited number of annotated data. Similarly, in [205], a two-stage approach is also used, with the second stage for lesion detection and classification adopting weak supervision. A Mask R-CNN with Squeeze and Excitation (SE) blocks, named SEMRCNN [206], is shown to improve the detection performance over several other backbone architectures, such as Mask R-CNN, U-Net, V-Net or ResNet50. Cao et al. [191] propose FocalNet, which is trained using the focal loss (FL) in order to mitigate the class imbalance between negative and positive areas, and a mutual finding loss (MFL) on T2-w and ADC to extract the most discriminant cross-modality features for lesion segmentation and Gleason classification. Moreover, an ordinal class encoding is adopted, instead of the usual one-hot representation, since Gleason scores (i.e. prediction classes) have a certain order. FL is also adopted in [207] for lesion segmentation, using as post-processing a selective dense conditional random field to refine the initial lesion segmentation.
6.2.2 Lesion Classification
A few other works decouple from jointly segmenting lesions and aim at classifying already identified ROIs in terms of the histopathological report (e.g. Gleason group or binary csPCa). Abraham et al. [189] extract a standard crop around suspicious ROIs, use a sparse autoencoder to learn deep features and train a random forest classifier. Prostate gland crops are used in [208] to extract a cancer response map per slice using a fully convolutional network; global average pooling is then used to find the cancer probability per case. Besides, an inconsistency loss controls the prediction discrepancies between the T2-w and ADC sequences. Another interesting work [194] proposes a multi-scale false positive reduction module. An earlier work [26] splits the input MRIs into large patches combining T2-w, ADC and DWI sequences to train a DCNN model to make predictions for PCa presence.
Radiomics [209, 210] is a popular and robust approach for classifying an ROI in terms of clinical significance, trying to quantify visual features from cancer molecular patterns that reflect on MRI. A number of works [211, 212, 213, 214] have shown that radiomics can help discern csPCa among suspicious-looking ROIs. In most of those works, after defining an ROI, different features are extracted from the radiomics library, such as first-order statistics, shape- and texture-based features, as well as gray-level co-occurrence matrices. Since this is an unsupervised feature extraction, a feature selection method is usually used to retain the most discriminant feature subset for a given dataset, followed by a classifier. An important trait of radiomics features is that they provide a mathematically transparent, hence explainable, feature space to the classifier. Therefore, classifier decisions can be interpreted by physicians.
The proposed work, PCa-RadHop, creates initial per-voxel predictions by operating on local patches for csPCa classification. Also, the second-stage refined predictions in this work can be regarded as an ROI classification after the initial proposals, building either upon first-stage probability predictions or upon radiomics.
6.2.3 Successive subspace learning methodology
As already seen, most approaches to this problem base their pipelines on a DCNN model trained using back-propagation. A new framework, Green Learning (GL), has recently been introduced by Kuo et al., aiming to provide a more transparent feed-forward feature extraction process that carries a small number of parameters and low complexity [154]. In particular, the successive subspace learning (SSL) methodology was proposed in a sequence of papers [215, 155]. Drawing some parallels with deep learning, it creates a multi-scale feature extraction scheme by adding layers interleaved with pooling operations, to trade spatial for spectral features [160]. Instead of convolutional filters trained with backpropagation, principal component analysis (PCA) is used to learn the local subspace across different layers, where each feature also has a larger receptive field at deeper stages. Each layer is called a Hop, in which features are learned in an unsupervised, data-driven way. In practice, the input signal is decomposed via the subspace approximation with adjusted bias (Saab) transform into a rich spatial-spectral representation [155].
Outside the medical imaging field, GL has been applied successfully to point cloud and face gender classification, as well as texture synthesis and image generation [216, 217, 218, 219, 159, 220, 221]. Furthermore, within the medical imaging field, the first two works using GL were proposed by Liu et al. [162, 161] for cardiac segmentation and ALS disease classification.
The proposed pipeline uses the channel-wise Saab transform for voxel-wise feature extraction and classification. To the best of our knowledge, this is the first work that applies the GL paradigm and SSL methodology to a cancer detection task.
Figure 6.3: Unsupervised feature extraction with the RadHop pipeline. A patch centered at a certain location is fed into RadHop. Two concatenated Hop units bring multi-scale properties into the final feature representation. Local features correspond to certain locations, while global ones refer to the overall feature map for each spectral dimension. The output feature is the concatenation of the local and global features of the two layers.
6.3 Methods
The proposed method receives as input a bp-MRI sequence (T2-w, ADC and high b-value DWI) and, as a pre-processing step, all sequences are resampled to the T2-w dimensions. It consists of two stages: (1) In the first stage (Section 6.3.1), 2D voxel-wise predictions are made to deduce the csPCa probability of each voxel. The classification unit operates on a fixed patch size –surrounding a voxel– used for feature extraction following the SSL methodology and for classification. Then, the anomaly score map is derived in Section 6.3.3, which provides an initial indication of suspicious-looking areas with a high probability of harboring csPCa. One question that arises is what patch size is optimal to discern csPCa. Prior research [199, 200] suggests a patch size between 21 and 31, depending on whether it is intended to capture the whole extension of the lesion or parts of it.
Our proposed PCa-RadHop detection framework has been devised to initially classify large parts of the lesion, adopting a patch size of 24 × 24 in stage-1. Then, we locally aggregate the initial predictions to acquire 8 × 8 blocks and construct the anomaly map. In stage-2, operating on 8 × 8 units, we grow a larger area about each candidate region, considering an area of 40 × 40 for classification, since recent research suggests that for false positive reduction a larger area is required about each potential lesion [194]. An overview of the proposed pipeline is illustrated in Figure 6.2.
6.3.1 Anomaly Detections - Stage-1
This stage yields initial predictions per voxel, targeting a patch size relatively smaller than the average lesion area. Certainly, stage-1 can work as a standalone module and directly generate a voxel-wise heatmap of csPCa. Two patch sizes are experimented with in stage-1 (see Section 6.5): (1) 24 × 24 for calculating the anomaly map and feeding the stage-2 input, and (2) 32 × 32 when stage-1 is used as an independent module for predicting csPCa.
6.3.1.1 Feature Extraction
This is the core feature extraction module of our detection framework, based on SSL. Off-the-shelf GL-based choices for feature extraction vary depending on the task. We opt for the E-PixelHop [159] method and tailor it to MRI input specifications, instead of the natural images (CIFAR-10) it was originally designed for. The feature extraction module itself within the PCa-RadHop pipeline is named RadHop, as it extracts radiomics-like features from the MRI input in a data-driven way. It consists of two consecutive steps: 1) neighborhood construction (per slice), and 2) representation learning through the channel-wise Saab transform [160].
There are two kinds of features within RadHop, the spatial and the spectral ones. The former represent certain local parts of the ROI, whereas the spectral ones have a global view of the input feature map of each layer. This spatial-spectral decomposition is motivated by the assumption that csPCa reflects more on certain spectral components and hence, through feature selection, one can extract the more discriminant subspace where csPCa can be classified. Figure 6.3 demonstrates the architecture and components of the proposed RadHop feature extraction module, which has two layers of Hop units. For MRI sequence input, adding more layers does not pay off the additional complexity incurred.
The input tensor in RadHop is of dimension H_i × W_i × C_i × K_i, where H_i, W_i and C_i represent the resolution in the 3D space, and K_i represents the dimension of the feature vector for each voxel extracted from the (i − 1)-th RadHop unit (K_i projected subspaces). Since our csPCa detection pipeline is based on 2D predictions, square patches of size S_i are extracted from single slices, therefore C_i = 1 and H_i = W_i = S_i. Therefore, at the first layer of RadHop (Hop-1), K_0 = 1 and thus the input tensor is of dimension S_1 × S_1 × 1 × 1 for each MRI sequence. We first gather the local neighborhood in the 2D space centered at each voxel on the prostate gland area. The neighborhood size within Hop-i is defined as F_i × F_i × 1 (independent for each of the K_i dimensions) and denotes the local filter size. That is, each voxel in the neighborhood has a corresponding feature vector of dimension K_i, which results in a tensor of size F_i × F_i × K_i at the Hop-i output. The neighborhood tensor is then flattened along the spatial domain. The channel-wise Saab transform is performed on each of the K_i subspaces separately, to learn the spectral kernels through subspace approximation at the current scale.
6.3.2 Subspace Approximation in Green Learning
Within the SSL framework, PCA extracts the orthogonal subspaces of local neighborhoods and, in turn, each subspace image is further decomposed by the next Hop unit. For instance, suppose the input vector (local neighborhood voxels) is x ∈ R^N, where N = S_Hi × S_Wi × S_Ki. Features can be extracted by projecting the input vector onto several anchor vectors (subspaces), which can be expressed as an affine transform:

y_m = a_m^T · x + b_m, m = 0, 1, · · · , M − 1, (6.1)
where a_m is the m-th anchor vector of dimension N, and M is the total number of anchor vectors. Here, the channel-wise Saab transform is a data-driven approach to learn the anchor vectors from all the neighborhood tensors collected from the input data. First, it decomposes the input subspace into the direct sum of two subspaces, i.e. DC and AC, expressed in Eq. 6.2, where the terms are borrowed from the "direct current" and "alternating current" of circuit theory.

S = S_DC ⊕ S_AC. (6.2)

S_DC and S_AC are spanned by the DC and AC anchor vectors, defined as:
• DC anchor vector a_0 = (1/√N) (1, 1, · · · , 1)^T
• AC anchor vectors a_m, m = 1, · · · , M − 1.
The two subspaces are orthogonal to each other. The input signal x is projected onto a_0 to get the DC component x_DC. Then the AC component is extracted by subtracting the DC component from the input signal, i.e. x_AC = x − x_DC.
After that, the AC anchor vectors are learnt by conducting principal component analysis (PCA) on the AC component. The first K principal components with sufficient energy are kept as the AC anchor vectors. Thus, one can extract features by projecting x onto the above learnt anchor vectors (i.e. the eigenspace) based on Eq. 6.1. The bias term is selected to ensure all features are positive, following [155]. The subspace decomposition out of local filters from the training data is illustrated in Fig. 6.4.
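A minimal sketch of this anchor-vector learning is given below, assuming flattened neighborhood vectors as input; the simple bias rule used here is a simplification of the selection in [155], and all names are illustrative.

```python
# Illustrative sketch: learning Saab-like anchor vectors (DC + AC) from
# flattened neighborhood tensors, following the decomposition of Eq. 6.1 and 6.2.
import numpy as np
from sklearn.decomposition import PCA

def learn_saab_anchors(x, energy_thresh=1e-3):
    """x: (num_samples, N) flattened neighborhood vectors."""
    n_dim = x.shape[1]
    a0 = np.ones(n_dim) / np.sqrt(n_dim)                 # DC anchor vector
    x_dc = (x @ a0)[:, None] * a0[None, :]               # projection onto the DC subspace
    x_ac = x - x_dc                                       # AC residual
    pca = PCA().fit(x_ac)                                 # AC anchors = principal components
    keep = pca.explained_variance_ratio_ > energy_thresh
    anchors = np.vstack([a0, pca.components_[keep]])      # rows are anchor vectors a_m
    bias = np.abs(x @ anchors.T).max()                    # simplified bias keeping responses positive
    return anchors, bias

def saab_features(x, anchors, bias):
    return x @ anchors.T + bias                           # y_m = a_m^T x + b_m (Eq. 6.1)
```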
In RadHop we use two Hop units (Fig. 6.3). Both Hop-1 and Hop-2 use a filter size of F_i = 3 to learn the PCA kernels. Without using padding on the input tensors in each layer, and given the 3 × 3 filters, the Hop-1 output feature map is (S_1 − 2) × (S_1 − 2) × K_1 (e.g. 22 × 22 × K_1 for a 24 × 24 input patch; see Table 6.1). Prior to Hop-2, max pooling by 2 is applied to reduce the spatial feature map dimension, and the Hop-2 spatial output is reduced accordingly by the filter support (e.g. 9 × 9 × K_2). K_i corresponds to the number of spectral components (projected eigenspaces) and equals the input dimension of the PCA. That is, K_1 = F_1 × F_1 × C_0 × K_0 = 9.
Figure 6.4: Illustration of the feature decomposition into spectral components using the Saab transform in SSL. PCA is used to identify the orthogonal subspaces. Different subspaces carry different energy (shown with different colors and sizes). This is the core module for feature extraction in the Hop units employed by RadHop. Malignant areas may reflect differently on certain subspaces and hence the RadHop feature representation can help the classifier to detect them.
In Hop-2, according to the channel-wise Saab transform [160, 155], each feature map is further projected onto several anchor spaces independently of the rest of the feature maps (i.e. spectral dimensions). K_2 equals the aggregated number of all subspaces in Hop-2, which equals F_2 × F_2 × K_1. Therefore, the maximum number of feature maps in Hop-1 and Hop-2 is 9 and 81, respectively (given F_1 = F_2 = 3). All feature maps are orthogonal to each other because of the PCA projections, hence RadHop extracts uncorrelated features.
Certainly, not all spectral components are equally informative for the input signal distribution. Those with very little energy, as determined from the eigenvalue (always positive) magnitude, are discarded. Especially for signals like MRI input, this can lead to a much smaller model size without much loss of information.
The channel-wise Saab transform in each Hop within RadHop decomposes the input signal into a multi-scale representation where each feature in Hop-i, f^(i)_{x,y}, has dimension 1 × 1 × K_i and corresponds to a specific spatial position. The aggregation of all features, that is f, gives us the local feature map.
Yet, desiring to also extract features that have a global view of the ROI patch, we further apply PCA on each of the K_i feature maps of both Hop units, based on the S_i × S_i × 1 spatial responses. In doing so, for each feature map f^(i) of size S_i × S_i × K_i, we extract one global feature using a PCA of dimension S_i × S_i for each of the K_i maps independently. After PCA, the feature map is transformed from the spatial to the spectral domain and the features are decoupled from the spatial domain. So, the global features can be defined as:

G^(i)_s = PCA(f^(i)_s) ∀s, ∀i. (6.3)
The final feature output from RadHop results from the concatenation of the flattened feature maps from both Hop units (local features), as well as their corresponding global features. In RadHop, the DC component is also included along with the AC filters, as it is also important for csPCa classification.
6.3.2.1 Feature Selection
RadHop provides a rich spatial-spectral representation of the input patch. However, even after filtering out feature maps with very low energy, the feature dimension is still very high for training a classifier. Furthermore, some dimensions may carry very little discriminant power for classifying csPCa. RadHop operates in a completely unsupervised way: the feature space is very rich and has a physical meaning –by encoding different texture patterns of the MRI signal– but a large portion of the feature space is not very useful for classification. The goal of the feature selection module is to extract the most discriminant feature subspace from RadHop, thus providing a less noisy input to the classifier.
Within the GL framework, a novel Discriminant Feature Test (DFT) has been proposed by Yang et al. [165] to quantify the discriminant power of features. A brief high-level description is given below. For a given feature a under test, the minimum and maximum of the projected values f(a) = a^T x are calculated, denoted by f_min and f_max, respectively. The interval [f_min, f_max] is split into B bins in a uniform way. For measuring the discriminant power and class separation power of each feature, the bin boundaries are used as candidate thresholds. Each threshold t_b, b = 1, ..., B − 1, partitions the interval [f_min, f_max] into two sub-intervals. Using all the bin boundaries as candidate thresholds, the splitting quality can be evaluated using the weighted entropy:

L_{a,t_b} = (N_+ / (N_+ + N_−)) · L_{a,t_b,+} + (N_− / (N_+ + N_−)) · L_{a,t_b,−} (6.4)

where N_+ = |F_{a,t_b,+}| and N_− = |F_{a,t_b,−}| are the numbers of samples in the left and right sub-intervals, respectively. As the cost function, the entropy value is chosen to measure the purity of the partition:

L = − Σ_{c=1}^{C} p_c log(p_c) (6.5)
Across all L_{a,t_b} values, the minimum cost is chosen to represent the discriminant power of the given feature. Running the DFT across all features from RadHop, we then sort them in ascending order according to their minimum entropy values. A natural number of features to keep, with the lowest entropy values, is at the elbow point that usually occurs in the distribution of RadHop features. That is, from each MRI sequence 1,000 features are kept. The three feature sets from the overall MRI input are concatenated to finalize the fused feature F input to the classifier (see Fig. 6.2).
F = [F_T2, F_ADC, F_DWI]. (6.6)
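A minimal sketch of the DFT scoring described above is given below; the number of bins and the helper name dft_score are illustrative assumptions. Features are then sorted in ascending order of this score and the lowest-entropy ones are kept.

```python
# Illustrative sketch: Discriminant Feature Test (DFT) scoring of one feature
# by the minimum weighted entropy over uniform bin-boundary thresholds.
import numpy as np

def dft_score(values, labels, num_bins=16):
    """values: (N,) projections of one feature; labels: (N,) class ids. Lower is better."""
    def entropy(y):
        if y.size == 0:
            return 0.0
        probs = np.bincount(y) / y.size
        probs = probs[probs > 0]
        return float(-(probs * np.log(probs)).sum())

    thresholds = np.linspace(values.min(), values.max(), num_bins + 1)[1:-1]
    best = np.inf
    for t in thresholds:                                  # candidate bin boundaries
        left, right = labels[values <= t], labels[values > t]
        w_left = left.size / labels.size
        cost = w_left * entropy(left) + (1 - w_left) * entropy(right)
        best = min(best, cost)
    return best
```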
Figure 6.5: Distribution of the dense stage-1 predictions on the prostate gland after one training round of XGB. It is highly skewed towards easy negative samples with "soft" labels close to zero.
6.3.2.2 Feature Classification
The last part of stage-1 is to train a classifier to perform voxel-wise predictions, given the feature extracted by RadHop for each voxel. We opt for the Xtreme Gradient Boosting (XGB) classifier, because it offers an ensemble of decision trees, thus giving a transparent decision-making process. Moreover, XGBoost is more powerful than a Random Forest classifier, because the decision trees are not independent; each new tree is built upon the prior trees' errors through the loss function. Also, XGB offers a small model size that does not inhibit our objective of an energy-efficient pipeline for csPCa.
6.3.2.3 Hard Negative Mining
A commonly known issue in PCa localization and detection, in general, is the class imbalance problem, since there are numerous negative areas on the prostate gland, but just a handful of ROIs with csPCa. In most studies, the patient cohort is also imbalanced, as most patients have no malignant findings, or if any, they are indolent (non-csPCa); one can thus realize the extent of the class imbalance problem. Another issue is that most negative samples are very easy to classify, because most of the prostate gland areas are not suspicious at all. Therefore, randomly sampling patches for training gives the impression to the classifier that negative samples are far away from the positive ones in feature space. Nevertheless, there are still many prostate regions closer to the positive class that can cause false positives, thereby affecting the detector's specificity.
To mitigate this issue, we perform a two-step training of the classifier. In the first step, having no other choice, we train the XGB using random patch sampling. We densely sample patches within positive ROIs with a 25% overlap from each annotated slice (especially when the patch is of size 24 × 24, where the ground truth lesion is partially captured). Negative regions are randomly picked across the prostate and their number is selected according to the desired class ratio. The optimal ratio of negative to positive patches for training the XGB was empirically found to be roughly 4:1.
After step-1 training, the XGB is applied almost densely (with a small stride) all over the prostate glands in the training cohort and a "soft" label is obtained, which is the classifier's out-of-bag probability for a given sample. The "soft" label distribution is shown in Fig. 6.5. One can realize the issue with randomly sampling patches: most of the negative samples will be picked from the easiest bin, close to zero, and that would make the classifier prone to false positives. In a hard negative mining manner, we train the XGB again in a second step, now having a prior for each patch about where it stands within the feature space ("easy" or "hard", based on the soft label). To balance out the training in step-2, we split the soft label distribution into 10 bins. The number of training samples is determined from the overall number of positive areas. Given that it is desired to maintain a certain class ratio, the negative sample population can be decided. Finally, we apply an exponential rule, shown in Eq. 6.7, to decide how many samples N_i will be sampled out of each "hardness" bin.
N_i = N e^{-λ(i-1)}, i = 0, ..., 9 (6.7)
where N is the total number of negative samples and i is the bin index. Following this exponential rule, more of the easier samples are included, and as the difficulty in terms of soft label increases, the number of samples decays exponentially. The hyper-parameter λ was set to 0.4. The XGB after the second training round is used by the detector to provide per-voxel
Table 6.1: Architecture of the Proposed RADHop Unit
Input Resolution Filter Size Stride
Hop 1 (24 × 24) × 1 (3 × 3) × 1 (1 × 1) × 1
Max-pooling 1 (22 × 22) × 1 (2 × 2) × 1 (2 × 2) × 1
Hop 2 (11 × 11) × 1 (3 × 3) × 1 (1 × 1) × 1
Table 6.2: Radiomics features categories
Feature Type Num. of Features
First Order Statistics 19
Shape-based (3D) 16
Shape-based (2D) 10
Gray Level Co-occurrence Matrix 24
Gray Level Run Length Matrix 16
Gray Level Size Zone Matrix 16
Neighbouring Gray Tone Difference Matrix 5
Gray Level Dependence Matrix 14
predictions and calculate the csPCa heatmap.
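A minimal sketch of the bin-wise sampling rule is given below; the 10 hardness bins are indexed from 0 (easiest), the exponential decay mirrors the rule of Eq. 6.7, and soft_labels and N are illustrative inputs rather than the actual implementation.

import numpy as np

def sample_hard_negatives(soft_labels, N, lam=0.4, n_bins=10, seed=0):
    # soft_labels: stage-1 probabilities of the negative patches, in [0, 1].
    # Returns indices of the negative patches selected for step-2 training.
    rng = np.random.default_rng(seed)
    bin_ids = np.minimum((soft_labels * n_bins).astype(int), n_bins - 1)
    selected = []
    for i in range(n_bins):
        members = np.where(bin_ids == i)[0]
        n_i = int(round(N * np.exp(-lam * i)))   # exponentially fewer samples per harder bin
        n_i = min(n_i, len(members))              # cannot draw more than available
        selected.extend(rng.choice(members, size=n_i, replace=False))
    return np.array(selected)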
6.3.3 Anomaly Map Calculation
Having obtained initial predictions for each voxel in stage-1, we aim to grow a larger area around suspicious-looking regions. Voxel-level probability predictions are sensitive to errors, so it is desirable to form more robust units by aggregating neighboring predictions. To this end, average pooling over a 3 × 3 neighborhood is applied around each voxel. For a patch size of 24 × 24, averaging the 9 surrounding patches with a stride of 8 gives the center 8 × 8 × 1 voxel area a receptive field of 40 × 40 × 1 (see Fig. 6.6). The center 8 × 8 unit is the overlap area of all 9 neighboring patches. Hence, moving into stage-2, the processing units are no longer voxels but 8 × 8 blocks, each conveying an anomaly score obtained by averaging neighboring voxel-level predictions; their span in voxel space is large enough to help distinguish false positives from true positives.
Figure 6.6: Anomaly map (Λ) calculation from stage-1 predictions. Nine neighboring voxel
predictions are averaged using 24 × 24 patches with stride 8 on stage-1 output. Then, in
stage-2 the anomaly map is used to find statistical features from the neighboring anomaly
score distribution.
We denote the array of anomaly scores by Λ; it is a subsampled mapping of the original input resolution.
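A minimal sketch of this aggregation is given below, assuming patch_probs is the 2-D grid of stage-1 patch probabilities sampled with stride 8; each output entry averages the 3 × 3 neighborhood of patch predictions, i.e. the 9 overlapping patches whose common overlap is the central 8 × 8 block. The variable names are illustrative.

import numpy as np

def anomaly_map(patch_probs):
    # Average each patch prediction with its 8 neighbors (edge-padded at the border).
    padded = np.pad(patch_probs, 1, mode="edge")
    out = np.zeros_like(patch_probs, dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + patch_probs.shape[0], dx:dx + patch_probs.shape[1]]
    return out / 9.0   # Λ: one anomaly score per 8 x 8 block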
6.3.4 Anomaly Candidates Classification - Stage-2
As mentioned before, stage-1 can predict a heatmap on the prostate gland with probabilities for areas harboring csPCa. Hence, one can use it as a standalone module for cancer detection and lesion segmentation by choosing the processing patch size for feature extraction appropriately. In the experimental section we show results for a larger and a smaller patch size. For stage-2 inputs, we opt for a patch size of 24 × 24 in order to capture smaller parts of lesion areas and re-classify them in stage-2.
The main objective of stage-2 is to grow a larger area around high anomaly scores from the previously calculated anomaly map. A larger patch size may include more of the lesion, but given the lesion's irregular shape it can also include more noise. Using smaller patches, we can predict on parts of the lesion and then, in stage-2, combine predictions from the neighborhood to distinguish true positive areas from false positive ones with high anomaly scores.
6.3.4.1 Candidate Areas For Stage-2
Having calculated the anomaly map Λ, where each anomaly score corresponds to an 8 × 8 area, one needs to decide which candidate regions are sent to stage-2. The only information available for this decision is the anomaly scores themselves. It is also important to ensure that few true positive (TP) areas are missed by stage-2 processing, while quickly discarding anomaly scores that are very unlikely to correspond to true csPCa lesions.
To this end, we binarize Λ at the anomaly threshold T where the true positive rate (TPR) is very high (i.e. close to 1) and the FPR is minimized. Having binarized Λ, one can use simple rules for deciding which candidate areas continue to stage-2. A single anomaly score, although more robust than a single voxel, can still correspond to a small false positive region, which is always likely to exist somewhere. Therefore, instead of a single anomaly score, after binarization we use a 2 × 2 element to detect candidates on the anomaly map, corresponding to a 16 × 16 area in the original MRI space. In this way, many small regions unlikely to be TPs are filtered out, thus reducing noisy information in stage-2. The optimal anomaly threshold was found to be 0.38 when minimizing the FPR, while constraining the TPR to stay above a certain level (set to r = 0.95) (see Eq. 6.8).
T_opt = argmin_T FPR(T)  s.t.  TPR(T) ≥ r (6.8)
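The threshold search of Eq. 6.8 can be sketched as follows, assuming y_true holds binary labels for the anomaly candidates and scores their anomaly values; scikit-learn's roc_curve is used here only for illustration.

import numpy as np
from sklearn.metrics import roc_curve

def optimal_threshold(y_true, scores, r=0.95):
    fpr, tpr, thresholds = roc_curve(y_true, scores)
    feasible = tpr >= r                    # thresholds satisfying the TPR constraint
    idx = np.argmin(fpr[feasible])         # among those, pick the one minimizing the FPR
    return thresholds[feasible][idx]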
6.3.4.2 Feature Extraction - Anomaly-based
The last step, and the main objective of stage-2, is to classify candidate regions whose anomaly scores exceed the binarization threshold and that retain a 2 × 2 structure. Usually the number of false positive (FP) candidates is larger than that of the TPs. The hard sample mining technique aims at reducing the FP predictions and is important for providing a cleaner input to stage-2.
One way to build features for training another classifier in stage-2, and to grow a larger area surrounding the candidate anomaly units (which is the main motivation for adding a second stage), is to place the 9 neighboring (non-overlapping) anomaly units into a feature vector, together with their mean and standard deviation. Given that each anomaly unit is of size 8 × 8, the whole area considered spans 56 × 56 in the original MRI input (see Fig. 6.6). Finally, a classifier can be trained on TPs and FPs as a second classification step.
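A minimal sketch of this 11-dimensional anomaly-based feature vector is given below; lam stands for the anomaly map Λ and (i, j) indexes the candidate anomaly unit, with names chosen for illustration only.

import numpy as np

def anomaly_features(lam, i, j):
    neigh = lam[i - 1:i + 2, j - 1:j + 2].ravel()                # the 9 neighboring anomaly units
    return np.concatenate([neigh, [neigh.mean(), neigh.std()]])  # plus their mean and std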
6.3.4.3 Feature Extraction using Radiomics
An issue with the anomaly-based feature extraction in stage-2 is that the vector's dimension is very small. Also, for some large FPs, the feature vectors may still be distributed close to those of TPs, making them hard for the classifier to tell apart. To this end, we add a second mode in stage-2 that employs radiomics features.
Having identified the suspicious candidates from the anomaly map, we can go back to the original input MRI sequences and extract additional visual features. The radiomics library [210] provides a good solution for quantifying the MRI characteristics of an ROI. Radiomics features are usually employed after an ROI has been determined and are used to classify it with respect to the biopsy report (see Section 6.2.2). As such, they have not been used for detection tasks, where no ROI is pre-determined, since these features are sensitive to noise if surrounding areas are included.
Once stage-1 has provided the initial predictions, ROIs can be drawn either directly from the generated stage-1 heatmap or by following the anomaly map candidates of Section 6.3.4.1. For classification using radiomics, we consider features from all categories of the radiomics library. The DFT module is also employed here to select the most discriminant set of radiomic features, in case some of the extracted features are noisy. Table 6.2 shows the different categories of visual features we consider for classification in stage-2. A separate classifier, based on radiomics, is used in stage-2. To further increase the feature dimension, we can combine the radiomics with RadHop features from stage-1, extracted from a 24 × 24 patch centered at the ROI center of mass. The DFT can then select the most discriminant features from both radiomics and RadHop, and feed their combination to the XGB to further boost the discriminant power of the final feature.
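A hedged sketch of the radiomics extraction for one candidate ROI is shown below, using the pyradiomics package [210]; the file paths and ROI mask are illustrative placeholders, and by default the extractor enables the feature classes listed in Table 6.2.

import SimpleITK as sitk
from radiomics import featureextractor

extractor = featureextractor.RadiomicsFeatureExtractor()   # default settings, all feature classes

image = sitk.ReadImage("t2w.nii.gz")                        # one bpMRI sequence (placeholder path)
roi_mask = sitk.ReadImage("roi_mask.nii.gz")                # binary mask of a stage-1 candidate ROI

features = extractor.execute(image, roi_mask)
radiomics_vector = [v for k, v in features.items() if not k.startswith("diagnostics")]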
6.3.5 Interface Application
Our csPCa CAD tool is fully automated and can be applied to clinical bpMRI data. To offer the CAD tool's functionalities to radiologists in an easy way, we have developed a simple interface application that facilitates their workflow. The tool lets the user load a bpMRI input and predict a csPCa probability heatmap as output. In the background, our proposed methodology has been implemented as a service in the backend of the application. Moreover, within the interface tool the mouse scroll wheel can be used to easily navigate the bpMRI input side-by-side with the predicted heatmap for each slice of the sequence. Finally, a 3D view of the segmented clinically significant lesion is also available through a 3D reconstruction functionality. A snapshot of the developed interface is illustrated in Fig. 6.7.
6.4 Experimental Setup
6.4.1 Database and pre-processing
To demonstrate the effectiveness of the proposed csPCa method based on Green Learning, we conduct experiments on the currently largest publicly available dataset, from the PI-CAI challenge [222], which consists of bpMRI data from a cohort of 1,500 patients. A subset of 328 cases is common with the previously largest dataset, provided by the ProstateX challenge [armato2018prostatex]. In addition, an online hidden validation set of 100 patients is offered. For the testing phase, all methods were evaluated once on a cohort of 1,000 patients.
Training data comprise MRI scans from multiple clinics, such as the Radboud University
Figure 6.7: Demonstration of the interface tool for visualizing the csPCa predictions for a
given bpMRI input. The user can scroll through the slices using the scroll bar on the right
hand side of the window.
Medical Center (RUMC), University Medical Center Groningen (UMCG) and Ziekenhuis Groep Twente (ZGT). Scans from these medical centers are included in both training and testing. MRI scans from the Norwegian University of Science and Technology (NTNU) are also included in the testing cohort, as data from an institution unseen during training. The MRI scanner vendors are Siemens Healthineers and Philips Medical Systems. The training data are divided into 1,075 cases of benign or indolent PCa and 425 clinically significant cancer cases. Prior to RadHop, all MRI sequences are resampled to the T2-w dimensions.
6.4.2 Evaluation Metrics
To measure the performance of the proposed pipeline and benchmark it against existing works, the same metrics as in the PI-CAI challenge are adopted [222]. That is, two metrics are used to evaluate a method's performance in detecting csPCa. For lesion-level detection, the average precision (AP) is used, where each segmented lesion carries a floating point probability of being clinically significant. For patient-level performance, the Area Under the Receiver Operating Characteristic curve (AUROC) is used, predicting for each patient a probability of having csPCa. In the PI-CAI challenge, the average of the two metrics was used to rank the different methods. For per-lesion performance, a prediction is counted as a hit on a TP lesion if their Intersection over Union (IoU) is at least 0.1.
P = T_p / (T_p + F_p) (6.9)
R = T_p / (T_p + F_n) (6.10)
AP = Σ_n (R_n − R_{n−1}) P_n (6.11)
Score = (AP + AUROC) / 2 . (6.12)
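For illustration, the patient-level AUROC, a generic AP, and the final score can be computed with scikit-learn as sketched below; the official PI-CAI evaluation first matches predicted lesions to ground truth using the IoU ≥ 0.1 rule, which is assumed to have been done already, and all arrays are placeholders.

import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

y_lesion = np.array([1, 0, 1, 0, 1])                 # matched lesion-level labels (placeholder)
p_lesion = np.array([0.9, 0.4, 0.7, 0.2, 0.6])       # lesion csPCa probabilities (placeholder)
y_case = np.array([1, 0, 1, 0])                      # patient-level labels (placeholder)
p_case = np.array([0.8, 0.3, 0.6, 0.1])              # patient-level csPCa probabilities

ap = average_precision_score(y_lesion, p_lesion)     # Eq. 6.11
auroc = roc_auc_score(y_case, p_case)                # patient-level AUROC
score = (ap + auroc) / 2                             # Eq. 6.12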
6.5 Experimental Results
6.5.1 Results on PI-CAI Dataset
For benchmarking against other methods, we use the results from the PI-CAI challenge on the testing cohort of patients. For the ablation study and intermediate results between stages 1 and 2, a local validation cohort of 300 patients is randomly chosen. For a fair comparison, we report results for the PI-CAI challenge baseline methods trained solely on the publicly available PI-CAI cohort, without using external private datasets. Our experimental procedure complies with the latest guidelines for clinical studies using machine learning [collins2024tripod+]. The performance in both metrics and the benchmarking against other models are reported in Table 6.4.
Table 6.3: Summary of hyperparameters configuration in PCa-RadHop
Hyperparameter Value
RadHop – Energy T1 10e-4
RadHop – Spectral PCA Dimensions 10
Features Selected (Per Sequence) 1000
XGB – Max. num of trees 1000
XGB – Max. tree depth 4
XGB – Learning rate 0.075
XGB – Loss Function Binary Logistic
As can be seen, RadHop with only stage-1 predictions and a patch size of 32 × 32 (i.e. no refinement by stage-2) has a competitive standing among methods using DCNN architectures, such as U-Net or the Swin-Transformer. Evaluation on such a large patient cohort minimizes bias and better approximates the actual performance, should our algorithm be used in a real clinical setting. Both the patient-level prediction metric (AUROC) and the lesion-level precision (AP) surpass the performance of the nnDetection and Swin-Transformer models. At the patient level, RadHop performs better than every other method under comparison, except U-Net, which has a slightly better performance. At the lesion level, RadHop achieves a higher precision than the non-U-Net methods, but there is still a gap with the U-Net based ones. This observation, after error analysis, motivated us to introduce a stage-2 step to refine the heatmap predictions from stage-1. In Fig. 6.8, we provide heatmap visualizations of the stage-2 effectiveness in reducing the FP probability from the stage-1 predictions. By reducing the FP probabilities, wherever possible, using the classifier in stage-2, both the case-level (AUROC) and lesion-level (AP) scores increase. It turns out that adding more contextual probability information through the anomaly map and combining it with radiomics features (both data-driven and hand-crafted) helps to decrease the probability of certain FPs (improving specificity) without compromising the TP rate (sensitivity). Figure 6.9 also shows the classifier's robustness to FPs when trained with the hard sample mining technique rather than random sampling.
Table 6.4: Performance benchmarking with selected DL-based models based on 1,000 testing patients from the PI-CAI challenge.
Method Score AUROC AP
SwinTransformer 0.561 0.729 0.393
nnDetection 0.586 0.785 0.386
nnU-Net [198] 0.626 0.803 0.450
U-Net [184] 0.635 0.814 0.456
PCa-RadHop (Stage-1 only) 0.607 0.807 0.407
6.5.2 Benchmarking & Ablation Study
Given the limited access to the hidden testing cohort of the PI-CAI challenge, we use the publicly available data for a performance comparison, in order to demonstrate the effectiveness of stage-2 and to compare stage-1 with two different patch sizes. Also, our data-driven RadHop is compared against two early pipelines that use hand-crafted radiomics features, developed prior to the advent of DL models. In the experiments using the publicly available cohort of 1,500 patients, a 5-fold cross validation was used, after reserving 300 patients for testing. First, Table 6.5 shows the effectiveness of using a smaller patch size, which captures parts of the lesion more efficiently, as opposed to trying to identify the lesion as a whole. The latter requires a larger patch size, which induces a noisier RadHop feature representation (more noise enters the PCA) and thereby affects precision, especially for smaller lesions. The stage-2 effectiveness is also evident in Table 6.5. In particular, the AP (lesion-level metric) is improved from 0.356 to 0.374, while the AUROC (patient-level metric) increases from 0.801 to 0.822 (see Figure 6.10). This is because stage-2 can reduce the probability of some false positives by re-classifying the ROIs detected in stage-1, using the anomaly features as well as the hand-crafted and data-driven (RadHop) radiomics features. Furthermore, PCa-RadHop showed a statistically significant performance improvement (p < 0.001) by large margins compared to the two traditional pipelines in both metrics (see Table 6.5). [223] achieves a better performance than [224], since it includes additional positional features from the prostate gland that help the
Figure 6.8: Visualizations of csPCa-RadHop intermediate predictions in stage-1 and the false positive reduction in stage-2. In the first three rows (a)-(c), three negative cases are shown with some false positive areas, whose probability is decreased after stage-2. In the last row (d), a positive case is displayed. The true positive ROI (green) is retained after stage-1, while the probability of the other false positive ROIs is reduced.
classifier. The main reason that RadHop features turn out to be very effective in this task is that they provide a richer and complementary feature space compared to the hand-crafted radiomics. This spatial-spectral representation gives the classifier more degrees of freedom to discover powerful feature combinations that help discern true positives from false positives. Hand-crafted radiomics are limited in their performance, as they can only capture aspects of the signal dictated by human design.
6.5.3 Model Size and Complexity Comparison
An important aspect of model comparison, besides performance, is the model size and the number of Floating Point Operations (FLOPS). These two factors are consequential when algorithms are deployed on certain platforms, especially those with limited computational resources. Table 6.6 shows the comparison with the nnU-Net and U-Net architectures (popular choices among several methods in the literature for PCa detection). One can appreciate the tiny model size of
Figure 6.9: Demonstration of the effectiveness of the hard sample mining technique on four negative examples from different patients with areas prone to false positives. The predicted heatmaps from two differently trained classifiers are shown for each case. "Without hard sample mining" means that the classifier is trained on randomly chosen patches across patients. The green arrows point to specific FP regions where hard sample mining increases the robustness of the classifier and reduces the probability of certain FPs.
RadHop when compared to the DL-based architectures (×1445 and ×1681 more parameters for U-Net and nnU-Net, respectively). In addition, the computational complexity of RadHop is fairly low. Specifically, nnU-Net requires ×128 the operations of RadHop for inference, while U-Net requires ×56 the FLOPS. These findings highlight the substantial deployment advantages of our method compared with DL models. They also underline one of the key benefits of the Green Learning framework, which is to offer lightweight and environmentally friendly solutions. Additionally, RadHop has a transparent feed-forward pipeline, where the functionality of each module can be explained to physicians.
Figure 6.10: ROC curves and AUC comparison of three different pipelines, showing the
effectiveness of the hard sample mining technique and stage-2 in reducing the false positive
rate.
6.6 Conclusion
A fully automated two-stage pipeline for csPCa detection, RADHop, is proposed in this work, aiming to offer a CAD tool for radiologists. Departing from other approaches and the trending DL-based solutions, RADHop follows Green Learning to solve the problem and has a competitive detection performance among other works. It offers a very small model size and short inference time compared to other methods, with the key advantage that its pipeline is fully transparent: each individual module is explainable and no back propagation is used. The feature extraction process in stage-1, which uses basic signal processing
Table 6.5: Results from PCa-RadHop on a local testing set (300 patients) from PI-CAI and
performance benchmarking with two traditional pipelines that use hand-crafted radiomics.
Configuration Score AUROC AP
Stage-1 (32 × 32) 0.558 0.768 0.348
Stage-1 (24 × 24) 0.579 0.801 0.356
Stage-1 (24 × 24) + Stage-2 0.599 0.822 0.374
[224] 0.498 0.700 0.295
[223] 0.523 0.739 0.307
Table 6.6: Model size and complexity comparison.
Model Model Size (M) FLOPS (B)
nnU-Net 185 (×1681) 346 (×128)
U-Net 159 (×1445) 152 (×56)
PCa-RadHop (Ours) 0.11 (×1) 2.7 (×1)
techniques, together with the adoption of radiomics in stage-2 for refining the prediction output, makes RADHop very transparent to physicians. Also, to offer RADHop's functionalities to radiologists, an interface tool has been developed to load and visualize the bpMRI input, as well as to illustrate the predicted heatmap.
Chapter 7
Conclusion and Future Work
7.1 Summary Of The Research
Regarding the nuclei segmentation problem, compiling our insights from the overview study, our two proposed unsupervised methods (i.e. CBM and HUNIS), and the self-supervised one (i.e. LG-NuSegHop), one can see that this problem is far from solved. Variations in staining and digitization pose challenges to effectively learning representations of nuclei and their boundaries. Additionally, images from different tissues and cancer types increase the content variation and thus the difficulty. Fully supervised DL solutions are particularly challenged by the small size of fully annotated datasets, and therefore it is very hard for them to generalize well to unseen organs. Point-wise annotations have recently been proposed to largely reduce the annotation time and enable the acquisition of datasets with many more samples, which can potentially help DNNs learn more generalized patterns. On the one hand, fully supervised methods do not seem to be very efficient solutions, firstly because of the high demand for annotations, which are expensive to obtain, and also because, as current research shows, the benefit from full annotations is not very large. On the other hand, fully unsupervised solutions can achieve a high performance, even with a parameter-free model, although it is hard for them to reach a top performance, to the point where this technology can be adopted in clinical tools. With the LG-NuSegHop pipeline, it is
demonstrated that one can achieve a high performance across datasets with inherent diversity, without any domain adaptation. Hence, this work shows that relying only on the priors of the problem and self-supervision, a competitive performance can be achieved, while maintaining a high generalization ability to unseen domains. We contend that supervision in this problem can prove helpful where the priors no longer hold strongly. Finally, all the proposed pipelines are transparent, relying on intuitive image processing techniques, feature extraction, and machine learning, and they maintain a very small model size compared to the state-of-the-art.
Regarding prostate cancer detection from MRI, the current state-of-the-art is challenged by a high false positive rate. Non-cancerous conditions, such as prostatitis, may have a similar visual appearance and are thus prone to yield false positives. Also, small lesion areas where the cancer has not progressed much can be missed by the detector. We showed that the proposed method, PCa-RadHop, achieves a competitive performance among DL-based methods. As a key advantage, its complexity and model size are tiny compared to the state-of-the-art, paving the way for more sustainable research and systems in future healthcare, at a low carbon footprint. Another aspect that makes our solution very appealing to radiologists and urologists is that every part of the system can be explained and each module's functionality is independent of the rest. Remarkably, the data-driven feature extraction process is unsupervised; a linear model learns representations using statistics and is thus easier to analyze and interpret. Another benefit is that it does not require massively annotated data, as the RadHop feature extraction process remains stable even with smaller datasets. All in all, RadHop learns radiomic-like features in a data-driven way at a much smaller complexity.
7.2 Future Research Directions
7.2.1 Nuclei Segmentation in Histological Images
An extension of our work, extrapolating our current line of research, is the introduction of supervision in areas that need it. As discussed before, purely unsupervised works are limited in performance. The main goal is to maximize the nuclei segmentation accuracy while minimizing the annotation requirements. Point-wise annotations seem a very efficient way to reduce the annotation time, since all that is required from physicians is to place a marker point on each nucleus. According to Qu et al. [24], even if we reduce the number of annotated nuclei, starting with a subset of annotation points, the performance change is very small. Therefore, one problem that needs an answer is to find a minimum subset of nuclei that need point-wise annotations. The term "minimum subset" here implies that the weakly supervised model would achieve comparable performance to one trained with all annotation points. Taking it a step further, for some nuclei the overall appearance may also be important to annotate. For example, if the inner-nucleus intensity has more of a texture-like distribution, then full mask annotations may be needed for certain nuclei. Moreover, for clusters of nuclei close to each other (i.e. overlapping) or with blurry boundaries, full annotations can provide useful cues about their connectivity. This is hard to learn from point annotations, since the pre-processing algorithm which identifies the nuclei boundaries (e.g. Voronoi labels) may under-segment some nuclei, thereby giving a noisy input to the segmentation step.
We argue that future models should be able to accommodate both point annotations and full segmentation masks of nuclei. As mentioned, there should be a model that identifies which nuclei need point-wise annotations and which need full annotations. This model needs to be unsupervised, so that there is no bias from any dataset used for pre-training. It can potentially detect nuclei that are not easy to segment (e.g. from their segmentation confidence) and
thus supervision at this point is imperative. Drawing some parallels with human learning,
for many tasks people use very specific supervision. Yet, this supervision helps in learning
the task much faster and more reliably.
Summarizing our points about potential future extensions, there are two main identified research directions for nuclei segmentation. The first is to build an unsupervised model that can pinpoint the nuclei that need point-wise annotations and those for which full masks are needed. This model is expected to be deployed in future annotation tools that will guide the annotation process carried out by pathologists, with the key aim of minimizing their effort. The second envisioned line of research is to devise hybrid models that can be trained on both point-wise and full nuclei annotations. This is a weak supervision scenario in which those models can potentially perform as well as a model with full supervision. Figure 7.1 illustrates the concept we envision for a future nuclei segmentation workflow.
Figure 7.1: Our envisioned future pipeline for annotation tools guided by an unsupervised model that can extract point-wise and full nuclei annotations. A hybrid model should then leverage this hybrid annotation form to efficiently learn the nuclei appearance distribution.
Since the majority of the research has focused on learning from labels and optimizing toward accurate predictions with respect to those labels, an interesting line of research is to incorporate biological priors into the problem. For instance, the optimization process could be constrained to predict nuclei with a certain roundness or elliptical shape, and with boundaries that fit the biological description of a nucleus. That could potentially help future DNNs learn more efficiently from less training data.
7.2.2 Prostate Cancer Detection from MRI
The PCa-RadHop pipeline shows a novel and promising direction for PCa detection with lightweight and transparent models. There are several extensions that can potentially improve the detection accuracy in the future. From our error analysis, the detector can spot most of the ground truth lesion areas in the patient cohort with high confidence. Therefore, the current challenge is to minimize the false positive regions. The current literature suggests that adding more contextual information about the candidate ROIs can help distinguish true from false positives. One limitation of our proposed method is that it introduces more context, but only at the probability level. Even the radiomics features are confined to their ROI, as detected in stage-1. As such, in stage-2, one can include more, multi-scale, context to help identify false positives with higher accuracy.
To increase the context around the ROIs proposed by stage-1, one interesting direction is to combine RadHop features with a CNN to expand the context in a learnable way. That is, RadHop features can learn the local texture up to a certain scale, where it is more homogeneous and can be captured by PCA. This feature representation can then be used by a CNN to learn from a larger part of the image. In other words, instead of having a traditional CNN learn from voxel-to-label, one can have a hybrid CNN learn from RadHop radiomics-to-label. The envisioned concept for the hybrid CNN is illustrated in Fig. 7.2.
The two main zones of the prostate, the Peripheral Zone (PZ) and the Transition Zone (TZ), have different probabilities of developing cancer. Zonal information is available from the prostate zonal segmentation module that runs before the prostate cancer detection one. Therefore, this clinical prior could be incorporated into PCa-RadHop, so as to emphasize the features from the PZ during learning, employing an attention-like mechanism [32], properly adapted to our
Figure 7.2: Hybrid CNN Reg-RadHop-NN for false positive correction. The first layers, which learn low-level features close to the voxel space, are replaced with radiomics features from RadHop. The later layers are used to expand the context around the ROI.
pipeline.
Following up on the last paragraph, another improvement that can increase the system's performance is independent feature extraction in the different zones. Radiologists argue that the PZ and TZ have different visual patterns. This is also evident from the PI-RADS guidelines, which grade lesions in the PZ and TZ differently [181]. Our current approach does not separate the feature extraction process between the two zones. Therefore, the RadHop features may be sub-optimal if they are mixed between two regions that have distinct characteristics. To this end, one can devise two models that operate independently on the two zones; since PCa-RadHop is a very lightweight method, the number of models can seamlessly be doubled at the expense of a few more parameters.
Other than visual features extracted from MRI, clinical information can also help increase performance. Patient age, race and PSA density change the likelihood that a patient suffers from clinically significant prostate cancer. Therefore, we plan to integrate those clinical factors into PCa-RadHop, so that it has the same clinical information available as radiologists.
Another interesting research path is the interpretability of PCa-RadHop. As emphasized in Chapter 1, a transparent pipeline and explainable features are conducive to adopting this technology in a clinical setting. Towards this end, one could analyze the physical meaning of the RadHop features, using as a reference point the radiomics features that radiologists are familiar with. This analysis can be carried out using signal processing methods and will corroborate our current research by adding more intuition to the RadHop feature extraction.
Finally, a problem when working with MRI input is that different clinics and hospitals use different scanners and parameters. CAD tools that are meant to be handy for radiologists need to be robust and able to analyze any MRI input, regardless of the source. Also, MRI quality is critical for reliable AI predictions. In the clinical world, there is always a possibility that poor quality MRIs (e.g. motion artifacts, old scanner versions) are used as input to the CAD tool. Even if the image quality is good, for some scanners and parameterizations the input MRIs may lie far from the training distribution of the CAD tool. In other words, the domain shift may be large enough that the AI predictions are not trustworthy.
To address these practical issues, one line of research could be the normalization of MRIs from different scanners and institutions, so that the detector's input has a more standardized distribution. Moreover, such a CAD tool should include a quality assessment module that predicts a score for each MRI input, indicating whether it can be reliably used for predictions. "Difficult" samples can bypass the AI model and be analyzed by physicians directly. RadHop is expected to be very helpful in this direction, because the rich spectral-spatial feature representation it provides may contain dimensions that can determine the MRI quality and predict a quality score. This module would be very useful as part of a commercial CAD tool meant to be used by radiologists in their everyday routine.
Bibliography
[1] Sherry L Murphy et al. “Mortality in the United States, 2020”. In: (2021).
[2] Chisako Muramatsu. “Overview on subjective similarity of images for content-based
medical image retrieval”. In: Radiological physics and technology 11 (2018), pp. 109–
124.
[3] Stephen Waite et al. “Tired in the reading room: the influence of fatigue in radiology”.
In: Journal of the American College of Radiology 14.2 (2017), pp. 191–197.
[4] Colin P West, Liselotte N Dyrbye, and Tait D Shanafelt. “Physician burnout: contributors, consequences and solutions”. In: Journal of internal medicine 283.6 (2018),
pp. 516–529.
[5] Seo Yeon Youn et al. “Detection and PI-RADS classification of focal lesions in prostate
MRI: performance comparison between a deep learning-based algorithm (DLA) and
radiologists with various levels of experience”. In: European Journal of Radiology 142
(2021), p. 109894.
[6] Patricia Raciti et al. “Novel artificial intelligence system increases the detection of
prostate cancer in whole slide images of core needle biopsies”. In: Modern Pathology
33.10 (2020), pp. 2058–2066.
[7] Markus Plass et al. “Explainability and causability in digital pathology”. In: The
Journal of Pathology: Clinical Research 9.4 (2023), pp. 251–260.
[8] Emma Strubell, Ananya Ganesh, and Andrew McCallum. “Energy and policy considerations for modern deep learning research”. In: Proceedings of the AAAI conference
on artificial intelligence. Vol. 34. 09. 2020, pp. 13693–13696.
[9] Ziang Pei et al. “Direct cellularity estimation on breast cancer histopathology images
using transfer learning”. In: Computational and mathematical methods in medicine
2019 (2019).
[10] Andrew Janowczyk, Ajay Basavanhally, and Anant Madabhushi. “Stain normalization
using sparse autoencoders (StaNoSA): application to digital pathology”. In: Computerized Medical Imaging and Graphics 57 (2017), pp. 50–61.
[11] Thaína A Azevedo Tosta et al. “Computational normalization of H&E-stained histological images: Progress, challenges and future potential”. In: Artificial intelligence
in medicine 95 (2019), pp. 118–132.
[12] Kimberly D Miller et al. “Cancer treatment and survivorship statistics, 2019”. In:
CA: a cancer journal for clinicians 69.5 (2019), pp. 363–385.
[13] Rebecca L Siegel et al. “Cancer statistics, 2023”. In: CA: a cancer journal for clinicians 73.1 (2023), pp. 17–48.
[14] Jelle O Barentsz et al. “ESUR prostate MR guidelines 2012”. In: European radiology
22 (2012), pp. 746–757.
[15] Moritz Kasel-Seibert et al. “Assessment of PI-RADS v2 for the detection of prostate
cancer”. In: European journal of radiology 85.4 (2016), pp. 726–731.
[16] R Clements et al. “Side effects and patient acceptability of transrectal biopsy of the
prostate.” In: Clinical radiology 47.2 (1993), pp. 125–126.
[17] Luke P O’Connor et al. “Future perspective of focal therapy for localized prostate
cancer”. In: Asian Journal of Urology 8.4 (2021), pp. 354–361.
[18] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. “U-net: Convolutional networks for biomedical image segmentation”. In: Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18. Springer. 2015, pp. 234–241.
[19] Hirohisa Oda et al. “BESNet: boundary-enhanced segmentation of cells in histopathological images”. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer. 2018, pp. 228–236.
[20] Phuong Thi Le et al. “Convolutional blur attention network for cell nuclei segmentation”. In: Sensors 22.4 (2022), p. 1586.
[21] Yanning Zhou et al. “Cia-net: Robust nuclei instance segmentation with contour-aware information aggregation”. In: International Conference on Information Processing in Medical Imaging. Springer. 2019, pp. 682–693.
[22] Yi Lin et al. “BoNuS: Boundary Mining for Nuclei Segmentation with Partial Point
Labels”. In: IEEE Transactions on Medical Imaging (2024).
[23] Ye Zhang et al. “DAWN: Domain-Adaptive Weakly Supervised Nuclei Segmentation
via Cross-Task Interactions”. In: arXiv preprint arXiv:2404.14956 (2024).
[24] Hui Qu et al. “Weakly supervised deep nuclei segmentation using partial points annotation in histopathology images”. In: IEEE Transactions on Medical Imaging 39.11
(2020), pp. 3655–3666.
[25] Ahmad Algohary et al. “Radiomic features on MRI enable risk categorization of
prostate cancer patients on active surveillance: Preliminary findings: Radiomics Categorizes PCa Patients on AS”. In: Journal of Magnetic Resonance Imaging 48 (Feb.
2018). doi: 10.1002/jmri.25983.
[26] Yang Song et al. “Computer-aided diagnosis of prostate cancer using a deep convolutional neural network from multiparametric MRI”. In: Journal of Magnetic Resonance
Imaging 48.6 (2018), pp. 1570–1577.
[27] Ruiming Cao et al. “Joint Prostate Cancer Detection and Gleason Score Prediction in
mp-MRI via FocalNet”. In: IEEE Transactions on Medical Imaging PP (Feb. 2019),
pp. 1–1. doi: 10.1109/TMI.2019.2901928.
[28] Xunan Huang et al. “Application of U-Net based multiparameter magnetic resonance
image fusion in the diagnosis of prostate cancer”. In: IEEE Access 9 (2021), pp. 33756–
33768.
[29] Audrey Duran et al. “ProstAttention-Net: A deep attention model for prostate cancer
segmentation by aggressiveness in MRI scans”. In: Medical Image Analysis 77 (2022),
p. 102347.
[30] Enmin Song et al. “Prostate lesion segmentation based on a 3D end-to-end convolution neural network with deep multi-scale attention”. In: Magnetic Resonance Imaging
(2023).
[31] Audrey Duran, Pierre-Marc Jodoin, and Carole Lartizien. “Prostate Cancer Semantic
Segmentation by Gleason Score Group in mp-{MRI} with Self Attention Model on
the Peripheral Zone”. In: Medical Imaging with Deep Learning. 2020. url: https://openreview.net/forum?id=VN7ww5Nvd.
[32] Audrey Duran, Pierre-Marc Jodoin, and Carole Lartizien. “Prostate cancer semantic segmentation by gleason score group in bi-parametric MRI with self attention
model on the peripheral zone”. In: Medical Imaging with Deep Learning. PMLR.
2020, pp. 193–204.
[33] Metin N Gurcan et al. “Histopathological image analysis: A review”. In: IEEE reviews
in biomedical engineering 2 (2009), pp. 147–171.
[34] Rodrigo Escobar Díaz Guerrero et al. “Software tools and platforms in Digital Pathology: a review for clinicians and computer scientists”. In: Journal of Pathology Informatics 13 (2022), p. 100103.
[35] Joann G Elmore et al. “Variability in pathologists’ interpretations of individual breast
biopsy slides: a population perspective”. In: Annals of internal
medicine 164.10 (2016), pp. 649–655.
[36] Thao-Quyen H Ho et al. “Cumulative probability of false-positive results after 10
years of screening with digital breast tomosynthesis vs digital mammography”. In:
JAMA Network Open 5.3 (2022), e222440–e222440.
[37] Tineke T Stolk et al. “False positives in PIRADS (V2) 3, 4, and 5 lesions: relationship with reader experience and zonal location”. In: Abdominal Radiology 44 (2019),
pp. 1044–1051.
[38] C. Kanan et al. Independent validation of paige prostate: Assessing clinical benefit of
an artificial intelligence tool within a digital diagnostic pathology laboratory workflow.
2020.
[39] Andrew Lagree et al. “A review and comparison of breast tumor cell nuclei segmentation performances using deep convolutional neural networks”. In: Scientific Reports
11.1 (2021), p. 8025.
[40] Esha Sadia Nasir, Arshi Parvaiz, and Muhammad Moazam Fraz. “Nuclei and glands
instance segmentation in histology images: a narrative review”. In: Artificial Intelligence Review (2022), pp. 1–56.
[41] Tomohiro Hayakawa et al. “Computational nuclei segmentation methods in digital pathology: a survey”. In: Archives of Computational Methods in Engineering 28
(2021), pp. 1–13.
[42] Humayun Irshad et al. “Methods for nuclei detection, segmentation, and classification
in digital histopathology: a review—current status and future potential”. In: IEEE
reviews in biomedical engineering 7 (2013), pp. 97–114.
[43] Xiaomin Zhou et al. “A comprehensive review for breast histopathology image analysis
using classical and deep neural networks”. In: IEEE Access 8 (2020), pp. 90931–90956.
[44] Z Pei et al. Direct Cellularity Estimation on Breast Cancer Histopathology Images
Using Transfer Learning (2019).
[45] M. Salvi et al. “Impact of Stain Normalization on Pathologist Assessment of Prostate
Cancer: A Comparative Study”. In: Cancers 15.5 (2023), p. 1503.
[46] John KC Chan. “The wonderful colors of the hematoxylin–eosin stain in diagnostic
surgical pathology”. In: International journal of surgical pathology 22.1 (2014), pp. 12–
32.
[47] Andrew H Fischer et al. “Hematoxylin and eosin staining of tissue and cell sections”.
In: Cold spring harbor protocols 2008.5 (2008), pdb–prot4986.
[48] James L Hiatt and P Leslie. Tratado de histologia em cores. 2007.
[49] Emmanouil Michail et al. “Detection of centroblasts in h&e stained images of follicular lymphoma”. In: 2014 22nd Signal Processing and Communications Applications
Conference (SIU). IEEE. 2014, pp. 2319–2322.
[50] M. Aubreville et al. “Quantifying the scanner-induced domain gap in mitosis detection”. In: arXiv:2103.16515 (2021).
[51] Christof A Bertram et al. “A large-scale dataset for mitotic figure assessment on whole
slide images of canine cutaneous mast cell tumor”. In: Scientific data 6.1 (2019),
p. 274.
[52] Santanu Roy et al. “A study about color normalization methods for histopathology
images”. In: Micron 114 (2018), pp. 42–61.
[53] Surbhi Vijh, Mukesh Saraswat, and Sumit Kumar. “A new complete color normalization method for H&E stained histopathological images”. In: Applied Intelligence
(2021), pp. 1–14.
[54] Metin N Gurcan et al. “Image analysis for neuroblastoma classification: segmentation
of cell nuclei”. In: 2006 International Conference of the IEEE Engineering in Medicine
and Biology Society. IEEE. 2006, pp. 4844–4847.
[55] Cheng Lu et al. “A robust automatic nuclei segmentation technique for quantitative histopathological image analysis”. In: Analytical and Quantitative Cytology and
Histology 34 (2012), pp. 296–308.
[56] Hongmin Cai et al. “A new iterative triclass thresholding technique in image segmentation”. In: IEEE transactions on image processing 23.3 (2014), pp. 1038–1046.
[57] Hady Ahmady Phoulady et al. “Nucleus segmentation in histology images with hierarchical multilevel thresholding”. In: Medical Imaging 2016: Digital Pathology. Vol. 9791.
SPIE. 2016, pp. 280–285.
[58] A. Gautam et al. “Automatic classification of leukocytes using morphological features
and naïve Bayes classifier”. In: 2016 IEEE region 10 conference (TENCON). IEEE.
2016, pp. 1023–1027.
[59] Khin Yadanar Win and Somsak Choomchuay. “Automated segmentation of cell nuclei
in cytology pleural fluid images using OTSU thresholding”. In: 2017 International
Conference on Digital Arts, Media and Technology (ICDAMT). IEEE. 2017, pp. 14–
18.
[60] Vasileios Magoulianitis et al. “An Unsupervised Parameter-Free Nuclei Segmentation
Method for Histology Images”. In: 2022 IEEE International Conference on Image
Processing (ICIP). IEEE. 2022, pp. 226–230.
[61] Vasileios Magoulianitis, Yijing Yang, and C-C Jay Kuo. “HUNIS: High-Performance
Unsupervised Nuclei Instance Segmentation”. In: 2022 IEEE 14th Image, Video, and
Multidimensional Signal Processing Workshop (IVMSP). IEEE. 2022, pp. 1–5.
[62] Mitko Veta et al. “Marker-controlled watershed segmentation of nuclei in H&E stained
breast cancer biopsy images”. In: 2011 IEEE international symposium on biomedical
imaging: from nano to macro. IEEE. 2011, pp. 618–621.
[63] Abhishek Vahadane and Amit Sethi. “Towards generalized nuclear segmentation in
histological images”. In: 13th IEEE International Conference on BioInformatics and
BioEngineering. IEEE. 2013, pp. 1–4.
[64] Jie Shu et al. “Segmenting overlapping cell nuclei in digital histopathology images”.
In: 2013 35th Annual International Conference of the IEEE Engineering in Medicine
and Biology Society (EMBC). IEEE. 2013, pp. 5445–5448.
[65] Yuxin Cui and Jianjun Hu. “Self-adjusting nuclei segmentation (SANS) of Hematoxylin-Eosin stained histopathological breast cancer images”. In: 2016 IEEE International
Conference on Bioinformatics and Biomedicine (BIBM). IEEE. 2016, pp. 956–963.
[66] Can Fahrettin Koyuncu et al. “Iterative h-minima-based marker-controlled watershed
for cell nucleus segmentation”. In: Cytometry Part A 89.4 (2016), pp. 338–349.
[67] Uppada Rajyalakshmi, S Koteswara Rao, and K Satya Prasad. “Supervised classification of breast cancer malignancy using integrated modified marker controlled watershed approach”. In: 2017 IEEE 7th International Advance Computing Conference
(IACC). IEEE. 2017, pp. 584–589.
[68] Min Hu, Xijian Ping, and Yihong Ding. “Automated cell nucleus segmentation using improved snake”. In: 2004 International Conference on Image Processing, 2004.
ICIP’04. Vol. 4. IEEE. 2004, pp. 2737–2740.
[69] Hussain Fatakdawala et al. “Expectation–maximization-driven geodesic active contour with overlap resolution (emagacor): Application to lymphocyte segmentation
on breast cancer histopathology”. In: IEEE Transactions on Biomedical Engineering
57.7 (2010), pp. 1676–1689.
[70] Pegah Faridi et al. “An automatic system for cell nuclei pleomorphism segmentation
in histopathological images of breast cancer”. In: 2016 IEEE Signal Processing in
Medicine and Biology Symposium (SPMB). IEEE. 2016, pp. 1–5.
[71] Sabeena Beevi, Madhu S Nair, and GR Bindu. “Automatic segmentation of cell nuclei
using Krill Herd optimization based multi-thresholding and localized active contour
model”. In: Biocybernetics and Biomedical Engineering 36.4 (2016), pp. 584–596.
[72] R Rashmi, Keerthana Prasad, and Chethana Babu K Udupa. “Multi-channel Chan-Vese model for unsupervised segmentation of nuclei from breast histopathological
images”. In: Computers in Biology and Medicine 136 (2021), p. 104651.
[73] Yousef Al-Kofahi et al. “Improved automatic detection and segmentation of cell nuclei
in histopathology images”. In: IEEE Transactions on Biomedical Engineering 57.4
(2009), pp. 841–852.
[74] Ondřej Daněk et al. “Segmentation of touching cell nuclei using a two-stage graph
cut model”. In: Image Analysis: 16th Scandinavian Conference, SCIA 2009, Oslo,
Norway, June 15-18, 2009. Proceedings 16. Springer. 2009, pp. 410–419.
[75] Ling Zhang et al. “Segmentation of cytoplasm and nuclei of abnormal cells in cervical
cytology using global and local graph cuts”. In: Computerized Medical Imaging and
Graphics 38.5 (2014), pp. 369–380.
[76] Omid Sarrafzadeh and Alireza Mehri Dehnavi. “Nucleus and cytoplasm segmentation
in microscopic images using K-means clustering and region growing”. In: Advanced
biomedical research 4 (2015).
[77] K. Y. Win, S. Choomchuay, and K. Hamamoto. “K mean clustering based automated segmentation of overlapping cell nuclei in pleural effusion cytology images”.
In: 2017 International Conference on Advanced Technologies for Communications
(ATC). IEEE. 2017, pp. 265–269.
[78] Young Hwan Chang et al. “Quantitative analysis of histological tissue image based
on cytological profiles and spatial statistics”. In: 2016 38th Annual International
Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE.
2016, pp. 1175–1178.
[79] Ratna Saha, Mariusz Bajger, and Gobert Lee. “Spatial Shape Constrained Fuzzy C-Means (FCM) Clustering for Nucleus Segmentation in Pap Smear Images”. In: 2016
International Conference on Digital Image Computing: Techniques and Applications
(DICTA). 2016, pp. 1–8. doi: 10.1109/DICTA.2016.7797086.
[80] Nobuyuki Otsu. “A threshold selection method from gray-level histograms”. In: IEEE
transactions on systems, man, and cybernetics 9.1 (1979), pp. 62–66.
[81] Jos BTM Roerdink and Arnold Meijster. “The watershed transform: Definitions, algorithms and parallelization strategies”. In: Fundamenta informaticae 41.1-2 (2000),
pp. 187–228.
[82] Tony F Chan and Luminita A Vese. “Active contours without edges”. In: IEEE Transactions on image processing 10.2 (2001), pp. 266–277.
[83] David Bryant Mumford and Jayant Shah. “Optimal approximations by piecewise
smooth functions and associated variational problems”. In: Communications on pure
and applied mathematics (1989).
[84] Jaakko Sauvola and Matti Pietikäinen. “Adaptive document image binarization”. In:
Pattern recognition 33.2 (2000), pp. 225–236.
[85] Joy Hsu, Wah Chiu, and Serena Yeung. “DARCNN: Domain Adaptive Region-based
Convolutional Neural Network for Unsupervised Instance Segmentation in Biomedical Images”. In: Proceedings of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition. 2021, pp. 1003–1012.
[86] Xin Zheng et al. “Fast and robust segmentation of white blood cell images by self-supervised learning”. In: Micron 107 (2018), pp. 55–71.
[87] Mihir Sahasrabudhe et al. “Self-supervised nuclei segmentation in histopathological
images using attention”. In: International Conference on Medical Image Computing
and Computer-Assisted Intervention. Springer. 2020, pp. 393–402.
[88] Hesham Ali, Mustafa Elattar, and Sahar Selim. “A Multi-scale Self-supervision Method
for Improving Cell Nuclei Segmentation in Pathological Tissues”. In: Annual Conference on Medical Image Understanding and Analysis. Springer. 2022, pp. 751–763.
[89] Narinder Singh Punn and Sonali Agarwal. “BT-Unet : A self-supervised learning
framework for biomedical image segmentation using barlow twins with U-net models”.
In: Machine Learning 111.12 (2022), pp. 4585–4600.
[90] Dongnan Liu et al. “Unsupervised instance segmentation in microscopy images via
panoptic domain adaptation and task re-weighting”. In: Proceedings of the IEEE/CVF
conference on computer vision and pattern recognition. 2020, pp. 4243–4252.
[91] Xinpeng Xie et al. “Instance-aware self-supervised learning for nuclei segmentation”.
In: International Conference on Medical Image Computing and Computer-Assisted
Intervention. Springer. 2020, pp. 341–350.
[92] Nicklas Boserup and Raghavendra Selvan. “Efficient self-supervision using patch-based contrastive learning for
histopathology image segmentation”. In: arXiv:2208.10779 (2022).
[93] H. Liang et al. “A region-based convolutional network for nuclei detection and segmentation in microscopy images”. In: Biomedical Signal Processing and Control 71
(2022), p. 103276.
[94] Kaushiki Roy et al. “Nuclei-Net: A multi-stage fusion model for nuclei segmentation
in microscopy images”. In: (2023).
[95] Simon Graham et al. “Hover-net: Simultaneous segmentation and classification of nuclei in multi-tissue histology images”. In: Medical Image Analysis 58 (2019), p. 101563.
[96] A. Kumar C., S. Lal, and J. Kini. “High-resolution deep transferred ASPPU-Net for
nuclei segmentation of histopathology images”. In: International journal of computer
assisted radiology and surgery 16 (2021), pp. 2159–2175.
[97] Iqra Kiran et al. “DenseRes-Unet: Segmentation of overlapped/clustered nuclei from
multi organ histopathology images”. In: Computers in Biology and Medicine 143
(2022), p. 105267.
[98] Khadijeh Saednia, William T Tran, and Ali Sadeghi-Naini. “A Cascaded Deep Learning Framework for Segmentation of Nuclei in Digital Histology Images”. In: 2022 44th
Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC). IEEE. 2022, pp. 4764–4767.
[99] Emrah Hancer et al. “An imbalance-aware nuclei segmentation methodology for H&E
stained histopathology images”. In: Biomedical Signal Processing and Control 83
(2023), p. 104720.
[100] Shengcong Chen et al. “CPP-net: Context-aware polygon proposal network for nucleus segmentation”. In: IEEE Transactions on Image Processing 32 (2023), pp. 980–
994.
[101] Neeraj Kumar et al. “A dataset and a technique for generalized nuclear segmentation
for computational pathology”. In: IEEE transactions on medical imaging 36.7 (2017),
pp. 1550–1560.
[102] S. Chen, C. Ding, and D. Tao. “Boundary-assisted region proposal networks for nucleus segmentation”. In: Medical Image Computing and Computer Assisted Intervention–
MICCAI 2020: 23rd International Conference, Lima, Peru, October 4–8, 2020, Proceedings, Part V 23. Springer. 2020, pp. 279–288.
[103] Jian Qin et al. “REU-Net: Region-enhanced nuclei segmentation network”. In: Computers in Biology and Medicine 146 (2022), p. 105546.
[104] Jo Schlemper et al. “Attention gated networks: Learning to leverage salient regions
in medical images”. In: Medical image analysis 53 (2019), pp. 197–207.
[105] Shyam Lal et al. “NucleiSegNet: robust deep learning architecture for the nuclei
segmentation of liver cancer histopathology images”. In: Computers in Biology and
Medicine 128 (2021), p. 104075.
[106] Guihua Yang et al. “GCP-Net: A Gating Context-Aware Pooling Network for Cervical
Cell Nuclei Segmentation”. In: Mobile Information Systems 2022 (2022).
[107] Fuyong Xing, Yuanpu Xie, and Lin Yang. “An automatic learning-based framework
for robust nucleus segmentation”. In: IEEE transactions on medical imaging 35.2
(2015), pp. 550–566.
[108] Inwan Yoo, Donggeun Yoo, and Kyunghyun Paeng. “Pseudoedgenet: Nuclei segmentation only with point annotations”. In: Medical Image Computing and Computer
Assisted Intervention–
MICCAI 2019: 22nd International Conference, Shenzhen, China, October 13–17,
2019, Proceedings, Part I 22. Springer. 2019, pp. 731–739.
[109] H. Qu et al. “Nuclei segmentation using mixed points and masks selected from uncertainty”. In: 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI).
IEEE. 2020, pp. 973–976.
[110] Kuan Tian et al. “Weakly-supervised nucleus segmentation based on point annotations: A coarse-to-fine self-stimulated learning strategy”. In: Medical Image Computing and Computer Assisted Intervention–MICCAI 2020: 23rd International Conference, Lima, Peru, October 4–8, 2020, Proceedings, Part V 23. Springer. 2020, pp. 299–
308.
[111] Yi Lin et al. “Label propagation for annotation-efficient nuclei segmentation from
pathology images”. In: arXiv preprint arXiv:2202.08195 (2022).
[112] Wei Lou et al. “Which pixel to annotate: a label-efficient nuclei segmentation framework”. In: IEEE Transactions on Medical Imaging 42.4 (2022), pp. 947–958.
[113] H. Qu et al. “Weakly supervised deep nuclei segmentation using points annotation in
histopathology images”. In: International Conference on Medical Imaging with Deep
Learning. PMLR. 2019, pp. 390–400.
[114] Wei Hu et al. “Generative adversarial training for weakly supervised nuclei instance
segmentation”. In: 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE. 2020, pp. 3649–3654.
[115] Neeraj Kumar et al. “A Multi-Organ Nucleus Segmentation Challenge”. In: IEEE
Transactions on Medical Imaging 39.5 (2020), pp. 1380–1391. doi: 10.1109/TMI.
2019.2947628.
[116] Q. D. Vu et al. “Methods for segmentation and classification of digital microscopy
tissue images”. In: Frontiers in bioengineering and biotechnology (2019), p. 53.
[117] Peter Naylor et al. “Segmentation of nuclei in histopathology images by deep regression of the distance map”. In: IEEE transactions on medical imaging 38.2 (2018),
pp. 448–459.
[118] Korsuk Sirinukunwattana et al. “Locality Sensitive Deep Learning for Detection and
Classification of Nuclei in Routine Colon Cancer Histology Images”. In: IEEE Transactions on Medical Imaging 35.5 (2016), pp. 1196–1206. doi: 10.1109/TMI.2016.
2525803.
[119] Juan C Caicedo et al. “Nucleus segmentation across imaging experiments: the 2018
Data Science Bowl”. In: Nature methods 16.12 (2019), pp. 1247–1253.
[120] Alexander Kirillov et al. “Panoptic segmentation”. In: Proceedings of the IEEE/CVF
conference on computer vision and pattern recognition. 2019, pp. 9404–9413.
[121] Anne E Carpenter et al. “CellProfiler: image analysis software for identifying and
quantifying cell phenotypes”. In: Genome biology 7.10 (2006), pp. 1–11.
[122] Johannes Schindelin et al. “Fiji: an open-source platform for biological-image analysis”. In: Nature methods 9.7 (2012), pp. 676–682.
[123] Taekyung Kim et al. “Diversify and match: A domain adaptive representation learning
paradigm for object detection”. In: Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition. 2019, pp. 12456–12465.
[124] Kaiming He et al. “Deep residual learning for image recognition”. In: Proceedings of
the IEEE conference on computer vision and pattern recognition. 2016, pp. 770–778.
[125] Karen Simonyan and Andrew Zisserman. “Very deep convolutional networks for largescale image recognition”. In: arXiv preprint arXiv:1409.1556 (2014).
[126] Gao Huang et al. “Densely connected convolutional networks”. In: Proceedings of the
IEEE conference on computer vision and pattern recognition. 2017, pp. 4700–4708.
[127] Kaiming He et al. “Mask r-cnn”. In: Proceedings of the IEEE international conference
on computer vision. 2017, pp. 2961–2969.
[128] Hao Chen et al. “DCAN: Deep contour-aware networks for object instance segmentation from histology images”. In: Medical image analysis 36 (2017), pp. 135–146.
[129] Shu Liu et al. “Path aggregation network for instance segmentation”. In: Proceedings
of the IEEE conference on computer vision and pattern recognition. 2018, pp. 8759–
8768.
[130] Shivang Naik et al. “Automated gland and nuclei segmentation for grading of prostate
and breast cancer histopathology”. In: 2008 5th IEEE International Symposium on
Biomedical Imaging: From Nano to Macro. IEEE. 2008, pp. 284–287.
213
[131] J-P Thiran and Benoit Macq. “Morphological feature extraction for the classification
of digital images of cancerous tissues”. In: IEEE Transactions on biomedical engineering 43.10 (1996), pp. 1011–1020.
[132] Theodoros Mouroutis, Stephen J Roberts, and Anil A Bharath. “Robust cell nuclei
segmentation using statistical modelling”. In: Bioimaging 6.2 (1998), pp. 79–91.
[133] Norberto Malpica et al. “Applying watershed algorithms to the segmentation of clustered nuclei”. In: Cytometry: The Journal of the International Society for Analytical
Cytology 28.4 (1997), pp. 289–297.
[134] Sahirzeeshan Ali et al. “Adaptive energy selective active contour with shape priors for
nuclear segmentation and gleason grading of prostate cancer”. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer.
2011, pp. 661–669.
[135] Humayun Irshad et al. “Crowdsourcing image annotation for nucleus detection and
segmentation in computational pathology: evaluating experts, automated methods,
and the crowd”. In: Pacific symposium on biocomputing Co-chairs. World Scientific.
2014, pp. 294–305.
[136] Le Hou et al. “Robust histopathology image analysis: To label or to synthesize?” In:
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019, pp. 8533–8542.
[137] Wenjun Yan et al. “The domain shift problem of medical image segmentation and
vendor-adaptation by Unet-GAN”. In: International Conference on Medical Image
Computing and Computer-Assisted Intervention. Springer. 2019, pp. 623–631.
[138] Shyam Lal et al. “NucleiSegNet: Robust deep learning architecture for the nuclei
segmentation of liver cancer histopathology images”. In: Computers in Biology and
Medicine 128 (2021), p. 104075. issn: 0010-4825.
[139] Vasileios Magoulianitis et al. “Unsupervised Data-Driven Nuclei Segmentation For
Histology Images”. In: arXiv preprint arXiv:2110.07147 (2021).
[140] Jing-Hao Xue and D Michael Titterington. “t-Tests, F-Tests and Otsu’s Methods
for Image Thresholding”. In: IEEE Transactions on Image Processing 20.8 (2011),
pp. 2392–2396.
[141] Adel Hafiane, Filiz Bunyak, and Kannappan Palaniappan. “Fuzzy clustering and active contours for histopathology image segmentation and nuclei detection”. In: International Conference on Advanced Concepts for Intelligent Vision Systems. Springer.
2008, pp. 903–914.
214
[142] Khamael Al-Dulaimi et al. “White blood cell nuclei segmentation using level set
methods and geometric active contours”. In: 2016 International Conference on Digital
Image Computing: Techniques and Applications (DICTA). IEEE. 2016, pp. 1–7.
[143] Mohammed Ali Roula, Ahmed Bouridane, and Fatih Kurugollu. “An evolutionary
snake algorithm for the segmentation of nuclei in histopathological images”. In: 2004
International Conference on Image Processing, 2004. ICIP’04. Vol. 1. IEEE. 2004,
pp. 127–130.
[144] Hongming Xu et al. “An unsupervised method for histological image segmentation
based on tissue cluster level graph cut”. In: Computerized Medical Imaging and Graphics 93 (2021), p. 101974.
[145] Pengfei Shen et al. “Segmenting multiple overlapping nuclei in h&e stained breast
cancer histopathology images based on an improved watershed”. In: 2015 IET International Conference on Biomedical Image and Signal Processing (ICBISP 2015).
IET. 2015, pp. 1–4.
[146] Paula Andrea Dorado, Raul Celis, and Eduardo Romero. “Color separation of H&E
stained samples by linearly projecting the RGB representation onto a custom discriminant surface”. In: 12th International Symposium on Medical Information Processing and Analysis. Vol. 10160. International Society for Optics and Photonics.
2017, 101600P.
[147] Esha Sadia Nasir, Arshi Parvaiz, and Muhammad Moazam Fraz. “Nuclei and glands
instance segmentation in histology images: a narrative review”. In: Artificial Intelligence Review 56.8 (2023), pp. 7909–7964.
[148] Wei Lou et al. “Structure embedded nucleus classification for histopathology images”.
In: IEEE Transactions on Medical Imaging (2024).
[149] Tsai Hor Chan et al. “Histopathology whole slide image analysis with heterogeneous graph representation learning”. In: Proceedings of the IEEE/CVF Conference
on Computer Vision and Pattern Recognition. 2023, pp. 15661–15670.
[150] Vasileios Magoulianitis, Catherine A Alexander, and C-C Jay Kuo. “A Comprehensive
Overview of Computational Nuclei Segmentation Methods in Digital Pathology”. In:
arXiv preprint arXiv:2308.08112 (2023).
[151] Xiaoming Liu et al. “MDC-net: A new convolutional neural network for nucleus segmentation in histopathology images with distance maps and contour information”.
In: Computers in Biology and Medicine 135 (2021), p. 104543.
[152] Wenxi Liu et al. “Contrastive and uncertainty-aware nuclei segmentation and classification”. In: Computers in Biology and Medicine (2024), p. 108667.
215
[153] Yi Lin et al. “Nuclei segmentation with point annotations from pathology images
via self-supervised learning and co-training”. In: Medical Image Analysis 89 (2023),
p. 102933.
[154] C-C Jay Kuo and Azad M Madni. “Green learning: Introduction, examples and
outlook”. In: Journal of Visual Communication and Image Representation (2022),
p. 103685.
[155] C-C Jay Kuo et al. “Interpretable convolutional neural networks via feedforward
design”. In: Journal of Visual Communication and Image Representation (2019).
[156] Lipeng Xie et al. “Integrating deep convolutional neural networks with marker-controlled
watershed for overlapping nuclei segmentation in histopathology images”. In: Neurocomputing 376 (2020), pp. 166–179.
[157] Tsung-Yi Lin et al. “Microsoft coco: Common objects in context”. In: Computer
Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-
12, 2014, Proceedings, Part V 13. Springer. 2014, pp. 740–755.
[158] Jun-Yan Zhu et al. “Unpaired image-to-image translation using cycle-consistent adversarial networks”. In: Proceedings of the IEEE international conference on computer
vision. 2017, pp. 2223–2232.
[159] Yijing Yang, Vasileios Magoulianitis, and C-C Jay Kuo. “E-pixelhop: An enhanced
pixelhop method for object classification”. In: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). IEEE.
2021, pp. 1475–1482.
[160] Yueru Chen et al. “Pixelhop++: A small successive-subspace-learning-based (sslbased) model for image classification”. In: 2020 IEEE International Conference on
Image Processing (ICIP). IEEE. 2020, pp. 3294–3298.
[161] Xiaofeng Liu et al. “Voxelhop: Successive subspace learning for als disease classification using structural mri”. In: IEEE journal of biomedical and health informatics
26.3 (2021), pp. 1128–1139.
[162] Xiaofeng Liu et al. “Segmentation of cardiac structures via successive subspace learning with saab transform from cine mri”. In: 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC). IEEE. 2021,
pp. 3535–3538.
[163] Vasileios Magoulianitis et al. “PCa-RadHop: A transparent and lightweight feedforward method for clinically significant prostate cancer segmentation”. In: Computerized Medical Imaging and Graphics (2024), p. 102408.
216
[164] Massimo Salvi, Nicola Michielli, and Filippo Molinari. “Stain Color Adaptive Normalization (SCAN) algorithm: Separation and standardization of histological stains
in digital pathology”. In: Computer methods and programs in biomedicine 193 (2020),
p. 105506.
[165] Yijing Yang et al. “On supervised feature selection from high dimensional feature
spaces”. In: APSIPA Transactions on Signal and Information Processing 11.1 (2022).
[166] Tommy L¨ofstedt et al. “Gray-level invariant Haralick texture features”. In: PloS one
14.2 (2019), e0212110.
[167] Neeraj Kumar et al. “A multi-organ nucleus segmentation challenge”. In: IEEE transactions on medical imaging 39.5 (2019), pp. 1380–1391.
[168] Katarzyna Tomczak, Patrycja Czerwi´nska, and Maciej Wiznerowicz. “Review The
Cancer Genome Atlas (TCGA): an immeasurable source of knowledge”. In: Contemporary Oncology/Wsp´o lczesna Onkologia 2015.1 (2015), pp. 68–77.
[169] Amirreza Mahbod et al. “CryoNuSeg: A dataset for nuclei instance segmentation
of cryosectioned H&E-stained histological images”. In: Computers in biology and
medicine 132 (2021), p. 104349.
[170] Simon Graham et al. “Lizard: A large-scale dataset for colonic nuclear instance segmentation and classification”. In: Proceedings of the IEEE/CVF international conference on computer vision. 2021, pp. 684–693.
[171] Zhiyun Song et al. “Nucleus-aware self-supervised pretraining using unpaired imageto-image translation for histopathology images”. In: IEEE Transactions on Medical
Imaging (2023).
[172] Yang Zhou et al. “Cyclic learning: Bridging image-level labels and nuclei instance
segmentation”. In: IEEE Transactions on Medical Imaging 42.10 (2023), pp. 3104–
3116.
[173] Huadeng Wang et al. “Multi-task generative adversarial learning for nuclei segmentation with dual attention and recurrent convolution”. In: Biomedical Signal Processing
and Control 75 (2022), p. 103558.
[174] Hongliang He et al. “Cdnet: Centripetal direction network for nuclear instance segmentation”. In: Proceedings of the IEEE/CVF International Conference on Computer
Vision. 2021, pp. 4026–4035.
[175] Hongliang He et al. “Toposeg: Topology-aware nuclear instance segmentation”. In:
Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023,
pp. 21307–21316.
217
[176] Ziyue Wang et al. “Dynamic Pseudo Label Optimization in Point-Supervised Nuclei
Segmentation”. In: arXiv preprint arXiv:2406.16427 (2024).
[177] Hu Cao et al. “Swin-unet: Unet-like pure transformer for medical image segmentation”. In: European conference on computer vision. Springer. 2022, pp. 205–218.
[178] Vi Thi-Tuong Vo and Soo-Hyung Kim. “Mulvernet: nucleus segmentation and classification of pathology images using the HoVer-Net and multiple filter units”. In:
Electronics 12.2 (2023), p. 355.
[179] Zhongyu Li et al. “Toward source-free cross tissues histopathological cell segmentation
via target-specific finetuning”. In: IEEE Transactions on Medical Imaging 42.9 (2023),
pp. 2666–2677.
[180] Jaquelyn L Jahn, Edward L Giovannucci, and Meir J Stampfer. “The high prevalence of undiagnosed prostate cancer at autopsy: implications for epidemiology and
treatment of prostate cancer in the Prostate-specific Antigen-era”. In: International
journal of cancer 137.12 (2015), pp. 2795–2802.
[181] Jeffrey C Weinreb et al. “PI-RADS prostate imaging–reporting and data system:
2015, version 2”. In: European urology 69.1 (2016), pp. 16–40.
[182] Ji Won Seo et al. “PI-RADS version 2: detection of clinically significant cancer
in patients with biopsy gleason score 6 prostate cancer”. In: American Journal of
Roentgenology 209.1 (2017), W1–W9.
[183] Morand Piert et al. “Accuracy of tumor segmentation from multi-parametric prostate
MRI and 18F-choline PET/CT for focal prostate cancer therapy applications”. In:
EJNMMI research 8.1 (2018), pp. 1–14.
[184] Paul F Jaeger et al. “Retina U-Net: Embarrassingly simple exploitation of segmentation supervision for medical object detection”. In: Machine Learning for Health
Workshop. PMLR. 2020, pp. 171–183.
[185] F. Milletari, N. Navab, and S. Ahmadi. “V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation”. In: 2016 Fourth International
Conference on 3D Vision (3DV). 2016, pp. 565–571. doi: 10.1109/3DV.2016.79.
[186] Yuan Yuan et al. “Z-SSMNet: A Zonal-aware Self-Supervised Mesh Network for
Prostate Cancer Detection and Diagnosis in bpMRI”. In: arXiv preprint arXiv:2212.05808
(2022).
[187] Shoshana B Ginsburg et al. “Radiomic features for prostate cancer detection on MRI
differ between the transition and peripheral zones: preliminary findings from a multi218
institutional study”. In: Journal of Magnetic Resonance Imaging 46.1 (2017), pp. 184–
193.
[188] Thomas Sanford et al. “Deep-learning-based artificial intelligence for PI-RADS classification to assist multiparametric prostate MRI interpretation: A development study”.
In: Journal of Magnetic Resonance Imaging 52.5 (2020), pp. 1499–1507.
[189] Bejoy Abraham and Madhu S Nair. “Computer-aided diagnosis of clinically significant prostate cancer from MRI images using sparse autoencoder and random forest
classifier”. In: Biocybernetics and Biomedical Engineering 38.3 (2018), pp. 733–744.
[190] Sunghwan Yoo et al. “Prostate cancer detection using deep convolutional neural networks”. In: Scientific reports 9.1 (2019), p. 19518.
[191] Ruiming Cao et al. “Joint prostate cancer detection and Gleason score prediction
in mp-MRI via FocalNet”. In: IEEE transactions on medical imaging 38.11 (2019),
pp. 2496–2506.
[192] Mozhdeh Rouhsedaghat et al. “Successive Subspace Learning: An Overview”. In:
arXiv preprint arXiv:2103.00121 (2021).
[193] Xin Yu et al. “Deep attentive panoptic model for prostate cancer detection using biparametric MRI scans”. In: Medical Image Computing and Computer Assisted
Intervention–MICCAI 2020: 23rd International Conference, Lima, Peru, October 4–
8, 2020, Proceedings, Part IV 23. Springer. 2020, pp. 594–604.
[194] Xin Yu et al. “False positive reduction using multiscale contextual features for prostate
cancer detection in multi-parametric MRI scans”. In: 2020 IEEE 17th international
symposium on biomedical imaging (ISBI). IEEE. 2020, pp. 1355–1359.
[195] Na Li et al. “On energy compaction of 2D Saab image transforms”. In: 2019 AsiaPacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). IEEE. 2019, pp. 466–475.
[196] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional Networks
for Biomedical Image Segmentation. 2015. arXiv: 1505.04597 [cs.CV].
[197] Praful Hambarde et al. “Prostate lesion segmentation in MR images using radiomics
based deeply supervised U-Net”. In: Biocybernetics and Biomedical Engineering 40.4
(2020), pp. 1421–1435.
[198] Fabian Isensee et al. “nnU-Net: a self-configuring method for deep learning-based
biomedical image segmentation”. In: Nature methods 18.2 (2021), pp. 203–211.
219
[199] Pegah Khosravi et al. “A deep learning approach to diagnostic classification of prostate
cancer using pathology–radiology fusion”. In: Journal of Magnetic Resonance Imaging
54.2 (2021), pp. 462–471.
[200] Josh Sanyal et al. “An automated two-step pipeline for aggressive prostate lesion
detection from multi-parametric MR sequence”. In: AMIA Summits on Translational
Science Proceedings 2020 (2020), p. 552.
[201] Nahian Siddique et al. “U-net and its variants for medical image segmentation: A
review of theory and applications”. In: Ieee Access 9 (2021), pp. 82031–82057.
[202] Timothy Wong et al. “Fully automated detection of prostate transition zone tumors
on T2-weighted and apparent diffusion coefficient (ADC) map MR images using UNet ensemble”. In: Medical Physics 48.11 (2021), pp. 6889–6900.
[203] Patrick Schelb et al. “Classification of cancer at prostate MRI: deep learning versus
clinical PI-RADS assessment”. In: Radiology 293.3 (2019), pp. 607–617.
[204] Zhenzhen Dai et al. “Accurate Prostate Cancer Detection and Segmentation Using
Non-Local Mask R-CNN With Histopathological Ground Truth”. In: International
Journal of Radiation Oncology, Biology, Physics 111.3 (2021), S45.
[205] Zhiyu Liu et al. “A two-stage approach for automated prostate lesion detection and
classification with mask R-CNN and weakly supervised deep neural network”. In: Artificial Intelligence in Radiation Therapy: First International Workshop, AIRT 2019,
Held in Conjunction with MICCAI 2019, Shenzhen, China, October 17, 2019, Proceedings 1. Springer. 2019, pp. 43–51.
[206] Mukesh Soni et al. “Light weighted healthcare CNN model to detect prostate cancer on multiparametric MRI”. In: Computational Intelligence and Neuroscience 2022
(2022).
[207] Ruiming Cao et al. “Prostate cancer detection and segmentation in multi-parametric
MRI via CNN and conditional random field”. In: 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019). IEEE. 2019, pp. 1900–1904.
[208] Xin Yang et al. “Co-trained convolutional neural networks for automated detection
of prostate cancer in multi-parametric MRI”. In: Medical image analysis 42 (2017),
pp. 212–227.
[209] Robert James Gillies et al. “The biology underlying molecular imaging in oncology:
from genome to anatome and back again”. In: Clinical radiology 65.7 (2010), pp. 517–
521.
220
[210] Philippe Lambin et al. “Radiomics: extracting more information from medical images
using advanced feature analysis”. In: European journal of cancer 48.4 (2012), pp. 441–
446.
[211] Ahmad Algohary et al. “Radiomic features on MRI enable risk categorization of
prostate cancer patients on active surveillance: Preliminary findings”. In: Journal of
Magnetic Resonance Imaging 48.3 (2018), pp. 818–828.
[212] Renato Cuocolo et al. “Clinically significant prostate cancer detection on MRI: A
radiomic shape features study”. In: European journal of radiology 116 (2019), pp. 144–
149.
[213] Jussi Toivonen et al. “Radiomics and machine learning of multisequence multiparametric prostate MRI: Towards improved non-invasive prostate cancer characterization”. In: PLoS One 14.7 (2019), e0217702.
[214] Shan Yao, Hanyu Jiang, and Bin Song. “Radiomics in prostate cancer: basic concepts
and current state-of-the-art”. In: Chinese Journal of Academic Radiology 2 (2020),
pp. 47–55.
[215] C-C Jay Kuo and Yueru Chen. “On data-driven saak transform”. In: Journal of Visual
Communication and Image Representation 50 (2018), pp. 237–246.
[216] Zohreh Azizi and C-C Jay Kuo. “PAGER: Progressive Attribute-Guided Extendable
Robust Image Generation”. In: arXiv preprint arXiv:2206.00162 (2022).
[217] Pranav Kadam et al. “Unsupervised Point Cloud Registration via Salient Points Analysis (SPA)”. In: 2020 IEEE International Conference on Visual Communications and
Image Processing (VCIP). IEEE. 2020, pp. 5–8.
[218] Xuejing Lei, Ganning Zhao, and C-C Jay Kuo. “NITES: A Non-Parametric Interpretable Texture Synthesis Method”. In: 2020 Asia-Pacific Signal and Information
Processing Association Annual Summit and Conference (APSIPA ASC). IEEE. 2020,
pp. 1698–1706.
[219] Mozhdeh Rouhsedaghat et al. “Facehop: A light-weight low-resolution face gender
classification method”. In: arXiv preprint arXiv:2007.09510 (2020).
[220] Min Zhang et al. “Pointhop++: A lightweight learning model on point sets for 3d
classification”. In: 2020 IEEE International Conference on Image Processing (ICIP).
IEEE. 2020, pp. 3319–3323.
[221] Min Zhang et al. “PointHop: An Explainable Machine Learning Method for Point
Cloud Classification”. In: IEEE Transactions on Multimedia (2020).
221
[222] Anindo Saha et al. “Artificial Intelligence and Radiologists at Prostate Cancer Detection in MRI—The PI-CAI Challenge”. In: Medical Imaging with Deep Learning,
short paper track. 2023.
[223] Geert Litjens et al. “Computer-aided detection of prostate cancer in MRI”. In: IEEE
transactions on medical imaging 33.5 (2014), pp. 1083–1092.
[224] PC Vos et al. “Automatic computer-aided detection of prostate cancer based on multiparametric magnetic resonance image analysis”. In: Physics in Medicine & Biology
57.6 (2012), p. 1527.
222
Abstract
Artificial Intelligence (AI) for healthcare is a rapidly growing field that has attracted an unprecedented number of contributions over the last decade. Modern AI and Deep Learning (DL) have enabled the automation of many tasks within the clinical diagnosis pipeline, paving the way for AI-powered computer-aided diagnosis (CAD) tools that can serve as physicians' assistants. Such tools are expected to reduce the time required for diagnosis, which in turn lowers clinical costs for patients. Moreover, AI CAD tools offer a more objective decision-making process, thereby reducing inter-reader disagreement. Cancer detection is of paramount importance within healthcare and clinical diagnosis, and AI is anticipated to play an instrumental role in the future of cancer diagnosis by assisting physicians in accurately detecting cancer in its early stages, a pivotal step for saving lives. Medical image analysis spans several types of images. Magnetic Resonance Imaging (MRI) depicts the clinical picture of several organs in a non-invasive way, so modern AI and computer vision can automate some of the tasks in the radiologists' pipeline and expedite their everyday workflow. After the radiology report, the final step for cancer detection and staging is histopathological image analysis: digital pathology analyzes biopsy images to reveal nuclei patterns indicative of cancer. To this end, nuclei segmentation is an important, yet laborious and time-consuming, task for pathologists. Overall, the solutions proposed in this thesis fall into two parts, histological image analysis and MRI analysis, and aim to provide automation for different image types across the stages of the cancer diagnosis pipeline.
For the histological image analysis part, this thesis proposes three self-supervised pipelines for nuclei segmentation, a key task in cancer grading. First, a novel preprocessing technique is proposed to enhance the appearance of nuclei over the background. A set of local processing techniques (the CBM and HUNIS methods) is then proposed to predict a pseudo-label in an unsupervised way, based on a new adaptive thresholding technique and a novel anomalous-instance removal module. In turn, a novel feature extraction module, named NuSegHop, is proposed to learn the local texture of nuclei from the generated pseudo-label. Finally, a set of post-processing techniques is applied globally to the heatmap predicted by NuSegHop to improve the nuclei detection rate. Extensive experiments on three publicly available datasets demonstrate the effectiveness of the proposed methods: they are competitive among other self-supervised methods and generalize well to unseen domains. The MRI part of this work addresses prostate cancer, the second most frequently occurring cancer in men. The PCa-RadHop model is proposed to automate prostate cancer detection and comprises two stages. Stage-1 predicts a heatmap using a novel feature extraction method, named RadHop, and generates candidate Regions of Interest (ROIs) for stage-2. RadHop is a linear model that learns data-driven radiomics which help classify cancerous regions. Stage-2 re-classifies the candidate ROIs from stage-1 by including more contextual information around them, with the goal of lowering the probability assigned to false positives and thus increasing detection performance. PCa-RadHop achieves competitive performance in the PI-CAI challenge, with a model size orders of magnitude smaller than state-of-the-art DL-based works, while maintaining a more transparent pipeline than other DL-based solutions.
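To make the unsupervised pseudo-label step described above more concrete, the sketch below illustrates one plausible reading of it in Python: blockwise adaptive (Otsu-style) thresholding followed by removal of anomalously sized connected components. It is a minimal illustration only, not the CBM/HUNIS implementation from the thesis; the block size, area bounds, and function names are assumptions made for the example.

    import numpy as np
    from scipy import ndimage

    def otsu_threshold(patch):
        # Standard Otsu threshold on an 8-bit grayscale patch.
        hist, edges = np.histogram(patch.ravel(), bins=256, range=(0, 256))
        hist = hist.astype(float)
        total = hist.sum()
        if total == 0:
            return 0.0
        centers = (edges[:-1] + edges[1:]) / 2.0
        w_bg = np.cumsum(hist)            # pixels at or below each level
        w_fg = total - w_bg               # pixels above each level
        cum = np.cumsum(hist * centers)
        mean_bg = cum / np.maximum(w_bg, 1e-9)
        mean_fg = (cum[-1] - cum) / np.maximum(w_fg, 1e-9)
        between_var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        return centers[np.argmax(between_var)]

    def pseudo_label(gray, block=64, min_area=30, max_area=3000):
        # gray: 2D uint8 image in which nuclei appear darker than the background.
        # Returns a binary pseudo-label mask (True = nucleus pixel).
        h, w = gray.shape
        mask = np.zeros((h, w), dtype=bool)
        # Local (blockwise) adaptive thresholding.
        for y in range(0, h, block):
            for x in range(0, w, block):
                patch = gray[y:y + block, x:x + block]
                t = otsu_threshold(patch)
                mask[y:y + block, x:x + block] = patch < t
        # Remove anomalously small or large connected components.
        labeled, n = ndimage.label(mask)
        if n == 0:
            return mask
        areas = ndimage.sum(mask, labeled, index=np.arange(1, n + 1))
        keep = [i + 1 for i, a in enumerate(areas) if min_area <= a <= max_area]
        return np.isin(labeled, keep)

In the thesis pipeline, a pseudo-label produced by such a local step is subsequently used to train the NuSegHop feature extractor and is refined by global post-processing; the sketch stops at the pseudo-label to keep the example short.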