MACHINE LEARNING BASED TECHNIQUES FOR BIOMEDICAL IMAGE/
VIDEO ANALYSIS
by
Xue Wang
____________
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(ELECTRICAL ENGINEERING)
August 2014
Doctoral Committee:
Professor C.-C. Jay Kuo, Chair
Professor Alexander A. Sawchuk, Co-Chair
Professor Jesse T. Yen, Outside Member
Copyright 2014 Xue Wang
Abstract
During the last decade, advances in biomedical and information technologies have
increased the need for objective and automated approaches to analyze large-scale
biomedical image data using methods from image processing and computer vision. To
explore such data and draw meaningful conclusions, image analysis methods
incorporating advanced machine learning algorithms are expected to be beneficial. The
primary purpose of this dissertation is to demonstrate the potential of machine learning
for biomedical image processing. It focuses on three particular topics: 1) high-throughput
screening (HTS) of a chemical compound library aimed at drug discovery, 2) mitochondria
segmentation for morphological subtype quantification, and 3) objective analysis of
surgical skills for the capsulorhexis procedure in cataract surgery. We present specific
approaches to these problems.
In the first part, our contribution lies in the development of a pipeline algorithm that
analyzes microscopic fluorescent cell images in HTS, which screens large chemical
libraries for compounds that enhance peroxisome assembly in cells from patients having
peroxisome biogenesis disorders (PBDs). The challenge mainly lies in how to accurately
detect the peroxisome shown in punctate structures, which indicates the degree of
successful peroxisome assembly due to a specific treatment. Ideally, in PBD cells
peroxisomes that are completely rescued will present as clearly discernible punctate
structures; however, in many cases, cells only respond partially, resulting in a blurry
fluorescent textured cytosol background. We successfully overcome this challenge by
developing an analysis pipeline. Results will show that our approach is sensitive,
reliably detects recovery of peroxisome assembly in PBD cell lines, and, with machine
learning improving feature extraction and classification, is ready for automated
screening of large-scale chemical libraries.
For the second part, our work explores sub-cellular feature learning to realize fully
automated segmentation for mitochondria, which requires more accurate and robust
techniques to delineate mitochondria in serial confocal microscopic data. The goal is to
establish and validate a data-driven platform for mitochondrial functional analysis based
on accurate tracing of mitochondrial structures. Previous solutions are limited due to the
inhomogeneity in background intensity, signal-to-noise ratio, and noise level. To address
those problems we first use a machine learning approach to estimate the main structures
in mitochondria objects and then apply line segment detection to recognize and locate
mitochondria centerline fragments. To bridge the fragments, a cost function is developed
to judge the occurrence of connection for each pair of centerline fragments. At the output
of the segmentation system, standard image segmentation metrics are used to evaluate
the results. Our results show that the proposed pipeline achieves segmentation accuracy
greater than 98%. In addition to accuracy, another advantage is the modularity of the pipeline,
as each step in the pipeline could be altered to improve the accuracy for different kinds of
image datasets.
The final part of this work explores the use of image analysis methods and computer
vision techniques to assess the cataract surgical skills of residents from surgical video
data. The work aims at enhancing teaching by accurately measuring and evaluating
residents’ surgical performance to produce prompt feedback from surgical experts. The
capsulorhexis part is studied as it is considered one of the most significant steps in
cataract surgery. The ultimate goal is to generate objective numeric assessments for
surgical videos. To realize this, a 3-stage video assessment system is developed to
process the video frames consistently. The video frame sequence is stabilized and an
evaluation metric is developed based on the grading intuition. The registered video
frames are then sent to a learning stage so that the surgical instruments in the frames can
be identified and extracted. The instrument movement is recognized and tracked. The
final stage is to quantify instrument movements such as insertion and withdrawal since
these steps may be repeated several times. Experimental results show that our proposed
pipeline is able to provide a new tool to automatically measure proficiency in
capsulorhexis surgery. The results of this work demonstrate that image processing and
computer vision approaches coupled with artificial intelligence techniques are valuable
for solving several challenging problems of biomedical information processing. The
methods produce more objective and reliable results and reduce the variability of
inspection techniques involving human observers.
Dedication
I dedicate this thesis to my father, Chun-Wei Wang, and
my mother, Mei-Rong Pang.
May you be kept safe from disease.
Acknowledgements
This dissertation would not have been possible without the support of so many people
in every possible way. It is a composition of every piece of thinking and every idea
that makes me greatly different from who I was five years ago. I am very grateful to
have been accepted by the University of Southern California (USC), where I have been
awarded tremendous spiritual treasure as a life-long gift. I learned to think
independently and deliberately, enjoying the fresh academic atmosphere and fruitful
research achievements.
There are a number of people to whom I am deeply grateful.
First, I would like to express the deepest respect to Prof. C.-C. Jay Kuo, who
demonstrated rigorous scholarship and diligence while mentoring my work. He convincingly
conveyed the spirit of taking on adventures, inspiring me to overcome obstacles in
research and pursue the truth. Prof. Kuo taught me that knowledge is never exhaustible
and that the joy of learning is maximized when the mind is open. His enthusiasm for
nurturing students and his persistent help have been significant to my work.
I express my sincere gratitude to my co-advisor, Dr. Alexander A. (Sandy) Sawchuk,
for his constant guidance and encouragement. The last year has been filled with valuable
and precious research discussions with Dr. Sawchuk, and I am extremely indebted to him
for his unceasing support. Dr. Ronald J. Smith and Dr. Sawchuk introduced me to the world of
cataract surgery assessment, which has been a challenging and interesting research topic.
I would also like to thank Dr. Ronald J. Smith from University of California, Los
Angeles (UCLA) and Dr. Chun-Nan Hsu from University of California, San Diego
(UCSD), for participating in my education and training during the research. My
knowledge, vision, and capability have grown under their supervision, and it has been my
honor to work with such talented people and learn from them.
Finally, I would like to thank my parents, who have stood behind me and supported me
all along. They know my hard times better than anyone else, even better than I do
myself. They brought me into this world and showed me how wonderful it is. They have
understood and cared for me since my first day in the world, and I hope to repay them
as much as possible.
I feel thankful to one and all who, more or less, have helped me in this venture.
Table of Contents
Abstract
Dedication
Acknowledgements
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Introduction
1.2 Significance of the Research
1.3 Contributions of the Research
1.4 Organization of the Dissertation
Chapter 2 Background Review
2.1 Review on Biomedical Imaging Techniques
2.2 Background of Research Topics
2.2.1 High-Throughput Screening
2.2.2 Mitochondria Segmentation
2.2.3 Evaluation of Capsulorhexis Surgical Techniques
2.3 Challenges
Chapter 3 High-Throughput Drug Screening for Peroxisome Biogenesis Disorders
3.1 Introduction
3.2 Methods
3.2.1 Data Collection
3.2.2 Software
3.2.3 Peroxitracker Pipeline
3.2.4 CP/CPA Pipeline
3.3 Results and Discussion
3.4 Conclusion
Chapter 4 Morphological Feature Learning for Mitochondria Segmentation
4.1 Introduction
4.2 Human Learning of Mitochondrial Morphology
4.3 Methods
4.3.1 Image Dataset and Ground Truth
4.3.2 Stage I of Segmentation System
4.3.3 Stage II of Segmentation System
4.4 Results and Discussion
4.5 Conclusion
Chapter 5 Objective Analysis of Capsulorhexis Surgical Techniques
5.1 Introduction
5.2 Video Dataset
5.3 Methods
5.3.1 Stage I: Video Registration
5.3.2 Stage II: Instrument Movement Understanding
5.4 Results and Discussion
5.4.1 Video Registration of Pupil
5.4.2 Instrument Identification and Tracking
5.5 Conclusion
Chapter 6 Conclusion and Future Work
6.1 Conclusion
6.2 Future Research Directions
Bibliography
List of Figures
Figure 1. Illustration of capsulorhexis
Figure 2. Confirmation of small-scale drug screen hits
Figure 3. Analysis pipeline of Peroxitracker
Figure 4. Illustration of illumination correction
Figure 5. Peroxisome extraction by using the LAT algorithm
Figure 6. Illustration of improved quantification to score a well
Figure 7. Illustration of miscounting the nuclei by CP
Figure 8. Representative examples of 4 types of cells observed in the experiment
Figure 9. Performance comparison of different software for analysis
Figure 10. The morphological subtypes of mitochondria [27]
Figure 11. Feature extraction within an individual patch
Figure 12. Observed and labeled data
Figure 13. Illustration of using feature input to train the logistic regression classifier
Figure 14. Comparison of different prediction quality
Figure 15. Error analysis after Stage I
Figure 16. Hough and de-Hough transform for line segment detection
Figure 17. Connection of mitochondria breakouts
Figure 18. AUC-ROC curves under different approaches
Figure 19. Comparison of mitochondria segmentation results
Figure 20. Flowchart of designing algorithms for objective cataract surgical assessment
Figure 21. Proposed surgical technique assessment system
Figure 22. Proposed video registration approach by template matching
Figure 23. Adding temporal filter to remove foreground instrument movement
Figure 24. Pipeline of instrument identification and tracking
Figure 25. Illustration of using feature input to train binary SVM classifier
Figure 26. Improved SVM classifier by adding a geometrical transformation model
Figure 27. Example of 2 pairs of control points on detected needle from separate frames
Figure 28. Raw frame image and the edge detection results
Figure 29. Segmentation of cornea and pupil by Paint Selection
Figure 30. Oval shape fitting of cornea and pupil using ImageJ (yellow)
Figure 31. Histogram of red, green, and blue channels of Figure 28(a)
Figure 32. Extraction of cornea based on different threshold values in 3 channels
Figure 33. Human labeled region of cornea (left) and pupil (right)
Figure 34. Raw frame (frame 30, left) and the registration result (right)
Figure 35. Adaptive templates (left) and modified adaptive templates (right) for video CCR903 at different times: (a) 10 sec, (b) 20 sec, (c) 30 sec, and (d) 40 sec
Figure 36. Compared results for detected contour (middle) and labeled contour (right)
Figure 37. Raw video frame (left) vs. registered video frame (right) for video CCR903
Figure 38. Raw video frame (left) vs. registered video frame (right) for video CCR988
Figure 39. Visualization of labeled data and feature input for Figure 28(a)
Figure 40. Registered video frame after Canny edge detection (left) vs. registered video frame with instrument predicted by SVM (right) for video CCR903
Figure 41. Instrument predicted by SVM (right) for video CCR903 vs. improved instrument tracking for video CCR903
List of Tables
Table 1. Comparison of applying different processes on the 2012-01-26 plate
Table 2. Comparison of the discriminating power of different pipelines
Table 3. The 2-stage segmentation system
Table 4. Normalized cost matrix based on the cost function
Table 5. Different I-P rates applied for optimal results
Table 6. Comparison of experimental results by programs and ground truth
Chapter 1 Introduction
1.1 Introduction
The research field of biomedical image analysis is very broad, and includes a number of
essential techniques for biomedical image acquisition, processing, understanding,
management, and visualization for specific medical diagnoses. With advances in
biomedical research, clinical applications, and computing technology, biomedical image
analysis is evolving rapidly. Addressing unsolved problems requires the use of novel and
powerful techniques that use a priori knowledge, modeling of uncertainty, data mining
and image understanding [1-3].
With the development of Internet technology and multimedia techniques, research
in machine learning has flourished, benefiting a number of advanced topics at the
intersection of computer vision and conventional signal processing. This research
direction can be used to simulate human thought [4-5] and has attracted increasing
attention in many disciplines at the frontier of multidisciplinary science and
engineering. Machine learning has inspired new ideas and methods for scientists from
many different professional backgrounds. As a branch of artificial intelligence,
machine learning systems have begun to show characteristics of human comprehension
and behavior. Machine learning concepts have entered practical applications in various
areas of the national economy as well as many aspects of social life, and this
development will continue: new algorithms for exploring and understanding existing
data, and classifiers trained to make predictions, will keep being developed [6-11].
Due to the characteristics of machine learning approaches, researchers in biomedical
image processing and analysis have realized that knowledge-based understanding
systems can be built to address the difficulties in biomedical data mining more
successfully than existing methods, and many efforts have been focused on exploring the
relationship between machine learning and image processing [12-17]. The standard
pipeline of learning-based biomedical image analysis systems includes four components:
1) an initial module for pre-processing the raw image data, including denoising the
images and illumination correction for enhanced feature extraction; 2) a segmentation
module in which objects or structures of interest are separated from the background by a
certain set of rules; 3) a feature extraction module in which global features and/or
specific features are extracted; and 4) a pattern-recognition module which interprets and
analyzes the features, as well as the transformed signals of features, based on the
segmentation results [2]. This basic pipeline has been applied in image analysis to
overcome problems such as unpredictable distortions and low signal-to-noise ratios.
In the last two decades, the basic pipeline has advanced into a strong combination of
advanced machine learning and image processing techniques.
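To make the four-module structure concrete, the following minimal Python sketch (our own illustration, not taken from any system described in this dissertation; it assumes scikit-image and NumPy are available, and the feature choice is arbitrary) wires the modules together:

    import numpy as np
    from skimage import filters, measure, restoration

    def preprocess(image):
        # Module 1: denoise, then subtract a smooth background estimate
        # as a crude illumination correction.
        denoised = restoration.denoise_tv_chambolle(image, weight=0.1)
        background = filters.gaussian(denoised, sigma=50)
        return denoised - background

    def segment(image):
        # Module 2: separate objects of interest from the background.
        return measure.label(image > filters.threshold_otsu(image))

    def extract_features(image, labels):
        # Module 3: one feature vector (area, mean intensity) per object.
        props = measure.regionprops(labels, intensity_image=image)
        return np.array([[p.area, p.mean_intensity] for p in props])

    def classify(features, model):
        # Module 4: interpret the features with a previously trained classifier.
        return model.predict(features)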
The related work includes a large family of topics [18]; in this thesis, our work
will focus on the application of learning-based approaches to digital image processing in
three specific problems: high-throughput screening (HTS) targeted at drug discovery of a
peroxisome proliferator; segmentation of mitochondria imaged by confocal fluorescence
microscopy; and quantitative assessment of surgical skills in the capsulorhexis part of
cataract surgery. All of these problems share challenges in image segmentation.
Image segmentation is defined as partitioning an image into multiple non-overlapping
regions according to a semantic attribute specified by a human. In digital images, a single
region is a set of constituent pixels having human-specified characteristics. In biomedical
image analysis, one segmented structure could be a single cell, an organelle, or a
nucleus. Any inaccuracies of the segmentation module directly impact the subsequent
modules of feature extraction and pattern classification.
In the first part of our work, on HTS of a chemical library for a peroxisome
proliferator, two pipelines are designed to process fluorescent microscope images during
high-throughput screening: analyzing all images, handling the statistics, and replacing
the visual scoring that the existing assay relied on, thereby enabling larger libraries
to be screened. Image processing methods are applied to make HTS of large-scale image
sets more accurate and efficient, since feature extraction and peroxisome recognition
are improved. Our work also compares the sensitivity and reliability of detecting
recovery of peroxisome assembly in peroxisome biogenesis disorder patient cells for
automated screens of large-scale chemical libraries. Learning-based approaches are
coupled and applied for assay validation.
The second part of our work concerns automatic mitochondria segmentation, and the
designed approach has two stages. The first stage is a machine learning stage, where
classifiers are trained on texture features of image data to learn the characteristics of
pixels labeled as foreground mitochondria objects or background. The model is then
applied to the testing data to estimate the probability of each pixel being foreground,
followed by a standard image binarization approach. The second stage is built based on
error analysis from the previous stage: a selection approach aimed at globally
minimizing errors is designed to achieve mitochondria centerline extraction,
establishing a cost function to bridge disconnections between mitochondria branches.
Results and discussion explaining why our approach is superior to other methods are
given.
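As a minimal sketch of the Stage I idea (pixel-wise features feeding a classifier that outputs a foreground probability map), the snippet below uses local mean and variance as stand-in texture features and logistic regression as the classifier; the feature set, window size, and threshold are illustrative assumptions, not the exact choices made in Chapter 4:

    import numpy as np
    from scipy.ndimage import uniform_filter
    from sklearn.linear_model import LogisticRegression

    def pixel_features(image, size=7):
        # Per-pixel texture stand-ins: raw intensity, local mean, local variance.
        image = image.astype(float)
        mean = uniform_filter(image, size)
        var = uniform_filter(image ** 2, size) - mean ** 2
        return np.stack([image.ravel(), mean.ravel(), var.ravel()], axis=1)

    # train_image/train_mask/test_image are assumed inputs; the mask labels
    # each pixel as mitochondrion (1) or background (0).
    clf = LogisticRegression(max_iter=1000)
    clf.fit(pixel_features(train_image), train_mask.ravel())

    # Stage I output: foreground probability per pixel, then binarization.
    prob = clf.predict_proba(pixel_features(test_image))[:, 1]
    binary = prob.reshape(test_image.shape) > 0.5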
The last part discusses information processing and analysis for video data. It aims to
use video data from the capsulorhexis step of cataract surgery to develop quantitative
assessment techniques that measure the quality of the opening created in the capsule
and evaluate the efficiency and accuracy with which the surgical technique is performed.
The goal is to improve the teaching of surgery residents using interventions and
techniques that are scientifically established to contribute to the surgical training
curriculum. The main challenge of the work lies in the broad range of video quality and
the difficulty of identifying instruments and their movements. A 3-stage assessment
system is designed to perform video registration, instrument movement understanding,
and surgical technique quantification, respectively. Those specific components all
require developing numerical metrics. Our experimental results show that the proposed
assessment system can provide an objective measurement of capsulorhexis surgical skills
with minimal human inspection.
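As a rough sketch of the registration stage, the snippet below uses OpenCV template matching to locate a reference patch (e.g., around the pupil) in each frame and translates the frame so that the patch stays centered; the template source and the translation-only motion model are simplifying assumptions for illustration, not the full method of Chapter 5:

    import cv2
    import numpy as np

    def register_frame(frame, template):
        # Find where the grayscale template best matches the current frame.
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        scores = cv2.matchTemplate(gray, template, cv2.TM_CCOEFF_NORMED)
        _, _, _, (x, y) = cv2.minMaxLoc(scores)
        # Shift the frame so the matched region lands at the frame center.
        h, w = template.shape
        dx = frame.shape[1] // 2 - (x + w // 2)
        dy = frame.shape[0] // 2 - (y + h // 2)
        M = np.float32([[1, 0, dx], [0, 1, dy]])
        return cv2.warpAffine(frame, M, (frame.shape[1], frame.shape[0]))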
1.2 Significance of the Research
The motivation and significance of the three topics in the research work are described as
follows. The first topic is the application of HTS for efficient drug discovery for treating
patients suffering from peroxisome biogenesis disorders (PBD).
Peroxisomes are organelles found in almost all eukaryotic cells [19, 20]. They play
key roles in normal lipid metabolic processes, including the catabolism of potentially
neurotoxic very long chain fatty acids and the biosynthesis of neuroprotective ether
phospholipids called plasmalogens. PBDs are inherited disorders caused by defects in
PEX genes required for normal peroxisome assembly [21]. Peroxisome matrix proteins
are normally translated in the cytoplasm and subsequently imported into the peroxisome
with the assistance of PEX genes. In PBD patient cells, this matrix protein import process
is impaired and as a result these matrix proteins are instead left in the cytoplasm and
normal peroxisome functions are compromised. While severely affected PBD patients
show profound mental retardation and die within one year of age, the majority of patients
have milder forms of disease compatible with survival through adulthood. The latter
individuals show mild to moderate mental retardation, craniofacial dysmorphism, liver
dysfunction, progressive sensorineural hearing loss, and retinopathy. To date, most
treatments are palliative in nature and do not address the molecular basis of the
disease; hence an automatic HTS pipeline is urgently needed for biological investigation
of arrayed cells to identify compounds that enhance peroxisome assembly in cells
obtained from PBD patients.
However, related work on this topic is quite limited. The closest work is that of
Sexton et al. in Int. J. High Throughput Screen in 2010, in which they report an
effective assay validation and achieve an optimized Z' factor of 0.74 [22]. In their
study, which explores drug discovery for non-classical peroxisome proliferators, the
cell line is derived from HCC liver cancer rather than from PBD patients, and its
peroxisomes function normally. Moreover, although the goal should be to detect an
increase in the number of peroxisomes, their principal component analysis (PCA) shows
that the peroxisome number is not an important feature for regressing a scoring method
that leads to a high Z' factor. These disadvantages deserve to be addressed by a better
HTS pipeline.
The second topic is to realize mitochondria segmentation integrated with a confocal
microscopy imaging system. The importance of this research work lies in the evaluation
of morphological characteristics of mitochondria objects after accurate object
extraction. Studies have shown that the fusion-fission dynamics of mitochondria is
involved in many cellular processes, including maintenance of adenosine triphosphate
(ATP) levels, redox signaling, oxidative stress generation, and cell death [23-26].
Therefore, mitochondrial morphology can reveal the physiological or pathological status
of mitochondria; in a typical analysis, researchers manually label the mitochondrial
morphological structures into several subtypes, such as fragmented, networked, and
swollen structures [27]. However, although a number of algorithms for mitochondria
segmentation exist [28-30], they require careful manual tuning and optimization, and the
resultant segmented mitochondria objects are still not correctly classified into
standardized morphological subtypes. The challenge is that gray-level fluorescent
intensity is the only clue for separating the background from foreground mitochondrial
objects. To overcome this challenge, our work aims at applying computer vision
techniques to achieve accurate segmentation based on texture feature extraction for
morphological characteristics.
The last topic is to realize objective analysis of capsulorhexis surgical techniques.
In the current state-of-the-art evaluation system, cataract surgical techniques are
measured by expert panel video review using evaluation questions from the Global Rating
Assessment of Skills in Intraocular Surgery (GRASIS) and the International Council of
Ophthalmology approved Ophthalmology Surgical Competency Assessment Rubric for
phacoemulsification (ICO-OSCAR: phaco) [31, 32]. The accuracy of grading results
is degraded by inter-observer variability, as direct quantitative questions are the
least reliable. Moreover, the grading process is tedious and time-consuming, resulting
in high costs for experienced surgical experts, and the results can suffer from grader
misinterpretation [33]. Both the Accreditation Council for Graduate Medical Education
(ACGME) and the American Board of Ophthalmology have requested development of
scientific methods for teaching surgery [34]. Our work would directly contribute in 4
aspects: 1) developing scientific measures of surgical technique, 2) improving teaching
interventions, 3) placing less emphasis on case counts as the primary measure, and 4)
adopting new measures of training outcomes. This work, including video frame
stabilization of the pupil/limbus, instrument identification/labeling/tracking,
instrument movement measurement, surgical technique understanding, and robustness
testing, can show clinical significance in quantitative evaluation, reliable assessment,
and prompt feedback. Moreover, the automatic assessment system saves time for
experienced surgical experts.
1.3 Contributions of the Research
For high-content screening for efficient drug discovery for PBD, compared with the
disadvantages of the existing methods for high-throughput screening applications
mentioned above, this research is one of the first attempts to apply a fully automated
high-throughput image assay to screen effective drugs for peroxisome biogenesis
disorders. The major contributions and advantages of this research are summarized below.
1) The explored cell line is from PBD patients' fibroblasts, in which peroxisomes do not
function normally. The positive controls are TMSO or Diosmetin, which are unapproved,
pure drug-like chemical compounds for the drug target. As the metric to evaluate assay
validation, the Z' factor is calculated based on the percentage of rescued cells per
well and the well responses under positive and negative treatments.
2) A robust pipeline, called Peroxitracker, is designed to overcome the key challenge by
applying various enhancing and smoothing methods at both the image and feature levels.
3) Peroxitracker can achieve a Z' factor > 0.7, where the Z' factor measures how widely
the score distributions of the positive and negative controls that enhance peroxisome
assembly are separated (a reference computation is sketched after this list); the Z'
factor of the other pipeline, built on existing software, is 0.44 after improvement. The
performance of Peroxitracker is the closest to human observation among the competing
pipelines.
4) Compared to existing software, Peroxitracker is shown to be more sensitive: it can
reliably detect recovery of peroxisome assembly in peroxisome biogenesis disorder
patient cells, and it is ready for automated screens of large-scale chemical libraries.
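For reference, the Z' factor used above is the standard assay-quality statistic of Zhang et al.; a minimal computation from per-well control scores is shown below (the score arrays are hypothetical inputs). By convention, a Z' factor above 0.5 indicates an excellent assay.

    import numpy as np

    def z_prime(pos, neg):
        # Z' = 1 - 3*(sd_pos + sd_neg) / |mean_pos - mean_neg|,
        # where pos/neg are per-well scores of positive/negative controls.
        pos, neg = np.asarray(pos, float), np.asarray(neg, float)
        spread = 3.0 * (pos.std(ddof=1) + neg.std(ddof=1))
        return 1.0 - spread / abs(pos.mean() - neg.mean())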
For the mitochondria segmentation, the contribution of this research work is
described as follows.
1) The work involves constant feedback between computation and experimental results;
i.e., the approach is built by combining the principles of computer vision and human
vision. A 2-stage segmentation system has been built.
2) In Stage I, where machine learning classifiers are trained for initial segmentation,
the key to success is that the image signal is transformed and represented by a linear
combination of a subset of extracted texture features, and data grouping methods are
applied to enhance the accuracy of the classifiers. Our work shows that learning-based
approaches fit our problem because they can overcome the existing challenges: although
mitochondria appear in a wide variety of morphologies, it is quite likely that we can
identify a weighted combination of features, as atoms, to represent each mitochondrion.
3) In Stage II, mitochondria centerline extraction, the cost function is designed based
on human learning/labeling experience to judge whether a connection should be made for
each pair of centerline fragments (a toy sketch follows this list), which improves the
accuracy of segmentation and of the resultant mitochondrial morphological
characteristics.
4) The overall pipeline yields mitochondria segmentation that retains morphological
features with segmentation accuracy as high as 98%. Moreover, another advantage is the
modularity of the pipeline, as each processing step in both stages can be improved
individually to suit different kinds of image datasets for better accuracy.
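The toy sketch below illustrates the cost-function idea from item 3: a pairwise cost mixing the gap between fragment endpoints with an orientation-mismatch penalty, with a bridge made only when the cost falls below a threshold. The weights and threshold are placeholders, not the values derived in Chapter 4:

    import numpy as np

    def connection_cost(end_a, dir_a, end_b, dir_b, w_gap=1.0, w_angle=5.0):
        # end_*: fragment endpoints (x, y); dir_*: unit direction vectors.
        gap = np.linalg.norm(np.subtract(end_b, end_a))
        # Penalty is 0 when the fragment directions agree, 2 when opposed.
        mismatch = 1.0 - np.dot(dir_a, dir_b)
        return w_gap * gap + w_angle * mismatch

    # Bridge two fragments only when the cost falls below a threshold.
    bridge = connection_cost((10, 5), (1, 0), (14, 5), (1, 0)) < 8.0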
In the analysis of capsulorhexis surgical skills, our work shows superiority in the
following aspects:
1) The work analyzes the significant metrics in order to generate accurate and reliable
numeric assessments.
2) A 3-stage surgical video assessment system is developed, based on exploring the use
of image analysis and computer vision methods to quantitatively assess the cataract
surgical skills of residents from surgical video clips.
3) Experimental results, including registered video, extracted limbus outlines, and
instrument detection and tracking, show that the proposed assessment system is the basis
for an objective way to measure proficiency in the capsulorhexis part of cataract
surgery.
1.4 Organization of the Dissertation
The rest of the dissertation is organized as follows. Chapter 2 provides a detailed
background review of the three research topics; the related work and the challenges of
the research are discussed as well. Chapter 3 focuses on a complete description and
explanation of the methods, results, and discussion in designing fully automatic HTS
pipelines for peroxisome screening. A performance comparison of the two pipelines is
also provided. The machine learning model and dynamic global optimization approach for
mitochondria segmentation are presented in Chapter 4. Chapter 5 presents the work on
evaluating cataract surgical techniques based on video analysis. The final concluding
remarks and future work are presented in Chapter 6.
Chapter 2 Background Review
A comprehensive overview of the background knowledge related to the research is
presented in this chapter. Sec. 2.1 provides an introduction to conventional and recent
work in biomedical image processing and illustrates why knowledge-based machine
learning techniques are useful for biomedical image analysis. Sections 2.2 and 2.3
discuss the development of image analysis approaches for the three specific biomedical
applications. The common solution to the three problems is based on knowledge-based
learning approaches to image segmentation. At the end of the chapter, the challenges are
interpreted, with an emphasis on the need for robust approaches as powerful tools to
address them automatically.
2.1 Review on Biomedical Imaging Techniques
In the last two decades, research in biomedical image processing and analysis has
witnessed tremendous improvement of powerful technologies in both computational and
practical aspects. These technologies for acquiring, storing, transmitting,
understanding, and displaying digital biomedical images are helping biological
scientists and medical researchers obtain quantitative evaluation and numerical
analysis, which are key steps in efficiently extracting clinical information for
accurate diagnostic and therapeutic evaluation [35-37].
Work on designing biomedical image processing systems started in the late 1960s and
early 1970s. After the computerized tomography (CT) scanner was invented in 1972,
advanced high-resolution scanning techniques were developed based on the interactions
of X-rays, ultrasound, magnetism (MRI), nuclear medicine, and light (optical imaging
technologies) [38]. The extracted states of an organ or tissue were presented as
images and simple image processing approaches have shown great achievement.
Algorithms in image enhancement, gray-level image mapping, and image reconstruction
are applied in both pre-processing and post-processing stages to enhance the clinical
information; morphological operations and histological measurements are studied to
characterize and classify objects of interest. These application-oriented image analysis
systems conventionally have three components: a biomedical image acquisition system, a
computerized image processing and analysis pipeline for assay validation, and an image
display environment. As the techniques in each unit improved, real-time image
processing systems were developed, with advanced and complex analysis functions
provided for manipulation. Those real-time processing techniques have since become a
significant research branch of modern, sophisticated biomedical image analysis,
undertaking interpretation tasks in specific applications [2].
In recent years, supercomputers have expanded their computing power, and the huge
quantity of information to be processed quickly and accurately has brought on the era of
"Big Data". Nowadays, knowledge-based image understanding techniques play key roles
in establishing expert systems, and such image understanding subsumes artificial
intelligence vision and image processing strategies [39]. In this situation, machine
learning shows considerable potential for designing objective algorithms to analyze
multimodal and high-dimensional biomedical information. As research in machine
learning matures, deep understanding of biomedical data has attracted attention in
developing supervised and unsupervised feature extraction and pattern recognition.
Particularly significant for biomedical science applications has been the exploration of
decision-making systems, especially for clinical detection and diagnosis of diseases
[40]. Researchers are making efforts to build machine learning systems as efficient and
precise as human brains, systems that can incorporate prior knowledge into making
clinically targeted decisions. By these means, the machine learning community has become
a rich source of novel techniques for solving existing difficulties in understanding
genomic, proteomic, and ancillary data integrated with advanced imaging techniques. In
the following sections, we review the three specific topics that we expect to benefit
from machine learning.
2.2 Background of Research Topics
2.2.1 High-Throughput Screening
Over the last two decades, high-throughput screening (HTS) has been applied in a
variety of advanced drug discovery techniques for assay validation targeted at exploring
new drugs in pharmaceutical research, driven by increasing economic pressure to enhance
the competitiveness of commercial products while reducing development costs. A
traditional drug discovery pipeline, including the components of target identification,
target validation, lead identification, candidate optimization, pre-clinical proof, and
clinical proof, usually takes more than a decade to complete. As the demand for
screening effective compounds from large compound libraries keeps increasing,
high-speed drug discovery tools are required to allow researchers to quickly conduct
millions of compound tests [41, 42].
HTS is an efficient approach to target validation that combines robotics, data
processing and control software, liquid handling devices, and detectors; HTS data are
characterized by a high volume of chemically diverse classes [43]. For instance, an
experimental assay plate for HTS generally has 96, 384, 1536, or 3456 wells per
microplate (a multiple of 96, because the standard plate layout is an 8x12 grid); as
techniques improve, microplates for assay preparation achieve higher density and smaller
volume. To prepare the assay, each well is processed with a prescribed treatment. The
researcher analyzes the effect of the treatment, after an incubation time that allows
the chemical treatment to complete or the biological matter to be absorbed, by examining
the captured images from imaging systems. Currently, HTS pipelines in the pharmaceutical
industry can process 50,000–100,000 chemical compounds per week; with ultra-HTS
pipelines in some labs, throughput rates can be 1,000 times higher [42, 44-46].
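As a small illustration of the plate geometry just described, the helper below (a hypothetical convenience function, not part of any screening software named here) maps a 0-based well index to the conventional row-letter/column-number label on a 96-well, 8x12 plate:

    def well_label(index, cols=12):
        # 96-well plates are laid out as 8 rows (A-H) by 12 columns (1-12).
        row, col = divmod(index, cols)
        return f"{chr(ord('A') + row)}{col + 1}"

    assert well_label(0) == "A1" and well_label(95) == "H12"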
The concept of HTS is still young: it was conceived at the end of the 1980s, and the
first paper indexed in PubMed appeared in the early 1990s. The process was focused on
genomics and combinatorial chemistry of small molecule libraries in a 96-well plate
format [47]. During this period the development of related technologies built the
foundation for techniques such as high-density microplates and laboratory automation.
However, the quality of overall assay performance was limited, since most efforts
focused on establishing the process rather than on evaluating the chemical collections
as drug compounds. In the late 1990s, the attention of HTS shifted to using pure
drug-like compounds, and researchers kept improving control quality by using
combinatorial chemistry, such that the compound collections could target enrichment of
representative structures showing whether a compound collection can act as a hit drug.
If one hit is captured, the compound collection, which is a family of protodrugs, can be
extended to cover a larger set of compounds [44, 47]. At the beginning of the 21st
century, the availability of the human genome sequence drove HTS toward potential human
drug targets, with hits identified through drug-likeness filters. The work of Ricardo
Macarron et al. provides a brief discussion of a number of examples of approved drugs
discovered from HTS hits in 2003–2009 [48]. The related studies cover solution-based
biochemical assays and cell-based assays.
Recently, HTS of chemical libraries has been used to identify compounds that
enhance peroxisome assembly in cells obtained from patients suffering from PBD, which
motivates this work. The assay was designed based on the subcellular distribution of
green fluorescent protein harboring a peroxisome targeting sequence (GFP-PTS1)
expressed in PBD patient skin fibroblasts. Promising compounds were prioritized based
on the percentage of treated cells containing GFP-positive punctate structures, which
serve as a surrogate marker indicating that protein import into the peroxisome is now
functional. Although previous results of the initial HTS were extremely promising, all
image analysis was done by manual inspection due to drawbacks of existing software.
While it is now feasible to acquire large-scale image libraries of drug-treated cells,
technological issues involving limited image quality, such as noise level, cellular
morphology distortion, and cell density, pose serious challenges for data analysis.
Highly efficient and sensitive methods to screen treated PBD patient cells for
peroxisome assembly are required to apply the assay to large-scale screening of
chemical libraries. To acquire accurate HTS results, besides advanced microscopy
imaging techniques, robust image processing and analysis approaches must be developed
to perform automatic investigation of the cellular context from structural and
functional points of view.
The most closely related work to PBD treatment using HTS is Sexton et al.'s work
[22]. Their assay used a human hepatocellular carcinoma cell line with normal
peroxisomes; their target was a peroxisome proliferator that increases peroxisome
biogenesis, so they tagged peroxisomes in liver cells with an enhanced peroxisome
targeting fluorescent reporter (EPTFR). EPTFR expresses a GFP variant that, when
excited, appears as green puncta. Increases in peroxisomes are visible as clumps of
green puncta, in contrast to the scattered puncta in negative controls; the difference
can be distinguished by measuring the intensity of green fluorescence in each cell.
Their assay is not suitable for our purpose because their reporter works specifically in
liver cells, which require intrusive surgery to obtain, and our target is a peroxisome
rescuer for PBD patients with defective peroxisomes, where EPTFR is not applicable.
Another assay to detect peroxisome rescue is to stain catalase and PMP70, markers
for peroxisome matrix and membrane proteins, respectively. We used this imaging assay
to prove that G418-treatments of PBD patient fibroblasts with PEX gene nonsense
mutations can rescue their peroxisome assembly [49]. However, this assay requires
additional immunohistochemistry and would not be useful for high-throughput screens.
2.2.2 Mitochondria Segmentation
Image segmentation is one of the main research branches in image processing and
medical-imaging applications. It facilitates the recognition of objects or regions of
interest in biomedical images in an automatic way that ideally mimics human
segmentation. Current segmentation approaches include thresholding [50], region
growing [51-53], classifiers [54], clustering [55, 56], Markov random field models [57],
deformable models [58, 59], and other approaches. Those studies strive to improve
accuracy and precision while reducing computational cost and human interaction, which
improves the usefulness of segmentation in clinical applications. Recently,
computer-aided diagnosis has been applied to the study of mitochondria segmentation,
which is important in evaluating human health conditions through the investigation of
mitochondrial fusion and fission dynamics [27].
As described in Chapter 1, mitochondria are important subcellular organelles whose
morphology and distribution have biological significance in evaluating pathological
conditions. Abnormal fusion and fission dynamics can be an important indicator of
early diseases associated with neural dysfunction and neuro-degeneration [60-62]. As
mitochondrial dynamics play a significant role in balanced cellular life and death as
well as in disease states, powerful techniques are required to evaluate the
morphological distribution (length, area, perimeter, etc.) of the mitochondria in each
individual cell. The earliest approach is to use manual labeling to count the
mitochondria objects of each subtype, which demands much effort from biomedical
experts, and the results are less reliable due to individual judgment criteria that are
biased from person to person. Moreover, such non-automated work is usually highly
time-consuming and cannot fully accommodate the diversity of mitochondrial
morphologies in different cell lines. In conclusion, manual segmentation does not
perform well on large quantities of cell samples, and the results are very likely to be
inaccurate and subject to user bias. Therefore, there is urgent demand for an automated
computational tool to evaluate the morphological characteristics of mitochondria
objects, and the key step is to develop robust algorithms and approaches that allow
accurate segmentation of mitochondrial structures in diverse cell types.
In related research work, the existing mitochondria segmentation approaches include
a decision tree-based software approach [27], three-dimensional segmentation techniques
for an EM image stack [30, 63], texture or morphology based classification methods [28],
and machine learning approaches [64]. However, those approaches all have limitations.
The decision tree analysis realizes automation but works for only a limited number of
mitochondria structure subtypes. Three-dimensional EM images involve a large number
of objects with various shapes; intensity alone cannot be used to identify a given
structure, and the related techniques require a priori knowledge of image textures and
object shapes. Texture-based methods suffer from insufficient features, and the machine
learning approaches do not perform robustly due to errors in connecting neighboring
mitochondria objects.
Among the existing approaches for mitochondria segmentation, Jyh-Ying Peng's
work [65] explores an approach called Local Adaptive Thresholding (LAT), which defines
a local neighborhood around a given center pixel and applies double thresholding, and is
thereby able to reduce false segmentation caused by background noise and intensity
variation. The intuition is that the selected local neighborhood should be large enough
to contain both foreground and background pixels, but not so large as to contain objects
far away in the raw image. The proposed LAT algorithm fits mitochondria fluorescent
microscope images better than other kinds of imaging schemes; however, it still produces
false segmentations when the pixel intensity of a foreground mitochondria object is
blurred with that of neighboring background pixels.
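A simplified sketch of the local-thresholding intuition described above (not Peng's exact LAT algorithm; the window size and the two offset factors are illustrative assumptions): each pixel is compared against the statistics of its own neighborhood, and a double threshold keeps weak pixels only when they connect to strong ones.

    import numpy as np
    from scipy.ndimage import uniform_filter, label

    def local_double_threshold(image, window=31, k_low=0.5, k_high=1.5):
        image = image.astype(float)
        # Per-pixel neighborhood mean and standard deviation.
        mean = uniform_filter(image, window)
        sq_mean = uniform_filter(image ** 2, window)
        std = np.sqrt(np.maximum(sq_mean - mean ** 2, 0))
        # Strong pixels seed objects; weak pixels survive only when they
        # belong to a connected component that contains a strong pixel.
        weak = image > mean + k_low * std
        strong = image > mean + k_high * std
        components, _ = label(weak)
        seeded = np.unique(components[strong])
        return np.isin(components, seeded[seeded > 0])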
2.2.3 Evaluation of Capsulorhexis Surgical Techniques
Cataracts are a clouding of, or changes in, the human eye lens. They can appear at
birth or develop later due to aging of the crystalline lens. Without proper surgical
treatment, a cataract may result in blindness. At first, however, most people do not pay
enough attention, since cataracts are usually less painful than other vision-related
issues and develop slowly with no apparent cause. Nevertheless, a cataract eventually
impairs the patient's vision. Fortunately, today's cataract surgery techniques have
become mature in both effectiveness and safety. With a new artificial lens implanted,
vision can be corrected, and the risks of complications are largely reduced as well [66].
In order to replace the eye lens, cataract extraction surgery has two subtypes:
intracapsular and extracapsular. The intracapsular technique is rarely used today; the
extracapsular technique with a small incision is preferred for its fewer postoperative
complications [67, 68]. Capsulorhexis is an important technique for removing the lens
capsule in extracapsular cataract extraction. It is also called Continuous Curvilinear
Capsulorhexis (CCC), as a round, tear-resistant opening in the capsule is desired at this
stage. CCC greatly improves the safety of the whole cataract surgery and enhances the
surgeon's ability to place the lens afterwards [69, 70].
However, capsulorhexis is not easy to perform, and a complete and successful tear is
considered one of the most difficult maneuvers; CCC is thus one of the key elements
of cataract surgery. The goal of this topic is therefore to reliably evaluate the
surgical skills of medical residents in training. During surgical training, the
residents' performance is recorded on video and graded by experienced ophthalmologists.
Ref. [69] lists four steps for performing a standard CCC. For our work they are reduced
to two main steps, as shown in Figure 1:
1) Perform the tear at the center of the eye and try to create a flap in the capsule.
2) Grasp the flap to complete a curvilinear tear.
Figure 1. Illustration of capsulorhexis
The top row illustrates step 1, while the bottom row illustrates step 2. Note that the
two steps require different surgical instruments. (Figures are adapted from Ref. [110].)
Intuitively, capsulorhexis, or CCC, requires a number of attempts to complete the
tearing of an opening in the capsule, and this provides a means of evaluating
capsulorhexis surgical performance. In the past, the evaluation was based on counting
residents' prior cases. As mentioned in Chapter 1, the ACGME and the American Board of
Ophthalmology have mandated the scientific development of a numerical scoring system for
cataract surgical techniques; recorded capsulorhexis video clips are processed as data
input to such an evaluation system. Evaluation tools such as GRASIS and OSCAR [31, 32]
provide subjective evaluation questions designed to facilitate the measurement of
surgical technique and knowledge. Descriptions of the performance are offered together
with a continuous Likert scale for each question so that graders can judge the technique
on the basis of the descriptions. Experimental results show that both surgical
evaluation protocols provide instruction by experts to residents and accelerate the
learning of surgical techniques [32]. However, recent work has reported that subjective
evaluation produces unexpected errors when graders misunderstand the grading procedure.
In some cases, the description of the Likert scale does not provide distinguishable
information and confuses the graders [33]. Therefore, those questions introduce high
inter-observer variability and limit the evaluation quality.
As a result, advanced refinements of surgical evaluation tools should aim at
objective and reliable assessment while lowering the costs.
2.3 Challenges
For the first topic, applying HTS to efficient drug discovery for PBD, HTS is
introduced as an imaging-assay-based approach that allows acquiring large-scale image
libraries of drug-treated cells for measuring detailed cellular responses to chemical
and biological modulators at scale. However, it remains challenging for data analysis
to address technological issues involving limited image quality, such as noise level,
cellular morphology distortion, and cell density. In our work, we present our approach
to addressing those image processing challenges in order to facilitate HTS of large
chemical libraries for compounds that promote peroxisome assembly in PBD patient cells.
The challenge in this application of HTS lies mainly in how to accurately detect
punctate structures: the contrast between peroxisome structures and their background can
be so weak and blurry that a less robust image segmentation algorithm could fail to
detect the true peroxisome structures, leading to incorrect conclusions about the
successful rescue of peroxisome assembly by a treatment.
For the second topic of mitochondria segmentation, the challenges lie in the regions
in image data of mitochondria that are difficult to tell even under 100× objective:
1) Large overlapped regions, e.g., closely located areas along the nuclei’s
boundaries;
2) Crossing areas of multiple foreground objects.
In those regions, some pixels appear dim and are considered background when in fact
they belong to the foreground. Moreover, since the observation data are
gray-level images, the only information available is gray-level intensity,
which adds to the difficulty of the task. We will show that the LAT algorithm
generates errors in such regions.
There are two more problems that are hard to solve. First, the region around the cell
core is hard to delineate or separate, even for biomedical researchers. The reason is that, for
a cell sample, other regions can be flat or thin, but the region surrounding the core
usually is not: it has many accumulated cellular organelles and
materials, such as proteins and DNA, so the fluorescent intensity along the Z-direction
is affected and is relatively higher in this region. As a result, mitochondria objects in this
area are hard to separate. Second, since the imaging system is a confocal
micro-fluorescent system, it is possible that some mitochondria objects are not imaged
clearly in one focal plane and appear as blurred regions or noise. However, they are
considered target objects (the ground truth) labeled by human experts, who
usually adjust the microscope to focus on a local part of the image to
help with the labeling. Current computer algorithms cannot perform this interactive focus
adjustment and will probably generate errors in these complex regions.
As for the last topic, the analysis and processing of capsulorhexis video data to
identify foreground object motion is challenging due to inhomogeneities in luminance,
eye location, and color information (chrominance) as the surgery proceeds. In addition,
there is great diversity in the instruments, their movements, and the chrominance and contrast
of the limbus and pupil of each patient. Part of our work involves selecting
basic color spaces: color theory provides a number of color representation models for
expressing and analyzing chrominance information, and the efficiency of video analysis is
directly affected by the choice of model. The general framework should use
image processing approaches to train a model that makes accurate
predictions on new data samples. Finally, the robustness of any algorithm should be
tested not only on one video but on all the other provided video clips.
Considering that all the challenges above require a deep understanding of
biomedical image/video information, it is reasonable to believe that learning-based
approaches are the most appropriate direction for developing
powerful computational tools for quantitative analysis and accurate results.
Chapter 3 High-Throughput Drug Screening for
Peroxisome Biogenesis Disorders
3.1 Introduction
Peroxisome biogenesis disorders (PBDs) are autosomal recessive disorders caused by
defects in PEX genes required for normal peroxisome assembly [21, 71-74].
Approximately 80% of patients fall within the category of Zellweger spectrum disorder
(PBD-ZSD), which has an overall incidence of 1:50,000 US births [75]. While severely
affected patients show profound mental retardation and die by one year of age, the
majority of patients have milder forms of disease compatible with survival through
adulthood [72-74]. To date, disease management is only supportive in nature. Impending
newborn screens for peroxisomal disorders will increase the demand for effective early
treatments prior to disease progression [20, 76-78]. Ongoing pilot newborn screens in
Minnesota and Maryland are based on VLCFA levels in dried blood spots and can
accurately detect individuals with PBD-ZSD. Since patients with milder forms of disease
still have brain and liver dysfunction as well as complete hearing and visual loss later in
life, more effective therapeutic interventions could have a major impact on their
longevity and quality of life.
In a previous study, a pilot high-throughput screening (HTS) of 2,000 chemicals was
applied to identify compounds that enhance peroxisome assembly in cells obtained from
PBD patients [79]. As a baseline, this study shows a predominantly cytosolic distribution
of the reporter in patient cells. Promising compounds were prioritized based on the percentage of treated
cells containing GFP-positive punctate structures, which served as a surrogate marker
that protein import into peroxisomes was now functional. Confirmatory assays based on
enzyme activity and endogenous peroxisome protein import validated the HTS assay. The
assay detects downstream recovery of peroxisome import and is valuable in that all
possible recovery mechanisms can be captured, including unanticipated ones. The initial
HTS results were extremely promising and identified two classes of compounds
suspected to act as PEX1 chaperones. Figure 2 shows example images of patient
fibroblasts under different treatments.
(a) Untreated (b) 200 mM TMAO (c) 100 mM Betaine
Figure 2. Confirmation of small-scale drug screen hits
PEX1 G843D/I700fs PBD patient fibroblasts were treated with (a) a negative control (DMSO) or
(b) the chemical chaperones TMAO and (c) betaine. The punctate structures found in the TMAO
and betaine-treated cells represent peroxisomal import of GFP-PTS1. (b) shows a cell with
complete import, while (c) shows partial import and (a) shows no import; the latter two are
challenging to distinguish automatically.
To accelerate the discovery, we further scaled up and extended the existing assay for
very high-throughput screening. A software pipeline called Peroxitracker was built based on
image processing approaches, which will also be useful for studies of other complex
diseases related to peroxisome dysfunction. Peroxitracker is substantially different from
existing software for cellular image analysis because detecting reporter signals poses
unique challenges to state-of-the-art cellular image analysis algorithms. The GFP-PTS1
reports protein translocation from a diffuse cytoplasmic stain in PBD fibroblasts to
punctate structures in drug treated rescued cells. Ideally, in a cell with successful
peroxisome rescue, all GFP-PTS1 will import into peroxisomes, presenting clearly
discernible bright puncta in front of a dark cytoplasmic background, as shown in Figure
2(b). We note that the large oval-shaped green fluorescent object is the nucleus of the cell
while the peroxisomes are those puncta around the nucleus.
However, in reality, the degree of rescue, and hence the amount of GFP-PTS1 imported
into peroxisomes, varies among cells within the same well. Some green
fluorophore may still remain in the nucleus and cytosol, resulting in a green fluorescent
background behind the green fluorescent peroxisome granules (see the cell body in Figure
2(c)). As a result, the contrast of peroxisome granules and their background can be so
weak and blurry that a highly sensitive speckle detector is required to detect them.
Meanwhile, the cell shown in Figure 2(a) has little or no peroxisome rescue, but a high
concentration of GFP fluorophore that fails to import into peroxisomes may be distributed
unevenly in the cytosol, leaving tiny bubbles and spotty texture that appear similar to the
positively responding cell in Figure 2(c). Although the difference is still clearly
discernible by well-trained human eyes, for automated software it is quite challenging
because a speckle detector sensitive enough to identify punctate structures in Figure 2(c)
may be too sensitive and falsely considers the cell in Figure 2(a) as responding positively.
Moreover, uneven background illumination, confluent cells, and occasional
contamination during high-throughput imaging can exacerbate the challenge. An
effective pipeline should accurately detect green fluorescent peroxisome granules from
green fluorescent cytosol with spotty texture for HTS.
In this chapter, our work presents a pipeline for high content screening of
peroxisome assembly rescue in order to evaluate the drug targets that can function as a
peroxisome proliferator for PBD. In the software-based pipeline, the mean shift algorithm is
applied for illumination correction of digital images, and adaptive thresholding is applied
to extract peroxisome punctate structures. Post-processing is designed to select images
that provide credible information. By applying machine learning, we can automatically
classify the cells into four types: positive, negative, questionable, and non-candidate cells.
Our pipeline reached a Z’ factor of 0.44, which validates it as an assay suitable for real
drug screening.
Another pipeline, called Peroxitracker, is designed for analyzing all image data
observed from the same experiment. Feature extraction is further improved by
applying feature-enhancement processing methods. Our results show that, compared to
existing software, the prototype Peroxitracker is more sensitive, can reliably detect
recovery of peroxisome assembly in PBD patient cells, and is ready for automated
screening of large-scale chemical libraries, achieving a Z’ factor of 0.72.
3.2 Methods
3.2.1 Data Collection
The assays of our validation plates are essentially the same as those described in [77].
Immortalized human fibroblasts from a PBD patient were transfected with GFP-PTS1.
The positive control compounds include TMAO and diosmetin. A DMSO vehicle control is
used as the negative compound. Cells were imaged on a GE IN Cell Analyzer 2000 using
a 20X/0.45 ELWD Plan Fluor objective. Excitation wavelengths DAPI (350 nm) and
FITC (495 nm) and emission wavelengths DAPI (470 nm) and FITC (520 nm) with 2-D
deconvolution image processing were used to image the Hoechst stained nuclei and
GFP-labeled PTS1, respectively. All of our validation plates are in 1536-well format.
Each well contained 40 to 400 cells and was imaged with 2 channels: Hoechst and Green
Fluorescent Protein (GFP) channels, which provided context of nuclei and peroxisome
assembly in cytoplasm, respectively. An additional Texas Red channel was imaged
simultaneously to outline the boundary of cells but not used in the analysis.
3.2.2 Software
CellProfiler/CellProfiler Analysis
We did try to use available software packages to complete our analysis, especially
CellProfiler (CP) and CellProfiler Analysis (CPA) [80]. CP is very powerful with a large
library of available image processing algorithms put together into a package with an
easy-to-use graphical user interface that guides users step-by-step to create an analysis
pipeline. CPA lets users create a cell-type classifier by supervised machine learning
through a graphical user interface, which allows a user to select cells (with their features) as
training examples and to estimate the generalization performance of the classifier. However,
due to the unique challenges of our screens, innovations in algorithm development are required to
provide a robust and scalable solution.
GE INCell Analyzer
GE INCell Analyzer is a commercial software system designed for high-throughput
cellular and subcellular image analysis and screening for both fixed and live cells. GE
INCell can detect contours of individual punctate organelles and nuclei and can be
configured to output various parameters for each well. These parameters are aggregate
statistics of the properties of the detected puncta and nuclei in a well. They are useful but
are not direct estimates of the percentage of positively responding cells in a well that we
need. The closest are the parameters “count”, the average number of punctate organelles
per cell, and “area”, the average total area of the cell composed of puncta.
3.2.3 Peroxitracker Pipeline
The key to our success is a preprocessing step that smooths the image into regions of the
same intensity, which essentially levels out noisy spotty texture but preserves punctate
structures of true peroxisome granules. Also contributing to our success is our
graph-based approach to well scoring [89]. Figure 3 shows the pipeline for image
analysis. We tried several cellular image analysis packages to compare their performance,
including IN Cell Analyzer and the open-source CP/CPA and ImageJ [81]. We will show that
the most successful results came from our prototype, which integrates open-source and
our own programs as components in the analysis pipeline.
Image Enhancement and De-Noising
This preprocessing step is critical to the success of the following steps of object
identification and analysis, including uneven illumination correction and noise removal
from the image. Though a large number of methods for related purposes are available, it
is not trivial to select and tune their parameters to yield optimal results. Some image
denoising methods designed to suppress salt-and-pepper noise may also suppress target
peroxisome punctate structures, which can be as small as two to three pixels, while image
enhancement methods may falsely enhance both signal and noise. To overcome this
problem, we designed our program to apply contrast enhancement and the mean-shift
filtering algorithm in sequence. Histogram equalization is a fundamental technique of image
enhancement. It improves the image contrast by stretching the image histogram so that
the intensity values span a desired range. In our optimization, 35% of the pixels in the
image are allowed to saturate and display as black or white.
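A minimal sketch of this contrast-stretching step follows (in Python, for illustration only; the function name and the exact split of the saturated fraction between the two ends are our assumptions, with the saturation value taken from the text above):

```python
import numpy as np

def stretch_contrast(img, saturated_frac=0.35):
    # Saturate a fraction of pixels at black/white (split between both ends),
    # then linearly stretch the remaining intensities to [0, 255].
    lo, hi = np.percentile(img, [100 * saturated_frac / 2,
                                 100 * (1 - saturated_frac / 2)])
    out = (img.astype(np.float32) - lo) / max(float(hi - lo), 1e-6)
    return (np.clip(out, 0.0, 1.0) * 255).astype(np.uint8)
```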
Figure 3. Analysis pipeline of Peroxitracker
Shaded boxes represent data and other boxes are processes. The pipeline consists of three main
parts. (1) Preprocessing: enhance and denoise input images from the three channels, segment
and extract objects of nuclei and peroxisomes in the images. (2) Content screening: call the
response of each cell according to the morphological and image features of the extracted nucleus
and peroxisomes. (3) Post-processing: filter out outlier wells that either have too few healthy
cells or are of low quality, and assign a score of cell responses to each well according to the
cell type calling.
In addition, we used the mean shift (MS) algorithm for illumination correction,
which aims at reducing the inhomogeneity in background intensity. MS clusters nearby
pixels into regions of similar intensity and in effect smooths an image while preserving
edges [82, 83]. It starts with a set of data points and creates a fixed-radius window for
each. It then iterates over each window, generating a mean-shift vector that points in
the direction of the maximum increase in the local density function. Each window is then
shifted along the vector to a new position, and the iteration resumes. When each window
reaches a local maximum of the density function, the iteration completes and the vector
becomes negligible. Since MS prunes the image by retaining local maxima,
important edges can be easily detected after applying MS.
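A minimal sketch of this smoothing step using OpenCV's mean-shift filter is given below (the spatial and color radii follow the values of 3 and 25 reported below for Figure 4(e); the grayscale-to-BGR conversion is needed only because this OpenCV function expects a 3-channel 8-bit image):

```python
import cv2

def mean_shift_smooth(gray, spatial_radius=3, color_radius=25):
    # Cluster nearby pixels into regions of similar intensity (edge-preserving).
    bgr = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)
    smoothed = cv2.pyrMeanShiftFiltering(bgr, spatial_radius, color_radius)
    return cv2.cvtColor(smoothed, cv2.COLOR_BGR2GRAY)
```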
Figure 4(a) and (b) show the result of removing unevenness of background in the
original microscopy image. It is obvious that after applying MS the non-uniformities in
background are reduced. Figure 4(c)-(e) compare the preservation of peroxisome
context after illumination correction by estimating a correction function and by MS. The
punctate structures appear blurred after correction in Figure 4(d) but are retained in
Figure 4(e) after applying MS with initial spatial and color radii of 3 and 25,
respectively.
Figure 4. Illustration of illumination correction
(a) Unevenness of background intensity in enhanced digital image, (b) result of removing
unevenness in (a) by MS, (c) original image from GFP channel, illumination correction of (c) by
(d) calculating a correction function, and (e) applying MS.
Nucleus Detection
Fully automatic cell counting in a cellular image is still considered an unsolved problem
in cellular image analysis because existing methods are sensitive to parameter changes,
and for some methods slight changes may lead to drastically different counts.
Manual optimization for a given assay is usually necessary [84]. Peroxitracker applies a
Gaussian filter as a convolution mask and then uses the intensity histogram to
estimate a global threshold to binarize the image, where the bright regions
are the nuclei. We then apply a Laplacian filter to determine the center of each
detected nucleus. Detected nuclei that have uneven staining, are somewhat rippled, or
have a wavy shape are excluded from analysis because they most likely belong to
unhealthy or dying cells.
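The following sketch illustrates this nucleus detection scheme (Otsu's method stands in for the unspecified histogram-based global threshold, and an 8-bit Hoechst-channel image is assumed as input):

```python
import cv2
import numpy as np

def detect_nuclei(hoechst, blur_sigma=2.0):
    # Gaussian smoothing, then a histogram-based global threshold (Otsu here).
    smoothed = cv2.GaussianBlur(hoechst, (0, 0), blur_sigma)
    _, mask = cv2.threshold(smoothed, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # The Laplacian is strongly negative at the centers of bright blobs, so its
    # minimum within each connected component marks the nucleus center.
    lap = cv2.Laplacian(smoothed.astype(np.float32), cv2.CV_32F)
    n, labels = cv2.connectedComponents(mask)
    centers = []
    for i in range(1, n):
        ys, xs = np.where(labels == i)
        k = int(np.argmin(lap[ys, xs]))
        centers.append((int(xs[k]), int(ys[k])))
    return mask, centers
```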
Peroxisome Detection
The most challenging aspect of this informatics assay is reliably detecting puncta that
indicate rescued peroxisome granules from the GFP channel. We tried two approaches to
this challenge. The first is local adaptive thresholding (LAT) that is developed in [65].
Figure 5 shows an example of peroxisome extraction by using LAT.
Figure 5. Peroxisome extraction using the LAT algorithm
(a) Raw image containing nucleus and peroxisome; (b) the extracted peroxisome context using
LAT.
This method has been successfully applied to segment mitochondria, which may
have many different shapes (e.g., small globules, short tubules, or large connected
networks), in fluorescent cellular images [85]. This method is superior when the target
structures are in front of varying levels of background intensity, as frequently observed in
fluorescent cellular images. We optimized the parameters so that they are sensitive to
small punctate structures.
The second approach is by applying the morphological operator called the Top-Hat
Filter [86]. To use the top-hat filter as a detector of peroxisome granules, we first apply it
with a disk-shaped structuring element to all images and plot a histogram of their
intensity to determine a cut-off that keeps the brightest 0.5% of pixels. These bright
pixels are presumed to be peroxisome granules. One tricky aspect of the top-hat filter is
setting the structuring element size, which we adjust based on the images at hand.
Since the structuring element directly determines which raw signal is filtered
out, we want the element size to match the size of the punctate structures,
which is usually 3 pixels in our trials, so that noise is reduced as much as
possible.
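A minimal sketch of this detector (using scikit-image; the disk radius of 1, which gives a roughly 3-pixel element to match the puncta, and the keep-fraction parameterization of the 0.5% cut-off are our assumptions):

```python
import numpy as np
from skimage.morphology import white_tophat, disk

def detect_puncta(gfp, radius=1, keep_frac=0.005):
    # White top-hat with a small disk matched to the ~3-pixel puncta,
    # then keep only the brightest 0.5% of filter responses.
    tophat = white_tophat(gfp, disk(radius))
    cutoff = np.quantile(tophat, 1.0 - keep_frac)
    return tophat > cutoff
```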
Measurement of Objects
For each detected nucleus, we measured various features of the peroxisomes
detected near it, including count, size, intensity, and spatial distribution. Each feature is
normalized to the range [0, 1]. We also characterized the shape and texture of each
nucleus. These features allow us to determine whether the cell actually responds
positively.
• Count: The counts of the connected objects formed by the bright pixels.
• Size: The total area of the bright pixels.
• Intensity: The net intensity of all the detected objects in the original, non-binarized
tiles.
• Spatial Distribution: Statistics of mutual distance between detected peroxisomes,
whether they sparsely spread or clump together [87].
The spatial distribution is characterized by a set of features with 5 sequential values.
Assume that there are $N_s(0)$ separated punctate objects in the initial image. As the
objects are expanded by 4 pixels at a time, they start to connect with each other
and the number of separated objects decreases; eventually, all the objects collapse into
one. Our feature set, $f_{sd}$, contains the values for the first 5 expansions, which can be
written in the form

$f_{sd}(t) = \frac{N_s(t)}{N_s(0)}, \quad t = 1, 2, 3, 4, 5.$    (3.1)
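A sketch of how the feature set in (3.1) might be computed (using SciPy; the dilation structure, the per-step iteration count, and the guard against an empty mask are our assumptions):

```python
import numpy as np
from scipy import ndimage

def spatial_distribution_features(puncta_mask, steps=5, expand_px=4):
    # f_sd(t) = N_s(t) / N_s(0): ratio of connected-object counts as the
    # detected puncta are dilated by `expand_px` pixels per step.
    _, n0 = ndimage.label(puncta_mask)
    feats, mask = [], puncta_mask.copy()
    for _ in range(steps):
        mask = ndimage.binary_dilation(mask, iterations=expand_px)
        _, nt = ndimage.label(mask)
        feats.append(nt / max(n0, 1))
    return feats
```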
Cell Type Classification and Well Scoring
This step is to classify the sample cells into one of their response types. A positive cell
responds positively to a drug treatment, showing clumps of fluorescent punctate
structures around the nucleus, while a negative cell lacks successful PTS1 translocation
to peroxisomes and shows few or no punctate structure in the cytoplasm, and diffuse
cytoplasmic GFP staining should be evident. However, it is difficult to find a clear cut-off
dividing the two types, as many cells respond partially. Our guideline for calling the
cells is that if a cell contains 20 or more fluorescent peroxisome granules then it is
positive; otherwise, if it has fewer than 5 rescued peroxisomes then it is most likely
negative [75]. This guideline leaves ample room for human interpretation when the count
of puncta is between 5 and 20. The size and intensity of each punctum also need to be
considered, as well as the intensity of cytoplasmic GFP staining in the background.
To score a well, the guideline is to calculate the percentage of cells that respond
positively. As defined by the biomedical researchers, if the percentage is higher than 20%
then the drug used to treat that well is considered a hit. However, wells
containing too few cells do not provide a large enough sample for this percentage to be a
useful score and must be excluded from the analysis.
According to the measured feature values of detected nuclei and peroxisomes, a cell
can be classified into one of the response types. However, this binary classification
approach fails to consider partial recoveries and may thus fail to distinguish a hit from
ineffective compounds. It is also infeasible to solve a regression problem to learn a
scoring function of observed image features. We previously developed an approach to
quantifying partial fragmentation of mitochondria in cellular images based on a
graph-theoretic method [88, 89, 108]. Later, we developed a new method called
FABS-NC’ to enhance that approach and greatly improved the efficiency and accuracy of
the quantification [89, 90]. The general idea of FABS-NC’ is illustrated in Figure 6 (from
Ref [89]). FABS-NC’ wires all sample cells into a large graph, where the lengths of the links
are proportional to the feature similarity between pairs of cells. FABS-NC’ efficiently
searches for the “minimum cut” in the graph -- the set of links between the cells that are
least similar as a whole. The cut separates positively responding cells from negative ones. The
proportion of cells on each side of the cut naturally gives a useful score for ranking the
effectiveness of the drug.
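To make the idea concrete, here is a toy stand-in for the graph-cut scoring step. This is not the FABS-NC' implementation of [89, 90]: spectral clustering is used as an off-the-shelf substitute for the minimum-cut search, and orienting the clusters by distance to the positive-control mean is our assumption.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def score_well(cell_features, pos_ctrl_mean):
    # Split cells into two groups by feature similarity (a graph-cut surrogate).
    labels = SpectralClustering(n_clusters=2, affinity="rbf",
                                random_state=0).fit_predict(cell_features)
    # Call the cluster whose centroid is nearer the positive control "positive".
    centers = [cell_features[labels == k].mean(axis=0) for k in (0, 1)]
    pos_k = int(np.argmin([np.linalg.norm(c - pos_ctrl_mean) for c in centers]))
    # The well score is the fraction of cells on the positive side of the cut.
    return float(np.mean(labels == pos_k))
```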
Figure 6. Illustration of improved quantification to score a well
(a) The data samples associated with two different controls (positive control R1 and negative
control R2), and four different drugs (drug A, B, C and D). (b) The binary classification of data
samples based on the cut (dash curve): if R1 contains positive controls and represents cells in a
normal healthy state with mitochondria fully rescued from the completely fragmented state, while
R2 contains negative controls and results in the completely fragmented state of mitochondria for
toxicity criterion, then FABS_drugA = 1, FABS_drugB = 2/3, FABS_drugC = 1/3, and
FABS_drugD = 0. The ranking of the drugs will be: A >> B >> C >> D, where x >> y suggests
that x is more effective than y. (Figures and illustration are adapted from Ref [89].)
3.2.4 CP/CPA Pipeline
The software CP and CPA are used to establish a second pipeline with all the same
components shown in Figure 3. The configuration is reported in detail below.
Nucleus Detection
CP provides a wide range of object segmentation methods suitable for detecting nuclei. In
its IdentifyPrimaryObject module, there are six methods to choose and each can be
coupled with one of three thresholding methods -- Global, Adaptive or Per-Object. It is
found that Background Adaptive provides the best performance. Still, CP tended to
overestimate the number of cells in an image. Figure 7 shows examples of donut-shaped
nuclei that are counted as two by CP. Cells preparing to enter mitosis and divide may
appear concave at the sides, which also causes CP to count them as two, even though we
exhausted the available options attempting to address the problem. In contrast, Peroxitracker is free
from such problems.
Figure 7. Illustration of miscounting the nuclei by CP
(a) Raw image from the Hoechst channel. (b) Result of nucleus detection by CP, shown as a
number on top of each detected nucleus, where objects 63 and 69, and objects 77 and 80, should
each be a single object. (c) Result by the cell detector of Peroxitracker, where a green box marks
each detected nucleus.
Peroxisome Detection
CP offers the Speckle Counting function for users through its image-processing module
called EnhanceOrSuppressFeatures, where the top-hat filter is available to enhance
speckles. The result is included in the feature set for cell type classification.
Cell Type Classification
CPA applies Gentle-Boost as an effective supervised classifier-learning algorithm [90].
We configured CPA to train four Gentle-Boost classifiers to classify cells into four types:
positive, negative, questionable, and non-candidate cells. Non-candidate cells are dead
cells or non-cell objects. The optimal number of rules is determined by cross validation.
The generated rules are then used to classify all cells and calculate N_p, N_n, N_q, and N_non,
which are defined as the numbers of positive, negative, questionable, and non-candidate cells.
Figure 8 shows representative examples of each cell type. 200 to 400 cells are
carefully selected as training examples for each cell type, except for the questionable type,
for which there were not that many. The training usually converges to an 80%
cross-validation accuracy after 25 rules were learned. The optimal number of rules lies in
the range of 50 to 90.
Figure 8. Representative examples of 4 types of cells observed in the experiment
(a) positive cell, (b) negative cell, (c) questionable cell, and (d) non-candidate cell. The small
squares show the center of the nuclei of the representative cells.
Outlier Well Filtering
High-quality images are crucial to generate an accurate HTS result that is close to the
“ground truth”. For instance, in a high-quality image, a cell should not overlap with
neighboring cells, so that the exact number of cells in each well can be accurately
estimated. In fact, overlapping cells are usually unhealthy or dead, and their response to a
treatment is not reliable for the purpose of drug screening. Non-overlapping cells also make
it easier to determine to which cell a peroxisome punctum belongs, so that
classification of positive or negative cells can be unambiguous. In this part of the pipeline,
we apply criteria to filter out low quality images to ensure that the result is reliable. The
criteria will be applied to all images regardless of their treatments.
The criteria are designed based on the population of cells of each type after
classification, which is why they cannot be applied as a preprocessing step and are
instead applied as a post-processing step. They are described as follows:
• N = (N_p + N_n + N_q + N_non) > 100
• N < 450
• N_non / N < 20%
Wells must satisfy all three criteria to be considered high quality and will be
scored. Other wells are filtered out and removed from the scoring. The 1st and 2nd
criteria require that the total number of cells within a well be neither too small nor too
large, both of which are problematic and may bias the statistical accuracy. The 3rd criterion
requires that the cells in a selected well should not have suffered toxicity or mutation
during the experiment.
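These three criteria reduce to a simple predicate; a minimal sketch (the function and argument names are illustrative):

```python
def is_high_quality_well(n_pos, n_neg, n_quest, n_non):
    # Keep a well only if 100 < N < 450 and non-candidates are under 20% of N.
    n = n_pos + n_neg + n_quest + n_non
    return 100 < n < 450 and n_non / n < 0.20
```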
3.3 Results and Discussion
We created a gold standard to evaluate the performance of various software systems.
Since it is too expensive to manually annotate the image of each well in an entire
1536-well plate, we randomly selected 10 wells from one of the validation plates. These
10 wells consist of negative control wells and wells treated with different concentrations
of the positive control, as Table 1 shows. R-06 should be the most positive well as
expected with the knowledge of our assay; while R-04 should be the most negative one.
Table 1. The comparison of applying different processes on the 2012-01-26 plate
ID Human Peroxitracker Cutoff CPA
R06 0.8487 0.7616 0.7550 0.6815
U10 0.8483 0.6901 0.6127 0.7197
S10 0.8108 0.5989 0.6631 0.7976
R10 0.7333 0.5556 0.6543 0.6608
R12 0.5828 0.5988 0.5864 0.7284
R11 0.5298 0.3945 0.4602 0.3096
R18 0.5176 0.2956 0.4138 0.4393
R19 0.2314 0.2409 0.3636 0.2839
R03 0.0968 0.0652 0.3007 0.1850
R04 0.0202 0.0870 0.2843 0.1191
Figure 9 depicts the results for different software. Manual annotation results showed that
as expected, wells treated with a high concentration of the positive control contained a
higher percentage of positively responding cells and thus should be scored higher. We
then applied Peroxitracker, our informatics analysis assay, CP/CPA, and two scores by
GE InCell. Cut-off is a simple scoring method that calls a cell positive if the sum of
all its feature values exceeds a cut-off value, determined empirically to maximize the
Spearman correlation with the result of human scoring. Spearman’s correlation
coefficient assesses how well the ordering of scores matches that of human scoring. The
results show that Peroxitracker outperforms other software systems, as the curves suggest.
We also used Pearson’s correlation coefficient and mean-square error to compare their
performance, though Spearman is still the most appropriate here because our goal is to
rank drugs accurately to distinguish a hit from a miss. We report here the performance of
Peroxitracker tested on a whole plate of cell images and compared to CP/CPA. We
evaluate how well our software can distinguish wells with cells treated by a positive
control from those by a negative one and use the Z’ factor as the assessment metric [91].
The Z’ factor measures the overlap between the scores of positive and negative controls.
For the assay to be useful, the Z’ factor should be at least 0.4.
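For reference, the Z’ factor of [91] is computed from the means ($\mu_+$, $\mu_-$) and standard deviations ($\sigma_+$, $\sigma_-$) of the positive and negative control scores as

$Z' = 1 - \frac{3(\sigma_+ + \sigma_-)}{|\mu_+ - \mu_-|},$

so that values near 1 indicate well-separated controls, while values at or below 0 indicate overlapping controls.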
Peroxitracker Cut-off CPA InCell/count InCell/area
Spearman 0.97576 0.96364 0.84242 0.96364 0.90578
p-value 0.000001 0.000007 0.002220 0.000007 0.000307
Pearson 0.95499 0.94558 0.92274 0.93992 0.91219
p-value 0.000017 0.000036 0.000142 0.000053 0.000234
MSE 0.13509 0.15388 0.12085 0.10482 0.17355
Figure 9. Performance comparison of different software for analysis
The curves show the ratio of positive responding cells in each well estimated by various software
systems and human counting. The table compares the results of the software with human scoring
by Pearson and Spearman correlations and MSE (Mean Square Error). Well ID and the
treatment – R-06, R-10, U-10, and S-10: high concentration of positive control TMAO 200 mM.
R-11 and R-12: medium concentration of TMAO 150 mM. R-18, R-19: low TMAO 100 mM. R-03
and R-04: negative control DMSO.
The validation plate that we selected to report is in a 1536-well format but we used
only the first four columns, i.e., 128 wells. Among them, 64 contained cells treated with a
vehicle control DMSO (negative), 32 with high concentration (38 mM) of the positive
control compound. The other 32 are titration wells not considered here. These are the
control columns that we will use on each plate when we perform a real screen. This plate
was incubated with compounds for 48 hours so that rescues can become obvious.
Table 2 shows the final Z’ factors, as well as the means for positives (µ+) and
negatives (µ−) and their corresponding standard deviations (σ+, σ−). The results show that
Peroxitracker achieved the best Z’ factor, 0.72, through its use of Mean Shift as a
preprocessing step and FABS-NC’ to search for a graph cut that elevated the scores of
positive wells. Its superiority becomes obvious when more wells and cells are considered.
“Cut-off” failed to achieve a desirable Z’ factor but it successfully suppressed the scores
of negatives (µ- =0.0008).
Table 2. Comparison of the discriminating power of different pipeline
µ+ σ+ µ- σ- Z’ factor
Peroxitracker 0.9785 0.0491 0.1869 0.0245 0.7213
Cut-off 0.2968 0.1353 0.0008 0.0044 -0.4163
CPA 0.5298 0.1097 0.1374 0.0422 -0.1610
CPA+filtering 0.4811 0.0599 0.1366 0.0316 0.2032
CPA+MS+LAT 0.5967 0.0781 0.0957 0.0359 0.3173
CPA+ ALL 0.5616 0.0589 0.0960 0.0359 0.4407
In comparison, the CP/CPA pipeline performed poorly, with a negative Z’ factor,
even though we trained it with more than 900 cells. The CPA
performance can be improved by additional steps, including filtering outlier wells (CPA + filtering) and
applying MS and LAT, a more accurate puncta detection algorithm (CPA+MS+LAT). By
applying well filtering as the post-processing step, the Z’ factor increases by 0.13 in the
MS-LAT mode and 0.36 in the default mode. The best Z’ factor that we can obtain here is
0.44. This is in accordance with the fact that peroxisome rescue in positives and
negatives is visually distinguishable for those wells after post-processing, so the Z’ factor
from automated image processing is expected to be at least 0.4.
It also shows that the filtering criteria are helpful and fair,
because a number of high-recovery wells in the positive controls and low-recovery wells
in the negative controls are also removed.
It is also observed in our previous trial plates that false negatives always exceed
false positives; that is, weak response is frequently observed in many of the
positive control wells. The reason is that cells may become
unhealthy for a short period of time before they start to respond to a positive control drug.
The results in Table 2 also indicate that the percentage of positively responding
cells may not be sufficient in the positive control wells, because cells respond
heterogeneously and the percentage fluctuates (with high standard deviation σ+) so much
that it leads to poor discrimination for methods other than Peroxitracker. FABS-NC’ helps
us smooth out this fluctuation.
3.4 Conclusion
In this work, analysis pipelines are designed for high-throughput screening of candidate
drugs for PBD, built by implementing and integrating various algorithms.
The CP/CPA pipeline can reach an optimal Z’ factor of 0.44, while Peroxitracker reaches
0.72. It can be concluded that Peroxitracker is validated as an assay for real drug
screening and performs better than the pipelines built on existing software. In addition,
the performance of Peroxitracker is the closest to the human-observed gold standard.
To improve the work, two possible directions are proposed: 1) model the properties
and features of the questionable cells to improve the machine learning
and classification; 2) determine more efficient criteria to automate the high-level
post-processing of image filtering in order to make the assay more robust.
Chapter 4 Morphological Feature Learning for
Mitochondria Segmentation
4.1 Introduction
Mitochondria are important subcellular organelles, which function as cellular energy
factories that synthesize the majority of the cell’s ATP from the processing of nutrients
for cellular energy supply [23, 26]. As introduced in Chapter 1, mitochondrial fusion and
fission dynamics is essential to many cellular processes, such as maintaining the ATP
level, and plays an important role in maintaining a population of healthy mitochondria in
human cells. Therefore, dysfunctional mitochondrial dynamics in the fusion and fission
process may result in abnormal morphological configurations in mitochondria and after
the resultant statistics of mitochondria objects in each morphological subtype [27].
Mitochondria in such irregular states are unable to function normally in the tasks of
cellular energy production, cellular differentiation, programmed cell death, as well as the
control of cell growth by recycling products needed for proper cell functioning [23-27].
From clinical research on mitochondrial disorder diseases, the morphology of
mitochondria under healthy conditions is distributed among balanced structural subtypes,
while in the absence of fusion and fission mitochondria show an excessive
number of small fragmented or long folded structures; i.e., mitochondrial morphology
can be an indicator of the health status of mitochondria [25, 27]. Thus, tracking
mitochondria morphology has been incorporated into the analysis of specific experimental drug
discovery in HTS. Due to the lack of efficient computational tools to label the
mitochondria into user-defined subtypes and to count the number of mitochondria in each
subtype, much effort from research experts is wasted on this
time-consuming and tedious job. For instance, human labeling has been applied to
explore the relationship between the change in mitochondrial morphology and the change
in human life span [92].
However, although human labeling follows specific rules, the manual work is
subject to individual visual bias, and hence the labeling and classification results can be less
reliable. Moreover, even though some research divides the morphology of mitochondria into
the subtypes of networked, fragmented, and swollen structures,
as shown in Figure 10 [27], there is no standardized definition of morphological subtypes.
Given those limitations, progress should be made to develop automatic mitochondria
segmentation techniques that meet the need for accuracy and robustness in
delineating mitochondrial morphological characteristics.
Figure 10. The morphological subtypes of mitochondria [27]
There are three subtypes: networked, fragmented, and swollen structures. The difference among
these three subtypes lies in their morphological characteristics.
The challenge of realizing mitochondria segmentation in fluorescent images lies in
the variation in morphological structures and background fluorescence blurring of object
edges. In the digital images, pixel intensity is the only information provided as
observation data. Our goal is to develop a computational tool that can quantitatively
classify and count mitochondria objects. Learning-based approaches are suitable
for this task: although mitochondrial fusion and fission dynamics produce a wide
variety of morphological structures, there exist rules for classifying a consistent area of
pixels into one set that represents a single mitochondrion.
In previous work, multiple learning-based mitochondria segmentation methods have been
used to generate reasonable results. A. Lucchi and colleagues used a trained
Gentle-Boost classifier to detect mitochondria based on textural features (2012, [30]).
Classifiers including k-NN, SVM, and AdaBoost were used to classify cell samples from
SEM image stacks, as proposed by S. Vitaladevuni (2010, [102]). K. Smith et al. used
shape features rather than texture features to detect mitochondria (2009, [103]). In 2011,
Irda Eva Sampe et al. segmented mitochondria in fluorescence micrographs by SVM,
where the feature used is an array of fluorescent intensities of the 4-connected pixels of the
target pixel. Also pursuing a learning solution, Yara Reis et al. built a decision tree based on
large-scale measurements of mitochondrial morphologies to classify the
mitochondria objects into the above 3 types (2012, [27]). However, all those approaches
are not robust enough for all types of mitochondria objects because of the weak
boundaries between foreground and background.
In our work, a 2-stage segmentation system combining computer vision and image
processing techniques is built for mitochondria image data observed from a confocal
fluorescence microscope. It is a novel computational approach aimed at systematic and
unbiased quantification and classification of
mitochondria. Stage I is implemented with machine learning models for initial
segmentation, where pre-processing of image data and feature extraction are completed to
obtain the predicted probability of each pixel being foreground. Stage II is built to
connect breakouts in mitochondria centerline segments, which otherwise degrade the
segmentation accuracy and the related morphological characteristics. A cost function is
designed based on human labeling experience in order to minimize false
segmentation.
4.2 Human Learning of Mitochondrial Morphology
Before introducing the machine learning approach, a summary of segmentation rules is
provided based on human labeling experience.
1) In human visual observation and judgment, there are mainly three factors that affect
the determination of a foreground mitochondria object: continuity of the morphological
track, fluorescent intensity, and spatial similarity to the neighboring
environment. These are considered the human learning principles.
2) Continuity of the morphological track provides evidence for connecting fragments into
complete mitochondria objects. Mitochondria fragments may present as
breakouts of discontinuity due to locally depressed fluorescent intensities
within a small neighborhood (about 1~2 pixels in diameter). Although the
trace of those fragments indicates that the local emission of photons is not
uniform, ideally the trace should be a complete structure and the fragments should
belong to one mitochondrion.
3) From the discussion above, intuitively one object should have uniform intensity
everywhere inside. This intuition can be used to segment a cluster of
neighboring mitochondria objects. For instance, if there are 2 globule-shaped
mitochondria (both of high intensity and very close to each other) with a small gap of
very low intensity in between, then we can decide without doubt that they are 2
objects, even in a blurred neighborhood.
4) Spatial similarity is used to distinguish overlapped objects. The situation can be
complicated because the cell sample being imaged is a 3D sample; that is, in the
X-Y plane it is possible for one pixel to be overlapped by multiple objects along the Z
direction. Spatial similarity can be used to recognize multiple objects in the X-Y plane.
5) There are T-shaped structures in the images: a T-shaped structure is possible for long
mitochondria objects represented as long threads or tubes (which can curve within cells),
but it is not a normal case for short mitochondria shaped as small globules.
These rules, derived from human labeling experience, are very important in the error
analysis of the mitochondria segmentation results, as shown later in
this chapter.
4.3 Methods
The structure of the 2-stage segmentation system is described in Table 3 with a brief
description of each step; the details are discussed next. Both stages include error
analysis of their current results.
4.3.1 Image Dataset and Ground Truth
The mitochondria image data (both observation data and human-labeled data) were obtained
from Dr. David Chen’s lab at Caltech. The image size is 1024×1024 and the pixel
intensity lies in the range [0, 255]. The human-labeled data, considered as ground truth,
were obtained by manually correcting the results produced by the LAT algorithm [65].
Table 3. The 2-stage segmentation system

Stage I: Machine Learning for Segmentation
• Pre-processing: denoising the raw image data
• Step 1: grouping the input data
• Step 2: morphological feature extraction in multiple resolution spaces
• Step 3: training machine learning classifiers for initial segmentation
• Step 4: binarization
• Error analysis

Stage II: Centerline Extraction
• Step 5: refining the segmentation results via globally optimal selection for re-connecting breakouts, based on the error analysis in Step 4
• Step 6: combining the results of Step 4 and Step 5 for the final output
• Error analysis
4.3.2 Stage I of Segmentation System
Pre-processing
The 1024×1024 raw image is sliced into 1024 individual non-overlapping sample
patches of size 32×32. The global Otsu threshold value (T_otsu) is calculated to choose
the sample patches that certainly contain foreground objects [104]. For a given sample patch, if
it contains at least one pixel with intensity larger than T_otsu, then it is selected for the
following steps as a valid input data sample; otherwise it is not. In other words, the
patches that do not satisfy the condition are all considered background patches. The
pre-processing step selects patches 100% correctly, even when the size of the sample
patches is changed to 16×16 or 8×8.
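A minimal sketch of this patch selection (using scikit-image; `view_as_blocks` assumes the image size is an exact multiple of the patch size, which holds for 1024×1024 images and 32×32 patches):

```python
import numpy as np
from skimage.filters import threshold_otsu
from skimage.util import view_as_blocks

def select_patches(img, patch=32):
    # Keep a patch only if it has at least one pixel above the global Otsu threshold.
    t_otsu = threshold_otsu(img)
    blocks = view_as_blocks(img, (patch, patch))   # grid of (patch x patch) tiles
    keep = blocks.max(axis=(2, 3)) > t_otsu        # boolean grid of selected patches
    return blocks, keep
```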
Step 1. Grouping the Input Data Samples
After the pre-processing step, a number of sample patches are selected as input data to
the segmentation system. For each patch, the intensity variance is calculated and compared
with the global intensity variance. Over all patches, this contrast ratio covers a certain range.
The sample patches are grouped by dividing this range into several
intervals such that the number of patches in each group is as equal as
possible.
Step 2. Extraction of Morphological Features
Figure 11. Feature extraction within an individual patch
The features include 13 Haralick texture features and the intensity (highlighted in bold blue) of
the pixel located at the center of the 9×9 window.
For each group, the standard Haralick texture features of every pixel in every patch are
calculated. Each pixel has 13 Haralick features, as listed in the table in Figure 11.
To calculate the Haralick texture features, every pixel in all selected patches is processed:
for each pixel, a local 9×9 square window centered on it is selected, the standard
Haralick features are calculated for this local window, and 13 features are generated.
Adding the gray-level intensity in the raw image, the total number of input features for the raw
image is 14. These 14 features are calculated separately for the raw image, the down-sampled
image, and the local intensity-enhanced image, and the predicted probabilities of all
pixels in each case are combined for a second round of learning. We also tried to
extract the significant features from each learner and combine them for further learning, in
order to see whether the dis-connectivity problem could be solved.
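A naive sketch of this per-pixel feature extraction (using the mahotas library's Haralick implementation; the reflect padding, the 8-bit input, and the skipping of constant windows are our assumptions, and a practical implementation would vectorize the double loop):

```python
import numpy as np
import mahotas

def pixel_features(img, win=9):
    # 13 Haralick texture features from the 9x9 window around each pixel,
    # plus the center pixel's gray-level intensity: 14 features per pixel.
    r = win // 2
    padded = np.pad(img, r, mode="reflect")
    feats = np.zeros(img.shape + (14,), dtype=np.float32)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            window = padded[y:y + win, x:x + win]
            if window.max() > window.min():  # Haralick needs >1 gray level
                feats[y, x, :13] = mahotas.features.haralick(window).mean(axis=0)
            feats[y, x, 13] = img[y, x]
    return feats
```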
Step 3. Apply Machine Learning for Prediction
For each group, a classifier is learned from a logistic regression model using
Leave-One-Out cross validation, i.e., in every iteration one patch is used as testing data
while all other patches are used as training data to learn the rules of the
classifier. Logistic regression with a binary response is selected to model the input features
and to explain the effects of the explanatory variables on the binary response of
foreground versus background. It is based on the logistic function (4.1).
$Y = \frac{1}{1 + e^{-z}}$    (4.1)

$p_i = \frac{1}{1 + e^{-(w_0 + w_1 x_{i,1} + \cdots + w_k x_{i,k})}}$    (4.2)
The logistic function fits our requirement for estimating the probability of a pixel
being foreground: it maps all values from negative infinity to positive infinity to
outputs between 0 and 1, and hence is interpretable as a
probability. As shown in formula (4.2), since 14 features are extracted in Step 2,
{x_i} contains 14 entries, as does the set of regression coefficients {w_i}, which
together give a linear combination of the input variables (features). The output Y,
considered as the estimated probability of the pixel being foreground, lies in the range [0, 1].
The output probability is then multiplied by 255 to form the pixel intensity of the
prediction result, in the range [0, 255] as an 8-bit gray-level image. This informs the
researcher how large the probability of a pixel must be for it to be
classified as part of a mitochondria object.
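A minimal sketch of this leave-one-patch-out training loop (using scikit-learn's logistic regression; the data layout of per-patch feature matrices and per-pixel binary labels is our assumption):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def predict_group(patch_features, patch_labels):
    # patch_features[i]: (n_pixels, 14) array; patch_labels[i]: (n_pixels,) in {0, 1}.
    prob_maps = []
    for i in range(len(patch_features)):
        X = np.vstack([f for j, f in enumerate(patch_features) if j != i])
        y = np.hstack([l for j, l in enumerate(patch_labels) if j != i])
        clf = LogisticRegression(max_iter=1000).fit(X, y)
        p = clf.predict_proba(patch_features[i])[:, 1]    # P(foreground)
        prob_maps.append((p * 255).astype(np.uint8))      # 8-bit probability image
    return prob_maps
```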
Step 4. Binarization of the Prediction Result
A global threshold method is then applied to binarize the result; among the
available thresholding approaches, we use the classic Otsu approach [104]. This
method assumes that the whole image contains 2 classes of pixels and searches for an optimum
threshold that minimizes the intensity variation within each class (i.e., the intra-class
variation); it can also be shown that minimizing the intra-class variation is
equivalent to maximizing the intensity variation between the 2 classes (i.e., the inter-class
variation).
4.3.3 Stage II of Segmentation System
Step 5. Centerline Extraction by Dynamic Global Optimization
The dynamic global optimization part implements a search approach to
bridge the disconnections at the breakouts of mitochondria objects. First, the binary
image is sliced back into patches as before, and we compute line segments by the
Hough and de-Hough transforms and line detection. The Hough transform maps
the image content into polar coordinates and searches for the maximum
peak values among the curves generated in polar space for each edge point in Cartesian
space. Since the job of the Hough transform is to determine what and where the
edge features are and how many of them exist in the original image in the spatial domain,
every peak indicates an intersection of a locally maximal number of curves in polar space,
i.e., a set of collinear points in Cartesian space. For each patch, five peak values are
selected in polar space, and their collinear points are connected to span the
centerlines of the foreground objects in the spatial domain.
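A minimal sketch of this per-patch line detection (using scikit-image's Hough transform, returning (angle, distance) pairs for the five strongest peaks as described above):

```python
from skimage.transform import hough_line, hough_line_peaks

def patch_line_segments(binary_patch, num_peaks=5):
    # Accumulate votes in polar (theta, rho) space, then take the 5 strongest
    # peaks; each peak corresponds to collinear foreground points in the patch.
    h, angles, dists = hough_line(binary_patch)
    _, peak_angles, peak_dists = hough_line_peaks(h, angles, dists,
                                                  num_peaks=num_peaks)
    return list(zip(peak_angles, peak_dists))
```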
All line segments are thus considered candidates for connection in order to
form the centerline. We then define a cost function and a threshold value. A cost matrix
is computed, with each entry being the cost value between two line segments,
and we pay attention to those cost values lower than the user-defined threshold.
Operations on the cost matrix are performed by recursive iterations. In each iteration, we
select the smallest cost value and connect the 2 corresponding line segments in the way
the cost function prescribes. After the connection, the cost value between those two line
segments is set to infinity to avoid making the same selection again. The
selection and connection terminate when no cost value remains below the
threshold. The intuition behind applying a threshold is that two line segments that are
spatially too far apart should not be connected.
For our problem, the cost function consists of the connecting distance, the angular
mismatch, and the variance contrast between line segments in neighboring patches.
Each term is normalized as (value − min) / (max − min), where max and
min denote the maximum and minimum values among all values. The optimal solution
is the smallest cost found by evaluating the cost function over all pairs of patches.
The objective cost function is (4.3):
$\mathrm{Cost}(i, j) = \begin{cases} \infty, & i = j \\ a_1 \, cost_1 + a_2 \, cost_2 + a_3 \, cost_3, & i \neq j \end{cases}$    (4.3)
where (a_1, a_2, a_3) are the coefficients of cost_1, cost_2, and cost_3, with each
term explained as follows:
1. Distance
Cost1 indicates the minimum Euclidean distance between the endpoints of a pair of
line segments.
2. Angular mismatch
Cost2 covers three parts: a) the mismatch between the angles of the two line segments;
b) the mismatch between the bridge line and line segment 1; c) the mismatch between
the bridge line and line segment 2.
3. Variance contrast
Cost3 has been updated to a mathematical expression derived from the Z’ factor, in
order to measure how different the intensity is between the bridge line and its surrounding disk:

$cost_3 = \frac{3\,(\mathrm{std}_b + \mathrm{std}_d)}{|\mathrm{mean}_b - \mathrm{mean}_d| + \mathrm{eps}}$    (4.4)
where std denotes standard deviation, b the bridge line, and d the local disk, whose
diameter is defined by cost1. The eps term is added in the Matlab program to avoid a
division-by-zero error when the two mean values are too close (eps is the smallest float
increment in Matlab). All three components are derived from the human labeling
experience described in the last section.
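A sketch of the pairwise cost in (4.3)-(4.4) follows (the segment representation, the endpoint choice, and the bridge-statistics argument are illustrative; in the actual pipeline each term is also min-max normalized over all pairs before weighting):

```python
import numpy as np

def connection_cost(end_i, ang_i, end_j, ang_j, bridge_stats,
                    a=(1.0, 1.0, 1.0), eps=1e-12):
    # cost1: Euclidean distance between the closest endpoints of the two segments
    cost1 = float(np.linalg.norm(np.asarray(end_i) - np.asarray(end_j)))
    # cost2: angular mismatch, segment vs. segment and each segment vs. bridge line
    dy, dx = end_j[0] - end_i[0], end_j[1] - end_i[1]
    bridge_ang = np.arctan2(dy, dx)
    cost2 = (abs(ang_i - ang_j) + abs(ang_i - bridge_ang)
             + abs(ang_j - bridge_ang))
    # cost3: Z'-factor-like contrast between bridge line (b) and surrounding disk (d)
    std_b, std_d, mean_b, mean_d = bridge_stats
    cost3 = 3.0 * (std_b + std_d) / (abs(mean_b - mean_d) + eps)
    return a[0] * cost1 + a[1] * cost2 + a[2] * cost3
```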
Step 6. Form the Output Image
The final output image is a combination of the binary image from Step 4 and the centerline
extraction result from Step 5.
4.4 Results and Discussion
The raw image and the human-labeled image are shown in Figure 12(a) and (b), respectively.
Figure 12(a) clearly shows that the fluorescent mitochondria in cell
micrographs exhibit inhomogeneity in background intensity, signal-to-background ratio
(intensity contrast), and signal-to-noise ratio (noise level). Detailed structural features are
exhibited in different subtypes: fragmented, globular, swollen, twisted, branched, and
networked structures. Comparison of the raw image with the ground truth shows
accordance with the human labeling rules described in Section 4.2.
In the pre-processing, the observation image data in Figure 12(a) yields a total of 166
sample patches that satisfy the selection rule. The contrast-ratio ranges of the three groups
are [0.187, 1], (1, 5), and [5, 9.210], containing 57, 43, and 66 patches, respectively.
Each group of data is the input to its own machine learning model to estimate the
probability of each pixel being foreground.
Figure 12. Observed and labeled data
(a) Observation image data, and (b) human labeled ground truth
Figure 13. Illustration of using feature input to train the logistic regression classifier
In Figure 13, the non-overlapping 32×32 patches sliced from the raw images are shown
together with their down-sampling results. Down-sampling is applied to maintain the
global structures in a coarse resolution space, such that the original gray-level intensity in
256 bins is redistributed into 128 bins. Before texture feature extraction is applied, we
use only the intensities in the two resolutions, as well as the LAT-processed results, as
feature input to the logistic regression model. The output patches are shown in Figure 14, which
depicts a group of well-estimated data (upper row) as well as a group of predicted data
with errors, together with the raw data and the human-labeled data.
The errors stem from the irregular object patterns of mitochondria and from
connectivity in relatively low-intensity foreground areas. Although the patch size was
changed and 1st-order derivatives of intensity were added in an attempt to capture more detailed
local structure and edge information, with the errors analyzed after binarization, our
results show that the accuracy is not improved much, since the edge detection
operators in the horizontal and vertical directions contribute little: the p-values of
those 2 components are around 0.025, which indicates that these two feature inputs are
far less important than the other three, whose p-values are below 10^-22. The reason is
that, in both the raw and low-resolution spaces, the width of the mitochondria branches where
connectivity is lost in the predicted data is usually only a couple of pixels, and those
areas have relatively blurred edges that are hard to tell from the background. Therefore, the edge
detection operators can produce incorrect detections and mislead
the classifier. Figure 15 compares the predicted data with the ground truth,
where the yellow boundaries show the errors. The errors are mainly due to two
factors: false prediction of objects as small globules, which is induced by the LAT algorithm,
and the dis-connectivity that still stems from locally low intensity.
Then the intensities from patches in multiple resolution spaces and the extracted Haralick
texture features are used as feature input. The model is updated to contain 2-layer
logistic regression, and the resultant image after binarization is analyzed again based on
the error map. The error analysis shows 4 types of errors:
• Disconnections as short breakouts (diameter of several pixels) in dendritic structures
• Missing complete branches since the classifier is sensitive to local intensity
• False foreground objects
• False foreground area, which is too close to true foreground structures
Figure 14. Comparison of different prediction quality
Figure 15. Error analysis after Stage I
The round region shows the false detection resulting from the LAT algorithm, while the
rectangular region shows the breakout errors in mitochondria segments where the foreground
pixel intensity is blurred with the background pixels. Note that the LAT-processed data is
used as input features.
The main error lies in the failure to detect foreground where connectivity is lost.
Although many efforts have been made to bridge the discontinued
mitochondria branches, none yields a sufficiently reliable and stable result,
because the breakouts miss too many pixels of dim intensity that are hard to recover.
The reason is rooted in the machine learning model, where all the patch candidates are
used as data samples for one classifier, which generates errors when the intensity
distributions are highly diverse. Therefore, grouping methods are introduced and
applied for better pattern recognition in the machine learning models. The detailed method of
exploring the contrast ratio is discussed in Step 1 of Section 4.3.
We compared the result of the 2-layer logistic regression with grouped input data against the result without grouping; although some breakouts remain, they are reduced and show a trend toward connection. Stage II then starts from this result to perform line segment detection based on the Hough and de-Hough transforms, as depicted in Figure 16 (a sketch follows Figure 16), and the cost between each pair of line segments is computed by the cost function in (4.3).
Figure 16. Hough and de-Hough transform for line segment detection
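For reference, a line-segment detection step of this kind can be sketched with OpenCV's probabilistic Hough transform; the parameters and the input file name below are illustrative assumptions, and the chapter's own Hough/de-Hough implementation may differ.

```python
# Illustrative sketch: probabilistic Hough transform for centerline segments.
import cv2
import numpy as np

binary = cv2.imread('stage1_output.png', cv2.IMREAD_GRAYSCALE)  # assumed file
segments = cv2.HoughLinesP(binary, rho=1, theta=np.pi / 180,
                           threshold=20, minLineLength=5, maxLineGap=2)
if segments is not None:
    for x1, y1, x2, y2 in segments[:, 0]:
        print((x1, y1), '->', (x2, y2))   # endpoints of one detected segment
```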
Table 4 gives part of the cost matrix after normalization of each term, based on which a dynamic programming algorithm is run to find the optimal solution for connecting two line segments recursively (a sketch follows Table 4). Note that in the initial setting the cost of a pair of line segments is set to "Inf" (∞) not only for two identical lines but also for two lines whose endpoints are spatially far apart. Apart from "Inf", the values obtained from the designed cost function lie in the range [0, 3], and the user can pre-define a threshold to inspect the connection results.
Table 4. Normalized cost matrix based on cost function
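The sketch below illustrates how such a cost matrix can be assembled and thresholded. The cost used here (normalized endpoint gap plus orientation difference) is a hypothetical stand-in for the chapter's cost function (4.3), which is defined earlier in the chapter.

```python
# Sketch only: pairwise cost matrix for candidate segment connections.
# A hypothetical cost (endpoint gap + orientation difference) stands in
# for the chapter's cost function (4.3).
import numpy as np

def segment_cost(s1, s2, far=50.0):
    """s1, s2: ((x1, y1), (x2, y2)) endpoints of two centerline segments."""
    gap = min(np.hypot(*np.subtract(p, q)) for p in s1 for q in s2)
    if gap > far:                          # endpoints spatially too far apart
        return np.inf
    a1 = np.arctan2(s1[1][1] - s1[0][1], s1[1][0] - s1[0][0])
    a2 = np.arctan2(s2[1][1] - s2[0][1], s2[1][0] - s2[0][0])
    return gap / far + abs(a1 - a2) / np.pi

segments = [((0, 0), (10, 1)), ((14, 2), (25, 3)), ((0, 40), (10, 42))]
n = len(segments)
cost = np.full((n, n), np.inf)             # "Inf" for identical/far pairs
for i in range(n):
    for j in range(i + 1, n):
        cost[i, j] = cost[j, i] = segment_cost(segments[i], segments[j])
threshold = 0.3                            # user-defined, cf. Figure 17
pairs = [(i, j) for i in range(n)
         for j in range(i + 1, n) if cost[i, j] < threshold]
```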
The images shown in Figure 17 are the results of mitochondria centerline segment extraction and re-connection as the threshold value varies from 0.0 to 0.5 in steps of 0.1.
(a) threshold = 0.0; (b) threshold = 0.1; (c) threshold = 0.2; (d) threshold = 0.3; (e) threshold = 0.4; (f) threshold = 0.5
Figure 17. Connection of mitochondria breakouts
(a)-(f) show the fragment connection results under different threshold values. As the threshold of the objective function increases, more line segments are selected for connection. Compared to the human-labeled ground truth and the segmentation results from Stage I, the accuracy of the morphological structures increases.
In Figure 17, the red channel carries all line segments after the de-Hough transformation, while the green channel carries the connections drawn between properly matched pairs of line segments. The blue channel carries the line segments chosen for connection. Thus, every line segment in the blue channel must also appear in the red channel, and in the overall image the chosen, connected line segments appear purple.
The performance is evaluated by the area under the ROC curve (AUC-ROC). To draw the ROC curves, the true positive rate and false positive rate are calculated; their definitions are given with Figure 18 (and sketched in code below). As shown in Figure 18, ROC curves for the grouping and non-grouping results are drawn as the threshold value varies from 0.05 to 0.5 in steps of 0.05.
Figure 18. AUC-ROC Curves under different approaches
Blue: with input-grouped approach; green: without input-grouped approach.
Vertical axis: True Positive Rate = 1 − Miss, where Miss = (# missing pixels) / (# foreground pixels in ground truth).
Horizontal axis: False Positive Rate = (# falsely detected foreground pixels) / (# foreground pixels obtained by the proposed algorithm).
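A minimal sketch of one ROC point, computed exactly per the definitions in the caption above (note the non-standard false positive denominator), might look as follows.

```python
# Sketch of one ROC point per the Figure 18 caption's definitions.
import numpy as np

def roc_point(pred, gt):
    """pred, gt: boolean foreground masks of the same shape."""
    miss = np.logical_and(gt, ~pred).sum() / gt.sum()
    tpr = 1.0 - miss                      # True Positive Rate = 1 - Miss
    fpr = np.logical_and(pred, ~gt).sum() / max(pred.sum(), 1)
    return fpr, tpr

# Sweeping the connection threshold from 0.05 to 0.5 gives one (fpr, tpr)
# point per setting; the area under the resulting curve is the reported AUC.
```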
In Figure 18, the AUC of the grouping method is 0.093 while the AUC of the non-grouping method is 0.075. As a larger AUC indicates a better classifier, the result shows that the grouping method gives superior accuracy. The global segmentation accuracy for the 1024×1024 images exceeds 98%. The final segmented mitochondria image, shown in Figure 19, combines the binarized results of the machine learning stage with the fragments connected during centerline extraction.
Figure 19. Comparison of mitochondria segmentation results
(a) The binary output after Stage I with input data grouped; (b) the binary output after Stage I
without input data grouped.
(c) The binary output after Stage II with input data grouped; (d) the binary output after Stage II
without input data grouped.
The binary results after both Stage I and Stage II, with the input-grouped and non-grouped approaches applied at the beginning, are compared. The designed mitochondria segmentation system retains morphological structures close to the manually labeled ground truth shown in Figure 12(b).
4.5 Conclusion
In this work, a segmentation system is designed using learning-based techniques. Learning-based approaches realize full automation in processing the fluorescent mitochondria image data and obtaining line segments in small local areas; centerline extraction then connects the segments in the desired way. At the final output, the foreground structure in mitochondria images is extracted automatically and accurately.
A few directions are proposed to improve the work. First, the weights among the components of the cost function should be learned; currently, all three terms are given equal importance, since how significantly each term contributes to the cost function remains to be determined. Second, additional work is needed to determine numerically how to define a threshold value for the cost function. Finally, the proposed segmentation system should be tested on more mitochondria image data.
Chapter 5 Objective Analysis of Capsulorhexis
Surgical Techniques
5.1 Introduction
This chapter focuses on the development of video techniques for the objective assessment
of the capsulorhexis part of cataract surgery. Our data sources are video clips of the capsulorhexis surgical procedure performed by surgeons with many different levels of experience. The video clips have both global similarity between frames and local differences within frames. We apply learning-based processing techniques to help automate the analysis of the video clips.
Traditionally, to educate new residents for cataract surgical skills effectively,
subjective evaluation is performed with interaction between practitioners and surgical
preceptors [93-95]. Ref [96] provides a survey of such tools. Keeping pace with the
cataract surgical evaluation guidelines from ACGME, Ref [96] agrees that an objective
assessment system should have three characteristics: low intra-observer variability with
consistency in repeated evaluations from one grader; low inter-observer variability
having a similarity of evaluations from different graders; and construct validity that
focuses on various types of expertise such as surgical concept, knowledge, and practical
performance. It also reviews and compares five ophthalmological surgical technique assessment tools for evaluating residents, including GRASIS and ICO-OSCAR:phaco [31, 32], and claims that these tools represent the global trend of moving away from counting prior cases toward standardized measurement in resident education.
However, the work also points out that misinterpretation by graders makes the evaluation of surgical skills problematic and degrades the quality of teaching interventions in training programs. We expect that the assessment of surgical performance could be strongly improved by the complementary use of objective and subjective ratings, and that such assessment tools could improve training in a practical and efficient way.
To the best of our knowledge, this is the first time that computer vision techniques, together with image processing approaches, have been applied to evaluate capsulorhexis surgical technique from video. This work is a collaboration between the
Department of Electrical Engineering at USC and Jules Stein Eye Institute (JSEI) at
UCLA, focusing on analyzing video clips from the capsulorhexis part of cataract surgery
to develop quantitative ways to measure quality of the opening created in the capsule and
assess efficiency and accuracy with which the surgical technique is performed. The main
tasks of the work are shown in Figure 20.
Figure 20. Flowchart of designing algorithms for objective cataract surgical assessment
As discussed in previous chapters, one significant quality metric of a capsulorhexis surgery is the quality of the opening created in the capsule. Thus, the size and shape of the opening, along with other relevant factors, are components of the proposed assessment system designed to evaluate the surgical process. The input to the assessment system is the raw video, and the output should be an objective score that evaluates the overall quality of the surgical performance.
Experimental algorithms we use include, but are not limited to: image denoising and
enhancement; edge detection; color space transformation; histogram measurements;
morphological descriptors; manual tracing; motion estimation; and pattern recognition.
Video image processing and analysis algorithms are required to stabilize the video frames
and to identify and locate the limbus. In addition, the instrument movements needed to
complete the surgery must be classified and recorded. For each step, specific
requirements and detailed tasks in the development of assessment metrics are analyzed as
follows.
• Task 1: Centration of eye within the operative field in order to register and track the
eye and its movement during cataract surgery; assess re-centering movements; and
enable analysis of the tasks below.
The goal of this part is to produce a video in which all the frames have the centroid
of the pupil precisely aligned with that of the view field. This step is important and
helpful for quality assessment of surgical technique of capsulorhexis by evaluating
and analyzing the trace of instrument movement and measurement of motion vectors.
The recommended techniques include (but are not limited to) object tracking
followed by image registration. With the raw frame sequence as input, the location and morphological features of the pupil are detected to determine the motion magnitude and angle of certain points on the pupil edge or of the pupil centroid. The pupil location must be fixed in the output frames; they need to be aligned
with one another in this way so that differences can be detected. It should be noted
that the image registration algorithm must work independently of the size or color of
the pupil for every patient’s surgery video. Algorithms for pupil margin detection or
shape renormalization must be robust enough to perform well for a variety of eye
shape, colors, size, etc.
• Task 2: Recognize needle, cannula and forceps and count the number of times each
instrument is inserted into the eye
This involves feature extraction and object recognition of instruments. A needle has
a sharp end while a cannula has a small hook at the end, and both of them are hollow.
The cannula is neglected for now since it is used to irrigate the eye surface, and is
generally used by the assistant surgeon, thus it does not relate directly to the quality
of capsulorhexis performed by residents. To count the number of insertion times for
needle and forceps, we may make measurements of the position of instrument tips vs.
surgery time. More generally, finding the motion vector of instruments is probably
the best way to achieve our goal.
• Task 3: Count number of attempts at grasping capsule flap with forceps and count
the number of successful grasps (motion capture and learning)
Instrument movement is measured in this part. Since the actual surgery is completed
in a 3D space, there is a depth along the axis perpendicular to the video frames. It
means that during practical surgery, the movement does not only lie in the X-Y plane,
but also in the Z direction in the Cartesian system. This depth information in
instrument movement directly determines whether the seam cut by the needle is well
done enough to grasp and whether a grasp is successful or not.
To count the number of grasps, the distance between tips of forceps could be a
characteristic of a grasp. Thus, forceps tracking and registration are necessary as
pre-processing steps. Then a distance threshold could be determined by observing a
histogram distribution of all distances (obviously, in ideal case it should be 0). Then
a tip distance larger than some threshold value corresponds to “being a new round of
grasp". Since a successful grasp tends to be followed by a tearing action, the trace of tip movement can be used as a measure of the number of successful grasps. The trace can be modeled as a portion of a circle or some smooth curve. This part is important since it is directly related to surgical efficiency.
• Task 4: Assess completed capsulorhexis for shape, size and concentricity with iris.
After the previous steps are finished, this task can be done by generating a temporal
sequence of extracted features. Considering the roundness of the opening as an
example, since the area of opening has similar color distribution as its background,
and the cortex beneath the opening might be disrupted in various ways, the
morphological features of the “roundness” may be affected. To determine the
robustness of this procedure, a number of video clips of different human subjects
should be evaluated. Since there is a wide variation in appearance between each, this
is a challenging task.
In summary, the objective of this research topic is to provide new insights to
understand and improve the learning of surgical skills for residents. The work is
significant for the practice of and advances in ophthalmology by all healthcare
professionals. The ultimate deliverable is the ability to input a new video clip of the capsulorhexis portion of cataract surgery and have the program generate accurate and reliable numerical assessments.
5.2 Video Dataset
The video dataset, provided by the UCLA Jules Stein Eye Institute, is the same dataset used in Ref [33]. It consists of DVD recordings of the capsulorhexis procedure from six surgeons at different training levels: one case from a postgraduate year (PGY) 3 resident with 7 prior cases (video index: CCR989, video length: 195 sec), one from a PGY 3 resident with 27 prior cases (video index: CCR988, video length: 251 sec), one from a PGY 4 resident with 111 prior cases (video index: CCR904, video length: 82 sec), one from a PGY 4 resident with 115 prior cases (video index: CCR970, video length: 112 sec), one from an expert in cataract surgery with 7500 prior cases (video index: CCR933, video length: 59 sec), and one from another expert with 10000 prior cases (video index: CCR903, video length: 42 sec). The resolution of video CCR933 is 960×540 pixels, while all others are 720×540 pixels; the frame rate is fixed at 30 fps.
5.3 Methods
Our 3-stage surgical video assessment system is illustrated in Figure 21. In
the first stage, video frame registration is applied to stabilize and register all the raw
video frames based on the position of limbus or pupil. These stabilized and registered
video frames are sent to the second stage, which has two steps. Image processing is
applied to identify and extract instruments in every frame. Then computer vision
approaches are explored to build a recognition system that identifies instrument
movement. In this system, image analysis runs for every movement so that the trace of
instrument can be spatially tracked. The tracking information will be used for movement
understanding. Finally, as one operation might occur several times within one procedure, repeated movements are recognized and recorded. All records are then used for classification and assessment.
Currently, the first stage of work is complete, and ongoing work aims to improve the instrument identification in Stage II. All approaches are implemented in the technical computing language Matlab (version: 2013a) for optimized algorithm programming and data visualization. Details of each stage are discussed in the next sections.
Figure 21. Proposed surgical technique assessment system
The overall 3-stage assessment system with detailed description of function added to each stage.
5.3.1 Stage I: Video Registration
Pre-processing
The first step is preprocessing to remove video noise. Since the visual quality of the raw video is relatively low and diverse noise levels affect the signal-to-background contrast, this step is important to reduce whatever noise may be present. The noise reduction approach is applied to each frame individually; this is called spatial video denoising. The specific algorithm, block matching and 3D filtering (BM3D), was recently proposed as an efficient noise removal method [97]. This filtering relies on the sparse representation of true image signals in the transform domain, generated by stacking groups of similar 2D image blocks into 3D arrays. By exploiting local and nonlocal characteristics of the raw image signal, the true signal can be better separated from noise when the sparse representation is enhanced and the energy is compactly represented. This filtering method shows a denoising quality superior to many advanced noise-removal techniques.
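As an illustration, per-frame BM3D denoising can be sketched as below; this assumes the third-party Python `bm3d` package rather than the authors' Matlab implementation, and the noise level is a placeholder.

```python
# Hedged sketch of per-frame (spatial) BM3D denoising. Assumes the
# third-party "bm3d" Python package, not the authors' implementation;
# sigma is a placeholder value.
import numpy as np
import bm3d

def denoise_frame(frame, sigma=0.05):
    """frame: grayscale image scaled to [0, 1]; sigma: assumed noise std."""
    return bm3d.bm3d(frame.astype(np.float32), sigma_psd=sigma)
```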
Methodology 1: Traditional Image Processing and Video Registration Methods
After pre-processing, the next task is to center the eyeball in order to maintain it within
the operative field. A well-fixed position of cornea or pupil benefits the next steps and
directly affects the output results for measuring the quality of the capsule opening. A
popular method is to use edge detection and color histogram analysis to identify the area
of cornea or pupil. The reason to consider edge detection and color histogram is that they
are based on either mathematical kernels or statistics, which provide reliable information about discontinuities in image brightness. The aim is to keep only the cornea or pupil area and apply image registration based on its location. However, as shown in the next part, the results were not as good as expected, because an individual video frame contains many more detailed structures, such as blood vessels.
Next, interactive segmentation based on color information is tried to detect the
global location and center position of cornea and pupil. Three methods are examined and
compared to find the location of cornea/pupil centroid in order to do the image
registration. A single frame from a video clip is extracted to do the testing of different
algorithms. The methods are Paint Selection (Microsoft, 2009), shape bounding (analyze
the fitting shape of object), and histogram analysis (analysis of color information in RGB
channels). All those results are compared with human manual labeling.
Another standard video registration method called Optical Flow was also tried.
Optical Flow, or Optic Flow, is an approximation of the local image motion based upon
local derivatives in a sequence of images, that is, it estimates how much each image pixel
moves between adjacent images. In 2D images, it is the pattern of apparent motion of
objects, surfaces, and edges in a visual scene caused by the relative motion between an
observer (an eye or a camera) and the scene [98, 99]. The basis of differential optical
flow is the motion constraint equation, which is derived by writing the first order Taylor
series expansion about intensity I(x+dx, y+dy, t+dt) and compare it with I(x, y, t),
assuming that I(x+dx, y+dy, t+dt) and I(x, y, t) have the same pixel intensity in
neighboring frames. There are many Optical Flow algorithms and the one chosen is
called Combined Global and Local (CGM) method [100, 101]. It uses a global 2D motion
constraint function and local smoothing filter to solve the optimal solution. The results
are considerably better than previous methods using a single control point for registration,
while the drawback is that the shape of any instruments (i.e., needle or forceps) is
distorted in the output video frames. In addition, registration greatly stabilizes the location of the limbus: in the raw frame sequence the limbus jumps from frame to frame, most visibly as the instrument is inserted; after registration, the limbus stays at essentially the same spatial location in each frame, which helps observers evaluate the instrument movement and the quality of the surgical procedure.
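For illustration, dense optical flow between two frames can be computed as below. OpenCV does not provide the CGM method described above, so Farneback's algorithm stands in; the file names and parameters are assumptions.

```python
# Illustration of dense optical flow between adjacent frames. Farneback's
# algorithm stands in for the CGM method; file names are assumed.
import cv2

prev = cv2.imread('frame_001.png', cv2.IMREAD_GRAYSCALE)
curr = cv2.imread('frame_002.png', cv2.IMREAD_GRAYSCALE)
flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                    pyr_scale=0.5, levels=3, winsize=15,
                                    iterations=3, poly_n=5, poly_sigma=1.2,
                                    flags=0)
# flow[y, x] = (dx, dy): estimated motion of each pixel between the frames.
```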
Methodology 2: Adaptive Template Registration
Although traditional image/video processing can locate the limbus or pupil in a single video frame, it is less robust when applied independently to every frame of the whole sequence, and the shape of the instruments may be distorted. We therefore seek a registration method that ignores instrument shape yet detects the pupil/limbus in all video frames. If necessary, the quality of registration can be slightly sacrificed, i.e., the registration need not be performed to sub-pixel precision.
Figure 22 shows our approach for video registration. At the beginning, in the very
first frame of the whole video, a template enclosing the pupil and part of the limbus is
manually selected as shown by the red bounding box. If a copy of that template slides all
over positions within the second frame (the search area), the quality of the image match
between the search area and the template can be expressed by calculating the normalized
cross-correlation. Obviously, an optimal match occurs where the search window and the
template are most similar, and the normalized cross-correlation at this point is a global
peak. Then the second frame is spatially shifted so that the search window is exactly
aligned with the template selected from the previous (first) frame. Thus, frame 2 is
registered to frame 1. Then the search window in frame 2, which gives the best match, is
applied as a new template for the next coming frame, i.e., frame 3. The same procedure is
repeated. Thus, the sequence of templates is updated from frame to frame. We call this adaptive template matching, and the procedure registers all frames to the first frame as well as possible.
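A minimal sketch of this adaptive template-matching loop, using OpenCV's normalized cross-correlation, is given below; the frame I/O and the pure-translation shift model are simplifying assumptions.

```python
# Minimal sketch of the adaptive template-matching loop (pure-translation
# shift model assumed; frame I/O omitted).
import cv2
import numpy as np

def register_sequence(frames, box):
    """frames: list of grayscale images; box: (x, y, w, h) initial template."""
    x, y, w, h = box
    template = frames[0][y:y + h, x:x + w]
    registered = [frames[0]]
    for frame in frames[1:]:
        ncc = cv2.matchTemplate(frame, template, cv2.TM_CCORR_NORMED)
        _, _, _, (px, py) = cv2.minMaxLoc(ncc)      # global correlation peak
        shift = np.float32([[1, 0, x - px], [0, 1, y - py]])
        registered.append(cv2.warpAffine(frame, shift, frame.shape[::-1]))
        template = frame[py:py + h, px:px + w]      # adaptive template update
    return registered
```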
If we look at all the selected templates, shown as the images bounded by red boxes for different frames in Figure 23 (left), we find that all templates have a slight change in the background (mainly the pupil) and an obvious change in the foreground objects (instruments). These two types of changes introduce errors when calculating the normalized cross-correlation. The normalized cross-correlation operates on gray-level images obtained by a linear transformation of the RGB components of the color information. The changes in the background (eye) in each frame and the appearance and removal of instruments within the sequence of templates may cause errors.
Figure 22. Proposed video registration approach by template matching
The template keeps being updated for every frame so that every frame is registered to the very first one. Normalized cross-correlation is calculated as the matching metric between the search window and the template, which have the same window size.
Figure 23. Adding temporal filter to remove foreground instrument movement
To solve this issue, a temporal filter is added to modify all the templates before using them for registration. Median filters in the X-Y-T dimensions (X-Y is the spatial coordinate system, while T is the time index, i.e., the frame index along the sequence) are used to process the extracted template sequence before the correlation search. We experimented with different filter parameters for all video clips. Results show that after median filtering, the templates are slightly smoothed and somewhat blurred, while the instruments are removed when the T parameter is large enough. To reduce blurring, the median filter parameters take the form [x, y, t] = [1, 1, t], which reduces the 3D filtering to 1D filtering along time (the frame index). Taking video CCR903 (total length 42 sec) as an example, in the output template sequence processed by a median filter of [1, 1, 31], only the pupil and limbus remain and no inserted instruments are visible in the templates. The output template sequence is then used for frame registration, and the resultant video frames are better stabilized, as the jumps due to spurious correlation peaks caused by instrument insertion or movement are removed. Near the end of the video there still remains a jump at sharp illumination changes and eyeball deformation; it would be removed if the video were recorded for a longer time, which would yield a better result after median filtering. A comparison of the original template and the revised template after median filter smoothing, using the filter format [x, y, t] = [1, 1, 31], is shown in Figure 23 (right).
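The temporal filtering step can be sketched as follows, matching the [x, y, t] = [1, 1, 31] setting described above; `template_list` is an assumed stack of equally sized templates.

```python
# Sketch of the temporal median filter over the template stack, matching
# [x, y, t] = [1, 1, 31]; `template_list` is an assumed list of equally
# sized grayscale templates, one per frame.
import numpy as np
from scipy.ndimage import median_filter

templates = np.stack(template_list, axis=0)      # shape (T, h, w)
smoothed = median_filter(templates, size=(31, 1, 1))
# A long enough temporal window suppresses the moving instruments, leaving
# only the slowly varying pupil/limbus background in each template.
```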
Extended Work for Adaptive Template Registration
Additional work is focused on reducing the runtime for the proposed registration
method by experimentally testing an algorithm called Local Neighborhood Searching
(LNS) for adaptive templates. LNS is proposed due to the following reasons: 1) the
program runtime should be improved to accelerate the overall algorithm development,
and 2) the original searching method of maximum peak on cross-correlation map can
sometimes generate a false optimal match. In addition, a more severe problem is that the
eye moves, deforms and rotates in its socket as surgical instruments are inserted and
removed. Also, the effective illumination changes along with these motions. All this
makes the tracking of the limbus more difficult when using only the first frame as a
reference, and it is why we propose adaptive template registration. We also make an
assumption that the effect of instruments, needle, cannula, forceps, etc. on the
limbus-tracking algorithm is small as these instruments are much smaller in size
compared to the limbus and only appear in a fraction of the frames of the video sequence.
To realize LNS, the algorithm is revised by (1) reducing the search area to correspond to the size of the original frame (the cross-correlation map is larger than the frame; their difference is the size of the template); (2) changing the search for the global maximum peak of the overall cross-correlation map into a search among a number of local maximum peaks, one of which must be the true optimal match; and (3) requiring that at each peak the first-order derivatives in the X-Y plane vanish. In fact, considering (2), the idea of LNS could be more accurately named Local Maximum Searching (LMS).
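A sketch of point (2), the local-maximum search, is given below; skimage's `peak_local_max` stands in for the actual search routine, and `ncc` is the assumed cross-correlation map from the previous step.

```python
# Sketch of point (2): search among local correlation maxima rather than a
# single global peak. `ncc` is the assumed cross-correlation map.
import numpy as np
from skimage.feature import peak_local_max

peaks = peak_local_max(ncc, min_distance=10, num_peaks=5)
best = peaks[np.argmax([ncc[tuple(p)] for p in peaks])]
# `best` is taken as the optimal match among the candidates; further checks
# (e.g., distance to the previous frame's match) can break ties.
```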
The above approach was applied to the video dataset. The template for each video is
carefully selected to avoid the situation that the template goes outside the boundary of the
frame being searched, by ensuring that false matching does not occur at the image frame
boundaries. The approach is helpful in removing the obvious jumps between adjacent
frames.
In addition to the above modifications, we added one more step to extract the outline of the limbus using a state-of-the-art algorithm called the active snake model [105]. This model formulates image constraints from the intensity, edges, and curvatures in the image content, applies those constraints to a set of points initially selected by the user, and iterates until it converges on the desired contour of the intended image content. After video stabilization/registration, the user selects a set of initial points only in the very first frame. All video frames share that set of initial points, and the contour of the limbus in each frame is extracted accurately. Both the registration and the limbus location procedures work well for the several sample videos in this study.
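The snake step can be sketched with scikit-image's `active_contour`, a stand-in for the implementation of Ref [105]; the initial circle parameters and the variable `frame` are assumptions.

```python
# Sketch of the limbus outline step via an active contour ("snake"), using
# scikit-image in place of Ref [105]'s implementation; the initial circle
# and the variable `frame` are assumptions.
import numpy as np
from skimage.filters import gaussian
from skimage.segmentation import active_contour

s = np.linspace(0, 2 * np.pi, 100)
init = np.column_stack([240 + 150 * np.sin(s),    # (row, col) initial points
                        360 + 150 * np.cos(s)])
snake = active_contour(gaussian(frame, sigma=3, preserve_range=True),
                       init, alpha=0.015, beta=10, gamma=0.001)
# The same `init` is reused for every registered frame, as described above.
```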
5.3.2 Stage II: Instrument Movement Understanding
The task of Stage II is to analyze and understand instrument movement. This requires that both instrument identification and tracking be performed accurately. Figure 24 depicts the framework of the learning model for this goal. The Support Vector Machine (SVM) algorithm, which separates data samples into two classes by constructing the hyperplane with the maximum margin to the nearest samples of both classes, suits our problem. It is applied to fuse many different types of video frame features, such as color, edges, lines and their orientations, to distinguish the foreground instruments from the background based on decision procedures derived from labeled ground truth data. SVM is selected among the advanced machine learning algorithms because it targets binary classification, which matches our goal of binary segmentation into foreground objects (instruments) and background (everything else). Moreover, post-processing is applied to enhance the instrument prediction results.
Figure 24. Pipeline of instrument identification and tracking
Bi-directional Prediction Model
The key step in building a learning model that accurately estimates/predicts instruments is to collect effective features. Figure 25 illustrates building an SVM classifier to fuse 7 specific features: color index, gradient with direction constraints, thick edge, estimated instrument region, hue, saturation, and saturation of specularity over the objects.
Figure 25. Illustration of using feature input to train binary SVM classifier
(a) Feature input represented as a binary vector (the SVM sub-image is adapted from Ref [111]); (b) training and testing procedure of the SVM. In (b), the blue frames are human-labeled data, while the green frames generated by the SVM model provide the instrument predictions.
All features of each pixel are extracted to form a 7×1 binary feature vector. The ground truth of pixels, i.e., whether a pixel is foreground or not, is labeled manually. This human labeling is only applied to a certain number of frames, as explained further in the next section. To give a better understanding of why the selected features are important, each feature is discussed in turn below.
• Feature 1: Color Index
Color information is one of the most important visual perceptual properties of images. In the video frames, the instruments normally appear light gray. To separate those pixels, representations in different color spaces such as RGB (red/green/blue), HSV (hue/saturation/value), and Lab (lightness, with a and b as chrominance dimensions) are investigated. For our problem, information in the "a", "b", "h", and "s" channels is extracted to form a 4×1 feature vector per pixel for K-means clustering with a pre-defined value K = 5 (a minimal sketch of this step follows this feature list). First, K cluster centers are randomly picked, and each feature vector (representing a specific pixel) is assigned to the cluster that minimizes the Euclidean distance between the center and the pixel. The center is then recomputed with that pixel added to its cluster. The process continues until all pixels are assigned and all clusters converge, partitioning the frame into K clusters. Every pixel in the cluster containing the instrument objects is labeled 1, while all other pixels are assigned 0.
• Feature 2: Gradient with Direction Constraints
Pixels generally have a large gradient if they are located at the edges of structures. The gradient is calculated through edge detectors, and the direction constraints are computed through line segment detection. Since the Hough and de-Hough transforms proved successful for line segment detection in Chapter 4, we re-apply them to find the line segments along the instrument edges. This locates edge segments that lie within a specified range of angles, by which the instrument edges can be further identified. The method generally works well, although it produces broken segments due to changes in illumination, movement of the instrument, etc., which is why a complete closed edge may not be found. It nevertheless provides a set of angular information that is recorded to set up a range of constraints. Pixels with large enough gradients whose directions fall in that range are assigned the value 1, and all others 0.
• Feature 3: Thick Edge
The edge detection results from Feature 2 are then processed with a region-growing algorithm to expand them to the possible region of the true edges (the raw edge detection may not be accurate enough). The thickened edge information outlines the area with a higher probability of being foreground, which is represented as 1.
• Feature 4: Estimated Region of Interest (ROI) for Instruments
Based on the edge detection in Feature 2, the possible areas for instruments are
outlined. Moreover, line segment detection could also provide estimation of
centerline axis of instruments. Those areas that overlap with the centerline axis are
used as estimated ROI for instruments. Pixels within those areas are assigned a
feature with value of 1.
• Features 5 and 6: Hue, Saturation
Since Feature 1 may not provide adequately separated color attributes, i.e., it may fail to impose enough restriction to select the foreground objects, other color spaces are investigated to provide useful separation in another way. HSI is a space inspired by the color wheel that presents color information more intuitively. Features 5 and 6 are the binarized hue and saturation information.
• Feature 7: Saturation on Specularity
Specularity indicates the specular reflection of objects from the light source. This feature is added because the foreground objects show considerable similarity to the specular reflection in texture and color. The specular reflection is computed using the model in Ref [106] and represented in HSI space; the saturation channel is selected and binarized as Feature 7.
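The sketch promised under Feature 1 follows; it clusters per-pixel (a, b, h, s) vectors with K-means (K = 5). The scikit-learn/OpenCV implementation and the input file name are assumptions (the original work used Matlab).

```python
# Sketch of the Feature-1 step referenced in the list above: K-means (K = 5)
# on per-pixel (a, b, h, s) color vectors; the input file name is assumed.
import cv2
import numpy as np
from sklearn.cluster import KMeans

frame = cv2.imread('frame.png')                   # hypothetical BGR frame
lab = cv2.cvtColor(frame, cv2.COLOR_BGR2LAB)
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
feats = np.dstack([lab[..., 1], lab[..., 2],      # a, b
                   hsv[..., 0], hsv[..., 1]])     # h, s
X = feats.reshape(-1, 4).astype(np.float32)
labels = KMeans(n_clusters=5, n_init=10).fit_predict(X)
labels = labels.reshape(frame.shape[:2])
# The cluster containing the light-gray instrument pixels is binarized to 1
# (everything else 0), giving the Feature-1 map.
```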
To train the SVM model, all features are computed for each pixel within a frame and
sent together with labeled ground truth. The very first frame of every second is labeled.
To reduce the runtime of the SVM, the learning process is applied to an ROI, which is either the area grown from the prediction or from the labeled area of a neighboring frame in the forward or backward direction. Prior knowledge, such as the incision location, the range of the instrument's moving area, and the physical dimensions and shape of the instrument, is applied to constrain the instrument identification. Thus, instead of searching each full frame, our algorithm predicts possible foreground areas (instruments) through successive frames. This improves the runtime, as the work would otherwise be very computationally intensive and time-consuming.
Figure 25 (b) shows how the bi-directional learning model is trained at each second
by using the labeled data. For every pixel of every frame between 2 labeled frames, the
features are collected and sent to the learned model for prediction and are finally
classified to be foreground or background. The output classification results are expected
to provide accurately identified instruments.
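A hedged sketch of this training step follows, with scikit-learn standing in for the original Matlab implementation; `feature_maps`, `gt`, and `roi` are assumed inputs.

```python
# Hedged sketch of the SVM training step; `feature_maps` (7 binary maps),
# `gt` (ground-truth mask), and `roi` (boolean mask) are assumed inputs.
import numpy as np
from sklearn.svm import SVC

def pixels_and_labels(feature_maps, gt, roi):
    """Collect one 7-dim binary vector and one label per ROI pixel."""
    X = feature_maps[:, roi].T          # shape (n_roi_pixels, 7)
    y = gt[roi]
    return X, y

X, y = pixels_and_labels(feature_maps, gt, roi)
clf = SVC(kernel='rbf').fit(X, y)
# For each unlabeled frame between two labeled ones, the same 7 features are
# computed inside the predicted ROI and classified by `clf`.
```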
Improved Bi-directional Prediction Model
The results in the last part show that the SVM model performs well for instrument extraction and tracking when the instruments have enough image contrast with the background. However, when the video quality degrades, the quality of instrument identification decreases, and so does the tracking result. To obtain better tracking performance, the original bi-directional prediction model is improved by adding a geometrical transformation module (GTM). In this modification, only a pre-defined number of frames, called "I frames", are identified by SVM classification, while the others, called "P frames", are processed through the GTM generated from the two closest I frames. Surveillance points in two neighboring I frames are automatically selected as control points, based on which the GTM is computed. The geometrical transformation matrix, with parameters for spatial translation in the X-Y plane, scaling factor, and rotation angle, is calculated via a standard image warping algorithm. The coordinate system is fixed and the objects are transformed. A different GTM for every pair of I frames is calculated from the sides toward the center.
As shown in Figure 26, after the human-labeled data (frame 1 and frame 31) are sent to the SVM model to fuse the feature input, the trained model predicts the instrument in the testing results (frame 2 and frame 30). These predicted results, considered I frames, are post-processed and used to select control points for instrument tracking through the GTM. The tracking is done in pairs, i.e., tracking the instrument in frame 3 and frame 29 uses the GTM computed from frame 2 and frame 30, obtaining the ROI of possible instruments in both frames. A simple segmentation is then applied within the restricted area and the foreground instrument is detected. The detection results seed the next round of tracking for the pair of frame 4 and frame 28, and the process repeats until the center frame, frame 16. The results are improved in accurate instrument tip location and instrument rigid body recovery.
Figure 27 provides an example of the selection of control points. Note that the control points could generally be treated as surveillance points; however, such points may introduce errors into the calculation of the spatial transformation matrix. The reason is tied to the nature of our problem. In general, a surveillance point is a feature that helps identify the foreground objects. In our problem, however, the limited size of the instruments and the abundant image content in the background prevent standard computer vision algorithms from efficiently selecting and matching surveillance points on the instruments between separate frames. Therefore, it is necessary to explore proper schemes to select the right control points.
Figure 26. Improved SVM classifier by adding a geometrical transformation model
Other than the human-labeled data (blue) and the I frames (green), the remaining frames are processed through the computed geometrical transformation model and termed P frames (red). This terminology borrows from the domain of video compression.
Figure 27. Example of 2 pairs of control points on detected needle from separate frames
To compute the GTM, the control points are selected to solve the transformation equations. The first pair of control points, which can be considered as $(x_t, y_t)$ and $(x_{t'}, y_{t'})$, is found by a corner detector that locates the tip of the instruments. The second pair is defined as the control points on the centerline axis at a user-defined spatial distance from the instrument tip. Multiple pairs of points are expected to generate better tracking results. The GTM is computed by (5.1), where $S_x$ and $S_y$ are scaling factors, $\theta$ is the rotation angle, and $T_x$ and $T_y$ are spatial translation parameters. At least two sets of equations are required if $S_x$ and $S_y$ are assumed identical; thus at least two pairs of control points are needed.

$$
\begin{pmatrix} x_{t'} \\ y_{t'} \\ 1 \end{pmatrix}
=
\begin{pmatrix} S_x & 0 & 0 \\ 0 & S_y & 0 \\ 0 & 0 & 1 \end{pmatrix}
\cdot
\begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}
\cdot
\begin{pmatrix} 1 & 0 & T_x \\ 0 & 1 & T_y \\ 0 & 0 & 1 \end{pmatrix}
\cdot
\begin{pmatrix} x_t \\ y_t \\ 1 \end{pmatrix}
\qquad (5.1)
$$
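For the special case $S_x = S_y$, the parameters in (5.1) can be recovered from control-point pairs with OpenCV's 4-DOF similarity estimator, as sketched below; the point coordinates are invented for illustration.

```python
# Sketch of recovering the parameters of (5.1) for the case Sx = Sy.
# cv2.estimateAffinePartial2D fits a 4-DOF similarity transform (uniform
# scale, rotation, translation); the point coordinates below are invented.
import cv2
import numpy as np

src = np.float32([[120, 200], [135, 210]])   # tip + centerline point, frame t
dst = np.float32([[118, 204], [133, 215]])   # matched points in frame t'
M, _ = cv2.estimateAffinePartial2D(src, dst)
scale = np.hypot(M[0, 0], M[1, 0])           # S
theta = np.arctan2(M[1, 0], M[0, 0])         # rotation angle
tx, ty = M[:, 2]                             # translation
```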
To investigate the effect of introducing the GTM, different I-P rates are applied. Table 5 lists the I-P rates that are tested. A 1-0 rate indicates that no P frame is generated, i.e., all frames between labeled frames are obtained from the SVM; a 1-1 rate indicates that every other frame is generated by the SVM as an I frame and the frame between them is a P frame from the GTM; and a 1-ALL rate indicates that only the frames adjacent to the labeled frames come from the SVM while all others come from the GTM.
Table 5. Different I-P rates that are applied for optimal results
Post-processing
Post-processing is applied to every I frame and P frame. It includes two operations: convex hull and mirroring folded-detection. The convex hull is applied to bridge the fragments of a predicted object that should belong to one complete object. Mirroring folded-detection must be used more carefully, as it is designed to recover missed detections of the foreground instrument under the assumption that the instrument is geometrically symmetric. Both methods help in tracing the instrument movement.
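The convex-hull step can be sketched as below with scikit-image standing in for the original implementation; `pred` is an assumed binary prediction mask.

```python
# Sketch of the convex-hull step; `pred` is an assumed binary prediction
# mask. In practice the hull is taken over the fragments of a single
# predicted instrument (e.g., within its ROI), not the whole frame.
from skimage.morphology import convex_hull_image

bridged = convex_hull_image(pred)
# Mirroring folded-detection would additionally reflect the detected half of
# a symmetric instrument about its centerline axis; it is not sketched here.
```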
5.4 Results and Discussion
5.4.1 Video Registration of Pupil
Methodology 1: Traditional Image Processing and Registration Methods
To test the different traditional image processing and registration algorithms, we extracted the first frame of the video clip with index CCR903. This video was chosen because it is the most blurred of all our videos in visual quality. All other videos have higher color contrast and sharper edges between the pupil and the surrounding cornea, and between the cornea and the rest of the eye. That frame is shown in Figure 28(a).
• Approach 1. Edge Detection
Edge detection is applied to draw the outline of the pupil or limbus in order to center the eyeball and stabilize it within the operative field. Figure 28(b) shows edge detection using a double-threshold operator called the Canny detector, a state-of-the-art edge detection approach. The expectation is to keep the edges of the limbus or pupil so that the center of the pupil can be localized, but the result is not robust enough, since plenty of extraneous detail remains in the output. Since it was not reliable enough to identify either the limbus or the pupil margin, the work then focused on interactive segmentation based on color information.
Figure 28. Raw frame image and the edge detection results
(a) Extracted frame from video CCR903, and (b) Edge detection of (a) using Canny operator
• Approach 2. Paint Selection
Figure 29 lists the segmentation of cornea and pupil and the related bounding box of
the extracted foreground objects from the method termed as Paint Selection.
Compared to previous results, they are slightly improved in visual quality.
Figure 29. Segmentation of cornea and pupil by Paint Selection
(a) Extracted area of cornea, (b) Bounding box (yellow) of cornea in (a), and (c) Extracted area
of pupil, and (d) Bounding box (yellow) of pupil in (c)
Microsoft described this method in 2009 [107]. It is a painting-based tool for local segmentation of images. Its core is the GrabCut model, an interactive foreground extraction approach in which the user drags the mouse to paint a stroke and labels it either foreground or background. Paint Selection then uses the pixel information in the labeled area to build a color model by fitting a Gaussian Mixture Model (GMM), propagates it to neighboring areas, and updates the GMM along the way. The same procedure also happens in the background areas. The foreground and background models are thus updated until the user is satisfied with the segmentation and terminates the program. This method requires human inspection for every input frame and easily introduces errors if the image quality is limited.
• Approach 3. Shape Fitting
Shape Fitting uses a circular or elliptical shape to fit the area of the cornea or pupil, with human intervention to better fit the object. The work is done with the software in Ref [81], which provides oval and elliptical selection tools to draw the desired shapes. The tool used here is the oval selection tool: it draws an oval with 8 handle points on the boundary, and clicking and dragging any of those points adjusts the location, width, height, and aspect ratio. The results of shape fitting are shown in Figure 30. The centroid position is then measured from the location of the oval shape.
Figure 30. Oval shape fitting of cornea and pupil using ImageJ (yellow)
In this approach, we adjust the oval shape to fit the cornea/pupil area. The better the oval fits the target area, the more accurate the centroid measurement should be. Again, this method is not reliable: although the visual quality of the outlines is superior to the previous results, it still requires human work to complete.
• Approach 4. Histogram Analysis
The fourth approach is based on the histogram analysis of the RGB channels, shown in Figure 31. By choosing a proper threshold range in each channel, object extraction results are obtained, shown as binary images in Figure 32. Different threshold ranges lead to results with different retained content.
Figure 31. Histogram of red, green, and blue channels of Figure 28(a)
Initially, histogram analysis is considered to be a robust method to get high quality
results, which should contain cleanly segmented objects. However, pixels with
similar intensities may lie in different unconnected areas; thus, unique global
threshold values cannot provide expected results. If we still want to use histogram
analysis to do the object extraction, we need to look into adaptive threshold methods.
Otherwise, global thresholding methods are likely to give unconnected and unwanted
regions as shown in Figure 32.
Figure 32. Extraction of cornea based on different threshold values in 3 channels
The left result is generated from wider threshold ranges: [90, 130] for red channel, [115, 153]
for green channel, and [105, 157] for blue channel; and the right one is from narrower ranges:
[145, 150] for red channel, [180, 210] for green channel, and [177, 190] for blue channel. In
each channel, pixels within the threshold range remain in the output. Thus, more details are
retained in the left image.
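The channel-thresholding behind Figure 32 reduces to a joint mask over the three channels, as in the sketch below (using the left image's wider ranges); the frame variable is an assumption.

```python
# Sketch of the channel-threshold extraction behind Figure 32, using the
# left image's wider ranges; `frame_rgb` is an assumed H x W x 3 RGB array.
import numpy as np

def channel_mask(rgb, r_rng, g_rng, b_rng):
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return ((r_rng[0] <= r) & (r <= r_rng[1]) &
            (g_rng[0] <= g) & (g <= g_rng[1]) &
            (b_rng[0] <= b) & (b <= b_rng[1]))

mask = channel_mask(frame_rgb, (90, 130), (115, 153), (105, 157))
```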
• Approach 5. Human Labeling
The last method is human labeling, which is used to measure the accuracy of the four methods above. A drawing tool is used to label the area of the cornea and pupil; the results are shown in Figure 33. The centroid coordinates of the segmented results from Approaches 2 and 3 are calculated and compared with the human labeling results; since Approach 4 is not as accurate as expected, its centroid was not measured. The centroid coordinates in the Cartesian system are listed in Table 6.
Figure 33. Human labeled region of cornea (left) and pupil (right)
Table 6. Comparison of Experimental Results by Programs and Ground Truth
• Approach 6. Optical Flow
Optical Flow is a fully automatic method to register the image content. It performs very well when applied to neighboring frames; however, since this method must register every frame against the first frame, it tends to deform the foreground objects when the temporal span between two frames is large. Figure 34 shows an example of this effect for frame 30, the last frame of second 1 in video CCR903. In fact, when Optical Flow is applied only to neighboring frames, every frame is registered to its predecessor; the stabilization effect is then not obvious, though the shape of the instruments is retained well.
Figure 34. Raw frame (frame 30, left) and the registration result (right)
Methodology 2: Adaptive Template Registration
First, the adaptive templates and modified adaptive templates for video CCR903, along with the process of outlining the limbus contour, are shown in Figure 35 and Figure 36, respectively. In the modified templates on the right, the instrument signal is clearly reduced while the background content is retained. The registration results from our proposed algorithm are shown in Figure 37 and Figure 38, comparing an expert surgeon (video CCR903) and a novice surgeon (CCR988). Note that the video frames on the left, taken from the raw video clips, show spatial jumps at times, while the registered video frames keep the pupil area at a fixed position.
Figure 35. Adaptive templates (left) and modified adaptive templates (right) for video CCR903 at different times: (a) 10 sec, (b) 20 sec, (c) 30 sec, and (d) 40 sec.
Figure 36. Compared results for detected contour (middle) and labeled contour (right)
The image on the left indicates the selection of initial points. As mentioned before, this process
only occurs once and the initial points are shared by the whole video frame sequence. The outline
contours in all frames are well aligned with human perception.
Figure 37. Raw video frame (left) vs. registered video frame (right) for video CCR903
(a)-(d) are frames at different surgery times. The black bounding box indicates the template-matched area, while the red contour indicates the contour extracted by the snake model proposed in Ref [105]. A well-registered video also minimizes the human inspection needed when using the snake model.
Figure 38. Raw video frame (left) vs. registered video frame (right) for video CCR988
Video CCR988 is recorded for a novice resident who has 27 prior cases. The instrument motion
in this video is relatively slow compared to Video CCR903. This, to some extent, reduces the
error in video frame registration. Template-matched area and pupil contour are marked in white.
Discussion
We draw a number of conclusions from the results of the listed approaches. For Methodology 1, we first define the "best" result (the "ground truth") as the boundary drawn by a human observer, as described in Approach 5. Approaches 2 and 3 provide reliable detection of the centroid of the limbus and pupil compared to the ground truth, if the error tolerance is defined as within a few pixels (e.g., ±5 pixels). Comparing Approaches 1-5, the procedures sorted by quality from high to low are 3 > 2 > 5 > 4 > 1. However, all of these approaches require manual inspection to some extent. Approach 6, which performs automatic registration, introduces deformation in the results that directly affects the tasks that follow, so that method is not reliable either.
Although these approaches fail to provide fully satisfactory results, it is important to understand how they could be improved based on error analysis. For example, Approach 3, shape fitting, suggests exploring the angular rotation of contours. Defining (x, y) as the centroid coordinate and Q as the rotation angle between the major/minor axes and the horizontal/vertical axes of an elliptical contour, inaccurate contour location arises from at least two factors: 1) the angular rotation Q due to eye movement (abduction, adduction, supraduction, or infraduction), which causes variance in the measurement of (x, y) and which the existing approach does not adjust for; and 2) inaccuracy in human labeling, which cannot be eliminated but ideally should be reduced by combining multiple labelings by the same or different human observers. To reduce errors from angular rotation, elliptical shape selection that adjusts (x, y) and Q could be used to rotate or resize the area and adjust the eccentricity.
In summary, Approaches 1 and 2 require human inspection to reduce unwanted detections, and Approach 3, though semi-automatic, is still time-consuming when processing a frame sequence. Thus, processing a longer video clip is difficult for all of the above approaches. For example, a 2-minute surgery video at 30 frames per second (fps) has 3,600 frames in total. Labeling a large number of frames by a human observer is very tedious and tiring, and accuracy may decrease with time. Approach 6 does not retain the instrument shapes. All of these results therefore motivate the development of automatic registration methods.
In comparison, Methodology 2, the proposed adaptive template registration, produces greatly improved registration results in both visual quality and numerical measurement. Its most important advantage is that it is automated with minimal human intervention and is not sensitive to inter-frame inhomogeneity in illumination or color contrast. Moreover, the pupil area is outlined by the snake model, which computes the image constraints from the initially selected edge points [105]. This requires only a small amount of manual work, as the registered video frame sequence can share one set of initial edge points. The experimental results show that this algorithm can generate stabilized video for further study.
5.4.2 Instrument Identification and Tracking
Before showing the identification and tracking results, the labeled data and features are visualized in Figure 39. All parts of the figure except Figure 39(b) are binary-valued. Figure 39(b) can be compared with the raw image data in Figure 28(a) to see the K-means segmentation result (K = 5) with clusters labeled in different colors.
Figure 39. Visualization of labeled data and feature input for Figure 28(a)
(a) human-labeled instrument, (b) color index (not binary), (c) gradient with direction constraints, (d) thick edge, (e) estimated possible instrument, (f) hue, (g) saturation, and (h) saturation of specular reflection.
Figure 40 and Figure 41 show the results from the default trained SVM model with an I-P rate of 1-0 and the improved instrument tracking model with an I-P rate of 1-1, respectively. In the comparison in Figure 41, the detected area is outlined by blue contours. Although the default SVM predictions track the instrument accurately in most cases, some unexpected signals still remain, as shown in Figure 40(b). The improved tracking algorithm reduces that background image content, since the geometrical transformation model estimates a probable region from motion analysis.
Figure 40. Registered video frame after Canny edge detection (left) vs. registered video
frame with instrument predicted by SVM (right) for video CCR903
The right part of (a) shows an accurate prediction result for the second frame. (b) also shows the result for the needle but with a few more errors. (c) shows a frame taken from another part of the video in which no instrument is used, and (d) is the prediction for the forceps.
Figure 41. Instrument predicted by SVM (right) for video CCR903 vs. improved instrument
tracking for video CCR903
Currently, our work still focuses on improving the SVM-GTM pipeline. Future work may target adding more training data to predict the forceps more accurately.
Discussion
In realizing the current Stage II results, there are two main difficulties in algorithm implementation. First, the nature of our problem and the video quality limit the feature extraction. The frame size is 540×720 pixels, while the width of an instrument is less than 10 pixels. The image features for the foreground objects extracted from edges, the centerline axis, or surveillance points are too sensitive to perform well for all video frames. Moreover, since the resolution of the raw videos is not high, even after image enhancement and denoising, the foreground image structure is not of high contrast. Second, the control points are hard to choose. Some points look like good candidates for control points but do not exist in all video frames. Other control points exist in a large majority of the frames but are located outside the limbus, on blood vessels or external devices.
To train an SVM classifier with good results, our work has focused on applying various feature detectors to extract points of interest that identify instrument movement. For most feature detectors, image enhancement is required to improve detection results. From studies of existing detectors and published comparisons, Hessian-Affine detectors are known to be more reliable than Harris-Affine detectors [109], and the scale-invariant feature transform (SIFT) detector outperforms other detectors by representing the distribution of local features within the interest point neighborhood as a feature descriptor [112, 113]. The SIFT detector uses matrix-based measurements, and the SIFT descriptor contains local histogram distributions, selecting interest points at distinctive locations such as corners (tips of instruments), blobs (small globules inside liquid), and junctions (crossings of vessels). At the detection output for each frame, each interest point is represented as a feature vector, and Euclidean distance is used to compute the degree of matching between interest points in two image frames.
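A minimal sketch of such detection and Euclidean-distance matching, assuming OpenCV and grayscale frames (illustrative, not the dissertation's exact code):

    import cv2

    def match_sift(frame1_gray, frame2_gray):
        """Detect SIFT keypoints/descriptors in two frames and match them
        by Euclidean (L2) distance, keeping only distinctive matches."""
        sift = cv2.SIFT_create()
        kp1, des1 = sift.detectAndCompute(frame1_gray, None)
        kp2, des2 = sift.detectAndCompute(frame2_gray, None)
        matcher = cv2.BFMatcher(cv2.NORM_L2)  # Euclidean distance on descriptors
        matches = matcher.knnMatch(des1, des2, k=2)
        # Lowe's ratio test: accept a match only if it is clearly better
        # than the second-best candidate.
        return [m for m, n in matches if m.distance < 0.75 * n.distance]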
However, it turns out that such detectors generate too many "interest points," and most of them are not useful for recognizing instrument movement. The extra work needed to remove unmatched interest points increases the risk of false tracking. Therefore, a corner function is used to detect the tip of the instrument, and another control point is located on the centerline axis of the instrument at a fixed distance from the tip. These control points serve well for calculating the geometrical transformation model and contribute to improved instrument tracking.
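A hedged sketch of this control point construction, assuming the centerline direction is already available from the line segment detection step (the window sizes and offset below are illustrative):

    import cv2
    import numpy as np

    def tip_and_axis_point(gray, axis_dir, offset=40):
        """Take the strongest Harris corner response as the instrument tip,
        then place a second control point a fixed distance along the
        instrument's centerline axis."""
        response = cv2.cornerHarris(np.float32(gray), blockSize=5, ksize=3, k=0.04)
        row, col = np.unravel_index(np.argmax(response), response.shape)
        tip = np.array([col, row], dtype=float)      # (x, y) of the tip
        unit = np.asarray(axis_dir, dtype=float)
        unit /= np.linalg.norm(unit)                 # unit centerline direction
        return tip, tip + offset * unit              # tip and second control point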
5.5 Conclusion
In this work, our goal is to build an assessment system that evaluates surgical techniques objectively, accurately, and reliably. Quantitative analysis using computer vision and image processing, together with qualitative evaluation based on questions, forms part of a system to improve the training and skills of surgical residents.
Our proposed system has three stages. First, raw video is sent to a video frame registration stage in which the position of the limbus or pupil is fixed from frame to frame as much as possible. In Stage II, the registered video is processed and analyzed to understand instrument movement. Finally, all movement is quantified to support surgeon evaluation and improvement. Results show that the limbus was reliably identified and tracked through the entire video clip, even when the eye moved significantly or partly left the field of view. Moving objects (surgical instruments) within the eye are then identified in the stabilized (registered) sequence based on a learning/prediction model. Line segment detection also helps to outline specific instruments.
The overall system is still being developed and improved. The final goal is to produce objective numerical scores for surgical techniques and to provide feedback based on the evaluation of each task. Stage II in the overall pipeline is nearly complete. After identification and tracking of the instruments, the next step is to define, record, and recognize the instrument motions. Future work to count the number of insertions and attempts at grasping the capsule is underway and is discussed further in Chapter 6.
Chapter 6 Conclusion and Future Work
In this dissertation, our efforts on three specific problems are summarized: high-throughput screening (HTS) for efficient drug discovery for peroxisome biogenesis disorders (PBDs), mitochondria segmentation from a confocal microscopy imaging system, and objective analysis of capsulorhexis surgical skills. For all three topics, our work proposes learning-based pipelines that cooperate with image processing techniques and aim at fully automated processing. A brief summary of our research is given in Section 6.1, and future research directions are discussed in Section 6.2.
6.1 Conclusion
For the first topic, two pipelines for high-throughput screening of peroxisome assembly rescue, aimed at drug discovery targeting peroxisome proliferators, are presented.
First, the development of the "Peroxitracker" pipeline is presented, which screens large chemical libraries for compounds by applying image-level and feature-level enhancement and smoothing methods. Compared to the improved CP/CPA software pipeline, Peroxitracker is more sensitive and reliable in detecting recovery of peroxisome assembly in PBD patient cells. It achieves a Z' factor of 0.72 when analyzing the same cell samples, indicating how widely separated the score distributions of the positive and negative controls that enhance peroxisome assembly are. In addition, its performance is the closest to human observation among the pipelines evaluated.
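For reference, the Z' factor proposed by Zhang et al. [91] quantifies this separation from the means (μ) and standard deviations (σ) of the positive (p) and negative (n) control scores:

    Z' = 1 − 3(σ_p + σ_n) / |μ_p − μ_n|

Values of Z' between 0.5 and 1 are conventionally taken to indicate an excellent assay, so the Peroxitracker score of 0.72 places it comfortably in that range.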
For the CP/CPA software pipeline, we apply mean shift for illumination correction of the digital images and local adaptive thresholding for the extraction of peroxisome punctate structures. Post-processing is designed to select images that provide credible information. By applying machine learning, we can automatically classify the cells into four types: positive, negative, questionable, and non-candidate cells. This pipeline achieved a Z' factor of 0.44, which confirms the pipeline as a validated assay for real drug screening. The work could be improved in two directions: 1) modeling the properties and features of the questionable cells to improve the machine learning and classification; and 2) finding more efficient criteria to automate the high-level post-processing image filtering in order to make the assay more robust.
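A minimal sketch of the punctate structure extraction step, assuming scikit-image; for simplicity the illumination correction below uses a coarse Gaussian background estimate rather than mean shift, and the filename, block size, and offset are illustrative:

    import numpy as np
    from skimage import io, filters

    img = io.imread("cell_channel.tif").astype(float)   # hypothetical input image
    background = filters.gaussian(img, sigma=50)        # coarse illumination estimate
    corrected = img - background                        # flatten uneven illumination
    # Local adaptive threshold: each pixel is compared against a threshold
    # computed from its own neighborhood, so dim puncta on a bright cytosol
    # background can still be separated.
    local_thr = filters.threshold_local(corrected, block_size=35, offset=-2.0)
    puncta_mask = corrected > local_thr                 # binary mask of punctate structures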
In addition to the HTS topic, mitochondria segmentation is realized in a two-stage system that couples principles from machine learning and global optimization algorithms. Performance is improved by applying a grouping method that classifies the input image data using the contrast ratio of global to local intensity variation in Stage I, and by connecting centerline fragments in Stage II. In the remainder of this work, the standard Haralick features of every pixel in every sample are calculated, and a logistic regression classifier trained with leave-one-out cross-validation predicts the probability of each pixel being foreground. While the overall segmentation system achieves an accuracy of 98%, there is still room for improvement.
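A minimal sketch of this texture classification step, assuming mahotas and scikit-learn are available; the patch size is an illustrative choice, and leave-one-out is applied at the sample (image) level:

    import numpy as np
    import mahotas
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import LeaveOneGroupOut

    def patch_haralick(img, y, x, half=7):
        """13 standard Haralick features of the patch centered on (y, x),
        averaged over the four GLCM directions."""
        patch = img[max(y - half, 0):y + half + 1, max(x - half, 0):x + half + 1]
        return mahotas.features.haralick(patch.astype(np.uint8), return_mean=True)

    def foreground_probability(X, labels, groups):
        """Leave one sample out: train logistic regression on the pixels of
        all other samples and predict the foreground probability of each
        pixel in the held-out sample."""
        proba = np.zeros(len(labels), dtype=float)
        for train_idx, test_idx in LeaveOneGroupOut().split(X, labels, groups):
            clf = LogisticRegression(max_iter=1000)
            clf.fit(X[train_idx], labels[train_idx])
            proba[test_idx] = clf.predict_proba(X[test_idx])[:, 1]
        return proba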
For the objective assessment system for surgical techniques, the research motivation is to measure surgical technique accurately and objectively. Combining image analysis and computer vision techniques provides an objective way to measure proficiency in the capsulorhexis step of cataract surgery, and may provide reliable tools to accelerate the learning of surgical skills for residents early in their careers. This work explores new applications and improved techniques for computer learning and data understanding. One difficulty is that video data processing requires algorithms robust enough to maintain temporal consistency. The video quality also limits feature extraction for the learning model. The challenging cases are overcome by coupling a geometrical transformation model to the learning model. It is expected that, in the future, the assessment system will provide objective grades by fusing all inter-system decisions.
6.2 Future Research Directions
Research on the evaluation of cataract surgical techniques is still in progress. In the proposed three-stage assessment system, we are currently at Stage II, and future work should focus on improving instrument identification and movement recognition. One possible solution is to add knowledge of temporal consistency for the rigid instrument and its possible movement area. As discussed in Chapter 2, one of the significant metrics for capsulorhexis is the geometrical property of the capsular opening. Current evaluation tools ask questions about it directly, but the descriptions of the related grades are subject to each rater's individual interpretation. It would therefore be extremely helpful if the program could precisely and automatically identify and measure the roundness of the capsular opening, which is difficult to judge even for a human observer. Moreover, the trace of the instrument tip does not necessarily indicate the contour of the opening.
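One plausible way to quantify this roundness, sketched below under the assumption that a binary mask of the capsular opening is available, is the classical circularity measure 4πA/P², which equals 1 for a perfect circle and decreases for irregular shapes:

    import math
    import cv2

    def circularity(opening_mask):
        """Circularity 4*pi*A / P^2 of the largest contour in a binary
        (uint8) mask of the capsular opening."""
        contours, _ = cv2.findContours(opening_mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        c = max(contours, key=cv2.contourArea)   # assume largest contour = opening
        area = cv2.contourArea(c)
        perimeter = cv2.arcLength(c, closed=True)
        return 4.0 * math.pi * area / (perimeter ** 2)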
Stage III requires the classification and quantification of instrument movements, as well as counting the number of attempts and of successful attempts. This is more challenging because the definition of "successful" attempts may require expert knowledge and practical experience in cataract surgery. Attention should also be paid to cases of unexpected opening such as the "Argentinian flag" sign, a common complication in capsulorhexis. It occurs when a radial tear of the capsule extends into the periphery such that the capsule, stained blue over a white cataract, shows parallel blue and white stripes like those of the Argentinian flag.
This is the goal of Task 4, which investigates the force/angular/position management of the opening during cataract surgery. A radial tear producing the Argentinian flag sign is definitely unwanted, so this kind of event should also be predicted and reported by the system, which means the analysis of errors in cataract surgery must also be learned carefully.
Our future work will include focused efforts on better instrument prediction and on robustness testing of the algorithms. Our goal is to track instruments so that their movement can be classified and quantified, and to evaluate the quality of the capsular opening numerically and scientifically. It is hoped that in the future our work will contribute significantly to teaching interventions for cataract surgical techniques.
Bibliography
[1] H. K. Huang. Biomedical image processing. Crit. Rev. Bioeng. 5(3), pp. 185-271. 1981.
[2] A. P. Dhawan. A review on biomedical image processing and future trends. Comput.
Methods Programs Biomed. 31(3), pp. 141-183. 1990.
[3] J. Ruiz-Alzola, C. Alberola-López and C. Westin. Advanced signal processing methods for biomedical imaging. International Journal of Biomedical Imaging, vol. 2013, 2013.
[4] M. N. Wernick, Y. Yang, J. G. Brankov, G. Yourganov and S. C. Strother. Machine
learning in medical imaging. Signal Processing Magazine, IEEE 27(4), pp. 25-38. 2010.
[5] O. Lézoray, C. Charrier, H. Cardot and S. Lefèvre. Machine learning in image processing. EURASIP Journal on Advances in Signal Processing 2008(1), article 927950. 2008.
[6] L. De Raedt. A perspective on inductive databases. ACM SIGKDD Explorations Newsletter
4(2), pp. 69-77. 2002.
[7] C. Helma. Predictive Toxicology 2005.
[8] U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth and R. Uthurusamy. Advances in
knowledge discovery and data mining. 1996.
[9] T. Imielinski and H. Mannila. A database perspective on knowledge discovery. Commun
ACM 39(11), pp. 58-64. 1996.
[10] T. Hastie, R. Tibshirani and J. Friedman. The Elements of Statistical Learning, 2nd ed. 2009.
[11] M. E. Tipping. Sparse Bayesian learning and the relevance vector machine. The Journal of Machine Learning Research 1, pp. 211-244. 2001.
[12] R. Spinks, V. A. Magnotta, N. C. Andreasen, K. C. Albright, S. Ziebell, P. Nopoulos and
M. Cassell. Manual and automated measurement of the whole thalamus and mediodorsal
nucleus using magnetic resonance imaging. Neuroimage 17(2), pp. 631-642. 2002.
[13] A. Wismüller, F. Vietze, J. Behrends, A. Meyer-Baese, M. Reiser and H. Ritter. Fully
automated biomedical image segmentation by self-organized model adaptation. Neural
Networks 17(8), pp. 1327-1344. 2004.
[14] Z. Lao, D. Shen, Z. Xue, B. Karacali, S. M. Resnick and C. Davatzikos. Morphological
classification of brains via high-dimensional shape transformations and machine learning
methods. Neuroimage 21(1), pp. 46-57. 2004.
[15] P. Khurd, R. Verma and C. Davatzikos. On characterizing and analyzing diffusion tensor
images by learning their underlying manifold structure. Presented at Computer Vision and
Pattern Recognition Workshop, 2006. CVPRW'06. Conference on. 2006.
[16] C. Davatzikos, K. Ruparel, Y. Fan, D. Shen, M. Acharyya, J. Loughead, R. Gur and D. D.
Langleben. Classifying spatial patterns of brain activity with machine learning methods:
Application to lie detection. Neuroimage 28(3), pp. 663-668. 2005.
[17] C. D. Good, I. S. Johnsrude, J. Ashburner, R. N. Henson, K. Fristen and R. S. Frackowiak.
A voxel-based morphometric study of ageing in 465 normal adult human brains. Presented
at Biomedical Imaging, 2002. 5th IEEE EMBS International Summer School on. 2002.
[18] F. A. Gonzalez, E. Romero and I. Global. Biomedical Image Analysis and Machine
Learning Technologies: Applications and Techniques 2010.
[19] J. M. Lehmann, L. B. Moore, T. A. Smith-Oliver, W. O. Wilkison, T. M. Willson and S. A.
Kliewer. An antidiabetic thiazolidinedione is a high affinity ligand for peroxisome
proliferator-activated receptor gamma (PPAR gamma). J. Biol. Chem. 270(22), pp.
12953-12956. 1995.
[20] W. C. Hubbard, A. B. Moser, S. Tortorelli, A. Liu, D. Jones and H. Moser. Combined
liquid chromatography–tandem mass spectrometry as an analytical method for high
throughput screening for X-linked adrenoleukodystrophy and other peroxisomal disorders:
Preliminary findings. Mol. Genet. Metab. 89(1), pp. 185-187. 2006.
[21] S. Weller, S. J. Gould and D. Valle. Peroxisome biogenesis disorders. Annual Review of
Genomics and Human Genetics 4(1), pp. 165-211. 2003.
[22] J. Z. Sexton, Q. He, L. J. Forsberg and J. E. Brenman. High content screening for non-classical peroxisome proliferators. Int. J. High. Throughput Screen. 2010(1), pp. 127-140. 2010. DOI: 10.2147/IJHTS.S10547.
[23] P. Newsholme, E. P. Haber, S. M. Hirabara, E. L. Rebelato, J. Procopio, D. Morgan, H. C. Oliveira-Emilio, A. R. Carpinelli and R. Curi. Diabetes associated cell stress and dysfunction: Role of mitochondrial and non-mitochondrial ROS production and activity. J. Physiol. 583(Pt 1), pp. 9-24. 2007. PII: jphysiol.2007.135871.
[24] C. Bonnard, A. Durand, S. Peyrol, E. Chanseaume, M. A. Chauvin, B. Morio, H. Vidal and J. Rieusset. Mitochondrial dysfunction results from oxidative stress in the skeletal muscle of diet-induced insulin-resistant mice. J. Clin. Invest. 118(2), pp. 789-800. 2008. DOI: 10.1172/JCI32601.
[25] H. F. Jheng, P. J. Tsai, S. M. Guo, L. H. Kuo, C. S. Chang, I. J. Su, C. R. Chang and Y. S. Tsai. Mitochondrial fission contributes to mitochondrial dysfunction and insulin resistance in skeletal muscle. Mol. Cell. Biol. 32(2), pp. 309-319. 2012. DOI: 10.1128/MCB.05603-11.
[26] C. Brons, S. Jacobsen, N. Hiscock, A. White, E. Nilsson, D. Dunger, A. Astrup, B. Quistorff and A. Vaag. Effects of high-fat overfeeding on mitochondrial function, glucose and fat metabolism, and adipokine levels in low-birth-weight subjects. Am. J. Physiol. Endocrinol. Metab. 302(1), pp. E43-51. 2012. DOI: 10.1152/ajpendo.00095.2011.
[27] Y. Reis, M. Bernardo-Faura, D. Richter, T. Wolf, B. Brors, A. Hamacher-Brady, R. Eils
and N. R. Brady. Multi-parametric analysis and modeling of relationships between
mitochondrial morphology and apoptosis. PloS One 7(1), pp. e28694. 2012.
[28] R. J. Giuly, M. E. Martone and M. H. Ellisman. Method: Automatic segmentation of mitochondria utilizing patch classification, contour pair classification, and automatically seeded level sets. BMC Bioinformatics 13, article 29. 2012. DOI: 10.1186/1471-2105-13-29.
[29] E. Mumcuoglu, R. Hassanpour, S. Tasel, G. Perkins, M. Martone and M. Gurcan.
Computerized detection and segmentation of mitochondria on electron microscope images.
J. Microsc. 246(3), pp. 248-265. 2012.
[30] A. Lucchi, K. Smith, R. Achanta, G. Knott and P. Fua. Supervoxel-based segmentation of
mitochondria in em image stacks with learned shape features. Medical Imaging, IEEE
Transactions on 31(2), pp. 474-486. 2012.
[31] S. L. Cremers, A. N. Lora and Z. K. Ferrufino-Ponce. Global rating assessment of skills in
intraocular surgery (GRASIS). Ophthalmology 112(10), pp. 1655-1660. 2005.
[32] K. C. Golnik, H. Beaver, V. Gauba, A. G. Lee, E. Mayorga, G. Palis and G. M. Saleh.
Cataract surgical skill assessment. Ophthalmology 118(2), pp. 427-427. e5. 2011.
[33] R. J. Smith, C. A. McCannel, L. K. Gordon, D. A. Hollander, J. A. Giaconi, S. K. Stelzner,
U. Devgan, J. Bartlett and B. J. Mondino. Evaluating teaching methods of cataract surgery:
Validation of an evaluation tool for assessing surgical technique of capsulorhexis. Journal
of Cataract & Refractive Surgery 38(5), pp. 799-806. 2012.
[34] S. J. Lurie, C. J. Mooney and J. M. Lyness. Measurement of the general competencies of the accreditation council for graduate medical education: A systematic review. Acad. Med. 84(3), pp. 301-309. 2009. DOI: 10.1097/ACM.0b013e3181971f08.
[35] M. Sonka, V. Hlavac and R. Boyle. Image Processing, Analysis, and Machine Vision. Chapman & Hall, pp. 2-6. 1998.
[36] G. Baselli, E. Caiani, A. Porta, N. Montana, M. G. Signorini and S. Cerutti. Biomedical
signal processing and modeling in cardiovascular systems. Critical Reviews™ in
Biomedical Engineering 30(1-3), 2002.
[37] S. Cerutti, G. Baselli, A. M. Bianchi, E. Caiani, D. Contini, R. Cubeddu, F. Dercole, L. Di
Rienzo, D. Liberati and L. Mainardi. Biomedical signal and image processing. Pulse, IEEE
2(3), pp. 41-54. 2011.
[38] U. Sinha, A. Bui, R. Taira, J. Dionisio, C. Morioka, D. Johnson and H. Kangarloo. A
review of medical imaging informatics. Ann. N. Y. Acad. Sci. 980(1), pp. 168-197. 2002.
[39] G. Vernazza, S. B. Serpico and S. G. Dellepiane. A knowledge-based system for
biomedical image processing and recognition. Circuits and Systems, IEEE Transactions on
34(11), pp. 1399-1416. 1987.
[40] P. Sajda. Machine learning for detection and diagnosis of disease. Annu. Rev. Biomed. Eng. 8, pp. 537-565. 2006.
[41] R. P. Hertzberg and A. J. Pope. High-throughput screening: New technology for the 21st
century. Curr. Opin. Chem. Biol. 4(4), pp. 445-451. 2000.
[42] S. A. Sundberg. High-throughput and ultra-high-throughput screening: Solution-and
cell-based approaches. Curr. Opin. Biotechnol. 11(1), pp. 47-53. 2000.
[43] E. A. Martis and R. R. Somani. Drug designing, discovery and development techniques.
Edited by Purusotam Basnet pp. 19. 2012.
[44] S. Becker, H. Schmoldt, T. M. Adams, S. Wilhelm and H. Kolmar. Ultra-high-throughput
screening based on cell-surface display and fluorescence-activated cell sorting for the
identification of novel biocatalysts. Curr. Opin. Biotechnol. 15(4), pp. 323-329. 2004.
[45] L. Mere, T. Bennett, P. Coassin, P. England, B. Hamman, T. Rink, S. Zimmerman and P.
Negulescu. Miniaturized FRET assays and microfluidics: Key components for
ultra-high-throughput screening. Drug Discov. Today 4(8), pp. 363-369. 1999.
[46] J. Khandurina and A. Guttman. Microchip-based high-throughput screening analysis of
combinatorial libraries. Curr. Opin. Chem. Biol. 6(3), pp. 359-366. 2002.
[47] A. Carnero. High throughput screening in drug discovery. Clinical and Translational
Oncology 8(7), pp. 482-490. 2006.
[48] R. Macarron, M. N. Banks, D. Bojanic, D. J. Burns, D. A. Cirovic, T. Garyantes, D. V.
Green, R. P. Hertzberg, W. P. Janzen and J. W. Paslay. Impact of high-throughput
screening in biomedical research. Nature Reviews Drug Discovery 10(3), pp. 188-195.
2011.
[49] P. K. Dranchak, E. Di Pietro, A. Snowden, N. Oesch, N. E. Braverman, S. J. Steinberg and
J. G. Hacia. Nonsense suppressor therapies rescue peroxisome lipid metabolism and
assembly in cells from patients with specific PEX gene mutations. J. Cell. Biochem. 112(5),
pp. 1250-1258. 2011.
[50] P. K. Sahoo, S. Soltani and A. K. Wong. A survey of thresholding techniques. Computer
Vision, Graphics, and Image Processing 41(2), pp. 233-260. 1988.
[51] P. Gibbs, D. Buckley, S. Blackband and A. Horsman. Tumour volume determination from
MR images by morphological segmentation. Phys. Med. Biol. 41(11), pp. 2437. 1996.
[52] S. Pohlman, K. A. Powell, N. A. Obuchowski, W. A. Chilcote and S. Grundfest‐
Broniatowski. Quantitative classification of breast tumors in digitized mammograms. Med.
Phys. 23(8), pp. 1337-1345. 1996.
[53] I. Manousakas, P. Undrill, G. Cameron and T. Redpath. Split-and-merge segmentation of
magnetic resonance medical images: Performance evaluation and extension to three
dimensions. Computers and Biomedical Research 31(6), pp. 393-412. 1998.
[54] J. Bezdek, L. Hall and L. Clarke. Review of MR image segmentation techniques using
pattern recognition. Med. Phys. 20(4), pp. 4. 1993.
[55] G. B. Coleman and H. C. Andrews. Image segmentation by clustering. Proc IEEE 67(5),
pp. 773-785. 1979.
[56] A. K. Jain and R. C. Dubes. Algorithms for Clustering Data 1988.
[57] J. Besag. On the statistical analysis of dirty pictures. Journal of the Royal Statistical
Society.Series B (Methodological) pp. 259-302. 1986.
[58] E. Bardinet, L. D. Cohen and N. Ayache. A parametric deformable model to fit
unstructured 3D data. Comput. Vision Image Understanding 71(1), pp. 39-54. 1998.
[59] A. Neumann and C. Lorenz. Statistical shape model based segmentation of medical images.
Comput. Med. Imaging Graphics 22(2), pp. 133-143. 1998.
[60] Y. Tsujimoto, T. Nakagawa and S. Shimizu. Mitochondrial membrane permeability
transition and cell death. Biochimica Et Biophysica Acta (BBA)-Bioenergetics 1757(9), pp.
1297-1300. 2006.
[61] J. Henry-Mowatt, C. Dive, J. Martinou and D. James. Role of mitochondrial membrane
permeabilization in apoptosis and cancer. Oncogene 23(16), pp. 2850-2860. 2004.
[62] J. Bereiter‐Hahn and M. Vöth. Dynamics of mitochondria in living cells: Shape changes,
dislocations, fusion, and fission of mitochondria. Microsc. Res. Tech. 27(3), pp. 198-219.
1994.
[63] A. Lucchi, K. Smith, R. Achanta, G. Knott and P. Fua. Supervoxel-based segmentation of EM image stacks with learned shape features. 2010.
[64] I. E. Sampe, H. Dann, Y. Tsai and C. Lin. Segmentation of mitochondria in fluorescence
micrographs by SVM. Presented at Biomedical Engineering and Informatics (BMEI), 2011
4th International Conference on. 2011.
[65] J. Peng, C. Lin and C. Hsu. Adaptive image enhancement for fluorescence microscopy.
Presented at Technologies and Applications of Artificial Intelligence (TAAI), 2010
International Conference on. 2010.
[66] D. Allen and A. Vasavada. Cataract and surgery for cataract. BMJ 333(7559), pp. 128-132. 2006. PII: 333/7559/128.
[67] M. Gillies, G. Brian, J. La Nauze, R. Le Mesurier, D. Moran, H. Taylor and S. Ruit.
Modern surgery for global cataract blindness: Preliminary considerations. Arch.
Ophthalmol. 116(1), pp. 90-92. 1998.
[68] L. Civerchia, R. D. Ravindran, S. W. Apoorvananda, R. Ramakrishnan, A. Balent, M. H.
Spencer and D. Green. High-volume intraocular lens surgery in a rural eye camp in india.
Ophthalmic Surg. Lasers 27(3), pp. 200-208. 1996.
[69] M. Colvard, Ed., Achieving Excellence in Cataract Surgery: A Step-by-Step Approach.
2009.
[70] H. V. Gimbel. Divide and conquer nucleofractis phacoemulsification: Development and
variations. Journal of Cataract & Refractive Surgery 17(3), pp. 281-291. 1991.
[71] R. J. Wanders. Peroxisomes, lipid metabolism, and human disease. Cell Biochem. Biophys.
32(1-3), pp. 89-106. 2000.
[72] R. Wanders. Peroxisomes, lipid metabolism, and peroxisomal disorders. Mol. Genet. Metab.
83(1), pp. 16-27. 2004.
[73] J. M. Powers and H. W. Moser. Peroxisomal disorders: Genotype, phenotype, major
neuropathologic lesions, and pathogenesis. Brain Pathology 8(1), pp. 101-120. 1998.
[74] Y. Fujiki, K. Okumoto, H. Otera and S. Tamura. Peroxisome biogenesis and molecular
defects in peroxisome assembly disorders. Cell Biochem. Biophys. 32(1-3), pp. 155-164.
2000.
[75] S. Steinberg, L. Chen, L. Wei, A. Moser, H. Moser, G. Cutting and N. Braverman. The
PEX gene screen: Molecular diagnosis of peroxisome biogenesis disorders in the zellweger
syndrome spectrum. Mol. Genet. Metab. 83(3), pp. 252-263. 2004.
[76] G. V. Raymond, R. O. Jones and A. B. Moser. Newborn screening for
adrenoleukodystrophy. Molecular Diagnosis & Therapy 11(6), pp. 381-384. 2007.
[77] W. C. Hubbard, A. B. Moser, A. C. Liu, R. O. Jones, S. J. Steinberg, F. Lorey, S. R. Panny,
R. F. Vogt Jr, D. Macaya and C. T. Turgeon. Newborn screening for X-linked
adrenoleukodystrophy (X-ALD): Validation of a combined liquid chromatography–tandem
mass spectrometric (LC–MS/MS) method. Mol. Genet. Metab. 97(3), pp. 212-220. 2009.
[78] J. Schaller, H. Moser, M. L. Begleiter and J. Edwards. Attitudes of families affected by
adrenoleukodystrophy toward prenatal diagnosis, presymptomatic and carrier testing, and
newborn screening. Genet. Test. 11(3), pp. 296-302. 2007.
[79] R. Zhang, L. Chen, S. Jiralerspong, A. Snowden, S. Steinberg and N. Braverman. Recovery of PEX1-Gly843Asp peroxisome dysfunction by small-molecule compounds. Proc. Natl. Acad. Sci. U. S. A. 107(12), pp. 5569-5574. 2010. DOI: 10.1073/pnas.0914960107.
[80] T. R. Jones, I. H. Kang, D. B. Wheeler, R. A. Lindquist, A. Papallo, D. M. Sabatini, P. Golland and A. E. Carpenter. CellProfiler Analyst: Data exploration and analysis software for complex image-based screens. BMC Bioinformatics 9, article 482. 2008. DOI: 10.1186/1471-2105-9-482.
[81] C. A. Schneider, W. S. Rasband, K. W. Eliceiri, J. Schindelin, I. Arganda-Carreras, E. Frise, V. Kaynig, M. Longair, T. Pietzsch and S. Preibisch. NIH Image to ImageJ: 25 years of image analysis. Nature Methods 9(7), 2012.
[82] D. Comaniciu and P. Meer. Mean shift: A robust approach toward feature space analysis.
Pattern Analysis and Machine Intelligence, IEEE Transactions on 24(5), pp. 603-619.
2002.
[83] Y. Cheng. Mean shift, mode seeking, and clustering. Pattern Analysis and Machine
Intelligence, IEEE Transactions on 17(8), pp. 790-799. 1995.
[84] D. K. Singh, C. Ku, C. Wichaidit, R. J. Steininger, L. F. Wu and S. J. Altschuler. Patterns
of basal signaling heterogeneity can distinguish cellular populations with different drug
sensitivities. Molecular Systems Biology 6(1), 2010.
[85] J. Peng, C. Lin, Y. Chen, L. Kao, Y. Liu, C. Chou, Y. Huang, F. Chang, Y. Wu and Y. Tsai.
Automatic morphological subtyping reveals new roles of caspases in mitochondrial
dynamics. PLoS Computational Biology 7(10), pp. e1002212. 2011.
[86] D. S. Bright and E. B. Steel. Two‐dimensional top hat filter for extracting spots and spheres
from digital images. J. Microsc. 146(2), pp. 191-200. 1987.
[87] L. O'Gorman, M. J. Sammon and M. Seul. Practical Algorithms for Image Analysis with
CD-ROM 2008.
[88] Y. S. Lin, C. C. Lin, Y. S. Tsai, T. C. Ku, Y. H. Huang and C. N. Hsu. A spectral graph theoretic approach to quantification and calibration of collective morphological differences in cell images. Bioinformatics 26(12), pp. i29-37. 2010. DOI: 10.1093/bioinformatics/btq194.
[89] D. S. Hochbaum, C. N. Hsu and Y. T. Yang. Ranking of multidimensional drug profiling data by fractional-adjusted bi-partitional scores. Bioinformatics 28(12), pp. i106-14. 2012. DOI: 10.1093/bioinformatics/bts232.
[90] J. Friedman, T. Hastie and R. Tibshirani. Additive logistic regression: A statistical view of
boosting (with discussion and a rejoinder by the authors). The Annals of Statistics 28(2), pp.
337-407. 2000.
[91] J. H. Zhang, T. D. Chung and K. R. Oldenburg. A simple statistical parameter for use in
evaluation and validation of high throughput screening assays. J. Biomol. Screen. 4(2), pp.
67-73. 1999.
[92] S. G. Regmi, S. G. Rolland and B. Conradt. Age-dependent changes in mitochondrial morphology and volume are not predictors of lifespan. Aging (Albany NY) 6(2), pp. 118-130. 2014. PII: 100639.
[93] V. Gauba, P. Tsangaris, C. Tossounis, A. Mitra, C. McLean and G. M. Saleh. Human
reliability analysis of cataract surgery. Arch. Ophthalmol. 126(2), pp. 173-177. 2008.
[94] T. A. Oetting. Surgical competency in residents. Curr. Opin. Ophthalmol. 20(1), pp. 56-60.
2009.
[95] T. A. Oetting, A. G. Lee, H. A. Beaver, A. T. Johnson, H. C. Boldt, R. Olson and K. Carter.
Teaching and assessing surgical competency in ophthalmology training programs.
Ophthalmic Surg. Lasers Imaging 37(5), pp. 384-393. 2006.
[96] S. Puri and S. Sikder. Cataract surgical skill assessment tools. Journal of Cataract &
Refractive Surgery 40(4), pp. 657-665. 2014.
[97] K. Dabov, A. Foi, V. Katkovnik and K. Egiazarian. Image denoising with block-matching
and 3D filtering. Presented at Electronic Imaging 2006. 2006.
[98] T. Brox, A. Bruhn, N. Papenberg and J. Weickert. High accuracy optical flow estimation
based on a theory for warping. In Computer Vision-ECCV 2004, pp. 25-36, 2004.
[99] J. Barron and N. Thacker. Tutorial: Computing 2D and 3D optical flow. Imaging Science
and Biomedical Engineering Division, Medical School, University of Manchester 2005.
[100] A. Bruhn, J. Weickert and C. Schnörr. Lucas/Kanade meets Horn/Schunck: Combining
local and global optic flow methods. International Journal of Computer Vision 61(3), pp.
211-231. 2005.
[101] S. Baker, D. Scharstein, J. Lewis, S. Roth, M. J. Black and R. Szeliski. A database and
evaluation methodology for optical flow. International Journal of Computer Vision 92(1),
pp. 1-31. 2011.
[102] S. N. Vitaladevuni and R. Basri. Co-clustering of image segments using convex
optimization applied to EM neuronal reconstruction. Presented at Computer Vision and
Pattern Recognition (CVPR), 2010 IEEE Conference on. 2010.
[103] K. Smith, A. Carleton and V. Lepetit. Fast ray features for learning irregular shapes.
Presented at Computer Vision, 2009 IEEE 12th International Conference on. 2009.
[104] M. Sezgin. Survey over image thresholding techniques and quantitative performance
evaluation. Journal of Electronic Imaging 13(1), pp. 146-168. 2004.
[105] M. Kass, A. Witkin and D. Terzopoulos. Snakes: Active contour models. International
Journal of Computer Vision 1(4), pp. 321-331. 1988.
[106] A. Artusi, F. Banterle and D. Chetverikov. A survey of specularity removal methods.
Presented at Computer Graphics Forum. 2011.
[107] C. Rother, V. Kolmogorov and A. Blake. Grabcut: Interactive foreground extraction using
iterated graph cuts. Presented at ACM Transactions on Graphics (TOG). 2004.
[108] D. S. Hochbaum. Polynomial time algorithms for ratio regions and a variant of normalized
cut. Pattern Analysis and Machine Intelligence, IEEE Transactions on 32(5), pp. 889-898.
2010.
[109] K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T.
Kadir and L. Van Gool. A comparison of affine region detectors. International Journal of
Computer Vision 65(1-2), pp. 43-72. 2005.
[110] Available: http://www.ophthobook.com/videos/cartoon-cataract-surgery-video.
[111] Available: http://scikit-learn.org/stable/auto_examples/svm/plot_weighted_samples.html.
[112] K. Mikolajczyk and C. Schmid. A performance evaluation of local descriptors. Pattern
Analysis and Machine Intelligence, IEEE Transactions on 27(10), pp. 1615-1630. 2005.
[113] D. G. Lowe. Object recognition from local scale-invariant features. Presented at Computer
Vision, 1999. The Proceedings of the Seventh IEEE International Conference on. 1999.