REGISTRATION AND 3-D RECONSTRUCTION OF A RETINAL
FUNDUS FROM A FLUORESCEIN IMAGES SEQUENCE
by
Tae Eun Choe
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)
August 2007
Copyright 2007 Tae Eun Choe
Acknowledgements
I am deeply grateful to my advisor Prof. Gérard G. Medioni for his guidance
and support in this work. He has brought in thoughtful insight and invalu-
able knowledge to different aspects of the problem addressed in this work
and in computer vision in general; without him this thesis would have been
impossible. I would also like to thank Prof. Ramakant Nevatia, Prof. Isaac
Cohen, Prof. Laurent Itti, and Prof. SriniVas R. Sadda for their guidance,
interaction, and participation in my qualifying examination and dissertation
defense. I also appreciate the support of and cooperation with Dr. Alexander C.
Walsh and Paul G. Updike at the Doheny Eye Institute.
I am also thankful to members of the Institute for Robotics and Intelli-
gent Systems (IRIS) for their assistance and support, especially Xuefeng
Song, Munwai Lee, and Li Zhang for the enjoyable years of sharing the
same office.
Finally, special thanks to my wife Emiko for her invaluable help and
support in every way, and to my lovely son Jun for giving me peace of mind.
Table of Contents
Acknowledgements ........................................................................................ii
List of Tables..................................................................................................v
List of Figures ...............................................................................................vi
Abbreviations ................................................................................................xi
Abstract ........................................................................................................xii
Chapter 1 Introduction ................................................................................1
1.1 Background...................................................................................1
1.2 Goals..............................................................................................3
1.3 Issues and Difficulties ...................................................................4
1.3.1 Issues in 2-D Registration ..............................................4
1.3.2 Issues in 3-D Reconstruction........................................10
1.4 Overview of approach .................................................................12
1.5 Contributions...............................................................................12
1.6 Thesis overview...........................................................................13
Chapter 2 2-D Registration.......................................................................14
2.1 Previous work..............................................................................16
2.1.1 Feature Extraction........................................................16
2.1.2 Matching Algorithm.....................................................19
2.1.3 Global Registration......................................................20
2.2 Estimating Initial Y-feature Positions .........................................20
2.2.1 Directional Filtering.....................................................21
2.2.2 Principal Components Analysis ...................................22
2.3 Fitting an Articulated Y-feature Model.......................................23
2.3.1 Articulated Model for Y-features.................................23
2.3.2 Initialization of the Y-feature Shape ............................24
2.3.3 Fitting Y-features.........................................................26
2.4 Matching Y-features....................................................................31
2.4.1 Mutual Information......................................................31
2.4.2 Matching Y-features using Mutual Information ..........32
2.5 Global Registration......................................................................34
2.5.1 Pairwise Registration...................................................35
2.5.2 Selection of the Reference Frame ................................38
2.5.3 Optimal Registration....................................................39
2.6 Experimental Results...................................................................41
Chapter 3 3-D Shape Inference.................................................................47
3.1 Previous work..............................................................................50
3.1.1 Epipolar Geometry Estimation.....................................51
3.1.2 Dense Disparity Map....................................................52
3.2 Estimation of Epipolar Geometry................................................53
3.2.1 Homography Estimation..............................................53
3.2.2 Fundamental Matrix Estimation...................................53
3.3 Stereo Matching and 3-D Surface Estimation.............................58
3.3.1 Estimating the Search Space ........................................58
3.3.2 Dense Disparity Map using Mutual Information .........61
3.4 Experimental Results...................................................................63
3.4.1 3-D shape of retinal fundus images..............................63
3.4.2 Evaluation of the stereo matching................................66
Chapter 4 3-D Reconstruction and Registration .......................................70
4.1 Introduction.................................................................................74
4.2 2-D Global Registration and 3-D Shape Reconstruction ............77
4.3 3-D Euclidean Reconstruction and Registration .........................79
4.3.1 3-D Euclidean Reconstruction .....................................79
4.3.2 Registration by Back-projection...................................86
4.4 Definition of Near-Planarity........................................................88
4.4.1 Synthetic Retinal Images..............................................88
4.4.2 Definition of Near-Planar Surface................................96
4.5 Experimental Results...................................................................99
4.5.1 Fluorescein Images.....................................................100
4.5.2 Evaluation of Reconstruction using OCT data...........103
4.5.3 Other Near-planar surface images - Façade Images...106
4.5.4 Comparison with Other Methods...............................109
Chapter 5 Conclusion .............................................................................111
References ..................................................................................................114
List of Tables
Table 1. The categories of this dissertation for 2-D registration.................. 10
Table 2. Comparison of average geometric pixel errors using: the
sequential registration of corners, the sequential registration of
extracted Y-features, and the global registration using Y-features. ..... 46
Table 3. Disparity Errors of different stereo algorithms. For SSD, DP,
SO, and GC, parameters with the best performance for each image
are selected. For NCC and MI, the parameters are fixed across
images................................................................................................... 68
Table 4. Comparison of average geometric pixel errors of retinal fundus
images using the 2-D global registration and the registration by
back-projecting to 3-D structure. For each set of retinal images,
parallax ranges are shown. ................................................................. 102
Table 5. 3-D registration errors of 3-D reconstructed structure and 3-D
OCT data in pixel metric.................................................................... 104
Table 6. Comparison of average geometric pixel errors of façade images
............................................................................................................ 109
List of Figures
Figure 1. A retinal fundus camera.................................................................. 2
Figure 2. Color and fluorescein images of a retina ........................................ 5
Figure 3. Features in a retinal image, Optic disc(green), blood
vessels(red), and Y-features(magenta) ................................................... 6
Figure 4. Overall flowchart of 2-D registration of color and fluorescein
images................................................................................................... 15
Figure 5. 6 Directional LoG Filters defined using σ_x = 16 and σ_y = 4 ...... 22
Figure 6. Initial Y-feature position in two images of different
modalities. Only the best 100 features are selected.............................. 23
Figure 7. Articulated model of a Y-feature .................................................. 24
Figure 8. Initial estimation of a bright or dark vessel around the circular
boundary. The red cross shows local peaks, and the green cross
shows the local valleys characterizing respectively bright and dark
vessels. The yellow dot in the middle shows the center position. In
(a)-(c) the seed locations detected for bright and dark vessels, (d)-
(e) depict the corresponding initialization of the Y-feature model. ..... 26
Figure 9. The barrier functions with lower and upper boundary w.r.t. to
s. When the value x is closer to the lower or upper bounds, the
function penalizes x by associating a large value................................. 28
Figure 10. The procedure of fitting an articulated Y-feature model to
image data with a gradient descent approach. Gradient descent
method iteratively updates the Y-feature articulated model as
shown in second and third columns. The yellow and white points
correspond respectively to the seed point, and the estimated center
point. The green and red crosses indicate respectively local valleys,
and peaks in the grey level distribution. (a)(d) Initial Y-feature
model of bright or dark vessel. (b)(e) Y-feature model after fitting.
(c)(f) Detected Y-features. Every valid Y-feature is classified into
bright or dark vessels............................................................................ 30
Figure 11. (a)(b) Fitted and validated Y-feature. Squares indicate non-
validated Y-features. (c)(d) Matching pairs of Y-features across
modalities ............................................................................................. 34
Figure 12. Difference between (a) Sequential registration (b) Global
registration constructed from all pairs’ shortest path algorithm........... 36
Figure 13. A complete graph of a sequence of color and fluorescein
images. Nodes correspond to images and costs of edge are
assigned our registration error. Originally, this sequence has 17
images, but the complete graph with only 7 images is shown for
display purposes. .................................................................................. 37
Figure 14. An example of global registration from the selected reference
frame to all other frames using the shortest path algorithm. Nodes
of images are connected by arcs with registration error between the
two images............................................................................................ 40
Figure 15. A view of registered images ...................................................... 42
Figure 16. Mosaic image of registered angiograms ..................................... 43
Figure 17. ROC curve of Y-feature detection according to the number of Y-features N_S .................. 44
Figure 18. Examples of fluorescein images discarded automatically
from the registration because of high registration error with other
images in a sequence ............................................................................ 45
Figure 19. Various pairs of retinal images (a)(b) Images of Retina_A,
(c)(d) Images of Retina_B (e)(f) Images of Retina_C......................... 47
Figure 20. Overall flowchart of 3-D reconstruction of retinal fundus ......... 49
Figure 21. Cross-sectional image of a macula obtained by OCT................. 50
Figure 22. Examples of estimated epipolar geometry using an 8-point
algorithm in the case of a translation of the camera. The estimated
epipolar geometry is inaccurate. (a)(b) Retina_A (c)(d) Retina_B ...... 55
Figure 23. Illustration of the estimation of the fundamental matrix using the plane+parallax algorithm. 4 corresponding points define the projective geometry H. Two additional points in the residual parallax regions define the epipole e, which then provides the estimate of the fundamental matrix F = [e]×H. .................. 56
Figure 24. The epipolar lines obtained from the plane+parallax based
fundamental matrix estimation............................................................. 58
Figure 25. (a)(b)(c) Histograms of all pixels’ disparity for each stereo pair. (d)(e)(f) Histograms of Y-features’ disparity. The mean (µ) and standard deviation (σ) of each histogram are (a) µ=-6.1, σ=2.221 (d) µ_Y=-6.0, σ_Y=2.801 (b) µ=-55.1, σ=2.459 (e) µ_Y=-54.3, σ_Y=2.121 (c) µ=-2.8, σ=0.983 (f) µ_Y=-3.5, σ_Y=1.336 .......... 60
Figure 26. Dense 3-D depth maps using Mutual Information...................... 65
Figure 27. (a) Errors according to the area window size, N_z, of Mutual Information. (b) Errors according to the Parzen window size, N_p. ...... 67
Figure 28. Reconstructed 3-D shapes of retinal fundi (a)(c)(e) Surface
maps with an intersection plane. The heights of the vertical planes
are determined by the minimum and maximum values of 3-D
depths (b)(d)(f) Texture maps. Each line matches with the
intersection position of the surface image on the left........................... 69
Figure 29. (a)(b)(c)(d) Example of retinal fluorescein images
(e)(f)(g)(h) Two pairs of stereo retinal images from a patient with
age-related macular degeneration. A characteristic ‘blister’ in the
macula is harder to see in the red-free images (e)(f) than it is in the
images taken after injection of fluorescein dye (g)(h) into the
antecubital vein. Nevertheless, the image in (h) is blurred which
complicates stereoscopic measurements. ............................................. 72
Figure 30. Flow chart of the proposed approach.......................................... 73
Figure 31: Rectified images. Two images are selected for the estimation
of the dense disparity map.................................................................... 78
Figure 32: Projective reconstruction of images in Figure 31 ....................... 81
Figure 33. Three-pass bundle adjustment. I_R is the reference image. The triangle indicates camera position with R, T, and f. ............. 83
Figure 34: Reconstructed 3-D structures of the retinal fundi (a)(b)
Reconstruction of the retina from images in Figure 29-(a)(b)(c)(d)
Reconstruction of the retina from images in Figure 29-(e)(f)(g)(h)..... 85
Figure 35. Registration by back-projecting to a 3-D structure..................... 87
Figure 36. Synthetic 3-D surfaces with different depth range. (a)(c)(e) A hyperbolic paraboloid function, z = s((x − x_c)²/a² − (y − y_c)²/b²), where s is a variable scale value, (x_c, y_c) is the center of the image, and the a and b values are fixed to one tenth of the width of the structure, i.e. a = b = 64. (b)(d)(f) A sinc function, z = s·sin(w)/w, where w = √((x − x_c)²/a² + (y − y_c)²/b²). ........................ 90
Figure 37. Images projected onto synthetic surfaces.................................... 91
Figure 38. Reconstructed synthetic surfaces with different depth range.
The boundary of the 3-D structure is not well reconstructed for
boundary occlusions in stereo matching. ............................................. 93
Figure 39. (a) Average reconstruction errors based on the depth range of
3-D structure. (b) Normalized reconstruction errors. (a) is
normalized by the size of the depth...................................................... 95
Figure 40. Average reconstruction errors according to baseline of two
cameras................................................................................................. 96
Figure 41. Reconstruction errors according to range of parallax
residuals................................................................................................ 98
Figure 42: Reconstructed 3-D surface images of the retinal fundi. The
line indicates the elevation of the surface in one y-axis..................... 101
Figure 43. A mosaic of Retina_1. 12 images were α-blended in the mosaic. ............... 102
Figure 44. 3-D Registration of reconstructed structures and OCTs........... 105
Figure 45. Selected façade images ............................................................. 106
Figure 46. 2-D registration of a facade....................................................... 107
Figure 47. Registration using back-projection to the 3-D structure of a
facade ................................................................................................. 107
Figure 48. 3-D reconstructed façade ......................................................... 108
Figure 49. Textured structure from a different viewpoint.......................... 108
Abbreviations
AMD Age-related macular degeneration
CC Cross-Correlation
CNV Choroidal neovascularization
DP Dynamic programming
GC Graph cut
ICP Iterative closest point
LoG Laplacian of Gaussian
MI Mutual Information
NCC Normalized cross correlation
OCT Optical coherence tomography
PCA Principal components analysis
RANSAC Random sample consensus
SfM Structure-from-Motion
SLO Scanning laser ophthalmoscope
SO Scanline optimization
SSD Sum of squared difference
SVD Singular value decomposition
Abstract
Image registration and shape reconstruction of a retinal fundus from fluo-
rescein images are important steps to diagnose retinal diseases. This is diffi-
cult due to intensity changes in fluorescein images and the near-planarity of
the retinal surface, which has a narrow range of depth variations. This study
proposes methods for the registration and the 3-D reconstruction of a retina
from a sequence of images. We first present a method locating vessels bifur-
cations, so called Y-features, which is a robust geometric feature in a retinal
image. Subsequently, fitted Y-features are matched across images using Mu-
tual Information to register their unordered colors and fluorescein images
globally. Second, using matched Y-features, the epipolar geometry for a
near-planar surface is robustly estimated by a plane+parallax approach, and
a dense disparity map is estimated using the Mutual Information criterion.
Finally, the 3-D structure of the near-planar retinal fundus is reconstructed
in metric space by introducing a novel three-pass bundle adjustment method.
Our experimental results validate the proposed methods on a set of difficult
fluorescein image pairs and other near-planar surface images.
Chapter 1 Introduction
1.1 Background
The automatic analysis of retinal images can assist in the diagnosis and management
of blinding retinal diseases, such as neovascular age-related macular degeneration
(AMD), choroidal neovascularization (CNV), diabetic retinopathy, and glaucoma
[65]. To evaluate patients with retinal diseases, clinical photographers usually first
capture color images of the retina using a specialized fundus camera. Figure 1 shows
a retinal fundus camera. Subsequently, a fluorescein dye is injected into a vein in the
subject’s arm, and as the dye propagates through the retinal blood vessels, the pho-
tographer takes a series (over a 5-10 minute sequence) of pictures of the retina from
different viewpoints by moving the camera side-to-side. The 2-D registration of the
retinal images in a sequence must be performed to align the images in order to facili-
tate the study of the evolution of the dye propagation. From patterns and intensities
of fluorescence changes, the progress of retinal diseases can be inferred. For exam-
ple, CNV can be detected from patterns of fluorescein leakage. In the early phases of an angiogram, a lesion with CNV is bright. However, in the mid and late phases, the fluorescence leaks and creates blur. For accurate analysis of an angiogram, the registration of fluorescein images is an essential process. The most common method of reg-
istering images is manual annotation of matching points in all images of a sequence.
However, it is a tedious and error-prone process to register all images by hand.
Figure 1. A retinal fundus camera
The retinal fundus is nearly planar, but it does have 3-D depth. When a patient has glaucoma, the optic disc is deeply intruded as a result of fluid pressure in the eye. In the case of diabetic retinopathy, fluid leaks into the macular area and turns the macula into an extruded blister. It is difficult to measure the progress of those
diseases only by 2-D registration. The diagnosis and quantification of those retinal
diseases require 3-D reconstruction of a retinal fundus. The most well-known method
is interpretation of cross-sectional depth images from optical coherence tomography
(OCT). The cost of OCT equipment and expertise of its interpretation, however, limit
the use of this technology. An automated and low cost technology for registration and
3-D reconstruction of a retinal fundus would provide a valuable alternative.
1.2 Goals
This dissertation presents methods that facilitate the analysis of human retinal images.
The literature on the analysis of retinal images mainly deals with the issue of 2-D
registration. Accurate diagnosis may also require a 3-D depth map of the retina. In particular, the 3-D reconstruction of a retinal fundus is essential for measuring the eye pressure on the optic disc and the fovea, identifying lesions, and estimating the degree of lesion development. The first goal of this dissertation
is the 2-D registration of color images and a sequence of fluorescein angiographic
images. The second is the calculation of the dense disparity map of the retinal fundus
from two images. The third is the 3-D metric reconstruction of the retina and the reg-
istration using the reconstructed 3-D structure. These methods are intended to im-
prove the accuracy and reproducibility of diagnosis for ophthalmologists.
1.3 Issues and Difficulties
The issues and difficulties of 2-D registration and 3-D reconstruction of a retinal
fundus are discussed in this section.
1.3.1 Issues in 2-D Registration
The diagnosis and quantified evaluation of retinal diseases relies on the interpretation
of color images and fluorescein angiography images in a sequence by qualified ex-
perts. A fluorescein angiography image sequence depicts the circulation of the so-
dium fluorescein dye in the retinal vessels, and the interpretation of critical phases in
the circulation is a key element for diagnosis. The registration of color and fluorescein retinal images helps ophthalmologists diagnose retinal diseases.
The registration of color and fluorescein retinal images needs to address the following five issues:
Intensity changes
Feature extraction
Transformation model
Matching method
Mosaic
Figure 2. Color and fluorescein images of a retina
1.3.1.1 Intensity Changes
The intensity of the color image and the acquired grey level angiograms vary sub-
stantially during the circulation of the dye, making the 2-D registration of these im-
ages nontrivial. For example, the retinal veins in Figure 2 are red in the color image
(Figure 2-(a)), hypointense (dark) in the early angiogram (Figure 2-(b)), increase in intensity (fill with dye) in the mid angiogram (Figure 2-(c)), and then decrease in intensity (“fade”) in the late phases (Figure 2-(d)).
Therefore, the geometric registration of the angiograms requires the use of
strong geometric invariants and a robust matching function. Similarly, registering a
color image with the acquired angiograms to facilitate the expert’s analysis creates
similar problems, as the grey levels are not consistent across color and fluorescein
images. The resolutions of color and fluorescein images also vary; the resolution of
the color image varies from 640x480 to 2300x2000, and the resolution of fluorescein
images varies from 1320x1032 to 2300x2000.
1.3.1.2 Feature Extraction
The process of matching and registering color and fluorescein images requires the
extraction of invariant geometric features. The detection of geometric features is es-
sential for medical diagnosis as well. There are many features in retinal images, such as the optic disc, macula, fovea, blood vessels, and Y-features.
Figure 3. Features in a retinal image: optic disc (green), blood vessels (red), and Y-features (magenta)
The optic disc is an area where all vessels and nerves converge, and it is a frequently-used robust feature. The macula has an oval shape and lies at the center of the retina. No vessels pass through the macular area. The central part of the macular area is the fovea, where cone cells are concentrated. Even though the fovea and the optic disc are important geometric features, their size varies across images, and they are sometimes partially, or even completely, invisible in the image.
The blood vessel network is also a strong geometric feature in the image. How-
ever, the color and size of the blood vessels change with the development of fluo-
rescein diffusion. Y-features are one of the most reliable features in the retina. The
disappearance of some vessels should not affect the registration of images using Y-
features because there are a sufficient number of Y-features in any retinal image.
1.3.1.3 Matching Algorithm
Since the intensity of the fluorescein angiography retinal images changes, the size of
the vessel varies, and some parts of the retina appear or disappear across images, matching features in fluorescein images is quite a difficult process. The most common method, correlation, does not work for retinal images. One study [10] uses the
shape of the Y-features to match retinal images. However, because the shapes of Y-
features in one image are quite similar to each other, this method is incapable of dif-
ferentiating one feature from another. In the field of medical imaging, Mutual Infor-
mation (MI), which has robust performance on intensity changes, is widely applied
instead of correlation for the registration of such images.
1.3.1.4 Transformation Model
The simplest transformation for the registration of retinal images is a 2-D rigid motion, which only recovers in-plane translation and rotation [67][69]. In this model,
two pairs of corresponding points are sufficient to recover the rigid transformation.
However, the planar rigid body model fails when images are skewed and scaled. The
most commonly-used transformation for retinal image registration is an affine
motion model [7][42][43][60][61][69], which adds scale and skew to the rigid trans-
form. To estimate an affine motion, at least three corresponding points are necessary.
If the shape of a retinal fundus is considered to be planar, the appropriate trans-
formation model is a homography [42]. The parameters of the homography, or the
projective matrix, can be obtained from 4 corresponding points. In cases where there are a sufficient number of correctly matched points, the projective transformation provides the best results. Note, however, that the difference in results between the projective motion and the affine motion tends to be small for retinal images, so the affine model may be sufficient.
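As a concrete illustration of the affine case, the sketch below (not taken from the dissertation; the function name and variables are illustrative) estimates a 2-D affine transform from three or more correspondences by linear least squares:

```python
import numpy as np

def estimate_affine(src_pts, dst_pts):
    """Least-squares 2-D affine transform from >= 3 correspondences.
    Returns the 2x3 matrix A such that dst ~ A @ [x, y, 1]^T."""
    src = np.asarray(src_pts, dtype=float)             # N x 2 source points
    dst = np.asarray(dst_pts, dtype=float)             # N x 2 target points
    X = np.hstack([src, np.ones((len(src), 1))])       # N x 3 design matrix [x, y, 1]
    A, *_ = np.linalg.lstsq(X, dst, rcond=None)        # solves X @ A = dst, A is 3 x 2
    return A.T                                         # 2 x 3 affine matrix
```

With four or more correspondences, the same least-squares pattern extends to the homography case through the standard DLT formulation.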
On the other hand, Can et al. [9][10] refute the assumption that the shape of a retinal fundus is planar. Thus, they use a quadratic model with 12 parameters to estimate the surface of a retinal fundus, which requires at least 6 corresponding points. In this method, they apply a stratified approach going from the rigid body model to the quadratic model. First, with the assumption that rotation does not occur in the retinal image, only the translational motion is estimated. Second, the result of the translational motion is used as the initial approximation for the affine motion. Finally, the result of the affine estimation is used as an initial value for the quadratic motion. This hierarchical approach removes estimation errors that are caused by the high degree of freedom of the quadratic model. However, they do not estimate the shape of the retina; they only use a more complex motion model, which may or may not account for the 3-D shape.
In reality, a retinal fundus is neither a plane nor a quadratic surface; rather, it has
a complex 3-D shape, and may contain extruded lesions and an intruded optic disc.
The discussion of the 3-D reconstruction for a retinal fundus is provided in Chapter 3.
1.3.1.5 Mosaic
The most common method of registering an image sequence to produce a mosaic is sequential registration, which registers images one after another. The downside of this method becomes obvious when an incorrectly registered image is added to the mosaic in the middle of the sequence. In such a case, the registration error propagates to all other images. A global registration method solves this problem by considering a graph that connects all overlapping images and estimating the global registration error. We discuss global registration in Section 2.5.
The categories addressed in this dissertation are drawn from those listed in Table 1.

Table 1. The categories of this dissertation for 2-D registration
    Modalities:             Color – Color; Color – Fluorescein; Fluorescein – Fluorescein
    Feature space:          Optic disc; Vessels; Y-features; Others
    Matching method:        Correlation; Mutual Information
    Transformation model:   Rigid; Affine; Projective; Curved (or Quadratic)
    Mosaic:                 Sequential; Accumulative; Global

1.3.2 Issues in 3-D Reconstruction
The 3-D reconstruction of a retina may help diagnose retinal diseases, such as glaucoma and diabetic retinopathy, more accurately. However, few research efforts have addressed the 3-D reconstruction of retinal images. Fluorescein images of a retinal fundus have unique properties that prevent the use of classical stereo algorithms to estimate the 3-D structure of a retinal fundus from a pair of images. First, the intensity and color of the same physical positions may vary between consecutive images. Second, the 3-D structure of the retinal fundus is near-planar, so its 3-D depth is very shallow, which makes classical methods unfit to estimate the epipolar geometry and camera parameters of each image.
1.3.2.1 Near-planar surface
When all the points in 3-D are located on a plane, the equations relating the images (the fundamental matrix) cannot be reliably estimated, because this is an explicitly degenerate case of epipolar geometry estimation. Thus, the classical 8-point or 7-point algorithms are unable to estimate a correct fundamental matrix.
The 3-D reconstruction of a near-planar surface is a quasi-degenerate case. Many studies require that the viewing angles of the cameras be at least 10°–20° apart. However, the viewing angles of retinal fundus images are very close and the camera motion is close to a translation, violating this requirement.
1.3.2.2 Intensity changes
As mentioned in Section 1.3.1.1, the intensity of the retinal images changes across
images. Radical intensity changes of the matching area across color and fluorescein
images make the estimation of the dense disparity map more difficult. The fre-
quently-used correlation-based algorithms do not work in this situation.
1.3.2.3 3-D Reconstruction of the macular area
The macula is located in the middle of the retinal fundus surface. No vessels go
through the macula. This results in a homogeneous texture in the macular area. Thus
the 3-D reconstruction of the macular area is a difficult problem. At this point, we
know of no study trying to reconstruct the macula. An algorithm to reconstruct the whole retinal fundus, including the macula and the optic disc, in 3-D is necessary for a more accurate diagnosis and quantified analysis of various retinal diseases.
1.4 Overview of approach
The methods proposed here consist of three main topics: The first topic deals with
the 2-D registration of color images and fluorescein angiographic images in a se-
quence. For the accurate estimation of positions of Y-features, an articulated Y-
feature model is introduced. Fitted Y-features are matched using Mutual Information
criteria. A sequence of images is registered globally by the all pairs’ shortest path algorithm. The second presents the calculation of the 3-D dense disparity map of a retinal
fundus from a pair of images. To overcome the degeneracy of the near-planar surface
of a retina, a plane+parallax approach is used to estimate the epipolar geometry. The
two selected intensity-varying images are matched using Mutual Information as a
matching criterion. The third explains the method of the 3-D metric reconstruction of
the near-planar retinal surface, introducing the three-pass bundle adjustment method.
The images of a retinal fundus are registered again by back-projecting the recon-
structed 3-D structure.
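To make the plane+parallax idea concrete (see also the caption of Figure 23, which states F = [e]×H), the following minimal sketch derives a fundamental matrix from a plane-induced homography H and two extra off-plane correspondences. It is only an illustration under those assumptions, not the implementation of Chapter 3, and the function name and inputs are hypothetical:

```python
import numpy as np

def fundamental_from_plane_parallax(H, pts1, pts2):
    """Plane+parallax sketch: H maps image 1 to image 2 for points on the plane;
    pts1/pts2 are two extra (x, y) correspondences with residual parallax.
    The epipole e is the intersection of the parallax lines; then F = [e]_x H."""
    def homog(p):
        return np.array([p[0], p[1], 1.0])

    lines = []
    for p1, p2 in zip(pts1, pts2):
        x2 = homog(p2)
        x2_on_plane = H @ homog(p1)              # plane-induced transfer of the point
        lines.append(np.cross(x2, x2_on_plane))  # line joining the point and its transfer
    e = np.cross(lines[0], lines[1])             # epipole = intersection of the two lines

    e_cross = np.array([[0.0, -e[2], e[1]],
                        [e[2], 0.0, -e[0]],
                        [-e[1], e[0], 0.0]])     # skew-symmetric matrix [e]_x
    return e_cross @ H                           # fundamental matrix F = [e]_x H
```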
1.5 Contributions
The novel contributions of this dissertation are:
(1) The use of an articulated Y-feature model,
(2) A global 2-D registration method using the all pairs’ shortest path algorithm,
(3) The estimation of epipolar geometry by a plane+parallax method and search
space reduction using accurately matched Y-features,
(4) The estimation of a dense disparity map using Mutual Information criteria,
(5) The definition of near-planar surface images and their range, and
(6) The 3-D metric reconstruction of a near-planar surface using the proposed
three-pass bundle adjustment algorithm.
1.6 Thesis overview
Chapter 2 describes the 2-D registration of color and fluorescein retinal images with
model-based Y-feature extraction, matching Y-features, and the global registration.
Chapter 3 explains the estimation of the epipolar geometry and the dense disparity
map from retinal fundus images. Chapter 4 shows the 3-D reconstruction of a retinal
fundus and the registration of fluorescein images using a 3-D structure. In Chapter 5,
concluding remarks are given and some future research directions are listed.
Chapter 2 2-D Registration
This section presents a fully automatic method for registering color and fluorescein
angiograms in 2-D. The method consists of four main steps. First, the seed positions of Y-features are computed using a PCA-based analysis of directional filter responses. Second, an articulated model of each Y-feature is fitted to the image features using a
gradient descent method. Third, the extracted Y-features are matched by maximizing
Mutual Information. Lastly, color and fluorescein images are registered globally us-
ing the all pairs’ shortest path algorithm. In Figure 4, the overall flow of the 2-D reg-
istration methods is shown.
Chapter 2 is organized as follows: In Section 2.1, previous studies are reviewed.
In Section 2.2, we describe the initial estimate of a Y-feature position. Section 2.3
explains the fitting of an articulated Y-feature model and its validation. Section 2.4
describes the matching of the Y-features. In Section 2.5, global registration using the all pairs’ shortest path algorithm is proposed. Experimental results of these methods
are reported and discussed in Section 2.6.
Figure 4. Overall flowchart of 2-D registration of color and fluorescein images
2.1 Previous work
2.1.1 Feature Extraction
The key element of matching and registering color and fluorescein images is the ex-
traction of invariant geometric features. Commonly used robust features are the optic
disc, blood vessels, Y-features, and image edges.
a. Optic disc
The optic disc is a frequently-used and robust feature where all vessels and nerve
fibers converge. Although many studies [47][62] have implemented the image regis-
tration with the detection of the optic disc, its use is limited since it requires the optic
disc to be visible in the image, and the images to be registered by a simple translation
model [18]. The assumption of a simple translation model imposes a stringent re-
striction which is often not valid with fluorescein angiograms.
b. Blood vessels
Another important feature in retinal images is blood vessels. A number of research
papers have addressed the issues of blood vessel enhancement and extraction not
only with retinal images, but also with other medical images such as lung and cere-
bral images [33][53][63][64]. The extraction of vessels in lung and cerebral images
is more complicated than in retinal images. Because those images represent slices through the vessels, a vessel oriented perpendicular to the imaging plane can appear as an oval spot. Therefore, to obtain accurate results, a vessel extraction
method requires a 3-D modeling based on the volume of image sets [33][53][64].
Unlike the lung and cerebral images, blood vessels in retinal fundus images are 2-D
features. The most common method to extract the vessels in a 2-D image is to use
directional filters [11][68][69]. Exploiting the property of linearity in vessels, blood
vessels can be extracted using a set of shifted Gaussian filters rotated at different
orientations. In [35][33][53], vessels are detected and enhanced based on the princi-
pal component analysis (PCA) value of each pixel in medical images. However, this
method extracts not only vessels but also unnecessary nodules. Recently, Agam et al.
[1] proposed a method to suppress the nodules and to enhance vessels by implement-
ing the Cauchy distribution-based probabilistic model. Several investigators
[42][66][68][69], have demonstrated the detection of blood vessels through a
mathematical morphology approach using linear structural elements. This method
performs well to detect bright vessels. To detect dark vessels, the method is applied
on the negative version of the image. This method, however, has difficulty when
there is great variation in the vessel colors; for example, during the early phase of the
fluorescein angiogram, when both dark and bright vessels are present in the same
image (Figure 2-(b)). Existing methods of vessel detection are therefore not suitable
for extracting vessels in color and fluorescein images where the color, the size, and
the brightness of vessels vary across fluorescein images. These methods are not able
to extract vessels in a low contrast image which is a common attribute of the late
phase of fluorescein angiograms.
c. Y-features
The retinal vascular tree has a regular pattern of bifurcations that form Y-features,
and thereby provides a reliable set of features in a retinal image. Can et al. [8][9][10]
present a method of Y-feature detection in retinal images. Y-features are extracted
while tracing the vessels by finding the centerline between the left and right bounda-
ries of the vessel. An important advantage of using the centerline of the vessel is that
the Y-feature detection is less affected by variations in retinal vessel diameter which
can occur as a result of the normal pulsatility of retinal blood flow. On the other hand,
because the centerline cannot be reliably extracted in the branch area, this tracing
method produces multiple and inaccurate center positions, which subsequently cause
errors in the image registration. Tsai et al. [58][59] improve this approach by specify-
ing a branching area and extracting the center position of Y-feature within the area.
With this method, the center position is estimated from the closest point of the three
linearly approximated traces. Nevertheless, this method still produces multiple Y-
features in one branch.
Zana and Klein [69] proposed a method that detects Y-features using the
mathematical morphology approach with structural elements of Y and T shapes.
However, this approach does not estimate the orientation of the Y-features accurately
and it tends to generate a large number of false detections.
In [12], the authors have proposed a method for which the initial positions are
estimated using directional filters and the accurate position and shape of the Y-
features are obtained by fitting an articulated model to the image boundary.
d. Others
Besides the optic disc, vessels, and Y-features, the fovea and pathological findings such as nodules or hemorrhages are also significant features. Unlike vessels, lesions and the fovea have an oval shape. In [22], the fovea in fluorescein angiograms is detected with an active contour model. Exudates, drusen, and areas of leakage are also important features in retinal images.
In many cases, the image itself is treated as a feature and registered directly
[61][67][50]. In other cases, simple gradient or edge images are used as a feature [7].
2.1.2 Matching Algorithm
The most common window-based matching criteria are the sum of squared difference
(SSD) and cross-correlation (CC). In case of medical images, Mutual Information
(MI) gives better results for intensity changes of multimodal images
[7][40][48][60][61][67] [50]. Mutual Information does not rely on the intensity val-
ues but on a statistical measurement called entropy. Viola and Wells apply Mutual
Information to register retinal images [60][61][67]. Their method maximizes Mutual
Information of two retinal images and derives the best affine transformation with the
gradient descent approach. The problem with their method is that it easily falls into
local minima and is quite time consuming, since it has to use the two entire images
as a feature to compute the Mutual Information. While CC and SSD take O(N) time
to calculate the correspondence of two images, where N is the number of pixels, MI
has O(N×M) complexity, where M is the number of pixels in a Parzen window,
which estimates the density function of entropy.
2.1.3 Global Registration
The most common method of registering images in a sequence is sequential registra-
tion, which registers images one at a time. In [54][59], the authors propose a dual-
bootstrap iterative closest point (ICP) algorithm to register a pair of images using
blood vessels as features. They improved the accuracy of the registration and solved
the initialization problem of the ICP algorithm. The downside of this method is that
when an incorrectly registered image is added to the mosaic image, the registration
error propagates to all other images. A global registration method solves this problem
by considering a graph that connects all overlapping images and estimates the global
registration error. We discuss the global registration in Section 2.5.
2.2 Estimating Initial Y-feature Positions
In this section, we present a method for locating Y-feature candidates in the image.
Y-features are characterized by regions in the image where three vessels converge.
A Y-feature should have exactly 3 strong responses from directional filters spanning 180°. The PCA-based analysis of the filter outputs allows us to locate the positions of the Y-features in the image.
2.2.1 Directional Filtering
The approximate position of a Y-feature is estimated using a set of directional filters.
We use a 2-D Laplacian of Gaussian (LoG) filter centered at the origin:

\mathrm{LoG}(x, y) = \frac{1}{2\pi\sigma_x\sigma_y}\left(\frac{x^2}{\sigma_x^4} + \frac{y^2}{\sigma_y^4} - \frac{1}{\sigma_x^2} - \frac{1}{\sigma_y^2}\right)\exp\left(-\frac{x^2}{2\sigma_x^2} - \frac{y^2}{2\sigma_y^2}\right)    (1)

With a proper selection of σ_x and σ_y (σ_x = 16 and σ_y = 4 for a 640×480 image), the
elongated shape of the LoG filter is created. With this base kernel, directional filters
are generated by rotating the kernel from 0 to 180 degrees. A large number of direc-
tional filters will generate finer responses from the Y-features in the image. However,
this can cause duplication of the responses from one vessel. On the other hand, using
a small number of filters could result in some missed Y-features. As a tradeoff, we
use 6 directional filters at 6 different angles. In this dissertation, all parameters related to image size, such as the filter size, the length of the Y-feature, the search space, and the window size, are specified for a 640×480 image. These parameters can be changed
proportionally to account for the size of an image. Figure 5 shows the 6 filters used.
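A minimal sketch of this filtering stage is given below, assuming the reconstruction of equation (1) and the parameter values quoted above; the helper names are illustrative and not from the dissertation:

```python
import numpy as np
from scipy.ndimage import convolve

def log_kernel(size=65, sigma_x=16.0, sigma_y=4.0, theta=0.0):
    """Elongated LoG kernel of equation (1), rotated by angle theta (radians)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)       # rotate the coordinate frame
    yr = -x * np.sin(theta) + y * np.cos(theta)
    gauss = np.exp(-(xr ** 2 / (2 * sigma_x ** 2) + yr ** 2 / (2 * sigma_y ** 2)))
    log = (1.0 / (2 * np.pi * sigma_x * sigma_y)) * (
        xr ** 2 / sigma_x ** 4 + yr ** 2 / sigma_y ** 4
        - 1.0 / sigma_x ** 2 - 1.0 / sigma_y ** 2) * gauss
    return log - log.mean()                          # zero mean: flat regions give no response

def directional_responses(image, n_filters=6):
    """Convolve the image with 6 directional LoG filters at 0, 30, ..., 150 degrees."""
    thetas = [k * np.pi / n_filters for k in range(n_filters)]
    return np.stack([convolve(image.astype(float), log_kernel(theta=t), mode="nearest")
                     for t in thetas])
```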
2.2.2 Principal Components Analysis
The initial position of Y-features is estimated from the 6 filter responses. Since a Y-feature should respond in exactly 3 different directions, we use the Principal Components Analysis (PCA) method to detect the pixels with large responses in exactly 3 different filter outputs. A local analysis of the directional filter responses permits the integration of responses from neighboring pixels. The 3×3 neighborhood of each pixel in each of the 6 filter responses is reshaped into a 1-D vector of 9 pixels, and the autocorrelation of the resulting 6×9 matrix is calculated. The resulting 6×6 autocorrelation matrix is decomposed into eigenvalues and eigenvectors using the Singular Value Decomposition (SVD) method. For each pixel, we examine the third largest eigenvalue; if this eigenvalue is large, the pixel is considered a good candidate for a Y-feature. After sorting pixels by their third largest eigenvalue, we select the best 100 features for each image. In Figure 6 we show the extracted seed points using this PCA analysis of the directional filter outputs.
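The seed-selection step described above can be sketched as follows; `responses` is assumed to be the stack of the 6 directional filter outputs, and the implementation is a direct, unoptimized transcription of the procedure rather than the author's code:

```python
import numpy as np

def y_feature_seeds(responses, n_seeds=100):
    """responses: array of shape (6, H, W) holding the 6 directional filter outputs.
    For each pixel, the 3x3 neighbourhoods of the 6 responses form a 6x9 matrix;
    the pixel is scored by the third largest eigenvalue of its 6x6 autocorrelation."""
    n_f, h, w = responses.shape
    score = np.zeros((h, w))
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            M = responses[:, i - 1:i + 2, j - 1:j + 2].reshape(n_f, 9)   # 6 x 9
            eig = np.linalg.svd(M @ M.T, compute_uv=False)               # eigenvalues of 6 x 6
            score[i, j] = eig[2]                                         # third largest eigenvalue
    best = np.argsort(score.ravel())[::-1][:n_seeds]
    return np.column_stack(np.unravel_index(best, score.shape))          # (row, col) seed points
```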
Figure 5. 6 Directional LoG Filters defined using σ_x = 16 and σ_y = 4, at orientations 0°, 30°, 60°, 90°, 120°, and 150°
2.3 Fitting an Articulated Y-feature Model
We propose a Y-feature extraction method based on fitting an articulated model. The
articulated model is fitted by maximizing the local intensities inside the template and
gradient information on the boundary of the template.
2.3.1 Articulated Model for Y-features
The considered articulated model for the Y-features has 8 DOF, x = (x, y, θ_1, θ_2, θ_3, w_1, w_2, w_3), which includes the center position, three angles, and three widths, one for each branch. The length of each branch is fixed.
Figure 6. Initial Y-feature position in two images of different modalities. Only the best 100 features are selected.
Figure 7. Articulated model of a Y-feature
The three arms rotate and are attached to the center position. Using the geometric properties of Y-features in retinal images, we constrain the arms to be neither too close to nor too far from each other. The half width of each arm, w_i (i = 1, 2, 3), is the distance between the center of the vessel and its boundary. The width is also constrained to lie between the minimum and maximum sizes of the vessels we are interested in detecting.
2.3.2 Initialization of the Y-feature Shape
Given the seed points extracted by the PCA analysis of the response of the direc-
tional filter, the initial orientations of the three arms of the model need to be esti-
mated. Can et al. [8] propose the use of a rectangular grid and local detection of the
maximum intensities for each vertical and horizontal line over the whole image in
order to estimate the shape of Y-features. This approach, however, has several poten-
tial flaws: (1) a large number of grids must be dealt with, (2) Y-features located at
the edges of the grid cannot be detected, and (3) local maxima from the four grid
lines are selected in an ad-hoc manner to find the three best vessels. Our method de-
tects three bright or dark vessels from the circular boundary of the Y-feature located
on the seed points. Along the circle, peaks or valleys of intensities are detected.
Peaks represent bright vessels and valleys correspond to dark vessels. For each peak
or valley, the best three arms are selected based on the following error function:
F_I(\mathbf{x}) = \sum_{i=1}^{3}\frac{1}{2w_i}\int_{-w_i}^{w_i}\int_{0}^{L} I(x_i, y_i)\, dl\, dw    (2)

where L is the length of the branch, w_i is the half width of the branch, i is the index of the considered arm, and I(x_i, y_i) is the image intensity at x_i = x + w sin θ_i + l cos θ_i, y_i = y + w cos θ_i + l sin θ_i, where (x, y) is the center position of the model. F_I(x) describes the sum of intensities inside the Y-feature model, represented by the shaded area in Figure 7. The angles of the Y-feature model are determined by connecting the center of the model and the three vessels on the circular boundary. In the case of a dark vessel, the three valleys that minimize F_I(x) are selected, and, for a bright vessel, the three peaks maximizing F_I(x) are considered for the initial angles of the Y-feature model. Every possible combination of three peaks and three valleys on the circular boundary is tested based on the error function F_I(x). In Figure 8, the initial estimation of bright and dark vessels is represented in the case of color images and angiograms.
2.3.3 Fitting Y-features
After initializing the position of the Y-feature and the angle of the branches, we fit
the articulated Y-features using a gradient descent method. In addition to the con-
straints defined by equation (2), the gradient value along the boundaries of the ves-
sels is utilized to enhance the accuracy of the estimated Y-feature model. We mini-
mize the following energy:
[] []
= =
=
3
1) , {
0
2
3
1
0
2
) , (
2
1
) , (
2
1
) 1 ( ) (
iw w w
L
i i
i
w
w
L
i i
m
i i
i
i
dl y x G dw dl y x I F x
(3)
Figure 8. Initial estimation of a bright or dark vessel around the circular boundary. The red crosses show local peaks and the green crosses show local valleys, characterizing bright and dark vessels respectively. The yellow dot in the middle shows the center position. (a)-(c) show the seed locations detected for bright and dark vessels; (d)-(f) depict the corresponding initialization of the Y-feature model.
where G(x_i, y_i) is the gradient value on the boundary, and m = 0 for a dark vessel, m = 1 for a bright vessel. G(x_i, y_i) is computed from the gradient of the Gaussian filter output.
When an articulated Y-feature is correctly fitted, the sum of the intensity values F_I(x) in the Y-feature will be maximal for a bright vessel, and the boundaries of the model will lie on edges in the image. In addition, we constrain the angle and width of each branch to be within a specified range of values. The relative angles between the
branches of the Y-model are defined by g_1(θ), g_2(θ), and g_3(θ) as follows:

b_l \le g_1(\boldsymbol{\theta}) = \theta_2 - \theta_1 \le b_u,
b_l \le g_2(\boldsymbol{\theta}) = \theta_3 - \theta_2 \le b_u, \text{ and}
b_l \le g_3(\boldsymbol{\theta}) = \theta_1 - \theta_3 + 2\pi \le b_u,

where θ = (θ_1, θ_2, θ_3) and the angle boundaries are set to b_l = π/9 and b_u = 10π/9. These bounds were defined empirically after observing that no Y-feature goes beyond the specified boundary angles in the working set of retinal images for this study. Similarly, the half width of each branch is constrained by defining an upper and lower bound on the width:

l \le w_i \le b, \quad i = 1, 2, 3.

In our experiments, the boundaries of the half width are set to l = 1 and b = 4. The
length of the Y-feature model and the width of vessel boundaries are determined to be proportional to the size of the image. These constraints are enforced using Lagrange multipliers, thus solving the constrained optimization problem with the above inequalities [3]. We use a penalization approach based on a barrier function. The inequality constraints on the angles are translated into the penalty function:

B(\boldsymbol{\theta}) = \sum_{j=1}^{3}\frac{1}{\left[(g_j(\boldsymbol{\theta}) - b_l)(b_u - g_j(\boldsymbol{\theta}))\right]^{s}}.

Figure 9 plots the barrier function B(θ) for different values of s, which controls the smoothness of the barrier function. As s increases, the curve becomes steeper near the boundaries. In our experiments we set s = 1.
A similar barrier function is used for the width of the arms, and the complete penalization function considered is:

B(\boldsymbol{\theta}, \mathbf{w}) = \alpha\sum_{j=1}^{3}\frac{1}{\left[(g_j(\boldsymbol{\theta}) - b_l)(b_u - g_j(\boldsymbol{\theta}))\right]^{s}} + \beta\sum_{j=1}^{3}\frac{1}{\left[(w_j - l)(b - w_j)\right]^{s}}    (4)

where w = (w_1, w_2, w_3), and α and β are parameters that homogenize radians and pixels into comparable metric quantities.

Figure 9. The barrier functions with lower and upper boundaries, for different values of s. When the value x is close to the lower or upper bound, the function penalizes x by assigning a large value.
The extraction of Y-features in the image consists of initializing the model using
the feature point location and orientations and fitting the articulated Y-model to the
image features by minimizing the function:
E(\mathbf{x}) = F(\mathbf{x}) + \lambda\, B(\boldsymbol{\theta}, \mathbf{w})    (5)

where λ mitigates the trade-off between the goodness of fit to image features and the
constraints on the orientation of the arms and their thickness. The function E is
minimized iteratively using a gradient-based approach.
Fitting is performed based on Equation (5). Since the articulated Y-feature has 8
degrees of freedom, if we were to update all 8 parameters at once, the gradient algo-
rithm could be trapped in a local minimum. To prevent this, we fix the three endpoints of the branches and update only the position and the widths of the arms. Once the
center point of the Y-feature converges to a stationary location, the angles and widths
of the branches are updated. The gradient-based minimization scheme is iterated un-
til it reaches a maximum number of iterations, set to 25, or the parameters reach sta-
tionary values. As the PCA analysis detects both types of vessel (i.e. bright/dark), the
model is fitted twice to detect Y-features, first assuming dark vessels and second
assuming bright vessels. After alignment, the Y-feature candidate with the higher
gradient value on the boundary is selected.
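A schematic version of this two-stage minimization is sketched below. Here `energy` stands for E(x) of equation (5) evaluated on the image, the parameter ordering follows x = (x, y, θ1, θ2, θ3, w1, w2, w3), and the step size and tolerance are illustrative values that are not specified in the dissertation:

```python
import numpy as np

def fit_y_feature(energy, x0, lr=0.1, max_iter=25, tol=1e-4):
    """Two-stage gradient descent on E(x): first the center position and widths
    (arm endpoints fixed), then the angles and widths, as described above.
    Gradients are approximated by central finite differences."""
    x = np.asarray(x0, dtype=float)
    stages = [np.array([1, 1, 0, 0, 0, 1, 1, 1], dtype=float),   # stage 1: center + widths
              np.array([0, 0, 1, 1, 1, 1, 1, 1], dtype=float)]   # stage 2: angles + widths
    eps = 1e-3
    for mask in stages:
        for _ in range(max_iter):
            grad = np.zeros_like(x)
            for k in np.flatnonzero(mask):                        # finite-difference gradient
                d = np.zeros_like(x)
                d[k] = eps
                grad[k] = (energy(x + d) - energy(x - d)) / (2 * eps)
            step = lr * grad
            if np.max(np.abs(step)) < tol:                        # parameters are stationary
                break
            x = x - step
    return x
```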
Not all fitted Y-features correspond to real bifurcations of vessels. Indeed, many initial Y-features are selected merely because they have a high response to the directional filters, as described in Section 2.2. A large number of seed points are considered in order to minimize the number of real Y-features that are missed in the image.
After fitting the articulated model to the selected image regions, we discard some of
the extracted Y-feature points where the boundary of the arms of the articulated
model does not lie on strong edge points. If 2/3 of the boundary of any arm is not on
the edge, the Y-feature model is discarded. Figure 10 shows the procedure of fitting
an articulated Y-feature model.
Figure 10. The procedure of fitting an articulated Y-feature model to image data with a gradient descent approach. The gradient descent method iteratively updates the articulated Y-feature model as shown in the second and third columns. The yellow and white points correspond respectively to the seed point and the estimated center point. The green and red crosses indicate respectively local valleys and peaks in the grey level distribution. (a)(d) Initial Y-feature model of a bright or dark vessel. (b)(e) Y-feature model after fitting. (c)(f) Detected Y-features. Every valid Y-feature is classified into bright or dark vessels.
2.4 Matching Y-features
The circulation of the fluorescein dye in the retinal vessels translates into different
grey level properties in the color and angiogram images, making the matching of
extracted Y-features challenging. We propose to match extracted Y-features across
modalities and through different phases of the circulation using the local maximiza-
tion of Mutual Information. To match individual Y-features, geometric descriptions
are too coarse, and intensity matches are unreliable due to the intensity changes. In-
stead, we do match intensities, but use Mutual Information rather than correlation.
2.4.1 Mutual Information
Mutual Information is derived from entropy theory. Entropy is known as a measure of information, and it is mostly used in communication theory. Measuring the uncertainty of an event z by the probability of the event, the entropy is defined by

H(\mathbf{z}) = -\int p(\mathbf{z}) \ln p(\mathbf{z})\, d\mathbf{z}

where p(z) is the probability of event z, interpreted as the probability of occurrence of event z, and −ln p(z) is the amount of information per event z. The entropy H(z) is the average amount of information to be gained from a certain set of events. For the most uncertain events, the entropy is maximal, and for the most predictable events, the entropy is minimal. Mutual Information is defined from this entropy theory. For two events z_A and z_B, the Mutual Information of the pair is defined by [67]:

E_{MI}(\mathbf{z}_A, \mathbf{z}_B) = H(\mathbf{z}_A) + H(\mathbf{z}_B) - H(\mathbf{z}_A, \mathbf{z}_B)    (6)
Equation (6) indicates that the maximization of Mutual Information is related to the
minimization of the joint entropy H(z_A, z_B) [48]. The marginal entropies H(z_A) and H(z_B)
boost the Mutual Information value when each event z_A and z_B carries more information.
In the case of images, when two image patches z_A and z_B contain more texture, the marginal
entropies H(z_A) and H(z_B) are high; when the two image patches are similar to each other,
the joint entropy H(z_A, z_B) is low. Matching images using Mutual Information therefore
gives not only the similarity of the two images, but also the credibility of the matching.
2.4.2 Matching Y-features using Mutual Information
We consider a rectangular window enclosing the Y-feature model in the source im-
age and compare it to other windows in the target image within a search area, which
is set to a fifth of the image size. Y-features for which Mutual Information is maxi-
mal among the set of candidates are paired. We take the two events z_A and z_B to be
the two windows enclosing the Y-features. We consider the following approximation
of the entropy:

H(z) ≈ −(1/N_z) Σ_{z_i ∈ z} ln p(z_i)
where N_z is the size of the window z, and the density function p(z) is estimated
using Parzen window density estimation. Since measuring p(z) over an unbounded
domain is difficult, the Parzen window approach estimates the density function p(z)
from the sample patterns within a finite window. We consider a Gaussian density
function for the Parzen window W_P, and the distribution of the grey levels is locally
approximated as follows:

p(z) ≈ (1/N_p) Σ_{z_j ∈ W_P} g(z − z_j)

where N_p is the number of samples in the Parzen window W_P, and g(z) is the uni-
or bi-variate Gaussian density function with a diagonal covariance matrix [67].
Now the entropy function becomes:

H(z) ≈ −(1/N_z) Σ_{z_i ∈ z} ln [ (1/N_p) Σ_{z_j ∈ W_P} g(z_i − z_j) ]
(7)
We consider a window enclosing the Y-feature model in the source image and com-
pare it with other windows in the target image within a specified neighborhood. Y-
features for which Mutual Information is maximal among the set of candidates are
paired. In Figure 11-(a) and (b), the extracted Y-features are displayed. The small
squares in the image indicate locations selected as an initial Y-feature but which
were later discarded due to low image gradients along the edges of the Y-feature
arms.
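To make the computation concrete, the following minimal sketch (in Python with NumPy) illustrates one way Equations (6) and (7) could be implemented for two grey-level image patches. The Gaussian bandwidth, the use of all patch samples as the Parzen window, and the helper names parzen_entropy, joint_parzen_entropy and mutual_information are illustrative simplifications, not the exact implementation used in this work.

import numpy as np

def parzen_entropy(samples, sigma=10.0):
    # Entropy approximation of Equation (7): the samples themselves serve as the
    # Parzen window, and each density value p(z_i) is an average of Gaussians
    # centred on the other samples.
    z = np.asarray(samples, dtype=float).ravel()
    diff = z[:, None] - z[None, :]                      # pairwise differences z_i - z_j
    g = np.exp(-0.5 * (diff / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    p = g.mean(axis=1)                                  # Parzen estimate of p(z_i)
    return -np.mean(np.log(p + 1e-12))                  # H(z) ~ -(1/N) sum ln p(z_i)

def joint_parzen_entropy(a, b, sigma=10.0):
    # Bivariate version with a diagonal covariance (sigma^2 * I).
    a = np.asarray(a, float).ravel()
    b = np.asarray(b, float).ravel()
    da = a[:, None] - a[None, :]
    db = b[:, None] - b[None, :]
    g = np.exp(-0.5 * ((da / sigma) ** 2 + (db / sigma) ** 2)) / (2.0 * np.pi * sigma ** 2)
    p = g.mean(axis=1)
    return -np.mean(np.log(p + 1e-12))

def mutual_information(patch_a, patch_b, sigma=10.0):
    # Equation (6): MI = H(z_A) + H(z_B) - H(z_A, z_B)
    return (parzen_entropy(patch_a, sigma) + parzen_entropy(patch_b, sigma)
            - joint_parzen_entropy(patch_a, patch_b, sigma))

To match a Y-feature, the window around it in the source image would be compared against each candidate window in the target image, and the pair with the largest mutual_information value would be kept.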
2.5 Global Registration
In this section, methods to register color and fluorescein images are discussed. A
simple registration method is to register images sequentially. However, the registra-
tion error in the previous stage can be propagated to the next stage of the registration.
We propose a global registration method, which provides an optimal registration
solution for an unsorted sequence of color and fluorescein images.
Figure 11. (a)(b) Fitted and validated Y-features; squares indicate non-validated Y-
features. (c)(d) Matching pairs of Y-features across modalities.
2.5.1 Pairwise Registration
Using matched pairs of Y-features, images are registered using an affine transform
model. The RANSAC method [17] is applied to find the inliers from the matched
features. From randomly sampled triples of matching pairs, the three pairs yielding
the affine transform that minimizes the geometric error over every match are selected:

A_best = argmin_A E^A_Geometric = argmin_A (1/N) Σ_{i=1}^{N} { ||x_i − A y_i||² + ||y_i − A⁻¹ x_i||² }
(8)
where x_i and y_i are a matching pair, N is the number of matching pairs, and A is the
affine transform obtained from the selected 3 pairs of Y-features. With the selected affine
transform, the geometric error of each matching pair is computed; based on this error,
outliers among the matching pairs are removed and only the inliers are used for the final
estimation of the affine transform. Figure 11-(c) and (d) show the pairs of corresponding
inliers after matching and registration.
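The pairwise registration step can be sketched as follows; the iteration count, the inlier threshold, and the use of only the forward transfer error are simplifications of Equation (8) chosen for illustration, and the function names are our own.

import numpy as np

def fit_affine(src, dst):
    # Least-squares 2-D affine transform A (2x3) mapping src -> dst.
    n = len(src)
    M = np.hstack([src, np.ones((n, 1))])               # n x 3 design matrix
    A, _, _, _ = np.linalg.lstsq(M, dst, rcond=None)    # 3 x 2 solution
    return A.T                                          # 2 x 3 affine matrix

def apply_affine(A, pts):
    return pts @ A[:, :2].T + A[:, 2]

def ransac_affine(x, y, n_iter=500, thresh=3.0, rng=np.random.default_rng(0)):
    # x, y: (N, 2) arrays of matched Y-feature positions (y is mapped onto x).
    best_A, best_inliers = None, np.zeros(len(x), bool)
    for _ in range(n_iter):
        idx = rng.choice(len(x), size=3, replace=False)          # minimal sample: 3 pairs
        A = fit_affine(y[idx], x[idx])
        err = np.linalg.norm(x - apply_affine(A, y), axis=1)     # geometric error per pair
        inliers = err < thresh
        if inliers.sum() > best_inliers.sum():
            best_A, best_inliers = A, inliers
    # Re-estimate the affine transform from all inliers, in the spirit of Equation (8).
    return fit_affine(y[best_inliers], x[best_inliers]), best_inliers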
After the estimation of the affine transform, we define the registration error
function E_A from the geometric error of the matched Y-features and the Mutual Information
of the whole image pair after transformation:

E_A(I_1, I_2) = E^A_Geometric(I_1, I_2) − λ · E_MI(I_1, A(I_2))
(9)
where A is the estimated affine matrix, E^A_Geometric is the geometric error of Equation
(8), E_MI is the Mutual Information of Equation (6), and λ is a normalization factor
accounting for the two different error scales of E^A_Geometric and E_MI.
Images in a sequence of angiograms can be registered sequentially using the se-
quential registration as illustrated in Figure 12-(a). However, if one image is not cor-
rectly registered, this error propagates to the remaining images. Furthermore, the reg-
istration across modalities varies in accuracy according to the phase of circulation of
the fluorescein dye in the retinal vessels. The global registration is introduced to
automatically reduce the registration errors of an unordered set of images within and
across modalities. The global registration is intended to identify the best registration
among all pairs of images, while minimizing the global registration error. Figure 12
illustrates the difference between the sequential registration and the global registra-
tion.
(a) Sequential registration (b) Global registration
Figure 12. Difference between (a) Sequential registration (b) Global registration
constructed from all pairs’ shortest path algorithm.
Figure 13. A complete graph of a sequence of color and fluorescein images.
Nodes correspond to images and edge costs are given by our registration error
(Equation (9)). Originally, this sequence has 17 images, but a complete graph with
only 7 images is shown for display purposes.
2.5.2 Selection of the Reference Frame
The quality of the resulting mosaic depends considerably on which image is selected
as the reference frame. We propose here a method for selecting the reference frame
that gives the lowest registration error using graph analysis. To address this problem,
we construct a complete and undirected graph, where the nodes correspond to the
images to be registered, and the edges correspond to the pairwise registration of the
images. A cost is associated with each edge, representing the registration error com-
puted by Equation (9) for the pairwise registration. Figure 13 shows an ex-
ample of a complete graph of images; there are 17 images in this set of data but for
clarity, only 7 images are shown for display purposes.
The global registration problem is formulated as finding the shortest path from
every node to all the other nodes in this complete and undirected graph. The all-pairs’
shortest paths are calculated using the Floyd-Warshall algorithm [20]. With this algo-
rithm, all shortest paths from a node to any other nodes are obtained. When there are
n images in a sequence, we build an n×n symmetric adjacency matrix A in which
each element represents the geometric error between two images. If there is no connection
between two images, an infinite value is assigned to the corresponding element of
the adjacency matrix A.
Procedure All_Pairs_shortest_path(n, A, s)
begin
  // A[i,j] holds the pairwise registration error; s[i,j] is initialized to the direct path (i, j)
  for k = 1 to n do
    for i = 1 to n do
      for j = 1 to n do
        if A[i,k] + A[k,j] < A[i,j] then
        begin
          A[i,j] = A[i,k] + A[k,j];
          s[i,j] = s[i,k] + s[k,j];   // concatenate the two sub-paths
        end
end All_Pairs_shortest_path
After running this algorithm, the shortest path list from image i to image j is
saved in the list s[i,j]. In the matrix A, the accumulated costs for each shortest
path are calculated. The i-th row of matrix A indicates the registration errors from the
image i to all other images. Therefore, the values in each row of matrix A are
summed up and the row with the minimum total registration errors is selected as the
reference frame:
Reference_frame = argmin_{i∈{1,…,n}} R_error(i) = argmin_{i∈{1,…,n}} Σ_{j=1}^{n} A[i, j]
(10)
where R_error(i) is the total registration error when image i is used as the reference frame.
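A compact sketch of the reference-frame selection of Equation (10), combining the all-pairs shortest path procedure above with the row-sum criterion; the variable names are ours, and the pairwise error matrix is assumed to already contain the costs of Equation (9) (zero on the diagonal, infinity where no pairwise registration is available).

import numpy as np

def select_reference_frame(A):
    # A: n x n symmetric matrix of pairwise registration errors (A[i,i] = 0,
    # np.inf where two images could not be registered pairwise).
    n = A.shape[0]
    D = A.astype(float).copy()
    paths = [[[i, j] for j in range(n)] for i in range(n)]       # direct paths i -> j
    for k in range(n):                                           # Floyd-Warshall relaxation
        for i in range(n):
            for j in range(n):
                if D[i, k] + D[k, j] < D[i, j]:
                    D[i, j] = D[i, k] + D[k, j]
                    paths[i][j] = paths[i][k][:-1] + paths[k][j]  # splice i->k and k->j
    totals = D.sum(axis=1)                                       # R_error(i) of Equation (10)
    ref = int(np.argmin(totals))
    return ref, paths[ref]                                       # reference frame and its shortest paths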
2.5.3 Optimal Registration
After the reference frame is selected, the global registration is simply accomplished
by selecting the shortest paths from the reference frame to all other frames, as represented
by the lists s[i,1] … s[i,n]. The selected shortest paths from the reference frame
to all other frames give an optimal solution given the costs on the edges, since this
guarantees that the total registration error across images is minimal. The time complexity
of the algorithm is O(n³), where n is the number of nodes. Figure 14 illustrates the
shortest paths from the reference frame to other frames.
Figure 14. An example of global registration from the selected reference frame to
all other frames using the shortest path algorithm. Image nodes are connected by
arcs weighted with the registration error between the two images.
2.6 Experimental Results
We conducted experiments on 11 sets of images. The 11 image sets were chosen by
an experienced retinal specialist to be representative of the typical range of images
encountered in clinical practice, encompassing a range of common retinal diseases.
Each set contains one color image and approximately 20 angiographic images, in-
cluding very dark ones with poor contrast. In most of the previous work dealing with
retinal angiogram registration, these images are simply discarded and not registered.
Furthermore, in some pathological cases, irregular spots with very strong contrast
bias the extraction of Y-features. Only the green channel of the color image was con-
sidered as it contains the best signal response.
We have added, regardless of the type of image considered, a pre-processing
stage based on morphological operators to increase the accuracy of the Y-feature
selection. Morphological operators are applied in order to extract rectilinear structures
in the image, which are a good indication of the presence of a vessel [68]. After
opening the image with a linear structuring element at 12 different orientations,
the maximum of the 12 responses is selected for each pixel. The linear opening and
the sum of top-hats remove small isolated spots and enhance vessel detection. To
detect dark vessels, the image is inverted before applying the same process.
Since we are processing the collection of frames independently of the circulation
phase of the dye, images can have both bright and dark vessels. In the experiments
described, we have preprocessed all images similarly. Y-features detected from
original and preprocessed images are paired for matching and the registration of
frames.
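The morphological pre-processing described above can be sketched with OpenCV as below; the kernel length of 15 pixels and the function names are illustrative choices, not the exact settings used in the experiments.

import cv2
import numpy as np

def line_kernel(length, angle_deg):
    # Binary structuring element: a line of the given length through the kernel centre.
    k = np.zeros((length, length), np.uint8)
    c = length // 2
    dx = int(round((length - 1) / 2 * np.cos(np.deg2rad(angle_deg))))
    dy = int(round((length - 1) / 2 * np.sin(np.deg2rad(angle_deg))))
    cv2.line(k, (c - dx, c - dy), (c + dx, c + dy), 1, 1)
    return k

def enhance_vessels(gray, length=15, n_orient=12):
    # Linear opening at n_orient orientations; the maximum response is kept per pixel.
    # To enhance dark vessels, call this function on the inverted image (255 - gray).
    responses = [cv2.morphologyEx(gray, cv2.MORPH_OPEN, line_kernel(length, a))
                 for a in np.linspace(0.0, 180.0, n_orient, endpoint=False)]
    return np.max(np.stack(responses), axis=0)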
Figure 15. A view of registered images
Figure 16. Mosaic image of registered angiograms
Figure 15 shows the pairwise registration of the color image and one of the fluorescein
images using an affine transform. The global registration was derived using the all-pairs'
shortest path algorithm described in Section 2.5, where each node corresponds to an image
and the weight of each edge represents the registration error in Equation (9).
Figure 16 shows the final mosaic obtained after registering a set of images.
We have conducted an evaluation to quantify the accuracy of the detection of the
Y-features using the proposed articulated model. We have measured the number of
detected Y-features as a function of the total number of seed points N_S detected by
the PCA-based algorithm. The obtained ROC curve is displayed in Figure 17. The
ROC curve saturates at approximately a 0.17 false detection rate with more than a
0.96 positive detection rate. Ground truth was provided by hand-tagging the locations
of the Y-features in the image set.
Figure 17. ROC curve of Y-feature detection (positive detection rate vs. false
detection rate) according to the number of seed points N_S.
Evaluating the accuracy of the registration across modalities is a difficult task
because of the lack of ground truth. We defined the ground truth by annotating corre-
sponding points manually and deriving the ground truth affine transform matrix.
We have also compared the proposed approach, based on the matching of Y-features,
to methods that use other image features. Table 2 reports the registration errors
obtained using: the sequential registration of PCA-based features, the sequential
registration of fitted Y-features, and the global registration with Y-features.
91.09% of images (235 out of 258 images) were accepted for the registration, and
8.91% of images (23 out of 258 images) were automatically discarded from the regis-
tration since they had high registration errors. Of the 11 color images, only one was
discarded. Among the 23 discarded images, 1 was a color image, 7 were from the early
phase, none from the middle phase, and 15 were from the late phase of the fluorescein
sequence. Figure 18 shows examples of dis-
carded images. We notice an improvement of the accuracy by using the proposed
global registration method with the all pairs’ shortest path algorithm. The geometric
error of matching points was calculated from the manually estimated ground truth
affine transform. The average pixel error of the proposed method is 2.757204. This
value was obtained by averaging over the whole set of images considered in the 11
sets of images (224 pairs).
This evaluation of 2-D registration using manual annotation has limitations,
however. First, manual annotation cannot be perfect, since annotation is performed
by a human. Second, the actual shape of the retina is a 3-D surface, but we registered
the images by 2-D affine transformation. These two sources of errors contribute to
the results reported in Table 2. We address these limitations by reconstructing the 3-
D shape of a retina in the next chapter.
(a) Early phase (b) Early phase (c) Late phase
Figure 18. Examples of fluorescein images discarded automatically from the regis-
tration because of high registration error with other images in a sequence
Image Name   # of Images   # of discarded Images   Sequential Registration with PCA-based features   Sequential Registration with Y-features   Global Registration with Y-features
Retina_1    20   0   2.2760   2.5658   2.3313
Retina_2    17   3   8.4572   3.5519   2.8198
Retina_3     9   0   3.6032   1.9391   1.7253
Retina_4    17   0   8.2562   3.4280   2.5488
Retina_5    24   1   2.7828   3.8374   1.6484
Retina_6    24   3   2.7372   3.3803   3.5249
Retina_7    24   4   4.7375   6.0997   2.8175
Retina_8    32   6   3.0115   4.7030   3.1998
Retina_9    20   0   4.2200   2.9743   2.4980
Retina_10   38   5   5.3870   4.8811   4.1132
Retina_11   33   1   3.9761   2.7736   2.0227
Total      258  23   4.335719   3.792287   2.7572
Table 2. Comparison of average geometric pixel errors using: the sequential regis-
tration of corners, the sequential registration of extracted Y-features, and the
global registration using Y-features.
Chapter 3 3-D Shape Inference
In this chapter, we propose an automatic method to infer the 3-D shape of the fundus
from a pair of images. The method consists of three steps: First, the matched Y-
features from a pair of images are used to estimate the epipolar geometry based on a
plane+parallax approach. Second, the images are rectified by the estimated geometry.
After the rectification, the search space on the scanline for stereo matching is esti-
mated based on the Y-feature correspondences. Subsequently, a dense disparity map
is estimated using Mutual Information criteria. Figure 19 shows examples of a pair
of retinal images used in this study. In Figure 20, the overall flow of proposed meth-
ods for the 3-D shape reconstruction is shown.
Figure 19. Various pairs of retinal images. (a)(b) Images of Retina_A, (c)(d) Images
of Retina_B, (e)(f) Images of Retina_C.
This chapter is organized as follows: In Section 3.1 the previous studies are re-
viewed. Section 3.2 addresses the problem of estimating epipolar geometry. Section
3.3 explains the matching algorithm for dense disparity map. Section 3.4 discusses
the experimental results including the comparison of the performance of various ste-
reo estimation methods on fluorescein images.
Figure 20. Overall flowchart of the 3-D reconstruction of a retinal fundus: the 2-D
registration provides matched Y-features; the epipolar geometry (fundamental matrix)
is estimated using the plane+parallax algorithm; the two images are rectified; and
dense stereo matching using Mutual Information produces the disparity map.
3.1 Previous work
To obtain the 3-D shape of the retina, practitioners often use specialized devices such
as a scanning laser ophthalmoscope (SLO) or an optical coherence tomography
(OCT) system. Based on the principle of low-coherence interferometry, OCT pro-
vides an in vivo cross-sectional image of the retina that simulates microscopic visu-
alization and has axial resolutions under 3Zm [49][65]. Unfortunately, the cost of
OCT equipment and the expertise required for its interpretation have limited wide-
spread adoption of this technology, particularly in developing countries. In addition,
angiographic studies using these OCT systems have thus far proved to be challenging.
An alternative to subjective inspection and high-cost imaging consists of inferring
the 3-D shape of the retina using images acquired with a commonly-available, lower-
cost fundus camera. Figure 21 shows OCT data of the macula.
Figure 21. Cross-sectional image of a macula obtained by OCT
A few research efforts have been reported on the concept of reconstructing the 3-
D shape of an optic disc from the retinal images [28][29]. Previous methods are rela-
tively simple as they only consider images that are synchronously acquired using a
calibrated rig and therefore the epipolar geometry estimation is performed offline
using a calibration pattern. The problem of the 3-D shape reconstruction of a retinal
fundus from sequences of fluorescein images has not been addressed in the literature.
We believe that a successful 3-D reconstruction of the retina from a pair of fluo-
rescein images should provide a cost-effective and useful diagnostic tool for oph-
thalmologists.
The first step to obtain a dense stereo map is rectification, a process which aligns
the matching points along horizontal scanlines. In order to rectify two stereo images,
the epipolar geometry, represented by the fundamental matrix, should be estimated.
After rectification, each point in one image is matched with a point in the other image
by a matching function. In this section, we review previous work on epipolar geometry
estimation and various stereo matching algorithms.
3.1.1 Epipolar Geometry Estimation
The classical approaches for estimating the fundamental matrix F (using 7- or 8-
points) [2][25][36], are not accurate because the shape of the retina is relatively flat
[24]. Even in disease states such as macular edema, where the central retina can be
“dramatically” thickened, the retinal surface elevation in these situations is relatively
modest compared with the overall radius of curvature of the retina. Furthermore, al-
though a retinal fundus has a curved 3-D shape, its surface curvature is relatively
small. In [34], the authors proposed a method for computing the fundamental matrix
using the plane-and-parallax approach. The method consists of first estimating the 2-
D projective motion, or the homography, and then inferring the epipoles by using two
other matching points belonging to residual parallax regions. The fundamental ma-
trix is then estimated using the epipoles and the homography. In [15], the authors use
the RANSAC (RANdom SAmple Consensus) method [17] to select a set of 7 match-
ing points that are not on a plane and estimate the fundamental matrix using the
plane-and-parallax algorithm.
3.1.2 Dense Disparity Map
A comprehensive review of several stereo estimation methods is provided in [51]. In
particular, graph cut [32] and belief propagation [55] methods have shown good per-
formance on test sets of stereo images with scenes having large depth changes, simi-
lar intensity for identical pixels, and textured regions. Retinal images, however, ex-
hibit unique and specific challenges, such as varying intensities, shallow depth retinal
fundus surface (especially in relatively normal retina), and low texture. Existing
methods in the literature are not well adapted for stereo estimation of the retinal fun-
dus from fluorescein images, as attested by our comparative study presented in the
experimental results section.
3.2 Estimation of Epipolar Geometry
The homography and the fundamental matrix are estimated from a set of Y-feature
correspondences between two images. This section describes the method for estimat-
ing the epipolar geometry from a pair of fluorescein images of a retinal fundus.
3.2.1 Homography Estimation
Using the matched pairs of Y-features, images are registered by a homography. The
RANSAC method is implemented to detect the inliers among the matched features.
The best four corresponding pairs of Y-features are selected to estimate the homo-
graphy, which minimizes the geometric error:
H_best = argmin_H Σ_i { ||x_i − H y_i||² + ||y_i − H⁻¹ x_i||² }
(11)
where x_i and y_i are a matching pair and H is the homography.
3.2.2 Fundamental Matrix Estimation
At least seven point correspondences are necessary to estimate the fundamental ma-
trix of the epipolar geometry. However, various standard implementations of the 7-point
and 8-point algorithms were tested and did not provide satisfactory results
[25][36]. Figure 22 shows the resulting corresponding epipolar lines defined by
the fundamental matrix estimated by the 8-point algorithm.
In Figure 22, we show examples of erroneous estimation of the epipolar geome-
try from a set of correctly matched Y features. In the examples above, the motion of
the camera between two acquisitions is close to a pure translation. There are two
common situations where a degenerate case occurs in the estimation of the funda-
mental matrix [24]: cases in which both camera centers and the 3-D points lie on a
ruled quadric, or cases in which all the points lie on a plane. In the case of retinal images,
the latter case is common because the surface of the retina is relatively flat (except in a
few patients with severe retinal disease). Although the points are not on the same
plane, the range of 3-D depth is too shallow to get a satisfactory fundamental matrix
from the 8-point algorithm.
To overcome this limitation of the 8-point algorithm on retinal images, we use
the plane-and-parallax algorithm proposed in [34]. Given 4 corresponding points,
first a homography is calculated. Adding 2 more point correspondences belonging to
residual parallax regions enables us to estimate the location of the epipoles. The fun-
damental matrix can then be estimated by:
F = [e]_× H
(12)
where H is the homography obtained from Equation (11), and the epipole e is the
intersection of (H x_1)×y_1 and (H x_2)×y_2, where x_i and y_i (i=1,2) are matching
pairs of features and [ ]_× is the matrix notation for the cross product. Figure 23
illustrates the plane-and-parallax algorithm used to estimate the fundamental matrix.

Figure 22. Examples of estimated epipolar geometry using an 8-point algorithm
in the case of a translation of the camera. The estimated epipolar geometry is
inaccurate. (a)(b) Retina_A (c)(d) Retina_B
Figure 23. Illustration of the estimation of the fundamental matrix using the
plane+parallax algorithm. Four corresponding points define the projective
homography H. Two additional points in the residual parallax regions define the
epipole e, which then provides the estimate of the fundamental matrix F = [e]_× H.

We have implemented a RANSAC-based algorithm for the plane-and-parallax
method, similar to the one used for the estimation of the homography. For each esti-
mated homography, we randomly select two additional matching pairs with large
geometric errors reported by Equation (8) to form a fundamental matrix. Subse-
quently, we select the fundamental matrix that satisfies the following equation.
F_best = argmin_F Σ_i ( x_i^T F y_i )²
(13)
where x_i and y_i are a corresponding pair. Figure 24 shows the corresponding epipolar
lines obtained by the plane-and-parallax approach. The plane-and-parallax algorithm
lines obtained by the plane-and-parallax approach. The plane-and-parallax algorithm
can reconstruct not only flat-like surfaces but also surfaces with greater 3-D varia-
tions. After estimating the fundamental matrix, stereo images are rectified using
Gluckman and Nayar’s method [21] for the estimation of the disparity map.
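A minimal sketch of the plane+parallax construction of Equation (12): given an estimated homography H and two off-plane correspondences, the epipole is recovered as the intersection of the two residual-parallax lines and F = [e]_× H is formed. The homogeneous-coordinate conventions and the function names below are ours.

import numpy as np

def skew(v):
    # Skew-symmetric matrix [v]_x such that [v]_x w = v x w.
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def fundamental_from_plane_parallax(H, x1, y1, x2, y2):
    # H maps points x of the first image onto their plane-induced positions H x in the
    # second image; (x1, y1) and (x2, y2) are two off-plane correspondences given as
    # homogeneous 3-vectors, x in the first image and y in the second image.
    l1 = np.cross(H @ x1, y1)          # parallax line through H x1 and y1 (second image)
    l2 = np.cross(H @ x2, y2)          # parallax line through H x2 and y2
    e = np.cross(l1, l2)               # epipole = intersection of the two parallax lines
    e = e / np.linalg.norm(e)
    return skew(e) @ H                 # Equation (12): F = [e]_x H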
3.3 Stereo Matching and 3-D Surface Estimation
3.3.1 Estimating the Search Space
In this section, we address the problem of estimating the depth range from the set of
matched Y-features. The definition of a suitable search space within the scanlines for
stereo matching is important: when the search space is too narrow, the corresponding
Figure 24. The epipolar lines obtained from the plane+parallax based fundamental
matrix estimation.
points cannot be identified; on the other hand, when the search space is too wide, the
corresponding pixels may still be detected on the scanline, but the detection is not
necessarily accurate. A proper range for the search space gives high confidence in
matching and also reduces the computation time. Most of the existing
algorithms manually select the search space to get the best results. One approach for
automatically estimating the search space is based on using a coarse-to-fine search
[41]. This method assesses the approximate offset of the search space from the
coarser level, and does not provide an accurate estimate. In addition, error from
wrong matches in the coarser level can be propagated to the finer level. In [45], the
search space is defined based on the maximum of the disparity gradient value. This
requires scenes or surfaces with strong disparity in depth, and therefore may not be
applicable to many retinal images. We propose a method that automatically defines
the search range by using the matched Y-features.
Although some cases of severe retinal disease may have “dramatic” distortion of
the retinal morphology with focal elevations of the retina several hundred microns
above the expected retinal surface, for many patients the retina has a relatively small
3-D depth variation. As such, the disparity search space is fairly shallow. Since dis-
parities for the sparse key features are known, disparity at other points can be easily
extrapolated because the retinal surface is an almost planar surface. Assuming that
corresponding Y-features are well distributed across the images, the search space can
be easily estimated from the set of known disparities.
Figure 25-(a)(b)(c) shows the histogram of the dense disparity map. Figure 25-
(d)(e)(f) shows the histogram of Y-features’ disparities. We note that the shape of the
distribution can be well approximated by a Gaussian distribution. Even though the
number of sample points defined by the Y-features is much smaller, the histogram
follows the same pattern as the one estimated on the dense disparity map. The aver-
age and the variance of the Y-features’ disparity are also similar to those of the dis-
parity map. After rectifying the position of matching Y-features, we can obtain the
mean (µ_Y) and the standard deviation (σ_Y) of disparities among all transformed
Y-feature correspondences.

Figure 25. (a)(b)(c) Histograms of all pixels' disparity for each stereo pair
(Retina_A, Retina_B, Retina_C). (d)(e)(f) Histograms of the Y-features' disparity.
The mean (µ) and standard deviation (σ) of each histogram are: (a) µ = −6.1,
σ = 2.221; (d) µ_Y = −6.0, σ_Y = 2.801; (b) µ = −55.1, σ = 2.459; (e) µ_Y = −54.3,
σ_Y = 2.121; (c) µ = −2.8, σ = 0.983; (f) µ_Y = −3.5, σ_Y = 1.336.

The search space S is then determined by the following inequality:
µ_Y − 4σ_Y < S < µ_Y + 4σ_Y
(14)
which corresponds to a 99.994% confidence interval under the Gaussian assumption.
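A small sketch of Equation (14): given the rectified Y-feature correspondences, the mean and standard deviation of their horizontal offsets define the scanline search range. The ±4σ margin follows the text, and the variable names are ours.

import numpy as np

def disparity_search_range(y_feat_left, y_feat_right):
    # y_feat_left, y_feat_right: (N, 2) rectified Y-feature positions (x, y);
    # after rectification, matching features share the same scanline, so the
    # disparity is simply the difference of the x coordinates.
    d = y_feat_right[:, 0] - y_feat_left[:, 0]
    mu, sigma = d.mean(), d.std()
    low, high = mu - 4.0 * sigma, mu + 4.0 * sigma      # Equation (14)
    return int(np.floor(low)), int(np.ceil(high))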
3.3.2 Dense Disparity Map using Mutual Information
In the rectified images, all matching pixels are located on the same scanlines. On the
scanline, the lower and the upper bound of the search space are estimated by the
method described in the previous section. This method produces a 1-D narrow
search space which enables a more accurate computation of the dense disparity map.
In case of the retinal fundus images where the intensities of the matching areas vary
across the stereo images, a general cross-correlation based algorithm does not pro-
vide satisfactory results. Instead, we propose an algorithm using Mutual Information
for matching points along the scanlines and estimating the depth map.
In [30], the authors used only the joint entropy H(z_A, z_B), rather than Mutual Information,
as the stereo matching criterion. That method does not work well in low-textured areas:
when mostly textureless areas are compared with each other, the joint entropy gives a high
score to the textureless area, which is incorrect. However, the marginal entropies H(z_A)
and H(z_B) help to boost the Mutual Information value on textured structures. In [27],
the author estimated a discretized density function p(z) in order to reduce the time
complexity. p(z) is calculated at the given point and it
is convolved with a 2-D Gaussian to simulate the density function. However this
simplification reduces the accuracy of Mutual Information.
We use Equation (6) to implement Mutual Information. In Mutual Information,
the marginal entropies H(z_A) and H(z_B) are included, and the density function p(z) is
computed individually using a Parzen window estimation under a Gaussian assumption.
The disparity of each pixel is determined by the following equation:

disparity = argmax_{d∈S} MI(d) = argmax_{d∈S} MI(z_A, z_{B,d})
(15)
where MI(z_A, z_{B,d}) is the Mutual Information from Equation (6) and z_{B,d} is the window
at a distance of d pixels from the window z_B in the second image. Each pixel is centered in
a window z_A, which is then compared with a window z_{B,d} in the other image within the
search space S. The pair of windows that has the maximum Mutual Information determines
the disparity at each pixel position.
Since the range of disparity values is narrow, the subpixel resolution of disparity
is essential in the 3-D reconstruction of retinal images to avoid “staircasing” in the
disparity map. We estimate the disparity map with subpixel resolution using a quad-
ratic interpolation relying on neighboring Mutual Information values as follows:
Subpixel disparity = disparity + (c − a) / (2 · (b − min(a, c)))
(16)
where a=MI(disparity-1), b=MI(disparity), c=MI (disparity+1), and disparity is the
value of selected disparity in Equation (15). Every disparity is determined individu-
ally by the Mutual Information matching criterion. We do not apply any smoothness
constraint to calculate disparities. Based on the taxonomy of stereo algorithms [51],
only the matching function is applied. Neither aggregation nor optimization methods
are used. In our experiments, we have observed that such methods only degrade the
accuracy of the matching.
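The per-pixel disparity selection of Equations (15) and (16) can be sketched as follows, reusing the mutual_information routine sketched in Section 2.4; the window half-size and search bounds are illustrative arguments, and the final lines mirror the quadratic interpolation of Equation (16).

import numpy as np
# mutual_information(patch_a, patch_b) as sketched in Section 2.4

def pixel_disparity(left, right, x, y, d_min, d_max, half=7):
    # Compare the window around (x, y) in the left rectified image with windows
    # shifted by d pixels along the same scanline in the right rectified image.
    # Assumes (x, y) and x + d lie far enough from the image borders.
    ref = left[y - half:y + half + 1, x - half:x + half + 1]
    scores = []
    for d in range(d_min, d_max + 1):
        cand = right[y - half:y + half + 1, x + d - half:x + d + half + 1]
        scores.append(mutual_information(ref, cand))    # Equation (15)
    scores = np.array(scores)
    k = int(np.argmax(scores))
    d = float(d_min + k)
    if 0 < k < len(scores) - 1:                         # Equation (16): subpixel refinement
        a, b, c = scores[k - 1], scores[k], scores[k + 1]
        d = d + (c - a) / (2.0 * (b - min(a, c)) + 1e-12)
    return d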
3.4 Experimental Results
The reconstructed 3-D shape is evaluated against manually annotated ground truth. A
more accurate depth map would require knowledge of the internal and external cam-
era parameters, which are currently unavailable.
3.4.1 3-D shape of retinal fundus images
We conducted experiments on more than 20 sets of stereo images and 3 challenging
pairs of fluorescein images as selected by an experienced retina specialist (3 pairs are
shown in Figure 19). Retina_A has a large elevated lesion in the center of the image.
In addition, the stereo images of Retina_A have a very strong illumination change.
Retina_B has a concave shape with a very small lesion in the upper middle part of
the image. Retina_C is almost flat, but it has a small intrusion at the optic disc.
After the Y-features in each image are extracted and matched, the epipolar ge-
ometry is estimated using the plane-and-parallax algorithm (Figure 23). The stereo
images are rectified based on the fundamental matrix, and the search space is esti-
mated from the set of matching Y-features (Figure 25 and Equation (14)). Using Mu-
tual Information, subpixel resolution dense disparity maps are estimated and shown
in Figure 26.
In Figure 26-(a)(b), (c)(d) and (e)(f), the 3-D shapes for Retina_A, Retina_B,
and Retina_C are shown from different views. In these images, the disparity values
have been scaled (by a factor of 30) to magnify the 3-D depth. The estimated 3-D
depth maps have little noise even though we did not apply any post-matching
smoothing step. The calculated depth values near the boundary are incorrect due to
occlusions. The size and degree of the elevated lesion area of Retina_A is demon-
strated in Figure 26-(a)(b). Figure 26-(c)(d) shows the concave shape of the fundus in
this myopic patient’s eye. The optic disc is intruded area in retinal fundus image. In
Figure 26-(e)(f), 3-D shape of a blister-like elevation of the retinal surface is accu-
rately estimated.
(a) Retina_A side view
(b) Retina_A
(c) Retina_B side view
(d) Retina_B
(e) Retina_C side view (f) Retina_C
Figure 26. Dense 3-D depth maps using Mutual Information
3.4.2 Evaluation of the stereo matching
The evaluation of the 3-D reconstruction is one of the most difficult tasks in
medical imaging because ground truth measures are difficult to obtain. We evaluated
the performance of the reconstruction by annotating corresponding points in the
stereo images. The error is measured by the pixel error between the annotated ground
truth disparity and the estimated disparity.
In the first experiment, we evaluate the performance of Mutual Information according
to the window size. There are two windows in Mutual Information: one is z in Equation (6),
the area window enclosing the current pixel position; the other is W_P, the Parzen window
used to estimate the density function. N_z and N_p are the sizes of these windows. First,
we fixed the Parzen window size to N_p = 3 and measured the pixel errors while changing
the area window size N_z. Second, we fixed the area window size to N_z = 15 and monitored
the errors while changing the Parzen window size N_p. The error measures for these two
scenarios are plotted in Figure 27-(a) and (b).
In Figure 27-(a), as the area window size increases, the errors first decrease, then
stabilize, and finally increase. The graph for the Parzen window size follows a similar
pattern. Since Mutual Information is a statistical measure, more data gives better
accuracy; however, when the window size is too large, the saliency of Mutual Information
decreases. In addition, the larger the window, the slower the computation.
In the second experiment, we evaluated the performance of different stereo
matching algorithms. We have used the evaluation codes from the Middlebury data-
base [71] for SSD, DP (Dynamic Programming), SO (Scanline Optimization), and
GC (Graph Cut) algorithms. For SSD, not only the sum of squared differences but also
AD (Absolute Differences) was compared, with different parameters such as window size,
shiftable windows, and truncation. For global optimization algorithms, such
Figure 27. (a) Errors according to the area window size N_z of Mutual Information.
(b) Errors according to the Parzen window size N_p.
as DP, SO, and GC, we compared the performance with two matching functions
(SSD and AD) and with different smoothness terms, and the best parameters for each
image were selected for each method. We implemented NCC (Normalized Cross
Correlation) and Mutual Information. In Table 3, we can see that Mutual Information
outperforms all other methods in terms of accuracy. Mutual Information produces
estimates with little noise and is highly accurate, even in areas with low texture.
In Figure 28-(a)(c)(e), the reconstructed 3-D retinal fundus surfaces are shown. We
reduced the boundary area to remove outliers caused by occlusions. A slice of the surface
is displayed at the intersection with a vertical plane. In Figure 28-(b)(d)(f), the image
texture is mapped on top of the 3-D depth map. The green line, which matches the
intersection position in the surface images on the left, displays the 3-D depth of the
texture. Except for the boundary area, most areas are well reconstructed. The 3-D shapes
of the lesions and of an optic disc are also accurately estimated.
Matching Functions Retina_A Retina_B Retina_C
Sum of Squared Distance(SSD) 3.5281 2.4851 3.5207
Dynamic Programming(DP) 2.5780 3.4193 4.0795
Scanline Optimization(SO) 2.1882 2.9460 3.5228
Graph Cut (GC) 1.9562 2.4396 3.8251
Normalized Cross Corr.(NCC) 1.6873 0.8419 0.9400
Mutual Information(MI) 1.5673 0.8899 0.8827
Table 3. Disparity Errors of different stereo algorithms. For SSD, DP, SO, and GC,
parameters with the best performance for each image are selected. For NCC and
MI, the parameters are fixed across images.
(a) Surface map of Retina_A
(b) Texture map of Retina_A
(c) Surface map of Retina_B
(d) Texture map of Retina_B
(e) Surface map of Retina_C (f) Texture map of Retina_C
Figure 28. Reconstructed 3-D shapes of retinal fundi (a)(c)(e) Surface maps with
an intersection plane. The heights of the vertical planes are determined by the
minimum and maximum values of 3-D depths (b)(d)(f) Texture maps. Each line
matches with the intersection position of the surface image on the left.
Chapter 4 3-D Reconstruction and Registration
In this chapter, we address the problem of the 3-D Euclidean reconstruction and the
registration from multiple images, given that the observed surface is nearly planar.
This is difficult, as classical methods work well only if the scene is truly planar (mo-
saicing) or the scene has certain significant depth variations (classical Structure-
from-Motion (SfM)). One domain in which this problem occurs is image analysis of
a retinal fundus. In Figure 29, examples of fluorescein images are shown. Our ap-
proach is to first assume planarity, and perform a 2-D global registration. A first
bundle adjustment is applied to find the camera positions in Euclidean space. We
then select two images and compute the epipolar geometry between them using a
plane+parallax approach. These images are matched to generate a dense disparity
map using Mutual Information. A second bundle adjustment is applied to transform
the disparity map into a dense Euclidean 3-D depth map, fixing the 2 camera posi-
tions. A third bundle adjustment is performed to refine both camera positions and a
3-D structure. All images are back-projected to the 3-D structure for the final regis-
tration. The method is evaluated by synthetic and real retinal images. The entire
process is fully automatic. The method is general, and can be applied to other do-
mains, as shown in the experiments. Figure 30 shows the overall flow chart of the
proposed approach.
Chapter 4 is organized as follows: in Section 4.1, previous studies are reviewed.
Section 4.2 summarizes 2-D global registration and 3-D shape reconstruction algo-
rithm. Section 4.3 describes the three-pass bundle adjustment algorithm to estimate
the camera motion and to reconstruct the 3-D structure in Euclidean space, followed
by the final 3-D registration. In Section 4.4, using the synthetic structure, the method
is evaluated and near-planarity is defined. Experimental results are given in Section
4.5.
Figure 29. (a)(b)(c)(d) Example of retinal fluorescein images (e)(f)(g)(h) Two
pairs of stereo retinal images from a patient with age-related macular degenera-
tion. A characteristic ‘blister’ in the macula is harder to see in the red-free images
(e)(f) than it is in the images taken after injection of fluorescein dye (g)(h) into the
antecubital vein. Nevertheless, the image in (h) is blurred which complicates
stereoscopic measurements.
Figure 30. Flow chart of the proposed approach: the image sequence is registered
globally in 2-D; two images are selected and their epipolar geometry is obtained with
the plane+parallax approach; the rectified pair is matched using Mutual Information to
produce a dense disparity map, which is converted into dense correspondences in the
original image coordinates; a three-pass bundle adjustment estimates the camera
parameters (R, T, f) under a planar assumption, reconstructs the 3-D structure in
Euclidean space, and refines both; finally, all images are registered by back-projection
to the 3-D structure.
4.1 Introduction
Registration of images of a near-planar surface is generally performed in 2-D, which
generates errors due to 3-D depth information. For more accurate registration, the 3-
D structure of the surface has to be explicitly considered. However, the near-planarity
condition makes it difficult to estimate a 3-D structure since it is a quasi-degenerate
case for the estimation of the geometry from images [19]. For the Euclidean recon-
struction of the surface, the plane at infinity and the absolute conic have to be esti-
mated when camera parameters are not provided. However, the lack of 3-D depth in
the surface prevents us from estimating even affine reconstruction. Triggs suggested
a method reconstructing a near-planar surface from the homography of each image
[56]. Assuming that the projection of one direction of points at infinity is orthogonal
to every image plane, the camera parameters are estimated for Euclidean reconstruc-
tion. This method needs at least 3 images if only the focal length is estimated. How-
ever, the author recommends 8~10 images for reliable results. Furthermore, the im-
ages should be taken with 10~20° angle differences.
Near-planar 3-D surfaces are common, such as man-made structures (façades of
buildings), terrain images from a satellite, or any image of 3-D structures taken at
distance. To reconstruct and register those images is an important issue since they are
quasi-degenerate cases to reconstruct in 3-D Euclidean space. One good domain
where this problem occurs is fluorescein angiogram sequences of a retinal fundus.
We recall the properties of the fluorescein angiogram of a retinal fundus: (1) the reti-
nal fundus has a shallow depth. The overall shape is a patch of a sphere, but there is
intrusion in an optic disc, and sometimes there are extruded blisters in the middle of
the macula. (2) The intensity level of angiograms changes significantly during the
circulation of the fluorescein dye. (3) Camera motion is close to a translation. In the
middle phase of a fluorescein sequence, images have many vessels and their bifurca-
tions with high contrast. In contrast, the early- and late-phase images of a fluorescein
sequence have fewer vessels with lower contrast. Thus, when we try to find
matching features across images, (4) the number and the position of the matching
features of image pairs are not consistently distributed. Those properties make it dif-
ficult to apply Triggs’ algorithm, or any other auto-calibration method for planar
scenes [31].
Brown and Lowe proposed an image registration algorithm for near-planar sur-
face images such as scenery images [6]. They use SIFT as invariant features and the
RANSAC method to obtain the matching inliers [38][39]. They estimate a focal length
and a rotation vector for each camera using bundle adjustment. Since they consider
neither the 3-D structure nor the translation of the camera, the method also creates
residual errors. They reduce these residual errors visually by blending multiple im-
ages. However, our goal is not a mosaic, but a reconstructed 3-D structure and an
accurately registered image sequence of the retinal fundus, which is more useful for
the doctor’s diagnosis.
We propose a fully automatic method for the reconstruction and the registration
of images of the near-planar surface. Since we cannot obtain 3-D structures from all
image pairs, we use only one dense 3-D depth from the most accurately registered
pair. Even if the number of matching points in the early or late phase of fluorescein
angiograms is small, we may estimate at least the camera parameters of those images.
Combining a dense 3-D depth structure and the images with camera motion, we reg-
ister a full set of fluorescein images in 3-D.
We propose a three-pass bundle adjustment algorithm. The procedure is as fol-
lows: first, we register the images in 2-D using a global registration algorithm [6]. Second,
a pair of images with the lowest registration error is selected, and a dense disparity map of
the pair is obtained using the plane+parallax algorithm for the geometry and Mutual
Information as the matching criterion. Third, a three-pass bundle adjustment is applied to
estimate both the camera parameters of each image and the reconstruction of the 3-D
structure. To take depth effects into account, we
first assume that the majority of the scene is planar. The scene is estimated as a plane
and 3-D depth information out of the plane is considered to acquire the 3-D geometry
from the images. Finally, using the camera parameters, every image is back-projected
to the reconstructed 3-D structure and re-projected again to the reference image to
acquire the final registration.
4.2 2-D Global Registration and 3-D Shape Reconstruction
For the 3-D reconstruction, the same methods described in previous chapters are used,
which are the 2-D global registration and the 3-D shape reconstruction of the near-
planar surface. Y-features are extracted using an articulated model and matched
across images using Mutual Information. The global registration method, using all
pairs’ shortest path algorithm, finds the reference image and its connections to other
images with the lowest registration error.
After registering the images in 2-D, a dense disparity map is estimated from the two
images with the lowest registration error, using the method in Chapter 3. We make use of
the planar assumption: the plane+parallax algorithm is applied to estimate the fundamental
matrix [34]. We rectify the images using Gluckman and Nayar's algorithm, obtaining 2-D
rectification homographies R_1 and R_2 for images I_1 and I_2 while minimizing resampling
effects [21]. The two images are rectified to I_R1 and I_R2, such that

I_R1 = R_1 I_1,   I_R2 = R_2 I_2.

Figure 31 shows the two rectified images.
Figure 31: Rectified images. Two images are selected for the estimation of the dense
disparity map.
The search range of the disparity is determined by the Gaussian assumption on the
distribution of the Y-features' disparities [14]. Subsequently, a point I_R1(x, y) is matched
with a point I_R2(x+d, y) in the 1-D search range d ∈ S. Mutual Information from Equation
(6) is used as the matching criterion, since Mutual Information is robust to the intensity
changes in the fluorescein images [14]. The disparity map D is calculated by the following
equation:

D(x, y) = argmax_{d∈S} E_MI( I_R1(x, y), I_R2(x+d, y) )
(17)
where E_MI calculates the Mutual Information between two rectangular neighborhood
windows centered on the specified positions. The disparity map D describes the
correspondence of I_R1 and I_R2 such that

I_R1(x, y) ↔ I_R2(x + D(x, y), y)
(18)
We transform the above correspondence to the original image positions in I_1 and I_2 by
applying the inverse rectification homographies R_1^{-1} and R_2^{-1}:

I'_1(x_1, y_1) = R_1^{-1} I_R1(x, y)
I'_2(x_2, y_2) = R_2^{-1} I_R2(x + D(x, y), y)
(19)

I'_1(x_1, y_1) and I'_2(x_2, y_2) are matching points in the original images. Instead of the
dense disparity map, we use these dense correspondences in the original images as the input
to the bundle adjustment to achieve the 3-D reconstruction.
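The conversion of Equation (19), from rectified disparities back to correspondences in the original images, can be sketched as below; R1 and R2 denote the rectification homographies and D the disparity map, with helper names of our own.

import numpy as np

def to_original_correspondences(D, R1, R2):
    # D: dense disparity map defined on the rectified first image (H x W).
    # R1, R2: 3x3 rectification homographies; their inverses map rectified
    # coordinates back to the original images (Equation (19)).
    H, W = D.shape
    ys, xs = np.mgrid[0:H, 0:W]
    p1 = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T       # rectified points, image 1
    p2 = np.stack([xs + D, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T   # shifted points, image 2
    q1 = np.linalg.inv(R1) @ p1
    q2 = np.linalg.inv(R2) @ p2
    q1 = (q1[:2] / q1[2]).T                     # matching points in original image I_1
    q2 = (q2[:2] / q2[2]).T                     # matching points in original image I_2
    return q1, q2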
4.3 3-D Euclidean Reconstruction and Registration
The disparity map encodes correspondences between the two rectified images, but is not
a reconstructed structure. The 3-D structure of a surface should be reconstructed at
least up to Euclidean (or metric) space, where only scale is not determined. For the
Euclidean reconstruction of a structure, the internal and external camera parameters
are necessary. In this section, we propose a 3-D Euclidean reconstruction method for
near-planar surfaces using a three-pass bundle adjustment algorithm. In addition, the
reconstructed 3-D structure is used for the registration of images of a near-planar
surface.
4.3.1 3-D Euclidean Reconstruction
When the camera parameters are unknown, the classical reconstruction method is the
stratified reconstruction algorithm, which reconstructs a surface first in projective,
second in affine, and third in metric space, step by step. Since we obtained a funda-
mental matrix by the plane+parallax algorithm, we can infer the projective transformation
matrices as

P_1 = [I | 0],   P_2 = [[e']_× F | e']

We express points of the 3-D structure as X; the projected image points are then
x_1 = P_1 X for image I_1 and x_2 = P_2 X for I_2.
If we could find the plane at infinity, the affine transformation matrices could be
obtained by applying a 3-D homography H_a such that

P_a1 = P_1 H_a^{-1},   P_a2 = P_2 H_a^{-1},   and   X_a = H_a X,

where H_a = [ I 0 ; π_∞^T ] and π_∞ is the plane at infinity. We need 3 vanishing points to
define the plane at infinity π_∞. However, the 3-D depth of the surface is too shallow to
determine the vanishing point in the depth direction. Therefore, we cannot proceed any
further with the stratified reconstruction. Figure 32 shows the projective reconstruction of
the retinal fundus of Figure 31. The 3-D points are estimated by triangulation [24]. Since
the reconstructed points lie on a plane, we cannot estimate the plane at infinity.
The other method for 3-D reconstruction is using bundle adjustment [37][57].
This method can determine the internal and external camera parameters of each im-
age and reconstruct the 3-D scene structure in Euclidean space simultaneously. We
use bundle adjustment using the Levenberg-Marquardt minimization method in [37].
Since we have corresponding Y-features for all image pairs and dense matching
points of a pair of images I'_1 and I'_2 from Equation (19), we can apply bundle adjust-
ment. However, if we apply bundle adjustment to get all camera parameters and the
3-D structure at the same time, the method falls into local minima. When we try to
estimate the camera parameters first, the 3-D structure is not provided, and vice versa.
To solve this situation, we propose a three-pass bundle adjustment using first a pla-
nar assumption for the near-planar surface.
As we need 3-D points for the initial input of bundle adjustment, we assume
those 3-D points are on a plane, parallel to the image plane. Initially, we set the 3-D
structure to have the same size as the reference image plane, since the scale does not
matter in Euclidean space. Therefore, the width and the height of the 3-D structure are
the same as those of the reference image, and the planar 3-D structure is parallel to the
reference image. Assuming that there is no skew and that the CCD cells are square, we
estimate only the focal length f as the internal camera parameter. Figure 33-(a) shows the
initial settings.

Figure 32: Projective reconstruction of the images in Figure 31. (a) Frontal view
(b) Side view
Figure 33. Three-pass bundle adjustment. I_R is the reference image; each triangle
indicates a camera position with R, T, and f. (a) First bundle adjustment: assume a
planar structure and estimate the camera motion of each image. (b) Second bundle
adjustment: fix two camera motions and estimate the 3-D structure. (c) Third bundle
adjustment: refine motion and structure.
In the first pass of bundle adjustment, fixing the shape of the 3-D structure X,
which is a plane at this point, we estimate the rotation matrix R_i, the translation vector
T_i of each camera, and the focal length f_i (i = 1, …, m, where m is the number of images
in the sequence) from the matching Y-features across images. The depth of the whole 3-D
plane is also determined in this step. All these parameters are estimated by minimizing the
following re-projection error:

E(P, X) = Σ_{i=1}^{m} Σ_{j=1}^{n_i} || x_ij − P_i X_j ||
(20)
where x_ij is the j-th feature position in image i, n_i is the number of corresponding
features in image i, P_i = K_i [R_i | T_i], and the internal camera parameter matrix K_i
consists of only the focal length f_i.
In the second pass of bundle adjustment, we fix the camera parameters of the two
images I'_1 and I'_2, and estimate only the 3-D structure from the dense correspondences
of the stereo images I'_1 and I'_2. The index order of the points of the dense 3-D structure
is set to be the same as that of the reference image, so that the back-projected positions of
the 3-D structure can be looked up in the final registration process. Since the surface was
assumed to be a plane, points not on the plane cause re-projection errors when the 3-D
structure is re-projected to any image other than the reference image. By moving those 3-D
points from the plane to new 3-D positions that reduce the re-projection error, the 3-D
structure is estimated by minimizing the following equation:

E(X^d) = Σ_{k=1}^{p} { || I'_1k − P_1 X^d_k ||² + || I'_2k − P_2 X^d_k ||² }
(21)
where p is the number of dense matching points, X^d_k are the 3-D points of those dense
matches, and P_1 and P_2 are the projection matrices of I'_1 and I'_2. Figure 33-(b)
illustrates the second-pass bundle adjustment.
In the third pass of bundle adjustment, we refine both the camera motion for all
images in the sequence and the 3-D structure, using the same matching features x_ij, the
camera parameters P_i from the first-pass bundle adjustment, and the reconstructed 3-D
structure X^d_j from the second-pass bundle adjustment:

E(P, X^d) = Σ_{i=1}^{m} Σ_{j=1}^{n_i} || x_ij − P_i X^d_j ||
(22)
Figure 33-(c) shows the third-pass bundle adjustment.
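As a concrete illustration of the first-pass bundle adjustment of Equation (20), the following sketch minimizes the re-projection error over the per-camera parameters (rotation as a Rodrigues vector, translation, and focal length) while the 3-D points are held on a fixed plane at a positive depth. It uses scipy.optimize.least_squares and a parameterization of our own choosing, and is only a simplified stand-in for the Levenberg-Marquardt implementation of [37]; the second and third passes follow the same pattern with different variables held fixed.

import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def project(cam, X):
    # cam: [rx, ry, rz, tx, ty, tz, f] for one camera; X: (N, 3) scene points.
    rvec, t, f = cam[:3], cam[3:6], cam[6]
    Xc = Rotation.from_rotvec(rvec).apply(X) + t        # points in camera coordinates
    return f * Xc[:, :2] / Xc[:, 2:3]                   # pinhole projection, principal point at 0

def residuals(all_params, planar_X, observations):
    # observations[i] = (point_indices, measured_xy) for image i; planar_X is the
    # fixed planar structure of the first pass, placed at a positive depth (Equation (20)).
    res = []
    for i, (idx, xy) in enumerate(observations):
        cam = all_params[7 * i: 7 * (i + 1)]
        res.append((project(cam, planar_X[idx]) - xy).ravel())
    return np.concatenate(res)

def first_pass_bundle_adjustment(planar_X, observations, n_images, f0=1000.0):
    x0 = np.tile([0.0, 0.0, 0.0, 0.0, 0.0, 0.0, f0], n_images)   # rough per-camera initial guess
    sol = least_squares(residuals, x0, args=(planar_X, observations))
    return sol.x.reshape(n_images, 7)                   # R (rotation vector), T, f per image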
After the three-pass bundle adjustment, we have both the internal and external
parameters of each camera and the 3-D structure. Figure 34 displays reconstructed 3-D
structures of the retinal fundi. We use this information for the final registration method.

Figure 34: Reconstructed 3-D structures of the retinal fundi. (a)(b) Reconstruction of
the retina from the images in Figure 29-(a)(b)(c)(d); (c)(d) reconstruction of the retina
from the images in Figure 29-(e)(f)(g)(h). (a)(c) Frontal views, (b)(d) side views.
4.3.2 Registration by Back-projection
The reconstructed 3-D structure of a retinal fundus itself provides useful information
for doctors to diagnose the retinal disease. Moreover, we make use of the 3-D struc-
ture for the registration of other fluorescein images. The common 2-D registration
method is to set a reference image and to register other images one by one in 2-D
[12]. When the surface is not a plane, the registration using this method may generate
residual errors caused by the 3-D structure. The other method is to reconstruct the
surface from each image pair in 3-D and register the 3-D structures using 3-D registration
algorithms, such as the iterative closest point (ICP) method [4]. When the surface is
near-planar, it is hard to reconstruct the 3-D structure from the image pairs as men-
tioned in Section 4.1. Instead, we propose the registration method using one 3-D
structure and camera positions by back-projecting all images to the 3-D structure.
The reference image is one of the images with the 3-D dense disparity map. To
register the i-th image I
i
to the reference image I
R
, all pixels x
R
in the reference image
plane are back-projected to the 3-D structure X, and then, the 3-D point X is pro-
jected to the image point x
i
.
87
To estimate the back-projected 3-D points from the reference frame, we trace
back the index of the 3-D structure, which has the same index order with reference
image in Section 4.1. The projection matrix P
R
for the reference image is
x
R
= P
R
X. (23)
The function of back-projection b is
) , (
R R
b x P X= . (24)
The 3-D structure X is re-projected to image i by
) , (
R R i i i
b x P P X P x = =
(25)
The pixel value of re-projected position x
i
determines that of registered position of
image i. Since the re-projected position of x
i
is not an integer, the pixel value of x
i
is
determined by bilinear interpolation from the neighboring 4 pixels. Figure 35 illus-
trates the registration method using the 3-D structure.
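The following minimal sketch illustrates Equations (23)-(25) under the assumption that the dense 3-D structure is stored as a per-pixel array aligned with the reference image, so that the back-projection b(P_R, x_R) reduces to an index lookup; the array layout and names are assumptions made for illustration.

    import numpy as np

    def backproject(structure, x_R):
        """b(P_R, x_R): return the 3-D point stored for reference pixel x_R = (u, v)."""
        u, v = x_R
        return structure[v, u]                    # structure has shape (H, W, 3)

    def bilinear(image, u, v):
        """Sample a grayscale image at a non-integer position from its 4 neighbors
        (no bounds checking, for brevity)."""
        u0, v0 = int(np.floor(u)), int(np.floor(v))
        du, dv = u - u0, v - v0
        return ((1 - du) * (1 - dv) * image[v0, u0] + du * (1 - dv) * image[v0, u0 + 1]
                + (1 - du) * dv * image[v0 + 1, u0] + du * dv * image[v0 + 1, u0 + 1])

    def register_pixel(structure, P_i, image_i, x_R):
        """Warp the value of image i onto reference pixel x_R through the 3-D structure."""
        x = P_i @ np.append(backproject(structure, x_R), 1.0)
        u, v = x[0] / x[2], x[1] / x[2]           # re-projected position x_i
        return bilinear(image_i, u, v)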
Figure 35. Registration by back-projecting to a 3-D structure. Every pixel x_R of the
reference image I_R is back-projected to X = b(P_R, x_R) and re-projected to each
image I_i at x_i = P_i b(P_R, x_R).
4.4 Definition of Near-Planarity
In this section, we evaluate the proposed method using synthetically generated 3-D
structures. In addition, the boundaries within which 3-D reconstruction of retinal
images is possible are specified by varying the depth of the 3-D structure and the
baseline and angle of the two cameras. Our results identify a factor that is directly
related to near-planarity. Based on this factor, we give our definition of near-planarity
and its minimum requirements.
4.4.1 Synthetic Retinal Images
Evaluating the reconstruction of a retinal fundus is difficult, since no existing method
provides the 3-D structure of the whole retinal fundus as ground truth. OCT data could
serve as ground truth, but an OCT machine measures only a partial area of the retinal
fundus rather than the entire fundus. Because of this limitation of OCT, we evaluated
our method with synthetically generated retinal images.

We generated two kinds of 3-D surfaces: a hyperbolic paraboloid and a sinc
surface. The hyperbolic paraboloid has a shape similar to a saddle, with the saddle
point at the center of the surface. In the sinc surface, the height of the sine waves
peaks in the middle and gradually decreases toward the edges. The width and height
of both surfaces are 640×480. We set the internal and external camera parameters
similar to those of a retinal fundus camera. In order to generate synthetic images, a
texture is mapped onto each 3-D surface and the surface is projected to each image
plane based on the camera parameters. Figure 37 shows the images onto which the
textured 3-D surfaces are projected.
Figure 36. Synthetic 3-D surfaces with different depth ranges. (a) Hyperbolic
paraboloid, size 640×480×42; (b) sinc surface, size 640×480×30; (c) hyperbolic
paraboloid, size 640×480×171; (d) sinc surface, size 640×480×121; (e) hyperbolic
paraboloid, size 640×480×301; (f) sinc surface, size 640×480×304. The hyperbolic
paraboloid in (a)(c)(e) is

z = s \left( \frac{(y - y_c)^2}{b^2} - \frac{(x - x_c)^2}{a^2} \right),

where s is a variable scale value, (x_c, y_c) is the center of the image, and a and b
are fixed to one tenth of the width of the structure, i.e., a = b = 64. The sinc surface
in (b)(d)(f) is

z = s \, \frac{\sin(w)}{w}, \quad where \quad w = \frac{(x - x_c)^2}{a^2} + \frac{(y - y_c)^2}{b^2}.
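A minimal sketch of generating these two surfaces on a 640×480 grid, following the formulas in the caption of Figure 36, is given below; the grid layout and names are illustrative, and the scale s controls the resulting depth range.

    import numpy as np

    W, H = 640, 480
    a = b = W / 10.0                              # a = b = 64, as in the caption
    xc, yc = W / 2.0, H / 2.0                     # center of the image
    x, y = np.meshgrid(np.arange(W), np.arange(H))

    def hyperbolic_paraboloid(s):
        """Saddle surface z = s * ((y - yc)^2 / b^2 - (x - xc)^2 / a^2)."""
        return s * ((y - yc) ** 2 / b ** 2 - (x - xc) ** 2 / a ** 2)

    def sinc_surface(s):
        """Sinc surface z = s * sin(w) / w, with w as in the caption."""
        w = (x - xc) ** 2 / a ** 2 + (y - yc) ** 2 / b ** 2
        return s * np.sinc(w / np.pi)             # np.sinc(t) = sin(pi*t)/(pi*t), so this equals sin(w)/w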
In the first experiment, we generated multiple sets of synthetic images with varying
surface depth. The purpose of this experiment was to define the upper and lower
boundaries of a near-planar surface, which lies between a plane and a non-planar
3-D surface. Figure 36 shows examples of the two synthetic surfaces with different
depths. For each set of images, the complete reconstruction process described in this
dissertation (i.e., Y-feature extraction, 2-D global registration, 3-D shape
reconstruction, and 3-D Euclidean reconstruction) was executed. Figure 38 shows the
reconstructed structures and error maps. The projected images of planar surfaces such
as Figure 36-(a) and (b) were not reconstructed, because the projected images did not
provide enough 3-D information (parallax residuals in this case) to estimate the
epipolar geometry. In Figure 38-(d), the sinc surface was accurately reconstructed,
except for the central area, due to self-occlusion of the structure and an insufficient
range of parallax residuals to represent the sharp extrusion.

Figure 37. Images projected onto synthetic surfaces. (a) Images projected onto a
sinc surface; (b) images projected onto a hyperbolic paraboloid.
After the reconstruction, Euclidean distances between the synthetic 3-D structure
and the reconstructed 3-D structure were calculated. Figure 39 shows the
reconstruction errors with respect to the varying depths of the structures. For sets of
images that could not be reconstructed, the error was set to the maximum value. In
Figure 39-(a), the average reconstruction errors are plotted. The graph indicates
relatively high reconstruction errors for planar surfaces. We investigated these planar
surfaces and determined that the high error came from an inaccurate estimate of the
epipolar geometry: because most of the matching points lie on the same plane,
unreliable parallax points were selected, producing an erroneous fundamental matrix.
On the other hand, the error stays comparatively low for near-planar surfaces.
Figure 38. Reconstructed synthetic surfaces with different depth ranges. (a)
Reconstructed hyperbolic paraboloid, size 640×480×171, error 6.518786; (b)
reconstructed sinc function, size 640×480×121, error 10.03085; (c) reconstructed
hyperbolic paraboloid, size 640×480×301, error 12.06079; (d) reconstructed sinc
function, size 640×480×304, error 39.74035. Each panel also shows the
corresponding error map. The boundary of the 3-D structure is not well reconstructed
because of boundary occlusions in stereo matching.
As the depth increases, the error increases gradually, for the following reasons.
First, the first-pass bundle adjustment in the proposed method assumes that the
surface is planar and is therefore limited in handling wide-depth 3-D structures.
Second, self-occlusion makes it difficult to estimate an accurate disparity map.
Finally, the larger depth scale increases the absolute error. However, we note that an
accurate comparison between wide-depth and narrow-depth structures requires
normalizing each reconstruction error by its depth scale. Figure 39-(b) therefore plots
the normalized reconstruction errors. As shown in the figure, when the depth range is
between 90 and 250, the method provides a reliable result. In the case of the
hyperbolic paraboloid, 97% of the 3-D structure was correctly reconstructed when the
depth range was 152; in the case of the sinc surface, 92% of the surface was correctly
reconstructed. Most of the errors resulted from the lack of texture during stereo
matching of the retinal images. In addition, the sinc surface has a sharp extrusion in
the center of the structure, which further increases the reconstruction error.
Based on this experiment, we estimate the lower boundary of the depth range for
reconstructing a retinal fundus as 11% of the diagonal length of the image. For
example, for a 640×480 structure, the required depth range is at least 90. However,
this minimum requirement applies only to commercial retinal fundus camera settings;
if the cameras have higher resolution or a wider baseline, the required range becomes
lower, which also enables the reconstruction of retinal fundi with a narrower depth
range. The upper boundary of the depth was 45% of the diagonal length of the image.
The second experiment computed the reconstruction errors with a varying camera
baseline. To capture the images with a wide baseline, the camera angles were adjusted
so that the cameras point toward the 3-D structure. We fixed the depth range at 100,
and all other camera parameters were set the same as in the first experiment. Figure 40
shows the average reconstruction errors with respect to the camera baseline. With
narrow-baseline cameras, the method could not estimate the fundamental matrix,
because the range of parallax residuals was too narrow. With relatively wide baselines,
on the other hand, the method reconstructed the structures correctly. With still wider
baselines, the reconstruction errors gradually increased because (1) the images were
distorted by perspective effects, and (2) parts of the images were self-occluded by the
3-D structure.

Figure 39. (a) Average reconstruction errors based on the depth range of the 3-D
structure. (b) Normalized reconstruction errors, i.e., (a) normalized by the depth scale.
Figure 40. Average reconstruction errors according to the baseline of the two cameras.
4.4.2 Definition of Near-Planar Surface
The depth of a structure and the baseline of the cameras are important factors in
reconstructing a near-planar surface. However, neither factor by itself determines the
accuracy of the reconstruction. For example, even if the depth is very narrow,
high-resolution images and a wide camera baseline may still provide an accurate
estimate of the 3-D structure. In general, besides the depth of the structure and the
baseline of the cameras, the angle of the cameras and the resolution of the images are
also relevant to the definition of a near-planar surface image.
Our extensive experiments on different sets of images revealed that the key variable
related to the definition of a near-planar surface is the parallax range, which is
determined by the depth of the structure, the resolution of the images, and the baseline
and angle of the cameras. The parallax range could be determined simply by
examining the minimum and maximum values of the parallax residuals; however,
outliers among the residuals preclude the use of these extreme values. Assuming that
the parallax residuals of retinal images follow a Gaussian distribution, we use an
approach similar to that of Equation (14), which was used to automatically determine
the search space. Using the dense correspondences from the second-pass bundle
adjustment, we estimate the parallax range R:
\mu_d - 2\sigma_d < \text{Parallax Range} < \mu_d + 2\sigma_d, \qquad R = 4\sigma_d        (26)

where μ_d is the mean and σ_d is the standard deviation of the distances of the dense
correspondences. If the distribution of the residuals is normal, 95.45% of the samples
fall within the interval R. For images whose residuals do not follow a normal
distribution, the median, minimum, and maximum values could instead be used to
define the parallax range.
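A minimal sketch of this estimate is shown below, assuming the parallax residuals of the dense correspondences are available as per-match scalar distances; the names are illustrative.

    import numpy as np

    def parallax_range(residuals):
        """R = 4 * sigma_d of Eq. (26): covers about 95.45% of the residuals
        when they follow a Gaussian distribution."""
        return 4.0 * np.std(residuals)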
Accordingly, we define a near-planar surface image as an image with a low parallax
range. This definition covers not only images of a nearly planar surface but also
images of a surface with a wide 3-D depth that are taken at a distance, with low
resolution, or with a narrow baseline. Conversely, when images of a surface with very
narrow depth are taken close up with high resolution or with a wide baseline, they too
qualify as near-planar surface images. There are two minimum requirements for the
reconstruction of near-planar surface images: first, the parallax residuals must span a
Manhattan distance of at least two pixels; second, the parallax directions must not all
be parallel. A more reliable result, however, requires a parallax range wider than
these minimum requirements. Figure 41 shows that for each image there is an interval
of residual ranges where the proposed method works best; this interval varies
depending on the characteristics of the surface.
Figure 41. Reconstruction errors according to the range of parallax residuals.
(a) Registration error; (b) normalized registration error.
4.5 Experimental Results
We tested our methods on five fluorescein image sequences and one sequence of
façade images. Each fluorescein sequence has 10~20 images (60 images in total). Two
sequences (Retina_1 and Retina_2) are images of healthy retinal fundi. Three
sequences of Retina_D, taken in different fluorescein phases, were also tested; two of
these sets are shown in Figure 29-(e)(f) and (g)(h). These images demonstrate a retinal
pigment epithelial detachment (‘blister’) in the macula secondary to choroidal
neovascularization from age-related macular degeneration. The ‘blister’ in Figure
29-(h) is out of focus, which makes the reconstruction difficult. Images taken during
early or late circulation of the sodium fluorescein dye are very dark with poor contrast.
For doctors, accurate registration of fluorescein retinal fundus images is a very
important issue, because small intensity changes in fluorescein images can produce a
different diagnosis; from the intensity changes at a pixel position across fluorescein
images, different types of disease can be identified [26][65]. Therefore, doctors mostly
take images of the pathologically suspicious area with small translational motion. We
tested the performance of each registration method by manually annotating the
ground truth in Section 4.5.1 and by using the current clinical gold standard, optical
coherence tomography (OCT), in Section 4.5.2. In Section 4.5.3, other near-planar
surface images are tested with the proposed method. In Section 4.5.4, we compare
our method with other state-of-the-art commercial products, Autostitch, Photomodeler,
and Boujou.
4.5.1 Fluorescein Images
First, the 2-D global registration was performed on the sets of fluorescein images. For
all images, Y-features were extracted using the articulated model, and they were
matched across images using Mutual Information as in Equation (6). For the global
registration, when the geometric error of a pair, as defined in Equation (8), was higher
than a certain threshold value, that pair was discarded from the fluorescein sequence.
We set this geometric-error threshold to 3 pixels; accordingly, the normalization
factor in Equation (9) was also set to 3. Since this threshold is very tight, 21.43% of
the images, whose registration errors were higher than the threshold value, were
discarded in the experiment.

Among the image pairs, the two images with a small registration error and a
sufficient parallax range (more than 3 pixels) were selected for dense matching. The
mean 2-D registration error of the selected image pairs was 1.4830 pixels. After
obtaining dense correspondences and applying the three-pass bundle adjustment
algorithm, the 3-D structure of the retina is reconstructed and the camera parameters
of all images are estimated. The reconstructed 3-D structures are shown in Figure 42.
In Figure 42-(a)(b), a healthy retina was reconstructed as a curved surface, and its
optic disc is also well reconstructed. A retina with a blister was clearly reconstructed
in Figure 42-(c)(d).
All points in the reference image were back-projected to the 3-D structure and
re-projected to each image to define the pixel value of each point in the registered
image. Since the registration is performed only on the 3-D structure, if the
back-projection of a point does not intersect the 3-D structure, the point is not
registered. Therefore, the displayed area of back-projected images is smaller than that
of the 2-D registration method. The registration errors of the fluorescein images for
each method are shown in Table 4. Since the depth range of Retina_1 and Retina_2 is
quite narrow, the difference in registration error between the 2-D registration and the
proposed registration method was not evident. However, the registration errors for the
sequence of the retina with a blister were 2.1435 pixels for 2-D registration and
1.2469 pixels for registration by back-projection, since this retina has a larger 3-D
depth and larger parallax ranges. Figure 43 shows a mosaic of registered retinal
images.

Figure 42: Reconstructed 3-D surface images of the retinal fundi. (a)(c) Surface
images; (b)(d) textured images. The line indicates the elevation of the surface along
one y-axis.
Table 4. Comparison of average geometric pixel errors of retinal fundus images using
the 2-D global registration and the registration by back-projecting to the 3-D
structure. For each set of retinal images, the parallax range is also shown.

                                              Retina_1    Retina_2    Retina_D (with a blister)
  Parallax Range                              2.664023    1.830510    5.882474
  Error of 2-D Global Registration            1.047754    2.475218    2.143509
  Error of Registration using 3-D Structure   0.980606    2.123321    1.246913
Figure 43. A mosaic of Retina_1. Twelve images were alpha-blended in the mosaic.
4.5.2 Evaluation of Reconstruction using OCT data
The evaluation of 3-D reconstruction is a difficult process in medical imaging because
of the difficulty of accessing ground-truth measurements. Manual annotation cannot
be accurate, since some feature points across fluorescein images are invisible, blurred,
or displaced as the width of the vessels changes. One possible source of ground-truth
data, used in this study, is OCT. However, because OCT data and fundus images are
acquired by different machines, it is not easy to align the OCT data to the 3-D shape
data. Therefore, for evaluation, we manually matched several feature positions with
high curvature values from the OCT data, X, to the 3-D depth map, Y, in order to
align them and obtain a 3-D transformation matrix T. The differences between each
3-D point of the transformed OCT data, T(X), and its nearest neighbor point in the
3-D depth map, Y, are calculated as the evaluation criterion:
E(X, Y, T) = \frac{1}{N_X} \sum_{x_j \in X} \left\| T(x_j) - \operatorname*{argmin}_{y_i \in Y} \left\| T(x_j) - y_i \right\| \right\|        (27)

where N_X is the number of points in the OCT data.
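A minimal sketch of this criterion is given below, assuming the OCT data and the reconstructed depth map are available as 3-D point arrays and that a KD-tree is used for the nearest-neighbor search; these are implementation choices made for illustration and are not taken from the dissertation.

    import numpy as np
    from scipy.spatial import cKDTree

    def reconstruction_error(oct_points, depth_points, T):
        """Mean distance of Eq. (27) between each transformed OCT point T(x_j)
        and its nearest neighbor in the reconstructed depth map Y.
        oct_points:   (N_X, 3) array of OCT points X
        depth_points: (N_Y, 3) array of reconstructed 3-D points Y
        T:            4x4 rigid transformation aligning X to Y"""
        Xh = np.hstack([oct_points, np.ones((len(oct_points), 1))])
        TX = (T @ Xh.T).T[:, :3]                      # transformed OCT points
        dists, _ = cKDTree(depth_points).query(TX)    # nearest-neighbor distances
        return dists.mean()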
Retina_D, which has a blister in the macula, has three sets of images taken in different
fluorescein phases and corresponding OCT data from a commercial instrument (Carl
Zeiss Meditec, Dublin, CA). Set_1 and Set_2 are shown in Figure 29-(e)(f) and
(g)(h); Set_3 was taken in the final phase of the angiogram with a narrow baseline.
For each set, we reconstructed the 3-D structure separately using the three-pass bundle
adjustment algorithm. The reconstructed 3-D shapes of Retina_D are evaluated by
comparing our results to the aligned OCT data in Table 5.
Even though Retina_D has an obvious 3-D structure, the registration error of Set_3
was high, since its images were taken with a very narrow baseline and a small parallax
range of 1.360333. Figure 44 shows the reconstruction of the three sets of images and
their 3-D registration with the OCT data. Set_1 and Set_2 were accurately reconstructed.
Table 5. 3-D registration errors between the reconstructed 3-D structure and the 3-D
OCT data, in pixels.

  Retina_D                     Set_1       Set_2       Set_3
  Parallax Range               5.882474    2.422023    1.360333
  3-D Reconstruction Error     2.077233    4.548050    8.755127
Figure 44. 3-D registration of reconstructed structures and OCT data. Each row shows
the reconstructed structure and its 3-D registration with the OCT. (a) Reconstruction
of Set_1, parallax range 5.882474, registration error 2.077233. (b) Reconstruction of
Set_2, parallax range 2.422023, registration error 4.548050. (c) Reconstruction of
Set_3, parallax range 1.360333, registration error 8.755127; the structure is
incorrectly reconstructed.
4.5.3 Other Near-Planar Surface Images - Façade Images
We also tested other types of sequences. Figure 45 shows selected images of the
façade of a palace in Denmark. The façade looks flat, but it does have a 3-D structure.
Instead of Y-features, we used SIFT features [38]. After matching the SIFT features
using the SIFT descriptor vector, all subsequent processes are the same as the methods
described in this dissertation. Figure 46 shows the 2-D registered images. Figure 48
shows the reconstructed 3-D structure after the three-pass bundle adjustment; the two
pillars are merged in the reconstruction, since we used a window size of 41 for
matching. Figure 47 shows the images registered using the proposed method, and
Figure 49 shows the textured 3-D structure from a different viewpoint.
Figure 45. Selected façade images: (a) Image 5; (b) Image 6.
Figure 46. 2-D registration of a façade.
Figure 47. Registration using back-projection to the 3-D structure of a façade.
Figure 48. 3-D reconstructed façade.
Figure 49. Textured structure from a different viewpoint.
We also evaluated the registration errors of each method for the façade images in
Table 6. The ground truth was manually annotated. Since the façade has a wider 3-D
depth than a retina, the difference in errors is more pronounced.
Table 6. Comparison of average geometric pixel errors of façade images.

                                                              Facade
  Parallax Range                                              33.067138
  2-D Registration Error, 2-D Global Registration             10.124093
  2-D Registration Error, Registration using 3-D Structure     3.568197
4.5.4 Comparison with Other Methods
We tested other state-of-the-art commercial packages on near-planar surface images.
First, we used Photomodeler [72] to reconstruct both the synthetic images and the real
retinal images. However, for all sequences of retinal images the process failed to
converge; the error message indicated that no 3-D points could be calculated due to
low camera angles, which are common in retinal images. Second, we tested Boujou4
[70], which accurately estimates camera motions and sparse structures for a
non-planar scene. The features were correctly tracked in 2-D, but the estimation of the
structure and camera motion was incorrect. Boujou could not process the façade
images because it requires more than 5 frames. All four real retinal sequences
produced very poor results: even though the actual camera motion is close to a pure
translation, the estimated motion was a rotation, a well-known ambiguity for small
displacements. Even after adding the option of a plane constraint, the results were
poor. The average normalized reconstruction error of Boujou4 was 32.16% for the
synthetic image sets, whereas our method gave only 8.24% error on the same image
sets.
Chapter 5 Conclusion
In this dissertation, we have described methods for 2-D registration and 3-D
reconstruction of retinal fundus images. Extracting Y-features with an articulated
model provides robust, accurate, and fully automatic registration of color and
fluorescein retinal images. The PCA analysis of the directional filter responses
provides good estimates of the initial positions of the Y-features, and the fitting of the
articulated Y-feature model provides accurate estimates of the Y-feature positions in
the image. The model is also capable of discarding false alarms. The proposed global
registration has demonstrated satisfactory performance in registering color images
and fluorescein angiograms of the retina in difficult cases where the images have low
contrast.
Reconstructing the 3-D shape of the retinal fundus from a pair of fluorescein images
is challenging, since gray levels vary as the fluorescein dye propagates through the
vessels. The difficulty also comes from the fact that the 3-D retinal surface is
near-planar. To solve these problems, we proposed a 3-D reconstruction method that
integrates reliable landmarks defined by Y-features, a plane+parallax approach for
robust epipolar geometry estimation, and the Mutual Information criterion to estimate
the dense stereo map.
The Euclidean reconstruction of a near-planar surface from images is a difficult
problem because it is a quasi-degenerate case for the estimation of geometry. We
proposed a three-pass bundle adjustment method: first, the camera parameters are
estimated under a planar assumption of the surface; second, the 3-D structure of the
surface is reconstructed in Euclidean space while fixing the camera parameters; and
third, both the 3-D structure and the camera motion are refined. The proposed method
not only reconstructs a near-planar surface in Euclidean space but also registers its
images by back-projecting them to the 3-D structure. In addition, the near-planar
surface image is defined in terms of the parallax range.
All processes were completed without any user interaction, and they were fully
tested and evaluated with a large number of retinal images. We believe that our
method will play a crucial role in assisting ophthalmological diagnosis based on a
wide range of retinal fundus images. In addition, the proposed method may be ap-
plied to many other near-planar surface images.
Since retinal images are produced from a surface that is clearly near-planar, the
proposed method works on most sets of retinal images. If multiple images of an
unknown surface are given, however, our method may not work on images of a planar
surface or of a strongly non-planar surface. Future work should focus on, first,
automatic categorization of a given surface as planar, near-planar, or non-planar by
characterizing the surface and using residual ranges, in order to apply a different
reconstruction method to each type of surface. Second, the 3-D shape inference
method should be extended to use multiple images in a sequence, rather than only
two, in an attempt to produce more robust 3-D reconstructions over a wider range.
Finally, segmentation of a scene into multiple near-planar surfaces would be required
so that more complex 3-D structures in the scene can be properly reconstructed.
References
[1] G. Agam and C. Wu, “Probabilistic Modeling-Based Vessel Enhancement in
Thoracic CT Scans”, Computer Vision and Pattern Recognition 2005, Volume
2, 20-26 Page(s):684 - 689 June 2005
[2] A. Bartoli, P. Sturm, “Nonlinear Estimation of the Fundamental Matrix with
Minimal Parameters,” IEEE Trans. on PAMI, Vol. 25, No. 3, March 2004
[3] D.P. Bertsekas, Nonlinear Programming, 2nd ed. Belmont, MA: Athena Scien-
tific, 1999
[4] P. Besl and N. McKay “A method for registration of 3-d shapes,” IEEE Trans-
actions on Pattern Analysis and Machine Intelligence, 14(2) pp.239-255, 1992
[5] A.G. Brown: A survey of Image Registration Techniques; ACM Computing
Surveys, vol.24, pp.326-276, 1992
[6] M. Brown, D. Lowe, “Recognising Panoramas,” ICCV 2003 pp. 1218-1225,
2003
[7] T. Butz and J.-P. Thiran, "Affine Registration with Feature Space Mutual in-
formation" MICCAI 2001
[8] A. Can, H. Shen, J.N. Turner, H.L. Tanenbaum, B. Roysam, "Rapid automated
tracing and feature extraction from live high-resolution retinal fundus images
using direct exploratory algorithms", IEEE Transaction on Information Tech-
nology for Biomedicine 3:2, pp 125-138, 1999
[9] A. Can, C.V. Stewart, B. Roysam, "Robust hierarchical algorithm for construct-
ing a mosaic from images of the curved human retina," in Proc. IEEE Conf. on
Computer Vision and Pattern Recognition, pp. 286-292, June, 1999
[10] A. Can, C.V. Stewart, B. Roysam, H.L. Tanenbaum, "A feature-based, robust,
hierarchical algorithm for registering pairs of images of the curved human ret-
ina," IEEE Transactions on Pattern Analysis and Machine Intelligence 24, 347-
364, March 2002
[11] S. Chaudhuri, S. Chatterjee, N. Katz, M. Nelson, and M. Goldbaum, “Detec-
tion of blood vessels in retinal images using two-dimensional matched filters”
IEEE Transactions on Medical Image Volume 8, Issue 3, pp. 263 – 269, Sept.
1989
[12] T.E. Choe, I. Cohen, “Registration of Multimodal Fluorescein Images Se-
quence of the Retina,” ICCV 2005
[13] T.E. Choe, I. Cohen, M. Lee, G. Medioni “Optimal Global Mosaic Genera-
tion from Retinal Images,” ICPR 2006
[14] T.E. Choe, I. Cohen, G. Medioni, “3-D Shape Reconstruction of Retinal Fun-
dus” CVPR 2006
[15] T.E. Choe, I. Cohen, G. Medioni, A.C. Walsh, S.R. Sadda, "Evaluation of 3-
D Shape Reconstruction of Retinal Fundus," MICCAI 2006, pp. 134-141, 2006
[16] O. Chum, T. Werner, J. Matas, “Two-view Geometry Estimation Unaffected
by a Dominant Plane,” CVPR pp.772-779, 2005
[17] M.A. Fischler, R.C. Bolles. “Random sample consensus: A paradigm for
model fitting with applications to image analysis and automated cartography,”
Comm. Assoc. Comp. Mach., 24(6): 381-395, 1981
[18] M. Foracchia, E. Grisan, A. Ruggeri, “Detection of Optic Disc in Retinal Im-
ages by Means of a Geometrical Model of Vessel Structure,” IEEE Transac-
tions on Medical Imaging , vol 23, no. 10, pp. 1189-1195, October 2004
[19] J.-M. Frahm, M. Pollefeys, “RANSAC for (Quasi-)Degenerate data
(QDEGSAC),” CVPR 2006, pp. 453-460, 2006
[20] H.N. Gabow, Z. Galil, T. Spencer, and R.E. Tarjan, “Efficient algorithms for
finding minimum spanning trees in undirected and directed graphs,” Combina-
torica, vol. 6, 1986, pp. 109-122.
[21] J. Gluckman and S. K. Nayar. Rectifying transformations that minimize re-
sampling effects. In IEEE Proceedings of Computer Vision and Pattern Recog-
nition, Kauai, December 2001.
[22] J. Gutierrez, I. Epifanio, E. de Ves, F. Ferri, “An active contour model for the
automatic detection of the fovea in fluorescein angiographies,” ICPR 2000
[23] C. Harris, M. Stephens, "A combined corner and edge detector", Proc. Alvey
Vision Conf., Univ. Manchester, pp. 147-151, 1988.
[24] R. I. Hartley, A. Zisserman. Multiple View Geometry in Computer Vision.
Cambridge University Press, 2000
[25] R. Hartley, “In Defence of the 8 point Algorithm”, ICCV 1995, pp.1064-1070,
June 1995
[26] J.H. Hipwell, A. Manivannan, P. Vieira, P.F. Sharp, J.V. Forrester, “Quanti-
fying changes in retinal circulation: the generation of parametric images from
fluorescein angiograms,”, Physiological Measurement No. 19 pp.165-180,
1998
[27] H. Hirschmüller, “Accurate and Efficient Stereo Processing by Semi-Global
Matching and Mutual information,” CVPR 2005, Volume 2, 20-26 June 2005
Page(s):807 – 814
[28] X. Jiang, M. Abramoff. "3D retina vessel reconstruction from stereo pairs",
Submitted to MidGraph
[29] Z. Kai, “Stereo Matching and 3-D Reconstruction for Optic Disc Images,”,
Computer Vision for Biomedical Image Applications, 2005
[30] J. Kim, V. Kolmogorov, R. Zabih, “Visual correspondence using energy
minimization and mutual information”, Ninth IEEE International Conference
on Computer Vision, 2003. Proceedings, 13-16 Page(s):1033 - 1040 vol.2, Oct.
2003
[31] J. Knight, I. Reid, “Binocular Self-Alignment and Calibration from Planar
Scenes,” ECCV 2000, pp.462-476, LNCS 1843, 2000
[32] V. Kolmogorov, R. Zabih, “Computing visual correspondence with occlu-
sions using graph cuts,” in International Conference on Computer Vision, ,2001.
[33] T. Koller, G. Gerig, G. Szekely, D. Dettwiler. Multiscale detection of curvi-
linear structures in 2D and 3D image data.In Proc. ICCV, pages 864–869, 1995
[34] R. Kumar, P. Anandan, K. Hanna, "Shape Recovery From Multiple Views: A
Parallax Based Approach," DARPA IU Workshop, Monterey, Calif., Nov. 1994
[35] Q. Li, S. Sone, K. Doi. “Selective enhancement filters for vessels and airway
walls in two- and three-dimensional CT scans,” Medical Physics, 30(8):2040–
2051, 2003.
[36] H.C. Longuet-Higgins. A computer algorithm for reconstructing a scene from
two projections. Nature, 293:133–135, Sept 1981.
[37] M.I.A. Lourakis, A.A. Argyros, “The Design and Implementation of a Ge-
neric Sparse Bundle Adjustment Software Package Based on the Leven-
berg-Marquardt Algorithm,” ICS/FORTH TR-340, Aug 2004
[38] D. Lowe, “Distinctive image features from scale-invariant keypoints," Inter-
national Journal of Computer Vision, 60, 2 , pp. 91-110, 2004
[39] D. Lowe, “Object Recognition from Local Scale-Invariant Features”, ICCV
1999, pp. 1150-1157, 1999
[40] F. Maes, A. Collignon, D. Vandermeulen, G. Marchal, P. Suetens, “Multimo-
dality image registration by maximization of mutual information,” IEEE Trans-
actions on Medical Imaging, vol. 16, no. 2, pp. 187–198, April 1997.
[41] J. Magarely, A. Dick, "Multiresolution Stereo Image Matching using Com-
plex Wavelet", IEEE Int. Conf. on Pattern Recognition (ICPR98), Vol 1, pp. 4-
7, 1998.
[42] G. K. Matsopoulos, N. A. Mouravliansky, K. K. Delibasis, and K. S. Nikita,
“Automatic registration of retinal images with global optimization techniques,”
IEEE Trans. Inform. Tech. Biomed., vol. 3, pp. 47–60, Mar. 1999.
[43] G. K. Matsopoulos, P.A. Asvestas, N.A. Mouravliansky, K.K. Delibasis,
“Multimodal Registration of Retinal Images Using Self Organizing Maps”,
IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 23, NO. 12,
DECEMBER 2004
[44] G. Medioni, M.S. Lee, and C.K. Tang, A Computational Framework for Seg-
mentation and Grouping, Elsevier, 2000.
[45] P. Moallem, K. Faez, “Search space reduction in the edge based stereo corre-
spondence,” Proceeding of 6th International Fall Workshop on Vision, Model-
ing, and Visualization 2001 (VMV2001), pp. 423-429, Stuttgart, Germany,
November 2001.
[46] P. Mordohai, G. Medioni. "Dense Multiple View Stereo with General Camera
Placement using Tensor Voting," 3dpvt, pp. 725-732, Second International
Symposium on 3D Data Processing, Visualization and Transmission
(3DPVT'04), 2004.
[47] P.M.D.S Pallawala, W. Hsu, M.L. Lee, K.-G. A. Eong. “Automated Optic
Disc Localization and Contour Detection Using Ellipse Fitting and Wavelet
Transform,” in 8th European Conference on Computer Vision (ECCV), Prague,
Czech Republic, May 2004
[48] J.P.W. Pluim, J.B.A.Maintz, M.A.Viergever, “Mutual-information-based reg-
istration of medical images: A survey.” IEEE Transactions on Medical Imaging
vol. 22 no. 8 986–1004 August 2003
[49] A.G. Podoleanu, J.A. Rogers, D.A. Jackson, “Three dimensional OCT images
from retina and skin”, Vol. 7, No. 9, Optic Express, Oct 2000
[50] N. Ritter, R. Owens, J. Cooper, "Registration of stereo and temporal images
of the retina" IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 18,
NO. 5, MAY 1999
[51] D. Scharstein and R. Szeliski, “A taxonomy and evaluation of dense two-
frame stereo correspondence algorithms,” IJCV, vol. 47, no. 1-3, pp. 7–42,
April 2002.
[52] H. Shen, C.V. Stewart, B. Roysam, G. L. Howard, L. Tanenbaum, “Frame-
Rate Spatial Referencing Based on Invariant Indexing and Alignment with Ap-
plication to Online Retinal Image Registration”, IEEE TRANSACTIONS ON
PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 25, NO. 3,
MARCH 2003
[53] H. Shikata, E.A. Hoffman and M. Sonka "Automated Segmentation of Pul-
monary Vascular Trees from 3D CT Images", SPIE International Symposium
Medical Imaging, San Diego, USA, 2004
[54] C.V. Stewart, C.L. Tsai, B. Roysam, “The dual-bootstrap iterative closest
point algorithm with application to retinal image registration,” IEEE Transac-
tions on Medical Imaging, Vol. 22, Issue 11, Nov 2003, pp 1379-1394
[55] J. Sun, N.N Zheng, H. Shum, “Stereo Matching Using Belief Propagation,”
IEEE Trans. on PAMI, Vol. 25, No. 7, July 2003
[56] B. Triggs, “Autocalibration from planar scenes,” ECCV 1998 pp. 89-105,
1998
[57] B. Triggs, R. Hartley, A. Fitzgibbon, “Bundle Adjustment - A Modern
Synthesis,” Vision Algorithms: Theory and Practice, Springer-Verlag, 2000
[58] C.L. Tsai, C.V. Stewart, H.L. Tanenbaum, B. Roysam, “Model-based method
for improving the accuracy and repeatability of estimating vascular bifurcations
and crossovers from retinal fundus images,” in IEEE Transactions on Informa-
tion Technology in Biomedicine, Volume 8, Issue 2, Jun 2004 pp.122 – 130
[59] C.-L. Tsai, A. Majerovics, C.V. Stewart, B. Roysam, "Disease-Oriented
Evaluation of Dual-Bootstrap Retinal Image Registration", Medical Image
Computing and Computer-Assisted Intervention, November 2003
[60] P.A. Viola, “Alignment by maximization of mutual information,”, Ph.D. the-
sis, MIT, 1995
[61] P.A. Viola, W.M. Wells, III. Alignment by maximization of mutual informa-
tion. International Journal of Computer Vision, 24(2):137–154, September
1997.
[62] T. Walter, J.-C. Klein, “Segmentation of Color Fundus Images of the Human
Retina: Detection of the Optic Disc and the Vascular Tree Using Morphological
Techniques,” ISMDA 2001: 282-287
[63] Y. Tuduki, K. Murase, M. Izumida, H. Miki, K. Kikuchi, K. Murakami, J.
Ikezoe, “Automated seeded region growing algorithm for extraction of cerebral
blood vessels from magnetic resonance angiographic data,”, Proceedings of the
22nd Annual International Conference of the IEEE Engineering in Medicine
and Biology Society 2000, Volume 3, 23-28 Page(s):1756 - 1759 vol.3, July
2000
[64] B. Verdonck, L. Bloch, H. Maitre, D. Vandermeulen, P. Suetens, G. Marchal,
“Accurate segmentation of blood vessels from 3D medical images”, Image
Processing, 1996. Proceedings., International Conference on Volume 3, 16-19
Sept. 1996 Page(s):311 - 314 vol.3
[65] A.C. Walsh, P.G. Updike, S.R. Sadda, “Quantitative Fluorescein Angiogra-
phy,” Retina, 4th edition, Chapter 52, Mosby, Sep. 2005
[66] T. Walter, J.-C. Klein, “Segmentation of Color Fundus Images of the Human
Retina: Detection of the Optic Disc and the Vascular Tree Using Morphological
Techniques,” ISMDA 2001: 282-287
[67] W.M. Wells III, P. Viola, H. Atsumi S. Nakajima, R. Kikinis, “Multi-modal
volume registration by maximization of mutual information,” Medical Image
Analysis, vol. 1, no. 1, pp. 35–51, March 1996.
[68] F. Zana and J.-C. Klein, “Segmentation of Vessel-like Patterns Using Mathe-
matical Morphology and Curvature Evaluation,” in IEEE Transactions on Im-
age Processing, vol. 10, no. 7, pp. 1010-1019, July 2001.
[69] F. Zana and J.-C. Klein, “A multimodal registration algorithm of eye fundus
images using vessels detection and Hough transform,” IEEE Transactions on
Medical Imaging, Volume 18, Issue 5, Page(s):419 - 428 , May 1999
[70] Boujou4, 2d3, http://www.2d3.com
[71] Middlebury stereo database and source code,
http://www.middlebury.edu/stereo
[72] Photomodeler, EOS Systems, http://www.photomodeler.com
Abstract
Image registration and shape reconstruction of a retinal fundus from fluorescein images are important steps in diagnosing retinal diseases. The task is difficult due to intensity changes in fluorescein images and the near-planarity of the retinal surface, which has a narrow range of depth variation. This study proposes methods for the registration and the 3-D reconstruction of a retina from a sequence of images. We first present a method for locating vessel bifurcations, so-called Y-features, which are robust geometric features in a retinal image. The fitted Y-features are then matched across images using Mutual Information to globally register the unordered color and fluorescein images. Second, using the matched Y-features, the epipolar geometry of the near-planar surface is robustly estimated by a plane+parallax approach, and a dense disparity map is estimated using the Mutual Information criterion. Finally, the 3-D structure of the near-planar retinal fundus is reconstructed in metric space by introducing a novel three-pass bundle adjustment method. Our experimental results validate the proposed methods on a set of difficult fluorescein image pairs and on other near-planar surface images.