TECHNIQUES FOR VANISHING POINT DETECTION
by
Yuzhuo Ren
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
MASTER OF SCIENCE
(ELECTRICAL ENGINEERING)
May 2013
Copyright 2013 Yuzhuo Ren
To My Family
Acknowledgments
I hereby want to acknowledge:
• My advisor, Professor C.-C. Jay Kuo, for his vision, problem-solving capability, and continuous help and instruction during the research process. Prof. Kuo is energetic and hospitable. He sets a good example for me in both his personality and his research attitude. I benefited a lot from working with him.
• My senior PhD student, Ms. Jingwei Wang, for her kindness and support. She
gave me suggestions during our research discussion. Thanks for taking me along.
• My great family, including my mother Yan and my father Wei, for supporting my decision to go abroad for graduate study. My horizons and experience have been greatly expanded by their endless love and encouragement.
Table of Contents
Dedication
Acknowledgments
List of Figures
Abstract
Chapter 1: Introduction
1.1 Significance of the Research
1.2 Contributions of the Research
1.3 Organization of Thesis
Chapter 2: Background Review
2.1 Perspective Projection and Vanishing Points
2.2 Hough Transform and Line Detection
2.3 Line Segment Detection
2.4 Clustering of Vanishing Lines and Vanishing Point Location Detection
Chapter 3: Database Construction
Chapter 4: Angular Histogram Feature Extraction
4.1 Line Extraction via Hough Transform
4.2 Preprocessing for Robust Line Extraction
4.2.1 Bilateral Filtering
4.2.2 Down-sampling
4.2.3 Mean-shift Segmentation
4.3 Angular Histogram
4.4 Summary
Chapter 5: Defocus Feature Extraction
5.1 Defocus Map Generation
5.2 Post-Processing via Matting
5.3 Defocus Feature
Chapter 6: Experimental Results
6.1 Vanishing Point Existence Detection
6.2 Vanishing Point Location Estimation
Chapter 7: Conclusion and Future Work
7.1 Conclusion
7.2 Future Work
Bibliography
List of Figures
2.1 Illustration of the linear perspective and vanishing points.
2.2 Illustration of the performance of the LSD [1].
2.3 The line clustering result using the J-linkage algorithm [2].
2.4 The Circle5 set perturbed with Gaussian noise (σ = 0.0075) with 50% outliers [3].
2.5 The Stair4 set perturbed with Gaussian noise (σ = 0.0075) with 60% outliers [3].
3.1 Example images from the database.
4.1 Illustration of the Hough-transform-based line detection.
4.2 Comparison of line detection results for test images with and without bilateral filtering as a preprocessing unit.
4.3 Line detection results under different image resolutions with down-sampling, where we plot the downsampled images in the original image resolution for the ease of visualization.
4.4 Comparison of line detection results using the original image and the image after mean-shift segmentation.
4.5 Mean-shift segmentation results.
4.6 Line detection results with respect to different segmentation results.
4.7 Exemplary images with vanishing points.
4.8 Exemplary images without a vanishing point.
4.9 The angle of a line.
4.10 Line detection and its angular histogram.
4.11 Illustration of vertical lines.
5.1 (a) The thin lens model, and (b) the diameter of CoC, c, as a function of object distance d with df = 500mm and f0 = 80mm [4].
5.2 Different blur degrees for different edges [5].
5.3 The edge response to a second-derivative filter [6].
5.4 (a) An input image and (b) its defocus map.
5.5 The matting algorithm result.
5.6 (a) The input image, (b) the initial defocus map, and (c) the defocus matting map.
5.7 Images with vanishing points.
5.8 Images without vanishing points.
5.9 Illustration of the defocus feature vector.
6.1 A line detection example.
6.2 Vanishing point location estimation: the left column is the implementation in [7] while the right column is the result from [2].
Abstract
Automatic vanishing point detection is an important problem in computer vision
since it has many applications such as road navigation, 3D scene reconstruction and
camera calibration. Accurate detection of the vanishing point location facilitates the
solution of these related problems. For a given image, this research attempts to answer the following two questions: 1) whether there is a vanishing point in the image; and 2) if there are vanishing points, where they are located. To address the first question, we apply a machine learning approach. First, we construct a database containing a wide variety of images and use it to train a model to determine whether there is a vanishing point in a test image. The two features used in this training and test process are the angular histogram and the defocus degree. Furthermore, we adopt the Adaboost algorithm as incremental learning to increase the classification accuracy. To address the second problem, we implement and improve the algorithm in [7] for vanishing point location estimation, and compare its performance with another algorithm based on the J-linkage model.
Finally, concluding remarks and future research directions are discussed.
Chapter 1
Introduction
1.1 Significance of the Research
Images in the world contain various scenes. Although the human brain can recognize
and understand the scene and objects in the scene within seconds, it is challenging
for a computer to learn and recognize the scene accurately and robustly. General image
classification and scene understanding problems are important problems in the computer
vision field. To tackle this major challenge, researchers often decompose this difficult
problem into small sub-problems and solve each of them separately. In this research, we
investigate the vanishing point detection problem. The existence of a vanishing point in an image is due to the perspective projection of the scene onto the imaging plane.
The technique of vanishing point detection finds applications in camera calibration,
road navigation, 3D reconstruction and depth map generation. In the application of
depth map generation, a depth map can be roughly generated based on vanishing point
location [8]. However, to use the vanishing point as a depth map cue, we should first tell
whether the underlying image has a vanishing point or not. If we apply the vanishing
point detection algorithm to an image that does not have a vanishing point, the obtained
results are often misleading.
For this reason, our current study consists of two parts. First, we classify images
into two types, namely, those with and without vanishing points. Second, we evaluate several approaches to detecting the location of the vanishing point for an image that has one.
1.2 Contributions of the Research
The goal of this research is to automatically classify images with or without vanishing
points. For images with a vanishing point, we would also like to point out the location
of the vanishing point. There are several major contributions in this thesis as detailed
below.
• One challenge of this research is to build the vanishing point classification and detection database. There are more than 187,000 images in the LabelMe database, and most of them depict a particular object, which is not suitable for our purpose. Finding proper images is thus a time-consuming task. We build a database and conduct human evaluation to get the ground truth (i.e., with or without vanishing points). This database will be valuable for future research on this topic.
• Another challenge of this research is to select good features for the classification purpose. In a general image classification problem, many features have been proposed, such as color, edge and shape. However, they are not suitable for classifying perspective-projected images. Instead, we propose to use two features: the angular histogram and the defocus degree. Both of them have good discriminant power in differentiating images with/without vanishing points.
• For the classification problem, we adopt a machine learning approach, and apply it
to extracted image features. Specifically, we use the binary support vector machine (SVM) to train the decision model. We show a correct detection rate of around 80%.
• In finding the location of a vanishing point, it is challenging to group lines that are associated with the same vanishing point. Some pre-processing steps are developed. Besides, we implement and improve the result in [7] and compare it with one of the state-of-the-art algorithms called the J-linkage.
1.3 Organization of Thesis
The rest of this thesis is organized as follows. We briefly review several state-of-the-art vanishing point detection algorithms in Chapter 2. We classify existing vanishing point detection algorithms into three categories: 1) Hough-transform-based algorithms, 2) RANSAC-based algorithms, and 3) J-linkage.
complexity are compared and discussed. Then, we describe the construction of an image
database that is suitable for our purpose in Chapter 3. Some details such as how to
collect images and how to conduct preprocessing and experiments are given. In Chapter
4, we present the angular histogram feature, the line detection method and preprocessing
based on this feature. In Chapter 5, we introduce the defocus degree feature and the
corresponding classifier. In Chapter 6, we will conduct experiments to evaluate the
performance of various classifiers. Besides, the Adaboost algorithm is used to enhance
the overall performance. Finally, concluding remarks and future research directions are
given in Chapter 7.
Chapter 2
Background Review
Some background knowledge for this research is reviewed in this chapter.
2.1 Perspective Projection and Vanishing Points
A vanishing point is the point in a 2D image at which the projections of parallel lines in the 3D scene converge. The vanishing point locations and the number of vanishing points are closely related to the perspective of the image. Generally speaking, a perspective projection can be classified into three categories: linear perspective, curvilinear perspective and reverse perspective [9]. The linear perspective projection is the most common one; pictures taken by a camera belong to this category. The curvilinear perspective projection is obtained by projecting the linear perspective onto a sphere, which approximates the image on the retina of the eye. The reverse perspective projection presents farther objects as larger and nearer objects as smaller. For images in the linear perspective category, the number of vanishing points ranges from zero to three.
As shown in Fig. 2.1, the vanishing points are not necessarily located inside the image. For a given image, our research goal is to tell whether the image has any vanishing point or not. Then, if it has one or more vanishing points, we want to find their locations.
Figure 2.1: Illustration of the linear perspective and vanishing points.
2.2 Hough Transform and Line Detection
The Hough transform is often used to detect line-shaped object boundaries. The algorithm is not sensitive to discontinuities in the detected edge points. The basic idea of the
Hough transform is to map points in the image domain to lines in the Hough-transform
domain. Points lying along the same line will have a common intersection point in the
Hough-transform domain. For example, consider two points (x1, y1) and (x2, y2) that lie on the same line, denoted by y = kx + b. Then, the two points are represented by two lines in the Hough-transform (parameter) domain:

b = −x1 k + y1, and b = −x2 k + y2,

with different slopes and intercepts. Yet, they have one intersection point, which corresponds to the line passing through (x1, y1) and (x2, y2).
In practice, we transform points in the image domain into polar form so that points along vertical lines can also be handled. It can be proved that points along the same line are transformed into curves with a common intersection point (θ, ρ) in polar form, and that this intersection point is related only to the slope and intercept of the underlying line.
In order to detect lines in an image, we can calculate an edge map using any proper edge detector, which provides points as the input to the Hough transform. Then, θ and ρ are quantized so that each bin serves as an accumulator. We can choose several peaks of the accumulators and identify the corresponding lines in the image.
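The accumulator voting described above can be sketched in a few lines of NumPy (a toy illustration of the θ-ρ voting scheme, not the implementation used in this thesis; the image size and bin counts are our own choices):

```python
import numpy as np

def hough_lines(points, h, w, n_theta=180, n_rho=200):
    """Accumulate votes for (theta, rho) line parameters from edge points."""
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    rho_max = np.hypot(h, w)
    acc = np.zeros((n_theta, n_rho), dtype=int)
    for x, y in points:
        # Each edge point votes for every line passing through it:
        # rho = x*cos(theta) + y*sin(theta)
        rho = x * np.cos(thetas) + y * np.sin(thetas)
        bins = np.round((rho + rho_max) / (2 * rho_max) * (n_rho - 1)).astype(int)
        acc[np.arange(n_theta), bins] += 1
    return acc, thetas, rho_max

# Collinear points on y = 2x + 3 should concentrate their votes in one bin
pts = [(x, 2 * x + 3) for x in range(20)]
acc, thetas, rho_max = hough_lines(pts, h=64, w=64)
t, r = np.unravel_index(acc.argmax(), acc.shape)
print(acc.max(), t)
```

Picking the top peaks of `acc` and mapping each (θ, ρ) back to a line recovers the dominant lines in the edge map.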
2.3 Line Segment Detection
More recently, a line segment detector (LSD) [1] has been proposed as another effective line detection algorithm. The performance of the algorithm can be shown with several examples. There is a minimal line length for the LSD to detect. For example, as shown in Fig. 2.2 (a), the LSD gives shorter segments when curves are present. When the curvature is large, the LSD will not detect it, such as the black dot in the middle of Fig. 2.2 (b). By automatically changing the detection threshold, the LSD can be made more robust with respect to noise. The LSD does not detect slow-gradient edges, which can be rectified by down-sampling the image and removing the Gaussian noise. The LSD considers the relative length of an edge as compared to the whole image. For example, the square in Fig. 2.2 (f) cannot be detected due to its small size as compared to the image dimension. After cropping, the square can be detected. The LSD works on gray-level images and, as a result, it fails to detect edges that are visible because of different colors but share the same gray level. In addition, the LSD cannot detect anisotropic regions such as the sky, and fails in regions with the Gibbs effect caused by compression.
(a) The minimal line length for the LSD
(b) The short segments of the LSD for curves
(c) Robustness of the LSD against noise
(d) Failure to detect small gradients
(e) Detection after down-sampling
(f) Result is related to relative dimension
(g) Different color with same gray level
(h) Anisotropic region
(i) Gibbs effect
(j) Real-world image
(k) Detection result
Figure 2.2: Illustration of the performance of the LSD [1].
Though the line segment detector is a powerful tool, it produces too many local line segments and demands high computational complexity. In contrast, the Hough-transform-based algorithm selects strong edges in the image by accumulators that preserve structure lines and demands lower complexity. In our approach, we adopt the Hough transform to detect vanishing lines.
2.4 Clustering of Vanishing Lines and Vanishing Point
Location Detection
The problem of finding the vanishing point location can be solved by clustering van-
ishing lines corresponding to the same vanishing point. There are quite a few techniques
developed for this purpose.
First, the RANSAC algorithm [10] can fit a model to data that have a high percentage of outliers. The main idea is to construct minimal sample sets that contain very few data points in order to obtain sets that contain only inliers. The RANSAC algorithm cannot handle multiple models. To address this problem, sequential RANSAC [11, 12] removes inliers sequentially and applies RANSAC to the remaining data recursively. The problem with sequential RANSAC is that inaccurate inlier detection can affect the remaining models significantly. In contrast, the Multi-RANSAC algorithm [13] estimates multiple models in parallel. It can fit more than one model to the data, yet the number of models should be known in advance.
The J-linkage algorithm [3] is a multi-model estimation algorithm, which is an exten-
sion and optimization of multi-RANSAC. Tardif [2] applied the J-linkage algorithm to
the vanishing point detection problem. It begins with constructing a preference matrix,
which is a Boolean matrix. Each row corresponds to an edge in the image and each column to a hypothesis, which is an intersection point in the vanishing point detection case. Each row is called the characteristic function of the preference set. Edges corresponding to the same vanishing point tend to have similar characteristic functions. The Jaccard distance is used to measure the distance between two preference sets. Thus, based on the Jaccard distance, characteristic functions can be clustered in the preference matrix.
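As a small illustration of the distance used here, a preference set can be modeled as a plain Python set of hypothesis indices (the names and example values are ours, not from [2] or [3]):

```python
def jaccard_distance(a, b):
    """Jaccard distance between two preference sets (sets of hypothesis
    indices): 0 means identical preference sets, 1 means disjoint."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

# Two edges that vote for mostly the same vanishing-point hypotheses
# are close; edges with disjoint preferences are at distance 1.
e1 = {0, 1, 2, 3}
e2 = {1, 2, 3, 4}
e3 = {7, 8}
print(jaccard_distance(e1, e2))  # 0.4
print(jaccard_distance(e1, e3))  # 1.0
```

Agglomerative clustering with this distance merges edges whose preference sets overlap, which is exactly how J-linkage groups vanishing lines.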
Vanishing points can be recomputed every time the preference set is updated. Fig. 2.3 shows the line clustering result of the J-linkage algorithm.

(a) Indoor scene (b) Outdoor scene
Figure 2.3: The line clustering result using the J-linkage algorithm [2].

As compared with the
RANSAC-based algorithms, the J-linkage algorithm has two main advantages. First,
the outlier can be automatically removed by discarding smaller preference set clusters.
Second, it can filter lines and calculate the vanishing point simultaneously. A vanishing
point is recalculated whenever the preference matrix is updated, which makes the vanishing
point detection more accurate. The RANSAC-based algorithms separate the process of
line detection and vanishing point estimation. The performance comparison of several
clustering algorithms with two test data sets called Circle5 and Stair4 is shown in Fig.
2.4 and Fig. 2.5, respectively.
Figure 2.4: The Circle5 set perturbed with Gaussian noise (σ =0.0075) with 50 %
outliers [3].
Figure 2.5: The Stair4 set perturbed with Gaussian noise (σ=0.0075) with 60 % outliers
[3].
Chapter 3
Database Construction
One of the main contributions of this thesis research is the construction of a vanishing point classification/detection database. In the literature, there is no database for classifying images with and without vanishing points. There is one database, called the YorkUrban database, for vanishing point location detection. It contains 100 images of building structures [14, 15]. This image set is too limited in image variety. In addition, it is not suitable for the image classification purpose, since all its images have at least one vanishing point.
In order to make the database applicable to general images, we consider several categories of different scenes, with and without vanishing points, balanced properly in the constructed database. For example, most natural scenery images do not contain a vanishing point, and images in the category “landscape and coast” in the LabelMe database are selected. Images containing streets and buildings may or may not have a vanishing point depending on their perspective projection, and images from the street category in the LabelMe database are selected.
For training images, we select images with obvious features to train the model.
In addition, we add challenging images for the cross validation purpose. Our database
contains 3000 images. It has a large variety of images containing both outdoor and indoor scenes. Outdoor scenes include coast, beach, cliff, garden and urban streets and roads.
Indoor scenes include lobby and corridor. All of them are selected from the LabelMe
database [16] and the YorkUrban database [14, 15].
Sometimes, it is not easy to tell whether an image has a vanishing point or not. We
ask ten persons to manually classify all images into four categories; namely, with zero,
one, two and three vanishing points. We reject the images with low agreement. For
example, a tilt of the image along the vertical direction can lead to confusion in deciding whether there is one more vanishing point. We collect images
with high human agreement scores in the database. Some images from our database are
shown in Fig. 3.1.
(a) Coast (b) Field
(c) Mountain (d) Corridor
(e) Street (f) Building
Figure 3.1: Example images from database.
Chapter 4
Angular Histogram Feature
Extraction
4.1 Line Extraction via Hough Transform
For a given input image, we first calculate its edge map and use it as the input to
the Hough Transform. In our implementation, we apply the Canny edge detector to the
image as shown in Fig. 4.1 (a), and then transform edge pixels (namely, white pixels as
shown in Fig. 4.1 (a) ) to the Hough space.
Suppose that the line is y = kx + b with intercept b, and there are two points (x1, y1) and (x2, y2) in the image domain. Since the transform from one point in the image domain to the Hough space (or the polar domain) can be written as

y = kx + b, (4.1)
ρ = x cos(θ) + y sin(θ), (4.2)

we get the following two equations:

ρ = x1 cos(θ) + y1 sin(θ), (4.3)
ρ = x2 cos(θ) + y2 sin(θ). (4.4)

Based on Equations (4.1)-(4.4), we have

θ = arctan(−1/k), (4.5)
ρ = b sin(arctan(−1/k)). (4.6)
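Equations (4.5) and (4.6) can be checked numerically: every point on a given line y = kx + b should map to the same (θ, ρ) pair (a quick sanity check, with arbitrary values chosen for k and b):

```python
import numpy as np

# Verify Equations (4.5)-(4.6): each point on y = kx + b maps to the
# same (theta, rho) in the polar Hough parameterization.
k, b = 2.0, 3.0
theta = np.arctan(-1.0 / k)   # Eq. (4.5)
rho = b * np.sin(theta)       # Eq. (4.6)

for x in np.linspace(-5, 5, 11):
    y = k * x + b
    # Eq. (4.2) evaluated at this point must reproduce the same rho
    assert abs(x * np.cos(theta) + y * np.sin(theta) - rho) < 1e-9
print(theta, rho)
```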
As derived above, we see that one line in the image domain is mapped to one point
in the Hough domain. Fig. 4.1 (b) shows the Hough transform of an edge image. In
order to extract the main lines in the image, we calculate the histogram in the (θ,ρ)
coordinates, and select the top twenty peaks. The squares shown in Fig. 4.1 (b) are
peaks in the Hough domain. Finally, we trace these peaks from the Hough domain back
to the image domain to find the corresponding edges in the image. The result is shown
in Fig. 4.1 (c).
(a) Canny edge detection (b) Hough space (one square
denotes a peak)
(c) Detected lines
Figure 4.1: Illustration of the Hough-transform-based line detection.
There are several advantages of using the Hough transform to detect lines. First, it is tolerant of gaps in the edges. Second, it is unaffected by occlusion in the image. Third, it is relatively robust against noise.
4.2 Preprocessing for Robust Line Extraction
We use accumulators and a voting scheme in the Hough space to detect main lines.
However, when there are strong edge regions such as grass, trees and sand, these regions
can be detected by the Hough transform as well. Edges in these regions should not be
viewed as part of the structure lines that finally lead to vanishing points. If they are not removed, they will affect the vanishing point detection accuracy. For this reason, we propose
three preprocessing methods before line detection: bilateral filtering, down-sampling and
segmentation.
4.2.1 Bilateral Filtering
The bilateral filter can reduce noise while preserving edges [17]. The intensity of each pixel is smoothed by a Gaussian average whose weighting coefficients depend not only on the Euclidean distance but also on the color intensity. The bilateral filter is thus a function in both the spatial domain (controlled by the Euclidean distance) and the spectral domain (controlled by the pixel intensity).
The bilateral filter can be used in scenes that contain textures with a high local edge
response such as grass and sand. After being transformed to the Hough space, these
small edges will form a strong peak and, as a result, the percentages of outlier lines
increase, which will affect the final vanishing point location decision. After the bilateral
filter is applied, complex textures can be smoothed while main structured lines remain.
Performance comparison between cases with and without bilateral filtering is shown in
Fig. 4.2. The black block indicates the region where intersection points are densely
distributed. If more than one of these regions is detected, the center coordinates of the black blocks are averaged to yield the final vanishing point location. The improvement offered by the bilateral filter as a preprocessing step is obvious in vanishing point detection, as shown in these examples.
Generally speaking, if the input image has a clear structure yet with small textures,
the bilateral filter can work as a de-noising unit to make the vanishing point detection
algorithm in the following stage more robust. For example, for the top image in Fig. 4.2,
we see fewer horizontal lines since the texture of the road and small rocks is removed by
the bilateral filter. The location of the vanishing point changes by a certain amount after
the application of bilateral filtering for the first pair of outdoor images in Fig. 4.2. For
other examples in Fig. 4.2, the effect of the bilateral filter is even more significant. Note that the bilateral filter does not smooth sharp edges, so the primary lines that yield the vanishing point are relatively enhanced.
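The edge-preserving behavior can be illustrated with a toy one-dimensional bilateral filter (our own sketch of the principle in [17], not the 2D filter actually applied to the images; the parameter values are arbitrary):

```python
import numpy as np

def bilateral_1d(signal, sigma_s=2.0, sigma_r=0.2, radius=5):
    """Toy 1D bilateral filter: each weight combines spatial distance and
    intensity difference, so sharp steps survive while small texture-like
    wiggles are smoothed away."""
    out = np.empty(len(signal), dtype=float)
    n = len(signal)
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        idx = np.arange(lo, hi)
        w = (np.exp(-((idx - i) ** 2) / (2 * sigma_s ** 2)) *
             np.exp(-((signal[idx] - signal[i]) ** 2) / (2 * sigma_r ** 2)))
        out[i] = np.sum(w * signal[idx]) / np.sum(w)
    return out

# A noisy step edge: the noise shrinks while the step stays sharp
rng = np.random.default_rng(0)
step = np.r_[np.zeros(50), np.ones(50)] + 0.05 * rng.standard_normal(100)
smoothed = bilateral_1d(step)
print(np.std(smoothed[:40]) < np.std(step[:40]))  # flat part is smoother
print(abs(smoothed[55] - smoothed[45]))           # the edge remains abrupt
```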
(a) Results without bilateral
filter
(b) Results after bilateral filter
applied
Figure4.2: Comparisonoflinedetectionresultsfortestimageswithandwithoutbilateral
filtering as a preprocessing unit.
4.2.2 Down-sampling
The idea of image down-sampling arises from the observation that main structure
lines will be preserved while small edges will be smoothed out in low resolution images.
We downsample three test images with different sample rates (4:1, 16:1 and 64:1) and
then use the Hough transform to detect lines in the resulting images. The results are
shown in Fig. 4.3. Under the sampling rate of 4:1, the line detection result does not
differ much from that of the original image. Under the sampling rate of 64:1, we can see
that only the line contours of main objects are preserved while edges in texture region
are filtered out. Thus, down-sampling offers an effective pre-processing tool to eliminate
short and noisy line segments. We define invariant lines as those that persist over a range of image scales, and we select the invariant lines as the main structure lines. The invariant lines offer a good representation of the global structure in an image.
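The down-sampling step can be sketched as block averaging (our illustration; the thesis does not specify the resampling kernel used):

```python
import numpy as np

def downsample(img, factor):
    """Block-average downsampling: each factor x factor block becomes one
    pixel, which averages away small texture edges while long structure
    lines leave a response at every scale."""
    h, w = img.shape
    h, w = h - h % factor, w - w % factor
    blocks = img[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

img = np.zeros((64, 64))
img[:, 32] = 1.0              # a long vertical structure line
small = downsample(img, 4)    # 4:1 rate, as in the experiments above
print(small.shape)            # (16, 16)
print(small[:, 8].max())      # 0.25: the line is diluted but still present
```

Running the line detector at several such scales and keeping only the lines found at all of them is one way to realize the invariant-line idea.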
(a) Original size (b) 4:1 (c) 16:1 (d) 64:1
(e) Original size (f) 4:1 (g) 16:1 (h) 64:1
(i) Original size (j) 4:1 (k) 16:1 (l) 64:1
Figure 4.3: Line detection results under different image resolutions with down-sampling,
where we plot the downsampled images in the original image resolution for the ease of
visualization.
4.2.3 Mean-shift Segmentation
The mean-shift segmentation method [18] can also be used to eliminate small edges in textured regions. There are three parameters in the mean-shift segmentation
algorithm.
• The minimum merge region is used to specify the minimum number of pixels to be
in a segmented region.
• The spatial radius controls the size of each merged area. A large spatial radius will
generate a large homogenous region.
• The spectral radius controls the color similarity of two adjacent regions to be
merged.
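The mode-seeking behavior behind mean-shift can be illustrated in one dimension (a toy sketch of the principle in [18], not the image segmentation algorithm itself; the sample values are arbitrary):

```python
import numpy as np

def mean_shift_modes(points, radius, n_iter=30):
    """Toy 1D mean-shift: each point repeatedly moves to the mean of the
    original samples within `radius`, settling at a density mode."""
    x = points.astype(float).copy()
    for _ in range(n_iter):
        for i in range(len(x)):
            near = points[np.abs(points - x[i]) <= radius]
            x[i] = near.mean()
    return x

# Two intensity clusters (think "sky" vs. "cliff") collapse to two modes;
# the radius plays the role of the spectral radius above.
data = np.array([0.1, 0.12, 0.09, 0.11, 0.8, 0.82, 0.79, 0.81])
modes = np.round(mean_shift_modes(data, radius=0.2), 3)
print(np.unique(modes))  # two distinct modes remain
```

With a much larger radius the two clusters would merge into a single mode, which mirrors how an oversized spectral radius merges distinct objects in the image.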
Examples with the mean-shift segmentation algorithm as the pre-processing unit are
given in Fig. 4.4. Note that it is important to choose proper spectral and spatial radii
in the mean-shift algorithm. If the spectral and spatial radii are too large, main objects will be merged and vanishing lines destroyed. On the other hand, if the spectral and spatial radii are too small, the mean-shift segmentation will yield many small segmentation regions and make vanishing line detection more difficult.
(a) Full resolution image (b) Original image line detec-
tion result
(c) Meanshift result
Figure 4.4: Comparison of line detection results using the original image and the image
after mean-shift segmentation.
For scenes containing buildings and streets, smaller spatial and spectral radii often give better segmentation results, since the desired segmented objects in this type of image are isolated and a smaller value can remove the texture. In contrast, larger spatial and spectral radii are preferred for natural scenery images, since the scene tends to contain larger segments belonging to the same object, such as a field, a beach or a cliff. Some
mean-shift segmentation results are shown in Fig. 4.5.
(a) Original image (b) (5,20,20) (c) (5,20,50) (d) (5,20,100) (e) (5,50,100)
Figure 4.5: Mean-shift segmentation results.
(a) Lines of Fig. 4.5(b) (b) Lines of Fig. 4.5(c) (c) Lines of Fig. 4.5(e)
Figure 4.6: Line detection results with respect to different segmentation results.
To determine which line detection result to use, we check the invariant lines among all resolution layers. For example, as shown in Fig. 4.6, we see that lines inside the cliff vary a lot because different mean-shift parameters give different segmentation results. However, the boundary between the sky and the cliff does not change under different mean-shift parameters. In other words, the boundary between the sky and the cliff is not sensitive to the parameters, and such lines represent the geometric structure of the image more robustly. As a result, these lines are chosen as the final lines and used to extract the angular histogram feature.
4.3 Angular Histogram
Our first goal is to tell whether there is a vanishing point or not. Since the lines that yield the vanishing point are called vanishing lines, we check whether the lines extracted as discussed above intersect with each other to produce a vanishing point. Several exemplary images with and without vanishing points are given in Fig. 4.7 and Fig. 4.8, respectively.
Figure 4.7: Exemplary images with vanishing points.
For images with intersecting vanishing lines, we see that the edge responses at certain angles are high. Therefore, we use the line angle as a feature to differentiate
these two types of images.

Figure 4.8: Exemplary images without a vanishing point.

Here, the angle of one detected line in the image is defined as the anticlockwise angle from the horizontal line to the detected line, as shown in Fig.
4.9. We calculate the histogram of line angles. To do so, the line angles are quantized
to 1 degree. The line detection result and its angular histogram are shown in Fig. 4.10.
Figure 4.9: The angle of a line.
The angular histogram feature for each input image is a row vector whose elements give the frequency of each angular bin. To reduce the feature dimension, we quantize 0 to 180 degrees into 12 uniform bins, so that the quantization level of each bin is 15 degrees.

Figure 4.10: Line detection and its angular histogram.

In the implementation, the angles of all detected lines are first calculated. Then, we discard lines whose angles lie within a certain range (say 88 to 92 degrees), since these vertical or nearly vertical lines do not contribute to the determination of vanishing points. The remaining lines are used in the angular histogram calculation.
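The feature computation just described can be sketched as follows (our own illustration; the 12 bins and the 88-92 degree rejection band follow the text above, the rest is assumed):

```python
import numpy as np

def angular_histogram(angles_deg, n_bins=12, v_lo=88.0, v_hi=92.0):
    """Angular histogram feature: drop near-vertical lines, then quantize
    [0, 180) degrees into n_bins uniform bins (15 degrees each)."""
    angles = np.asarray(angles_deg, dtype=float) % 180.0
    kept = angles[(angles < v_lo) | (angles > v_hi)]  # discard 88-92 deg
    hist, _ = np.histogram(kept, bins=n_bins, range=(0.0, 180.0))
    return hist / max(len(kept), 1)                   # frequency vector

# Eight detected line angles; the two near-vertical ones are discarded
angles = [0, 2, 14, 30, 31, 90, 89, 150]
feat = angular_histogram(angles)
print(feat.shape)              # (12,)
print(round(feat.sum(), 6))    # 1.0
```

This 12-dimensional frequency vector is what gets fed to the SVM classifier in Chapter 6.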
(a) A non-VP image with a
high percentage of vertical lines
(b) A VP-image with a high
percentage of vertical lines
(c) A VP image without vertical lines.
Figure 4.11: Illustration of vertical lines.
Three examples are shown in Fig. 4.11. The images in Figs. 4.11 (a) and (b) both
have a high percentage of vertical lines. However, Fig. 4.11 (a) has no vanishing point
while Fig. 4.11 (b) has three vanishing points. Also, Fig. 4.11 (c) has one vanishing
point, which is inferred from the lane division markers and the road boundary.
4.4 Summary
To distinguish images with or without vanishing points, we introduced the line angular
histogram and used it as a feature vector in this chapter. The feature vector will be
fed to a classifier to give the final decision, which will be discussed in Chapter 6.
To reduce noisy edges in the image, we proposed three preprocessing methods so as
to make the line detection process more meaningful in our target application. They are:
1) bilateral filtering, 2) downsampling and 3) mean-shift segmentation. Finally, we use
the Hough transform to connect and/or extend discontinuous edges.
In the next chapter, we will introduce another feature set, called the defocus feature,
and use it to supplement the angular histogram feature.
Chapter 5
Defocus Feature Extraction
5.1 Defocus Map Generation
We observe that images with vanishing points often have a near-to-far depth map
variation. This characteristic can be successfully captured by the defocus degree. The
defocus degree is a measurement of the image blur caused by the classic thin lens
model of a camera as shown in Fig. 5.1 (a). If an object is located in the focal plane, all
rays from a point on the object will converge to one point in the image sensor plane (see
the green line in Fig. 5.1 (a)). If the object distance, d, does not equal the focal distance, df, the
rays from a point on the object projected onto the image sensor will form a spot rather
than a point (see the blue lines in Fig. 5.1 (a)). The diameter of this spot, called the circle
of confusion (CoC) [4], can be used to measure the blur degree. The defocus region can
be modeled as a convolution of an ideal step function, f(x), with a point spread function
(PSF) in the form of
i(x) = f(x) ∗ g(x, σ),   (5.1)
where g(x,σ) is the PSF and σ is proportional to the diameter of CoC, namely, σ = kc.
Figure 5.1: (a) The thin lens model, and (b) the diameter of CoC, c, as a function of
object distance d with df = 500mm and f0 = 80mm [4].
The calculation of the defocus degree of a given image was proposed in [19]. In this
work, we adopt the following focal blur kernel [5]:
g(x, y, σb) = (1 / (2πσb²)) exp(−(x² + y²) / (2σb²)),   (5.2)

where σb is called the blur degree and it is proportional to the diameter of the CoC. To
estimate the blur degree, we can study a blurred edge because it is easier to tell the blur
degree from the edge region rather than a smooth region. Once we get the blur degree of
an edge, we can propagate it to the smooth region. Fig. 5.2 shows different blur degrees
for different edge regions.
Figure 5.2: Different blur degrees for different edges [5].
Consider a blurred edge along the y-axis with amplitude A and blur parameter σb as
shown by the blue curve in Fig. 5.3. Since it is difficult to measure the distance of this
curve from low to high, we measure the distance between the peak and the valley in the
edge response to the second derivative filter. As proposed by Elder and Zucker in [5],
we model the edge response to the second derivative filter via
r_x2(x, y, σ2) = A u(x) ∗ g_x2(x, y, σb² + σ2²)
             = (−Ax / (√(2π) (σb² + σ2²)^(3/2))) exp(−x² / (2(σb² + σ2²)))
             = (−Ax / (√(2π) (S/2)³)) exp(−x² / (2(S/2)²)),   (5.3)

where u(x) is a step function that indicates a sharp edge, σ2 is the scale of the second
derivative operator, A is derived from the local extrema within the window around an
edge pixel, and

(S/2)² = σb² + σ2².   (5.4)
Instead of finding the distance between the peak and the valley directly, a multi-scale
second-derivative Gaussian filter response to an edge point is fit in a least-squares manner
[19]. Then, the defocus degree σb can be evaluated by Eq. 5.3. The result of a defocus
map with respect to an input image is shown in Fig. 5.4.
Figure 5.3: The edge response to a second-derivative filter [6].
Figure 5.4: (a) An input image and (b) its defocus map.
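The peak-to-valley measurement and Eq. 5.4 can be illustrated numerically. The sketch below evaluates the analytic response of Eq. 5.3 on a fine grid, measures the peak-to-valley distance S, and recovers σb; the grid spacing and function names are illustrative assumptions:

```python
import numpy as np

def second_derivative_response(x, A, sigma_b, sigma2):
    """Edge response of Eq. 5.3 with s^2 = sigma_b^2 + sigma2^2:
    r(x) = -A x / (sqrt(2 pi) s^3) * exp(-x^2 / (2 s^2))."""
    s2 = sigma_b**2 + sigma2**2
    return -A * x / (np.sqrt(2 * np.pi) * s2**1.5) * np.exp(-x**2 / (2 * s2))

def blur_degree_from_peak_valley(x, r, sigma2):
    """Recover sigma_b from the peak-to-valley distance S of the
    response, using (S/2)^2 = sigma_b^2 + sigma2^2 (Eq. 5.4)."""
    S = abs(x[np.argmax(r)] - x[np.argmin(r)])
    return np.sqrt(max((S / 2.0)**2 - sigma2**2, 0.0))

x = np.linspace(-10, 10, 20001)  # fine grid around the edge location
r = second_derivative_response(x, A=1.0, sigma_b=2.0, sigma2=1.0)
sigma_b_hat = blur_degree_from_peak_valley(x, r, sigma2=1.0)
```

The extrema of r(x) sit at x = ±(S/2), so the recovered value should match the σb used to generate the response, up to the grid resolution.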
5.2 Post-Processing via Matting
As discussed above, the defocus map can be obtained by evaluating pixel’s second
derivative response. However, it is not a region-based method. As a result, the defocus
map may not match the object boundary well. In this section, we propose to use the
image matting algorithm to further enhance the initial defocus map.
The matting algorithm [20] is an optimization technique that is widely used in soft
segmentation along fuzzy object boundaries such as hair. The matting process can be
expressed as
Ii = αi Fi + (1 − αi) Bi,   (5.5)

where Ii is the known pixel value and Fi, Bi and αi are, respectively, the foreground
value, the background value and the blending parameter to be determined. Clearly,
this is an ill-posed problem as the equation has three unknown variables for a gray level
image and seven variables for a color image. A closed-form solution was proposed in [20]
by assuming that F and B are constants over a small window. Note that this assumption
does not imply that the input image, I, is locally smooth since parameter α also plays
a role in Eq. 5.5. Under this assumption, we can rewrite Eq. 5.5 as
αi ≈ a Ii + b, ∀i ∈ w,   (5.6)

where a = 1/(F − B), b = −B/(F − B), and w is a small window.
Based on the color line model of a natural image, we can define a cost function
as
J(α, a, b) = Σ_{j∈I} ( Σ_{i∈wj} (αi − aj Ii − bj)² + aj² ),   (5.7)

J(α) = αᵀ L α.   (5.8)
It was shown in [20] that parameters a and b in Eq. 5.7 can be eliminated and, for an
image with N pixels, L is an N ×N Laplacian matrix that is only related to the mean
and the covariance of neighboring pixel values in a given window.
In the matting problem, scribbles are often used as constraints so that we can add a
second term to yield the following equation:
α = argmin αᵀ L α + λ (α − bs)ᵀ Ds (α − bs),   (5.9)
where Ds is an N ×N diagonal matrix whose diagonal element indicates whether there
is a scribble constraint in that pixel location and bs is the constraint (scribble) value.
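Setting the gradient of Eq. 5.9 to zero gives the linear system (L + λDs) α = λ Ds bs. The sketch below solves that system on a toy image, using a simple 4-neighbor intensity-similarity Laplacian as a stand-in for the actual matting Laplacian of [20], which is built from local color statistics; this is a simplified illustration, not the thesis implementation:

```python
import numpy as np

def grid_laplacian(img, eps=1e-3):
    """4-neighbor graph Laplacian on a grayscale image with edge weights
    based on intensity similarity (a stand-in for the matting Laplacian)."""
    h, w = img.shape
    n = h * w
    L = np.zeros((n, n))
    idx = lambda r, c: r * w + c
    for r in range(h):
        for c in range(w):
            for dr, dc in ((0, 1), (1, 0)):
                r2, c2 = r + dr, c + dc
                if r2 < h and c2 < w:
                    wgt = 1.0 / (eps + (img[r, c] - img[r2, c2])**2)
                    i, j = idx(r, c), idx(r2, c2)
                    L[i, i] += wgt; L[j, j] += wgt
                    L[i, j] -= wgt; L[j, i] -= wgt
    return L

def solve_alpha(img, scribble_mask, scribble_vals, lam=100.0):
    """Minimize a'La + lam*(a - bs)'Ds(a - bs) (cf. Eq. 5.9); setting the
    gradient to zero gives (L + lam*Ds) a = lam*Ds*bs."""
    L = grid_laplacian(img)
    Ds = np.diag(scribble_mask.ravel().astype(float))
    bs = scribble_vals.ravel()
    alpha = np.linalg.solve(L + lam * Ds, lam * (Ds @ bs))
    return alpha.reshape(img.shape)

# Toy image: left half dark (background), right half bright (foreground).
img = np.array([[0.0, 0.1, 0.9, 1.0]] * 4)
mask = np.zeros_like(img); vals = np.zeros_like(img)
mask[:, 0] = 1.0                    # scribble: column 0 is background (0)
mask[:, 3] = 1.0; vals[:, 3] = 1.0  # scribble: column 3 is foreground (1)
alpha = solve_alpha(img, mask, vals)
```

The recovered alpha map follows the intensity edge: pixels near the background scribble end up close to 0 and those near the foreground scribble close to 1. A production implementation would use sparse matrices instead of dense solves.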
(a) Scribble (b) Alpha map (c) Scribble (d) Alpha map
Figure 5.5: The matting algorithm result.
The matting algorithm can be implemented as follows. The inputs are the original
image and the scribble markings in the foreground and the background, which serve as
constraints. The output matting map is the enhanced defocus map.
One example of the defocus map after matting is shown in Fig. 5.6. We see that the
matting algorithm can preserve the edge information and propagate the scribble based
on the color and edge constraints from the input image. In the current context, since we
want to match the defocus map with the original image, we use the initial defocus map
as the scribble that provides the constraint for the final matting result. The defocus map
after matting is called the “defocus matting map” in this work.
5.3 Defocus Feature
The defocus degree provides valuable visual cues in distinguishing images with and
without vanishing points. Exemplary images are shown in Figs. 5.7 and 5.8, which show
the defocus map and the defocus matting map of several test images.
Boundary patches in images with vanishing points have a different defocus degree
because of the perspective distortion. Generally speaking, the defocus matting maps of
Figure 5.6: (a) The input image, (b) the initial defocus map, and (c) the defocus matting
map.
Figure 5.7: Images with vanishing points.
Figure 5.8: Images without vanishing points.
images with and without vanishing points are different in the following manner. The
defocus matting map of an image with vanishing points gives some directional information. Three patches
along the upper and the lower boundaries of the defocus matting map are selected, where
the patch size is proportional to the image dimension. The average defocus value in each
patch is computed and they form a six-dimensional feature vector as shown in Fig. 5.9.
Figure 5.9: Illustration of the defocus feature vector.
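The six-dimensional feature can be sketched as follows; the exact patch fraction and the left/center/right placement along each boundary are our own assumptions, since the text only states that the patch size is proportional to the image dimension:

```python
import numpy as np

def defocus_feature(defocus_map, frac=0.2):
    """Six-dimensional defocus feature: average defocus value inside
    three patches along the upper boundary and three along the lower
    boundary of the defocus matting map."""
    h, w = defocus_map.shape
    ph, pw = max(1, int(h * frac)), max(1, int(w * frac))
    cols = [0, (w - pw) // 2, w - pw]  # left, center, right placement
    feats = []
    for r0 in (0, h - ph):             # upper row, then lower row
        for c0 in cols:
            patch = defocus_map[r0:r0 + ph, c0:c0 + pw]
            feats.append(patch.mean())
    return np.array(feats)

# Toy defocus map with a near-to-far gradient from bottom to top.
dmap = np.tile(np.linspace(1.0, 0.0, 10)[:, None], (1, 10))
f = defocus_feature(dmap)
```

For a map with a vertical near-to-far gradient, the three upper-boundary averages differ strongly from the three lower-boundary averages, which is exactly the cue the classifier exploits.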
Chapter 6
Experimental Results
We ask 10 subjects to evaluate whether there exists a vanishing point or not for all
training and test images as the ground truth. We discard images that do not have a
strong agreement. The training image with the ground truth is called the labeled image
data, where the label is a binary decision; namely, an image with or without a vanishing
point. Then, supervised learning is adopted to learn the model from the labeled image
data. In supervised learning, features are selected and the image labels are obtained
by human beings. In contrast, unsupervised learning attempts to learn a model from
unlabeled data, where a computer needs to find the natural grouping of unlabeled image
data.
6.1 Vanishing Point Existence Detection
The vanishing point existence detection problem is solved via supervised learning
since labeled images are used and features are pre-selected. A detector is trained to
learn the relationship between features and labels. Then, given a test image, it will
determine whether the image has a vanishing point or not.
Adaptive boosting (AdaBoost) is a machine learning algorithm that is often
used to improve performance by learning from mistakes made in the training process.
AdaBoost adaptively adjusts the weights of misclassified data to make the model more
robust [21]. AdaBoost is useful in boosting the performance of weak classifiers as long
as the weak classifiers perform better than random guessing.
In the experiment, we used twenty images with the angular histogram feature and the
defocus degree feature to train a model to be used in the existence detection step. The
test images were randomly selected from the database. We used the open-source machine
learning library “libsvm” [22] and 5-fold cross validation. We compared the
performance of each single feature set and the combined feature sets in Table 6.1, where
we also compared the performance of SVM and Adaboost.
Table 6.1: Results for using different features and classifiers
Feature Extraction Feature Size Accuracy(%) Classifier
Angular Histogram 12 75.9 SVM
Defocus Degree 6 70.6 SVM
AH+DD 12+6 = 18 77.5 SVM
AH+DD 12+6 = 18 79.3 Adaboost
The results show that the angular histogram feature outperforms the defocus degree.
Combining the two feature sets into a single feature set yields higher accuracy than each
feature set alone. The use of AdaBoost enhances the performance further.
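The experiments used libsvm [22]; to illustrate the AdaBoost reweighting idea described above, the following is a minimal decision-stump AdaBoost in NumPy (a toy illustration on 1-D data standing in for the 18-dimensional AH+DD vectors, not the thesis code or libsvm):

```python
import numpy as np

def stump_predict(X, feat, thresh, sign):
    # A decision stump: threshold one feature, output a +1/-1 label.
    return sign * np.where(X[:, feat] > thresh, 1, -1)

def adaboost_train(X, y, n_rounds=10):
    """Each round, fit the stump with the lowest weighted error, then
    up-weight the misclassified samples (the AdaBoost reweighting)."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    ensemble = []
    for _ in range(n_rounds):
        best = None
        for feat in range(X.shape[1]):
            for thresh in np.unique(X[:, feat]):
                for sign in (1, -1):
                    pred = stump_predict(X, feat, thresh, sign)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, feat, thresh, sign)
        err, feat, thresh, sign = best
        err = max(err, 1e-10)
        if err >= 0.5:  # weak learner no better than random: stop
            break
        alpha = 0.5 * np.log((1 - err) / err)
        pred = stump_predict(X, feat, thresh, sign)
        w *= np.exp(-alpha * y * pred)  # boost the weight of mistakes
        w /= w.sum()
        ensemble.append((alpha, feat, thresh, sign))
    return ensemble

def adaboost_predict(ensemble, X):
    score = sum(a * stump_predict(X, f, t, s) for a, f, t, s in ensemble)
    return np.sign(score)

X = np.array([[0.], [1.], [2.], [3.], [4.], [5.]])
y = np.array([-1, -1, -1, 1, 1, 1])
model = adaboost_train(X, y)
```

A classifier trained on the combined angular-histogram and defocus feature vectors would use exactly this loop, only with 18 columns in X.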
6.2 Vanishing Point Location Estimation
To determine the number of vanishing points in an image, we use the k-means algo-
rithm to cluster the lines into two and three classes, respectively, check the intra-cluster
line angle variance, and choose the one that has a smaller variance. For each line cluster,
the following vanishing point estimation method is used.
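The model-selection step can be sketched as follows (a plain 1-D k-means on line angles; the deterministic percentile initialization and the neglect of the 180-degree wrap-around of angles are simplifying assumptions):

```python
import numpy as np

def kmeans_1d(vals, k, n_iter=50):
    """Plain k-means on scalar line angles, initialized at evenly
    spaced percentiles so the result is deterministic."""
    centers = np.percentile(vals, [100.0 * (j + 0.5) / k for j in range(k)])
    for _ in range(n_iter):
        labels = np.argmin(np.abs(vals[:, None] - centers[None, :]), axis=1)
        centers = np.array([vals[labels == j].mean() if (labels == j).any()
                            else centers[j] for j in range(k)])
    return labels, centers

def choose_cluster_number(angles, ks=(2, 3)):
    """Cluster the line angles with k = 2 and k = 3, and keep the k whose
    mean intra-cluster angle variance is smaller."""
    best_k, best_var, best_labels = None, None, None
    for k in ks:
        labels, _ = kmeans_1d(angles, k)
        var = np.mean([angles[labels == j].var()
                       for j in range(k) if (labels == j).any()])
        if best_var is None or var < best_var:
            best_k, best_var, best_labels = k, var, labels
    return best_k, best_labels

# Three tight angle groups, so k = 3 should win.
angles = np.array([10., 11., 12., 60., 61., 62., 120., 121., 122.])
k, labels = choose_cluster_number(angles)
```

Each resulting cluster of lines is then fed to the vanishing point estimator described below.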
We used the Hough transform for line detection. In order to apply the Hough Trans-
form, an initial edge map is needed. For this, we used the Canny edge detector in our
implementation. An example is given in Fig. 6.1.
(a) Original image (b) Canny edge detection result
(c) Hough space (d) Line detection result
Figure 6.1: A line detection example.
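A minimal version of the Hough voting used here can be written directly in NumPy (OpenCV and MATLAB provide production implementations; the parameter choices below are illustrative):

```python
import numpy as np

def hough_lines(edge_map, n_theta=180, peak_frac=0.5):
    """Minimal Hough transform: each edge pixel votes for every
    (rho, theta) line through it, rho = x*cos(theta) + y*sin(theta);
    peaks in the accumulator correspond to detected lines."""
    ys, xs = np.nonzero(edge_map)
    h, w = edge_map.shape
    diag = int(np.ceil(np.hypot(h, w)))
    thetas = np.deg2rad(np.arange(n_theta))
    acc = np.zeros((2 * diag, n_theta), dtype=int)
    for x, y in zip(xs, ys):
        rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[rhos + diag, np.arange(n_theta)] += 1
    peaks = np.argwhere(acc >= peak_frac * acc.max())
    return [(rho - diag, np.rad2deg(thetas[t])) for rho, t in peaks], acc

# Edge map containing a single horizontal line at y = 4.
edges = np.zeros((10, 10), dtype=bool)
edges[4, :] = True
lines, acc = hough_lines(edges)
```

In practice the edge map comes from the Canny detector, as in Fig. 6.1, and non-maximum suppression is applied to the accumulator before extracting peaks.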
We implemented the vanishing point estimation method proposed in [7], which is
described below. The frequency of intersection point p is defined as
ϕ(p) = (ξ²(p) − ξ(p)) / 2,   (6.1)

where ξ(p) is the number of lines that pass through one intersection point. For each
intersection point in the image, we calculate its frequency ϕ(p). Ideally, vanishing lines
that converge to one vanishing point should have only one intersection point. However,
in practice, these lines do not always converge to one point but to a small neighborhood.
Thus, we measure the probability of a vanishing point in a local neighborhood as
Γ(p) = Σ_{q∈N(p)} ϕ(q),   (6.2)

where ϕ(q) is the frequency of intersection point q and N(p) is a neighborhood of point
p. We use a circular region to define the neighborhood of a point [7], where the radius
of the circle is proportional to the dimension of the image.
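Eqs. 6.1 and 6.2 can be sketched as follows, with lines in homogeneous form (a, b, c); the distance tolerance and the data layout are our own assumptions:

```python
import numpy as np

def intersect(l1, l2):
    """Intersection of two lines in homogeneous form (a, b, c) with
    a*x + b*y + c = 0; returns None for (near-)parallel lines."""
    p = np.cross(l1, l2)
    if abs(p[2]) < 1e-12:
        return None
    return p[:2] / p[2]

def vp_score(lines, radius):
    """For every pairwise intersection p, count the lines passing through
    it (xi), set phi(p) = (xi**2 - xi)/2 (Eq. 6.1), then score each
    candidate by summing phi over a circular neighborhood (Eq. 6.2)."""
    lines = [np.asarray(l, dtype=float) for l in lines]
    pts = [q for i in range(len(lines)) for j in range(i + 1, len(lines))
           if (q := intersect(lines[i], lines[j])) is not None]
    def dist_to_line(p, l):
        return abs(l[0] * p[0] + l[1] * p[1] + l[2]) / np.hypot(l[0], l[1])
    xi = np.array([sum(dist_to_line(p, l) < 1e-6 for l in lines) for p in pts])
    phi = (xi**2 - xi) / 2.0
    gamma = np.array([sum(phi[j] for j, q in enumerate(pts)
                          if np.hypot(*(p - q)) <= radius) for p in pts])
    return pts[int(np.argmax(gamma))], gamma.max()

# Three lines through (2, 3) plus one unrelated line.
lines = [(1, 0, -2), (0, 1, -3), (1, 1, -5), (1, -1, 10)]
vp, score = vp_score(lines, radius=1.0)
```

In the full method the radius is proportional to the image dimension, and the point with the highest Γ is reported as the vanishing point for that line cluster.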
We implemented and compared the results of two methods in Fig. 6.2. The left
column is our implementation of the method in [7] while the right column gives the
results from [2]. The algorithm proposed by Tardif in [2] used the J-linkage algorithm to
fit a multi-model to the detected lines. Although it performs well in line grouping, it does
not know the group number in advance. Besides, the Hough line detection captures the
main structure lines well, especially in complex scenes, as shown in Fig. 6.1.
Figure 6.2: Vanishing point location estimation: the left column is the implementation
in [7] while the right column is the result from [2].
Chapter 7
Conclusion and Future Work
7.1 Conclusion
In this thesis, we studied the vanishing point existence detection problem and, if it
exists, its location estimation problem. We first examined several line detection algo-
rithms and image processing techniques and used them as pre-processing tools. Then,
we proposed ways to classify images into two classes (namely, those with or without a
vanishing point). To the best of our knowledge, this problem has never been addressed in
the literature before. We extracted two features to achieve this task. They are: the angular
histogram and the defocus degree. The angular histogram achieves a better result, and
the AdaBoost algorithm can further improve the accuracy of SVM. For vanishing
point location estimation, we first used the Hough transform to detect lines and then
adopted the intersection point neighborhood concept to estimate the vanishing point
location. Finally, we compared its performance with that of the J-linkage algorithm.
7.2 Future Work
The vanishing point is an important characteristic of an image. Knowing whether
there is a vanishing point and where the vanishing point is located facilitates tackling
several computer vision problems. In the future, we are interested in improving the
accuracy in both the existence detection and the location estimation tasks with more
features. Note that the angular histogram and the defocus degree are local features, and
we may consider some global features to improve the performance further.
Bibliography
[1] R. G. von Gioi, J. Jakubowicz, J.-M. Morel, and G. Randall, “LSD: A line segment
detector,” http://www.ipol.im/pub/algo/gjmr_line_segment_detector, 2012.
[2] J.-P. Tardif, “Non-iterative approach for fast and accurate vanishing point detec-
tion,” in Computer Vision, 2009 IEEE 12th International Conference on. IEEE,
2009, pp. 1250–1257.
[3] R. Toldo and A. Fusiello, “Robust multiple structures estimation with j-linkage,”
Computer Vision–ECCV 2008, pp. 537–547, 2008.
[4] S. Zhuo and T. Sim, “On the recovery of depth from a single defocused image,” in
Computer Analysis of Images and Patterns. Springer, 2009, pp. 889–897.
[5] J. H. Elder and S. W. Zucker, “Local scale control for edge detection and blur
estimation,” Pattern Analysis and Machine Intelligence, IEEE Transactions on,
vol. 20, no. 7, pp. 699–716, 1998.
[6] J. Wang, H. Xu, and C.-C. J. Kuo, “Single-image depth inference based on blur
cues,” in Signal & Information Processing Association Annual Summit and Confer-
ence (APSIPA ASC), 2012 Asia-Pacific. IEEE, 2012, pp. 1–4.
[7] F. Schmitt and L. Priese, “Vanishing point detection with an intersection point
neighborhood,” in Discrete Geometry for Computer Imagery. Springer, 2009, pp.
132–143.
[8] S. Battiato, S. Curti, M. La Cascia, M. Tortora, and E. Scordato, “Depth map
generation by image classification,” in Proceedings of SPIE, vol. 5302, 2004, pp.
95–104.
[9] http://en.wikipedia.org/wiki/Perspective.
[10] M. A. Fischler and R. C. Bolles, “Random sample consensus: a paradigm for model
fitting with applications to image analysis and automated cartography,” Communi-
cations of the ACM, vol. 24, no. 6, pp. 381–395, 1981.
[11] E. Vincent and R. Laganière, “Detecting planar homographies in an image pair,”
in Image and Signal Processing and Analysis, 2001. ISPA 2001. Proceedings of the
2nd International Symposium on. IEEE, 2001, pp. 182–187.
[12] Y. Kanazawa and H. Kawakami, “Detection of planar regions with uncalibrated
stereo using distributions of feature points,” in British Machine Vision Conference,
vol. 1. Citeseer, 2004, pp. 247–256.
[13] M. Zuliani, C. Kenney, and B. Manjunath, “The multiransac algorithm and its
application to detect planar homographies,” in Image Processing, 2005. ICIP 2005.
IEEE International Conference on, vol. 3. IEEE, 2005, pp. III–153.
[14] J. M. Coughlan and A. Yuille, “Manhattan world: Orientation and outlier detection
by bayesian inference,” Neural Computation, vol. 15, no. 5, pp. 1063–1088, 2003.
[15] P. Denis, J. Elder, and F. Estrada, “Efficient edge-based methods for estimating
manhattan frames in urban imagery,” Computer Vision–ECCV 2008, pp. 197–210,
2008.
[16] B. C. Russell, A. Torralba, K. P. Murphy, and W. T. Freeman, “LabelMe: a database
and web-based tool for image annotation,” International journal of computer vision,
vol. 77, no. 1, pp. 157–173, 2008.
[17] C. Tomasi and R. Manduchi, “Bilateral filtering for gray and color images,” in
Computer Vision, 1998. Sixth International Conference on. IEEE, 1998, pp. 839–
846.
[18] D. Comaniciu and P. Meer, “Mean shift: A robust approach toward feature
space analysis,” Pattern Analysis and Machine Intelligence, IEEE Transactions on,
vol. 24, no. 5, pp. 603–619, 2002.
[19] S. Bae and F. Durand, “Defocus magnification,” in Computer Graphics Forum,
vol. 26, no. 3. Wiley Online Library, 2007, pp. 571–579.
[20] A. Levin, D. Lischinski, and Y. Weiss, “A closed-form solution to natural image
matting,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 30,
no. 2, pp. 228–242, 2008.
[21] http://en.wikipedia.org/wiki/AdaBoost.
[22] C.-C. Chang and C.-J. Lin, “Libsvm: a library for support vector machines,” ACM
Transactions on Intelligent Systems and Technology (TIST), vol. 2, no. 3, p. 27,
2011.