EFFICIENT SLAM FOR SCANNING LIDAR SENSORS USING COMBINED PLANE AND POINT FEATURES
Randolph Charles Voorhies
A thesis submitted in partial fulfillment of the requirements for
the Degree of DOCTOR OF PHILOSOPHY
University of Southern California
Department of Computer Science
August 2015
The best way to predict the future is to invent it.
— Alan Kay
Dedicated to my loving and supportive family.
ABSTRACT
This work presents a novel SLAM algorithm for scanning (Velodyne
style) LiDAR sensors. A Hough transform algorithm is first derived
which exploits the unique geometry of these sensors to detect planar
features, and it is then shown how these features can be matched over a
sequence of scans to reconstruct the path of the sensor. A mathematical
framework is then developed to track how well constrained these
sequential alignment problems are, and to detect when they become
underconstrained. A method is then presented which determines
a minimal set of non-planar features to be extracted from scenes to
achieve full constraint within this framework. Finally, all of these tools
are combined into an online SLAM system which is able to close the
loop on a variety of indoor and outdoor datasets without the use of
odometry, GPS, or an IMU. Both the frame-to-frame alignment and the
SLAM algorithms are compared to state-of-the-art methods, and
superior performance is shown for both.
PUBLICATIONS
Some ideas and figures have appeared previously. The following is a
complete list of publications to which I have contributed during the
course of this degree:
[1] Rangachar Kasturi, Dmitry Goldgof, Rajmadhan Ekambaram,
Gill Pratt, Eric Krotkov, Douglas Hackett, Qinfen Zheng, Yang
Ran, Rajeev Sharma, Mark Anderson, Mark Alan Peot, Mario
Aguilar, Deepak Khosla, Yang Chen, Kyungnam Kim, Lior
Elazary, Randolph Voorhies, Daniel Parks, and Laurent Itti.
“Performance Evaluation of Neuromorphic-Vision Object Recog-
nition Algorithms”. In: Proc. International Conference on Pattern
Recognition (ICPR). 2014.
[2] Randolph Voorhies, Shane Grant, and Laurent Itti. “Finding
Planes in LiDAR Point Clouds for Real-Time Registration”. In:
Proc. IEEE/RSJ International Conference on Intelligent Robots and
Systems (IROS). 2013.
[3] Ludovic Righetti, Kalakrishnan Mrinal, Peter Pastor, Jonathan
Binney, Jonathan Kelly, Randolph Voorhies, Gaurav Sukhatme,
and Stefan Schaal. “An Autonomous Manipulation System
based on Force Control and Optimization”. In: Autonomous
Robots (2013).
[4] Randolph Voorhies, Lior Elazary, and Laurent Itti. “Neuromor-
phic Bayesian Surprise for Far-Range Event Detection”. In: Proc.
IEEE International Conference on Advanced Video and Signal Surveil-
lance (AVSS). 2012.
[5] Randolph Voorhies, Lior Elazary, and Laurent Itti. “Applica-
tion of a Bottom-Up Visual Surprise Model for Event Detection
in Dynamic Natural Scenes”. In: Vision Science Society Annual
Meeting (VSS). 2010.
[6] Christian Siagian, Chin-Kai Chang, Randolph Voorhies, and Lau-
rent Itti. “Beobot 2.0: Cluster Architecture for Mobile Robotics”.
In: Journal of Field Robotics (JFR) (2010).
[7] Randolph Voorhies, Christian Siagian, Lior Elazary, and Laurent
Itti. “Centralized Server Environment for Educational Robotics”.
In: Proc. IEEE/RSJ International Conference on Intelligent Robots and
Systems (IROS). 2009.
ACKNOWLEDGMENTS
Many thanks to the wonderful team at iLab. All of your hard work
was an inspiration during the creation of this work. In particular I
would like to thank my collaborator Shane Grant for his contribu-
tions to the works which made this thesis possible, and Christian
Siagian for being the first to inspire me to apply to graduate school.
Additionally, I would like to thank my advisor Laurent Itti for his
continued support starting in my undergraduate days through the
present. Finally, I would like to thank my family for their unwavering
support throughout this very long academic journey. I know you
always wanted me to be a doctor or a lawyer — this counts, right?
Talk is cheap. Show me the code. — Linus Torvalds [1]
CONTENTS
List of Figures vii
List of Algorithms viii
1 introduction 1
1.1 LiDAR Overview and Challenges . . . . . . . . . . . . . 2
1.2 Previous Work . . . . . . . . . . . . . . . . . . . . . . . . 8
2 lidar plane detection 24
2.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.2 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.3 Finding Points on Planes . . . . . . . . . . . . . . . . . . 25
2.4 Voting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.5 Peripheral Planes . . . . . . . . . . . . . . . . . . . . . . 33
2.6 Filtering and Refitting . . . . . . . . . . . . . . . . . . . 35
2.7 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3 frame to frame registration 42
3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.2 Planar Alignment . . . . . . . . . . . . . . . . . . . . . . 43
3.3 Planar ICP (PICP) . . . . . . . . . . . . . . . . . . . . . . 53
3.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4 simultaneous localization and mapping 71
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.2 PICP SLAM . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
5 conclusion 88
bibliography 93
LIST OF FIGURES
Figure 1 Typical 2D LiDAR Scanner . . . . . . . . . . . . 5
Figure 2 Example 2D SLAM Map . . . . . . . . . . . . . 6
Figure 3 Nodding Laser Rangefinder . . . . . . . . . . . 7
Figure 4 Spherical Laser Rangefinder . . . . . . . . . . . 7
Figure 5 Velodyne Laser Arrangement . . . . . . . . . . 9
Figure 6 Single scan from a Velodyne HDL-32E . . . . . 10
Figure 7 Diagram of Velodyne Conic Sections . . . . . . 26
Figure 8 Illustration of Plane Constraints . . . . . . . . . 30
Figure 9 Ball Accumulator . . . . . . . . . . . . . . . . . 39
Figure 10 Plane Curvature Voting Strength . . . . . . . . 40
Figure 11 Plane Perturbation . . . . . . . . . . . . . . . 40
Figure 12 Example Plane Detections . . . . . . . . . . . . 41
Figure 13 Human Portable Velodyne Capture Rig . . . . 51
Figure 14 Plane Only Frame-to-Frame Registration . . . 52
Figure 15 Frame to Frame Alignment Average Framerate 66
Figure 16 Frame to Frame Alignment Start/End Distance 67
Figure 17 Tutor Campus Center Dataset Frame-to-Frame
Alignment . . . . . . . . . . . . . . . . . . . . . 68
Figure 18 Parkside Dataset Frame-to-Frame Alignment . 69
Figure 19 Alley Dataset Frame-to-Frame Alignment . . . 70
Figure 20 Example SLAM Graph . . . . . . . . . . . . . . 74
Figure 21 Parkside SLAM . . . . . . . . . . . . . . . . . . 80
Figure 22 Tutor Campus Center SLAM . . . . . . . . . . 81
Figure 23 RTH Dataset SLAM . . . . . . . . . . . . . . . . 82
Figure 24 Alley SLAM . . . . . . . . . . . . . . . . . . . . 83
Figure 25 HNB SLAM . . . . . . . . . . . . . . . . . . . . 84
Figure 26 SLAM Average Framerate . . . . . . . . . . . . 85
Figure 27 Detailed timing of Tutor dataset showing the
time required for each state in the PICP SLAM
system. . . . . . . . . . . . . . . . . . . . . . . . 86
Figure 28 SLAM Start/End Distance . . . . . . . . . . . . 87
LIST OF ALGORITHMS
3.1 Finding Plane Correspondences . . . . . . . . . . . . . . 46
3.2 Combined Plane Alignment and ICP (PICP) . . . . . . 62
3.3 Choosing Minimal Points for PICP . . . . . . . . . . . . 63
4.1 PICP SLAM Algorithm . . . . . . . . . . . . . . . . . . . 76
4.2 PICP SLAM Update Algorithm . . . . . . . . . . . . . . 77
4.3 PICP SLAM Map Rendering . . . . . . . . . . . . . . . . 77
1 INTRODUCTION
Scanning LiDARs are becoming an increasingly popular sensor to use
in outdoor robotics applications. The introduction of the Velodyne
3D scanning LiDAR [2] to the 2007 DARPA Grand Challenge proved
the utility of such sensors for real-world environments. To wit, 5
out of the 6 finishing teams relied on the range data from Velodyne
sensors for tasks such as obstacle avoidance and road detection [3].
Unfortunately, due to their lower resolution and higher noise compared
to structured light and stereo counterparts, there has been a lack
of efficient feature detection, scan matching, and SLAM algorithms
suitable for onboard processing on computationally limited platforms
using these devices. As the size and cost of such sensors inevitably
shrink (Velodyne recently announced a 600-gram sensor for $7,999
[4]), the need for such algorithms will increase.
The following work demonstrates a new SLAM algorithm built
upon novel plane detection and scan matching methods. In the rest
of this chapter, we discuss many of the challenges associated with
SLAM implementations, and scanning LiDAR sensors in particular.
We also provide an extensive overview of the previous work in frame
to frame alignment, plane detection, and SLAM. In Chapter 2 we
demonstrate a novel and efficient plane detection algorithm which
exploits the geometry of scanning LiDAR sensors to provide a robust
planar segmentation of a scene. In Chapter 3 we show how these
planar segmentations can be used to perform a very computationally
efficient registration of two scans. Section 3.3 builds upon these tools
to produce an alignment method that performs well even when the
alignment is underconstrained given the planar segmentations. Finally,
Chapter 4 uses all of these tools to develop a novel SLAM algorithm
which is able to beat a state of the art competitor on several difficult
datasets.
1.1 lidar overview and challenges
While there are many approaches to obtaining 3D distance measure-
ments, LiDAR based techniques have been popular for outdoor appli-
cations since their inception due to their robustness and range. There
are two main technologies for determining a single range measure-
ment: time of flight (ToF) and phase shift. ToF sensors detect range
by sending out a short pulse of light, and then measuring the amount
of time for the signal to return. Such sensors generally have long
minimum distance specifications and require very sensitive timing
circuitry to achieve high precision. However they often support high
power, and thus longer range lasers. Such a system was developed by
Johnston in 1973 [5] at JPL as a sensor system for unmanned missions
to Mars. Phase based methods, such as the one originally developed
by Nitzan et al. in 1977 [6], measure the phase shift of the returned
signal, rather than the time of flight. Such systems are generally more
affordable, provide faster sampling rates, and have shorter minimum
ranges than their ToF counterparts. However, phase based methods
have a shorter maximum range based on the laser frequency, unless
phase ambiguities are explicitly handled.
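As a rough illustration of this range limit (the numbers below are illustrative assumptions, not the specification of any particular sensor mentioned in this work), the unambiguous range of a phase-shift sensor is set by its modulation frequency:

$$r_{\text{max}} = \frac{c}{2 f_{\text{mod}}}, \qquad \text{e.g.}\quad f_{\text{mod}} = 10\,\text{MHz} \;\Rightarrow\; r_{\text{max}} = \frac{3\times 10^{8}\,\text{m/s}}{2\times 10^{7}\,\text{Hz}} = 15\,\text{m},$$

after which returns from beyond $r_{\text{max}}$ alias onto nearer ranges unless the ambiguity is resolved explicitly.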
Regardless of the technology used to sense the distance along a
single laser ray, many such measurements are necessary to build a
full representation of the world, and thus many sensor geometries are
available. The earliest such systems, for example Nitzan et al.[6], used
a dual mirror arrangement to produce an X-Y raster scan of a scene.
However, as LiDAR technology became more mature, 2D scanning
systems became relatively inexpensive and commercially available.
2D sensors such as the one shown in Figure 1 operate by pointing
a laser emitter/detector unit into a spinning mirror. As the mirror
spins, a 360° series of measurements can be taken spanning a plane.
By orienting the sensor parallel to the floor, a "top-down" snapshot of
the world can be taken by each sweep of the mirror. These systems
quickly found commercial use in manufacturing centers as safety
”light curtains” to prevent the accidental proximity of human limbs to
heavy machinery. Robotics researchers quickly adopted these sensors,
and began developing 2D SLAM algorithms to stitch a sequence of
360° scans into a consistent floor map. An example of such a map
can be seen in Figure 2. Such maps often resemble floor plans, and
can provide a rough understanding of traversability and environment
structure to a mobile robot. Unfortunately, more detailed information
such as the presence of overhangs and steps, as well as semantically
useful features such as tabletops are missing.
To address this, researchers have developed ways to capture full
3D point clouds with LiDAR sensors by actuating their off the shelf
2D sensors. Two such actuation schemes are popular, the ”nodding”
LiDAR and the ”spherical” LiDAR. Nodding LiDAR systems, e.g. as
shown in Figure 3 pivot the 2D sensor such that the beams ”nod”
up and down, typically following a sine wave. Spherical systems,
e.g. Figure 4 instead rotate the 2D sensor, creating a sphere of sensor
beams. Both configurations are able to produce extremely dense point
cloud scans. However each scan typically takes multiple seconds,
limiting their applicability to dynamic real-time applications.
Beginning in 2005, Velodyne Inc. began developing a LiDAR system
that would be capable of capturing full 3D scans at a high rate. By
2007, five of the six teams finishing the DARPA Grand Challenge au-
tonomous race were using Velodyne LiDAR sensors for tasks such as
obstacle avoidance and road detection. The sensor is able to capture at
such a high framerate because of its unique configuration of multiple
lasers spinning around a single axis. Figure 5 shows the arrangement
of lasers on the Velodyne HDL-32E, the 32 laser unit which was used
in this work. The 32 lasers are mounted at fixed inclination angles,
ranging from +10° to −30° with even angular spacing. These lasers
are mounted to a motor, which rotates in the azimuthal direction at
Figure 1: Diagram of a typical 2D LiDAR sensor. A single emitter/detector package is aimed downwards onto a spinning mirror, which allows a nearly 360° view of a 2D slice of the world.
a fixed 10 RPM. Through each revolution, the entire laser column
fires around 2,000 times producing approximately 64,000 distance
measurements per revolution, or 6.4 million measurements per second.
Each laser is capable of measuring out to 70 meters with an average
accuracy of 2 centimeters. The high capture speed and range of this
sensor make it very attractive for real-time outdoor mapping applica-
tions. However, the sparseness of the data as distance increases makes
applying existing scan-matching and SLAM algorithms a challenge.
Figure 6 shows a single scan taken from an HDL-32E demonstrating
this issue. The sparseness between scan lines causes major problems
for simple point to point alignment methods, which assume that each
point measurement corresponds to a physical point in space. This
assumption is approximately correct when dealing with very dense
Figure 2: Example of a 2D map constructed by stitching together scans
from a 2D planar LiDAR. Figure reproduced from [7].
Figure 3: A "nodding" laser rangefinder can be constructed by mounting a 2D scanning LiDAR on a servo tilting mechanism. The resulting sensor is able to capture full 3D scans of the world by incrementing the tilt of the planar scanner once the mirror has spun 360°. This figure reproduced from [8].
Figure 4: A "spherical" laser rangefinder is constructed by rotating a 2D scanning LiDAR along its center axis, resulting in a sphere of measurements. This figure reproduced from [9].
point clouds, however as the density decreases so does the applicabil-
ity of these methods. Many methods have been developed to aid in the
frame alignment and SLAM problems, and many of these methods do
alleviate some of the difficulties posed here. However, none of these
methods are able to achieve both the speed and accuracy necessary for
online SLAM with a Velodyne style sensor. The strengths and failings
of these will be outlined in the following section, and a novel method
to address these problems will be proposed beginning in Chapter 2.
1.2 previous work
1.2.1 Frame to Frame Alignment
The seminal work in point cloud frame to frame alignment is the
Iterative Closest Point (ICP) algorithm by Chen and Medioni [10].
ICP works to minimize the total distance between points by switching
between a point matching phase and an alignment solving phase. In
the point matching phase correspondences are made between the
two clouds using various metrics such as point-to-point or point-to-
plane. These correspondences are then used to solve for the rigid body
transformation which can minimize the distance between them. ICP
has spawned an entire subfield of research which aims to improve the
efficiency and robustness of the algorithm. Iterative Double Clustering
(IDC) [11] was introduced to replace the standard Euclidean distance
Figure 5: Velodyne HDL-32E laser arrangement
Figure 6: A single scan from a Velodyne HDL-32E taken from the
Tutor Campus Center dataset. Note how the angle of the
lasers results in decreasing resolution along the inclination
angle, and thus an increased distance between scan lines.
This sparseness makes frame to frame matching a challenge.
function common in ICP point matching, as the Euclidean distance
does not properly handle large rotations of the sensor. IDC approaches
this problem by computing dual error metrics and switching between
them. A standard Euclidean error is computed for translation offsets,
while a new “range-rule” which transforms the input data into polar
coordinates and matches across bounded ranges is used to solve
for rotations. The mbICP variant [12] defines a new unified error
metric which takes into account the translational and rotational error
inherent in the sensor. This new error metric is computed by taking
into account the SE2 transformations between the frames of the two
scans. Just after the release of mbICP, a new matching metric was
introduced called Polar Scan Matching (PSM) [13]. PSM reduces
the complexity of pose estimation to O(n), and the complexity of
orientation estimation to O(kn), where k is the angular resolution
of the sensor. PSM achieves this speedup by pre-segmenting its
data into smooth sections, and by working directly in the native
polar coordinates of the sensor. Other work has focused on how
to effectively subsample point sets. One review [14] examined in
detail the effects of various subsampling methods, including uniform
random sampling, gradient based sampling, and iteration dependent
sampling. Additionally, the authors introduce a new sampling method,
which aims to produce a set of points whose normals are as uniformly
distributed as possible. This idea of sub-sampling points from the
normal space will be explored later in this report.
A very successful solution to the frame alignment problem comes from
Douillard et al. [15, 16], in which point clouds are first segmented
by detecting and subtracting the ground plane from the data, and
then clustering the remaining points into segments. Those segments
are then fed into a modified ICP to compute an optimal alignment
between successive frames. Their solution is both fast and accurate,
and has influenced the work presented here.
Another extension to ICP that has heavily influenced this work is
the Generalized ICP (GICP) variant [17]. GICP rephrases the standard
ICP minimization procedure into a probabilistic framework where
the Mahalanobis distance is used rather than the Euclidean or other
aforementioned metrics. The covariance used for the comparison of
any two points is the sum of the neighborhood covariances of those
two points. When the neighborhoods of both points are planar, this ap-
proaches a plane-to-plane distance. If only one point’s neighborhood
is planar then a point-to-plane distance is approximated, and if neither
is planar, then the algorithm approximates a point-to-point metric.
The use of the covariance matrices to define this distance allows a
completely smooth transition between all three states. In Chapter 3,
we show how these ideas can be incorporated into our plane based
registration system to recover from poorly constrained environments.
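As a concrete illustration of the covariance-weighted distance described above, the following minimal numpy sketch evaluates the per-correspondence cost used in GICP-style alignment. The function name and arguments are ours, chosen for illustration; they are not taken from any particular GICP implementation.

    import numpy as np

    def gicp_pair_cost(a, b, cov_a, cov_b, R, t):
        """Mahalanobis cost for one correspondence, in the spirit of GICP:
        the covariance of the residual is the sum of the two neighborhood
        covariances, with the source covariance rotated into the target frame.
        When both neighborhoods are planar this behaves like a plane-to-plane
        distance; when neither is, it degrades toward point-to-point."""
        d = (R @ b + t) - a                      # residual after the candidate transform
        C = cov_a + R @ cov_b @ R.T              # combined neighborhood covariance
        return float(d @ np.linalg.solve(C, d))  # d^T C^{-1} d

Summing this cost over all correspondences and minimizing over (R, t) recovers the smooth point/plane blending behavior described in the text.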
1.2.2 Plane Detection
The value of planar features has been recognized since the early
days of robotics. As early as 1972, researchers were creating ways
to sense planar patches. For example, [18] projects grid patterns
into the world, and then detects the distortions of these patterns
through a fast Fourier transform of the image. These deflections are
mapped into planar regions through analysis of the resulting Fourier
spectrum. By 1977, research into range images was using phase-
based scanning LiDAR sensors to analyze scenes and extract various
features. In [6], the extraction of features such as physical edges,
point normals, and planar regions is explored using such LiDAR
sensors, and the authors even briefly mention the idea of using Hough
Transforms [19] for automatic plane detection. A basic system which
demonstrates the ability to isolate horizontal surfaces in an image was
then created. Two years later, the authors describe a fully automated
plane detection algorithm [20], which implements separate search
phases for horizontal, vertical, and then arbitrary planes. The search
for horizontal planes needs only to take place in one dimension, as the
inclination and azimuth of the normal is known and only the offsets
need to be discovered. Vertical planes are found by using a 2D Hough
transform, searching in the azimuth and offset dimensions. Arbitrary
planes are then found by first segmenting the image according to a
histogram of intensity values, and then growing regions outwards by
constantly refitting plane equations. While quite early in the history of
point cloud processing, these ideas of Hough transforms and iterative
model refinement were very influential in later papers. By 1986, plane
detection from range images was being used for object representation
and recognition [21]. By detecting planelets [22] and analyzing their
directions and degrees of curvature, discrete surface representations
are built which form sufficient object descriptors.
In 1996, a review paper created a ground-truthed dataset for planar
patch detection from range images to evaluate the performance of four
different algorithms [23]. The dataset consisted of images of up to five
polyhedral objects placed against simple planar backgrounds. Each im-
age was hand labeled at high magnification to produce alignments for
known CAD models of the objects. Of the four algorithms analyzed,
those submitted by the USF team and the UE team were standard
region growing approaches. The WSU team submitted a clustering
based approach [22,24], while the UB submission exploits the scanline
structure of the range images to gain a significant processing speedup
[25]. Although there was no clear winner in terms of detection quality
(each of the tested algorithms displayed various strengths and weak-
nesses depending on the dataset), the UB algorithm’s execution time
was over an order of magnitude faster than the next fastest contender.
This illustrates how a careful understanding and exploitation of the
structure of incoming range data can lead to dramatic differences in
algorithm efficiency.
A more recent review by Borrmann et al. [26] divides modern
plane detection algorithms into Hough transform, and region growing
approaches. A naive implementation of a Hough transform to detect
3D planes would require each point in a cloud to vote for every
possible plane that could pass through it. Voting is performed in a 3D
accumulator, whose dimensions represent the q, f, and r parameters
of possible planes. Each point votes by incrementing the accumulator
along a sinusoidal manifold representing the possible planes. Peaks
in the accumulator are then found, and their coordinates represent
planes with strong support in the input. Due to the very high cost
of the naive approach ($O(|P| \cdot N_\theta \cdot N_\phi)$, where $|P|$ is the number of points, $N_\theta$ is the theta resolution, and $N_\phi$ is the phi resolution of the accumulator), many variants have been developed.
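For reference, a minimal sketch of this naive manifold voting is given below. The bin counts, the angle convention (θ is azimuth in the x-y plane, φ is the elevation of the plane normal), and the function name are illustrative assumptions, not values from the text.

    import numpy as np

    def naive_hough_planes(points, n_theta=90, n_phi=45, n_rho=100, rho_max=50.0):
        """Naive 3D plane Hough voting: every point votes in every (theta, phi)
        bin, computing the rho a plane with that normal would need in order to
        pass through the point.  Cost is O(|P| * N_theta * N_phi), as discussed
        above."""
        acc = np.zeros((n_theta, n_phi, n_rho), dtype=np.int32)
        thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
        phis = np.linspace(-np.pi / 2, np.pi / 2, n_phi)
        for p in points:
            for i, th in enumerate(thetas):
                for j, ph in enumerate(phis):
                    normal = np.array([np.cos(th) * np.cos(ph),
                                       np.sin(th) * np.cos(ph),
                                       np.sin(ph)])
                    rho = abs(normal @ p)          # absolute offset so rho stays in [0, rho_max)
                    k = int(rho / rho_max * n_rho)
                    if k < n_rho:
                        acc[i, j, k] += 1
        return acc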
One early idea to speed up the Hough transform comes from the
computer vision literature, in which the input point set is simply
randomly sub-sampled in an algorithm called the Probabilistic Hough
Transform (PHT) [27]. The authors note that the input size (and thus
runtime) can often be dramatically decreased without significant loss
in detection quality. However, deciding by how much to reduce the
input, and subsequently how to set the accumulator threshold can be
difficult. These problems are addressed by the Adaptive Probabilistic
Hough Transform (APHT) [28], which was originally applied to circle
detection in 2D images. The algorithm is supplied with an upper
bound on the number of possible objects to be detected, and constantly
monitors the state of the accumulator to determine when to terminate
processing by keeping track of stable points. A further refinement on
this idea takes place in the Progressive Probabilistic Hough Transform
(PPHT) [29], in which the stopping rule is removed altogether, and
the system is restructured into an ”anytime” algorithm which can
be stopped at any stage of processing and still produce meaningful
results. This is achieved by constantly updating the threshold which
must be passed by an accumulator bin. By assuming that votes fall into
accumulator bins according to a Poisson distribution, the threshold is
set such that only bins whose votes are unlikely to be from noise are
accepted. Additionally, once a bin crosses the current threshold, all
points which voted for that bin are removed from the input set, and all
votes that they had previously cast are retracted from the accumulator.
One of the most successful variants of the Hough Transform as
applied to3D plane detection is the Randomized Hough Transform
(RHT) [30]. The RHT iteratively selects three points, and then votes
for the single plane that spans them. By repeating this procedure,
candidate planes which have high support are more likely to
receive more votes in the accumulator. The benefit of this scheme
is that the expensive manifold voting is completely eliminated as
each triplet of points votes for only one plane. Similarly to PPHT,
once an accumulator cell passes a threshold all points which lie on
that plane are removed from the input data set to increase efficiency.
Beyond the comprehensive review of Hough methods, Borrmann et
al. also introduce a novel variable resolution accumulator to handle
the singularities of a ⟨θ, φ, ρ⟩ plane parameterization. This clever
accumulator design will be discussed in further detail in Chapter 2.
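The core RHT step described above is small enough to sketch directly. The accumulator representation and the crude quantization below are illustrative assumptions; only the three-point sampling and single-plane vote follow the algorithm as described.

    import numpy as np

    def rht_iteration(points, accumulator, rng):
        """One Randomized Hough Transform iteration: draw three points, fit the
        single plane through them, and cast one vote.  `accumulator` is assumed
        to be a dict keyed by quantized (theta, phi, rho)."""
        idx = rng.choice(len(points), size=3, replace=False)
        p0, p1, p2 = points[idx]
        n = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(n)
        if norm < 1e-9:                       # degenerate (collinear) sample; skip
            return
        n = n / norm
        rho = n @ p0
        if rho < 0:                           # keep a canonical orientation
            n, rho = -n, -rho
        theta = np.arctan2(n[1], n[0])
        phi = np.arcsin(np.clip(n[2], -1.0, 1.0))
        key = (round(theta, 1), round(phi, 1), round(rho, 1))   # crude quantization
        accumulator[key] = accumulator.get(key, 0) + 1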
As the RHT does not vote for a manifold of solutions for each point,
it is similar in spirit to the variety of RANSAC driven plane detection
methods. Originally introduced by Fischler and Bolles in 1981 [31],
RANSAC was proposed as a solution to the Location Determination
Problem (given an image of known landmarks, determine the location
from which the image was taken). RANSAC has since been applied
to a huge number of application domains - at the time of this writing
Google Scholar tallies an astounding 11,671 citations to the original
paper. RANSAC differs from many other model fitting algorithms
in that rather than starting with all available data and then trying to
eliminate outliers, it iteratively selects an absolute minimal set of data
points, computes a model, and chooses to keep only those models
which have a maximal number of inliers. This method was applied
to plane detection in 2000 by both Ameri, et al. [32] and Brenner
[33] to detect roofs for automatic building segmentation from aerial
LiDAR data. When trying to detect multiple planes in input data,
the standard approach is to first detect a dominant plane by treating
all other points as outliers. Once a dominant plane is identified, all
points belonging to it are subtracted from the input data, and the
process is repeated until no more planes are found. Unfortunately,
this approach often has trouble dealing with pseudo-outliers, i.e.
data points which are outliers to the current dominant structure,
but are inliers to other structures [34]. Gallo, et al. [35] propose an
extension to this standard approach by only counting inliers from
the largest connected component of all inliers in the set with their
algorithm CC-RANSAC. This helps prevent an undersegmentation of
available planes caused by jumping across boundaries. Douillard, et
al. [15] developed a probabilistic sample consensus algorithm called
GP-INSAC (as well as a mesh-based segmentation method), but show
only how to extract the ground plane from sparse Velodyne data.
Yet another approach to plane detection is that of region growing.
By first selecting a seed point and then growing outwards from that
point, models of planar patches can be built. These methods often
require a specific regular structure of input data, as they rely on N-
connected component analysis or scanline following. One notable
example of the latter was by Jian, et al. [25] which was mentioned
earlier in the review of Hoover. Poppinga, et al. presented a work in
2008 that was able to efficiently segment planes in range images in
O(n log(n)) time, as well as a fast polygonalization algorithm. The
more recent work by Georgiev, et al. 2011 [36] relies on the raster-like
structure of stereo cameras, structured light cameras, and nodding
lasers. The algorithm exploits the fact that planes in the world will
project to line segments in in the rows of the captured data. This
allows them to simply fit line segments row-by-row, and then cluster
nearby coplanar segments into planes. A similar algorithm, which is
tuned specifically to nodding laser scanners and thus allows for online
extraction of planes as the data streams in is presented by An et al.
[8].
Unfortunately, none of the aforementioned methods produce satis-
factory results on the Velodyne sensor. The stochastic Hough transform
and RANSAC methods both require a large number of points-on-
plane in order to have a high probability of detection. Due to the
sparseness and geometry of the sensor, we often need to detect planes
that comprise less than 2% of our data,¹ resulting in a vanishing
probability of detection. Fast scanline based region growing methods
are inapplicable to our data because the structure of Velodyne scans
violate the assumption that the intersection of a scanline with a plane
will produce a line segment. Instead, the intersection of a Velodyne’s
scanlines with a plane will produce a conic section. In Chapter 2
we propose a solution which combines these approaches by first ex-
ploiting our knowledge of the unique sensor geometry to extract conic
sections from each laser independently, and then voting for candidate
planes in a Hough style accumulator.
1.2.3 SLAM
One of the core topics in robotics research for the past 30 years has
been finding solutions to the problem of simultaneous localization
and mapping (SLAM). Typically, solutions to the SLAM problem take
as input a series of noisy landmark and odometry measurements,
and produce an estimation of both the landmark positions as well
¹ A 2 m by 2 m square plane facing our sensor from 20 m away would comprise only 1.8% of an input scan.
as the robot pose without requiring any a priori information. The
groundwork in the field was laid by Smith and Cheeseman in
their 1986 paper ”On the Representation and Estimation of Spatial
Uncertainty”[37] in which the authors show how to propagate the
Gaussian uncertainty of a mobile robot by linearizing about the mean.
This is a crucial step, as it allowed for those authors’ seminal 1990
work in which they developed a full extended Kalman filter [38] for
the joint estimation of both robot and landmark poses. This quickly
lead to an explosion in new research to refine many of the problems
associated with the early approach. In their 1991 paper, Leonard and
Durrant-Whyte listed the three most important issues to be addressed
as: ”data association uncertainty, environment dynamics, and the
impact of correlations on computational complexity” [39]. One of the
most influential works on the subject came from Montemerlo, et al. in
the 2002 paper ”FastSLAM: A Factored Solution to the Simultaneous
Localization and Mapping Problem” [40]. There, the authors address
one of the limiting factors of the EKF based approaches, in that a full
covariance matrix of k k elements must be maintained at a cost of
O
k
3
[39, 41]. Originally attributed to Murphy [42], one of the main
insights is that given a set of robot poses and landmark observations,
the observations can be made independent of each other by condi-
tioning on the poses. This process, known as Rao-Blackwellization,
significantly reduces the computational burden of SLAM solutions
and allows for systems which can track large numbers of landmarks
without incurring an exploding computational cost. Montemerlo’s
FastSLAM is implemented as a particle filter, where each particle
represents a possible robot pose and contains an independent EKF for
each landmark. This leads to a naive runtime ofO(MK), where M
is the number of particles and K is the number of landmarks. How-
ever, by representing the landmarks in a special tree data structure,
the runtime is reduced toO(MlogK). FastSLAM also addresses the
data association problem, as each particle is allowed to make it’s own
optimal choices regarding data association, mitigating the problems
of choosing only a single solution. A follow up paper introduces
FastSLAM 2.0 [43], which uses both the latest sensor measurements
as well as the motion estimate to create each proposal distribution,
allowing a decrease in the number of necessary particles (and thus
computational resources).
A parallel line of research by Dellaert and Kaess developed an alter-
native to the particle filter. "Square Root SAM" ($\sqrt{\text{SAM}}$), published
in 2005 [44], poses the SLAM problem as one of matrix factorization,
rather than recursive filtering. This interpretation allows the use of
very efficient Cholesky and QR factorizations on the sparse infor-
mation matrix representing the problem. An alternative (and more
intuitive) interpretation of their solution is that of a directed graph, in
which poses and landmarks are nodes, while controls and measure-
ments are edges. The later iSAM system [45] showed how to update
the QR factorization such that only those matrix entries corresponding
to graph elements that were changed need to be updated at each
timestep. This opened the door to the use of such methods in online
applications. A further refinement, iSAM2 [46], introduced the Bayes
tree to encode the factored probability density which eliminates the
need for periodic batch updates, improving the application to online
problems. In Chapter 4, we show how to apply these methods to the
problem of SLAM using planar features.
One of the first systems to use 3D plane parameters in a localization
solution is that of Horn, et al. [47], which used extracted planes as
landmarks for 2D localization. While their system required an a priori
map of the environment, many of the ideas presented there were
influential to later works. Weingarten, et al. [48, 49] presented one of
the first true 3D SLAM solutions to use planar features as landmarks,
and showed very good performance on a 150 m² indoor environment.
In 2010, Pathak, et al. developed a unique dual correspondence
and registration algorithm called MUMC [50]. Given a set of planar
patches from two separate frames, MUMC is able to robustly compute
both the optimal data association between patches, as well as the
optimal coordinate transform between the two frames of reference.
While extremely robust and accurate, we found this solution to be
too computationally intensive for real time use. We show a stripped
down (but still very effective) version of this method in Chapter 3.
In the same year, Pathak et al. also showed a full pose graph SLAM
solution using planar patches [51], and demonstrated its effectiveness
on a mobile robot traversing through a simulated disaster scenario.
While their system does run online, their robot is moving at a much
slower pace than is being considered in this work. For example, their
robot takes 2.68 seconds to produce a laser scan while ours takes only
100 milliseconds. Aside from capture time, they require an average of 8
seconds of processing time per frame, which is far too slow for our
purposes.
A final notable subset of point cloud SLAM algorithms comes
from the landmark-free global alignment methods. By treating the
entire frame as a single feature, these methods generally operate
by recursively matching and then merging each frame to a global
map. While these methods require significant processing power, they
can generally offer very detailed results. The most notable modern
example is that of the Kinect Fusion algorithm of Newcombe et al.
[52]. A similar algorithm which is tuned specifically to Velodyne type
sensors is that of Moosmann et al. [53]. Unfortunately, these methods
require significant brute force processing power to achieve real time
performance, and it has yet to be proven whether these approaches
would work on our challenging datasets.
2 LIDAR PLANE DETECTION
2.1 overview
Although significant progress has been made in recent years towards
6D simultaneous localization and mapping (SLAM) for point cloud
data [54, 55, 45], frame-to-frame registration, a critical step in many
SLAM algorithms, remains both a challenge and a significant bottle-
neck for real-time implementations. The typical approach of using
some variant of iterative closest point (ICP) to solve for the transfor-
mation between successive frames lacks speed as the number of points
increases and lacks accuracy as the density decreases. We propose a 3D
plane extraction algorithm, capable of working on large sparse point
clouds, that when combined with a plane-to-plane based registration
method, provides accurate real-time frame-to-frame registration. Such
registration becomes particularly important when odometry is either
inaccurate or unavailable, such as when the sensor is mounted on a
bipedal robot or carried by a human. The main contribution of this
chapter is to derive a novel Hough-based voting scheme for finding
planes in 3D point clouds. Different from existing Hough methods,
our algorithm uses knowledge of the geometry of the sensor to guide
the voting in an efficient manner.
2.2 preliminaries
Our plane finding algorithm exploits knowledge about the sensor to
first, on a scan-line basis, find and group all points that arise from
planar surfaces. Each of these groups of points then casts weighted
votes in a Hough accumulator for all possible planes that could explain
them. The planes that pass the accumulator threshold are refined
in a final filtering step. We utilize the following notation:
$n$, $\vec{n}$, $\hat{\vec{n}}$: a scalar, a vector, a unit vector
$M$, $M^{+}$: a matrix, its Moore-Penrose pseudo-inverse
$\pi$, $\pi_n$, $\pi_s$: a plane, its Hessian normal form, its spherical form
Here, $\pi_n \triangleq [\hat{\vec{n}} \;\; \rho]$, where $\hat{\vec{n}}$ is the plane normal, and $\rho$ is the distance from the plane to the origin along $\hat{\vec{n}}$. $\pi_s \triangleq [\theta \;\; \phi \;\; \rho]$, where $\theta$ is the azimuthal angle in the x-y plane, and $\phi$ is the inclination angle.
2.3 finding points on planes
The first step in the plane extraction algorithm is to find all points
in the point cloud which seem likely to have originated from planar
surfaces. To do this, we look at each laser scan independently and
exploit some simple geometry: the intersection of any single rotating
(a) Perspective view of laser cones (red and blue) intersecting an actual, physical vertical plane (green). (b) Frontal view (through the plane) of laser scan-lines (black) intersecting the physical plane (green). (c) Side view of laser intersection (black) through the physical plane (green) with error bars (red) for depth measurement. The purple shaded area shows possible plane parameterizations for each laser that could generate the physical plane given the sensor noise.
Figure 7: The physical design of the sensor causes it to create virtual laser cones as it sweeps each of its lasers through θ (a). When a plane intersects one of these cones, it generates a conic section of varying curvature depending upon the inclination angle of the sensor and the relative orientation of the plane (b). Since the variance of depth measurements, σ, is known, sections of higher curvature lead to more certainty in the detected orientation of the plane (c).
laser, which forms a cone in space, with any possible plane in the
world, forms a conic section (Figures 7a and 7b). This conic can be
seen by looking at the 1D signal corresponding to depth measurements
from an individual laser.
An ideal solution would be to find and fit all possible conics to
this signal - conic sections with higher curvature correspond to more
confident evidence of planes with a particular orientation, and those
with lower or no curvature have much greater uncertainty about
the possible planes that generated them ( Figure 7c). Unfortunately,
finding and fitting general conics to unpartitioned data is neither
easy nor fast when dealing with thousands of points [56]. As a fast
approximation, we look for regions in the signal which correspond to
smoothly varying segments that could possibly belong to a conic.
2.3.1 Finding Breaks
Approximate conic sections in each laser's 1D signal, $d_f(\theta)$, are found by detecting points that break the signal into smooth and non-smooth groups of measurements. This is done using simple filtering operations: a Gaussian kernel smoothes the input signal, and a gradient filter takes several spatial derivatives: $d_f(\theta)'$, $d_f(\theta)''$, and $d_f(\theta)'''$. The Gaussian is parameterized by:

$$G(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{x^{2}}{2\sigma^{2}}} \qquad (1)$$
where we choose σ = 2 cm to match the variance of our sensor. For computing the gradient we use a simple gradient filter: $\nabla = [-1 \;\; 1]$. Breakpoints are placed at zero crossings of $d_f(\theta)'$ and zero crossings of $d_f(\theta)'''$ that have sufficiently high $d_f(\theta)''$. Many corners caused by distinct planes feature a nearly smooth transition of depth values, but will be caught as a local extremum in $d_f(\theta)'$. Although noisy, zero crossings in $d_f(\theta)'''$ can be used to find points where the difference in depth changes abruptly, corresponding to different surfaces in the world. We utilize a threshold parameter, $\tau_{d''}$, to enforce that only zero crossings of the third derivative that had a second derivative value greater than $\tau_{d''}$ are considered break points. This helps compensate for the sensor noise amplified by taking $d_f(\theta)'''$. In practice this parameter does not need to be adjusted and we have used a value of $\tau_{d''} = 0.01$
for all data. As a general rule, we aim to over-segment the signal since
under-segmentation leads to overly confident incorrect initial plane
estimates.
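A minimal sketch of this breakpoint search is shown below. The kernel width in bins and the way the threshold is applied are illustrative assumptions standing in for the sensor-noise-matched values described above; the structure (smooth, differentiate three times, mark zero crossings of the first and third derivatives) follows the text.

    import numpy as np

    def find_breakpoints(depths, sigma_bins=2, tau_dd=0.01):
        """Smooth a per-laser depth signal, take first/second/third differences,
        and mark breakpoints at zero crossings of d' and at zero crossings of
        d''' whose second-derivative magnitude exceeds tau_dd."""
        x = np.arange(-3 * sigma_bins, 3 * sigma_bins + 1)
        g = np.exp(-x**2 / (2.0 * sigma_bins**2))
        g /= g.sum()
        d = np.convolve(depths, g, mode="same")        # smoothed signal
        d1 = np.convolve(d, [1, -1], mode="same")      # first difference
        d2 = np.convolve(d1, [1, -1], mode="same")     # second difference
        d3 = np.convolve(d2, [1, -1], mode="same")     # third difference

        def zero_crossings(s):
            return np.where(np.signbit(s[:-1]) != np.signbit(s[1:]))[0]

        breaks = set(zero_crossings(d1))
        for i in zero_crossings(d3):
            if abs(d2[i]) > tau_dd:
                breaks.add(i)
        return sorted(breaks)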
2.3.2 Creating Groups
Once breakpoints are extracted from each laser scan, we place points
into groups such that the set of planes to which each group could be-
long can be calculated. A collection of points between two breakpoints
is considered a valid group if it contains at least some minimum number of points, $\#_p$, which we set to 15 (comprising an angle of 2.4°). This constraint enforces a minimum size of planar segments and prevents overly noisy (outlier) points from being considered. This constraint also gives stability to the group centroid, $\vec{c}$, which is used during the
also gives stability to the group centroid,~ c, which is used during the
voting procedure.
For each group, we find its principal components through the eigen-
vector decomposition of its points’ covariance matrix. From this
decomposition, we define the following:
ˆ
~ v
3
the principal eigenvector
ˆ
~ v
1
the eigenvector corresponding to the
smallest eigenvalue
c the curvature of the group
l
i
the eigenvalues of the decomposition
ˆ
~ v
3
corresponds to the dominant direction of the group of points,
which by convention we enforce to point in a clockwise direction
relative to the sensor origin.
ˆ
~ v
1
corresponds to the direction of least
variation amongst the points, which we expect to correspond to the
plane normal of the actual surface generating these points. However,
when the curvature c
4
=
l
1
+l
2
l
1
+l
2
+l
3
is low, our uncertainty over
ˆ
~ v
1
becomes high.
Groups that are close to each other within a small distance threshold $\tau_g$ and pointing in similar directions ($\hat{\vec{v}}_{3i} \cdot \hat{\vec{v}}_{3j} < \tau_d$) are merged and their decomposition is re-evaluated. In the next step, we use these values to vote for plane distributions for each group.
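The per-group statistics above reduce to a few lines of numpy; the sketch below is our own illustration of that computation (the clockwise-orientation convention and the merging of nearby groups are omitted).

    import numpy as np

    def group_statistics(points):
        """Centroid, eigen-decomposition of the covariance, dominant direction
        v3, candidate normal direction v1, and the curvature
        c = (l1 + l2) / (l1 + l2 + l3) for one group of points (N x 3 array)."""
        centroid = points.mean(axis=0)
        cov = np.cov(points.T)
        eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
        v1 = eigvecs[:, 0]                         # direction of least variation (normal estimate)
        v3 = eigvecs[:, 2]                         # dominant direction of the group
        l1, l2, l3 = eigvals
        curvature = (l1 + l2) / (l1 + l2 + l3)
        return centroid, v1, v3, curvature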
Figure 8: A set of planes through the blue line (drawn here coming
out of the page) as derived from Equation (3).
2.3.3 Accumulator
To robustly detect planes, we use an accumulator framework similar
to that utilized in the randomized Hough transform. During the
accumulation step, we represent planes $\pi$ with their spherical representation $\pi_s$. This representation is more compact than the $\pi_n$ form, although this comes at the price of singularities when $\phi$ is equal to $\pi/2$. These singularities can be handled by ensuring that all possible plane parameterizations, which span the unit sphere, are equally accounted for in the accumulator. This is accomplished by using the ball accumulator of [26], which varies the number of $\theta$ bins depending on the angle $\phi$ such that the bins are uniformly sized. The accumulator is parameterized by the maximum number of bins per dimension, $\#_\theta$, $\#_\phi$, and $\#_\rho$. For each quantized step of $\phi$, the number of $\theta$ bins can be calculated as:
$$N_{\theta_{\phi_i}} = \frac{\#_\theta \left( \cos\phi_i - \cos(\phi_i + \delta\phi) \right)}{\cos(\pi\,\delta\phi)} \qquad (2)$$

Note that $N_{\theta_0} = 0$ and $N_{\theta_{180}} = 0$ must be enforced.
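One way to realize this equal-area idea is sketched below: allocate the θ bins of each φ band in proportion to the surface area of that band on the unit sphere, with the maximum count at the equator. The function below is an illustration under that assumption; the exact normalization in Equation (2) may differ.

    import numpy as np

    def theta_bins_per_phi(n_theta_max=180, n_phi=90):
        """Number of theta bins for each phi band of a 'ball' accumulator,
        proportional to the band's surface area on the unit sphere."""
        d_phi = np.pi / n_phi
        phis = np.arange(n_phi) * d_phi                     # band lower edges, phi in [0, pi)
        band_area = np.cos(phis) - np.cos(phis + d_phi)     # proportional to band surface area
        bins = np.rint(n_theta_max * band_area / band_area.max()).astype(int)
        bins[0] = bins[-1] = 0                              # poles handled separately, as in the text
        return bins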
2.4 voting
For each group in a scan, we need to find a parameterization for the set of all planes that contain that group (see Figure 8 for an illustration). This is done by finding all planes that could cause the vector $\vec{v}_3$ centered at the group centroid $\vec{c}$. To ensure that we do not double vote for any particular plane given our accumulator quantization, we need to solve for the $\theta$ and $\phi$ parameters of each plane as we step through the most dense dimension of the accumulator, $\rho$. To do so, we arrange the following under-constrained linear equation using the $\pi_n$ parameterization:
$$\underbrace{\begin{bmatrix} c_x & c_y & c_z \\ v_{3x} & v_{3y} & v_{3z} \end{bmatrix}}_{A} \underbrace{\begin{bmatrix} n_x \\ n_y \\ n_z \end{bmatrix}}_{\vec{x}} = \underbrace{\begin{bmatrix} \rho \\ 0 \end{bmatrix}}_{\vec{b}} \qquad (3)$$

where $\vec{c}$ is the centroid of the group.
Any solution to this equation will be of the form

$$\vec{n} \in \left\{ a\vec{j} + \vec{k} \;:\; \vec{j} \in \mathrm{Null}(A),\; A\vec{k} = \vec{b} \right\} \qquad (4)$$

and so we solve for $\vec{n}$ by first finding any solution to Equation (3) using the Moore-Penrose pseudo-inverse of $A$:

$$\vec{k} = A^{+}\vec{b} + \left( I - A^{+}A \right)\vec{w} \qquad (5)$$

where $\vec{w}$ is any vector. We need only unit length normals, and so we solve for $a$ with the following quadratic equation:

$$1 = \left\| \vec{j} + a\vec{k} \right\|^{2} \qquad (6)$$

which gives the following two solutions:

$$\hat{\vec{n}}(\rho)_{\pm} = \frac{-\,\vec{j}\cdot\vec{k} \pm \sqrt{(\vec{j}\cdot\vec{k})^{2} - \vec{k}^{2}\left(\vec{j}^{2} - 1\right)}}{\vec{k}^{2}} \qquad (7)$$

Once we have two solutions, we find the corresponding $\theta$ and $\phi$ by converting $\hat{\vec{n}}(\rho)$ into spherical coordinates.
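A compact way to carry out this per-ρ solve is sketched below. For clarity the sketch takes the null-space direction to be unit length and writes the candidate normal as a·j + k, so its quadratic differs slightly in form from Equation (7) while solving the same constraint; the function name and this convention are our own illustrative choices.

    import numpy as np

    def candidate_normals(centroid, v3, rho):
        """Unit normals n with centroid . n = rho and v3 . n = 0, built from a
        particular solution (via the Moore-Penrose pseudo-inverse) plus a scaled
        null-space direction of the 2x3 constraint matrix A."""
        A = np.vstack([centroid, v3])            # 2x3 constraint matrix
        b = np.array([rho, 0.0])
        k = np.linalg.pinv(A) @ b                # particular solution, A k = b
        j = np.linalg.svd(A)[2][-1]              # unit null-space direction of A
        # solve || a*j + k ||^2 = 1 for a (j has unit length, so the quadratic is monic)
        jk = j @ k
        disc = jk**2 - (k @ k - 1.0)
        if disc < 0:
            return []                            # no unit-length solution at this rho
        roots = [-jk + np.sqrt(disc), -jk - np.sqrt(disc)]
        return [a * j + k for a in roots]

Converting each returned normal to spherical coordinates yields the (θ, φ) pair voted for at this ρ.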
We then vote in the accumulator for these two planes, weighting the votes according to the group's curvature $c$, its smallest eigenvector $\hat{\vec{v}}_1$, and the calculated $\hat{\vec{n}}(\rho)$. When the curvature is high, we trust that the group's $\hat{\vec{v}}_1$ accurately describes the normal of the physical plane generating the group (Figure 7); thus we form a weighting function such that in cases where curvature is high, we penalize normals that are not similar to $\hat{\vec{v}}_1$, whereas in cases where curvature is low, we give all possible normals equal weight (Figure 10). We model this relationship using a linear combination of a line and a sigmoid (here we use a Kumaraswamy CDF [57] because of its convenient parameterization), where the mixing is defined by the curvature of the segment:
$$s = c\left(1 - \left(1 - \left(\hat{\vec{v}}_1 \cdot \hat{\vec{n}}(\rho)\right)^{2}\right)^{3}\right) + (1 - c)\left(c\left(\hat{\vec{v}}_1 \cdot \hat{\vec{n}}(\rho)\right) + 0.85 - c\right) \qquad (8)$$
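The sketch below illustrates this blend in code form: a sharp Kumaraswamy-CDF penalty when curvature is high and the group normal is trusted, mixed with a nearly flat linear term when curvature is low. The constants in the linear term follow Equation (8) as printed and should be treated as illustrative.

    def vote_weight(curvature, normal_similarity):
        """Vote strength as a curvature-weighted mix of a Kumaraswamy CDF
        (a=2, b=3) and a gentle line that is roughly flat at low curvature."""
        x = normal_similarity                      # dot product of v1_hat and n_hat(rho)
        kumaraswamy = 1.0 - (1.0 - x**2) ** 3      # sharp penalty used at high curvature
        flat = curvature * x + 0.85 - curvature    # ~0.85 everywhere when curvature is 0
        return curvature * kumaraswamy + (1.0 - curvature) * flat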
2.5 peripheral planes
When an initial plane detection arises from points belonging to a
planar patch that is far away from the sensor and not tangent to any
laser, small perturbations in depth give rise to large changes in the
estimated normal. This is because of our plane parameterization and
the choice to use infinite planes. Since the plane normal vector always
connects the origin with the closest point on the infinite plane, small
angular changes in the plane become large distances when occurring
at the periphery.
We compensate for this error by making small rotational perturbations to a group's $\vec{v}_3$ vector about the global Z axis, up to some maximum amount $\eta$ dependent on the sensor noise model. Rotating the vector about the Z axis captures most of the variation caused by noisy depth readings (since the variance of LiDAR measurements is usually dominated by the depth measurement, we consider the worst case rotational difference to $\vec{v}_3$ that two adjacent point measurements can cause when perturbed in depth).
After each of these perturbations is made, the entire voting process is repeated with the new $\vec{v}_{3R}$. This additional voting, which creates entries in the accumulator around the original vector's entries, makes it more likely that groups on the same distant peripheral plane share votes. Without this step, small perturbations to groups arising from the same planar patch can cause them to vote for entirely different bins in the accumulator. See Figure 11 for an illustration of the perturbation.
Much like during voting, we would like to be able to step through our $\rho$ dimension as we rotate the original $\vec{v}_3$. Given this desired $\rho_i$, we must solve for the amount of rotation, $\eta_i$, that would generate a vector $\vec{v}_{3R}$ with a normal of length $\rho_i$. To solve for this, we can take advantage of the definition of the dot product:

$$\cos(\omega) = \left( R_z(\eta_i)\,\hat{\vec{v}}_3 \right) \cdot \hat{\vec{c}} \qquad (9)$$

where $R_z(\eta_i)$ is a rotation about the global Z axis by $\eta_i$ and we have observed that $\vec{v}_{3R} = R_z(\eta_i)\,\vec{v}_3$. $\cos(\omega)$ is known since we know all lengths of the right triangle created by $\vec{c}$, $\vec{v}_{3R}$, and their normal vector (which has length $\rho_i$). Simplifying this equation yields:

$$\sin(\eta_i)\,\alpha + \cos(\eta_i)\,\beta = \gamma \qquad (10)$$
where $\alpha = \hat{c}_y \hat{v}_{3x} - \hat{c}_x \hat{v}_{3y}$, $\beta = \hat{c}_x \hat{v}_{3x} + \hat{c}_y \hat{v}_{3y}$, and $\gamma = \cos(\omega) - \hat{c}_z \hat{v}_{3z}$. The $\hat{c}_i$ and $\hat{v}_{3i}$ refer to specific components of $\hat{\vec{c}}$ and $\hat{\vec{v}}_3$. This gives four possible solutions for $\eta_i$; if any one of these solutions results in a rotation
within the bounds of $\eta$, we perform full voting on the resultant $\vec{v}_{3R}$. Once we have exceeded $\eta$ by both increasing and decreasing $\rho_i$, we
move on to another group.
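Equation (10) can be solved in closed form with the harmonic-addition identity, as sketched below; the function name is illustrative, and the sign ambiguity in cos(ω) (either face of the rotation) supplies the remaining candidate solutions mentioned above.

    import numpy as np

    def solve_rotation(alpha, beta, gamma):
        """Solve sin(eta)*alpha + cos(eta)*beta = gamma for eta using
        a*sin(x) + b*cos(x) = R*sin(x + p), R = sqrt(a^2 + b^2), p = atan2(b, a).
        Returns up to two solutions for this value of gamma."""
        R = np.hypot(alpha, beta)
        if R < 1e-12 or abs(gamma) > R:
            return []                             # no real solution for this rho_i
        p = np.arctan2(beta, alpha)
        base = np.arcsin(gamma / R)
        return [base - p, np.pi - base - p]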
2.6 filtering and refitting
Once all groups have voted for their possible planes, we threshold
the accumulator by keeping all bins which pass some value, $\tau_a$, and
discarding the others. This reduces the potential number of planes
(depending upon environment - usually on the order of 100,000) to a
significantly more manageable amount (on the order of 100 or 1,000).
These remaining planes fall into two categories: 1) clusters of good
planes, and 2) false positives. The vast majority of these planes will
be clusters of good planes, with the size of the clusters depending
upon $\tau_a$, which is kept as small as possible to permit planes with weak
evidence to form clusters. In an ideal case of no noise, these clusters
would collapse onto the true planes and the accumulator threshold
would be sufficient to filter out false positives. Because real data is
noisy, a small amount of empirically derived filtering is necessary to
reduce the number of planes down to the correct amount (usually on
the order of 10).
We filter these remaining planes to simultaneously combine clusters
and remove erroneous detections by performing the following in
order:
2.6.1 Linearity Filtering
All candidate planes that were voted for by fewer than two rows are
removed. This prevents any single laser from defining a candidate
plane and is designed to remove purely erroneous plane detections
that survive the accumulator threshold (e.g. consider the same low
curvature scan-line as it goes along a corner - there will be more
evidence for a false horizontal plane than for either of the vertical
planes).
2.6.2 Splitting
Individual planes are split apart by finding clusters of groups within
the set of all groups that voted for the plane. A cluster is defined by
the set of all groups whose $\hat{\vec{v}}_3$ vectors' dot products are greater than zero and whose points are at most $\tau_{\mathrm{split}}$ apart. The former constraint
prevents groups from forming on opposite sides of the sensor (keeping
in mind our clockwise convention of Section 2.3.2), while the latter
creates distinct planar patches for coplanar surfaces. Once clusters
are formed, planes are re-fit to their corresponding points. Clusters
that do not meet a minimum number of groups, $\#_{\mathrm{split}}$, are discarded.
This splitting procedure removes planes with insufficient support and
causes the remaining planes to have spatially localized groups.
2.6.3 Merging
After splitting apart groups, we perform a merging of similar planes.
For each group, we go through all of the planes that it voted for and
try to find the maximal mode for the distribution of normals. We
then remove any planes that are too far from this mode (normal dot
product less than $\tau_{\mathrm{merge}}$). We next merge planes that are nearby by
finding those that voted for the same set of groups. To do so, we
formulate a graph with nodes representing each candidate plane. Two
nodes are connected with an edge if any given group voted for both
planes. Once the graph is formed, all connected components within
it are computed, and all planes in each component are replaced by a
single plane fit to the union of all points covered by the groups in that
component.
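The connected-component merge described above is easy to realize with a small union-find, as in the sketch below; the input mapping from candidate planes to the groups that voted for them, and the function name, are our own illustrative choices.

    def merge_planes(plane_to_groups):
        """Group candidate plane ids into connected components, where two planes
        are connected whenever some group voted for both of them.
        `plane_to_groups` maps a plane id to the set of group ids that voted for it."""
        parent = {p: p for p in plane_to_groups}

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]     # path halving
                x = parent[x]
            return x

        def union(a, b):
            parent[find(a)] = find(b)

        # connect planes that share at least one voting group
        group_to_planes = {}
        for plane, groups in plane_to_groups.items():
            for g in groups:
                group_to_planes.setdefault(g, []).append(plane)
        for planes in group_to_planes.values():
            for other in planes[1:]:
                union(planes[0], other)

        components = {}
        for p in plane_to_groups:
            components.setdefault(find(p), []).append(p)
        return list(components.values())

Each returned component is then replaced by a single plane re-fit to the union of the points covered by its groups, as described above.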
2.6.4 Growing
As a final refinement step, we grow regions over the points in each
plane outwards until either the distance between consecutive points is
too great, or the distance from the point to the plane is too high. Once
this new set of points is found, a final least squares fit is performed
according to the formulation of [51] to find the final plane parameters,
as well as decoupled normal and offset (co)variances of the plane
parameters. Region growing is merely a refinement step that allows
us to consider points which were excluded during group formation.
2.7 results
Without ground truthed plane data, quantifying the effectiveness of
our algorithm versus others is difficult. A qualitative example is
shown in Figure 12, in which our plane detection algorithm as well
as a popular region growing algorithm ([36]) and randomized Hough
transform implementation ([26]) are run on a frame of one of our
datasets. Both competitors fail to robustly and accurately detect the
major planes in the scene due to the sparseness and noise in the data.
As the main purpose of our plane extraction technique is to aid in
localization and scan alignment, we will save a quantitative analysis
for the next chapter where we introduce frame to frame registration.
(a) 3D view of a ρ layer of the ball accumulator. (b) A single flattened ρ layer of the ball accumulator. The full accumulator is constructed by stacking many such layers.
Figure 9: The "ball" shaped accumulator from [26]. This accumulator design varies the number of θ bins depending on the angle of φ such that the bins occupy approximately the same surface area when projected onto a sphere.
Figure 10: Voting strength as a function of group curvature and normal similarity. The x axis denotes normal similarity ($\hat{\vec{v}}_1 \cdot \hat{\vec{n}}(\rho)_{\pm}$), while the y axis denotes vote strength. The various colored curves represent different levels of curvature, from c = 0 to c = 1.
Figure 11: An illustration of the perturbation of some vector $\vec{v}_3$ to account for noise. O is the sensor origin and C is the centroid of the group. $\vec{v}_3$ is the dominant eigenvector of the group, which generates a normal of length ρ with the origin. Rotating $\vec{v}_3$ about the global Z axis creates a new vector $\vec{v}_{3R}$. We seek this rotation amount given the new desired distance for the normal connecting the origin and $\vec{v}_{3R}$, denoted by $\rho_i$.
Figure 12: An example frame from our dataset which shows the weaknesses of two popular plane extraction algorithms compared to our approach. (a) Our algorithm. (b) Region growing. (c) Randomized Hough transform. Note that while the Randomized Hough Transform (c) does a good job of detecting the side walls and floors, it misses the ends of the hallway, which leads to an under-constrained registration problem. While region growing (b) finds more planes than RHT, it struggles with the disparate sparsity of the data, ultimately leading to poor plane estimation and under-constrained registration.
3
FRAME TO FRAME REGISTRATION
3.1 overview
As detailed in Section 1.2.2, the problem of frame-to-frame registration
of point cloud data has been very well studied. However, no known
algorithms are able to account for our unique sensor geometry, as well
as our speed and robustness requirements. In this section, we will
present an alignment algorithm which is able to use detected planar
features when available, but can smoothly fall back onto planelets
and points when large feature detection fails to fully constrain the
alignment problem. This allows our algorithm to be very fast when
the surrounding environment contains a large number of non-coplanar
planes, and to only use the minimum number of non-planar features
necessary for full constraint. In this chapter, we will first introduce
the pure plane correspondence and alignment problem in Section 3.2,
and then will show how the system is augmented to handle undercon-
strained planar problems in Section 3.3.
3.2 planar alignment
3.2.1 Planar Correspondence
Once planar features have been extracted from two successive frames,
correspondences between those features must be determined before a
rigid body transform can be calculated. One very successful method of
correspondence determination is the MUMC algorithm of Pathak et al.
[50]. Pathak’s algorithm determines correspondence in a manner simi-
lar to RANSAC, by iterating through pairs (in the case of rotation) and
triplets (in the case of translation) of possible correspondences, and
measuring the total error that this putative transform would introduce
to the rest of the correspondences. While very good at avoiding spuri-
ous correspondences, the repeated calculation of transforms necessary
for this method makes it much too slow for our purposes.¹
Our correspondence algorithm is a fast single-pass approach. After
observing the performance of our plane detection algorithm, we no-
ticed that a successful correspondence algorithm would need to take
two specific types of correspondences into account:
1. [51] claims an average registration time of 5.42 seconds per frame. This mirrors our experience with our implementation of this method, which takes on the order of 1 second per frame on our data.
case 1: Two planes whose convex hulls look nearly identical, as
indicated by an intersection-over-union score that is close to 1.
This is the case when the entire planar feature is well segmented
in both frames.
case 2: Two planes whose convex hulls overlap, but do not have a
high intersection-over-union. This is the case when either one or both
planes is partially obscured, or when our region growing
step fails conservatively.
We can assess both cases by setting thresholds on the intersection-
over-union score, as well as the angles between the planes’ normal
vectors, and the differences between their offsets. The full algorithm
can be seen in Algorithm 3.1. The algorithm takes as input the sets of
planes detected in two consecutive frames, and returns a set of pairs
of corresponding planes. The following parameters are also required:
d̂_1, d̂_2: The minimum required dot product between plane normals for cases 1 and 2, respectively. In all of our tests, these are set to cos(25°) and cos(15°).

x̂_1, x̂_2: The maximum absolute difference between plane offsets, for cases 1 and 2 respectively. In our tests, these values are set to 0.5 m and 0.25 m.

k̂: The minimum intersection-over-union area for case 1. We use a value of 80%.

ĝ: The minimum intersection-over-minimum area for case 2. We use a value of 25%.
The algorithm operates by first projecting the convex hulls of the two planes of a putative match onto the average plane. Next, if the normals and offsets of those planes are close enough (by comparing to d̂_1 and x̂_1), and the intersection over union of those projected planes is greater than k̂, then a first score is computed. If this score is better than the best case 1 score, then it is saved. The exact same steps are repeated again for case 2, except with d̂_2 and x̂_2 substituted, and the intersection over minimum used instead of the intersection over union. The intuition behind this scheme is that for case 1, high intersection over union scores are unlikely to occur by chance. Thus, the plane parameter thresholds can be more lax than their case 2 counterparts. Case 2 requires more stringent thresholds for the plane normals and offsets, as it only requires the area of the intersection divided by the area of the smaller of the two planes to be above ĝ. Such intersections are much more common. Two external functions are required: averagePlane, which returns a plane whose normal and offset is the average of its inputs, and projectOnto, which returns the 2D projection of a set of 3D points onto a plane.
Algorithm 3.1 Finding Plane Correspondences
 1: procedure FindCorrespondences(P_a, P_b)
 2:   P_ab ← ∅
 3:   for p_a in P_a do
 4:     s̃_1 ← ∞;  p̃_1 ← ∅
 5:     s̃_2 ← ∞;  p̃_2 ← ∅
 6:     for p_b in P_b do
 7:       d ← p_a.normal · p_b.normal
 8:       x ← |p_a.offset − p_b.offset|
 9:       p̄ ← averagePlane(p_a, p_b)
10:       h_a ← projectOnto(p_a.hull, p̄)
11:       h_b ← projectOnto(p_b.hull, p̄)
12:       if x < x̂_1 and d ≥ d̂_1 then                      ▷ Case 1
13:         o_∪ ← area(h_a ∩ h_b) / area(h_a ∪ h_b)
14:         s_1 ← x(1 − o_∪)
15:         if o_∪ > k̂ and s_1 < s̃_1 then
16:           s̃_1 ← s_1
17:           p̃_1 ← p_b
18:         end if
19:       end if
20:       if x < x̂_2 and d > d̂_2 then                      ▷ Case 2
21:         o_min ← area(h_a ∩ h_b) / min(area(h_a), area(h_b))
22:         s_2 ← x(1 − o_min)
23:         if o_min > ĝ and s_2 < s̃_2 then
24:           s̃_2 ← s_2
25:           p̃_2 ← p_b
26:         end if
27:       end if
28:     end for
29:     if s̃_1 < ∞ then
30:       P_ab ← P_ab ∪ ⟨p_a, p̃_1⟩
31:     else if s̃_2 < ∞ then
32:       P_ab ← P_ab ∪ ⟨p_a, p̃_2⟩
33:     end if
34:   end for
35:   return P_ab
36: end procedure
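For illustration, the two per-pair tests reduce to a few lines of Python when a polygon library such as shapely is used for the 2D hull overlap; the projectOnto and averagePlane helpers are assumed to exist as described above, and the threshold constants simply mirror the values quoted in the text.

import numpy as np
from shapely.geometry import Polygon

# Thresholds matching the values quoted above (cos 25deg, cos 15deg, 0.5 m, 0.25 m, 80%, 25%).
D1, D2 = np.cos(np.radians(25)), np.cos(np.radians(15))
X1, X2 = 0.5, 0.25
K_IOU, G_MIN = 0.80, 0.25

def candidate_match(pa, pb, project_onto, average_plane):
    """pa, pb: planes with .normal, .offset, and .hull (a list of 3D hull points).
    Returns (case, score) for an acceptable putative match, or None."""
    d = float(np.dot(pa.normal, pb.normal))
    x = abs(pa.offset - pb.offset)
    avg = average_plane(pa, pb)
    ha = Polygon(project_onto(pa.hull, avg))
    hb = Polygon(project_onto(pb.hull, avg))
    inter = ha.intersection(hb).area

    # Case 1: nearly identical hulls, lax plane-parameter thresholds.
    if x < X1 and d >= D1:
        iou = inter / ha.union(hb).area
        if iou > K_IOU:
            return (1, x * (1.0 - iou))

    # Case 2: partial overlap, stricter plane-parameter thresholds.
    if x < X2 and d > D2:
        iomin = inter / min(ha.area, hb.area)
        if iomin > G_MIN:
            return (2, x * (1.0 - iomin))
    return None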
3.2.2 Planar Transform Estimation
Once a set of correspondences between two sets of planes has been determined, the rigid body transformation which best transforms the planes in the first frame onto those in the second must be computed. We use the method of [50], which solves for this transformation by decoupling the translation and rotation components and solving for them independently.
Translation Estimation
The decoupled translation estimation between frames a and b is solved
by the following least squares estimation: Mt = d. Here, M is a
stacked matrix of the N plane normals from frame a, scaled by the
inverse covariance:
M = \begin{bmatrix} n_{a_1}^T / (\sigma_{a_1} + \sigma_{b_1}) \\ n_{a_2}^T / (\sigma_{a_2} + \sigma_{b_2}) \\ \vdots \\ n_{a_N}^T / (\sigma_{a_N} + \sigma_{b_N}) \end{bmatrix}    (11)
The vector d is constructed by stacking the differences in plane
offsets, and scaling by the same inverse covariance:
d = \begin{bmatrix} (d_{a_1} - d_{b_1}) / (\sigma_{a_1} + \sigma_{b_1}) \\ (d_{a_2} - d_{b_2}) / (\sigma_{a_2} + \sigma_{b_2}) \\ \vdots \\ (d_{a_N} - d_{b_N}) / (\sigma_{a_N} + \sigma_{b_N}) \end{bmatrix}    (12)
Here, σ_{a_i} and σ_{b_i} are the estimated variances of the first and second frame offsets as found in Section 2.6.4. The translation can then be estimated by using the Moore-Penrose pseudo-inverse of M:

\tilde{t} = M^{+} d    (13)
Note that all translation estimation is performed using only the
infinite representations of the planes, rather than considering the
overlap of their hulls. This is a common theme in our work, as we
found that the hull estimation was often too noisy to provide good
estimates of translation.
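In numpy, this weighted least-squares step amounts to a handful of lines; the sketch below assumes the per-plane offset variances from Section 2.6.4 are available as arrays and is meant only as an illustration of Equations (11) to (13).

import numpy as np

def estimate_translation(normals_a, offsets_a, offsets_b, var_a, var_b):
    """normals_a: Nx3 plane normals from frame a.
    offsets_a, offsets_b: length-N plane offsets from frames a and b.
    var_a, var_b: length-N offset variances (Section 2.6.4).
    Returns the 3-vector translation solving M t = d in the least squares sense."""
    w = 1.0 / (var_a + var_b)              # inverse covariance weights
    M = normals_a * w[:, None]             # stack of scaled normal rows (Eq. 11)
    d = (offsets_a - offsets_b) * w        # scaled offset differences (Eq. 12)
    return np.linalg.pinv(M) @ d           # Moore-Penrose solution (Eq. 13)

With at least three non-coplanar correspondences M has full column rank, so the pseudo-inverse yields a unique translation; otherwise the unconstrained directions span its null space, which is exactly the situation addressed in Section 3.3.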
Rotation Estimation
Determining the optimal rotation between a set of corresponding
vector observations has been well studied in aerospace engineering
for the application of star trackers, and is known as Wahba’s problem
[58]. We use the quaternion based solution derived by Davenport [59],
known as the ”q method”. Here, a matrix B is constructed as the sum
of the product of all normal vector pairs:
B = \sum_{i=0}^{N} \frac{n_{a_i} n_{b_i}^T}{\sigma_{a_i} + \sigma_{b_i}}    (14)
where σ_{a_i} and σ_{b_i} are the decoupled covariances of the plane normals found in Section 2.6.4. A matrix K is then constructed as follows:
K = \begin{bmatrix} B + B^T - I\,\mathrm{tr}(B) & Z \\ Z^T & \mathrm{tr}(B) \end{bmatrix}    (15)
with
Z = \begin{bmatrix} B_{23} - B_{32} \\ B_{31} - B_{13} \\ B_{12} - B_{21} \end{bmatrix} = \sum_{i=0}^{N} \frac{n_{a_i} \times n_{b_i}}{\sigma_{a_i} + \sigma_{b_i}}    (16)
The best fit rotation can then be found by computing the eigenvec-
tors of K. The eigenvector corresponding to the maximum eigenvalue
is a quaternion representing the best possible rotation. An excellent
treatment of the subject of rotation estimation, with an overview of
many alternative methods may be found in [60].
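Since Davenport's q method reduces to a small symmetric eigenproblem, it is compact to write down; the following numpy sketch implements Equations (14) to (16) and returns the quaternion with its vector part first, which is a convention choice rather than something prescribed by the method.

import numpy as np

def davenport_q_method(normals_a, normals_b, var_a, var_b):
    """Solve Wahba's problem for corresponding unit normals (Eqs. 14-16).
    Returns the optimal rotation as a unit quaternion [x, y, z, w]."""
    w = 1.0 / (var_a + var_b)
    B = (normals_a * w[:, None]).T @ normals_b          # Eq. 14
    Z = np.array([B[1, 2] - B[2, 1],
                  B[2, 0] - B[0, 2],
                  B[0, 1] - B[1, 0]])                   # Eq. 16
    K = np.zeros((4, 4))
    K[:3, :3] = B + B.T - np.trace(B) * np.eye(3)
    K[:3, 3] = Z
    K[3, :3] = Z
    K[3, 3] = np.trace(B)                               # Eq. 15
    vals, vecs = np.linalg.eigh(K)                      # symmetric eigensolve
    return vecs[:, np.argmax(vals)]                     # max-eigenvalue eigenvector

The sign ambiguity of the returned eigenvector corresponds to the usual q versus -q quaternion ambiguity and does not affect the recovered rotation.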
3.2.3 Results
To evaluate the effectiveness of the above plane-only alignment scheme,
we collected a dataset from an indoor office environment. This en-
vironment was selected such that at least three non-coplanar planes
would always be visible to the sensor, and thus the alignment problem
would always be fully constrained. To collect the data, a Velodyne HDL-32E was mounted to a backpack along with a battery-powered computer, as seen in Figure 13. The sensor was carried in this manner through an 11.1 meter per side square loop at a walking
pace. Though the sensor was rigidly attached to the backpack, its high
mounting point caused it to sway during movement, exacerbating
the motion induced by walking. For each frame of data, planes were
extracted using the method of Chapter 2 and correspondences were estimated to the planes of the previous frame according to Algorithm 3.1.
Rotation and translation estimates were then calculated as outlined in
Section 3.2.2. Each frame's transform estimate was then applied to the
previous estimate to create a recursive estimate of the entire path. As
a comparison, alignment was also attempted by a standard ICP algo-
rithm. The results can be seen in Figure 14. To test the effectiveness
of our plane extraction method, we also computed paths by feeding
planes extracted from a randomized Hough transform ([61]) and a
region growing implementation ([36]) into our correspondence and
transformation estimation methods. Unfortunately, these alternative
plane extraction algorithms fail to consistently find enough planes to
fully constrain the alignment problem, leading to catastrophic failures.
Because of this, their paths are not shown in the figure. Quantitative
data can be found in Table 1, which shows the average computation
time per frame, the total distance traveled, and the final error. Our
method averages a computation time of only 46 milliseconds per
frame, and reported a total travel distance of 44.228 meters rather
than the actual 44.4 meters. The final error was 0.625 meters, which
significantly outperformed the other methods.
Figure 13: Human Portable Velodyne Capture Rig
Method                     | Frame Time (ms) | Distance Traveled (m) | Final Error (m)
Ideal                      | 100             | 44.4                  | 0
Our Method                 | 46              | 44.228                | 0.625
ICP                        | 463 / 192       | 24.072                | 3.398
Randomized Hough Transform | 535 / 86        | 85.631                | 5.461
Region Growing             | 157             | 86.319                | 5.025
Table 1: Evaluation metrics from the discussed data set. For each
method, we show the average frame time, the total distance
traveled, and the final error between the start point and end
point. For reference, the first row shows the ideal values for
each metric. Note that both ICP and the Randomized Hough
Transform rows have two frame times listed. The first ICP
time was recorded using a single thread, while the second
was recorded while using all 12 cores of the test machine.
The first Randomized Hough Transform time was recorded
with the stock algorithm from 3DTK [61], while the second
version uses a custom vectorized random number generator.
All timing evaluation was done on an Intel Core i7 X990.
Figure 14: Paths generated by frame-to-frame registration using our approach (blue) and ICP (red) as the sensor was walked on a backpack rig around an indoor 11.1 meter per side square loop (green). The upper plot shows the x/y path in meters; the lower plot shows the x/z path.
3.3 planar icp (picp)
The above scan alignment scheme proves very successful in structured environments; however, it is unable to cope with situations where the available plane correspondences leave the alignment problem under-constrained. Such situations are common in the real world due either to the structure of the scene, or to failures in the plane detection or correspondence algorithms. Regardless of the cause, in order to perform well in large outdoor environments our system must be augmented with the capability to fall back on non-planar features. In this section, we introduce an algorithm, which we call Planar ICP (PICP), that is able to selectively add a minimal number of points into our alignment problem such that full constraint can be reached. This selectivity allows the system to maintain the maximum possible speed given the planarity of the scene.
3.3.1 Constraint Determination
Fully constraining a 6D alignment problem using only infinite planes requires at least three non-coplanar planes. To intuitively understand the reasoning behind this, take the example of a known
correspondence between two infinite ground planes. Given this cor-
respondence, any SE3 transform which either moved the coordinate
frame in the X/Y plane and/or rotated it around the Z vector (i.e.
yaw) would satisfy the alignment. Thus, a single plane correspondence
constrains a single dimension of translation (Z), and two dimensions
of rotation (roll, and pitch). If a known correspondence were then
added for a vertical wall then the rotation would be fully constrained,
though one dimension of translation would be unknown, allowing the
alignment coordinate frame to slide forwards and backwards along
the L shape made by the wall and the floor. Adding a third orthogonal
wall would then fully constrain the system. In theory, any three non-coplanar planes (e.g. three slightly tilted ground planes) can provide full constraint. However, in practice we consider each observation to be noisy and would thus prefer three mutually orthogonal planes to three non-coplanar but similar planes. In this section, we will
introduce our method for quantifying how well constrained a planar
alignment problem is, and in the next section we will address how to
choose the best set of additional features from the data to provide full
constraint.
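To make this intuition concrete, the short numpy example below (with hypothetical floor and wall normals) checks how many translational directions a given set of planes pins down by looking at the rank of the stacked normals.

import numpy as np

floor = np.array([0.0, 0.0, 1.0])
wall_x = np.array([1.0, 0.0, 0.0])   # a wall facing the x direction (hypothetical)
wall_y = np.array([0.0, 1.0, 0.0])   # an orthogonal wall (hypothetical)

for planes in ([floor], [floor, wall_x], [floor, wall_x, wall_y]):
    N = np.vstack(planes)
    rank = np.linalg.matrix_rank(N)
    print(len(planes), "plane(s):", rank, "translation direction(s) constrained,",
          3 - rank, "left free")
# One plane leaves two free directions (sliding in the floor plane), two planes
# leave one (sliding along the wall/floor "L"), and three orthogonal planes leave none.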
The Constraint Matrix
As planes are discovered in the scene and correspondences are found,
we can keep track of how well constrained the alignment problem will
be through a construct we call a Constraint Matrix (C). C is a 3×3 matrix, and is best visualized as a 3D ellipsoid representing the amount
of constraint in each dimension. The direction of the major axis of this
ellipsoid shows the best constrained direction, and the length of that
axis shows the amount of constraint. For example, a perfect sphere
would indicate a system which is equally well constrained along all
axes, while a flat ”dish” would indicate constraint along only two
directions.
Given a set of N plane correspondences with normal vectors n_{a_i} and n_{b_i}, and weights γ_i, we can construct a constraint matrix as follows:

C = \frac{1}{4} \sum_{i=0}^{N} \gamma_i \, n_{a_i} n_{a_i}^T    (17)
Once the constraint matrix is built, the bounding ellipsoid can be parameterized by a rotation matrix R_c and a set of three radii r_{c_0}, r_{c_1}, r_{c_2}. This parameterization is extracted from C through singular value decomposition:

C \overset{\mathrm{SVD}}{=} U \begin{bmatrix} r_{c_0}^2 & 0 & 0 \\ 0 & r_{c_1}^2 & 0 \\ 0 & 0 & r_{c_2}^2 \end{bmatrix} R_c    (18)
This formulation is similar to Principal Component Analysis ex-
cept rather than using a true normalized covariance matrix, we use a
weighted scatter matrix. Once the constraint ellipsoid has been calcu-
lated, the system can be queried to determine how well constrained the
system is in the direction of a new candidate data point. If the system
is already fully constrained in that direction, then the data point could
be skipped to save on computation time. This constraint checking is
performed as follows. Given a query normal n of unit length, it is first
transformed into the axis-aligned frame of the ellipsoid:
\tilde{n} = R^T n    (19)
and is then converted into polar coordinates
\langle \theta_n, \phi_n \rangle = \left\langle \arctan(\tilde{n}_1, \tilde{n}_0),\ \arccos\!\left(\frac{\tilde{n}_2}{\sqrt{\tilde{n}_0^2 + \tilde{n}_1^2 + \tilde{n}_2^2}}\right) \right\rangle    (20)
The spherical coordinate ellipsoid equation can then be used to find
the distance from the origin to the edge of the ellipsoid in the direction
of the query normal:
d_q = \left( \frac{\cos(\theta)^2 \sin(\phi)^2}{r_{c_0}^2} + \frac{\sin(\theta)^2 \sin(\phi)^2}{r_{c_1}^2} + \frac{\cos(\phi)^2}{r_{c_2}^2} \right)^{-\frac{1}{2}}    (21)
Here, d represents the effective number of normals that have been inserted into the constraint matrix in the query direction. If a fully constrained system is thought of as a sphere of radius t̂, then a score between 0 and 1 can be assigned to any query vector to represent the utility of adding that vector to the system:

s_q = 1 - \min\!\left(\frac{d_q}{\hat{t}},\ 1\right)    (22)
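Equations (17) to (22) map directly onto a few lines of numpy; the following sketch builds C from weighted correspondence normals and scores a query direction against it. The small epsilon guard for degenerate radii is an implementation convenience, not part of the formulation.

import numpy as np

def build_constraint_matrix(normals, weights):
    """Eq. (17): C = 1/4 * sum_i gamma_i * n_i n_i^T."""
    C = np.zeros((3, 3))
    for n, g in zip(normals, weights):
        C += g * np.outer(n, n)
    return 0.25 * C

def constraint_score(C, n_query, t_hat):
    """Eqs. (18)-(22): utility of adding the unit normal n_query given C."""
    U, s2, Vt = np.linalg.svd(C)          # Eq. (18): squared radii on the diagonal
    r2 = np.maximum(s2, 1e-12)            # guard against a fully degenerate axis
    n_tilde = Vt @ n_query                # Eq. (19): rotate into the ellipsoid frame
    theta = np.arctan2(n_tilde[1], n_tilde[0])                       # Eq. (20)
    phi = np.arccos(n_tilde[2] / np.linalg.norm(n_tilde))
    d = (np.cos(theta)**2 * np.sin(phi)**2 / r2[0]
         + np.sin(theta)**2 * np.sin(phi)**2 / r2[1]
         + np.cos(phi)**2 / r2[2]) ** -0.5                           # Eq. (21)
    return 1.0 - min(d / t_hat, 1.0)                                 # Eq. (22)

Scoring the floor and wall normals of the earlier example against a constraint matrix built from only the floor reproduces the intuition above: directions already covered by several planes score near 0, while unconstrained directions score near 1.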
3.3.2 Non Planar Constraints
After all of the corresponding planes have been added into the system,
we must decide what non-planar features can be used from the data
to fully constrain it. For this, we draw inspiration from the General-
ized Iterative Closest Point (GICP) algorithm of Segal et al. [17]. As a descendant of the original ICP algorithm, GICP follows the same
basic formula of determining point correspondence, followed by a
rigid body transform calculation, and repeating. What makes their
algorithm unique is that instead of using a point-point distance, or
even a point-plane distance, a smoothly varying hybrid which takes
into account the shape of the covariance matrix of the neighborhood
around each point is used. In locally flat areas, this metric approxi-
mates a plane-plane distance, while in areas with high curvature it
approaches a point-point distance. In practice, we found that this
scheme works very well on our data and is able to provide very high
quality scan matching. Unfortunately, the computational cost is extremely high, as both the point neighborhood covariance calculation and the alignment optimization are expensive. To reduce this cost,
we must selectively choose only the best possible points on which to
perform these calculations.
Algorithm 3.3 outlines our method for choosing this reduced set of constraints. This algorithm takes as parameters a constraint matrix C which has been partially filled by the planar correspondences, a point cloud Q from which to choose points, and a constraint threshold t̂. The algorithm first computes a very fast normal vector estimation for each point in Q using the Fast Approximate Least Squares (FALS) method of Badino et al. [62]. Next, for each point in Q a utility score s is calculated for these normals given the input constraint matrix C and threshold t̂ using Equations (19) to (22). This provides an initial estimate of the utility of each point. The points are then iterated through in order of this initial utility estimate. For each point, a new utility score s is calculated and a Bernoulli distribution is sampled with p = s to determine if the point is to be chosen for inclusion in the returned set Q̃. If so, the point is inserted into Q̃ and a neighborhood covariance matrix is computed for it and attached as metadata. The constraint matrix C is also updated with the point, and if the resulting smallest radius is greater than the constraint threshold t̂ then the set Q̃ is finalized and returned.
3.3.3 The PICP Algorithm
The full PICP algorithm is outlined as Algorithm 3.2, and is explained in detail as follows. PICP takes as input a set of corresponding planes from frames a and b (P_ab), as well as the raw point clouds from frames a and b (Q_a and Q_b). Using the plane correspondences, an initial constraint matrix is constructed according to Equation (17) in Lines 2 to 6. The weight (γ) used here is the average number of points covered by the two planes' convex hulls. Once all plane correspondences have been integrated into the constraint matrix, the length of the smallest axis is calculated and compared to the constraint threshold t̂ on Line 8. If the length of this smallest axis is greater than the threshold, then the minimum constraint sphere is fully contained within the calculated constraint ellipsoid, indicating that the system is already fully constrained by the planes. In this case, a transformation is computed using only the plane correspondences as per Section 3.2.2, and the algorithm is completed. If there are still constraints left to fill, then Algorithm 3.3 is used to select a small set of point features to provide those constraints.

Once this subset of points (Q̃_a) is chosen, PICP proceeds much like a standard ICP algorithm by iteratively switching between a correspondence finding phase and a transform solving phase. Note that plane correspondences are fixed during this stage, encouraging the solution to only slide along the previously unconstrained dimensions. Correspondence finding is done by simply finding the nearest neighbor in Q_b to each point in Q̃_a using a fast approximate nearest neighbors implementation (FLANN [63]). Note that the points in Q̃_a are transformed by the latest T on each iteration.
After computing the point correspondences, a rigid body trans-
formation is solved as a nonlinear optimization problem using the
Broyden-Fletcher-Goldfarb-Shanno (BFGS) quasi-Newton method [64],
as implemented in the Google Ceres library [65]. We set up the
optimization problem to minimize the following error function:
\min_x \; \frac{1}{2}\left(\rho_a \| f_a(x) \|^2 + \rho_b \| f_b(x) \|^2\right)    (23)
which describes the robust total error over two residual blocks: point alignments (a) and plane alignments (b), with loss functions ρ_a and ρ_b. This error is evaluated given a candidate transformation (x), which is represented as a 6D se3 vector:
x = \begin{pmatrix} t \\ \omega \end{pmatrix}    (24)
where t is a 3D translation vector, and ω is a 3D angle-axis rotation vector.
The point alignment block a is as described in the original GICP
paper [17], and evaluates the point-wise error as:
f_a(x) = \sum_{i=0}^{N} \left( \exp(x)\, q_{a_i} - q_{b_i} \right)^T M_i \left( \exp(x)\, q_{a_i} - q_{b_i} \right)    (25)
where q_{a_i} and q_{b_i} are the i'th corresponding points from frames a and b, and exp(x) is the SE3 matrix representation of the se3 rigid body transformation vector x. M is the inverse of the frame-aligned sum of the covariance matrices of the neighborhoods around the two points, which turns Equation (25) into a Mahalanobis-like distance metric:
M_i = \left( R\, C_{a_i} R^T + C_{b_i} \right)^{-1}    (26)
where C_{a_i} and C_{b_i} are the neighborhood covariance matrices of the two points, and R is the rotation block of T from the previous iteration of Algorithm 3.2. The intuition behind this distance metric is that if the two point neighborhoods are planar, then Equation (25) will only penalize errors normal to that plane, but will allow the points to slide relative to each other in the plane.
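As a pointwise illustration of Equations (25) and (26), assuming the neighborhood covariances have already been computed, a single term of the point block can be written as:

import numpy as np

def point_residual(q_a, q_b, C_a, C_b, T, R_prev):
    """Single GICP-style term of Eq. (25)/(26).
    q_a, q_b: corresponding 3D points; C_a, C_b: their 3x3 neighborhood covariances;
    T: current 4x4 candidate transform exp(x); R_prev: rotation block of the
    transform from the previous PICP iteration."""
    M = np.linalg.inv(R_prev @ C_a @ R_prev.T + C_b)   # Eq. (26)
    diff = (T[:3, :3] @ q_a + T[:3, 3]) - q_b          # exp(x) q_a - q_b
    return diff @ M @ diff                             # Mahalanobis-like distance

Summing this quantity over all chosen point correspondences gives f_a(x).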
The plane residual block f_b(x) produces a vector describing the alignment error over each element of the Hessian normal form for every plane correspondence:
f_b(x) = \begin{pmatrix} d_0 \\ d_1 \\ \vdots \\ d_N \end{pmatrix}    (27)
Each sub-block d_i is then defined as follows:
d_i = \begin{pmatrix} R^T n_{b_i} - n_{a_i} \\ t^T n_{b_i} + o_{b_i} - o_{a_i} \end{pmatrix}    (28)
R and t are the rotation matrix and translation vector extracted from
the exponential map of x:
\exp(x) = \begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix}    (29)
The Jacobian for each of the sub blocks shown in Equation (28) is as
follows:
J_i = \begin{bmatrix} \partial n / \partial t & \partial n / \partial \omega \\ \partial o / \partial t & \partial o / \partial \omega \end{bmatrix} = \begin{bmatrix} 0_{3\times3} & \left[ R^T n_{b_i} \right]_\times \\ \left( R^T n_{b_i} \right)^T & 0_{1\times3} \end{bmatrix}    (30)
The operator [·]_× produces the cross-product matrix given a vector:
[a]_\times \overset{\mathrm{def}}{=} \begin{bmatrix} 0 & -a_3 & a_2 \\ a_3 & 0 & -a_1 \\ -a_2 & a_1 & 0 \end{bmatrix}    (31)
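Our implementation evaluates these residuals inside Ceres; purely as an illustration, the sketch below evaluates an unrobustified version of the combined cost of Equation (23) and minimizes it with scipy's BFGS, using the Rodrigues formula for exp(x). It is a simplified stand-in under those assumptions, not the Ceres setup we actually use, and the loss functions ρ_a and ρ_b are omitted.

import numpy as np
from scipy.optimize import minimize

def exp_se3(x):
    """Map x = [t, w] (Eq. 24) to a 4x4 rigid transform (Eq. 29)."""
    t, w = x[:3], x[3:]
    theta = np.linalg.norm(w)
    W = np.array([[0, -w[2], w[1]], [w[2], 0, -w[0]], [-w[1], w[0], 0]])
    if theta < 1e-9:
        R = np.eye(3) + W                                  # first-order approximation
    else:
        R = (np.eye(3) + np.sin(theta) / theta * W
             + (1 - np.cos(theta)) / theta**2 * (W @ W))   # Rodrigues formula
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T

def total_cost(x, pts_a, pts_b, Ms, plane_pairs):
    T = exp_se3(x)
    R, t = T[:3, :3], T[:3, 3]
    cost = 0.0
    # Point block, Eq. (25).
    for qa, qb, M in zip(pts_a, pts_b, Ms):
        e = R @ qa + t - qb
        cost += e @ M @ e
    # Plane block, Eqs. (27)-(28): plane correspondences stay fixed.
    for (n_a, o_a), (n_b, o_b) in plane_pairs:
        cost += np.sum((R.T @ n_b - n_a) ** 2)
        cost += (t @ n_b + o_b - o_a) ** 2
    return cost

# Usage (data arrays assumed to be prepared as in the text):
# res = minimize(total_cost, np.zeros(6),
#                args=(pts_a, pts_b, Ms, plane_pairs), method="BFGS")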
Algorithm 3.2 The PICP algorithm, which combines planar features with a minimal number of point features to compute a rigid body transform between two scans.
 1: procedure PICP(P_ab, Q_a, Q_b, t̂)
 2:   C ← 0
 3:   for ⟨p_a, p_b⟩ in P_ab do
 4:     γ ← (|p_a| + |p_b|) / 2
 5:     add p_a to C with γ according to Eq. 17
 6:   end for
 7:   Calculate r_2 from C according to Eq. 18
 8:   if r_2 > t̂ then
 9:     return T as calculated by Sec. 3.2.2
10:   end if
11:   Q̃_a ← choosePoints(C, Q_a, t̂)            ▷ See Alg. 3.3
12:   T ← I
13:   while not converged do
14:     Q_ab ← findCorrespondences(T · Q̃_a, Q_b)
15:     T ← findTransform(P_ab, Q_ab)
16:   end while
17:   return T
18: end procedure
3.4 results
To test the effectiveness of our frame-to-frame alignment scheme,
three outdoor datasets were collected around the USC campus using
Algorithm 3.3 The choosePoints algorithm, a subfunction of PICP. This method chooses a subset of point features to provide full rigid body constraints.
 1: procedure choosePoints(C, Q, t̂)
 2:   N ← FALSNormals(Q)
 3:   S ← ∅
 4:   for i in indices(N) do
 5:     Calculate s given C, N_i, and t̂ through Eq's 19 - 22
 6:     S_i ← s
 7:   end for
 8:   Sort Q and N in increasing order of S
 9:   Q̃ ← ∅
10:   for i in indices(N) do
11:     Recalculate s given C, N_i, and t̂ through Eq's 19 - 22
12:     if bernoulli(s) then
13:       add N_i to C with γ = 1 according to Eq. 17
14:       calculateCovariance(Q_i)
15:       Q̃ ← Q̃ ∪ Q_i
16:       Calculate r_2 from C according to Eq. 18
17:       if r_2 > t̂ then
18:         break
19:       end if
20:     end if
21:   end for
22:   return Q̃
23: end procedure
the same sensor and mounting scheme as in Section 3.2.3. All three
datasets were collected along a closed loop, ensuring that the sensor
location and orientation were the same in the last frame as in the first. The dataset for Figure 18 was collected outside of the Parkside
international dormitory, and follows an L-shaped path. The dataset for
Figure 17 was collected by walking around the Ronald Tutor Campus
Center, and finally the dataset for Figure 19 was taken in an outdoor
corridor between buildings just north of the campus center. As a
comparison, our hybrid plane and point feature alignment algorithm
(which we call PICP) is compared against Segal et al.’s Generalized-
ICP (GICP) [17], as well as the plane-only alignment system described
in Section 3.2.
The results can be seen in Figures 17 to 19. Subfigure (a) of these
figures shows a 3D comparison of the paths generated by the two
algorithms, with GICP in red and PICP in blue. Subfigure (b) shows a
top-down view of the paths overlaid onto a satellite view (courtesy of
Bing maps), while subfigure (c) shows a plot of the z coordinate of the
paths over all frames. Subfigure (d) displays only the PICP path from
subfigure (b) colored by the algorithm framerate at each position in
the path, again overlaid onto a satellite image. Subfigure (e) plots the
framerate of all algorithms over each frame of execution. Quantitative
data on the runtime can be found in Figure 15 and a rough metric of
loop closure quality can be seen in Figure 16.
While all three algorithms fail to close the loop on all three datasets due to their length and complexity, PICP generally provides a better tracking solution at a significantly reduced computational cost compared to GICP. Additionally, as can be seen from the Tutor Campus Center dataset (Figure 17), our non-planar features play a very important role when planar constraints are lost. In this case, as the sensor is carried around the easternmost edge of the building, the scene opens onto a long pathway that has no nearby planes in the north/south orientation, causing the plane-only solution to drift wildly in that direction. This occurs again as the sensor is carried onto the final southernmost east-west stretch. Whenever the datasets remain well-constrained, the plane-only approach tends to slightly outperform GICP. This is likely because GICP's nonlinear optimizer accumulates more small errors over time than the closed-form transform solving of the plane-only approach. However, this small difference in accuracy is not worth the cost of potential catastrophic errors when the system becomes unconstrained. In the next chapter, we will show how the planes used for this frame-to-frame alignment can be integrated into a global world view to produce a consistent SLAM solution with near zero loop closure distances.
Figure 15: Average framerate (Hz) for all three datasets (Tutor, Parkside, Alley) using GICP, PICP, and Plane Only frame-to-frame alignment.
Figure 16: Distance (meters) between start location and end location for all three datasets using GICP, PICP, and Plane Only frame-to-frame alignment. This provides a rough idea of how close an algorithm is able to come to "closing the loop." However, this data must be interpreted alongside Figures 17 to 19. For example, even though GICP appears to have the best performance on the Parkside dataset, a quick glance at Figure 18b shows a total failure by GICP to track the sensor position.
Figure 17: Tutor Campus Center Dataset Frame-to-Frame Alignment. (a) 3D view comparison, (b) top view comparison, (c) Z position comparison, (d) PICP timing, (e) PICP framerate.
Figure 18: Parkside Dataset Frame-to-Frame Alignment. (a) 3D view comparison, (b) top view comparison, (c) Z position comparison, (d) PICP timing, (e) PICP framerate.
Figure 19: Alley Dataset Frame-to-Frame Alignment. (a) 3D view comparison, (b) top view comparison, (c) Z position comparison, (d) PICP timing, (e) PICP framerate.
4
SIMULTANEOUS LOCALIZATION AND MAPPING
4.1 introduction
While the methods presented in the previous chapter provide reason-
able path reconstructions over short distances, the accumulation of
small errors over time causes large amounts of drift on large difficult
datasets. In this chapter a Simultaneous Localization and Mapping
(SLAM) formulation is presented to counter this effect. By constantly
updating and localizing within a global map of all known planes in
the scene, our algorithm is able to demonstrate near zero drift by
closing the loop on several difficult datasets. In the following sections, we show how to use the tools developed over the previous chapters, as well as an "off the shelf" graph optimization framework, to create such a SLAM system.
4.1.1 Smoothing and Mapping
As discussed in Section 1.2.3, the formulation of the SLAM problem as one of graph optimization was popularized by Dellaert with the introduction of Square Root Smoothing and Mapping (√SAM) [44]. Further developments introduced an online variant known as incremental Smoothing and Mapping (iSAM) [45], and its successor iSAM2 [46]. These algorithms provide a convenient way to parameterize a SLAM problem as a factor graph, in which variables represent the unknown quantities to be estimated (e.g. poses or landmarks), and factors represent the observed measurements between them. These methods accommodate factor noise models, and attempt to find a minimum-error solution for all variables in the system given the factors and their noise. Thus, to formulate our SLAM problem in a Smoothing and Mapping framework, we must define our variables and factors.
Variables
The two quantities we wish to estimate are 1) the positions and orientations of the sensor as it moves through the scene, and 2) the positions and orientations of observed planes. Sensor poses are represented in the system as se3 vectors, and planes are represented in their Hessian normal form as (n_x, n_y, n_z, ρ). Note that the graph only has information on (and can only optimize) the infinite form of the planes. In order to render a semantically interesting map including the convex hulls of all known planes, we must keep track of the plane extents separately; this is discussed in Section 4.2.
Factors
Our SLAM formulation includes two types of factors: 1) ”control”
(or odometry) factors, which describe the relative position from one
sensor pose to the next, and 2) "observation" factors, which describe
the observation of a single planar feature from a pose. Figure 20
shows an example graph taken from a short section of one of our
datasets. There, the red nodes represent pose estimates and the blue
nodes represent plane feature estimates. The red lines between pose
estimates are control factors, while the purple lines between poses
and plane landmarks are observation factors. In order to allow for
optimization by the graph framework, a factor must provide a function
to compute an error metric between the measured value (either the
control input, or the local plane measurement) and the hypothetical
measured value given two novel variable values.
Control factors provide the following error function, which takes as input two poses (x_a and x_b), and computes the error given a known control (or odometry) input (u):
e(x_a, x_b \mid u) = u - x_a^{-1} x_b    (32)
Note that all coordinates are represented as 6D se3 vectors, and thus the subtraction operator represents the distance between u and x_a^{-1} x_b on the Lie manifold positioned around u.
Observation factors provide an error function which takes as input
a pose value (x) and a plane value (p), and describes the difference
Figure 20: Example graph from our SLAM formulation, extracted from a small section of the HNB dataset. Red nodes (x_i) represent estimated poses, while blue nodes (z_i) represent detected planar landmarks. Note that some landmark nodes have been removed for clarity of the figure.
between the measurement of p that would have been taken at x and the actual measurement z:
e(x, p \mid z) = z - x^{-1} p    (33)
Note here that both x and p are in a global coordinate system; thus the operation x^{-1} p generates a hypothetical plane measurement in the local coordinate system to be compared with z.
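As a rough illustration of how such a factor graph is assembled, the sketch below adds a chain of control factors to an iSAM2 instance using the GTSAM Python bindings (GTSAM provides the iSAM2 implementation; using it here is an illustrative assumption, and exact class and noise-model names may differ between versions). Plane-landmark observation factors of the form of Equation (33) would require a custom factor and are omitted.

import numpy as np
import gtsam

def add_control(isam, prev_key, new_key, T_rel, first_pose=None):
    """Add one 'control' (odometry) factor between consecutive poses.
    T_rel: gtsam.Pose3 relative transform, e.g. as estimated by PICP."""
    graph = gtsam.NonlinearFactorGraph()
    values = gtsam.Values()
    noise = gtsam.noiseModel.Diagonal.Sigmas(np.full(6, 0.05))  # assumed sigmas
    if first_pose is not None:
        # Anchor the very first pose so the graph is not gauge-free.
        prior = gtsam.noiseModel.Diagonal.Sigmas(np.full(6, 1e-3))
        graph.add(gtsam.PriorFactorPose3(prev_key, first_pose, prior))
        values.insert(prev_key, first_pose)
    graph.add(gtsam.BetweenFactorPose3(prev_key, new_key, T_rel, noise))
    # Initialize the new pose by composing the previous estimate with T_rel.
    prev = first_pose if first_pose is not None else isam.calculateEstimate().atPose3(prev_key)
    values.insert(new_key, prev.compose(T_rel))
    isam.update(graph, values)

# Usage with hypothetical keys:
# isam = gtsam.ISAM2()
# x = [gtsam.symbol('x', i) for i in range(3)]
# add_control(isam, x[0], x[1], gtsam.Pose3(), first_pose=gtsam.Pose3())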
4.2 picp slam
Once the variables and factors have been defined for our SLAM problem, it is simply a matter of bookkeeping to implement a full SLAM algorithm. Algorithm 4.1 shows how we combine the iSAM2 framework with our plane detection and frame alignment methods. Taking as input only a sequence of point clouds S, the PICP SLAM algorithm iterates through the sequence to generate a globally consistent factor graph G representing both the scene and the sensor's path through it. Lines 4 to 7 of this algorithm are exactly the PICP algorithm described in Chapter 3. The resulting transform T is fed into the PICP SLAM update method on Line 8 as a control input, along with the plane correspondences P_ab. Algorithm 4.2 describes the update procedure of PICP SLAM, which takes as input the current factor graph G, a control input T, and a set of plane correspondences P_ab. This algorithm first inserts T into the graph, and then generates a "rendered" view of the current map using Algorithm 4.3. For each pair of plane correspondences, the algorithm then tries to find a match to the rendered map. This matching is done on Line 5, and uses the same correspondence finding algorithm described in Section 3.2.1. If a match has been found, then the pair of plane detections is inserted into the graph as observations of the matched landmark. Otherwise, if the pair of correspondences represents an unmatched chain spanning at least ĥ frames, then a new landmark is created and those observations are attached to it. Otherwise, the observation count is incremented on the pair and the algorithm continues. Once all plane observations have been integrated, the graph is optimized using the iSAM2 algorithm. Finally, a global merge step is executed to attempt to reduce the number of redundant landmarks in the map. Again, the algorithm described in Section 3.2.1 is used here to find any matches between planes in the map. If matches are found, then one of the plane variables is deleted from the graph, and all factors pointing to it are remapped to the match. Once completed, the optimized graph is returned from the function and the algorithm begins again on the next frame.
Algorithm 4.1 The PICP SLAM high level algorithm, which recursively updates a graph structure to maintain a global estimate of all observed planar features. This algorithm takes as input a sequence of point cloud measurements S, and returns a globally consistent factor graph G.
 1: procedure PICPSLAM(S)
 2:   G ← [x_0 = 0]
 3:   for clouds Q_{t-1}, Q_t in sequence S do
 4:     P_{t-1} ← detectPlanes(Q_{t-1})
 5:     P_t ← detectPlanes(Q_t)
 6:     P_ab ← findCorrespondences(P_{t-1}, P_t)
 7:     T ← PICP(P_ab, Q_{t-1}, Q_t, t̂)
 8:     G ← updatePICPSLAM(G, T, P_ab)
 9:   end for
10:   return G
11: end procedure
4.3 results
Similarly to the analysis of Section 3.2.3, we show the effectiveness of our algorithm by running it on a series of datasets in which a Velodyne sensor is carried through a scene on the backpack
mounted rig shown in Figure 13. Every path is a closed loop, as each
ends in the same position and orientation as it begins. The three
outdoor datasets of Section 3.2.3 are used, as well as two new indoor
environments taken from loops walked through the RTH and HNB
buildings on the USC campus. We compare our results against the
state of the art Lidar Odometry and Mapping (LOAM) algorithm [9],
as implemented in the ROS framework [66] by the authors.
Algorithm 4.2 A single update iteration of PICP SLAM. This algorithm performs matching of observations against a globally maintained map, and decides when new measurements should be merged into existing landmarks, and when new landmarks should be created. Additionally, a global merge step is included to help minimize the number of redundant landmarks in the system.
 1: procedure updatePICPSLAM(G, T, P_ab)
 2:   add control T to graph with Σ_x
 3:   M ← renderMap(G)
 4:   for p_ab in P_ab do
 5:     p_map ← matchToMap(M, p_b)
 6:     if p_map then
 7:       Insert p_a, p_b into G as measurements of p_map with Σ_z
 8:     else if p_ab has been observed > ĥ times then
 9:       Create a new landmark p_map
10:       Insert p_a, p_b into G as measurements of p_map with Σ_z
11:     else
12:       Increment observation count on p_ab
13:     end if
14:   end for
15:   updateISAM2(G)
16:   globalMerge(G)
17:   return G
18: end procedure
Algorithm 4.3 The PICP SLAM Render Map algorithm, which projects every hull observation of each landmark into the latest pose estimations, and computes their unions.
 1: procedure renderMap(G)
 2:   M ← ∅
 3:   for plane p in G do
 4:     h_p ← ∅
 5:     Get all hull measurements of p into H from G
 6:     for hull h in H do
 7:       Get location estimate x_z from which h was observed from G
 8:       Project h into global coordinates using x_z as h_g
 9:       h_p ← h_p ∪ h_g
10:     end for
11:     Insert h_p into M
12:   end for
13:   return M
14: end procedure
The results, shown in Figures 21 to 25, demonstrate the excellent stability of our algorithm. Subfigure (a) of these figures shows a 3D comparison of the paths generated by the two algorithms, with LOAM in red and PICP SLAM in blue. Subfigure (b) shows a top-down view of the paths overlaid onto a satellite view (courtesy of Bing maps), while subfigure (c) shows a plot of the z coordinate of the paths over all frames. Subfigure (d) displays only the PICP SLAM path from subfigure (b) colored by the algorithm framerate at each position in the path, again overlaid onto a satellite image. Subfigure (e) plots the framerate of PICP SLAM over each frame of execution. Quantitative data on the runtime can be found in Figure 26 and a metric of loop closure quality can be seen in Figure 28. The framerate for LOAM is listed here as 5 Hz, as it is implemented in a non-blocking manner and will thus skip frames when processing power is limited, causing a reduction in tracking performance. On our test machine equipped with a 2.80 GHz Intel i7-3840QM processor, a framerate of 10 Hz quickly caused catastrophic tracking errors, while a framerate of 2.5 Hz produced the same paths as 5 Hz. We thus list all processing times as 5 Hz. Additionally, LOAM requires two CPU threads, effectively doubling the required processing power implied by Figure 26.
On nearly¹ every dataset that was run, PICP SLAM was able to produce an accurate trajectory and reconstruction of the world around it. The success of this algorithm is in large part due to the robust odometry estimates provided by the PICP algorithm. By providing a very good initial estimate of the robot pose and plane parameters, PICP allows the iSAM2 optimizer to converge quickly onto a consistent global solution.

1. One dataset was excluded from this analysis, as it featured a remarkably rectangular maintenance golf-cart which followed the author as he walked through a scene. Moving planar features are not currently handled by the SLAM algorithm, and so the dataset was discarded.
Additionally, the algorithm is very computationally efficient. Planar
features have proven to be cheap to detect and easy to use for align-
ment. The additional work required by PICP when planes are not
available is minimized by a careful tracking of necessary constraints.
Figure 27 shows a detailed timing plot for the Tutor dataset which
breaks down the timing of each frame by the various stages of the
algorithm. As can be seen in the figure, the nearest neighbors selection
from PICP causes a large spike in computation time near frame 1500
where the environment becomes under-constrained. However, during the first part of this run, where the environment is well constrained, computation time is minimized. It should also be noted that while all stages of PICP SLAM are single threaded, many of the expensive operations in the plane detection and correspondence finding algorithms are embarrassingly parallelizable.
Figure 21: Parkside SLAM. (a) 3D view comparison, (b) top view comparison, (c) Z position comparison, (d) PICP SLAM timing, (e) PICP SLAM framerate.
Figure 22: Tutor Campus Center SLAM. (a) 3D view comparison, (b) top view comparison, (c) Z position comparison, (d) PICP SLAM timing, (e) PICP SLAM framerate.
Figure 23: RTH Dataset SLAM. (a) 3D view comparison, (b) top view comparison, (c) Z position comparison, (d) PICP SLAM timing, (e) PICP SLAM framerate.
Figure 24: Alley SLAM. (a) 3D view comparison, (b) top view comparison, (c) Z position comparison, (d) PICP SLAM timing, (e) PICP SLAM framerate.
Figure 25: HNB SLAM. (a) 3D view comparison, (b) top view comparison (including ground truth), (c) Z position comparison, (d) PICP SLAM timing, (e) PICP SLAM framerate.
Figure 26: Average framerate (Hz) for all five datasets (Tutor, HNB, Parkside, RTH, Alley) using LOAM and PICP SLAM. Note that the LOAM framerate listed here is only 5 Hz rather than the 10 Hz listed in the original paper, as running at full framerate caused the non-blocking algorithm to drop frames and reduced performance. Both algorithms were run on an Intel i7-3840QM, though LOAM consumed two threads while PICP SLAM consumed only one. See Section 4.3 for a discussion of these numbers. Runtimes listed with a marker indicate that there was an error of greater than 5 meters in loop closure, which we consider a major tracking failure.
Figure 27: Detailed timing of the Tutor dataset showing the time required for each stage in the PICP SLAM system: 1. Plane Detection & Correspondence, 2. PICP Choose Points, 3. PICP Nearest Neighbors, 4. PICP Other, 5. SLAM.
Figure 28: Distance (meters) between start location and end location for all five datasets using LOAM and PICP SLAM. This demonstrates how close an algorithm is able to come to "closing the loop."
5
CONCLUSION
In this thesis we have presented a novel simultaneous localization
and mapping (SLAM) system for scanning LiDAR sensors that is
suitable for online use. Scanning LiDAR sensors are becoming more
widely available, and as a result are an attractive choice for mobile
robots because of their long range and fast scanning speed. However,
to our knowledge no other algorithms exists which can provide ac-
curate online localization and mapping solutions without resorting
to other extrinsic or intrinsic data sources such as GPS or accurate
wheel odometry. The ideas and methods presented in this work were
developed to help robots equipped with scanning LiDARs to operate
within man-made environments at rapid speeds without the aid of
any other data. The development of our system took place over three
distinct phases, each of which was presented in the three chapters of
this thesis following the introduction in Chapter 1. At each stage we
discussed the rationale and design decisions that were made, followed by implementation details, and concluded with tests and comparisons
to state of the art methods. The following is a summary of those
chapters.
In Chapter 2, we introduced the main focus of this thesis, which is
the use of planar surfaces for localization and mapping. Because of
the sparsity of the point clouds generated by scanning LiDAR sensors,
we found that traditional frame-to-frame alignment methods failed
to reliably converge on accurate solutions. We hypothesized that if
planar features could be extracted from LiDAR scans, those features
could form the basis of a very efficient SLAM system. Existing plane
detector approaches were surveyed and run on our data, however
none provided satisfactory results. We then developed a novel plane
detector specifically tuned to scanning LiDAR sensors which achieves
high efficiency and accuracy by exploiting the geometry of the sensor
itself. By recognizing that 3D planar surfaces would manifest as
conic sections in each of the 1D scanlines of the sensor, we were
able to formulate our algorithm as a Hough transform in spherical
coordinates. In our algorithm, each scanline is first segmented into
approximate conic sections, each of which is then allowed to cast votes
into a spherical accumulator. These votes are weighted according to
the curvature of the section and the orientation of the putative plane,
which we show is a good proxy for the confidence that a section
belongs to a given plane.
In Chapter 3 we showed the utility of our plane detector for providing features for frame-to-frame registration. The main novelty in the first half of that chapter is the creation of a very efficient correspondence algorithm which uses a two-stage metric for matching sets of
planes between two frames. By coupling this algorithm with our plane
detector, we were able to show fast and accurate path reconstructions
indoors. While we showed that planar correspondences can be suffi-
cient for simple indoor localization, more complex scenes can cause
localization to fail when there are not enough planes to constrain the
problem.
In the second half of Chapter 3, starting in Section 3.3, we developed
a framework for analyzing how well constrained a planar alignment
problem is, and then showed how a minimal number of point features
could be selected from a scene to fully constrain the problem. Our pro-
cedure, which we call PICP, combines these planar correspondences
and minimal point features to generate alignment solutions using
the minimum necessary computational power. In highly structured
scenes with good plane correspondences, the system uses very little
computation time as it is allowed to rely only on planes. As scenes
become less structured, our plane detector may fail to detect planes
either due to excess clutter (e.g. obscuring tree branches) or an al-
together lack of planes within sensor range (e.g. a long hallway or
street). In such cases, the system will begin to search for point features
to fill the missing planar constraints. The result is an algorithm which
uses minimal computational power whenever possible, but is able
to smoothly scale up its computational load as necessary to ensure
high accuracy. We demonstrate this accuracy and efficiency on several
outdoor datasets in Section 3.4 and show its superiority to a state of
the art alignment algorithm.
While the algorithm in Chapter 3 works well for frame to frame
alignment, any such algorithm will exhibit drift over long trajectories.
To combat this, we integrated our plane detector and frame to frame
alignment algorithm into a global SLAM solution in Chapter 4. This
SLAM system was formulated as a graph optimization problem where
planar patches form landmark nodes, with links to pose nodes from
which those planes were observed. Pose nodes are linked together
with alignment results from PICP , and the entire graph is optimized
to find a minimal energy solution. By using several bookkeeping
algorithms described in Section 4.2, PICP SLAM is able to produce
drift free maps and path reconstructions on very difficult datasets.
In Section 4.3, PICP SLAM is compared against a state of the art
algorithm, and shows better stability and accuracy on all datasets.
In summary, we have developed a series of algorithms which were
built on top of each other to produce a fast and accurate SLAM
solution for scanning LiDAR sensors. Such an algorithm will be
highly applicable to autonomous robotic navigation and mapping,
especially in situations where intrinsic and extrinsic data is unavail-
able or unreliable such as humanoid robots operating in GPS denied
environments. This thesis also demonstrates the viability of planar fea-
tures for real time navigation in general, and shows how to cope with
situations where those features fail. Future work could concentrate
on improvements and extensions along several dimensions. Very easy
performance improvements could be gained by simply parallelizing
many stages of the algorithm. Because many of the computations
are independent and the code was written in a functional style, such
improvements should be easy to realize. More in depth efforts could
focus on better point selection schemes for the PICP algorithm. As
the timing results of Figure 27 show, this stage is the most computa-
tionally intensive when scenes become under-constrained. Presently,
this search is done by a simple k-d tree but further research may turn
up more efficient methods of point selection. To improve the low
level plane detection, future research could focus on using temporal
memory to better guide the Hough voting procedure. At present the
voting procedure is stateless, as it has no memory of results from
previous frames. Because of the high scanning speed of the sensor,
performance gains might be found by assuming temporal consistency
between scans to reduce the planes for which each candidate conic
section must vote. Finally, a very interesting line of research could
be found by investigating ways for autonomous robots to use the
generated plane maps for tasks beyond simple path reconstruction
and point cloud integration. For example, planar environment maps
could provide semantic structure to the world allowing a robot to
easily understand the difference between a “room” and a “hallway.”
This could allow for interesting semantic compression schemes which
record only the minimum necessary information for a map, such as
the boundaries of rooms and the hallways which connect them.
BIBLIOGRAPHY
[1] Linus Torvalds. Message to linux-kernel mailing list. url: http://lkml.org/lkml/2000/8/25/132.
[2] Velodyne. Velodyne Lidar. url: http://velodynelidar.com/.
[3] Tom Abate. “Robots, start your engines”. In: SFGate (2007).
[4] Sean Higgins. “Velodyne Announces $7,999 "Puck" LiDAR Sensor”. In: SPAR Point Group (2014).
[5] AR Johnston. “Infrared laser rangefinder”. In: NASA New Tech-
nology Report No. NPO-13460, Jet Propulsion Laboratory, Pasadena,
California (August 1973) (1973).
[6] David Nitzan, Alfred E Brain, and Richard O Duda. “The mea-
surement and use of registered reflectance and range data in
scene analysis”. In: Proceedings of the IEEE 65.2 (1977), pp. 206–220.
[7] Giorgio Grisetti, Cyrill Stachniss, and Wolfram Burgard. “Im-
proved techniques for grid mapping with rao-blackwellized
particle filters”. In: Robotics, IEEE Transactions on 23.1 (2007),
pp. 34–46.
[8] Su-Yong An, Lae-Kyoung Lee, and Se-Young Oh. “Fast incre-
mental 3d plane extraction from a collection of 2d line segments
for 3d mapping”. In: Intelligent Robots and Systems (IROS), 2012
IEEE/RSJ International Conference on. IEEE. 2012, pp. 4530–4537.
[9] Ji Zhang and Sanjiv Singh. “LOAM: Lidar Odometry and Map-
ping in Real-time”. In: Robotics: Science and Systems Conference
(RSS). Berkeley, CA, July 2014.
[10] Yang Chen and Gérard Medioni. “Object modelling by registra-
tion of multiple range images”. In: Image and vision computing
10.3 (1992), pp. 145–155.
[11] Feng Lu and Evangelos Milios. “Robot pose estimation in un-
known environments by matching 2d range scans”. In: Journal
of Intelligent and Robotic Systems 18.3 (1997), pp. 249–275.
[12] Javier Minguez, Luis Montesano, and Florent Lamiraux. “Metric-
based iterative closest point scan matching for sensor displace-
ment estimation”. In: Robotics, IEEE Transactions on 22.5 (2006),
pp. 1047–1054.
[13] Albert Diosi and Lindsay Kleeman. “Fast laser scan matching
using polar coordinates”. In: The International Journal of Robotics
Research 26.10 (2007), pp. 1125–1153.
[14] Szymon Rusinkiewicz and Marc Levoy. “Efficient variants of
the ICP algorithm”. In: 3-D Digital Imaging and Modeling, 2001.
Proceedings. Third International Conference on. IEEE. 2001, pp. 145–152.
[15] Bertrand Douillard, James Underwood, Noah Kuntz, Vsevolod
Vlaskine, Alastair Quadros, Peter Morton, and Alon Frenkel.
“On the segmentation of 3D LIDAR point clouds”. In: Robotics
and Automation (ICRA), 2011 IEEE International Conference on.
IEEE. 2011, pp. 2798–2805.
[16] Bertrand Douillard, A Quadros, Peter Morton, James Patrick
Underwood, M De Deuge, S Hugosson, M Hallstrom, and Tim
Bailey. “Scan segments matching for pairwise 3D alignment”.
In: Robotics and Automation (ICRA), 2012 IEEE International Con-
ference on. IEEE. 2012, pp. 3033–3040.
[17] Aleksandr Segal, Dirk Haehnel, and Sebastian Thrun. “Generalized-
ICP .” In: Robotics: Science and Systems. Vol. 2. 4. 2009.
[18] Peter M Will and Keith S Pennington. “Grid coding: A prepro-
cessing technique for robot and machine vision”. In: Artificial
Intelligence 2.3 (1972), pp. 319–329.
[19] Paul VC Hough. Method and means for recognizing complex pat-
terns. US Patent 3,069,654. Dec. 1962.
[20] Richard O Duda, David Nitzan, and Phyllis Barrett. “Use of
range and reflectance data to find planar surface regions”. In:
Pattern Analysis and Machine Intelligence, IEEE Transactions on 3
(1979), pp. 259–271.
[21] Richard Hoffman. “Object recognition from range images”. PhD
thesis. Michigan State University, 1986.
[22] Richard Hoffman and Anil K Jain. “Segmentation and classifi-
cation of range images”. In: Pattern Analysis and Machine Intelli-
gence, IEEE Transactions on 5 (1987), pp. 608–620.
95
[23] Adam Hoover, Gillian Jean-Baptiste, Xiaoyi Jiang, Patrick J
Flynn, Horst Bunke, Dmitry B Goldgof, Kevin Bowyer, David W
Eggert, Andrew Fitzgibbon, and Robert B Fisher. “An experi-
mental comparison of range image segmentation algorithms”.
In: Pattern Analysis and Machine Intelligence, IEEE Transactions on
18.7 (1996), pp. 673–689.
[24] Patrick J Flynn and Anil K Jain. “BONSAI: 3D object recogni-
tion using constrained search”. In: IEEE Transactions on Pattern
Analysis and Machine Intelligence 13.10 (1991), pp. 1066–1075.
[25] Xiaoyi Jiang and Horst Bunke. “Fast segmentation of range
images into planar regions by scan line grouping”. In: Machine
vision and applications 7.2 (1994), pp. 115–122.
[26] Dorit Borrmann, Jan Elseberg, Kai Lingemann, and Andreas
N ¨ uchter. “The3D Hough Transform for plane detection in point
clouds: A review and a new accumulator design”. In: 3D Re-
search 2.2 (2011), pp. 1–13.
[27] Nahum Kiryati, Yuval Eldar, and Alfred M Bruckstein. “A prob-
abilistic Hough transform”. In: Pattern recognition 24.4 (1991),
pp. 303–316.
[28] Antti Yla-Jaaski and Nahum Kiryati. “Adaptive termination of
voting in the probabilistic circular Hough transform”. In: Pattern
Analysis and Machine Intelligence, IEEE Transactions on16.9 (1994),
pp. 911–915.
96
[29] Jiri Matas, Charles Galambos, Josef Kittler, et al. “Progressive
Probabilistic Hough Transform”. In: BMVC. 1998, pp. 1–10.
[30] Lei Xu, Erkki Oja, and Pekka Kultanen. “A new curve detec-
tion method: randomized Hough transform (RHT)”. In: Pattern
recognition letters 11.5 (1990), pp. 331–338.
[31] Martin A Fischler and Robert C Bolles. “Random sample con-
sensus: a paradigm for model fitting with applications to image
analysis and automated cartography”. In: Communications of the
ACM 24.6 (1981), pp. 381–395.
[32] Babak Ameri and Dieter Fritsch. “Automatic 3D building recon-
struction using plane-roof structures”. In: ASPRS, Washington
DC (2000).
[33] Claus Brenner. “Towards fully automatic generation of city mod-
els”. In: International Archives of Photogrammetry and Remote Sens-
ing 33.B3/1; PART 3 (2000), pp. 84–92.
[34] Charles V Stewart. “Bias in robust estimation caused by disconti-
nuities and multiple structures”. In: Pattern Analysis and Machine
Intelligence, IEEE Transactions on 19.8 (1997), pp. 818–833.
[35] Orazio Gallo, Roberto Manduchi, and Abbas Rafii. “CC-RANSAC:
Fitting planes in the presence of multiple surfaces in range data”.
In: Pattern Recognition Letters 32.3 (2011), pp. 403–410.
[36] Kristiyan Georgiev, Ross T Creed, and Rolf Lakaemper. “Fast
plane extraction in3D range data based on line segments”. In:
97
Intelligent Robots and Systems (IROS),2011 IEEE/RSJ International
Conference on. IEEE. 2011, pp. 3808–3815.
[37] Randall C Smith and Peter Cheeseman. “On the representa-
tion and estimation of spatial uncertainty”. In: The international
journal of Robotics Research 5.4 (1986), pp. 56–68.
[38] Rudolph Emil Kalman. “A new approach to linear filtering and
prediction problems”. In: Journal of Fluids Engineering82.1 (1960),
pp. 35–45.
[39] John J Leonard and Hugh F Durrant-Whyte. “Simultaneous
map building and localization for an autonomous mobile robot”.
In: Intelligent Robots and Systems’ 91.’Intelligence for Mechanical
Systems, Proceedings IROS’91. IEEE/RSJ International Workshop on.
Ieee. 1991, pp. 1442–1447.
[40] Michael Montemerlo, Sebastian Thrun, Daphne Koller, Ben Weg-
breit, et al. “FastSLAM: A factored solution to the simultane-
ous localization and mapping problem”. In: AAAI/IAAI. 2002,
pp. 593–598.
[41] Jerry M Mendel. “Computational requirements for a discrete
Kalman filter”. In: Automatic Control, IEEE Transactions on 16.6
(1971), pp. 748–758.
[42] Kevin P Murphy et al. “Bayesian Map Learning in Dynamic
Environments.” In: NIPS. 1999, pp. 1015–1021.
98
[43] Michael Montemerlo and Sebastian Thrun. “FastSLAM 2.0”. In:
FastSLAM: A Scalable Method for the Simultaneous Localization and
Mapping Problem in Robotics (2007), pp. 63–90.
[44] Frank Dellaert and Michael Kaess. “Square Root SAM.” In:
Robotics: Science and Systems. 2005, pp. 177–184.
[45] Michael Kaess, Ananth Ranganathan, and Frank Dellaert. “iSAM:
Incremental smoothing and mapping”. In: Robotics, IEEE Trans-
actions on 24.6 (2008), pp. 1365–1378.
[46] Michael Kaess, Hordur Johannsson, Richard Roberts, Viorela
Ila, John J Leonard, and Frank Dellaert. “iSAM2: Incremental
smoothing and mapping using the Bayes tree”. In: The Interna-
tional Journal of Robotics Research (2011), p. 0278364911430419.
[47] Joachim Horn and G ¨ unther Schmidt. “Continuous localization
of a mobile robot based on 3D-laser-range-data, predicted sen-
sor images, and dead-reckoning”. In: Robotics and Autonomous
Systems 14.2 (1995), pp. 99–118.
[48] Jan Weingarten and Roland Siegwart. “EKF-based 3D SLAM
for structured environment reconstruction”. In: Intelligent Robots
and Systems, 2005.(IROS 2005). 2005 IEEE/RSJ International Con-
ference on. IEEE. 2005, pp. 3834–3839.
[49] Jan Weingarten and Roland Siegwart. “3D SLAM using pla-
nar segments”. In: Intelligent Robots and Systems, 2006 IEEE/RSJ
International Conference on. IEEE. 2006, pp. 3062–3067.
99
[50] Kaustubh Pathak, Andreas Birk, Narunas Vaskevicius, and Jann
Poppinga. “Fast registration based on noisy planes with un-
known correspondences for 3-D mapping”. In: Robotics, IEEE
Transactions on 26.3 (2010), pp. 424–441.
[51] Kaustubh Pathak, Andreas Birk, Narunas Vaskevicius, Max Pf-
ingsthorn, S¨ oren Schwertfeger, and Jann Poppinga. “Online
three-dimensional SLAM by registration of large planar surface
segments and closed-form pose-graph relaxation”. In: Journal of
Field Robotics 27.1 (2010), pp. 52–84.
[52] Richard A Newcombe, Andrew J Davison, Shahram Izadi, Push-
meet Kohli, Otmar Hilliges, Jamie Shotton, David Molyneaux,
Steve Hodges, David Kim, and Andrew Fitzgibbon. “KinectFu-
sion: Real-time dense surface mapping and tracking”. In: Mixed
and augmented reality (ISMAR), 2011 10th IEEE international sym-
posium on. IEEE. 2011, pp. 127–136.
[53] Frank Moosmann and Christoph Stiller. “Velodyne slam”. In: In-
telligent Vehicles Symposium (IV), 2011 IEEE. IEEE.2011, pp.393–
398.
[54] M. Montemerlo, S. Thrun, D. Koller, and B. Wegbreit. “Fast-
SLAM 2.0: An Improved Particle Filtering Algorithm for Simul-
taneous Localization and Mapping that Provably Converges”.
In: Proceedings of the Sixteenth International Joint Conference on
Artificial Intelligence (IJCAI). IJCAI. Acapulco, Mexico, 2003.
100
[55] Frank Dellaert and Michael Kaess. “Square Root SAM: Simul-
taneous localization and mapping via square root information
smoothing”. In: The International Journal of Robotics Research25.12
(2006), pp. 1181–1203.
[56] Xunnian Yang. “Curve fitting and fairing using conic splines”. In:
Computer-Aided Design 36.5 (2004), pp. 461–472. issn: 0010-4485.
doi: 10.1016/S0010-4485(03)00119-2.
[57] Ponnambalam Kumaraswamy. “A generalized probability den-
sity function for double-bounded random processes”. In: Journal
of Hydrology 46.1 (1980), pp. 79–88.
[58] Malcolm D Shuster. “The generalized Wahba problem”. In: The
Journal of the Astronautical Sciences 54.2 (2006), pp. 245–259.
[59] Paul B Davenport. A vector approach to the algebra of rotations with
applications. National Aeronautics and Space Administration,
1968.
[60] F Landis Markley and Daniele Mortari. “Quaternion attitude
estimation using vector observations.” In: Journal of the Astro-
nautical Sciences 48.2 (2000), pp. 359–380.
[61] Automation Group (Jacobs University Bremen) and Knowledge-
Based Systems Group (University of Osnabruck). 3DTK - The
3D Toolkit. Mar. 2012.
[62] Hern´ an Badino, Daniel Huber, Yongwoon Park, and Takeo
Kanade. “Fast and accurate computation of surface normals
101
from range images”. In: Robotics and Automation (ICRA), 2011
IEEE International Conference on. IEEE. 2011, pp. 3084–3091.
[63] Marius Muja and David G Lowe. “Fast Approximate Nearest
Neighbors with Automatic Algorithm Configuration.” In: VIS-
APP (1). 2009, pp. 331–340.
[64] SJ Wright and J Nocedal. Numerical optimization. Vol.2. Springer
New York, 1999.
[65] Sameer Agarwal, Keir Mierle, et al. Ceres Solver. http://ceres-
solver.org.
[66] Morgan Quigley, Ken Conley, Brian Gerkey, Josh Faust, Tully
Foote, Jeremy Leibs, Rob Wheeler, and Andrew Y Ng. “ROS:
an open-source Robot Operating System”. In: ICRA workshop on
open source software. Vol. 3. 3.2. 2009, p. 5.
102
colophon
This document was typeset using the typographical look-and-feel
classicthesis developed by André Miede. The style was inspired
by Robert Bringhurst’s seminal book on typography “The Elements of
Typographic Style”. classicthesis is available for both LaTeX and LyX:
http://code.google.com/p/classicthesis/
Final Version as of July 1, 2015 (classicthesis).