RAPID CREATION OF PHOTOREALISTIC LARGE-SCALE URBAN CITY
MODELS
by
Charalambos Poullis
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)
May 2009
Copyright 2009
Charalambos Poullis
Dedication
To my parents, George and Zena
- my first and best teachers
To my sister, Elena
- thanks for putting up with me throughout these years
Acknowledgments
This thesis wouldn’t have been possible without the help and support of several people.
I would like to express my deep and sincere gratitude to my Ph.D. supervisor Dr. Suya You.
Throughout my years in the Ph.D. program he provided encouragement, sound advice and many
great ideas. With his invaluable support, guidance and patience, he helped to make this thesis
possible.
I am grateful to Dr. Ulrich Neumann, director of the Computer Graphics and Immersive Technologies Laboratory at USC, for welcoming me to his lab at a difficult time during my doctoral years.
I wish to thank Dr. Gerard Medioni for kindly sharing the source code for the tensor voting
technology developed in the I.R.I.S lab at USC. This was an invaluable help with my research.
I want to express my gratitude to my committee members Dr. Ramakant Nevatia, Dr. Michael Zyda, and Dr. C.-C. Jay Kuo. Their insightful comments have helped to considerably improve this thesis.
I am grateful to the administrative staff at the Computer Science Department at USC for assisting me throughout the years in numerous ways. I am especially grateful to my graduate advisor Sherri Fagan for ensuring that my doctoral studies ran smoothly. Aimee Barnard (IMSC), Zachary Williams (OIS), Nichole Philips (IMSC), Steve Schrader, Lifeng (Mai) Lee, Julieta DeLaPaz, and Jun Camacho deserve special mention.
I would like to thank Dr. C. L. Max Nikias for giving me the opportunity to join the Integrated Media Systems Center during my master's studies, which eventually led to my doctoral studies.
I am indebted to my fellow members of the Computer Graphics and Immersive Technologies Labo-
ratory at USC and the Graphics Lab at the Institute for Creative Technologies for making all these
years interesting and pleasant. I would like to thank Lu Wang, Jonathan Mooser, Sang Yun Lee,
Quan Wang, Seon Min Rhee, James Taehyun Rhee, Wei Guan, Qian-Yi Zhou, Tanasai Sucontphunt, ZhenZhen Gao, Huahang Liu, Pradeep Vaghela, Jinhui Hu, Chris Tchou, Tim Hawkins, Andrew Jones, Jessi Stumpfel, Charles-Felix Chabert, Per Einnarson, and Bruce Lamond.
I would also like to thank my many friends outside the university who have helped and supported
me. I am especially thankful to Christos Kasparis, Demetris Demetriou, Marios Georgiou, Antonis
Hadjikyriakos, Marinos Yiannakis, Alexia Kirmitsis, Barbara Argyrou, Artemis Kaldellis, Kimie &
Ben Silver, Natasa Andreou.
Last but not least, I would like to thank my family, my father George Poullis, my mother Zena Poullis and my sister Elena Poullis, for the unconditional love and support they have shown me throughout my life. To them I dedicate this thesis.
Table Of Contents
Dedication ii
Acknowledgments iii
List of Tables vii
List of Figures viii
Abstract xii
Chapter 1: Introduction 1
Chapter 2: Background and related work 5
3D Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Sensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Texture Composition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Chapter 3: System Overview and Contributions 14
Rapid Creation of Photorealistic Large-Scale City Models . . . . . . . . . . . . . . . . . . 14
Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Building Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Geometry Reconstruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Texture Composition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Systems Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
Chapter 4: Preprocessing 21
Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Resampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Chapter 5: Building Extraction 30
Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Primitive-based Building Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Building Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Common Linear Roof Types: Parameterized Geometric Primitives . . . . . . . . . . 32
Complex Linear Roof Types: Polygonal Primitive . . . . . . . . . . . . . . . . . . . . 40
Non-linear Roof Types: Ellipsoidal Primitive . . . . . . . . . . . . . . . . . . . . . . 43
Probabilistic Building Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Normal Computation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Probabilistic Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
Classification and Removal of Unstable Regions, Ground, Trees and Vertical Features 49
Region Boundary Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
Chapter 6: Geometry Reconstruction 52
Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
Optimization-driven Reconstruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
Linear and Non-Linear Surface Fitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
Linear Reconstruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Surface Fitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Boundary Simplification & Refinement . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3D Model Creation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Experimental Results. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Evaluation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
Chapter 7: Texture Composition 72
Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
Image Registration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
Texture Rendering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
Chapter 8: HAL-10K: An Integrated System 83
System Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
Texturing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
Chapter 9: Conclusion 90
Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
Bibliography 92
Appendix 97
List of Tables
1 The algorithm for the polygonal primitive. . . . . . . . . . . . . . . . . . . . . . . . . 41
2 Mesh vs Model comparison. Size refers to the file size of an ASCII OBJ file, measured in MB. The reduction % for each geometric feature is shown in red. . . . 60
3 Scalability for semi-automatic primitive-based method. Processing time required for
the reconstruction of average-sized building models. . . . . . . . . . . . . . . . . . . 70
4 Scalability for automatic probabilistic method. Processing time required for the reconstruction of U.S. cities. Benchmark results were generated on an AMD Athlon 5200+ machine with 6 GB of RAM. . . . 70
5 Qualitative and quantitative model evaluation. . . . . . . . . . . . . . . . . . . . . . 71
List of Figures
1 Reconstructed buildings from current modern-GIS. . . . . . . . . . . . . . . . . . . . 2
2 Passive sensor(Landsat). The earth reflects bundles of light emitted by the sun which
are then captured by the passive sensor on the Landsat satellite. . . . . . . . . . . . 6
3 Active sensors. In 3(a) the scanner is attached to an airborne device (airplane) and emits rays to the ground in a scanline fashion. In 3(b) the outgoing red rays emitted by the scanner (traditionally 360° on the horizontal and up to 180° on the vertical axis) are reflected by the building structures (blue rays). . . . 7
4 Reported results from [DP] using ground-based LiDAR. . . . . . . . . . . . . . . . . 10
5 Interactive system of [PGD] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
6 Results from existing work. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
7 Augmented Virtual Environment from [NYH+03]. Integration of 5 real-time video streams. . . . 13
8 System overview. The two modeling techniques presented are enclosed in the dashed
rectangles. Primitive-based(dark color), Probabilistic(bright color). . . . . . . . . . 18
9 The software HAL-10K. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
10 Problems addressed and achieved results. . . . . . . . . . . . . . . . . . . . . . . . . 20
11 Module 1: Preprocessing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
12 Resampling of the 3D pointcloud data into a regular grid and hole filling of the 2D
XYZ map. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
13 Triangulated mesh generated from the resampled points. . . . . . . . . . . . . . . . . 23
14 Refinement process overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
15 The set of smooth normal labels L is computed from a half-sphere of a user-defined
radius i.e. the smooth normals shown in (c). The number of smooth normals in this
case is 492. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
16 A close-up comparison between the original and filtered data. The discontinuities are
preserved. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
17 The filtering is performed in two steps: normal-based optimization using graph-cuts for smoothing the normals, and point-based gradient-descent optimization for smoothing the points using the smooth normals. . . . 28
18 Results for different smoothness weights λ and different sizes ‖L‖ of the normal label set L. . . . 29
19 Module 2: Building extraction. Consists of a primitive-based method for the semi-
automatic extraction of linear and non-linear roof types, and a probabilistic method
for the automatic extraction of linear roof types. . . . . . . . . . . . . . . . . . . . . 31
20 Semi-automatic Primitive-based Building Extraction. . . . . . . . . . . . . . . . . . . 32
21 Common linear roof types. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
22 Parameterization of basic building block I. O marks the local origin. α,β are the
parameters used to parameterize the X and Y direction, respectively. The red lines
indicate where the parameters α,β are equal to zero respectively. . . . . . . . . . . . 33
23 The different roof types generated by varying the parameters α,β(top view). The
common roof types of Figure 21 can be identified using the parameterization of the I
building block. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
24 A parameterized primitive for the identification of all L-shaped roof type variants. O_1, O_2 are the local origins for the two subdivided rectangles. The red lines indicate where the parameters α, β, γ, δ are equal to zero respectively. The shaded area shows overlap between the two parameterized rectangles. The points P_3, P_6 are parameterized with α, β as well as γ, δ. . . . 35
25 Example classification of points using a gaussian mixture model consisting of 3 com-
ponents. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
26 Automatic roof type identification using the parameterized geometric primitive for
the building block I. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
27 Salt-box roofs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
28 Shed roofs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
29 Automatic roof type identification using the I-shaped parameterized geometric prim-
itive. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
30 (a) Singular mode polygonal primitive. (b) Complex mode polygonal primitive . . . 42
31 Example of a complex mode polygonal primitive consisting of 9 linear surfaces. . . . 42
32 Ellipsoidal primitive. The parameters α and β are determined from the user defined
control points. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
33 Ellipsoidal primitive. The parameters α and β are determined from the user defined
control points in (a). The orientation and type is automatically determined by the
optimization (b). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
34 Non-linear building structures. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
35 Automatic Building Extraction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
36 Normal computation. The normal vector (blue) for each point P_i is computed as the normalized sum of the 8 cross products of the vectors |a×b|, |b×c|, |c×d|, etc., as given by equation 27. . . . 46
37 An example of a depth distribution G_d. All points P_i whose depth probability Pr_d(d_{P_i}) falls inside the gray area defined by Pr(μ_d ± κσ_d) are accepted as belonging to the current region (iff the same is also true for the normal distribution G_n). The orange line represents the degenerate case where the variance σ_d = 0. . . . 48
38 The red line represents the variance σ_G computed from the data and the blue line represents the function e^(1/p). The variance function σ_0(p) ensures that the variance and covariance matrix of the depth and normal distributions, respectively, will never equal zero, and is represented by the gray overlaid line. . . . 49
39 Color-coded segmentation results. All points contained in the same disjoint, contigu-
ous region are denoted by the same color. Each map size is 1Kx1K. . . . . . . . . . 50
40 Boundaries extracted from the segmented regions for a complex building. Boundary points are displayed in blue and the lines connecting two neighboring boundary points are displayed in yellow. . . . 51
41 Module 3: Geometry Reconstruction. Includes two methods corresponding to the two building extraction methods presented in Chapter 5: Optimization-driven and Linear Reconstruction. . . . 52
42 Reconstructed geometry resulting from the automatic identification using the I-shaped parameterized geometric primitive. . . . 54
43 Buildings of various complexities extracted using the polygonal primitive. . . . . . . 55
44 Original LiDAR data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
45 Reconstructed models of USC campus overlaid on original data. . . . . . . . . . . . . 56
46 Geometry Reconstruction module. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
47 The extracted roof boundaries after the refinement process using GMMs. The blue points indicate a boundary point and the yellow lines are edges connecting those points. . . . 58
48 Boundary grouping into elements and point classification using GMMs. The colors indicate the component of the GMM that maximizes the probability of the feature descriptor F_{P_j} of a point. . . . 59
49 Mesh vs Model comparison. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
50 Complete 3D reconstruction of downtown Baltimore and surrounding areas. The area covered is about 16 km² (i.e. input data size 10Kx4K). . . . 62
51 A frame from a simulation of a cataclysmic disaster scenario taking place in Baltimore using our lightweight, standalone 3D models. . . . 63
52 Close up of downtown Baltimore and surrounding areas. . . . . . . . . . . . . . . . . 64
53 Close-up of downtown Denver. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
54 Close-up of downtown Oakland. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
55 Downtown Atlanta(north). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
56 Downtown Atlanta(south). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
57 (a) Evaluation of correctness and accuracy. (b) Geo-referencing. Satellite imagery is
registered to the reconstructed polygonal models. . . . . . . . . . . . . . . . . . . . . 69
58 Module 4: Texture Composition. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
59 Image-based object subdivision. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
60 Texture composition result: 3D models and the composite texture atlas. The green
color in (a) indicates non-visible faces for which no information is available. . . . . . 77
61 A close-up of the textured 3D model for downtown Baltimore. . . . . . . . . . . . . . 78
62 A suburban area textured with satellite imagery. In this case, the amount of overlap
between the images is minimal. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
63 A textured 3D model for a university campus. The green points indicate the inter-
actively marked correspondences between the geometry and the imagery, which were
used for the recovery of the camera pose. . . . . . . . . . . . . . . . . . . . . . . . . 79
64 Textured virtual environment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
65 A large-scale virtual environment textured using satellite images. . . . . . . . . . . . 81
66 The automatically reconstructed model for downtown Oakland textured using 30
oblique aerial images. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
67 System overview of HAL-10K. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
68 The graphical user interface of HAL-10K. . . . . . . . . . . . . . . . . . . . . . . . . 84
69 The reconstructed models for an area in Atlanta. . . . . . . . . . . . . . . . . . . . . 85
70 The registration is performed by defining a set of correspondences between the 3D object features and the 2D image features. . . . 86
71 Texturing of the USC campus. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
72 Integration of reconstructed buildings with 3D road models. . . . . . . . . . . . . . . 88
73 USC campus. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
Abstract
In recent years there has been an increasing demand for applications which employ miniature representations of the real world to recreate realistic and immersive virtual environments. Many applications, ranging from computer graphics and virtual reality to Geographical Information Systems, have already successfully used real world representations derived from the combination of multi-sensory data captured from aerial or satellite imagery and LiDAR (Light Detection and Ranging) scanners.
However, despite their widespread and successful application, the creation of such realistic 3D content remains a complex, time-consuming, expensive and labor-intensive task. In fact, the creation of models is still widely viewed as a specialized art, requiring personnel with extensive training and experience to produce useful models.
In this thesis, we focus on historically difficult problems in creating large-scale (city-size) scene models from sensor data, including rapid extraction and modeling of geometry models, reproduction of high-quality scene textures, and fusion and completion of the geometry and texture data to produce photorealistic 3D scene models. We address the current problems and limitations of state-of-the-art techniques and present our solutions, including a fully automatic technique for extraction of polygonal 3D models from LiDAR data, and a flexible texture blending technique for generation of photorealistic textures from multiple optical sensor resources. The result is a unified (multi-sensory), comprehensive (structure and appearance) and immersive representation of large-scale areas of the real world.
In the first part of this thesis, we address the problem of rapidly creating realistic geometry representations of the real world entirely from remote sensory data captured by an airborne LiDAR scanner, and present two technologies which share the same framework. Firstly, we present a primitive-based technique for the reconstruction of buildings with linear and non-linear roof types using a minimal set of three primitives. We leverage the symmetry constraints found in man-made structures and introduce an extendible parameterization of geometric primitives for the automatic identification and reconstruction of buildings with common linear roof types. The parameterization reformulates the reconstruction as a non-linear optimization problem with reduced dimensionality, thereby considerably reducing the computational time required for the reconstruction of the models. Additionally, we introduce a linear and a non-linear primitive for the reconstruction of buildings with complex linear and non-linear roof types such as churches, domes, stadiums, etc.
The primitive-based technique provides a robust and efficient way for reconstructing geometry for
any kind of building containing linear or non-linear surfaces, but requires some user interaction.
Secondly, we present a complete, automatic probabilistic modeling technique for the reconstruction
of buildings with linear roof types. We introduce a robust clustering method based on the analysis of the geometric properties of the data, which makes no particular assumptions about the input data, thus having no data dependencies. The boundaries extracted automatically from the clustered data are then refined and used to reconstruct the geometry for the scene.
In the second part of this thesis, we present a texturing pipeline for the composition of photorealistic textures from multiple imagery data. Information captured by multiple optical sensors is combined together to create an amalgamated texture atlas for the scene models. Thus, by integrating multiple resources, missing or occluded information can be successfully recovered.
Finally, we have integrated the developed techniques to produce a complete modeling system, HAL-10K, and extensively tested the system with several city-size datasets including the USC campus, downtown Baltimore, downtown Denver, the city of Atlanta, etc. We present and evaluate our experimental results.
Chapter 1
Introduction
The rapid and efficient creation of virtual environments has become a crucial part of a wide range of applications, ranging from computer graphics and virtual reality to Geographical Information Systems (GIS). In particular, civil and defense applications employ such technologies for the simulation of real-world operations areas. In such cases the models of urban buildings are of significant value since they facilitate planning, response, and real-time situational awareness in highly occluded urban settings. The personnel that simulate, plan, monitor, and execute responses to natural or man-made events can gain insight and make better decisions if they have a comprehensive view of the structures and activity occurring at an operational scene. The models are essential
components of such a view, helping people comprehend spatial and temporal relationships. In
addition, a photorealistic appearance is essential for the enhancement of the visual richness of the
models and the immersive experience of the users.
While models are important assets, the creation of photorealistic large-scale models remains at best
a difficult, time-consuming manual task. The science for rapidly sensing and modeling wide-area
urban sites and events, in particular, poses difficult or unsolved problems. Over the years, a wealth of research, employing a variety of sensing and modeling technologies, has been conducted to deal with the complex modeling problem. Different types of techniques, drawing on computer vision, computer graphics, photogrammetry, and remote sensing, have been proposed and developed so far; each has unique strengths and weaknesses, and each performs well for a particular dataset but may fail on another.
Similarly, the generation of high-resolution photorealistic textures has been extensively investigated
and many algorithms have already been proposed. However, the complexity of these algorithms increases considerably when dealing with large-scale virtual environments. In addition, these texturing techniques for large-scale environments are limited to using a single image per building, which makes it impossible to recover textures for missing or occluded areas and requires even more time-consuming manual work. These problems impose a serious limitation on achieving a realistic appearance for the models and, by extension, diminish the immersive experience of the users. In addition, visual realism is a significant component of the success of the application, and achieving such a realistic look-and-feel can help reduce the gap between physical reality and virtual reality.
Figure 1: Reconstructed buildings from current modern-GIS. (a) Buildings have a variety of different roof types, i.e. flat, sloped, gabled, etc. [Goo]. (b) The reconstructed buildings in 1(a). All roofs appear flat and slanted. Models do not contain color information. (c) Buildings from [Mic] with textures from aerial images. Some roofs are slanted and texture quality is very low. (d) Low texture resolution from [Mic].
Examples of state-of-the-art applications employing virtual environments are the GIS systems developed by [Goo, Mic]. These GIS systems integrate information from multi-sensory input, such as GIS databases, satellite imagery, aerial imagery and LiDAR, into a three-dimensional space to achieve an accurate representation of the real world. Figure 1(a) shows a 3D view of a satellite
image of a Los Angeles area taken from [Goo]. Similarly, Figure 1(b) shows the same area with the
reconstructed buildings being overlaid on the satellite image to provide a more realistic view of the
area. Although this 3D representation conveys more information than a 2D representation, it still
lacks visual realism, primarily due to the generic look of the reconstructed models. Moreover, the
reconstruction of the 3D models is at best semi-automatic, which makes the creation of large-scale 3D environments a time-consuming and expensive task. Similarly, from [Mic], Figure 1(c) and Figure 1(d) show more detailed buildings reconstructed semi-automatically from aerial and satellite imagery, with textures from low-resolution aerial images. However, the texture generation is limited to
using single images for entire areas which fails to capture details of the buildings’ facades. Another
significant disadvantage is that texture information of occluded areas cannot be recovered using a
single image.
In this thesis, we focus on historically difficult problems in creating large-scale (city-size) scene models from sensor data, including rapid extraction and modeling of geometry models, reproduction of high-quality scene textures, and fusion and completion of the geometry and texture data to produce photorealistic 3D scene models. We address the current problems and limitations of state-of-the-art techniques and present solutions for the rapid creation of photorealistic large-scale urban city models.
In the first part of this thesis, we address the problem of rapidly creating realistic geometry representations of the real world entirely from remote sensory data captured by an airborne LiDAR scanner, and present two technologies which share the same framework. Firstly, we present a primitive-based technique for the reconstruction of buildings with linear and non-linear roof types using a minimal set of three primitives. We leverage the symmetry constraints found in man-made structures and introduce an extendable parameterization of geometric primitives for the automatic identification and reconstruction of the most commonly occurring buildings with linear roof types. The parameterization reformulates the reconstruction as a non-linear optimization problem with reduced dimensionality, thereby considerably reducing the computational time required for the reconstruction of the models. Additionally, we introduce a linear and a non-linear primitive for the reconstruction of buildings with complex linear and non-linear roof types such as churches, domes, stadiums, etc.
The primitive-based technique provides a robust and efficient way for reconstructing geometry for
any kind of building containing linear or non-linear surfaces, but requires some user interaction.
Secondly, we present a complete, automatic probabilistic modeling technique for the reconstruction of buildings with only linear roof types. We introduce a robust probabilistic clustering method based on the analysis of the geometric properties of the data, which makes no particular assumptions about the input data, thus having no data dependencies. The boundaries extracted automatically from the clustered data are then refined using a Gaussian-based refinement process and are used to reconstruct the geometry for the scene.
The primitive-based and probabilistic modeling techniques can be used alternatively to generate
high-fidelity polygonal 3D model representations of urban cities. In the following chapters, the two
techniques which share the same framework are described in terms of the same pipeline as shown in
the system overview in Figure 8.
In the second part of this thesis, we address the problem of creating realistic appearance for the
reconstructed 3D models and we present a texturing pipeline for the composition of photorealistic
textures from multiple imagery data. Information captured by multiple optical sensors is combined
together to create an amalgamated texture atlas for the scene models. Thus, by integrating multiple resources, missing or occluded information can be successfully recovered.
Finally, we present a complete modeling system, HAL-10K, which integrates the developed techniques. We have extensively tested the system with several city-size datasets including the USC campus, downtown Baltimore, downtown Denver, the city of Atlanta, etc., and we present and evaluate our experimental results.
Chapter 2
Background and related work
A plethora of work has been proposed for solving the complex problem of creating realistic 3D content for applications in computer graphics, computer games, feature films, virtual reality, GIS, etc. The majority of the existing work shares the same processing pipeline; however, several distinctions exist between the different processing components. Below is an overview of the state of the art in this
area.
3D Modeling
Over the years, a lot of work has been conducted in the area of 3D scene modeling and many different techniques have been proposed for the solution of this complex problem. These techniques can be best categorized in terms of the types of sensors used to capture the input datasets.
Sensors
Passive sensors measure levels of energy which are naturally emitted, reflected, or transmitted
by the object. Quite often the source of radiative energy is the sun(Figure 2) or a man-made light
source. Examples of passive sensors are:
• Photographic camera. A camera can produce optical images by recording the light waves
which are scattered toward the camera lens very similar to the way the human eye works.
• Thematic Mapper. This is the primary sensor on the Landsat satellites and has seven bands
each being sensitive to a different range of electromagnetic radiation. The bands are designed
to detect differences in plant production, soil moisture, and mineral content in soils, providing
a useful tool in assessing and monitoring land use practices.
Figure 2: Passive sensor(Landsat). The earth reflects bundles of light emitted by the sun which are
then captured by the passive sensor on the Landsat satellite.
A significant disadvantage of passive sensors is that they can only be used in areas where naturally
occurring energy is available. This can only take place during the daytime, when the sun is illuminating the object of interest and the reflected energy can be recorded. However, naturally
emitted energy such as thermal infrared can still be detected day or night as long as the amount of
energy is large enough to be recorded by the sensor.
Active sensors, on the other hand, provide their own energy source for the illumination of the
object. They emit electromagnetic waves which travel to the object and are then reflected back
toward the sensor. Examples of active sensors are:
• Doppler radar. A ground-based system that emits radio energy in a radial pattern as the
transmitter rotates. The sensor measures the reflection, or echoes, of this energy which can be
used to calculate further information such as the speed of an object.
• Synthetic aperture radar (SAR) and Interferometric SAR(IFSAR). Airborne imaging systems
which emit microwave signals and receive the reflected signals from the terrain. Elevation and
reflectance properties can then be derived by analyzing the captured data.
• Ground-based and Airborne Light Detection and Ranging (LiDAR). LiDAR scanners emit a laser pulse and measure the time it takes to travel to the object, be reflected, and return. Based on the time of flight, a measurement of the distance can be computed (the standard relation is given below).
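This is the standard two-way time-of-flight relation (a general remote-sensing fact, not a formula quoted from this thesis): the range is r = c·Δt/2, where c is the speed of light and Δt is the measured round-trip time of the pulse. For example, Δt = 2 μs gives r = (3×10⁸ m/s × 2×10⁻⁶ s)/2 = 300 m.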
(a) Airborne-LiDAR scanning (USGS). (b) Ground-based LiDAR scanning.
Figure 3: Active sensors. In 3(a) the scanner is attached to an airborne device (airplane) and emits rays to the ground in a scanline fashion. In 3(b) the outgoing red rays emitted by the scanner (traditionally 360° on the horizontal and up to 180° on the vertical axis) are reflected by the building structures (blue rays).
A key advantage of the active sensors is their ability to obtain measurements anytime, regardless
of the time of day. Active sensors can be used for examining wavelengths that are not sufficiently
provided by the sun, such as microwaves, or to better control the way a target is illuminated.
However, active sensors require the generation of a fairly large amount of energy to adequately
illuminate targets.
Techniques
A good survey on large-scale modeling techniques can be found in [HYN03]. In [FZ03b] the au-
thors present a method for reconstructing large-scale 3D city models by merging ground-based and
airborne-based LiDAR data. The elevation measurements are used to recover the geometry of the
roofs. Facade details are then incorporated from the high-resolution capture of a ground-based system, which has the advantage of also capturing texture information that aids in the creation of a realistic appearance of the model. However, at the cost of having detailed facades, they neglect to deal with the complexities and wide variations of the buildings' roof types. The authors [FSZ04] later
extended their method to incorporate texture information from oblique aerial images. Although
they combine multiple aerial images to determine the model’s textures, their method is restricted to
traditional texture mapping rather than combining all available texture information to generate a
composite texture i.e. blending. Therefore, a significant color difference between images will cause
visible and non-smooth transitions between neighboring polygons of different texture images.
In [VKH06] the authors employ an automatic approach for the creation of 3D geometric models
of the buildings and terrain. A set of simple prismatic models are combined together to model
different classes of buildings. Complex roof shapes are handled by automatically decomposing them
into multiple simpler geometric primitives. A roof topology graph is used to identify the different
parametric shapes that can be combined to construct complex roof shapes. However, the assumption
that all constructed graphs will be small is not always true and this can complicate matters since
graph/sub-graph matching is an NP-complete problem.
In a similar approach, the authors in [MSS+08] present a building segmentation system for densely built areas. They are able to segment and delineate buildings as well as structures on top of buildings without requiring scanning of the sides of buildings. The results look impressive; however, the use of data-dependent parameters is abundant throughout their system.
Pollefeys et al. [GFM+07] propose a multi-view plane-sweep-based stereo algorithm which runs in real time on the GPU for the reconstruction of buildings from video captures. Although their results seem very promising, the geometry of the reconstructed 3D models is relatively complex and becomes a bottleneck when dealing with large-scale areas. In addition, their approach uses video captured from the ground, thus providing information only for the building facades.
In [YHNF03a] You et al. present an interactive primitive-based modeling system for the reconstruction of building models from LiDAR data. Using the user input, their system automatically segments
the building boundaries, performs model refinement, and assembles the complete building model.
However, user input is required for the detection as well as the identification of buildings and their
roof types.
In [RC02a] they develop an interactive system for reconstructing geometry using non-sequential views from uncalibrated cameras. The calculation of all 3D points and camera positions is performed simultaneously as a solution of a set of linear equations by exploiting the strong constraints obtained by modeling a map as a single affine view. However, due to the considerable user interaction required by the system, its application to large-scale areas is very limited.
In [HYN03] they exploit the strong rigidity constraints of parallelism and orthogonality present
in indoor and outdoor architectural scenes. Using these constraints they calibrate the cameras and
recover the projection matrices for each viewpoint. [SA02] and [LCC] use elevation information from LiDAR as well as optical imagery. Region-based segmentation and knowledge-based classification
are then integrated to detect building regions.
[MV99] develops a system for the extraction of buildings by determining the parameters of a stan-
dard gabled roof using invariant moment analysis. [IMWA02] presents a method for extracting flat rooftop buildings using the Minimum Description Length (MDL) principle. [YHNF03b] developed
an interactive system in which the user can guide the segmentation and reconstruction of the build-
ings. [RB] separates the terrain from points on buildings and other object classes, by performing a
hierarchic application of robust interpolation using a skew error distribution function.
In [PGV+04] the proposed system can deal with uncalibrated image sequences acquired with a hand-held camera. Based on tracked or matched features, the relations between multiple views are computed. From this, both the structure of the scene and the motion of the camera are retrieved, and the ambiguity of the reconstruction is restricted from projective to metric through self-calibration.
In [NP02] Nevatia et al. propose a user-assisted system for the extraction of 3D polygonal models
of buildings from aerial images. Low level image features are initially used to build high level
descriptions of the objects. Using a hypothesize and verify paradigm they are able to extract
impressive models from a small set of aerial images. The authors later extended their work in
[LJN02] to automatically estimate camera pose parameters from two or three vanishing points and
three 3D to 2D correspondences.
Rottensteiner et al. [RTCK05] describe a method for roof plane delineation that tries to eliminate
user-defined thresholds and relies on the framework of uncertain projective geometry and on robust
estimation. In addition, the authors describe a new algorithm for the detection of step edges for
delineating roof polygons, taking into account domain specific information in order to eliminate
disturbances caused by trees adjacent to buildings.
Using only high-resolution satellite images, [LW05] proposed a probabilistic modeling approach for automated building extraction. In a similar context, [IS] describes a knowledge-based approach to automatic 3D reconstruction of buildings from aerial imagery by combining a 2D GIS map and knowledge about the imaging geometry and acquisition parameters. [RC02b] uses map constraints to reconstruct architectural models of buildings from many views. [HKN00] uses IFSAR data to
provide cues for more accurate detection and reconstruction using the electro-optical images.
In [DP] a ground-based LiDAR scanner is used to record a rather complex ancient structure of
significant cultural heritage importance. Multiple scans, shown in Figure 4(a), were aligned and merged together using a semi-automatic process, and a complete 3D model of the outdoor structure, shown in Figure 4(b), was created. The reconstructed model is shown to contain a high level of detail; however, the complexity of the geometry (90 million polygons for one building) limits this approach to the very accurate reconstruction of single buildings rather than large-scale areas.
(a) Three ground-based LiDAR scans aligned and merged. (b) A 3D model of a complex outdoor scene reconstructed using 51 ground-based LiDAR scans.
Figure 4: Reported results from [DP] using ground-based LiDAR.
Another ground-based approach is presented in [MAW+07], where a two-stage process is employed in order to quickly fuse multiple stereo depth maps. The results are impressive, especially for a real-time system; however, the resulting geometry is too complex and requires further processing in order to make it usable in another application. A similar ground-based approach is presented in [ZPB07], where multiple range images are integrated by minimizing an energy functional consisting of a total variation regularization force and an L1 data fidelity term. Similarly, the resulting geometry,
although impressive, is too “heavy” for most applications.
In a different approach, [DTM96] proposed an interactive system which can reconstruct buildings
using ground imagery and a minimal set of geometric primitives. [PGD] later extended this system
to incorporate pointcloud and mesh support as part of the reconstruction as shown in Figure 5.
Although high-fidelity models can be generated, the required user interaction increases considerably for large-scale areas. Moreover, the user interaction depends on the desired level of detail of the reconstructed models, which may vary considerably according to the application.
Figure 5: Interactive system of [PGD]
Texture Composition
Textures enhance the visual richness and allow for a realistic appearance of models. A variety of
different techniques have already been proposed for the composition of such realistic textures and
vary according to the application and type of textures they require, i.e. static or dynamic.
Techniques
The methods introduced in [Deb96] and [DYB98] use a small set of images to interactively reconstruct a 3D model of the scene using a set of parameterized primitives. A view-dependent texture mapping method is then employed for the computation of the texture maps of the model (Figure 6(a)). Although this technique is sufficient to create realistic renderings of the scene from novel viewpoints, its computation is still too expensive for real-time applications like games or virtual reality, and the novel viewpoints are constrained to be close to the initial camera positions. In addition, view-dependent texture mapping works seamlessly in cases where the images are taken at regularly sampled intervals, which is not generally true in the context of large-scale areas, since some images may come from satellites and aerial images, thus having a wide baseline between them.
A different approach is proposed in [ZWT+05] to seamlessly map a patchwork of texture images
onto an arbitrary 3D model. By specifying a set of correspondences between the model and any
number of texture images their system can create a texture atlas.
In [MD03] they order geometry into optimized visibility layers for each photograph. The layers are subsequently used to create standard 2D image-editing layers which become the input to a layered projective texture rendering algorithm. However, this approach chooses the best image for each surface rather than combining the contributions of all the images to minimize information loss.
Figure 6: Results from existing work. (a) A novel viewpoint render using the Façade system [Deb96, DYB98]. (b) A novel render from [DTG+04].
In [DTG+04] reflectance properties of the scene were measured and lighting conditions were recorded
for each image taken. An inverse global illumination technique was then used to determine the true
colors of the model's surfaces, which could then be used to relight the scene (Figure 6(b)). The generated textures are photorealistic; however, for large-scale areas this requires a considerable amount of manual work for capturing and processing the data.
A method for creating renders from novel viewpoints without the use of geometric information is presented in [LH96], where densely and regularly sampled images are blended together. The image acquisition process is simple and reproducible for relatively small objects; however, the complexity of the capturing process greatly increases for large objects, especially in cases where the object exists in an outdoor environment, where the lighting conditions cannot be controlled as in a lab environment. In addition, the proposed method does not involve the use or generation of geometric information, thus limiting its use to applications for visualization purposes only.
The authors in [BAF04] estimate a set of blending transformations that minimizes the overall color
discrepancy in overlapping regions in order to deal with the unnatural color texture fusion due to
variations in lighting and camera settings.
The authors in [RL02] propose a method for computing the blending weights based on a local and
global component. This results in smooth transitions in the target image in the presence of depth
discontinuities. A slightly different approach for texture generation and extraction is proposed in
[Tur01]. Given a texture sample in the form of an image, they create a similar texture over an irregular mesh hierarchy that has been placed on a given surface; however, this approach cannot capture the "actual" appearance of the models.
Another line of research uses video resources for the dynamic texturing of large-scale environments.
Neumann et al. [NYH+03] incorporate the dynamic input of multiple video cameras and generate projective textures for the models as shown in Figure 7.
Figure 7: Augmented Virtual Environment from [NYH+03]. Integration of 5 real-time video streams.
Chapter 3
System Overview and
Contributions
Rapid Creation of Photorealistic Large-Scale City Models
In this thesis, we focus on the difficult problems in creating large-scale (city-size) scene models from sensor data, including rapid extraction and modeling of geometry models, reproduction of high-quality scene textures, and fusion and completion of the geometry and texture data to produce photorealistic 3D scene models. We address the current problems and limitations of state-of-the-art techniques and present solutions for the rapid creation of photorealistic large-scale urban city models.
The goal of creating a complete modeling system is achieved in four steps, as is also shown in the
system overview Figure 8:
• Preprocessing
• Building Extraction
• Geometry Reconstruction
• Texture Composition
Preprocessing
The raw 3D pointcloud data is unstructured and may contain noise and missing information. During the preprocessing, the raw data is resampled into memory-manageable components which are then converted into 2D XYZ maps, the internal data representation used by our system. Additionally, hole filling and refinement are performed using a graph-cut and gradient-descent optimization in order to reduce the inconsistencies and artifacts caused by noise during the acquisition process.
Building Extraction
The preprocessed data contains a complete representation of the scene including buildings, ground,
trees, etc, hence, further processing is required to separate the buildings. During the building
extraction, the roof boundaries of the buildings are extracted from the resampled 2D XYZ maps
resulting from the preprocessing. We present two technologies which share the same framework and
are described in terms of the same pipeline as shown in Figure 8.
Firstly, a primitive-based method is presented for the semi-automatic extraction of buildings with
linear and non-linear surfaces. To achieve this, three novel geometric primitives are introduced:
1. a parameterized geometric primitive for buildings with common linear roof types,
2. a polygonal primitive for buildings with complex linear roof types and,
3. an ellipsoidal primitive for buildings with non-linear roof types.
The primitives are used to identify and extract the buildings' roof boundaries, which are then refined through a coordinate-wise fitting optimization. A significant advantage of this approach is that a single parameterized primitive can handle multiple roof types, thereby reducing the number of primitives required to reconstruct a large-scale urban area.
The primitive-based technique provides a robust and efficient way for reconstructing geometry for
any kind of building containing linear or non-linear surfaces, but requires some user interaction.
Secondly, a probabilistic method is presented for the automatic extraction of buildings with only
linear surfaces. A robust clustering technique, based on the probabilistic analysis of the geometric properties of the data, is used to separate the data into disjoint regions which are then used to extract the roof boundaries. A key advantage of the clustering method is that it makes no particular assumptions about the input data and thus has no data dependencies.
The primitive-based and probabilistic modeling techniques can be used alternatively to generate
high-fidelity polygonal 3D model representations of urban cities. In the following chapters, the two
techniques are described in terms of the same pipeline as shown in the system overview in Figure 8.
Geometry Reconstruction
During the Geometry Reconstruction, the geometry for the 3D models is generated from the previ-
ously extracted roof boundaries. Two separate methods are presented.
Firstly, we present an optimization-driven surface reconstruction. In the primitive-based building
extraction method described above, the refined boundaries are determined by the coordinate-wise
fitting optimization. A linear or non-linear surface fitting is then performed depending on the type
of the primitive corresponding to each building. The geometry for the 3D models is finally generated
by extruding the reconstructed roof surfaces to the ground.
Additionally, we present a linear surface reconstruction. As previously mentioned, in the probabilistic building extraction method, the roof boundaries are automatically extracted from the clustered regions. A linear surface fitting is then performed on the clustered regions to produce the roof surfaces. Finally, a simplification and refinement process smooths the extracted roof boundaries, and 3D models are extruded from the fitted surfaces to the ground.
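To make the extrusion step concrete, the following is a minimal sketch in Python of how a single extracted roof boundary could be closed into a standalone prism-like model; the function name and the simple face representation are illustrative assumptions, not the system's actual meshing code.

def extrude_to_ground(roof_boundary, ground_z):
    """Close a roof boundary (ordered list of (x, y, z) vertices) into a
    standalone building model: one roof face, one ground face, and one
    vertical wall quad per boundary edge. Faces are vertex-index lists."""
    n = len(roof_boundary)
    vertices = list(roof_boundary) + [(x, y, ground_z) for x, y, _ in roof_boundary]
    roof_face = list(range(n))                           # indices 0..n-1
    ground_face = list(range(2 * n - 1, n - 1, -1))      # reversed for an outward-facing normal
    walls = [[i, (i + 1) % n, n + (i + 1) % n, n + i] for i in range(n)]
    return vertices, [roof_face, ground_face] + walls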
Texture Composition
The 3D models of the scene and imagery capturing their real-world appearance are combined together to create composite texture atlases for the realistic appearance of the 3D models, in two steps:
1. Image Registration. A set of correspondences is interactively specified between the 3D geometry representing the scene and the imagery capturing the appearance of the same geometry. A non-linear optimization is then employed to recover the camera poses.
2. Texture Rendering. The geometry is subdivided into smaller primitives. A non-linear blending function, which ensures a smooth and seamless transition between the different images, is then used to compute the composite texture maps (a generic sketch of such a weighting scheme is given below). Finally, the composite textures are packed into compact texture atlases.
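As an illustration only (the actual blending function of this pipeline is presented in Chapter 7), a common way to combine several registered images into one composite texel is to weight each image's contribution so that it falls off smoothly toward image borders and contributes nothing where the texel is not visible. The function feather_blend below is a hypothetical, minimal sketch of this idea, not the thesis's implementation.

import numpy as np

def feather_blend(samples, weights, eps=1e-8):
    """Blend per-image texel samples with normalized non-negative weights.

    samples : list of (H, W, 3) arrays, one per registered image
              (color observed for each texel; NaN where the texel is not visible).
    weights : list of (H, W) arrays, e.g. distance-to-image-border maps,
              so contributions fade smoothly near image edges.
    """
    acc = np.zeros_like(samples[0], dtype=np.float64)
    wsum = np.zeros(samples[0].shape[:2], dtype=np.float64)
    for img, w in zip(samples, weights):
        visible = ~np.isnan(img).any(axis=2)        # texels this image actually sees
        w = np.where(visible, w, 0.0)
        acc += np.nan_to_num(img) * w[..., None]
        wsum += w
    return acc / (wsum[..., None] + eps)            # normalized weighted average

Raising the per-image weight maps to a power before blending is one simple way to make the transitions sharper or softer, which is the kind of non-linear control referred to above.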
Systems Integration
We have integrated the developed techniques to produce HAL-10K, a complete modeling system which is shown in Figure 9, and extensively tested the system with several city-size datasets including downtown Baltimore, downtown Denver, the city of Atlanta, etc. This work represents a significant success in producing a consistent workflow that allows non-experts and non-artists to rapidly fuse aerial LiDAR and imagery to construct large-scale scene models with high visual realism.
Figure 8: System overview. The two modeling techniques presented are enclosed in the dashed
rectangles. Primitive-based(dark color), Probabilistic(bright color).
Figure 9: The software HAL-10K.
Contributions
Figure 10: Problems addressed and achieved results.
Figure 10 shows the problems addressed in this thesis, namely the problem of 3D city modeling from remote sensor data and the problem of generating realistic textures for the generated 3D models, and the achieved results. The result is a set of standalone, water-tight, light-weight, realistically textured, polygonal 3D models. The most significant contributions of the developed work are:
1. A primitive-based modeling technique. ([PYN08],[PY09a])
• A minimal set of flexible, interactive primitives for the reconstruction of complex linear
and non-linear structures.
• An extendable 2D parameterization of geometric primitives for the automatic identifica-
tion and reconstruction of the most commonly occurring linear buildings.
2. A probabilistic modeling technique. ([PYN09])
• A fully automatic technique for extraction of polygonal 3D models from LiDAR data.
• An automatic probabilistic clustering technique based on the analysis of the geometric
properties of the data with no data dependencies.
3. A rendering pipeline for the composition of realistic textures from multi-sensory input. ([PYN07], [PY09b])
4. A prototype system named HAL-10K which implements the proposed techniques and integrates all the above components into a unified framework. ([PYN09], [PYN08], [PY09a], [PYN07], [PY09b])
• A complete, efficient and robust modeling and rendering pipeline for the rapid creation
of photorealistic large-scale urban city models.
Chapter 4
Preprocessing
Overview
The raw 3D pointcloud data is unstructured and may contain noise and missing information. During the preprocessing, the raw data is resampled into memory-manageable components which are then converted into 2D XYZ maps, the internal data representation used by our system. Additionally, hole filling and refinement are performed using a graph-cut and gradient-descent optimization in order to reduce the inconsistencies and artifacts caused by noise during the acquisition process.
This module takes as input the raw 3D pointcloud data and converts it into a set of 2D XYZ maps,
in three steps: Resampling, Hole filling and Filtering, as shown in Figure 11.
Figure 11: Module 1: Preprocessing.
Resampling
Typically, the raw 3D pointcloud data captured by the airborne LiDAR scanner shown in Figure
12(a), cannot be processed as a whole. For this reason, we subdivide the raw data into smaller
parts and resample each part into a 2D XYZ map. A 2D XYZ map, shown in Figure 12(b),
is an efficient data representation which considerably reduces the computational complexity and
processing time, since all subsequent operations are performed in 2D rather than in 3D. In addition,
fast image processing techniques such as hole filling and normal computation can be easily performed,
as explained below.
Ideally, we would like to minimize the information loss of the input data by using all points; however,
this is not practically feasible considering the space/memory limitations and the size of the data.
We overcome this trade-off between information loss and space/memory limitations by having the
user specify the size of the maps in terms of an error tolerance τ between neighboring samples in
the map. Any cell of the map which receives more than one 3D data point stores
their average.
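To make the resampling concrete, the minimal sketch below bins an unordered point array into a regular grid and averages the points that share a cell, leaving empty cells as holes for the subsequent hole-filling step. It is only an illustration of the idea described above, not the system's implementation; the grid spacing tau and all function names are assumptions made for the example.

```python
import numpy as np

def resample_to_xyz_map(points, tau):
    """Bin an unordered (N, 3) pointcloud into a regular 2D grid of spacing tau.

    Cells receiving more than one point store the average of those points;
    empty cells are left as NaN (holes to be filled in a later step).
    """
    mins = points[:, :2].min(axis=0)
    maxs = points[:, :2].max(axis=0)
    cols = int(np.ceil((maxs[0] - mins[0]) / tau)) + 1
    rows = int(np.ceil((maxs[1] - mins[1]) / tau)) + 1

    xyz_map = np.full((rows, cols, 3), np.nan)
    sums = np.zeros((rows, cols, 3))
    counts = np.zeros((rows, cols))

    # Accumulate every 3D point into the grid cell it falls into.
    cs = ((points[:, 0] - mins[0]) / tau).astype(int)
    rs = ((points[:, 1] - mins[1]) / tau).astype(int)
    for r, c, p in zip(rs, cs, points):
        sums[r, c] += p
        counts[r, c] += 1

    filled = counts > 0
    xyz_map[filled] = sums[filled] / counts[filled][:, None]
    return xyz_map

# Toy usage with a synthetic pointcloud.
pts = np.random.rand(1000, 3) * [100.0, 100.0, 20.0]
print(resample_to_xyz_map(pts, tau=1.0).shape)
```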
Figure 12: Resampling of the 3D pointcloud data into a regular grid and hole filling of the 2D XYZ
map. (a) 3D pointcloud. (b) 2D XYZ map; the color channels R, G, B of the map correspond to the
axes X, Y, Z respectively.
As a second step, an iterative hole-filling algorithm is performed to recover missing information using
local neighborhood information. Figure 12(a) shows the input raw 3D pointcloud data, and Figure
12(b) the resampled and hole-filled 2D XYZ map for the USC and surrounding area. Note that the
color channels R, G, B of the map correspond to the axes X, Y, Z respectively. Figure 13(a),13(b)
show the triangulated mesh generated from the resampled points.
(a) Mesh (b) Mesh(top view)
Figure 13: Triangulated mesh generated from the resampled points.
Filtering
Noise in the measurements of the original data and inconsistencies introduced by the resampling
process are removed while ensuring that important features such as discontinuities are preserved.
To achieve this, we exploit the fact that normals provide directional information which define the
surface geometry of an object and perform a normal optimization based on graph-cuts to smooth
the normals, followed by a point optimization using gradient-descent to smooth the points.
The process is summarized in Figure 14. A set of uniformly distributed normals (Figure 15(c))
computed from a half-sphere (Figures 15(a), 15(b)) is used to re-label the noisy normals computed
from the original points ([BVZ99]). This results in smoothing the normals while preserving important
features such as discontinuities [FZ03b]. A gradient-descent optimization then refines the original
points such that their computed normals are identical to the smooth re-labeled normals.
We restate the problem of smoothing the noisy normals as a problem of finding an optimal labeling
f : N −→ L which assigns a normal label l ∈ L to each point p ∈ N, where f is piecewise smooth
and consistent with the original data, and N is the initial normal map. This labeling problem can
be efficiently solved using graph-cuts to minimize an energy function of the form,

E(f) = E_{data}(f) + λ · E_{smoothness}(f)   (1)

where λ is a smoothness weighting factor. Refer to the Appendix for further details about graph-cuts.

Figure 14: Refinement process overview.

The energy data term in equation 1 provides a per-pixel measure of how appropriate a label l ∈ L
is for a pixel p ∈ N in the observed data and is given by,

E_{data}(f) = \sum_{p \in N} D_p(f(p))   (2)

where D_p(f(p)) is the data function for a point p ∈ N and is defined as,

D_p(f(p)) = |N_p − N_{f(p)}|   (3)

where N_p is the original normal of the point p and N_{f(p)} ∈ L is the assigned normal label under the
labeling f(p) : N_p −→ N_{f(p)}. Thus, the energy data term in equation 2 becomes,

E_{data}(f) = \sum_{p \in N} |N_p − N_{f(p)}|   (4)
The energy smoothness term in equation 1 provides a measure of the difference between two neighboring
pixels p, q ∈ N with labels N_{f(p)}, N_{f(q)} ∈ L respectively and is given by,

E_{smoothness}(f) = \sum_{\{p,q\} \in N} V_{\{p,q\}}(N_{f(p)}, N_{f(q)})   (5)

where N_{f(p)}, N_{f(q)} ∈ L are the assigned normal labels under the labeling f and V_{\{p,q\}} is the
interaction potential between the neighboring pixels p, q ∈ N. Let N_p and N_q be the original normals in
the observed data of the pixels p, q ∈ N respectively; we define a measure of the observed smoothness
between pixels p, q ∈ N as,

Δ_{p,q} = |N_p − N_q|   (6)

Similarly, we define a measure of smoothness for the global minimization. Let N_{f(p)} and N_{f(q)} be
the assigned normal labels under a labeling f; we define a measure of the smoothness between the
labels of neighboring pixels p, q ∈ N as,

\tilde{Δ}_{p,q} = |N_{f(p)} − N_{f(q)}|   (7)

Using the smoothness measure defined for the observed data and the smoothness measure defined
for any given labeling, the energy smoothness term in equation 5 becomes,

E_{smoothness}(f) = \sum_{\{p,q\} \in N} (K_{p,q} · \tilde{Δ}_{p,q})   (8)

where K_{p,q} = 1 + ε − e^{−\frac{2 − Δ_{p,q}}{σ^2}} is the Boltzmann distribution of the energy function Δ_{p,q} which
gives the probability of the initial smoothness, and ε is a small constant. This ensures that two
neighboring pixels p, q ∈ N with similar normal orientations in the observed data will have a small
Δ_{p,q} and thus a high probability, given by K_{p,q}, that the assigned labels under the labeling f will
also have similar orientations, and therefore \tilde{Δ}_{p,q} will be small.
We define the set of smooth normal labels L to be a set of uniformly distributed normals lying on
the surface of a half-sphere. Figure 15(a) shows the point map corresponding to the 3D half-sphere
in Figure 15(b). Figure 15(c) shows the set of smooth normal labels computed from the half-sphere's
point map in Figure 15(a). The radius of the half-sphere can be altered accordingly, to control the
smoothness and the number of normal labels in the set L.
Figure 15: The set of smooth normal labels L is computed from a half-sphere of a user-defined radius,
i.e. the smooth normals shown in (c). The number of smooth normals in this case is 492.
(a) Half-sphere point map. (b) Half-sphere in 3D (top view). (c) Smooth normal labels computed from (a).
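A minimal sketch of how such a label set and the per-pixel data cost of equation 4 could be built is given below. It is not the thesis implementation: the hemisphere is sampled at a fixed angular step (a hypothetical parameter standing in for the half-sphere radius), and the actual graph-cut minimization of equations 1-8 would be handled by a separate solver such as the one of [BVZ99].

```python
import numpy as np

def half_sphere_labels(step_deg=15.0):
    """Return a set of uniformly spread unit normals on the upper hemisphere.

    A smaller angular step yields more labels, i.e. a larger label set L and
    hence finer (less aggressive) smoothing of the normal map.
    """
    labels = [np.array([0.0, 0.0, 1.0])]            # the pole
    for elev in np.arange(step_deg, 90.0 + 1e-9, step_deg):
        phi = np.radians(elev)                      # angle from the +Z axis
        for az in np.arange(0.0, 360.0, step_deg):
            theta = np.radians(az)
            labels.append(np.array([np.sin(phi) * np.cos(theta),
                                    np.sin(phi) * np.sin(theta),
                                    np.cos(phi)]))
    return np.vstack(labels)

def data_cost(normal_map, labels):
    """Per-pixel data term |N_p - N_l| of equation 4 for every label l."""
    diff = normal_map[:, :, None, :] - labels[None, None, :, :]
    return np.linalg.norm(diff, axis=-1)            # shape (H, W, |L|)

L = half_sphere_labels(30.0)
print(len(L), "labels")
```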
The result of the normal-based graph-cut optimization is a smooth normal map. Using the initial
noisy points as initial estimates, a point-based gradient-descent optimization is employed during
which the spatial position of the initial points is iteratively adjusted, such that the computed normal
N_p at a point p ∈ N matches the smooth label normal N_{f(p)} returned by the graph-cut optimization.
Our experiments have shown that the point-based gradient-descent optimization converges to a
solution within the first ten iterations, primarily due to the fact that the initial points provide an
accurate initial estimate which is already close to the solution. For example, the initial normals
in Figure 17(b) are computed from the original points in Figure 17(a). A graph-cut optimization
performed using these initial normals and the smooth normal labels from Figure 15(c) results in
the re-labeled normals shown in Figure 17(c). Finally, a gradient-descent optimization iteratively
refines the original points such that the computed normal at each point matches the re-labeled
normal(Figure 17(d)). Figure 17(e) shows the difference measured as the dot product between the
initial and smooth normal map.
Figure 16 shows a close-up comparison between the original and filtered data.
As previously explained, there are two parameters which control the refinement process:
• the smoothness term weight λ in equation 1,
• the size of the set of smooth labels L, which was defined in terms of the radius of the half-sphere.
A close-up of a mesh produced by triangulating the refined points is shown in Figure 18. Different
numbers of smooth normal labels ‖L‖ (computed by varying the radius of the half-sphere) and
(a) Original data. (b) Filtered data.
Figure 16: A close-up comparison between the original and filtered data. The discontinuities are
preserved.
different values for the smoothness weight λ are shown. Our experiments have shown that λ = 0.25
and ‖L‖ ∈ [100, 500] generate satisfactorily smooth and visually pleasing results.
Figure 17: The filtering is performed in two steps: normal-based optimization using graph-cuts for smoothing the normals, and point-based
gradient-descent optimization for smoothing the points using the smooth normals. (a) Initial point map. (b) Initial normal map.
(c) Smooth normal map. (d) Smooth point map. (e) Dot product between initial and smooth normal maps. (f) Euclidean distance
between initial and smooth point maps (normalized).
Figure 18: Results for different smoothness weights λ and different sizes ‖L‖ of the normal label set L.
(a) Original data; (b)-(f) ‖L‖ = 17 with λ = 0.1, 0.25, 0.5, 0.75, 1.0; (g)-(k) ‖L‖ = 104 with λ = 0.1, 0.25, 0.5, 0.75, 1.0;
(l)-(p) ‖L‖ = 492 with λ = 0.1, 0.25, 0.5, 0.75, 1.0.
Chapter 5
Building Extraction
Overview
The preprocessed data contains a complete representation of the scene including buildings, ground,
trees, etc. Thus, further processing is required to extract the buildings. During the building extrac-
tion, the roof boundaries of the buildings are extracted from the resampled 2D XYZ maps resulting
from the preprocessing. We present two methods for the extraction of buildings with different roof
types.
Firstly, a primitive-based method is presented for the semi-automatic extraction of buildings. We
introduce three novel geometric primitives for linear, complex linear and non-linear roof types,
respectively. The primitives are used to identify and extract the buildings’ roof boundaries which
are then refined through a coordinate-wise fitting optimization. A significant advantage of this
approach is that a single parameterized primitive can handle multiple roof types. This considerably
reduces the number of primitives required to reconstruct a large-scale urban area.
Additionally, a probabilistic method is presented for the automatic extraction of buildings with linear
surfaces. A robust clustering technique based on the probabilistic analysis of the geometric properties
of the data is used to group the data into disjoint regions which are then used to extract the roof
boundaries. A key advantage of the clustering method is that it makes no particular assumptions
about the input data, thus having no data dependencies.
The Building Extraction module takes as input a 2D XYZ map and produces a set of regions with
their associated roof boundaries. The system overview is shown in Figure 19.
Figure 19: Module 2: Building extraction. Consists of a primitive-based method for the semi-automatic
extraction of linear and non-linear roof types, and a probabilistic method for the automatic
extraction of linear roof types.
Primitive-based Building Extraction
An overview of the primitive-based building extraction method is shown in Figure 20. Initially, 2D
roof boundaries are marked interactively on the 2D XYZ map. Three novel geometric primitives
are developed for the refinement of the initial roof boundaries through a coordinate-wise fitting
optimization:
1. An extendable 2D parameterization of geometric primitives for the automatic identification
and extraction of the most commonly occurring linear roof types.
2. A polygonal primitive for the extraction of buildings with complex linear roof types which
reduces the dimensionality of the optimization’s search space by sharing geometry between
multiple surfaces in close proximity.
3. An ellipsoidal primitive for the extraction of buildings with non-linear roof types.
Building Detection
The building detection is performed interactively by having the operator mark a set of boundaries
on the 2D XYZ maps. The user-defined boundaries are then used for the automatic identification
Figure 20: Semi-automatic Primitive-based Building Extraction.
and extraction of the linear roof types in the case of the parameterized geometric primitives, and for
the semi-automatic extraction of linear and non-linear roof types in the case of the polygonal and
ellipsoidal primitives, respectively.
Common Linear Roof Types: Parameterized Geometric Primitives
The identification of building roof types is an essential part of the modeling process. In many cases,
and in particular for low-resolution data and complex buildings, this is still a very difficult task even
for a human operator. We address this problem and present a novel, extendable 2D parameterization
of geometric primitives for the automatic identification of the roof types. We focus on the most
commonly occurring linear roof types shown in Figure 21, and present a parameterization of the
basic building block "I". In addition, we demonstrate how the parameterization can be extended to
other basic building blocks as defined by [Sch93], such as block "L".
Figure 21: Common linear roof types. (a) Flat, (b) Shed, (c) Gable, (d) Hip, (e) Pyramidal, (f) Mansard, (g) Saltbox.
Parameterization of Building Block I
We leverage the symmetry constraints found in man-made structures to parameterize a geometric
primitive representing the basic building block I, using a minimal set of variables. Figure 22 shows
the geometric structure of the parameterized primitive corresponding to the I-shaped building block.
Figure 22: Parameterization of basic building block I. O marks the local origin. α,β are the pa-
rameters used to parameterize the X and Y direction, respectively. The red lines indicate where the
parameters α,β are equal to zero respectively.
We define the set of control points S_C as the set containing the exterior points V_1 ... V_4 and the
interior points P_1 ... P_4. By specifying a local coordinate system, the control points are expressed as
linear functions of the two variables α, β such that,

V_1 = [-\frac{w}{2}, -\frac{h}{2}],  V_2 = [\frac{w}{2}, -\frac{h}{2}]   (9)

V_3 = [-\frac{w}{2}, \frac{h}{2}],  V_4 = [\frac{w}{2}, \frac{h}{2}]   (10)

P_1 = [-α\frac{w}{2}, -β\frac{h}{2}],  P_2 = [α\frac{w}{2}, -β\frac{h}{2}]   (11)

P_3 = [α\frac{w}{2}, β\frac{h}{2}],  P_4 = [-α\frac{w}{2}, β\frac{h}{2}]   (12)

where w is the width and h is the height of the building block.
This parameterization ensures the enforcement of the symmetry constraints, since any change in the
two parameters α, β will cause a symmetric change to all other points. In order to ensure that all
the internal points P_1 ... P_4 lie within the area defined by the exterior points V_1 ... V_4, the parameters
are bounded in the range 0 ≤ α, β ≤ 1.
By simply varying the values of the two parameters α,β all commonly occurring roof types can be
produced by a single primitive as demonstrated in Figure 23. This is a key advantage over other
primitive-based methods since it significantly reduces the number of primitives required to model
a scene and allows a single primitive to be used for the automatic identification of multiple roof
types. Moreover, another significant advantage is that the optimization involves only two unknowns
which are bound-constrained by the condition 0 ≤ α, β ≤ 1, thus making the convergence to
a solution extremely fast.
Figure 23: The different roof types generated by varying the parameters α, β (top view). The common
roof types of Figure 21 can be identified using the parameterization of the I building block.
A non-linear, bound-constrained optimization is then performed to find the optimal values of the two
variables α, β such that E_{error} in equation 13 is minimized, i.e. the sum of the squared fitting errors
of the five planar surfaces Π_1 ... Π_5 defined by the primitive as shown in Figure 22 and given by,

E_{error} = \sum_{n=0}^{N} \sum_{\forall p \in P} (χ_p − f(χ_p, α, β))^2   (13)

where N is the number of planes (5), f(·,·,·) is a least-squares fitting function and χ_p are the data
samples.
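The following sketch illustrates the bound-constrained search over (α, β), assuming the simple hip-style face decomposition implied by equations 9-12 and a synthetic hip roof as data. SciPy's L-BFGS-B solver, the point-in-polygon test and every helper name here are assumptions made for the example, not the thesis implementation of equation 13.

```python
import numpy as np
from matplotlib.path import Path
from scipy.optimize import minimize

def control_points(w, h, alpha, beta):
    """Exterior V1..V4 and interior P1..P4 of the I-block (equations 9-12)."""
    V = np.array([[-w/2, -h/2], [w/2, -h/2], [-w/2, h/2], [w/2, h/2]])
    P = np.array([[-alpha*w/2, -beta*h/2], [alpha*w/2, -beta*h/2],
                  [alpha*w/2,  beta*h/2], [-alpha*w/2,  beta*h/2]])
    return V, P

def faces(V, P):
    """Five planar faces induced by the interior points (hip-style split)."""
    V1, V2, V3, V4 = V
    P1, P2, P3, P4 = P
    return [np.array([V1, V2, P2, P1]),   # front slope
            np.array([V2, V4, P3, P2]),   # right slope
            np.array([V4, V3, P4, P3]),   # back slope
            np.array([V3, V1, P1, P4]),   # left slope
            np.array([P1, P2, P3, P4])]   # central face

def fit_error(params, pts, w, h):
    """Sum of squared residuals of least-squares planes fitted per face (eq. 13)."""
    alpha, beta = params
    V, P = control_points(w, h, alpha, beta)
    err = 0.0
    for poly in faces(V, P):
        inside = Path(poly).contains_points(pts[:, :2])
        if inside.sum() < 3:
            continue                       # degenerate face, no contribution
        q = pts[inside]
        A = np.c_[q[:, 0], q[:, 1], np.ones(len(q))]
        coef, *_ = np.linalg.lstsq(A, q[:, 2], rcond=None)
        err += np.sum((A @ coef - q[:, 2]) ** 2)
    return err

# Synthetic hip roof sampled on a grid (20 x 10 footprint, flat ridge).
w, h = 20.0, 10.0
xs, ys = np.meshgrid(np.linspace(-w/2, w/2, 60), np.linspace(-h/2, h/2, 30))
zs = 0.5 * np.minimum(w/2 - np.abs(xs), h/2 - np.abs(ys)).clip(min=0)
pts = np.c_[xs.ravel(), ys.ravel(), zs.ravel()]

res = minimize(fit_error, x0=[0.5, 0.5], args=(pts, w, h),
               bounds=[(0.0, 1.0), (0.0, 1.0)], method="L-BFGS-B")
print("alpha, beta =", res.x)
```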
Parameterization of Building Block L
Figure 24: A parameterized primitive for the identification of all L-shaped roof type variants. O_1, O_2
are the local origins for the two subdivided rectangles. The red lines indicate where the parameters
α, β, γ, δ are equal to zero respectively. The shaded area shows overlap between the two parameterized
rectangles. The points P_3, P_6 are parameterized with α, β as well as γ, δ.
In this work we focus on identifying the most commonly occurring roof types shown in Figure 21,
which are handled by the previously described I-shaped primitive. However, such a parameterization
can be extended to handle other groups of complex linear roof types and basic building blocks L, H,
U and T as described by Schmitt [Sch93].
Figure 24 shows a parameterized primitive which can identify all variants of L-shaped roof types.
This parameterization includes only four parameters α, β, γ, δ subject to the condition 0 ≤ α, β, γ, δ ≤ 1,
and subdivides the space into seven planes Π_{1...7}. Similarly to the I-shaped parameterization, the
exterior and interior control points are expressed as a function of the four parameters α, β, γ, δ and
the structural width and height parameters, as shown below. Thus, additional parameterized
primitives can be created in a similar way if needed, to specifically handle more complex groups of
linear roof types.
V_1 = [-\frac{w_1}{2}, -\frac{h}{2}],  V_2 = [\frac{w_1}{2}, -\frac{h}{2}]   (14)

V_3 = [\frac{w_1}{2}, \frac{h}{2} - h_2],  V_4 = [w_2, \frac{h}{2} - h_2]   (15)

V_5 = [w_2, \frac{h_2}{2}],  V_6 = [-\frac{w_1}{2}, \frac{h}{2}]   (16)

P_1 = [-α\frac{w_1}{2}, -β\frac{h}{2}],  P_2 = [α\frac{w_1}{2}, -β\frac{h}{2}]   (17)

P_3 = [α\frac{w_1}{2}, β(\frac{h}{2} - h_2)],  and  P'_3 = [-γ(w_2 - w_1), -δ\frac{h_2}{2}]   (18)

P_4 = [γ w_2, -δ\frac{h_2}{2}],  P_5 = [γ w_2, δ\frac{h_2}{2}]   (19)

P_6 = [-α\frac{w_1}{2}, β\frac{h}{2}],  and  P'_6 = [-γ w_2, δ\frac{h_2}{2}]   (20)
Robust Classification and Parameter Optimization
The coordinate-wise optimization is driven by measuring the fitting error of the planes contained in
the primitive. Thus, the surface fitting is an imperative part of the optimization and can greatly
affect the convergence to the correct solution.
In order to ensure that outliers are removed from the surface fitting, we perform a robust classification
using Gaussian Mixture Models (GMM) to separate the outlier points produced by objects such as
elevator engines, air-conditioners and chimneys (which are commonly found on roofs), thereby
removing the otherwise significant bias introduced by the outliers into the distribution of the true
surface points lying on the roof. Hence, the classification makes the surface fitting more robust
and less susceptible to the presence of outliers, which subsequently improves the performance of the
optimization and the accuracy of the fitted surfaces.
During the optimization, a GMM is used to model the distribution of the elevation of all points
p ∈ P inside the planar surface boundaries. A GMM is a superposition of K Gaussian densities of
the form,

p(x) = \sum_{k=1}^{K} π_k N(x | μ_k, Σ_k)   (21)

where each Gaussian density N(x | μ_k, Σ_k) is a component of the mixture and has its own mean μ_k
and covariance Σ_k. The parameters π_k are the mixing coefficients, for which π_k ≥ 0 and \sum_{k=1}^{K} π_k = 1.
The calculation of the parameters π = {π_1, ..., π_K}, μ = {μ_1, ..., μ_K} and Σ = {Σ_1, ..., Σ_K}
is performed using an expectation-maximization (EM) algorithm which maximizes the log of the
likelihood function given by,

ln p(X | π, μ, Σ) = \sum_{n=1}^{N} ln \left\{ \sum_{k=1}^{K} π_k N(x_n | μ_k, Σ_k) \right\}   (22)

where X = {x_1, ..., x_N} are the data samples, i.e. elevations.
Figure 25: Example classification of points using a Gaussian mixture model consisting of 3 components.
(a) Roof cluttered with objects. (b) Point classification (G_0 - red, G_1 - green, G_2 - blue).
(c) The three components of the GMM for 25(b). (d) The values of the GMM for 25(a):

GMM comp.   Mixing coeff. (κ)   Mean (μ)    Variance (σ^2)
G_0         0.684208            0.526408    0.0000000479
G_1         0.225278            0.538507    0.0000405911
G_2         0.0905144           0.493301    0.000859696
Figure 25(a) shows a building with several objects of different elevations located on the roof; the
parameters for the corresponding GMM components are shown in Table 25(d). We define the
dominant component of the GMM, G_{dom}, to be the component with the highest mixing coefficient
κ_{dom} in the GMM, e.g. G_0 in Table 25(d). The dominant component G_{dom} is then used to classify
all the points p ∈ P whose probability is maximized in G_{dom} (equation 23) as the true surface points
P_{inliers} ∈ P, and everything else as outliers P_{outliers} = P − P_{inliers}.

P_{inliers} = ∀p ∈ P ∀i ∈ {0,1,2} (Pr(p | G_{dom}) ≥ Pr(p | G_i))   (23)

Figure 25(b) shows all the points inside the detected boundaries color-coded based on the
distribution that maximizes their probability. Red points have the highest probability in G_0 (which
is the dominant component), green points in G_1 and blue points in G_2.

Using only the inlier points, the fitting equation 13 now becomes,

E_{error} = \sum_{n=0}^{N} \sum_{\forall p \in P_{inliers}} (χ_p − f(χ_p, α, β))^2   (24)
where N is the number of planes contained in the primitive.
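As an illustration of the inlier selection of equations 21-23, the sketch below fits a three-component GMM to roof elevations with scikit-learn and keeps the points assigned to the component with the largest mixing coefficient. The library choice, the synthetic elevations and the helper name are assumptions for the example only.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def classify_roof_inliers(elevations, n_components=3):
    """Classify roof points as inliers of the dominant GMM component (eqs. 21-23).

    Points whose posterior probability is maximized by the component with the
    largest mixing coefficient are kept; everything else is treated as outliers
    (roof clutter such as A/C units, elevator engines, chimneys).
    """
    x = np.asarray(elevations, dtype=float).reshape(-1, 1)
    gmm = GaussianMixture(n_components=n_components, random_state=0).fit(x)
    dominant = int(np.argmax(gmm.weights_))      # component with highest pi_k
    labels = gmm.predict(x)                      # component maximizing Pr(p | G_i)
    return labels == dominant, gmm

# Toy example: a flat roof at ~12 m with clutter at ~13.5 m and ~15 m.
rng = np.random.default_rng(0)
z = np.concatenate([rng.normal(12.0, 0.05, 700),
                    rng.normal(13.5, 0.10, 200),
                    rng.normal(15.0, 0.30, 100)])
inliers, gmm = classify_roof_inliers(z)
print("inliers:", inliers.sum(), "of", len(z), "mixing coeffs:", gmm.weights_.round(3))
```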
Experimental Results
An example is shown in Figure 26(a) where the detected boundaries are shown in red and the
automatically recovered interior boundaries produced by the optimization are shown in yellow. Note
that even for a human operator it is hard to identify the roof type of these buildings from Figure
26(a). The reconstructed buildings are shown in Figure 26(b), overlaid on the original data.
(a) Detected boundaries on the 2D XYZ
map(red lines) and recovered interior
boundaries(yellow lines).
(b) Reconstructed buildings from 26(a) overlaid on original
data(Gable roof type).
Figure 26: Automatic roof type identification using the parameterized geometric primitive for the
building block I.
A saltbox roof type is shown in Figure 27(a) and the automatic identification result using the
parameterized primitive is shown in Figure 27(b). Due to the symmetry constraints the roof is
divided into three planes instead of the actual two planes. However, this does not affect the visual
appearance of the reconstructed model since the two lower planes shown in Figure 27(b) with white
arrows are co-planar.
(a) A building with a salt-box roof. (b) Automatic identification result overlaid on the 2D
XYZ map. The lower two planes are co-planar.
Figure 27: Salt-box roofs.
(a) A building with a shed roof. (b) All values 0 ≤ α,β ≤ 1 are
global minimums. The two planes
are co-planar.
(c) Reconstruction in 3D. The yel-
low plane indicates a selection.
Figure 28: Shed roofs.
An example of a shed roof is shown in Figure 28(a) and the automatically reconstructed model in
Figure 28(c). Similarly, the two planes in this case are co-planar due to the symmetry constraints.
In addition, in this example any value for α,β is a global minimum since the roof surface is flat and
all planes will be co-planar.
Figure 29(a) shows an example of a building with an overlapping gable roof. Although this particular
roof type cannot be handled by the I-shaped primitive, the optimization converges to the best
solution, which in this case is the single gable roof. This demonstrates the robustness and effectiveness
of the parameterized primitive in handling even cases where the roof shape significantly deviates
from the shapes in Figure 21.
(a) A building with an overlapping gable roof. (b) Best solution returned by the optimization.
Figure 29: Automatic roof type identification using the I-shaped parameterized geometric primitive.
Complex Linear Roof Types: Polygonal Primitive
The parameterized geometric primitives described in Section can be used for the extraction of
buildings with common linear roof types. However, the extraction of buildings with complex linear
roof types poses a more challenging problem. We introduce a flexible polygonal primitive for the
identification of buildings with complex linear surfaces. A unique characteristic of the polygonal
primitive is its dual modality(singular or complex) which can be used to extract and refine a single
linear surface or a set of linear surfaces.
Initially, the operator specifies a set of control points that define the shape of the building. Connec-
tivity information is then automatically derived and the primitive is initialized on the fly as having
a singular or complex mode. A singular mode polygonal primitive refers to the simple case where
only a single linear surface exists. A complex mode polygonal primitive refers to the case where
multiple linear surfaces exist in close proximity to each other. In the complex mode, linear surfaces
with neighboring control points or similar edges are integrated together into a single primitive shar-
ing common geometry, such as points and edges. Thus, an entire building consisting of multiple
linear surfaces is optimized as a single complex mode primitive with shared geometry as opposed to
optimizing multiple linear primitives individually, which significantly reduces the computation time.
In addition, it guarantees that the solution will be consistent to the initial connectivity.
Similarly to the previous section, a robust classification using GMM is first performed on the interior points of each
surface contained in the primitive to remove the outliers, followed by a coordinate-wise optimization
for the refinement of the spatial location of the control points by minimizing the surface fitting error
of equation 13.
Step 1. The user marks a set of control points for a set of surfaces S.
Step 2. Automatically derived connectivity information determines the number of polygonal primitives B.
  1. Surfaces sharing edges and control points are merged into a single primitive consisting of multiple planar surfaces.
  2. Otherwise a single primitive is created consisting of a single planar surface.
Step 3. Perform a coordinate-wise optimization for each primitive b ∈ B.
  1. For each surface s ∈ S contained in b,
     • Fit a GMM given by equation 21 to the set of all points P lying on the surface s.
     • Use the dominant component G_{dom} of the GMM to find the set of inliers P_{inliers} ∈ P such that
       P_{inliers} = ∀p ∈ P ∀i ∈ {0,1,2} (Pr(p | G_{dom}) ≥ Pr(p | G_i)).
  2. Perform plane fitting to all surfaces s ∈ S which are part of b and measure the error given by equation 24.
  3. Find the global minimum by adjusting the spatial positions of the control points.
Table 1: The algorithm for the polygonal primitive.
Table 1 shows a description of the algorithm for this primitive. A key advantage of the polygonal
primitive is the fact that multiple linear surfaces sharing the same geometry are optimized together,
thereby significantly reducing the number of unknowns. Moreover, the optimized boundaries are
guaranteed to be water-tight without misalignments between neighboring surfaces.
Experimental Results
Figure 30(a) shows an example using this primitive. In this case, the primitive is initialized in
singular mode since only one surface is present. Figure 30(b) on the other hand, shows an example
of four polygonal primitives being used to identify a complex building. In this case, two complex
mode polygonal primitives and 2 singular mode polygonal primitives were used.
(a) The green points are the control points
clicked by the user.
(b) The yellow lines indicate the initial user de-
tected boundaries. The red lines indicate the re-
fined boundaries after the optimization.
Figure 30: (a) Singular mode polygonal primitive. (b) Complex mode polygonal primitive
(a) Complex building with sloped surfaces extracted us-
ing 1 complex polygonal primitive consisting of 9 linear
surfaces.
(b) Building from Figure 31(a). The optimization included all surfaces (9) and shared the same control points
shown in green with white cross-arrows.
Figure 31: Example of a complex mode polygonal primitive consisting of 9 linear surfaces.
Figure 31(b) shows the reconstructed model of a building identified by a single complex mode
polygonal primitive containing nine linear surfaces shown in Figure 31(a). As previously explained,
the optimization minimizes the error function given by,
E_{error} = \sum_{n=0}^{9} \sum_{\forall p \in P_{inliers}} (χ_p − f(χ_p, α, β))^2   (25)

for all nine surfaces together and refines the spatial positions of the control points indicated by the
white cross-arrows. In this particular example the number of unknown parameters is reduced from
38 (due to duplicates) to 18 points, which reduces the dimensionality of the optimization's search
space and subsequently reduces the computational time.
Non-linear Roof Types: Ellipsoidal Primitive
The parameterized geometric primitives and the polygonal primitive can successfully deal with any kind
of building with linear roof types; however, they cannot be used to extract buildings containing
non-linear surfaces such as domes, stadiums, etc. Hence, we introduce an ellipsoidal primitive designed to
handle all types of non-linear surfaces, either dome-like or stadium-like (hollow), by fitting an ellipsoid
to the data given by the equation,

\frac{x^2}{α^2} + \frac{y^2}{β^2} + \frac{z^2}{γ^2} − 1 = 0   (26)
Initially, the operator marks three control points: a point on the perimeter, the centroid,
and another point on the perimeter in the direction perpendicular to that formed
by the first point and the centroid. The input control points are used to determine the parameters
α and β in equation 26. A non-linear optimization is then performed to recover the optimal value
for the single unknown variable γ such that equation 26 is minimized.
Figure 32: Ellipsoidal primitive. The parameters α and β are determined from the user defined
control points.
Figure 32 shows the ellipsoidal primitive. The parameters α and β are determined from the user
defined control points, and the unknown parameter γ, which controls the elevation, is recovered
by minimizing the fitting function using a non-linear optimization. Additionally, the type and
orientation of the fitted surface can be automatically determined as dome-like or stadium-like.
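A minimal sketch of the γ recovery is shown below, assuming the roof samples are already expressed in the primitive's local frame and α, β are known from the user's control points. The residual is the implicit form of equation 26 and SciPy's non-linear least squares stands in for the optimizer; the synthetic dome and all names are illustrative.

```python
import numpy as np
from scipy.optimize import least_squares

def fit_gamma(points, alpha, beta, gamma0=10.0):
    """Recover the ellipsoid's vertical semi-axis gamma (equation 26).

    points: (N, 3) roof samples in the primitive's local frame (centroid at origin).
    The residual is the implicit expression x^2/a^2 + y^2/b^2 + z^2/g^2 - 1.
    """
    x, y, z = points.T

    def residuals(g):
        return x**2 / alpha**2 + y**2 / beta**2 + z**2 / g[0]**2 - 1.0

    sol = least_squares(residuals, x0=[gamma0])
    return abs(sol.x[0])

# Toy dome: upper half of an ellipsoid with semi-axes (30, 20, 12).
rng = np.random.default_rng(1)
u, v = rng.uniform(0, 2*np.pi, 2000), rng.uniform(0, np.pi/2, 2000)
pts = np.c_[30*np.cos(u)*np.sin(v), 20*np.sin(u)*np.sin(v), 12*np.cos(v)]
print("recovered gamma:", round(fit_gamma(pts, alpha=30.0, beta=20.0), 2))
```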
Experimental Results
Figure 33(a) shows the user-defined control points. After the optimization converges, the result is
shown overlaid on the original 2D XYZ map in Figure 33(b).
(a) User-defined control points are connected by the yellow lines.
(b) The perimeter of the fitted ellipsoid is overlaid on the
2D XYZ map.
Figure 33: Ellipsoidal primitive. The parameters α and β are determined from the user defined
control points in (a). The orientation and type is automatically determined by the optimization (b).
Figure 34(a) shows the arena of Figure 33 with a dome-like non-linear roof reconstructed using
the refined boundaries produced by the primitive. Similarly, Figure 34(b) shows a stadium-like
structure(hollow) being overlaid on the original data. The orientation and type of the roof was
automatically determined in both cases.
(a) Dome-like structure. (b) Stadium-like structure.
Figure 34: Non-linear building structures.
Probabilistic Building Extraction
An overview of the probabilistic building extraction method is shown in Figure 35. Initially, a normal
map is computed by considering local neighborhood information to calculate a normal vector for
each point in the XYZ map. The normal and XYZ maps are then used by the probabilistic clustering
which segments the map into disjoint, contiguous regions. Next, unstable regions and regions which
resemble ground, trees and vertical features are removed from further processing. Finally, the 2D
exterior boundaries are extracted for all the remaining regions.
Figure 35: Automatic Building Extraction.
Normal Computation
Local neighborhood information is used to compute the normal at each point in the map. For every
point P_i of an XYZ map M we define the normal N_{P_i} of that point as,

N_{P_i} = \frac{1}{8} \sum_{j=1}^{8} N_{P_j}   (27)
where N_{P_j} is the normal computed with the neighboring point P_j within the 8-neighborhood system.
Each of the 8 normals N_{P_j} is computed as the cross product of the vectors connecting the point
P_i and two consecutive (in clockwise order) neighboring points P_j, P_{j+1}, as indicated by the vectors
\vec{a}, \vec{b}, \vec{c}, \vec{d}, \vec{e}, \vec{f}, \vec{g}, \vec{h} in Figure 36.
Figure 36: Normal computation. The normal vector (blue) for each point P_i is computed as the
normalized sum of the 8 cross products \vec{a} × \vec{b}, \vec{b} × \vec{c}, \vec{c} × \vec{d}, etc., as given by equation 27.
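The normal computation of equation 27 can be sketched as follows; the nested loops and the clockwise neighbour ordering are an illustrative (unoptimized) reading of the description above, not the system's implementation.

```python
import numpy as np

# Clockwise 8-neighborhood offsets (row, col), starting at the upper-left neighbor.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def normal_map(xyz):
    """Per-point normals of a 2D XYZ map as in equation 27.

    For every interior point, the 8 vectors towards its neighbours are formed and
    the cross products of consecutive (clockwise) vectors are summed and normalized.
    """
    h, w, _ = xyz.shape
    normals = np.zeros_like(xyz)
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            p = xyz[r, c]
            vecs = [xyz[r + dr, c + dc] - p for dr, dc in OFFSETS]
            n = np.zeros(3)
            for j in range(8):
                n += np.cross(vecs[j], vecs[(j + 1) % 8])
            norm = np.linalg.norm(n)
            if norm > 1e-12:
                normals[r, c] = n / norm
    return normals

# Toy usage: a tilted plane z = 0.1*x, whose interior normals should all agree.
ys, xs = np.mgrid[0:16, 0:16].astype(float)
xyz = np.dstack([xs, ys, 0.1 * xs])
print(normal_map(xyz)[8, 8].round(3))
```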
Probabilistic Clustering
A plethora of different approaches and techniques have been proposed in recent years
to resolve the complex problem of data segmentation, and in particular of airborne LiDAR data.
However, existing approaches, such as [FZ03a, MSS+08, PYN08], rely on data-dependent parameters
in order to segment the data into different regions of similar height and/or orientation. Our
proposed approach is a data-independent, automatic, deterministic segmentation with an underlying
region-growing algorithm based on the probabilistic analysis of the geometric properties of the data.
A region R_i is defined as a set of points S_{R_i} ∈ M, where M is the XYZ map. The clustering will
result in T disjoint, contiguous regions such that,

M = \sum_{i=0}^{T} R_i  and  ∀i ∈ T ∀j ∈ T (i ≠ j | R_i ∩ R_j = ∅)   (28)
We define a region descriptor D_{R_i} = (G_d, G_{\vec{n}}) associated with each region R_i, consisting of two
probability distributions:
1. A 1D probability distribution of the depths χ_d of all points P ∈ S_{R_i}, which is modeled by the
1D Gaussian function,

G_d = \frac{1}{\sqrt{2πσ_d^2}} e^{-\frac{1}{2σ_d^2}(χ_d − μ_d)^2}   (29)

where σ_d is the variance and μ_d is the mean depth.

2. A 3D probability distribution of the normals \vec{N}_p of all points P ∈ S_{R_i} (as computed earlier),
which is modeled by the 3D Gaussian function,

G_{\vec{n}} = \frac{1}{(2π)^{3/2} |Σ_{\vec{n}}|^{1/2}} e^{-\frac{1}{2}(\vec{N}_p − \vec{N}_μ)^T Σ_{\vec{n}}^{-1} (\vec{N}_p − \vec{N}_μ)}   (30)

where Σ_{\vec{n}} is the covariance matrix and \vec{N}_μ is the mean normal vector.
We initialize the first region R_0 with the left-most, top-most point in the map M, and progressively
add to the region neighboring points P_i which satisfy equations 31 and 32, given by,

Pr_d(d_{P_i}) ≥ Pr_d(μ_d + κσ_d)   (31)

Pr_{\vec{n}}(\vec{n}_{P_i}) ≥ Pr_{\vec{n}}(\vec{N}_μ + κ\vec{V}_{Σ_{\vec{n}}})   (32)

where Pr_d and Pr_{\vec{n}} are the probability functions G_d and G_{\vec{n}}, d_{P_i} and \vec{n}_{P_i} are the depth and normal
vector of the point P_i, μ_d and σ_d are the mean depth and variance of the distribution G_d of the region
R_0 respectively, \vec{N}_μ is the mean normal vector of the distribution G_{\vec{n}} of the region R_0, \vec{V}_{Σ_{\vec{n}}} is the
diagonal of the covariance matrix Σ_{\vec{n}} of the distribution G_{\vec{n}} from equation 30, and κ, which is set to
1, controls the maximum allowed deviation about the mean for G_d and G_{\vec{n}}.
Figure 37 shows the effect of the factor κ in equations 31 and 32. Any point P_i with a probability
higher than the probability of the distribution's mean perturbed by a factor κ of the variance
is accepted as belonging to the region R_0. The gray area shown in the graph of Figure 37 indicates
all accepted probabilities of this region.
Once a point P_i is determined to belong to a region, the depth and normal distributions G_d and
G_{\vec{n}} are recomputed to reflect the depths and normals of all points in that region. This process is
repeated until no more neighboring points satisfy the conditions of equations 31 and 32, in which
case a new region R_1 is initialized with a point that has not yet been considered.
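The sketch below illustrates the region-growing idea for the depth distribution only (equations 29, 31 and the variance floor of equation 33); the thesis additionally maintains the 3D normal distribution of equations 30 and 32, which is omitted here for brevity. The 4-connected neighbourhood, the toy depth map and all names are assumptions for the example.

```python
import numpy as np
from collections import deque

def grow_regions(depth, kappa=1.0):
    """Probabilistic region growing over a depth map (1D part of eqs. 29, 31, 33).

    A region keeps a running Gaussian of its depths; a 4-connected neighbour is
    absorbed if its depth probability is at least that of the mean perturbed by
    kappa standard deviations (equivalently |d - mu| <= kappa * sigma).  The
    variance floor max(sigma, e^(1/p)) avoids the degenerate single-point case.
    """
    h, w = depth.shape
    labels = np.full((h, w), -1, dtype=int)
    region_id = 0
    for seed in zip(*np.nonzero(labels < 0)):
        if labels[seed] >= 0:
            continue
        members = [depth[seed]]
        labels[seed] = region_id
        queue = deque([seed])
        while queue:
            r, c = queue.popleft()
            mu, sigma = np.mean(members), np.std(members)
            sigma = max(sigma, np.exp(1.0 / len(members)))   # equation 33
            for nr, nc in ((r-1, c), (r+1, c), (r, c-1), (r, c+1)):
                if 0 <= nr < h and 0 <= nc < w and labels[nr, nc] < 0:
                    if abs(depth[nr, nc] - mu) <= kappa * sigma:   # equation 31
                        labels[nr, nc] = region_id
                        members.append(depth[nr, nc])
                        queue.append((nr, nc))
        region_id += 1
    return labels

# Toy scene: flat ground at 0 m with a 10 m "building" block.
d = np.zeros((40, 40)); d[10:25, 12:30] = 10.0
print("regions found:", grow_regions(d).max() + 1)
```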
Figure 37: An example of a depth distribution G_d. All points P_i whose depth probability Pr_d(d_{P_i})
falls inside the gray area defined by Pr(μ_d ± κσ_d) are accepted as belonging to the current region (iff
the same is also true for the normal distribution G_{\vec{n}}). The orange line represents the degenerate
case where the variance σ_d = 0.
An obvious problem that arises is the initial degenerate condition of the distributions of each
region descriptor D_{R_i}. As previously mentioned, we begin the process by initializing a region R_i
with a single point in the map M. Computing a Gaussian distribution using a single point is a
degenerate case and causes the variance σ_d of the depth distribution and the covariance matrix Σ_{\vec{n}}
to be zero (orange line in Figure 37). As a result, if the variance is zero then the only condition that
satisfies equations 31 and 32 is if the depth d_{P_i} of a point equals the mean μ_d of the depth distribution
G_d, and similarly the normal \vec{n}_{P_i} of a point equals the mean \vec{N}_μ of the normal distribution G_{\vec{n}}, i.e.
d_{P_i} = μ_d ∧ \vec{n}_{P_i} = \vec{N}_μ. In order to overcome this problem during the initial stages, we express the
variance and the diagonal vector of the covariance matrix as a function of the current number of points
in the region R_i, given by,

σ'(p) = max(σ_G, e^{\frac{1}{p}})   (33)

where p is the current number of points in the region R_i, max(α, β) is a function that returns the
maximum of the two input parameters α, β, and σ_G is the variance computed by fitting a Gaussian
distribution to the data. As shown in Figure 38, the variance decreases exponentially as the
number of points increases. This is because the more points are contained in a region, the more
stable and robust the variance calculation will be when fitting the depth and normal distributions
to the data.
Figure 38: The red line represents the variance σ_G computed from the data and the blue line
represents the function e^{1/p}. The variance function σ'(p) ensures that the variance and covariance
matrix of the depth and normal distributions, respectively, will never equal zero, and is represented
by the gray overlaid line.
Figure 39 shows results of the probabilistic clustering applied on areas in Baltimore and Denver.
Each disjoint region is represented by a different color. Note that the colors are reused and may
appear for different regions.
Classification and Removal of Unstable Regions, Ground, Trees and Ver-
tical Features
The segmentation returns a set of disjoint, contiguous regions which may include ground, trees, and
other unwanted areas. We proceed by removing unstable regions, ground, trees and vertical features
such as walls from further processing. Any region R_i which meets any of the following criteria is
removed (a brief sketch of these tests follows the list below):
1. A region R_u is classified as unstable iff it contains fewer than three points, since a minimum of
three points is required to define a surface.
2. A region R_{vf} is classified as a vertical feature iff the mean normal vector \vec{N}_μ of its normal
distribution G_{\vec{n}} is near perpendicular to the nadir direction.
3. The region R_g whose depth distribution G_d has the lowest mean depth μ_d and contains the
largest number of points is classified as the ground.
Figure 39: Color-coded segmentation results. All points contained in the same disjoint, contiguous
region are denoted by the same color. Each map size is 1K×1K. (a) Baltimore Area 1, (b) Atlanta Area 1,
(c) Baltimore Area 2, (d) Denver Area 1.
4. A region R_t is classified as a tree iff every element of the diagonal vector \vec{V}_{Σ_{\vec{n}}} of the covariance
matrix Σ_{\vec{n}} in equation 30 is higher than a user-defined value τ_σ and the mean depth μ_d in
equation 29 is less than a user-defined value τ_d. This criterion results from the observation
that trees are non-linear surfaces which exhibit a large normal variation, as opposed to linear
surfaces.
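The simplified sketch below applies criteria 1, 2 and 4 to a single region's descriptor; the thresholds τ_σ, τ_d and the "near perpendicular" test value are placeholder assumptions, and the ground test of criterion 3 is omitted since it requires comparing all regions at once.

```python
import numpy as np

def classify_region(num_points, mean_normal, normal_var_diag, mean_depth,
                    tau_sigma=0.05, tau_d=12.0):
    """Apply the per-region removal criteria 1, 2 and 4 (a simplified sketch).

    tau_sigma and tau_d are the user-defined thresholds named in the text; their
    values here are placeholders.  Criterion 3 (ground = region with the lowest
    mean depth and the most points) needs all regions at once, so it is omitted.
    """
    if num_points < 3:
        return "unstable"                              # criterion 1
    nz = abs(np.asarray(mean_normal, dtype=float)[2])  # alignment with the nadir axis
    if nz < 0.2:                                       # mean normal nearly horizontal
        return "vertical feature"                      # criterion 2
    if np.all(np.asarray(normal_var_diag) > tau_sigma) and mean_depth < tau_d:
        return "tree"                                  # criterion 4
    return "building candidate"

print(classify_region(500, (0.05, 0.02, 0.99), (0.001, 0.001, 0.001), 30.0))
print(classify_region(400, (0.30, 0.30, 0.90), (0.200, 0.300, 0.200), 8.0))
```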
In special scenarios, such as disaster and evacuation simulations, a complete reconstruction of the
entire city is required. Therefore it is imperative that we model all structures in the scene including
buildings, trees, ground, bridges, raised highways, boats etc. In such cases, the criteria 3 and 4
described above are turned off during the processing. This results in a complete reconstruction of
the entire city using simple polygonal 3D models.
Region Boundary Extraction
The final step is to extract exterior boundaries for the remaining regions. To achieve this, we employ
Suzuki's algorithm for the extraction of the region boundaries as described in [SA85]. The result is
a two-dimensional closed contour B_{R_i} for each remaining region R_i. The boundary B_{R_i} is a dense
set of sequential, neighboring points. Figure 40(a) shows the extracted boundaries overlaid on the
depth map for the complex building shown in Figure 40(c). This example demonstrates the high
accuracy of the extracted boundaries resulting from the preservation of structural details by the
segmentation, such as elevator engines, air-conditioners, etc. The color-coded segmentation result
computed as described above is shown in Figure 40(b) for comparison.
Figure 40: Boundaries extracted from the segmented regions for a complex building. Boundary
points are displayed in blue and the lines connecting two neighboring boundary points are displayed
in yellow. (a) Extracted boundaries (dense). (b) Color-coded segmentation result. (c) Satellite image.
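Since OpenCV's findContours is based on the Suzuki-Abe border-following algorithm cited above [SA85], the boundary extraction step can be approximated in a few lines; the binary mask below is a toy stand-in for a segmented region, and CHAIN_APPROX_NONE keeps the dense point sequence described in the text.

```python
import numpy as np
import cv2

def region_boundary(mask):
    """Extract the dense exterior boundary of a binary region mask.

    RETR_EXTERNAL keeps only the outer contour; the result is the dense sequence
    of neighboring boundary points described in the text.
    """
    contours, _ = cv2.findContours(mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    # Keep the largest contour in case the mask has several connected blobs.
    contour = max(contours, key=cv2.contourArea)
    return contour.reshape(-1, 2)          # (x, y) boundary points in order

# Toy region: an L-shaped building footprint.
m = np.zeros((60, 60), np.uint8)
m[10:50, 10:30] = 1
m[30:50, 30:55] = 1
print("boundary points:", len(region_boundary(m)))
```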
Chapter 6
Geometry Reconstruction
Overview
Figure 41: Module 3: Geometry Reconstruction. Includes two methods corresponding to the two
building extraction methods presented in Chapter 5: Optimization-driven and Linear Reconstruction.
The building boundaries extracted as explained in Chapter 5 define the roof type of each building.
During the Geometry Reconstruction, the boundaries are used to create 3D model representations
for the buildings.
Building boundaries which were extracted using the primitive-based method are represented as
curvilinear features corresponding to linear and non-linear surfaces, and are already refined by the
coordinate-wise optimization. An optimization-driven reconstruction performs a final surface fitting
using the optimized parameters and refined spatial locations of the control points in order to create
the 3D models.
Similarly, the building boundaries which were extracted using the probabilistic method are repre-
sented as un-refined curvilinear features corresponding to linear surfaces and consisting of a dense
number of points. A linear reconstruction performs a surface fitting followed by a boundary sim-
plification and refinement. The fitted linear surfaces and the refined boundaries are then used to
create the 3D models.
The Geometry Reconstruction module takes as input the segmented regions and their boundaries
and generates a set of lightweight, water-tight, polygonal 3D models. The system overview is shown
in Figure 41.
Optimization-driven Reconstruction
Linear and Non-Linear Surface Fitting
In Chapter 5 we presented a minimal set of three primitives that can be used to identify and refine
boundaries containing buildings. The result of this refinement is the set of the refined boundaries,
the GMM classification of the points lying within the boundaries, and the optimal values for the
parameters corresponding to each primitive (parameterized I - (α, β), parameterized L - (α, β, γ, δ),
polygonal - spatial locations of the control points, ellipsoidal - γ), returned by the boundary
optimization.
Next, using the optimized values for all parameters and control points a least-squares fitting process
is performed to reconstruct the surfaces of the buildings. A linear surface fitting is performed on the
boundaries which were extracted by the parameterized or polygonal primitives. On the other hand,
a non-linear surface fitting is performed on the boundaries which were extracted by the ellipsoidal
primitive. The fitted surfaces are finally extruded in the vertical direction towards the ground, and
surfaces for the models’ facades are created. The result is high-fidelity, light-weight, water-tight,
polygonal 3D model representations of the scene.
Experimental Results
Figure 42(a) shows an example of a hip-roof identified and reconstructed using the I-shaped param-
eterized primitive. Similarly, Figure 42(b) shows the reconstruction of a pyramidal roof type. The
reconstructed 3D models are overlaid on the original geometry.
(a) Hip roof type. (b) Pyramidal roof type.
Figure 42: Reconstructed geometry resulting from the automatic identification using the I-shaped
parameterized geometric primitive.
Figure 43 shows examples of buildings of various complexities extracted using the polygonal primi-
tive. The example of the complex building shown in Figure 43(a) was reconstructed with 4 complex
polygonal primitives. The tower and flat buildings were reconstructed using the parameterized
primitive. Figure 43(b) shows an area reconstructed using singular and complex polygonal primitives.
Figure 45 shows a large-scale virtual environment created using the three primitives. The models
were reconstructed from the original data in Figure 44.
(a) Complex building extracted using 4 complex
polygonal primitives.
(b) An area containing complex buildings recon-
structed using the polygonal primitive.
(c) Reconstructed models. (d) Reconstructed models.
Figure 43: Buildings of various complexities extracted using the polygonal primitive.
Figure 44: Original LiDAR data.
Figure 45: Reconstructed models of USC campus overlaid on original data.
Linear Reconstruction
The Automatic Geometry Reconstruction method takes the segmented regions and their boundaries
and generates a set of lightweight, water-tight, polygonal 3D models. The process involves four steps
shown in Figure 46.
Firstly, linear surfaces are fitted to each of the regions. Secondly, the dense boundaries extracted in
Chapter 5 are simplified into boundaries consisting of a reduced set of points. A refinement process is
then performed on the boundaries, in order to remove irregularities resulting from the noise in the
data. Lastly, 3D models are created from the resulting boundaries and fitted surfaces.
Figure 46: Geometry Reconstruction module.
Surface Fitting
A planar least-squares fitting is performed on every region R_i to determine the best linear surface
passing through that region. We represent each fitted linear surface Π_i by a normal vector \vec{N}_{Π_i}
perpendicular to the surface and a scalar δ_{Π_i}, and we compute the new 3D positions of the points
of each region as the projection of each point onto that plane. Due to the possibly large
number of points contained in each region, a RANSAC-guided sampling ([BF80]) is used during the
fitting, primarily to reduce the computation time, increase stability and remove any small bias caused
by outliers.
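The sketch below shows a generic RANSAC plane fit with a least-squares refinement on the consensus set, standing in for the RANSAC-guided sampling described above; the iteration count, inlier threshold and synthetic roof samples are assumptions for the example.

```python
import numpy as np

def ransac_plane(points, n_iters=200, threshold=0.1, rng=None):
    """Fit a plane (unit normal n, offset d with n.p = d) to a region by RANSAC.

    Random minimal samples of three points propose candidate planes; the plane
    supported by the most inliers is refined by a least-squares fit (via SVD) on
    its inliers, which damps the influence of roof clutter and noise.
    """
    rng = np.random.default_rng(rng)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        if np.linalg.norm(n) < 1e-9:
            continue                                  # degenerate (collinear) sample
        n = n / np.linalg.norm(n)
        inliers = np.abs((points - p0) @ n) < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Least-squares refinement on the inliers: normal = smallest singular vector.
    q = points[best_inliers]
    centroid = q.mean(axis=0)
    _, _, vt = np.linalg.svd(q - centroid)
    n = vt[-1]
    return n, n @ centroid, best_inliers

# Toy region: a slightly noisy sloped roof surface with a few outliers.
rng = np.random.default_rng(2)
xy = rng.uniform(0, 10, (500, 2))
z = 0.2 * xy[:, 0] + 0.1 * xy[:, 1] + 5 + rng.normal(0, 0.02, 500)
z[:20] += 3.0                                         # clutter on the roof
n, d, inl = ransac_plane(np.c_[xy, z])
print("normal:", n.round(3), "inliers:", inl.sum())
```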
Boundary Simplification & Refinement
The boundaries extracted as described in Chapter 5 consist of a dense set of points. Firstly, boundaries
which are spatially close (within one pixel) to each other are grouped into the same element,
and further processing is performed on the entire set of boundaries contained in each element. The
reason for processing neighboring roof boundaries together is that they are more likely to have
similar (or perpendicular) orientations, which helps with the determination of the principal orientations
of the buildings. In the example of Figure 47 the boundaries are grouped into 47 elements
based on a one-pixel proximity. Examples of elements are shown in Figure 48.
Figure 47: The extracted roof boundaries after the refinement process using GMMs. The blue points
indicate boundary points and the yellow lines are edges connecting those points.
Next, a boundary refinement process is applied in order to remove the artifacts and linearize
the dense boundaries. A Gaussian Mixture Model (GMM) is used to classify the boundary points
into different orientations. As mentioned previously, a boundary B_i consists of a set of dense points.
A three-dimensional feature descriptor F_{P_j} is used to represent each boundary point P_j ∈ B_i and is
defined as,

F_{P_j} = (\vec{T}_x, \vec{T}_y, κ_e)   (34)

where T_x, T_y is the local tangent orientation and κ_e is the local extrinsic curvature, which is expressed
in terms of the discrete first (′) and second (″) derivatives as,

κ_e = \frac{x'y'' − y'x''}{(x'^2 + y'^2)^{3/2}}   (35)
In contrast to existing work, we do not assume that buildings always have four sides or parallel
sides; thus we do not use GMMs of fixed order K, but instead compute K using a Minimum
Description Length (MDL) estimator criterion proposed by [Ris83]. Hence, for each set of boundaries
in an element we first determine the best number of components the GMM should have and then
perform the fitting using the EM algorithm.
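A sketch of the order selection is given below; scikit-learn's BIC score is used as a stand-in for the MDL criterion (both penalize model complexity against the log-likelihood), and the synthetic three-dimensional descriptors merely mimic two dominant edge orientations plus a few outliers.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def best_gmm(features, max_k=8):
    """Pick the GMM order K by an information criterion, then fit with EM.

    The thesis uses Rissanen's MDL criterion; here BIC stands in for it.
    'features' are per-point descriptors F_P = (T_x, T_y, kappa_e) of equation 34.
    """
    X = np.asarray(features, dtype=float)
    models = [GaussianMixture(k, random_state=0).fit(X) for k in range(1, max_k + 1)]
    scores = [m.bic(X) for m in models]
    return models[int(np.argmin(scores))]

# Toy boundary descriptors: two dominant edge orientations plus a few outliers.
rng = np.random.default_rng(3)
f = np.vstack([rng.normal([1, 0, 0], 0.05, (200, 3)),
               rng.normal([0, 1, 0], 0.05, (200, 3)),
               rng.normal([0.5, 0.5, 2.0], 0.5, (20, 3))])
print("selected K =", best_gmm(f).n_components)
```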
Figure 48 shows the classification of the boundary points for three different elements. The color
of each boundary point P_j indicates the component of the GMM of that element which maximizes
the probability of the point's feature descriptor F_{P_j}. As can be seen, the primary advantage of
using a GMM for the classification of the boundary points is that it can better separate the outlier
points produced by noise and accumulated errors from the resampling process, thus removing the
otherwise significant bias during the boundary refinement. Moreover, the use of MDL to determine
the number of components K of the GMM allows the application of this technique to linear
as well as non-linear shapes.
Figure 48: Boundary grouping into elements and point classification using GMMs. The colors
indicate the component of the GMM that maximizes the probability of the feature descriptor F_{P_j}
of a point. (a) Element 1. (b) Element 2. (c) Element 3.
Finally, the classification of the points is used to cluster sequential boundary points into groups
which are then reduced to single lines using a least-squares line fitting algorithm. The refined and
linearized boundaries are shown in Figure 47. A significant reduction in the number of points and an
improvement in the linear structure of the roof boundaries is clearly evident; however, small artifacts
still remain. This is primarily caused by the similarity between the artifacts and the buildings'
features, which makes it very difficult to discriminate between real features and features due to
noise.
3D Model Creation
In order to create a 3D model representing each region, the previously fitted linear surfaces are used
to determine the elevation of the model. Next, the closed contour boundary B_{R_i} of each region R_i is
used to generate facades for the model by extruding vertical surfaces to the ground. The result is a
set of water-tight (i.e. no holes), light-weight (i.e. simple geometry), polygonal 3D models representing
the scene.
Experimental Results
Figure 49 shows a part of downtown Baltimore consisting of a variety of buildings
of various shape complexities. Figure 49(a) shows a render of the original pointcloud data for
this area and Figure 49(b) shows a render of the automatically generated 3D model for the same
area. The accuracy and high level of detail can be visually verified. Note that for this case the
comparison is between the reconstructed 3D model and the entire area, thus all regions including
the ground, bridges, trees, etc. are kept. A quantitative comparison between the geometric features
of our generated 3D model and the triangulated mesh of the original data is shown in Table 2. As
demonstrated, the reconstructed 3D models are visually pleasing and closely resemble the original
mesh while at the same time achieving a surface fitting error of ε = 0.972 and a polygon reduction
of 97.5% relative to the original.
Features →      Verts      Edges      Polygons   Size (Mb)
Mesh            1046529    3135496    2088968    151.4
Our Model       90656      135984     51442      9.3
Reduction (%)   99.9%      95.6%      97.5%      93.8%

Table 2: Mesh vs Model comparison. Size refers to the file size of an ASCII OBJ file format measured
in Mb. The reduction % for each geometric feature is shown in red.
Figure 50 shows the generated 3D models for a 16 km² area of downtown Baltimore and surrounding
areas. Close-ups of the downtown area are shown in Figure 52. In this example, the input
data was divided into 36 different maps of 1K×1K each during the preprocessing.
Figure 49: Mesh vs Model comparison. (a) Original pointcloud data. (b) The automatically generated 3D model for (a).
Note that for this example we have kept all regions (ground, trees, etc.) for comparison purposes.
The generated 3D models corresponding to each of those maps were later combined in a postprocessing
step. Misclassified trees and small patches of separately segmented high ground resulting from the
presence of hilly areas were manually removed during postprocessing. The Baltimore city model was generated
in 11 hrs, as indicated in Table 4, using an AMD Athlon 5200+ machine with 6Gb of RAM. Figure 51
shows a frame from a simulation of a cataclysmic disaster scenario using the generated 3D models.
The simplicity and compactness of the models allows for their use in a variety of different applications
and complex simulations. Figures 53 and 54 and Figures 55, 56 show the reconstructed 3D
models for the cities of Denver, Oakland and Atlanta respectively.
Figure 50: Complete 3D reconstruction of downtown Baltimore and surrounding areas. The area
covered is about 16 km² (i.e. input data size 10K×4K). (a) Reconstructed models. (b) Intensity return.
Figure 51: A frame from a simulation of a cataclysmic disaster scenario taking place in Baltimore using our lightweight, standalone 3D models.
Figure 52: Close up of downtown Baltimore and surrounding areas.
Figure 53: Close-up of downtown Denver.
Figure 54: Close-up of downtown Oakland.
Figure 55: Downtown Atlanta(north).
Figure 56: Downtown Atlanta(south).
Evaluation
The qualitative evaluation of the reconstructed 3D models is very difficult since there is no ground
truth for comparison. In our work, we evaluate the reconstructed models qualitatively and quantitatively
in terms of the following criteria:
• Completeness. Completeness is defined as the ratio of the number of reconstructed buildings
in the model and the actual number of buildings in the data.
• Quality. To determine the quality of the models, a qualitative evaluation is performed by
visually inspecting the models for any artifacts such as misalignments between neighboring
surfaces and inconsistencies between the surfaces and the original data.
• Correctness. The correctness and accuracy of the reconstructed models are measured by overlaying
the reconstructed models on the original LiDAR data (Figure 57(a)). We quantitatively
evaluate the models by measuring the deviation of the fitted surfaces from the actual data. In
our calculations we measure this deviation as the square root of the sum of the distances of
each original point from the reconstructed surface it belongs to.
(a) Model overlaid on triangulated mesh of the original
data.
(b) Geo-referencing. Textured model with satellite im-
agery.
Figure 57: (a) Evaluation of correctness and accuracy. (b) Geo-referencing. Satellite imagery is
registered to the reconstructed polygonal models.
• Geo-referencing. The reconstructed models represent areas from the real world; therefore, any
imagery capturing the appearance of that same geometry should register precisely to the
reconstructed model as well. The error of the registration is an indication of the deviation
of the reconstructed geometry from the original (Figure 57(b)). This deviation is measured as
the square root of the sum of the squared differences, in pixels, between the projected
3D points of the model and the corresponding 2D image points. These correspondences are
specified interactively by an operator.
• Scalability. The scalability is measured in terms of the time required for the generation of a
3D model.
1. In the semi-automatic primitive-based method, our experiments show that the required
time is based on the complexity of the buildings and the primitives used. Table 3 indicates
the time required for the reconstruction of average-sized buildings, i.e. occupying an area
of up to 1000 pixels. The parameterized primitive refers to the flat, shed, gable, hip,
pyramidal, mansard and saltbox roof types.
Primitive ↓ / Task →   Detection              Identification & Reconstruction   Total
Parameterized          1s                     1s-5s                             2s-6s
Polygonal              interactive (clicks)   1s-43s                            ≈ 45s
Ellipsoidal            interactive (clicks)   4s                                ≈ 6s

Table 3: Scalability for the semi-automatic primitive-based method. Processing time required for the
reconstruction of average-sized building models.
2. In the automatic probabilistic method, our experiments have shown that the required
time for the reconstruction is based primarily on the number and size of the segmented
regions. Table 4 shows the times required for the reconstruction of U.S. cities, for each
step of our pipeline:
Module ↓ / Area →   Baltimore   Atlanta   Denver (area 1)   Denver (area 2)
Preprocessing       2hrs        3hrs      0.5hr             1hr
Segmentation        8hrs        9hrs      1hr               2hrs
Modeling            1hr         1.5hrs    0.2hr             0.5hr
Total               11hrs       13.5hrs   1.7hrs            3.5hrs

Table 4: Scalability for the automatic probabilistic method. Processing time required for the
reconstruction of U.S. cities. Benchmark results were generated on an AMD Athlon 5200+ machine with
6Gb of RAM.
Table 5 shows the quantitative and qualitative evaluation of our results with the automatic method.
Densely built urban areas such as downtown districts generally resulted in very accurate and visually
pleasing 3D models, which is mainly due to the fact that they are low-vegetation areas with buildings of high
complexity and of relatively large areas. On the other hand, suburban and residential areas
performed the poorest, primarily due to the high vegetation density and, in addition, due to the
minimal differences between the sizes and complexities of the buildings and the vegetation. For
this reason, it is very hard even for a human operator to accurately evaluate the completeness and
correctness of the suburban models strictly by looking at the input data.
Criterion ↓ / Area →   Baltimore   Atlanta   Denver (suburbs)   Denver (downtown)
Completeness           100%        99%       90%-95%            98%
Correctness            92%         87%       80%-90%            92%
Accuracy               0.972       0.83      0.514              0.469
Geo-referencing        0.556       0.672     0.435              0.428

Table 5: Qualitative and quantitative model evaluation.
Chapter 7
Texture Composition
Overview
In this chapter we present a flexible blending method for the generation of photorealistic textures
from multiple optical sensor resources. The system overview is shown in Figure 58.
Figure 58: Module 4: Texture Composition.
Image Registration
The input to this module is a set of 3D models representing the scene and a set of images capturing
the appearance of the same scene. The output is the recovered camera poses corresponding to the
images, as shown in Figure 58.
A plethora of techniques have already been proposed for solving the problem of camera pose recovery.
Firstly, we define the pinhole-camera model used to describe each camera. The extrinsic and intrinsic
parameters of each camera are specified by the camera matrix C in equation 36,
$$C = \underbrace{\begin{bmatrix} \alpha & -\alpha\cot(\theta) & u_0 \\ 0 & \frac{\beta}{\sin(\theta)} & v_0 \\ 0 & 0 & 1 \end{bmatrix}}_{\text{intrinsic}} \underbrace{\begin{bmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \end{bmatrix}}_{\text{extrinsic}} \qquad (36)$$
where $\alpha = kf_x$, $\beta = kf_y$, $(f_x, f_y)$ is the focal length on the x and y axis respectively, $\theta$ is the skew
angle, $(u_0, v_0)$ is the principal point on the x and y axis respectively, and $r_{1-3}$, $t_{x-z}$ determine the
camera's rotation and translation relative to the world.
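For illustration, the camera matrix of equation 36 can be assembled directly from these parameters; the following Python/NumPy sketch (the function name and argument conventions are assumptions, not part of the thesis implementation) is a minimal example.

    import numpy as np

    def camera_matrix(alpha, beta, theta, u0, v0, R, t):
        # Intrinsic matrix of equation 36; theta is the skew angle in radians.
        K = np.array([[alpha, -alpha / np.tan(theta), u0],
                      [0.0,    beta / np.sin(theta),  v0],
                      [0.0,    0.0,                   1.0]])
        # Extrinsic matrix [R | t]: rotation and translation relative to the world.
        Rt = np.hstack([np.asarray(R, dtype=float),
                        np.asarray(t, dtype=float).reshape(3, 1)])
        return K @ Rt  # 3x4 camera matrix C

A world point X in homogeneous coordinates then projects as x = C [X; 1], with pixel coordinates obtained after dividing by the last component.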
Given a minimum of three 2D to 3D correspondences specified interactively by the operator the
camera extrinsic and intrinsic parameters can be accurately estimated. The camera pose estimation
is performed using a non-linear Levenberg-Marquardt optimization [Lev44, Mar63] which minimizes
the error function $E_{C_k}$ for each camera $C_k$,

$$E_{C_k} = \frac{1}{n}\sum_{i=0}^{n}\sqrt{(I^i_x - P^i_x)^2 + (I^i_y - P^i_y)^2} \qquad (37)$$

where $I^i$ is the ith image point, $P^i$ is the projection of the ith 3D world point and $n$ is the number
of 2D to 3D correspondences.
An obvious problem of using an optimization to recover the camera extrinsic and intrinsic parameters
is that it can get stuck in local minima and fail to converge to the global minimum solution. For
example, varying the focal length $(f_x, f_y)$ has the same effect as varying the translation $t_z$. Similarly,
varying the principal point $(u_0, v_0)$ has the same effect as varying the translation $t_x$ or $t_y$. In order
to significantly decrease the likelihood of this problem occurring, we optimize the camera extrinsic
and intrinsic parameters in an iterative, alternate process.
Firstly, the operator provides an initial estimate of the camera parameters. In the majority of the
cases, we have found that just pointing the camera towards the object suffices and the optimization
converges to a global minimum. Secondly, we perform the optimization in an iterative process. The
extrinsicparametersareoptimizedfirstuntiltheoptimizationconvergesfollowedbyanoptimization
of the intrinsic parameters. This process is repeated until the optimization converges to a minimum
solution.
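A minimal sketch of this alternating scheme, using SciPy's least-squares solver in its Levenberg-Marquardt mode, is given below; the zero-skew intrinsic parameterization, the Rodrigues rotation vector for the extrinsics, and all function names are assumptions made for illustration and are not prescribed by the thesis.

    import numpy as np
    from scipy.optimize import least_squares
    from scipy.spatial.transform import Rotation

    def project(intr, extr, X):
        # intr = [fx, fy, u0, v0] (zero skew assumed);
        # extr = [rx, ry, rz, tx, ty, tz] (Rodrigues rotation vector + translation).
        fx, fy, u0, v0 = intr
        R = Rotation.from_rotvec(extr[:3]).as_matrix()
        Xc = X @ R.T + extr[3:]                      # world -> camera coordinates
        return np.stack([fx * Xc[:, 0] / Xc[:, 2] + u0,
                         fy * Xc[:, 1] / Xc[:, 2] + v0], axis=1)

    def refine_pose(intr0, extr0, X, uv, iters=10):
        # Alternate optimization: refine the extrinsics with the intrinsics fixed,
        # then the intrinsics with the extrinsics fixed, until convergence.
        intr, extr = np.asarray(intr0, float), np.asarray(extr0, float)
        for _ in range(iters):
            extr = least_squares(lambda e: (project(intr, e, X) - uv).ravel(),
                                 extr, method='lm').x
            intr = least_squares(lambda k: (project(k, extr, X) - uv).ravel(),
                                 intr, method='lm').x
        return intr, extr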
Texture Rendering
The Texture Rendering module takes as input a set of scene 3D objects, the recovered camera poses
and their associated images, and returns the resulting 3D objects along with a set of texture atlases.
The generation of the resulting textured models involves two steps: Image-based Object Subdivision
and Composite Texture Rendering, as shown in Figure 58.
Initially, the scene objects are decomposed into their corresponding faces which are then subdivided
based on their image visibility. A bounding volume hierarchy is used as an efficient data structure for
the representation of the subdivided objects in the scene. Next, the optimal texture map resolution
is determined for each face and the composite texture maps are rendered. Finally, the composite
texture maps are packed into one or more texture atlases.
Image-based Object Subdivision
In order to determine the optimal resolution of the texture map for each face, the face’s vertices are
projected into all the images using the recovered camera poses. The largest projected area for the
face is defined as the optimal resolution. However, in some cases not all the projected points fall
within the bounds of the image frame which can lead to problems when computing the texture map
resolution. The problem is demonstrated by the example in Figure 59(a) where the scene consists of
three rectangles and two cameras having a white frame and a black frame respectively as shown in
the insets. Figure 59(c) and Figure 59(e) show the projections of the objects into the image frame
of the left and right camera respectively. The projected points which fall outside the bounds of the
image frame can be heavily distorted due to the perspective nature of the projection and therefore
cannot be used for the computation of the projected area.
To overcome this problem we perform an image-based visibility clipping where the 2D projections
of the objects are clipped in image space and then backprojected to the 3D space where the objects
are subdivided. This process is performed for all the cameras and all the objects in the scene and it
guarantees that all faces in the scene are entirely visible in all the cameras. Figure 59(d) and Figure
59(f) show the projections of the objects after the image-based object subdivision. Similarly, Figure
59(b) shows the subdivided 3D objects produced by this process.
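The following Python sketch illustrates the core of this clipping step for one face: its projected 2D polygon is clipped against the image rectangle and the clipped vertices are backprojected onto the plane of the face by intersecting the corresponding viewing rays with that plane. The use of Shapely for the 2D clipping, the single-polygon assumption for the clipped result, and the function name are illustrative assumptions rather than the thesis implementation.

    import numpy as np
    from shapely.geometry import Polygon, box

    def clip_and_backproject(pts2d, K, R, t, plane_n, plane_p, width, height):
        # 1. Clip the face's projected 2D polygon against the image rectangle
        #    (assuming the clipped result is a single polygon).
        clipped = Polygon(pts2d).intersection(box(0, 0, width, height))
        if clipped.is_empty:
            return np.empty((0, 3))
        uv = np.asarray(clipped.exterior.coords)[:-1]      # clipped 2D vertices
        # 2. Backproject each clipped vertex onto the supporting plane of the
        #    face by intersecting its viewing ray with that plane.
        Kinv = np.linalg.inv(K)
        center = -R.T @ t                                   # camera center in world space
        pts3d = []
        for u, v in uv:
            d = R.T @ (Kinv @ np.array([u, v, 1.0]))        # ray direction in world space
            s = plane_n @ (plane_p - center) / (plane_n @ d)
            pts3d.append(center + s * d)
        return np.asarray(pts3d)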
Composite Texture Rendering
The generation of the composite texture maps starts with the preparation of the scene for raytracing.
Irregular faces are subdivided into the simple primitives used by the raytracer e.g. triangles and
quadrilaterals. A bounding volume hierarchy is then used as the internal data structure for the
efficient representation of the data.
Next, a texture map is raytraced for each face. In order to minimize the information loss we exploit
all the information available from all cameras. This is achieved by considering the color contribution
of each image in which the face is visible. The integration of these contributions is performed by a
blending function which ensures the smooth transition between multiple cameras and is expressed
as a non-linear function of multiple parameters given by,
$$f_{blend} = \left\| \alpha W_1 \times \beta W_2 \times \gamma W_3 \times \delta W_4 \times \epsilon W_5 \right\| \qquad (38)$$

where $\alpha, \beta, \gamma, \delta, \epsilon$ are the importance factors of the five weights respectively and $W_{1..5}$ are the
normalized weights for the following parameters:
1. $W_1$ is the weight of the resolution of the current image and is given by $W_1 = \frac{I_{width} \times I_{height}}{I_{max_{width}} \times I_{max_{height}}}$.
Images with higher resolution are given a higher weight than images with low resolution.
2. $W_2$ is the weight of the distance of the camera from the point on the face and is given by
$W_2 = \|T_{camera} - P\|_2$, where $T_{camera}$ is the camera translation and $P$ is the 3D position of
the point. An image from a camera which is far away from the point, e.g. an image from a
satellite, is given a lesser weight than an image from a ground or aerial camera.
3. $W_3$ is the weight of the distance of the projected point from the closest image border. This
ensures the smooth transition (a.k.a. feathering) between multiple images at the image borders.
Figure 59: Image-based object subdivision. (a) Scene setup: geometry before the image-based
subdivision. (b) Geometry after the image-based subdivision. (c) Projection into left camera before
subdivision. (d) Projection into left camera after subdivision. (e) Projection into right camera before
subdivision. (f) Projection into right camera after subdivision.
4. $W_4$ is the weight of the distance of the projected point from the principal point of the image.
This is in order to account for the radial lens distortion which increases as a non-linear function
of the distance to the principal point.
5. $W_5$ is the weight of the dot product between the camera's viewing direction and the face's
normal. A camera which views a face at a grazing angle is given a lesser weight than a
front-view camera.
Hence, the resulting color of a point P on a face visible by n cameras is given by,
$$C_P = \sum_{i=0}^{n} f^i_{blend} \times I^i_{P_{proj}} \qquad (39)$$

where $f^i_{blend}$ is the combined weight computed by the blending function for the i'th camera, and
$I^i_{P_{proj}}$ is the color at the projected point $P_{proj}$ of the point $P$, in the image frame of the i'th camera.
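To make the composition concrete, the Python sketch below evaluates equation 38 for one camera and equation 39 for one surface point; reading the '×' in equation 38 as an ordinary product of the five weighted scalar terms, and normalizing the per-camera blend weights so the resulting color stays in range, are interpretation assumptions made for this illustration.

    import numpy as np

    def blend_weight(importance, weights):
        # Equation 38: || alpha*W1 x beta*W2 x gamma*W3 x delta*W4 x eps*W5 ||,
        # read here as the magnitude of the product of the five weighted terms.
        a, b, g, d, e = importance
        W1, W2, W3, W4, W5 = weights
        return abs(a * W1 * b * W2 * g * W3 * d * W4 * e * W5)

    def composite_color(colors, importance, per_camera_weights):
        # Equation 39: C_P = sum_i f_blend_i * color of P projected into image i,
        # with the blend weights normalized across the visible cameras (assumed).
        f = np.array([blend_weight(importance, w) for w in per_camera_weights])
        f = f / f.sum()
        return (f[:, None] * np.asarray(colors, dtype=float)).sum(axis=0)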
Figure 60 shows a render of the test scene introduced in Figure 59(a) from a novel viewpoint using
the composite texture maps. The integration and smooth transition between the black and white
images is evident by the gray areas which are visible in both cameras. Non-visible faces for which no
information is available are assigned a default green color. Similarly, backfacing faces are assigned
a black color.
(a) Novel viewpoint render for the test scene in Figure 59(a). (b) The composite texture atlas for (a).
Figure 60: Texture composition result: 3D models and the composite texture atlas. The green color
in (a) indicates non-visible faces for which no information is available.
The composite texture maps produced for each face in the scene are of various sizes and shapes. To
reduce the number of texture maps required for a particular scene, we pack the textures into larger
collections of texture atlases. We employ a simple texture packing technique which first sorts the
texture maps based on their width and height, and then copies each map sequentially in an atlas of a
user-defined size. Additionally, a set of texture coordinates are computed and assigned to each face
in the atlas. A variety of more advanced techniques have already been proposed [LPRM02, GRA03]
for efficiently packing textures into atlases and can be used instead. The result is a standalone 3D
model with a set of associated texture atlases as shown in the example of Figure 60.
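A minimal sketch of this shelf-style packing is shown below; the data layout, the sorting key and the single-atlas simplification are assumptions made for illustration (in the full pipeline a new atlas would be started when the current one is full).

    def pack_textures(textures, atlas_size):
        # textures: list of (face_id, width, height) in pixels.
        # Sort by size and place each map left to right on horizontal shelves.
        placements = {}
        x = y = shelf_height = 0
        for face_id, w, h in sorted(textures, key=lambda t: (t[2], t[1]), reverse=True):
            if x + w > atlas_size:                 # start a new shelf
                x, y = 0, y + shelf_height
                shelf_height = 0
            if y + h > atlas_size:                 # atlas full: a new atlas would start here
                raise ValueError("atlas of the requested size is full")
            placements[face_id] = (x, y)           # offset of the map inside the atlas
            x += w
            shelf_height = max(shelf_height, h)
        return placements

The per-face texture coordinates then follow directly from the stored offsets and the map sizes divided by the atlas size.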
Experimental Results
Figure 61(a) and Figure 61(b) show the textured 3D models using aerial oblique imagery. Similarly,
a suburban area is textured using satellite imagery as shown in Figure 62.
Figure 61: A close-up of the textured 3D model for downtown Baltimore.
Figure 62: A suburban area textured with satellite imagery. In this case, the amount of overlap
between the images is minimal.
Another example in Figure 63 shows the result of the application of the proposed texture composition
process on the 3D models created using the primitive-based method, where three oblique aerial
images are integrated together into composite texture atlases. Although it is evident that the
generated textures help achieve a high-level of visual realism for the reconstructed models, there are
some cases where the textures may not be as visually convincing. Our experiments have shown that
an essential component to the success of the texturing process is the image registration. The blending
and integration of the multiple textures can only be as good as the camera pose recovered from the
marked correspondences. An inaccurate computation of the camera poses will cause the blending
to occur between non-corresponding parts of the images which will cause unnatural artifacts known
as ’ghost’ effects. In particular, when dealing with remote sensor imagery capturing a large-scale
area, even the slightest misalignment between the images and the models is amplified due to the
scale difference between the image size and the captured buildings, and the perspective nature of
the projection.
Figure 64(a) shows the model textured with high-resolution(4K) satellite imagery and Figure 64(b)
shows an area in the scene with textures generated from satellite, aerial and ground imagery. A
novel viewpoint render of the complete environment using satellite imagery is shown in Figure 65.
Figure 66 shows the automatically reconstructed model for downtown Oakland textured with 30
high resolution oblique aerial images.
Figure 63: A textured 3D model for a university campus. The green points indicate the interactively
marked correspondences between the geometry and the imagery, which were used for the recovery
of the camera pose.
(a) Entire area textured with high-resolution(4K) satellite imagery.
(b) A reconstructed area with textures from satellite, aerial and ground imagery.
Figure 64: Textured virtual environment.
Figure 65: A large-scale virtual environment textured using satellite images.
Figure 66: The automatically reconstructed model for downtown Oakland textured using 30 oblique
aerial images.
Chapter 8
HAL-10K: An Integrated System
In this chapter we present an integrated prototype system named HAL-10K. We have integrated
the developed techniques to produce a complete modeling system, and extensively tested the system
with several city-size datasets including the USC campus, Baltimore downtown, Denver downtown,
and the city of Atlanta, etc.
System Overview
An overview of the HAL-10K system is shown in Figure 67. The software design is of particular
importance since it provides an uncoupled implementation of the individual modules. This allows
the use of a particular module separately from the other modules, thus reducing the amount of
information required to process the data at any given time. In addition, this separation allows each
module to be used in a wide range of different applications and does not limit the use to a particular
application.
Figure 68 shows the graphical user interface of the software. It consists of a single 3D world viewer for
displaying and navigating through the reconstructed models, multiple image viewers for displaying
the input images, a contents list containing all available images and objects, and an output window
for communicating information to the user.
Figure 67: System overview of HAL-10K.
Figure 68: The graphical user interface of HAL-10K.
Modeling
The reconstruction of 3D building models requires a set of unstructured 3D points captured by
an airborne LiDAR scanner. The processing and modeling of the building structures is performed
as described in Chapter . In addition, intensity return images can also be incorporated to aid
with the detection of buildings’ boundaries. The reconstructed models are finally exported to an
Alias|Wavefront OBJ format [Wav]. Figure 69 shows the reconstructed building models for the USC
campus.
Figure 69: The reconstructed models for an area in Atlanta.
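As a side note on the export format, a Wavefront OBJ file stores one "v x y z" line per vertex and one "f i j k ..." line per face with 1-based vertex indices; the tiny Python sketch below writes such a file (the function name and argument layout are illustrative assumptions).

    def export_obj(path, vertices, faces):
        # vertices: iterable of (x, y, z); faces: iterables of 0-based vertex indices.
        with open(path, "w") as obj:
            for x, y, z in vertices:
                obj.write(f"v {x} {y} {z}\n")
            for face in faces:
                obj.write("f " + " ".join(str(i + 1) for i in face) + "\n")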
Texturing
The texturing of 3D models requires a set of optical images. Firstly, the registration is performed
using a set of user-defined correspondences between the 3D object features and 2D image features.
Figure 70 shows the recovery of a camera pose. The registered model is then back-projected (yellow
lines) in the image viewer for visual verification. In addition, using the OpenGL Shading Language
a projective texture mapping is performed for real-time texturing.
An example of the registration of an oblique image to the reconstructed 3D building models of the
USC campus is shown in Figure 71.
Figure 70: The registration is performed by defining a set of correspondences between the 3D object
features and the 2D image features.
Figure 71: Texturing of the USC campus.
Results
Figure 72 and Figure 73 show the final result of the reconstructed USC campus. The 3D recon-
structed building models are textured using the composite atlases and are integrated with the 3D
models of the roads.
Figure 72: Integration of reconstructed buildings with 3D road models.
Figure 73: USC campus.
Chapter 9
Conclusion
We have presented a complete and robust system for rapidly creating realistic virtual cities from
LiDAR and imagery sensor data. Two main components are developed for the system: a modeling
techniquefortheextractionofpolygonal3DmodelsfromLiDARdata,andaflexibletextureblending
technique for the generation of photorealistic textures from multiple imagery data.
Firstly, we addressed the problem of rapid creation of large-scale virtual environments from LiDAR
data. We presented a primitive-based approach and introduced a novel 2D parameterization of
geometric primitives for the automatic identification and reconstruction of common linear building
types. The parameterization increases the computational speed of the reconstruction process by
exploiting common symmetry constraints found in man-made structures and can be extended to
other common linear building types. In addition, we have presented two flexible primitives for the
reconstruction of buildings containing complex linear and non-linear surfaces. A key characteristic
of this method is the optimization-driven boundary extraction and refinement which results in high-
fidelity 3D models.
Additionally, we have proposed a fast and fully automatic approach for reconstructing virtual en-
vironments which does not rely on any data dependent parameters. In order to achieve this we
introduced a robust and effective clustering technique which segments the data based on a proba-
bilistic analysis of their geometric properties. Experimental results were demonstrated, which verify
the validity of the modeling approach and its successful application on a variety of different data
sets. Finally, we have presented the effective qualitative and quantitative measures we have employed
for the evaluation of the generated 3D models.
Thirdly, we addressed the problem of texturing and presented a rendering pipeline for the compo-
sition of photorealistic textures which allows the effective handling of missing or occluded areas.
Imagery registered interactively to the geometry is used to recover the camera poses and texture
atlases are generated by integrating multiple information together, therefore minimizing information
loss. The multiple information is integrated using the proposed non-linear blending function which
significantly reduces the appearance of artifacts in the composite textures. Furthermore, the texture
composition process is independent of the 3D models, thus it can be applied in different contexts.
The result is a standalone 3D textured model.
Discussion
In this thesis we have presented our approach for the efficient and rapid reconstruction of photoreal-
istic large-scale virtual environments and have shown how information from multiple data resources
can be combined for the generation of a unified, comprehensive and immersive representation of
large-scale areas.
The primary contribution of this research is the rapid creation of photorealistic large-scale urban
city models and the integration of the proposed methods into a prototype system. In addition, there
are several contributions in each module as shown below:
1. Preprocessing
(a) A filtering technique based on normal optimization (graph-cuts) and point optimization
(gradient-descent) for smoothing the data, while retaining important discontinuities.
2. Modeling
• A semi-automatic primitive-based method consisting of a minimal set of three primitives:
(a) A parameterized geometric primitive for the reconstruction of common linear roof
types.
(b) A flexible polygonal primitive for the reconstruction of complex linear roof types.
(c) A non-linear primitive for the reconstruction of complex buildings with non-linear
roof types.
• A probabilistic method which clusters the data based on a probabilistic analysis of its
geometric properties. This method is fully automatic and has no data dependencies.
3. Texturing
• A rendering pipeline for the composition of photorealistic textures. This texture pipeline
integrates multiple information into a set of amalgamated texture atlases.
4. An integrated system.
• HAL-10K. A prototype modeling system which integrates our proposed methods.
We have extensively tested the performance of our proposed methods with a wide range of remote
sensing data including aerial photographs, satellite images, LiDAR as well as ground images and
presented our results.
Bibliography
[BAF04] Nobuyuki Bannai, Alexander Agathos, and Robert Fisher. Fusing multiple color images
for texturing models. Technical Report EDIINFRR0230, The University of Edinburgh,
July 2004.
[BF80] R. C. Bolles and M. A. Fischler. Random sample consensus: A paradigm for model fitting
with applications to image analysis and automated cartography. In Image Understanding
Workshop, pages 71–88, 1980.
[BJ01] Yuri Boykov and Marie-Pierre Jolly. Interactive graph cuts for optimal boundary and
region segmentation of objects in N-D images. In ICCV, pages 105–112, 2001.
[BVZ99] Yuri Boykov, Olga Veksler, and Ramin Zabih. Fast approximate energy minimization
via graph cuts. In ICCV, pages 377–384, 1999.
[Deb96] Paul Ernest Debevec. Modeling and rendering architecture from photographs. PhD
thesis, University of California, Berkeley, 1996.
[DP] Debevec P., Tchou C., Gardner A., Hawkins T., Poullis C., Stumpfel J., Jones A., Yun N.,
Einarsson P., Lundgren T., Fajardo M., Martinez P. Estimating surface reflectance
properties of a complex scene under captured natural illumination.
[DTG+04] Paul Debevec, Chris Tchou, Andrew Gardner, Tim Hawkins, Charalambos Poullis, Jessi
Stumpfel, Andrew Jones, Nathaniel Yun, Per Einarsson, Therese Lundgren, Marcos
Fajardo, and Philippe Martinez. Estimating surface reflectance properties of a complex
scene under captured natural illumination. Technical report, University of Southern
California, ICT, 2004.
[DTM96] Paul E. Debevec, Camillo J. Taylor, and Jitendra Malik. Modeling and rendering archi-
tecture from photographs: A hybrid geometry- and image-based approach. In Proceed-
ings of the ACM Conference on Computer Graphics, pages 11–20, New York, August 4–9
1996. ACM.
[DYB98] Paul Debevec, Yizhou Yu, and George Boshokov. Efficient view-dependent image-based
rendering with projective texture-mapping. Technical Report CSD-98-1003, University
of California, Berkeley, May 20, 1998.
[FSZ04] Christian Früh, Russell Sammon, and Avideh Zakhor. Automated texture mapping of
3D city models with oblique aerial imagery. In 3DPVT, pages 396–403. IEEE Computer
Society, 2004.
[FZ03a] C. Fruh and A. Zakhor. Constructing 3D city models by merging ground-based and
airborne views. In IEEE Computer Vision and Pattern Recognition or CVPR, pages II:
562–569, 2003.
[FZ03b] Christian Früh and Avideh Zakhor. Constructing 3D city models by merging ground-
based and airborne views. In CVPR, pages 562–569. IEEE Computer Society, 2003.
[GFM+07] D. Gallup, J. M. Frahm, P. Mordohai, Q. X. Yang, and M. Pollefeys. Real-time plane-
sweeping stereo with multiple sweeping directions. In IEEE Computer Vision and Pat-
tern Recognition or CVPR, pages 1–8, 2007.
[Goo] Google. Google earth. http://earth.google.com.
[GRA03] GRAPHITE. http://www.loria.fr/~levy/Graphite/index.html, 2003.
[HKN00] A. Huertas, Z. Kim, and R. Nevatia. Multisensor integration for building modeling.
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
(CVPR-00), pages 203–210, Los Alamitos, June 13–15 2000. IEEE.
[HYN03] Jinhui Hu, Suya You, and Ulrich Neumann. Approaches to large-scale urban modeling.
IEEE Computer Graphics and Applications, 23(6):62–69, 2003.
[IMWA02] Y. Ishikawa, I. Miyagawa, K. Wakabayashi, and T. Arikawa. Building reconstruction
based on MDL principle from 3D feature points. In Photogrammetric Computer Vision,
page B: 90, 2002.
[IS] Ildiko Suveg and George Vosselman. 3D reconstruction of building models.
[LCC] Liang-Chien Chen, Tee-Ann Teo, Yi-Chen Shao, Yen-Chung Lai, and Jiann-Yeou Rau.
Fusion of LiDAR data and optical imagery for building modeling.
[Lev44] K. Levenberg. A method for the solution of certain non-linear problems in least squares.
Quart. Appl. Math., 2:164–168, 1944.
[LH96] Marc Levoy and Pat Hanrahan. Light Field Rendering. In Computer Graphics Pro-
ceedings, Annual Conference Series, 1996 (ACM SIGGRAPH ’96 Proceedings), pages
31–42, 1996.
[LJN02] Sung Chun Lee, Soon Ki Jung, and Ramakant Nevatia. Automatic pose estimation of
complex 3D building models. In WACV, pages 148–152. IEEE Computer Society, 2002.
[LPRM02] Bruno Lévy, Sylvain Petitjean, Nicolas Ray, and Jérôme Maillot. Least squares con-
formal maps for automatic texture atlas generation. ACM Transactions on Graphics,
21(3):362–371, July 2002.
[LW05] Liu Wei and Veronique Prinet. Building detection from high-resolution satellite image
using probability model. In Geoscience and Remote Sensing Symposium, 2005. IGARSS
'05. Proceedings. 2005 IEEE International, pages 3888–3891, 2005.
[Mar63] D. W. Marquardt. An algorithm for least-squares estimation of non-linear parameters.
Journal of the Society of Industrial and Applied Mathematics, 11(2):431–441, 1963.
[MAW+07] P. Merrell, A. Akbarzadeh, L. Wang, P. Mordohai, J. M. Frahm, R. G. Yang, D. Nister,
and M. Pollefeys. Real-time visibility-based fusion of depth maps. In International
Conference on Computer Vision, pages 1–8, 2007.
[MD03] Alex Reche Martinez and George Drettakis. View-dependent layered projective texture
maps. In Pacific Conference on Computer Graphics and Applications, pages 492–496.
IEEE Computer Society, 2003.
[Mic] Microsoft. Microsoft virtual earth. http://www.microsoft.com/virtualearth.
[MSS+08] B. C. Matei, H. S. Sawhney, S. Samarasekera, J. Kim, and R. Kumar. Building segmen-
tation for densely built urban regions using aerial LIDAR data. In IEEE CVPR, pages
1–8, 2008.
[MV99] H. G. Maas and G. Vosselman. Two algorithms for extracting building models from
raw laser altimetry data. ISPRS Journal of Photogrammetry and Remote Sensing, 54(2-
3):153–163, July 1999.
[NP02] Ramakant Nevatia and Keith E. Price. Automatic and interactive modeling of buildings
in urban environments from aerial images. In ICIP (3), pages 525–528, 2002.
[NYH+03] Ulrich Neumann, Suya You, Jinhui Hu, Bolan Jiang, and Jong Weon Lee. Augmented
virtual environments (AVE): Dynamic fusion of imagery and 3D models. In VR, page 61.
IEEE Computer Society, 2003.
[PGD] Charalambos Poullis, Andrew Gardner, and Paul Debevec. Photogrammetric modeling
and image-based rendering for rapid virtual environment creation.
[PGV+04] Marc Pollefeys, Luc J. Van Gool, Maarten Vergauwen, Frank Verbiest, Kurt Cornelis,
Jan Tops, and Reinhard Koch. Visual modeling with a hand-held camera. International
Journal of Computer Vision, 59(3):207–232, 2004.
[PY09a] Charalambos Poullis and Suya You. Photorealistic large-scale urban city model recon-
struction. In VR, pages 153–160. IEEE, 2009.
[PY09b] Charalambos Poullis and Suya You. Realistic virtual cities. In I3D, page to appear.
ACM, 2009.
[PYN07] Charalambos Poullis, Suya You, and Ulrich Neumann. Generating high-resolution tex-
tures for 3D virtual environments using view-independent texture mapping. In ICME,
pages 1295–1298. IEEE, 2007.
[PYN08] Charalambos Poullis, Suya You, and Ulrich Neumann. Rapid creation of large-scale
photorealistic virtual environments. In VR, pages 153–160. IEEE, 2008.
[PYN09] Charalambos Poullis, Suya You, and Ulrich Neumann. Automatic creation of massive
virtual cities. In VR, pages 153–160. IEEE, 2009.
[RB] Franz Rottensteiner and Christian Briese. Building extraction from lidar data.
[RC02a] D. P. Robertson and R. Cipolla. Building architectural models from many views using
map constraints. Lecture Notes in Computer Science, 2351:155–??, 2002.
[RC02b] D. P. Robertson and R. Cipolla. Building architectural models from many views using
map constraints. In European Conference on Computer Vision, page II: 155 ff., 2002.
[Ris83] J. Rissanen. A universal prior for integers and estimation by minimum description
length. Ann. of Statist., 11(2):416–431, 1983.
[RL02] Ramesh Raskar and Kok-Lim Low. Blending multiple views. In Pacific Conference on
Computer Graphics and Applications, pages 145–155. IEEE Computer Society, 2002.
[RTCK05] F. Rottensteiner, J. Trinder, S. Clode, and K. Kubik. Automated delineation of roof
planes from liDAR data. In Workshop on Laser Scanning, pages xx–yy, 2005.
[SA85] S. Suzuki and K. Abe. Topological structural analysis of digitized binary images by
border following. Computer Vision, Graphics and Image Processing, 30:32–46, 1985.
[SA02] Ioannis Stamos and Peter K. Allen. Geometry and texture recovery of scenes of large
scale. Computer Vision and Image Understanding, 88(2):94–118, 2002.
[Sch93] G. Schmitt. In Architectura et machina, 1993.
[Tur01] Greg Turk. Texture synthesis on surfaces. In SIGGRAPH, pages 347–354, 2001.
[VKH06] V. Verma, R. Kumar, and S. Hsu. 3D building detection and modeling from aerial
LIDAR data. In IEEE CVPR, pages II: 2213–2220, 2006.
[Wav] Alias Wavefront. Obj file format.
[YHNF03a] Suya You, Jinhui Hu, Ulrich Neumann, and Pamela Fox. Urban site modeling from
liDAR. In Vipin Kumar, Marina L. Gavrilova, Chih Jeng Kenneth Tan, and Pierre
L’Ecuyer,editors,ICCSA (3),volume2669ofLecture Notes in Computer Science,pages
579–588. Springer, 2003.
[YHNF03b] Suya You, Jinhui Hu, Ulrich Neumann, and Pamela Fox. Urban site modeling from
liDAR. In Vipin Kumar, Marina L. Gavrilova, Chih Jeng Kenneth Tan, and Pierre
L’Ecuyer,editors,ICCSA (3),volume2669ofLecture Notes in Computer Science,pages
579–588. Springer, 2003.
[ZPB07] C. Zach, T. Pock, and H. Bischof. A globally optimal algorithm for robust TV-L1 range
image integration. In International Conference on Computer Vision, pages 1–8, 2007.
[ZWT+05] Kun Zhou, Xi Wang, Yiying Tong, Mathieu Desbrun, Baining Guo, and Heung-Yeung
Shum. Texturemontage. ACM Trans. Graph., 24(3):1148–1155, 2005.
Appendix
Graph-cut
In [BVZ99, BJ01] the authors interpret image segmentation as a graph partition problem. Given an
input image $I$, an undirected graph $G = \langle V, E \rangle$ is created where each vertex $v_i \in V$ corresponds
to a pixel $p_i \in I$ and each undirected edge $e_{i,j} \in E$ represents a link between neighboring pixels
$p_i, p_j \in I$. In addition, two distinguished vertices called terminals, $V_s, V_t$, are added to the graph $G$.
An additional edge is also created connecting every pixel $p_i \in I$ and the two terminal vertices, $e_{i,V_s}$
and $e_{i,V_t}$. For weighted graphs, every edge $e \in E$ has an associated weight $w_e$.

A cut $C \subset E$ is a partition of the vertices $V$ of the graph $G$ into two disjoint sets $S, T$ where $V_s \in S$
and $V_t \in T$. The cost of each cut $C$ is the sum of the weighted edges $e \in C$ and is given by

$$|C| = \sum_{\forall e \in C} w_e \qquad (40)$$
The minimum cut problem can then be defined as finding the cut with the minimum cost. An
algorithm for solving this problem has been proven to require polynomial-time [BVZ99].
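As an illustration, such a two-terminal minimum cut can be computed with an off-the-shelf max-flow solver; the toy graph below uses NetworkX, and the pixel nodes, capacities and terminal names are assumptions chosen only to show the mechanics.

    import networkx as nx

    # Two-terminal graph: one node per pixel plus the terminals 's' (source) and 't' (sink).
    G = nx.DiGraph()
    G.add_edge("s", "p0", capacity=2.0)    # terminal links encode the data term D_p
    G.add_edge("p0", "t", capacity=5.0)
    G.add_edge("s", "p1", capacity=4.0)
    G.add_edge("p1", "t", capacity=1.0)
    G.add_edge("p0", "p1", capacity=3.0)   # neighborhood links encode the smoothness term
    G.add_edge("p1", "p0", capacity=3.0)

    cut_value, (S, T) = nx.minimum_cut(G, "s", "t")
    # Pixels in S take the source label, pixels in T take the sink label.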
Energy minimization function
Finding the minimum cut of a graph is equivalent to finding an optimal labeling $f: I \rightarrow L$ which
assigns a label $l \in L$ to each pixel $p \in I$, and $f$ is piecewise smooth and consistent with the original
data. The energy function is then given by,

$$E(f) = E_{data}(f) + \lambda * E_{smooth}(f) \qquad (41)$$
where λ is the weight of the smoothness term.
Energy data term
The data term in equation 41 measures the cost of re-labeling the original data with a new labeling
$f$. It is defined as the sum of the per-pixel measure $D_p$ of how appropriate each label $f_p \rightarrow l \in L$
is, for each pixel $p \in I$ in the original data and is given by,

$$E_{data}(f) = \sum_{p \in I} D_p(f_p) \qquad (42)$$
Energy smoothness term
The smoothness term in equation 41 measures the cost of re-labeling neighboring pixels with a new
labeling f. It is defined as the sum of the differences between two neighboring pixels p,q∈I under
a labeling $f_p \rightarrow l_p \in L$ and $f_q \rightarrow l_q \in L$ respectively and is given by,

$$E_{smooth}(f) = \sum_{\{p,q\} \in N} V_{\{p,q\}}(f_p, f_q) \qquad (43)$$

where $N$ is the set of neighboring pixels and $V_{\{p,q\}}$ measures the difference between the neighboring
pixels, also known as the interaction potential function.
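For a candidate labeling the energy of equations 41-43 can be evaluated directly, as in the Python sketch below; the Potts-style interaction potential and the 4-neighborhood are assumptions chosen for illustration.

    import numpy as np

    def energy(labels, data_cost, lam=1.0):
        # labels: HxW integer label image f; data_cost: HxWxL array with D_p(l).
        H, W = labels.shape
        rows, cols = np.indices((H, W))
        E_data = data_cost[rows, cols, labels].sum()            # equation 42
        # Potts interaction V_{p,q}(f_p, f_q) = [f_p != f_q] over a 4-neighborhood.
        E_smooth = (labels[:, 1:] != labels[:, :-1]).sum() + \
                   (labels[1:, :] != labels[:-1, :]).sum()      # equation 43
        return E_data + lam * E_smooth                          # equation 41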
Abstract
In recent years there has been an increasing demand for applications which employ miniature representations of the real world to recreate realistic and immersive virtual environments. Many applications ranging from computer graphics and virtual reality, to Geographical Information Systems have already successfully used real world representations derived from the combination of multi-sensory data captured from aerial or satellite imagery and LiDAR (Light Detection and Ranging) scanners. However, despite their widespread and successful application, the creation of such realistic 3D content remains a complex, time-consuming, expensive and labor-intensive task. In fact, the creation of models is still widely viewed as a specialized art, requiring personnel with extensive training and experience to produce useful models.