INTEGRATING COMPLEMENTARY INFORMATION FOR PHOTOREALISTIC
REPRESENTATION OF
LARGE-SCALE ENVIRONMENTS
by
Jinhui Hu
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(COMPUTER SCIENCE)
May 2007
Copyright 2007 Jinhui Hu
Acknowledgements
I would like to thank my advisor, Prof. Ulrich Neumann, for giving me the opportunity to work in his
research group and leading me through the journey of my Ph.D. research. Special thanks to Dr. Suya You,
who guided me to work on projects, select research topics and write papers.
Thanks to all my thesis committee members, Prof. Ramakant Nevatia and Prof. C.-C. Jay Kuo. Their
insightful comments helped me to finish my Ph.D. research and thesis. I would also like to thank Dr. Karen
Liu, who agreed to be one of my qualifying exam committee members and gave many valuable comments on
my research.
Thanks to all the members in CGIT. Pamela Fox and Jonathan Mooser helped me a lot on paper
writing. I would also like to thank the other members, Lu Wang, Zhigang Deng, Zhengyao Mo, Ismail Oner
Sebe, Ilya Eckstein, Quan Wang, Sang Yun Lee, Taehyun Rhee, Charalambos ‘Charis’ Poullis and Bolang
Jiang. I thank them for sharing years of valuable time with me.
I would also like to thank the IMSC staff, Nichole Phillips, Victoria MacKenzie and Alexis Maxwell.
Thanks for their kind help during my years in IMSC.
Thanks to my roommates Peng Fei and Zhefei Xiong, my friends Bin Zhang, Wei Ye, Xiaofei Wang and
other CSSA friends. I thank them for sharing five years of happy and unforgettable time with me.
Finally and most importantly, I would like to thank my family, my father, mother, sister and brother.
They are always encouraging and supporting me, and I could not finish this long journey without them.
I would like to dedicate this thesis to my parents.
Table of Contents
Acknowledgements .........................................................................................................................................ii
List of Tables...................................................................................................................................................v
List of Figures ................................................................................................................................................vi
Abstract ..........................................................................................................................................................xi
Chap 1 Introduction.........................................................................................................................................1
1.1 Complementary Datasets......................................................................................................................2
1.2 Hybrid Representation..........................................................................................................................3
1.3 Goal of This Research ..........................................................................................................................8
Chap 2 Related Work ......................................................................................................................................9
2.1 Introduction ..........................................................................................................................................9
2.2 Photogrammetry .................................................................................................................................11
2.3 Active Sensors .....................................................................................................................14
2.4 Hybrid Sensors ...................................................................................................................................16
2.5 Discussion and Conclusion.................................................................................................................18
Chap 3 Overview and Contribution...............................................................................................................20
3.1 Overview ............................................................................................................................................20
3.2 Contributions ......................................................................................................................................22
Chap 4 Urban Modeling from LiDAR ..........................................................................................................23
4.1 Introduction ........................................................................................................................................23
4.2 System Overview................................................................................................................................25
4.3 Model Reconstruction from LiDAR Data...........................................................................................26
4.4 Urban Model Classification...............................................................................................................27
4.5 Model Refinement and Fitting............................................................................................................28
4.6 Model Relationship and Editing .........................................................................................................37
4.7 Results and Evaluation .......................................................................................................................38
4.8 Conclusion..........................................................................................................................................39
Chap 5 Integrating Aerial Image ...................................................................................................................43
5.1 Introduction ........................................................................................................................................43
5.2 An Automatic Approach.....................................................................................................................45
5.3 An Interactive Approach.....................................................................................................................50
5.4 Conclusion..........................................................................................................................................60
Chap 6 Integrating Ground Images - Part I: Vanishing Hull........................................................................65
6.1 Introduction ........................................................................................................................................65
6.2 Vanishing Hull....................................................................................................................................68
6.3 Vanishing Points Detection ................................................................................................................71
6.4. Determining Vanishing Hull..............................................................................................................74
6.5 Vanishing Hull for General Edge Error Model...................................................................................76
6.6 Analysis and Results...........................................................................................................................79
6.7 Conclusion..........................................................................................................................................86
Chap 7 Integrating Ground Images - Part II: Pose Recovery ........................................................................88
7.1 Introduction ........................................................................................................................................88
7.2 Pose Estimation ..................................................................................................................................89
7.3. Texture Generation and Mapping......................................................................................................95
7.4. Results ...............................................................................................................................................96
7.5 Conclusion..........................................................................................................................................98
Chap 8 Integrating Ground Images - Part III: Image Repair .......................................................................104
8.1. Introduction .....................................................................................................................................104
8.2. Perspective Effects Estimation ........................................................................................................108
8.3. Intelligent Copy and Paste...............................................................................................................109
8.4. Results and Discussion ....................................................................................................................115
8.5. Conclusion.......................................................................................................................................117
Chap 9 Integrating Videos...........................................................................................................................121
9.1 Introduction ......................................................................................................................................121
9.2 Related Work....................................................................................................................................122
9.3 Texture Storage.................................................................................................................................123
9.4 Texture Painting from Video ............................................................................................................126
9.5 Approach Details ..............................................................................................................................128
9.6 Implementation Details.....................................................................................................................132
9.7 Results ..............................................................................................................................................136
9.8 Conclusion........................................................................................................................................138
Chap 10 Applications ..................................................................................................................................140
10.1 Introduction ....................................................................................................................................140
10.2 Static Scene Modeling ....................................................................................................................141
10.3 Video Projection.............................................................................................................................142
10.4 Dynamic Object Visualization........................................................................................................143
10.5 Dynamic Fusion and Imagery Projection .......................................................................................145
Chap 11 Conclusions...................................................................................................................................147
References...................................................................................................................................................149
List of Tables
Table 1.1 Complementary properties of datasets............................................................................................3
Table 2.1 Comparison of different data sources and techniques...................................................................19
Table 6.1 Parameter settings.........................................................................................................................79
Table 6.2 Outdoor real images comparison ..................................................................................................86
Table 9.1 Comparison of software implementation and graphics hardware acceleration...........................136
List of Figures
Figure 1.1 The hybrid representation .............................................................................................................4
Figure 1.2 USC LiDAR data .........................................................................................................................5
Figure 1.3 Aerial images ...............................................................................................................................6
Figure 1.4 Ground image and video frame....................................................................................................7
Figure 2.1 Downtown Los Angeles................................................................................................................9
Figure 2.2 3D model of Tokyo .......................................................................................................................9
Figure 2.3 Virtual London............................................................................................................................10
Figure 2.4 3D models and GIS data for lower Manhattan...........................................................10
Figure 2.5 Berkeley’s clock tower................................................................................................................11
Figure 2.6 Sphere panorama.........................................................................................................................12
Figure 2.7 Aerial image of England .............................................................................................................12
Figure 2.8 3D models reconstructed from aerial images by Ascender II .....................................................13
Figure 2.9 LIDAR capture system................................................................................................................14
Figure 2.10 DEM from LIDAR: USC campus and surrounding area ..........................................................14
Figure 2.11 A complex model created by interactive guiding of CSG primitive fitting...............................15
Figure 2.12 3D model automatically extracted from LIDAR.......................................................................16
Figure 2.13 An oblique view of San Francisco modeled by hybrid system .................................................17
Figure 4.1 An example of USC modeling results.........................................................................................24
Figure 4.2 Algorithmic structure of the proposed system ............................................................................25
Figure 4.3 Range image................................................................................................................................26
Figure 4.4 Classifying the LiDAR model.....................................................................................................27
Figure 4.5 Geometry primitives used for representing a building model .....................................................29
Figure 4.6 High order primitives ..................................................................................................................30
Figure 4.7 Edge detection ...........................................................................................................................31
Figure 4.8 Plane primitive fitting ................................................................................................................32
Figure 4.9 A complex building with multiple extracted cuboids.................................................................32
Figure 4.10 A complex building with four roof elements .........................................................33
Figure 4.11 Fitting a pole with a sloped top surface....................................................................................34
Figure 4.12 Fitting a partial sphere with a cylinder bottom.........................................................................35
Figure 4.13 High order primitive fitting.......................................................................................................37
Figure 4.14 Modeling results of USC...........................................................................................................40
Figure 4.15. Modeling result of part of Washington DC with complex shapes ..............................41
Figure 4.16. A close-up view of a complex building in Washington DC ........................................42
Figure 5.1. Algorithmic structure and work flow of integrating an aerial image .........................................43
Figure 5.2 An example of hybrid modeling..................................................................................................44
Figure 5.3 LIDAR data and aerial image......................................................................................................45
Figure 5.4 Data classification .......................................................................................................................46
Figure 5.5 Edge detection.............................................................................................................................47
Figure 5.6. Process edges ............................................................................................................................48
Figure 5.7 Hypotheses and modeling .........................................................................................................49
Figure 5.8 Complete models of Purdue dataset ............................................................................................50
Figure 5.9 Line detection and image rectification ........................................................................................52
Figure 5.10 Outline extraction......................................................................................................................53
Figure 5.11 Outlines comparison..................................................................................................................54
Figure 5.12 Linear primitive fitting..............................................................................................................55
Figure 5.13 Non-linear primitive fitting ......................................................................................................55
Figure 5.14 Hybrid modeling result .............................................................................................................56
Figure 5.15. Compute normal.......................................................................................................................56
Figure 5.16 Normal correction and polygon intersection .............................................................................58
Figure 5.17 Compare results.........................................................................................................................61
Figure 5.18 Two overviews of the modeling results of USC campus .........................................................62
Figure 5.19 Two close-up views of the modeling results of USC campus..................................................63
Figure 5.20 Hybrid modeling results with ground view images..................................................................64
Figure 6.1. Edge error model.......................................................................................................................69
Figure 6.2. The vanishing hull......................................................................................................................69
Figure 6.3. The shape of a vanishing hull.....................................................................................................70
Figure 6.4. Grouping with an edge error model ...........................................................................................71
Figure 6.5. Compute probability distribution for a full edge region.............................................................77
Figure 6.6. Performance comparison of VH and ML method on synthetic data with random noise............81
Figure 6.7. Performance comparison on synthetic data with Gaussian noise...............................................82
Figure 6.8. Real data comparison .................................................................................................................84
Figure 6.9. Our door images.........................................................................................................................85
Figure 6.10. Compare rectified image ..........................................................................................................85
Figure 6.11. Vanishing hull of a real image .................................................................................................87
Figure 7.1 Hard constraints optimization .....................................................................................................92
Figure 7.2 Translation estimation.................................................................................................................93
Figure 7.3 A mosaic image...........................................................................................................................94
Figure 7.4 Results of generated textures.......................................................................................................99
Figure 7.5 Remove occlusions......................................................................................................................99
Figure 7.6 Results of generated textures.....................................................................................................100
Figure 7.7 Results of generated textures.....................................................................................................101
Figure 7.8. Rendering results of USC central area .....................................................................................102
Figure 7.9 Rendering results of a dorm area in USC..................................................................................103
Figure 8.1. An example of occlusion removal............................................................................................105
Figure 8.2. Line extraction for vanishing points detection .........................................................................108
Figure 8.3. Rectified image ........................................................................................................................109
Figure 8.4. Data flow and transformations .................................................................................................110
Figure 8.5. Process of removing occlusions ...............................................................................................111
Figure 8.6. Boundary regions for match and cut ........................................................................................112
Figure 8.7. Graph cut to find a cut path......................................................................................................113
Figure 8.8. Blending with constant window size and adaptive blending......................................114
Figure 8.9. Result of Microsoft Digital Image Pro 10...................................................................116
Figure 8.10. Patch size in different image repair techniques......................................................................117
Figure 8.11. Results....................................................................................................................................118
Figure 8.12. More results............................................................................................................................118
Figure 8.13. Structure replacement.............................................................................................................118
Figure 8.14. Texture replacement between buildings with different perspective effects ...........................119
Figure 8.15. Generate new patches.............................................................................................................119
Figure 8.16. Extend buildings ....................................................................................................................119
Figure 8.17 Rendering results of a dorm area in USC with trees removed ................................................120
Figure 9.1 Polygon clipping .......................................................................................................................123
Figure 9.2 Texture quality ..........................................................................................................................124
Figure 9.3 Base-buffer allocation ...............................................................................................................125
Figure 9.4 Overview of the texture painting from video system .............................................................126
Figure 9.5 Model based image warping .....................................................................................................128
Figure 9.6 Result of the model based image warping.................................................................................128
Figure 9.7 Using bilinear interpolation to improve warped image quality.................................................129
Figure 9.8 Occlusion processing ................................................................................................................130
Figure 9.9 Selective texture painting..........................................................................................................130
Figure 9.10 Refining texture alignment......................................................................................................131
Figure 9.11 Detect active buffer.................................................................................................................133
Figure 9.12 Texture painting with simulation dataset ................................................................................138
Figure 9.13. Simulated data for a large-scale environment ........................................................................138
Figure 9.14 Dynamic textures generated from three real video streams.....................................................139
Figure 10.1 AVE system components ........................................................................................................141
Figure 10.2 University Park models ...........................................................................................................142
Figure 10.3 An AVE system screen snapshot ............................................................................................142
Figure 10.4 Examples of tracked vehicles and people................................................................................143
Figure 10.5 Parameters to define a dynamic model polygon......................................................................144
Figure 10.6 Image projection .....................................................................................................................144
Figure 10.7 Dynamic models .....................................................................................................................144
Figure 10.8 Quad-window AVE display ....................................................................................................146
Abstract
A wealth of datasets from different sensors exists for environment representation. The key observations of
this thesis are that the different datasets are complementary and that fusing information from
complementary datasets reduces errors in processing each dataset. In addition, a fusion method benefits from the merits of each dataset, and hence helps us to represent large-scale environments in an efficient and accurate way.
This thesis presents a hybrid approach fusing information from four complementary datasets, LiDAR
data, aerial images, ground images and videos, to photorealistically represent large-scale environments.
LiDAR data samples are dense in surface points and they directly measure model heights with accuracy up
to centimeters. However, edges from LiDAR data are jagged due to the relatively low sampling rate (usually one meter) of the sensor, and reconstruction results from LiDAR lack color information. On the other hand,
aerial images provide detailed texture and color information in high-resolution, making them necessary for
texture data and appealing for extracting detailed model features. However, reconstruction from stereo
aerial images often generates sparse points, making them unsuitable for reconstruction of complex surfaces,
such as curved surfaces and roofs with slopes. Ground images offer high-resolution texture information and
details of model façades, but they are local, static and lack the capability to provide information of the most
recent changes in the environment. Live videos are real-time, making them ideal for updating the
information of the environment; however, they are often low-resolution. A natural conclusion is to combine
the geometry, photometry, and other sensing sources to compensate for the shortcomings of each sensing
technology and obtain a more detailed and accurate representation of the environment.
In this thesis, we first fuse information from both LiDAR and an aerial image to create urban models
with accurate surfaces and detailed edges. We then enhance the models with high-resolution façade textures
to improve the visual quality, and update them with dynamic textures from videos to capture the most up-
to-date environment information. The representation results have accurate surfaces, detailed edges,
overview colors, ground view textures and real-time imagery information.
Chapter 1
Introduction
With the rapid development of technologies in modeling, remote sensing and imaging, it becomes increasingly feasible to model, capture and render a large-scale, photorealistic, dynamic environment. Such an environment, which captures the most up-to-date model, image, and texture
information, has many applications, including 2D and 3D GIS, simulation, data visualization, fly through,
city planning, and surveillance. The data resources could originate from many different sources, such as
LiDAR (Light Detection and Ranging), aerial images, ground images, GPS, or infra-red sensors. Different
applications impose different requirements on the input data sources. For 2D and 3D GIS, the precision of
the geographical location is critical. Simulation and data visualization, however, need a friendly user
interface and convenient interactions. High quality fly-through requires photorealistic rendering, while
planning and surveillance requirements focus on having the most up-to-date model, texture and image
information. The task of integrating the data from different resources to model, capture, and render a large-
scale, dynamic environment to satisfy these varying requirements is the main motivation of this research.
Different sensors provide various data for environment representation; however, each of these data sources has its own advantages and drawbacks. A key observation of this research is that the information from different datasets is complementary; hence, integrating information from different datasets can compensate for the shortcomings and benefit from the merits of each sensing technology.
LiDAR data samples are dense in surface points and they directly measure model heights with accuracy up to centimeters. However, edges from LiDAR data are jagged due to the relatively low sampling rate (usually one meter) of the sensor, and reconstruction results lack color information. On the other hand,
aerial images provide detailed texture and color information in high-resolution, making them necessary for
texture data and appealing for extracting detailed model features. However, reconstruction from stereo
aerial images only leads to sparse points, which makes them unsuitable for reconstruction of complex
surfaces, such as curved surfaces and roofs with slopes. Combining both LiDAR data and aerial images can
generate globally textured 3D models of large-scale environments with accurate surfaces and edges.
However, such models are only top-view, and lack façade textures. Ground images offer high-resolution
texture information and details of model façades, but they are local, static and lack the capability to provide information on the most recent changes of environment imagery. Live videos are real-time, which makes them ideal for updating the information of the environment; however, they are often low-resolution. A natural
conclusion is to combine the geometry, photometry, and other sensing technologies and fuse these data
sources to obtain more detailed and accurate representation of the environment. In this thesis, we present a
system that combines these four complementary datasets in a natural way to represent large-scale
environments. Our representation results have top view surfaces, accurate edges, overview colors, ground
view textures and real-time imagery information.
The rest of this chapter first analyzes the complementary properties of the four datasets, then discusses the challenges in each step of the hybrid representation, and concludes with the goal of this research.
1.1 Complementary Datasets
The four datasets used in this research, LiDAR data, aerial images, ground images and live videos, are
complementary in the following five categories (Table 1.1):
• Geometry vs. Color. LiDAR directly measures surface depth and geometry information, but lacks color information. The other three datasets, aerial images, ground images and videos, have rich color and texture information, but lack direct geometry information.
• Surface vs. Edges. LiDAR points are dense, making them appealing for surface reconstruction, especially curved surfaces, but they lack detailed edge information due to the relatively low sampling rate. Aerial images provide details of roof edges, and ground images provide details of façade edges.
• Global vs. Local. Both aerial images and airborne LiDAR are global, making them ideal for modeling large-scale environments; on the other hand, ground images and videos are local, but they provide more details.
• Top-view vs. Ground-view. Aerial images and LiDAR are generally top-view, and even oblique aerial images only capture part of the façade details. On the other hand, ground images and videos provide details of façades, but generally lack roof information.
• Static vs. Dynamic. Aerial images, LiDAR and ground images are all static, making them unable to provide the most recent information on the environment. This problem can be compensated for with live videos.
Table 1.1 Complementary properties of datasets.
Datasets / Properties   Geometry vs. Color   Surface vs. Edges   Global vs. Local   Top-view vs. Ground-view   Static vs. Dynamic
Aerial Images           Color                Edges               Global             Top-view                   Static
LiDAR                   Geometry             Surface             Global             Top-view                   Static
Ground Images           Color                Edges               Local              Ground-view                Static
Videos                  Color                Edges               Local              Ground-view                Dynamic
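As an illustration only, the properties in Table 1.1 can be captured in a small data structure that a fusion system might consult when deciding which dataset supplies geometry, color, or live updates. The sketch below is a minimal example written for this discussion; the type and field names are hypothetical and do not correspond to any implementation in this thesis.

# Minimal sketch: encode the complementary properties of Table 1.1.
# The names below are illustrative only, not an API from this thesis.
from dataclasses import dataclass

@dataclass(frozen=True)
class DatasetProperties:
    provides: str      # "geometry" or "color"
    strength: str      # "surface" or "edges"
    coverage: str      # "global" or "local"
    viewpoint: str     # "top-view" or "ground-view"
    temporal: str      # "static" or "dynamic"

DATASETS = {
    "aerial_images": DatasetProperties("color", "edges", "global", "top-view", "static"),
    "lidar":         DatasetProperties("geometry", "surface", "global", "top-view", "static"),
    "ground_images": DatasetProperties("color", "edges", "local", "ground-view", "static"),
    "videos":        DatasetProperties("color", "edges", "local", "ground-view", "dynamic"),
}

# Example query: which datasets can supply live (dynamic) texture updates?
live_sources = [name for name, p in DATASETS.items() if p.temporal == "dynamic"]
print(live_sources)  # ['videos']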
1.2 Hybrid Representation
Motivated by the complementary properties of the different datasets, we present a hybrid representation
that fuses these four datasets in a natural way to represent large-scale environments. As shown in Figure
1.1, surface information is extracted from LiDAR data; edges are extracted from an aerial image; and
these two data sets are fused to generate urban models with accurate surfaces and detailed edges. The
models are then enhanced with ground view images to improve the visual effects, and live videos are
projected and painted onto the models to reflect the most up-to-date environment changes. The hybrid
method fuses information from different datasets, hence compensating for the shortcomings of each dataset
and benefitting from the merits of all the datasets. The hybrid method creates a detailed and accurate
representation of the dynamic environment.
Figure 1.1 The hybrid representation.
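For readers who prefer pseudocode, the workflow of Figure 1.1 can be summarized as the sequence of stages below. This is a minimal, runnable sketch with placeholder stubs; the function names are invented here and merely stand in for the methods developed in Chapters 4 through 9, not an actual implementation.

# Illustrative outline of the hybrid representation pipeline in Figure 1.1.
# The function names are placeholders (not the thesis implementation); the
# stubs below just thread data through so the sketch runs end to end.

def fit_surfaces(lidar_points):            # Chapter 4: surfaces from LiDAR
    return {"surfaces": lidar_points}

def extract_outlines(aerial_image):        # Chapter 5: edges from an aerial image
    return {"outlines": aerial_image}

def fuse(surfaces, outlines):              # hybrid modeling and post-processing
    return {**surfaces, **outlines, "textures": []}

def add_facade_texture(models, ground_image):    # Chapters 6-8: pose recovery + texturing
    models["textures"].append(("static", ground_image))

def paint_dynamic_texture(models, video_frame):  # Chapter 9: live updating
    models["textures"].append(("dynamic", video_frame))

def build_hybrid_representation(lidar, aerial, ground_images, video_frames):
    models = fuse(fit_surfaces(lidar), extract_outlines(aerial))
    for img in ground_images:
        add_facade_texture(models, img)
    for frame in video_frames:
        paint_dynamic_texture(models, frame)
    return models

print(build_hybrid_representation("lidar", "aerial", ["g1"], ["f1", "f2"]))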
1.2.1 Urban Modeling from LiDAR
Urban models are the key factor in environment representation. A survey on city models by the
European Organization for Experimental Photogrammetric Research (OEEPE) shows that 95% of the
participants indicated that 3D building data is of primary interest within city models [For99]. While current
sensing and modeling technologies offer many methods suitable for modeling a single object or a small
number of objects, accurate large-scale urban models still remain costly and difficult to produce, requiring
enormous effort, skill, and time, which results in a painfully slow development of such visual databases.
Of all the urban modeling datasets, airborne LiDAR (Figure 1.2 (a)) has become a rather important
information source for generating high quality 3D digital surface models. A LiDAR sensing system permits
an aircraft to quickly collect a height field with accuracy of centimeters in height and sub-meter in ground
position. Due to its advantages as an active technique for reliable 3D determination, LiDAR offers a fast
and effective way to acquire models for a large urban site.
However, many challenges exist in modeling from LiDAR. Sample-rate limitations and measurement noise obscure small details, and occlusions from vegetation and overhangs make the LiDAR data difficult to process. Automatic segmentation is another challenge. Although buildings can be segmented from flat ground with a height threshold, hilly terrain and vegetation close to buildings make most automatic algorithms fail. Grouping and shape modeling for complex shapes are also difficult, especially for curved surfaces, which is the main reason that most automatic methods only work for simple shapes. On the other hand, completely manual methods need intensive user input, and the accuracy depends entirely on the user's clicks. Furthermore, the reconstructed models from LiDAR generally have jagged edges and bumpy surfaces (Figure 1.2(b)), and a model of a university campus generally has millions of triangles, making it impractical for many applications. In general, an interactive model refinement method with minor user input is necessary to reduce the number of triangles and improve the visual effects.
Figure 1.2 USC LiDAR data.
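A minimal sketch of the height-threshold segmentation mentioned above, assuming the LiDAR data has already been resampled into a regular DEM grid; the NumPy array, the percentile-based ground estimate and the 5 m threshold are illustrative values chosen for this sketch, not parameters used in this thesis.

# Minimal sketch of height-threshold building segmentation on a DEM grid.
# Assumes `dem` is a regular grid of heights (metres); values are illustrative.
import numpy as np

dem = np.array([
    [50.2, 50.3, 50.1, 50.2],
    [50.2, 65.0, 65.1, 50.3],   # a ~15 m building on flat ground
    [50.1, 65.2, 65.0, 50.2],
    [50.3, 50.2, 50.1, 50.2],
])

ground_level = np.percentile(dem, 25)      # crude estimate of the terrain height
building_mask = dem > ground_level + 5.0   # 5 m threshold above "ground"
print(building_mask.astype(int))

# On hilly terrain a single global ground_level is wrong almost everywhere,
# which is why, as noted above, most automatic algorithms fail there and
# user guidance or local terrain estimation becomes necessary.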
1.2.2 Integrating Aerial Image
LiDAR-based modeling can generate highly accurate surfaces; however, the models lack color information and the edges are not accurate due to the relatively low sampling rate of LiDAR data. Moreover, noisy LiDAR data makes shape modeling very difficult, even for an interactive system with user interaction. Aerial images (Figure 1.3) provide rich color information and accurate edge information. Fusing aerial images with LiDAR will generate more accurate models with edge details, and color information from aerial images can help LiDAR segmentation and shape modeling.
One challenge of fusing the two datasets is the registration problem. Since aerial images are often downloaded from free websites [Goo], the camera parameters are generally unavailable. We need a method to estimate the pose with an uncalibrated camera. Another difficult problem is extracting the building outlines from the aerial image. Automatic methods work well only for simple shapes, while complex buildings are more difficult to analyze. The color and texture of aerial images provide rich information for modeling, but on the other hand, they also make the analysis process complicated. For a fully automatic method, building outlines are often missing, long lines tend to break into short pieces, curves are very hard to extract, and shadows and textures introduce unnecessary lines. Although the performance can be improved by using cues from range images, it often fails for complex shapes such as those addressed in this thesis (Figure 5.2). On the other hand, manual methods tend to extract slanted outlines and cause co-linear lines to become non-co-linear and orthogonal lines to become non-orthogonal. An interactive method that minimizes user input and improves the outline accuracy is necessary.
Figure 1.3 Aerial images.
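As one concrete, generic way to relate an uncalibrated, near-nadir aerial image to the LiDAR ground plane, a 2D homography can be estimated from a handful of user-clicked correspondences with the standard DLT algorithm. The sketch below only illustrates that idea under these assumptions; it is not the registration approach developed in Chapter 5, and the point coordinates are synthetic.

# Hedged sketch: register an (uncalibrated) near-nadir aerial image to the
# LiDAR ground plane with a 2D homography estimated from a few user-clicked
# point correspondences (standard DLT; a generic technique, not necessarily
# the method of Chapter 5).
import numpy as np

def homography_dlt(src_pts, dst_pts):
    """src_pts, dst_pts: (N, 2) arrays of corresponding points, N >= 4."""
    rows = []
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.array(rows))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

# Example: image corners mapped to (shifted, scaled) ground coordinates.
img = np.array([[0, 0], [100, 0], [100, 80], [0, 80]], dtype=float)
gnd = np.array([[10, 20], [210, 22], [208, 182], [12, 180]], dtype=float)
H = homography_dlt(img, gnd)
p = H @ np.array([50.0, 40.0, 1.0])
print(p[:2] / p[2])   # image centre mapped into ground coordinates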
1.2.3 Integrating Ground Images
Models from LiDAR and aerial images only have top-view information, so the results need to be enhanced with textures from ground images. Texture mapping is a successful technique to generate scenes that appear both detailed and realistic while using relatively simple geometry. Ground images can be used to generate textures to improve the visual effects, which requires camera calibration and pose recovery.
Automatic camera calibration and pose recovery is a challenging task for outdoor building images. The camera orientation can be estimated using vanishing points [Cip99]; however, the estimation and error analysis of vanishing points lacks a general theoretic framework. For real scenes, heavy tree occlusion of outdoor urban buildings (Figure 1.4, left) makes vanishing point extraction a difficult problem. The removal of occlusions to generate visually more pleasant textures is also challenging. Furthermore, given a camera’s orientation, recovering its translation (relative to the models) is often an under-constrained problem using a single image, due to the lack of features in rough building models and the narrow field of view of a camera. Lastly, it is very hard to capture an image that covers a whole building due to the small field of view of a camera and the physical barriers of narrow streets. Multiple images are often necessary to generate textures for a whole building, which causes other problems such as illumination variation and parallax in different images. A technique to automatically recover camera pose and generate high quality textures for given urban building models (especially rough models) is necessary.
Figure 1.4 Ground image (left) and video frame (right).
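To make the vanishing-point idea concrete, the sketch below uses the standard relation that the vanishing points of three mutually orthogonal scene directions, multiplied by the inverse intrinsic matrix and normalized, give the columns of the camera rotation (up to sign). The intrinsics and vanishing points are synthetic examples; the actual estimation and its error analysis are the subject of Chapters 6 and 7, not this sketch.

# Hedged sketch of the standard relation behind vanishing-point orientation
# estimation: r_i = K^{-1} v_i / ||K^{-1} v_i|| (up to sign) for three
# mutually orthogonal scene directions.  Illustration only.
import numpy as np

def rotation_from_vanishing_points(vps, K):
    """vps: three vanishing points in homogeneous pixel coords, shape (3, 3).
       K:   3x3 intrinsic matrix.  Returns an approximate rotation matrix."""
    Kinv = np.linalg.inv(K)
    cols = []
    for v in vps:
        d = Kinv @ v
        cols.append(d / np.linalg.norm(d))
    R = np.column_stack(cols)
    # Re-orthogonalise, since noisy vanishing points rarely give an exact rotation.
    U, _, Vt = np.linalg.svd(R)
    return U @ Vt

K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0,   0.0,   1.0]])
vps = np.array([[800.0,   0.0, 0.0],    # scene x-axis (vanishing point at infinity)
                [0.0,   800.0, 0.0],    # scene y-axis
                [320.0, 240.0, 1.0]])   # scene z-axis (principal point)
print(rotation_from_vanishing_points(vps, K))   # ~identity for this synthetic setup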
1.2.4 Integrating Videos
Static textures from ground images are useful in creating realistic imagery of a virtual environment; however, they have been difficult to employ in applications that require rapid and dynamic updates of the environment. On the other hand, videos (Figure 1.4, right) can be used as a texture resource to create dynamic textures that reflect the most up-to-date changes of environment imagery.
Using videos as the texture extraction resource has several advantages. Texture information is always updated, which helps to capture the most recent environment information, such as dynamic objects and illumination changes. Using video as a resource also aids in segmenting foreground textures from background textures (e.g., trees from buildings), which is usually hard to achieve using a static image.
However, using videos as the resource also poses the following challenges:
• Outdoor hand-held camera tracking is difficult.
• In order to maintain constantly updated texture information, we need to continually capture video, which leads to unbounded texture storage.
• Moving objects need to be separated from static textures, and furthermore, foreground textures need to be removed in order to keep only background textures.
• Real-time implementation is critical.
In general, we need techniques to paint the live videos in real time in order to continuously update the
environment with live information.
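The core operation behind such texture painting can be sketched as projecting each texel's surface point into the current video frame and copying the pixel. The function below is a simplified illustration under the assumption that the camera intrinsics and pose for the frame are known; occlusion handling, texture storage and the GPU implementation described in Chapter 9 are deliberately omitted, and all names are placeholders.

# Minimal sketch of painting a video frame onto a model texture: project each
# texel's 3D surface point into the frame and copy the pixel.  Assumes camera
# intrinsics K and pose (R, t) for the frame are known; this is not the
# thesis implementation.
import numpy as np

def paint_frame(texture, texel_points, K, R, t, frame):
    """texture: (H, W, 3) array being updated; texel_points: (H, W, 3) world
       points sampled on the model surface; frame: (h, w, 3) video image."""
    h, w = frame.shape[:2]
    cam = texel_points @ R.T + t                  # world -> camera coordinates
    valid = cam[..., 2] > 1e-6                    # keep points in front of the camera
    proj = cam @ K.T                              # pinhole projection (homogeneous)
    z = np.where(valid, proj[..., 2], 1.0)        # avoid division by zero
    u = np.rint(proj[..., 0] / z).astype(int)
    v = np.rint(proj[..., 1] / z).astype(int)
    valid &= (u >= 0) & (u < w) & (v >= 0) & (v < h)
    texture[valid] = frame[v[valid], u[valid]]    # copy frame pixels into the texture
    return texture

# Toy usage: a 2x2 patch of surface points on the plane z = 5, identity pose.
K = np.array([[100.0, 0.0, 64.0], [0.0, 100.0, 64.0], [0.0, 0.0, 1.0]])
xs, ys = np.meshgrid(np.linspace(-1, 1, 2), np.linspace(-1, 1, 2))
pts = np.dstack([xs, ys, np.full((2, 2), 5.0)])
tex = paint_frame(np.zeros((2, 2, 3)), pts, K, np.eye(3), np.zeros(3), np.ones((128, 128, 3)))
print(tex)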
1.3 Goal of This Research
This research aims at representing large-scale, dynamic, photorealistic environments by fusing information
from different datasets, and explores the following areas:
• A fast, convenient and accurate way to model large-scale urban building models with complex
shapes based on integration of LiDAR data and aerial images.
• Algorithms to estimate and analyze vanishing points and to recover camera poses, and the use of these algorithms to integrate ground images for texture mapping the models.
• Algorithms to semi-automatically remove occlusions from a single image and generate visually
seamless rectified textures.
• A new system that uses live video as the texture resource, and directly paints textures onto 3D
models in real time to update the environment with live information.
• Fusing all the information into an integrated, interactive, 3D environment for information
visualization and decision-making.
Chapter 2
Related Work
This chapter discusses related work on large-scale urban modeling.
2.1 Introduction
Urban models are used widely by government agencies for development planning as well as climate, air
quality, fire propagation, and public safety studies. Commercial users include phone, gas, and electric
companies. In most of these cases the models of buildings, terrain features, vegetation, and traffic
networks are the primary features of interest. A survey on city models by the European Organization for
Experimental Photogrammetric Research (OEEPE) showed that 95% of the participants indicated that 3D
building data is of primary interest within city models [For99].
Figure 2.1 Downtown Los Angeles [UCL]
Figure 2.2 3D model of Tokyo [TOK]
With advances in sensing and modeling technologies, large-scale urban modeling is both feasible and essential in many applications. The demand for urban modeling dates back to early human societies forming urban agglomerations [Shi01]. Early city models were created with wood from elaborate manual measurements. Computer technology, computer graphics, and computer-aided design (CAD) software now offer powerful tools for creating and visualizing digital models of cities (Figures 2.1, 2.2). However,
these tools still require data to model the real-world structures. Manual measurement and entry are
impractical; so various sensors are employed to acquire accurate data for 3D urban landscapes [Rib02].
Structure height and footprint data are commonly obtained from aerial images and LiDAR. The resulting
3D building models are usually integrated into spatial databases and geographic information systems (GIS)
to support urban planning and analysis applications (Figure 2.3, 2.4).
In prior decades, model acquisition mainly focused on imagery. [Mat90, Gul95] provide an overview
of methods using aerial images. [Ven89, Bra95] focus on building extraction methods. [May99] presents a
survey on automatic building extraction from aerial imagery, classifying methods along three axes:
complexity of data, complexity of models, and complexity of strategy. [For99] groups aerial image
modeling techniques into automatic and semi-automatic categories.
Recent advances include a wider variety of data sources. [Shi01, Bat01] summarize urban modeling of
large cities worldwide from an application viewpoint. According to [Bat01], in 2001, there were 63 serious applications of city modeling, 38 of which were developed in cities with populations of over one million,
and 25 for smaller cities. Web-links are provided for the large city projects.
Figure 2.3 Virtual London from a combination of CAD, GIS and panoramic images [Bat01]
Figure 2.4 3D models and GIS data for lower Manhattan [SIM]
The complexity of application demands and technology challenges make urban modeling an intensive
research area. We examine current research with respect to several performance criteria including data
acquisition sources, user interactions, geometry details, model completeness, and intended applications.
Although modeling systems vary with respect to the above criteria, data acquisition exerts a dominant
influence on the characteristics and use of the models produced by any system. Therefore, we cluster the modeling methods into those based on photogrammetry, active sensors, and hybrid sensor systems. We summarize the performance of each technique with respect to the above criteria.
Figure 2.5 Berkeley’s clock tower [Deb96]
2.2 Photogrammetry
Modeling from images is a classic problem in computer vision and remote sensing. Photogrammetry offers
a cost-effective means of obtaining large-scale urban models [For99]. The techniques in this category use 2D images without any a priori 3D data. Different image sensors lend themselves to modeling systems developed for terrestrial images, panoramic images, and aerial images.
2.2.1 Terrestrial images
Terrestrial images or ground-level images offer the most convenient data sources. Such data provides high-fidelity ground, vegetation, and building façade details; however, it lacks building top information and its range is limited by occlusion. Large urban areas are difficult to construct due to the limited area visible in each image and the calibration needed to stitch images together. Models created by systems in this category are usually used for aesthetic applications that only require 3D models of a few structures.
Among the successful systems in this category is Debevec’s Façade system [Deb96]. This interactive system allows users to recover a basic geometric model of the photographed scenes. Users select edge features to help the system build a model from 3D primitives, and then project the model back into the original image to verify model accuracy (Figure 2.5). View-dependent texture mapping is employed to render photorealistic novel views. This method’s appeal is its efficiency, which derives from a user interface that provides unified interaction with the image and geometry. Such methods offer an efficient and effective compromise between the tedium of manual-entry CAD modeling and stereo image methods that often lack robustness.
Image sequences are also useful data sources [Kar01]. Marc Pollefeys et al. [Pol03] use a hand-held camera to capture videos of natural scenes, and automatically reconstruct detailed, realistic models. Both systems [Deb96][Pol03] present very impressive results; however, the modeling results are local due to the physical limitation of the input data source.
Figure 2.6 Sphere panorama [MIT]
Figure 2.7 Aerial image of England [LAS]
2.2.2 Panoramic images
Panoramic images are another convenient and economic data source (Figure 2.6). Cameras with
special lenses or mirror systems acquire such images, or they can be stitched together from multiple planar-
projection images.
A collection of panoramic images can provide users with a 3D representation of an urban area using
Image-Based Rendering (IBR) techniques. McMillan [Mc95] developed methods for generating novel
views from a small set of initial panoramic images. QuickTime VR [Che95] offers a simple and fast
method for simply viewing panoramic images.
These methods provide realistic but limited views of urban areas. However, since neither approach provides any explicit 3D geometry data, the integration with other data and scaling to large areas is difficult. Shum et al. [Shu99] use an interactive system to extract 3D models from panoramic mosaics. In the MIT City Scanning project [MIT], image data has been gathered for a portion of the MIT campus, but 3D models from this system for large urban areas are not yet reported.
2.2.3 Aerial images
Aerial images have several advantages over terrestrial images (Figure 2.7). They can provide accurate building footprints and roof heights, and can also be rectified into ortho-projections to facilitate the merging of multiple images to cover large geographic areas. Ortho-projections also facilitate fusion with 2D GIS information to integrate the models with rich databases of information.
Extensive research has been done with aerial images for urban modeling using a single or multiple images. Building extraction techniques from a single aerial image [Lin98] derive the shape from shadows. Edges are detected from first- or second-order intensity discontinuities. These edges are linked into segments and combined into groups based on knowledge or constraints (e.g., parallelism). Then hypotheses are formulated based on knowledge (buildings usually have a rectangle shape). The “best” hypothesis is selected and verified based on shadows (or wall heights in oblique aerial images). Finally, the height is computed based on shadows. Other methods [Van98] use geometric constraints to extract 3D models from a single image. However, 3D models from a single image are usually limited to simple shapes, such as rectangles. In the presence of occlusion, multiple images are needed. Multiple aerial images are used with stereo algorithms to extract 3D models [Ser00, Nor01]. Figure 2.8 shows 3D models reconstructed from multiple aerial images [Han01].
Fully automatic techniques are highly desirable due to the complexity and quantity of data needed to represent large urban areas [May99]. However, there are many barriers to fully automatic systems. The two major difficulties are automatic segmentation and reconstruction. Segmentation identifies the buildings in an image and is made difficult by image noise, lighting conditions, and occlusions. Reconstruction of complete buildings is difficult when a single building is comprised of a set of complex substructures. In most cases, aerial images lack façade information, so the resulting models lack visual realism. The integration of façade data is usually a manual process and requires additional sensor data. At this time, semi-automatic systems are more mature and practical [Lee00, Gul99]. Many commercial systems, such as Cybercity [Cyb], are based on semi-automatic techniques.
Figure 2.8 3D models reconstructed from aerial images by Ascender II [Han01]
The use of knowledge [Nev98] and machine learning methods [Bel02, Guo01] continues to improve the performance of automatic building extraction. Satellite images [Zha99, Yu99, Cli01] offer another data source suitable for large-scale modeling.
2.3 Active Sensors
Ground laser scanner
Christian [Chr01] presents a system using a truck equipped with one camera and two 2D laser scanners. The camera captures images for textures, the horizontal laser scanner tracks the motion of the truck, and the vertical scanner captures 3D building façade models. Similarly, [Hui01] uses a vehicle-borne sensor system with three single-row laser scanners and six line cameras. A navigation system fuses data from GPS, inertial, and odometer sensors to synchronize the laser and image data and track the system position. Both systems generate textured 3D façade models. Vehicle-based scanner systems capture detailed building façade data; however, they lose accuracy for upper portions of tall buildings, and no structure roof or footprint data are obtained.
2.3.1 Airborne LiDAR
Technology for LiDAR sensing evolved in the 1970’s [Ahm02]. The development of GPS in the late 1980’s provided an accurate positioning method, thereby increasing the accuracy and lowering the cost of LiDAR data. Now, LiDAR is widely used to capture large-scale 3D terrain data. The basics of the system involve a plane or helicopter flying over a territory (Figure 2.9); a scanned laser emits pulses and the time of flight is measured. Given accurate sensor position and orientation, a cloud of 3D point measurements is acquired. LIDAR data usually requires pre-processing. Noise is filtered and the data are differentially corrected and assembled into scan lines [Ahm02]. Hole filling is performed to create a final regular-grid digital elevation model (DEM) (Figure 2.10).
Figure 2.9 LIDAR capture system [Ren01]
Figure 2.10 DEM from LIDAR: USC campus and surrounding area [You03]
into
rmation for structure roofs and most opaque surfaces,
rban modeling. In recently years, intensive research
modeling systems are manual or semi-automatic, and
Our approach based on LiDAR falls in this category
ar grid n to
ta points. Complex buildings are divided into
con
structure segmentation from DEM. This is usually
usions between buildings or vegetation against buildings create
tation is greatly aided by any 2D footprint information from GIS or CAD data;
how iori. Of the automatic building extracting techniques
and then uses least square fitting to locate roof
scan lines [Ahm02]. Hole filling is performed to create a final regular-grid digital elevation model
(DEM) (Figure 2.10).
Since airborne LIDAR provides accurate 3D info
these measurements can greatly simplify large-scale u
is conducted in this area [Mor00 et al]. Again, some
some are fully automatic.
Semi-automatic
They are several semi-automatic LiDAR systems.
[You03]. Raw LiDAR data is re-sampled into regul
produce a mesh model (Figure 2.10). Using a depth filter and user guidance, mesh points are classified as
terrain or buildings. The building models are further refined automatically using CSG (constructive solid
geometry) models parametrically fit to the segmented da
, followed by hole-filling and triangulatio
nected sets of CSG elements, such as cubes, spheres, and cylinders (Figure 2.11). High-order
primitives (superquadrics) can model irregular shapes. Details are discussed in Chapter 4.
Automatic
Automatic modeling from LiDAR requires
accomplished using height analysis. Occl
difficulties. Segmen
ever, these must be registered and acquired a pr
proposed for LiDAR, the main differences lie in
their segmentation and reconstruction
techniques.
Ahmed [Ahm02] uses a 2D parameter space
for plane detection to extract building roofs
(Figure 2.12). Morgan [Mor00] re-samples the
irregular-spaced LiDAR data into a regular grid
Figure 2.11 A complex model created by interactive
guiding of CSG primitive fitting [You03]
15
planes. Zhao [Zha00] and Seresht [Ser00] both use
image information and DEM for building
reconstruction. Norbert [Nor97a] uses a 2D ground
plan from GIS data to segment and reconstruct
ans [Han99] uses dense laser scanner
data
2.4 Hybrid Sensors

As we discussed, different sensors provide various data for large-scale urban modeling. The main data sources are ground and aerial image sensors, aerial active sensors, and 2D footprint data from GIS or CAD data. Each of these data sources and corresponding techniques has its own advantages and disadvantages [Rib02]. Images provide detailed texture and color information and can provide very high accuracy, making them required for texture data and appealing for extracting edge or footprint data. On the other hand, LIDAR data samples are dense and provide accurate 3D building and terrain surface information. A natural conclusion is to fuse these data sources to obtain more accurate and automatic urban models.

2.4.1 Hybrid Imagery and Laser Range Sensors

Features extracted from images can aid in defining model features. Edges from stereo images can improve the accuracy of LiDAR model features [Ker02]. High-resolution images can obtain accurate models from low-resolution LiDAR [Rob01]. Semantic information such as shadows can also help in LiDAR modeling [Joc00]. A fusion of 3D points from both sensor datasets can improve the quality of model results [Mic98].

Cues from range data can aid in image segmentation and modeling. Correspondence (or image feature matching) is a classic and difficult computer vision problem. Correspondence based on intensity data is difficult and non-robust, thereby limiting the performance of image-based modeling techniques. LiDAR directly measures the heights of buildings, providing excellent initial estimates for image matching. The cues from active sensors significantly reduce computational complexity and processing time as well as eliminating many false correspondences [Hue00].

Image data can verify LiDAR models and provide textures for visualization [Rot02, Arm02, Ken02], since LiDAR does not provide texture data. When there is a lack of ground-truth measurements, quantitative evaluation of modeling results is difficult. Images offer a natural means to verify the modeling results from any method. Models projected onto images often reveal errors that can be measured subjectively or quantitatively. Similarly, images projected as textures onto models can reveal discrepancies [Neu03].

Data from aerial sensors (aerial images and LiDAR) and ground sensors (ground images and laser scanners) can be fused for more complete 3D building reconstruction. Aerial sensors provide accurate footprint and height data, while ground sensors provide accurate façade data. Interactive methods enable the combining of aerial images and ground images to generate complete models [Rib02] (Figure 2.13). Ground laser scanner data is fused automatically with models from LiDAR [Fru03]. The automation is enabled by a horizontal laser scanner that acquires accurate spatial position for registering the two model sources.

Figure 2.13 An oblique view of San Francisco modeled by a hybrid system [Rib02]
2.4.2 Hybrid DSM or Aerial Image and 2D GIS

2D GIS data with structure footprints are often available in modern cities. Many methods [Oga98, Cla00, Geo01, Nor97] use ground plan information to help modeling from DSM (Digital Surface Model) or aerial images. Ground plans provide accurate building features and boundaries that are useful for segmenting buildings from the DSM. Segmentation is simplified, and edges provide orientation data that is useful in model reconstruction. During model reconstruction, 2D building outlines from ground plans are used in forming hypotheses for roof shapes, and height information comes from surfaces extracted from DSM points. Research in the fusion of plan and DSM data often focuses on model-based reconstruction, making the assumption that shape primitives can be fit to the point clouds identified by a segment of the ground plan [Geo02]. Due to a possible lack of extracted DSM points per segment, fitting is usually performed on aggregated segments.

2.5 Discussion and Conclusion

Due to the lack of actual site measurements, quantitative evaluation of the modeling accuracy of different techniques is difficult. Current systems often use relative criteria and strategies to verify the accuracy and correctness of reconstructed models. For example, we can verify the reconstructed model dimensions by embedding the model into the original range sensing data. As the sensing data is physically acquired from the real world, it appears to represent the real structures very well. Another commonly used strategy is to utilize imagery geo-referencing (aerial photographs, ground image/video captured with high-resolution digital cameras, and terrain maps) to verify the accuracy of the reconstructed geometry model. By projecting those images onto the geometry models, we are able to immediately observe the errors resulting from model reconstruction. This strategy appears to allow us to verify the accuracy of very fine building structures. Still, better strategies and techniques are necessary to provide more reliable and quantitative evaluation.

Table 2.1 lists a performance comparison of the different sensors and their corresponding modeling techniques. From it and the above evaluation, we observe some clear conclusions. First, most complex model acquisition is time-consuming, requiring a great deal of operator intervention and resulting in painfully slow evolution of such visual databases. A technique that requires little user intervention while maintaining the capability to model complex buildings is a clear technology trend. Second, the data provided by any single sensor type is incomplete. A single sensor technology seems unlikely to capture the detailed and varying characteristics of urban building models. Hybrid techniques that combine different sensors and techniques are necessary to achieve our goal of highly accurate and complete urban modeling systems. The last observation is that although a number of hybrid methods exist, they only handle simple shapes, and none of these techniques integrates dynamic information to reflect up-to-date environment changes.
Table 2.1: Comparison of different data sources and techniques

Data source                                Convenience   Resolution    Model completeness   Function
Images: Ground                             High          High          facade               aesthetic
Images: Panoramic                          High          High          facade               aesthetic
Images: Aerial                             Medium        Medium        top-view             both
Active sensor: Ground laser scanner        Low           High          facade               aesthetic
Active sensor: LIDAR                       Low           Medium        top-view             spatial
Hybrid: Aerial borne and ground            Low/Medium    High          complete             both
Hybrid: Aerial image and LIDAR             Low           Medium        top-view             both
Hybrid: DSM or aerial image and 2D GIS     Low           High/Medium   complete             both
Chapter 3

Overview and Contribution

3.1 Overview

To represent a large-scale environment, we first fuse information from both LiDAR and an aerial image to create urban models with accurate surfaces and detailed edges. We then enhance the models with high-resolution façade textures, and update them with dynamic textures from videos to capture the most up-to-date environment information.

Chapter 4 presents a semi-automatic system based on LiDAR data to model urban areas with complex building shapes. We divide building compounds into different elements, such as cubes, cylinders, spheres and superquadrics. Surface data points from LiDAR are automatically segmented, and the best elements are fitted to the raw LiDAR data to obtain refined models. The modeling process is a hierarchical approach that allows users to create a hierarchical building model composed of geometric primitives. This approach has demonstrated its flexibility and capability for a wide range of complex buildings with irregular shapes.

Models from LiDAR have accurate surfaces, but lack edge details and color information. Chapter 5 integrates an aerial image to improve the accuracy and visual effects. Edge and color information from the aerial image are used to aid the modeling and refinement process. Both an automatic method using cues from a range image for segmentation and a semi-automatic method using a primitive-based method for outline extraction are presented. Experimental results show that the automatic method works well only for simple shapes, while the interactive method can extract complex surfaces with minor user input.

To integrate ground view images for façade textures, we need to estimate the camera's pose, which is accomplished using vanishing point techniques. Chapter 6 presents a novel theoretical framework, coined the Vanishing Hull, to estimate and analyze the optimal solution and stability of vanishing points with a given edge error model. This theory is then used in Chapter 7 to estimate vanishing points, and, furthermore, camera poses are estimated based on the vanishing points. The equations for camera orientation estimation with infinite vanishing points are derived, and the translation is computed using two new techniques when a single image does not provide enough constraints. Finally, textures are generated using color calibration and blending techniques to solve the illumination and parallax problems across different images.

Chapter 8 presents a novel Intelligent Copy and Paste tool (ICP) to remove occlusions from a single image and generate visually seamless textures. The key idea of the ICP tool is to use an existing source image patch to repair an occluded destination patch. We first explicitly estimate the perspective effects in the edited image using vanishing points to compensate for the perspective differences between the source and destination patches. A user removes undesired occlusions by simply defining the destination region with a rectangle. The ICP system automatically finds a nearby source patch and transfers it to the destination, using a graph-cut algorithm to seamlessly blend the source region into the destination. The ICP tool is simple to use, and the results show its speed and effectiveness in removing occlusions. We further show that the tool can also be used in broader applications, including replacing textures, extending structures, and generating rectified images for textures.

Static textures are useful in creating a realistic image of a virtual environment; however, they have been difficult to employ in applications that require rapid and dynamic updates of environment imagery. In Chapter 9, we use videos as the texture resource to create dynamic textures that reflect the most up-to-date changes of environment imagery. An efficient way to store textures for a large-scale environment is presented, and dynamic textures are generated in real time by employing videos as texture resources.

Chapter 10 presents an application in AVE (Augmented Virtual Environment) for information visualization and decision-making, and Chapter 11 concludes the thesis and points out some future work.
3.2 Contributions
The primary contribution of this research is the hybrid representation that fuses information from four
datasets to create a detailed, accurate and photorealistic representation of a large-scale dynamic
environment. In each of the steps, we also make novel contributions, as follows:
• A practical primitive based modeling system. Primitives are used in both outline extraction from
an aerial image and surface fitting from LiDAR data. Our system minimizes the user input and
creates urban models with accurate surfaces (accuracy up to ten centimeters) and edges (accuracy
up to a foot) in a fast and convenient way. A wide range of complex buildings in different large-
scale urban areas, including the USC campus and part of Washington DC, have been modeled
using our system, which shows the system’s capability and efficiency in modeling large-scale
urban buildings.
• A novel framework, vanishing hull. This new concept can be used to quantitatively analyze the
stability and accuracy of vanishing points estimation.
• Novel techniques to solve the challenges in generating textures from ground images: heavy tree
occlusions, a camera's narrow field of view, and the under-constrained translation problem.
• An intelligent image-editing tool, the ICP tool. The ICP tool is very powerful in removing
occlusions for highly-structured perspective images. It also has broad applications, including
replacing textures, extending structures, and generating rectified images for textures.
• A video painting system that directly paints videos onto 3D models as textures in real-time.
Chapter 4

Urban Modeling from LiDAR

4.1 Introduction

The rapid creation of accurate three-dimensional environment models is useful for applications such as virtual reality training systems and building visualization. While many methods are suitable for modeling a single or a small number of buildings, an accurate campus-size model remains costly and difficult to produce. This problem is the main impetus for our work. Our goal is the rapid and reliable creation of complex 3D building models based on LiDAR data.

Airborne LiDAR data is an important dataset for generating high-quality surface models. A LiDAR system permits an aircraft flyover to quickly collect a height field for a large environment with an accuracy of centimeters in elevation and sub-meter accuracy in ground position (typical). However, sample rate limitations and measurement noise obscure small details, and occlusions from vegetation and overhangs lead to data voids in many areas. Our approach is to segment and refine the building models to enhance their utility and visualization value. Figure 4.1 (a) shows the raw USC campus LiDAR data, (b) the refined building models extracted with our system, and (c) the visual quality obtained with textures projected [Mar78] onto the refined models.

One of the challenges in modeling complex buildings is the robust segmentation of irregular objects from a noisy background. With limited user assistance, however, this arduous task can be eased. Our system is semi-automatic in that it requires limited user interaction to indicate the buildings of interest to be processed. Once the user input is provided, the system automatically segments the building boundaries and finds the best surface points for model fitting. By adapting a set of appropriate geometric primitives and fitting strategies, the system can model a range of complex buildings with irregular shapes.

Figure 4.1 (a) Raw LiDAR data after mesh formation, (b) refined models extracted semi-automatically by our system, and (c) an image of the refined models rendered using projective texture mapping.
4.2 System Overview
Figure 4.2 Algorithmic structure of the proposed system: the LiDAR point cloud passes through model reconstruction (re-sampling, hole-filling, tessellation), model classification (building detection and segmentation), model refinement (primitive selection, surface segmentation), model fitting (linear and non-linear fitting), and model assembly (element relationships), producing building models and a separate vegetation and ground set.
Our system (Figure 4.2) begins with a model reconstruction phase followed by a model classification,
refinement and fitting phase. The model reconstruction phase processes the raw LiDAR point cloud to
create a regular-grid 3D mesh model of the environment. Re-sampling, hole-filling, and tessellation
comprise this phase. Triangle meshes are used as the 3D geometric representation since they are easily
converted to other geometric representations; many level-of-detail techniques use triangle meshes;
photometric information is easily added with texture projections; and graphics hardware supports fast
rendering of triangle meshes.
The model classification, refinement and fitting
phases process the reconstructed 3D geometric mesh
model. The global building footprints and roof data
provided by the LiDAR reconstruction is used to
determine a building’s geo-location and isolate it from
the surrounding terrain. Based on the shape of a
building roof (flat-roof, slope-roof, dome-roof, gable-
roof, etc.), we classify a building section or element of
interest (EOI) into one of several groups, and for each
group we define appropriate geometric primitives,
including linear-fitting primitives such as a cube,
wedge, cylinder, polyhedron, and sphere; and
nonlinear-fitting primitives such as superquadrics.
Once a building is segmented, geometric primitives
are fit to an element’s mesh model data, and the best
fitting models represent the complete building structure.
Currently, our system is semi-automatic in that it only requires user interaction to indicate the elements of interest (EOI) and their associated group type. Once the user input is provided, the system automatically segments the surface points, computes the element boundary, performs the primitive model fitting, and assembles the complete building model. Our system also provides a range of editing tools, allowing users to further refine the models or obtain a specific representation quickly and accurately.

4.3 Model Reconstruction from LiDAR Data

In cooperation with Airborne 1 Inc. [Air], we obtained the LiDAR model of the entire USC campus and surrounding University Park area. The raw data is a cloud of 3D point samples registered to a world coordinate system (ATM - Airborne Topographics Mapper). We processed this data by uniform grid re-sampling, hole-filling, and tessellation to reconstruct a continuous 3D surface model.

The re-sampled data is a uniform-grid range image (Figure 4.3), and it has undefined areas that need to be filled. We used an adaptive-weight neighborhood interpolation for hole-filling. To preserve edge information, the weights are set by an inverse function of the distance between the neighbor points and the point to be interpolated. An adaptive window size is employed to adapt to the varying sizes of holes. When the hole size is only a few points, a small window selects only close neighbors for weighted interpolation; for large holes, the window size is increased to ensure sufficient points for interpolation. Figure 4.3 illustrates the effect of hole-filling in the range image of an arena rooftop.
Figure 4.3 An arena range image is shown (left) before interpolation. Note the hole in the arena
rooftop marked with the red rectangle. The hole is filled in the interpolated range image (right).
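As a concrete illustration of this interpolation, the following sketch fills undefined samples of a range image with inverse-distance-weighted neighbors and grows the window until enough valid samples are found. It is a minimal re-implementation of the idea under the assumption that the range image is a NumPy array with NaN marking holes; the function name and thresholds are illustrative, not those of the actual system.

```python
import numpy as np

def fill_holes(range_img, min_valid=4, max_window=15):
    """Inverse-distance-weighted hole filling with an adaptive window size."""
    filled = range_img.copy()
    rows, cols = range_img.shape
    for r, c in zip(*np.where(np.isnan(range_img))):
        # Grow the window until enough valid neighbors are available.
        for half in range(1, max_window // 2 + 1):
            r0, r1 = max(0, r - half), min(rows, r + half + 1)
            c0, c1 = max(0, c - half), min(cols, c + half + 1)
            patch = range_img[r0:r1, c0:c1]
            vr, vc = np.where(~np.isnan(patch))
            if len(vr) >= min_valid:
                # Weights fall off with distance, which helps preserve edges.
                dist = np.hypot(vr + r0 - r, vc + c0 - c)
                w = 1.0 / (dist + 1e-6)
                filled[r, c] = np.sum(w * patch[vr, vc]) / np.sum(w)
                break
    return filled

# Example: a small synthetic range image with a 2x2 hole in the middle.
img = np.full((8, 8), 10.0)
img[3:5, 3:5] = np.nan
print(fill_holes(img)[3:5, 3:5])   # values close to 10.0
```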
Triangle meshes are used as the 3D geometric representation as they are easily converted to other
geometric representations. We have tested several tessellation methods including the closest-neighbor
triangulation and Delaunay triangulation, and found that Delaunay triangulation is superior in preserving the topology and connectivity information of the original data.
The whole process of model reconstruction is fully automatic. The system allows a user to select any portion of the input data to reconstruct a 3D mesh model with a user-defined re-sample resolution. Once the parameters of data size and re-sampling resolution are set, the system automatically performs the steps to process the 3D point cloud and outputs the reconstructed 3D mesh model in VRML format. Figure 4.1 (a) shows part of the reconstructed model of the USC campus and surrounding University Park area at the original sample resolution.

4.4 Urban Model Classification

To extract the buildings from the reconstructed 3D mesh model, the points of the mesh model have to be classified according to whether they belong to terrain, buildings, or something else. In our system, we classify the original LiDAR model into two categories: buildings and bare-land. The building subset is a collection of building models represented in parametric form, while the bare-land subset is the reconstructed 3D mesh model with the buildings removed.

Figure 4.4 Classifying the LiDAR model into two categories: (left) bare-land, and (middle) buildings. The extracted buildings are very rough, with many artifacts remaining around the buildings. The initial classification has to be refined in order to remove the undesired areas and improve its utility and visualization value (right).
The classification approach is based on an obvious fact: objects above a certain height must be either vegetation or buildings. So, by applying a height threshold to the reconstructed 3D mesh data, we create an approximate building mask. The mask is applied to filter all the mesh points, and only those masked points are extracted as building points. Figure 4.4 illustrates the results of applying this approach to classify the USC campus mesh model into the bare-land (Figure 4.4 left) and the building areas (Figure 4.4 middle). As we can see, the extracted building subset is very rough, with many artifacts remaining around the buildings. Furthermore, the reconstructed models from LiDAR generally have jagged edges and bumpy surfaces (Figure 4.1(a)), and the model of the USC campus has more than 4 million triangles, making it unwieldy for many graphics applications. The initial reconstruction has to be further refined in order to remove the undesired areas. Our strategy is to use an accurate geometry model to fit the building mesh data and produce a constrained CG building model. Once we obtain the refined building models with accurate geometry, we can easily remove those artifacts from the initial classification by combining the geometric shape cues. Figure 4.4 (right) illustrates the accurate classification of the bare-land and the buildings embedded in the land.

4.5 Model Refinement and Fitting

Our model refinement is a primitive-based approach. We divide a complex building into several basic building primitives and model them using a parametric representation. As constructive solid geometry allows the composition of complex models from basic primitives that are represented as parametric models, our approach is quite general. Also, since the type of primitive is not limited and may include objects with curved surfaces, the flexibility of model combinations is very high; hence we can model a range of complex buildings with irregular shapes and surfaces by combining appropriate geometry primitives and fitting strategies.

4.5.1 Building Primitives

Figure 4.5 Geometry primitives used for representing a building model: planes, cuboids, spheres + cylinders, and slopes + polyhedra.

Based on the shape of a building roof (flat-roof, slope-roof, dome-roof, gable-roof, etc.), we classify a building section into one of several groups, and for each group we define a set of appropriate geometry
primitives, including the standard CG primitives such as a plane, slope, cube, polyhedron, wedge, cylinder,
and sphere, and high-order surface primitives such as ellipsoids and superquadrics. These geometry primitives are the basic units used for building construction. They can also be combined with each other to form more complex new primitives. Figure 4.5 illustrates a set of building primitives and their relationships defined for modeling a complex building.
A high-order surface primitive is useful to model irregular shapes and surfaces, such as classical dome-
roof buildings and a coliseum or arena. Superquadrics are a family of parametric shapes that are
mathematically defined as an extension of non-linear general quadric surfaces, and have the capability of
describing a wide variety of irregular shapes with a small number of parameters [Han99]. In our work, we
use them as a general form to describe all the nonlinear high-order primitives, as defined in Equation 4.1.
$$
r(\eta, \omega) =
\begin{bmatrix}
a_1 \cos^{\varepsilon_1}\eta \,\cos^{\varepsilon_2}\omega \\
a_2 \cos^{\varepsilon_1}\eta \,\sin^{\varepsilon_2}\omega \\
a_3 \sin^{\varepsilon_1}\eta
\end{bmatrix},
\qquad -\pi/2 \le \eta \le \pi/2,\quad -\pi \le \omega < \pi
\tag{4.1}
$$

where $\varepsilon_1$ and $\varepsilon_2$ are the deformation parameters that control the shape of the primitive, and the parameters $a_1$, $a_2$ and $a_3$ define the primitive size in the x, y and z directions, respectively.
29
combination of these parameters,
superquadric can model a wide variety of
irregular shapes, and also many standard
CG primitives as well (Figure 4.6).
Once defined, each building primitive
is represented as a parametric model. The
parametric model descri a small but fixed set of variable parameters. The number of
parameters depends on the properties of each primitives and the knowledge assumed for the model fitting.
For example, a generic plane in 3D space can be represented as
which has 3 fitting, we may
also be established for other primitives, which are
bas
4.5.2 Primitive Selection
As we classify the building sections into several groups, in which appropriate building primitives are
defined, we have to provide the system the information to indicate the selected building section and
associated type. We term the building section being processed as the “Element of Interest” (EOI) that is an
area roughly bounding the building section. Currently, the EOI information needs to be provided by a user
in that only few mouse-clicks are required. Once the user input is provided, the system automatically
segments the building borders and surface points, and uses the indicated building primitive to fit the
building mesh model. The amount of user interaction depends on the type of building primitives associate
with the group. For example, the cube primitive is determined by two points and an orientation. So, this
c by ax z
Figure 4.6 High order primitives.
1 , 1 . 0
2 1
= = ε ε 2 , 1 . 0
2 1
= = ε ε 1 , 1
2 1
= = ε ε
bes the primitive by
+ + = (4.2)
parameters need to be estimated. However, in the case of slope-roof
reduce the parameters from 3 to 2 by setting either parameter a or b to zero. This is based on the
observation that if a building’s orientation is nearly parallel to the x or y axis of a defined world-
coordinates, then either parameter a or b will be close to zero for most buildings, i.e. the roof of a building
usually has only one slope along the x or y direction. We use the term “zero x/y slope” to indicate the
constraint for slope-roof fitting. Similar constraints can
ed on our observation and applications. We will detail those in the following sections.
30
primitive fitting needs only two user mouse-clic
most cases, 2 or 3 user mouse-clicks are suff
the amount of user interaction.
ks to indicate two diagonal-points on the roof surface. In
icient. The constraints established for primitives also reduce
4.5.3 Primitive Fitting

The following describes the most commonly used building primitives, including their mathematical parametric representations and examples of building fitting. The detailed algorithmic procedure for each primitive is also described.

Plane Primitives.

The flat-roof is a typical roof type of man-made buildings, which can be modeled using the plane-primitive group, including the 3D plane, cuboids, polyhedra, and combined primitives such as hollow cuboids. They all share the same property that the depth surface can be described by Equation (4.2). A 3D plane primitive is usually determined by two reference points and an orientation. If we align the building's orientation to the global direction defined in our working coordinates, we can reduce the specified parameters to 2, i.e. each plane is specified by two diagonal points.

Figure 4.7 (a) The 8-neighbor method for edge detection produces many incorrect edge points. (b) Erroneous edge points are discarded by use of connectivity information. In both cases, edge points passed the same depth and slope filters.
After the user indicates (mouse clicks) the two reference points, the system automatically estimates all
corners of the building roof based on the global direction (Figure 4.8). The estimated corner points are then
used for detecting the roof edges using a depth discontinuity constraint. We proposed an improved 8-
neighbors connectivity algorithm to detect building edges (Figure 4.7). First, we used the geometry
connectivity information of Delaunay reconstruction to track the connected edge points. Only those edges
that lie along the Delaunay triangulation are accepted as the possible edge points. Second, we utilized a
depth filter to constrain the detected edges. The depth filter is applied to all the possible edge points, and only those points having similar depth values to that of the defined reference points are passed as correct edge points (Figure 4.8). Once the roof borders have been extracted, we parameterize them using least-square fitting, and the roof corners are then refined again based on the fitted roof borders.

Figure 4.8 The two green points are the user-defined corners, red points are detected edge points, purple points are segmented surface points, and the green lines are fitted edges.

Plane depth fitting is performed on the segmented surface points inside the roof border (Figure 4.8). The depth discontinuity constraint is used for surface segmentation. We opted not to use surface-normal information due to its sensitivity to noise. Our experiments show that the depth discontinuity constraint performs well for surface segmentation. After segmenting the surface points, least-square fitting is applied to fit the depth values of those points. Figure 4.9 shows a complex building with several cuboids.

Figure 4.9 A complex building with multiple extracted cuboids produced with 16 mouse clicks in about 2 minutes. (a) Reconstructed raw LiDAR; (b) refined model; (c) refined model embedded in the original model. Note the accuracy of the surface and edges compared to the original LiDAR data.

Figure 4.10 A complex building with four roof elements. The reconstructed 3D mesh (left), the extracted model (center), and the embedded model (right). Note the accuracy of the surfaces and edges.

Slope Primitive.

Slope is a special case of the plane with a non-zero horizontal or vertical normal direction. Similar to the plane primitive, a sloped roof with rectangular edges is also extracted with two reference points using the plane fitting method.

The depth fitting for sloped surfaces, however, is more complex. A 3D plane defined in Equation 4.2 has 3 parameters to be estimated, where the two parameters a and b represent the two slopes in the x and y directions, and the parameter c is an offset. We reduce the parameters from 3 to 2 based on the "zero x/y slope" constraint. In case a building does not meet this condition, we perform an orientation alignment to orient the building to the reference direction. The least-square method is also used for parameter estimation with the segmented surface points inside the detected roof borders.

We observe that most roofs of real buildings have two symmetric slopes. To facilitate this structure, we combine two connected slope primitives to form a new primitive: the roof. In this case, three reference points (rather than four if we model the two slopes separately) are needed for parameter estimation: two on the slope edges, and one on the roof ridge. The surface points of the two symmetric slopes are segmented using the above method. The least-square fitting is performed on the depth values of the segmented surface points for each of the two slope primitives. The accurate roof ridge is computed based on the intersection of the two modeled slope planes.
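The plane and slope depth fitting described above reduces to an ordinary least-squares problem on the segmented surface points. The sketch below fits z = ax + by + c (Equation 4.2) and shows a variant under the "zero x/y slope" constraint that drops one slope parameter; it is a simplified illustration with synthetic inputs, not the exact routine used in the system.

```python
import numpy as np

def fit_plane(points):
    """Least-squares fit of z = a*x + b*y + c to N x 3 surface points."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    A = np.column_stack([x, y, np.ones_like(x)])
    (a, b, c), *_ = np.linalg.lstsq(A, z, rcond=None)
    return a, b, c

def fit_slope_zero_y(points):
    """Slope-roof fit under the 'zero x/y slope' constraint (b = 0):
    z = a*x + c, for a roof that slopes only along the x direction."""
    x, z = points[:, 0], points[:, 2]
    A = np.column_stack([x, np.ones_like(x)])
    (a, c), *_ = np.linalg.lstsq(A, z, rcond=None)
    return a, 0.0, c

# Synthetic roof points: z = 0.1*x + 12 plus a little noise.
rng = np.random.default_rng(0)
pts = rng.uniform(0, 30, size=(200, 3))
pts[:, 2] = 0.1 * pts[:, 0] + 12.0 + rng.normal(0, 0.05, 200)
print(fit_plane(pts))         # approximately (0.1, 0.0, 12.0)
print(fit_slope_zero_y(pts))  # approximately (0.1, 0.0, 12.0)
```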
Cylinder Primitive.

Surface fitting of a generic cylinder is a nonlinear optimization problem. However, we observe that most cylinder primitives in buildings have an axis perpendicular to the ground. Based on this constraint, we can eliminate the rotation parameters from the estimation and simplify the primitive to a vertical cylinder with a circular roof.
The roof extraction and surface segmentation are similar to the plane case, using the depth discontinuity constraint. Two concentric circles are defined for segmentation: the inner circle for roof border detection, and the outer circle for surface point segmentation. Three parameters are required to specify the concentric circles: one for the circle center and two for the radii. To guarantee that there are enough surface points for accurate segmentation and model fitting, the defined circles should cover all the possible surface points on the rooftop.

To achieve an accurate boundary reconstruction from the ragged mesh data, we defined two filters to refine the detected edges: a depth filter constraining the edge points to have similar depth values to that of the defined center, and a distance filter constraining the edge points to lie inside the estimated circle. The depth filter is similar to the one applied for plane primitives, but uses the circle center's depth value as the filtering threshold. The distance filtering is a recursive procedure. Using the detected edge points, we first fit them to the circle model to obtain initial estimates of the circle center and radius. We then use these initial estimates to filter the detected edges. Any edge point whose distance to the circle center is within a threshold passes the filtering; the distance deviation is used as the filtering threshold. After the distance filtering, the refined edge points are used recursively to estimate new border parameters.

Figure 4.11 Fitting a pole with a sloped top surface. (a) The user inputs three points (green); (b) detected edge points (red) in the initial estimation step, with surface points in purple; (c) refined edge points after iteration; (d) the refined model; (e) the refined model overlaid on the original model.

Figure 4.12 Fitting a partial sphere with a cylinder bottom. (a) The user inputs two points (green); (b) detected surface points of the initial estimation step; (c) refined surface points after iteration; (d) the refined model; (e) the refined model overlaid on the original model.
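A minimal sketch of the circular roof-border estimation described above: an algebraic least-squares circle fit followed by a recursive distance filter that discards edge points whose distance deviation is large and then refits. The function names and the thresholding rule are illustrative assumptions, not the exact implementation.

```python
import numpy as np

def fit_circle(pts):
    """Algebraic least-squares circle fit: returns (cx, cy, radius)."""
    x, y = pts[:, 0], pts[:, 1]
    A = np.column_stack([2 * x, 2 * y, np.ones_like(x)])
    b = x ** 2 + y ** 2
    (cx, cy, c), *_ = np.linalg.lstsq(A, b, rcond=None)
    return cx, cy, np.sqrt(c + cx ** 2 + cy ** 2)

def refine_circle(edge_pts, iterations=5):
    """Recursively drop edge points with large distance deviation, then refit."""
    pts = edge_pts
    for _ in range(iterations):
        cx, cy, r = fit_circle(pts)
        dev = np.abs(np.hypot(pts[:, 0] - cx, pts[:, 1] - cy) - r)
        keep = dev <= 2.0 * dev.std()        # deviation-based filtering threshold
        if keep.all():
            break
        pts = pts[keep]
    return fit_circle(pts)

# Noisy circle of radius 15 centered at (50, 40), plus a few outliers.
rng = np.random.default_rng(1)
t = rng.uniform(0, 2 * np.pi, 300)
circle = np.column_stack([50 + 15 * np.cos(t), 40 + 15 * np.sin(t)])
circle += rng.normal(0, 0.2, circle.shape)
outliers = rng.uniform(0, 100, size=(15, 2))
print(refine_circle(np.vstack([circle, outliers])))  # close to (50, 40, 15)
```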
Sphere Primitive.
The dome-roof is a very popular roof type in classical buildings. A simple dome-roof can be modeled as a sphere, but more complicated ones may need high-order surfaces to represent them. Similar to the cylinder primitive, the surface of a sphere is also a quadric surface.

To detect the roof border and surface points, two reference values are needed: one for the dome-roof center and another for the roof size. To guarantee enough surface points for accurate segmentation and fitting, the defined area should cover all the possible points on the roof surface. Since the section-projection of a sphere is a circle, the methods for sphere-roof detection and surface segmentation are almost the same as those used for the cylinder primitive, except that depth filtering is not used, as we do not have the sphere center in 3D space.

The model fitting is performed on all the segmented spherical surface points. As in the cylinder case, the distance constraint is also used recursively to achieve an accurate model reconstruction. The sphere primitive can also be combined with other primitive types; the most popular usage is the sphere-cylinder combination. Another usage of the sphere primitive is as an initialization step for high-order surface fitting. Since high-order surface fitting is normally a non-linear problem, an appropriate selection of initial estimates is vital to guarantee convergence to an optimal solution.

High-order Primitives.

The standard CG primitives have limited capability to model complex objects. One of the innovative features of our system is that it supports high-order modeling primitives to facilitate irregular building structures. Superquadrics are a family of parametric shapes that are mathematically defined as an extension of non-linear generic quadric surfaces. They have the capability of describing a wide variety of irregular shapes with a small number of parameters. We use the superquadric as a general form to describe all the nonlinear high-order primitives.
A generic superquadric surface is described by Equation 4.1, which has five parameters to control the shape and size deformations. In practice, we often write it as an implicit representation:

$$\left( \left(\frac{x}{a_1}\right)^{\frac{2}{\varepsilon_2}} + \left(\frac{y}{a_2}\right)^{\frac{2}{\varepsilon_2}} \right)^{\frac{\varepsilon_2}{\varepsilon_1}} + \left(\frac{z}{a_3}\right)^{\frac{2}{\varepsilon_1}} = 1 \tag{4.3}$$

and the inside-outside function is defined as:

$$F(x, y, z) = \left( \left(\frac{x}{a_1}\right)^{\frac{2}{\varepsilon_2}} + \left(\frac{y}{a_2}\right)^{\frac{2}{\varepsilon_2}} \right)^{\frac{\varepsilon_2}{\varepsilon_1}} + \left(\frac{z}{a_3}\right)^{\frac{2}{\varepsilon_1}} \tag{4.4}$$
Given any point (x, y, z), its position relative to the superquadric surface can be determined by the
following rules:
$$F(x, y, z) = 1 \;\Leftrightarrow\; \text{the point is on the surface}$$
$$F(x, y, z) < 1 \;\Leftrightarrow\; \text{the point is inside the surface} \tag{4.5}$$
$$F(x, y, z) > 1 \;\Leftrightarrow\; \text{the point is outside the surface}$$

High-order surface fitting is nonlinear. We use the Levenberg-Marquardt (LM) method [Wil92] to perform the superquadric surface fitting.

As an example of applying the high-order primitives to model irregular building shapes, we describe the steps to model the Los Angeles Arena with an ellipsoid primitive. The ellipsoid is a special case of superquadrics with the deformation parameters ε1 = ε2 = 1 (Equation 4.3).

1. Object segmentation

The region-growing approach [Ric] is employed to segment the irregular object from its background. Given a seed point, the algorithm automatically segments the seeded region based on a defined growing
rule. In our implementation, the surface normal and depth information are used to supervise the growing
procedure.
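The sketch below illustrates this segmentation step with a growing rule based on depth similarity only; it is a simplified stand-in for the actual rule, which, as noted above, also uses surface normals. The function name and tolerance are illustrative assumptions.

```python
import numpy as np
from collections import deque

def region_grow(range_img, seed, depth_tol=1.0):
    """Grow the connected region around 'seed' while the depth of each new
    4-connected neighbor stays within 'depth_tol' of the current point."""
    rows, cols = range_img.shape
    mask = np.zeros((rows, cols), dtype=bool)
    queue = deque([seed])
    mask[seed] = True
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and not mask[nr, nc]:
                if abs(range_img[nr, nc] - range_img[r, c]) <= depth_tol:
                    mask[nr, nc] = True
                    queue.append((nr, nc))
    return mask

# Toy range image: a 4x4 raised block (the "roof") on flat ground.
img = np.zeros((8, 8))
img[2:6, 2:6] = 9.0
print(int(region_grow(img, seed=(3, 3)).sum()))   # 16 pixels segmented
```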
2. Initial surface fitting

To guarantee a converged optimal solution, an appropriate initial value is required for the Levenberg-Marquardt (LM) algorithm [Wil92]. A sphere primitive fitting is used for initialization.

3. High-order surface fitting

Once initialized, the system fits the ellipsoid primitive to the segmented surface points using the LM algorithm. In this Arena modeling example, the algorithm needs 606 iterations to converge to the correct solution. Figure 4.13 shows the fitting process. The refined model appears to represent the LiDAR data very well.

Figure 4.13 High-order primitive fitting. (a) Segmented edge and surface points (purple); (b) sphere initialization; note that the edge of the sphere does not match the edge of the original data; (c) refined model after LM optimization; (d) the refined model overlaid on the original model; note that the edges and surfaces match well.

Figure 4.13 is a good example of the advantage of parameterized models. The original building has more than 10,000 data points and around 20,000 triangles, while the extracted ellipsoid surface has only 8 parameters, a great reduction in storage space. Parameterized models can also be rendered at any user-defined resolution, which is convenient for multi-level-of-detail rendering during interactive fly-through.

4.6 Model Relationship and Editing

All the elements mentioned above are fitted separately. However, if we do not constrain the relationships between elements, there will be gaps between them (due to data noise). To make a water-tight model, we add constraints, including collinearity, coplanarity, and requiring two points to lie on a line vertical to the ground. The user only needs to adjust the key points of each element (e.g., the diagonal points of a cube); the whole element is then adjusted based on the key points.
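One simple way to realize such constraints is to snap the key points of neighboring elements together, e.g. force nearly coincident corners to share one position so that no gap remains. The sketch below is only a hedged illustration of this idea, covering a coincidence constraint; it is not the constraint handling used in the system.

```python
import numpy as np

def snap_shared_corners(corners, tol=0.5):
    """Merge corner points of different elements that lie within 'tol' of each
    other by replacing each cluster with its centroid (removes small gaps)."""
    corners = np.asarray(corners, dtype=float)
    snapped = corners.copy()
    used = np.zeros(len(corners), dtype=bool)
    for i in range(len(corners)):
        if used[i]:
            continue
        d = np.linalg.norm(corners - corners[i], axis=1)
        cluster = d <= tol
        snapped[cluster] = corners[cluster].mean(axis=0)
        used |= cluster
    return snapped

# Two cube elements whose shared corners differ slightly due to data noise.
corners = [[10.0, 5.0, 12.0], [10.2, 5.1, 11.9],   # nearly coincident pair
           [30.0, 5.0, 12.0]]
print(snap_shared_corners(corners))
```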
When the LiDAR data becomes very noisy, or there are heavy occlusions, the edge fitting and corner computation results become unreliable. In such situations, we allow the user to disable the automatic edge detection function and interactively move the selected corner points to the correct position. Surface points are still detected automatically, since they are dense enough that the fitting is reliable.

Our system also supports model editing functions, including original point, refined point and element editing. In the original point editing mode, users can move any selected original point in 3D space to the correct corner position. In the refined point editing mode, users can edit the key points of each element to make a compact building. In the element editing mode, a user can select an element, such as a cylinder, and then copy, paste, delete, undelete it, change its parameters, and so on. These three editing modes make our system very convenient for modeling.

4.7 Results and Evaluation

We have implemented a prototype modeling system. Based on our on-hand LiDAR dataset, we have modeled the entire USC campus and the surrounding University Park area, including the Coliseum, LA Arena, museums, and gardens (Figure 4.14). The system has been tested on a range of different building structures. For example, the top row of Figure 4.14 shows the model of the LA Natural History Museum across the street from the USC campus. Note that the inclusion of slanted roof segments and domes makes this facility very complicated to model. Our system allows users to create the building models in several minutes by selecting a few points in the LiDAR data. The second row of Figure 4.14 shows the whole USC campus model and some models to the south of USC; all these buildings can be modeled in around two days. The third row compares the original and refined models. The original model has more than 2 million points and 4 million triangles, while the refined model has only around 10,000 points and 20,000 triangles with much better visual quality and the same accuracy. The bottom row shows two images rendered from the refined models with textures projected using the projective texture mapping technique.
The system was further tested on two additional datasets, part of Washington DC and Carson City. Figures 4.15 and 4.16 show the results.
Evaluation

In our work, we use two strategies to evaluate our modeling system. The first strategy is to verify the reconstructed model dimensions by embedding the model in the original LiDAR data. As the LiDAR data is physically acquired from the real world with height accuracy up to centimeters, it represents the real structures very well. We have used this strategy to evaluate the reconstruction accuracy for every primitive proposed in the system, and the evaluation shows that our model is faithful to the original LiDAR in both surfaces and edges (Figures 4.15-4.16).

The second strategy is to use imagery geo-referencing to verify the accuracy of the geometry model. We evaluated our system using several imagery sources, including ground images (Figure 4.1), aerial photographs (Chapter 5), and videos (Chapter 8). By projecting those images onto the geometry models, we found that our model edges are accurate in general, but some details are missing due to the relatively low sampling rate of the LiDAR data.

4.8 Conclusion
This chapter presents a primitive based modeling system. Our system is flexible and accurate, and models a
large-scale environment with minor user interactions. Compared with the original LiDAR data, the refined
model significantly reduces the number of points and triangle meshes, dramatically improves the visual
effects, and at the same time maintains the accuracy in both surfaces and edges as the original data.
However, the modeling result lacks color information and edge details due to the limitation of a single data
source.
Figure 4.14 From left to right, top to bottom: (a) the original data of the Natural History Museum of LA; (b) the refined model, which includes all the elements discussed above; (c) and (d) all the models of USC; (e) part of the original data of the USC campus; (f) the refined model from the same viewpoint; (g) an image rendered from the refined model using projective texture mapping, showing the high quality of the refined models; (h) an image rendered from the same viewpoint but with a ground plane added to the refined models.
Figure 4.15 Modeling result of part of Washington DC with complex shapes. (a) The original LiDAR data; (b) the refined model, produced in 10 minutes; (c) the refined model overlaid on the original model; note that the surfaces and edges match very well.
Figure 4.16 A close-up view of a complex building with sloped roofs and curved surfaces in Washington DC. (a) The original LiDAR data; (b) the refined model, produced in a few minutes; (c) the refined model overlaid on the original model; note that the surfaces and edges match very well.
Chapter 5
Integrating Aerial Image
5.1 Introduction
The LiDAR based modeling technique generates highly accurate surfaces, however, the model lacks color
information and the edges are not accurate due to the relatively low sampling rate of the LiDAR data. On
the other hand, noisy LiDAR data makes the shape modeling very difficult even for an interactive system
with user interactions. Aerial images provide rich color information and accurate edge information. Fusing
aerial images into LiDAR can generate more accurate models with edge details, and color information from
aerial images will help LiDAR segmentation and shape modeling.
This chapter addresses the issue of integrating an aerial image into LiDAR data for rapid creation of
accurate building models. As shown in Figure 5.1, the basic ideas of this work are: features extracted from
high-resolution image can improve the accuracy of
low-resolution LiDAR model features; cues from
LiDAR can aid in image segmentation, significantly
reducing the computational complexity and processing
time as well as improving the quality of model results;
and colors from aerial images can be used as textures to improve visual quality. Figure 5.2 (a) shows a high-resolution aerial image, (b) shows the corresponding LiDAR data, and (c) shows the hybrid modeling result, where the façade textures are captured using ground images and will be discussed in Chapter 7.

Figure 5.1 Algorithmic structure and work flow of integrating an aerial image: edge information from the aerial image and surface information from the LiDAR range image feed the model reconstruction (hole-filling, tessellation), model classification (segmentation, detection), and model refinement and optimization (building primitives, primitive selection, model fitting) stages, producing the refined buildings.

The rest of the chapter is organized as follows: Section 5.2 presents an automatic approach to model simple shapes from an aerial image and LiDAR; Section 5.3 presents an interactive hybrid modeling system that can model complex shapes; and Section 5.4 concludes the chapter.

Figure 5.2 From top to bottom: (a) an input aerial image; (b) input LiDAR data; (c) the hybrid modeling result with user interactions in five minutes (façade textures are from ground images).
5.2 An Automatic Approach

This section demonstrates an automatic approach to extract outlines for simple shapes from an orthographic aerial image, and to integrate them into the LiDAR modeling. The technique is tested on a dataset of the Purdue campus.
5.2.1 Image registration
As aforementioned, the modeling result can benefit from the integrated information of multiple sensor sources. These different data sources must first be spatially registered. The information extracted from one data source to help the analysis of another data source can be roughly classified into two types: knowledge-level information and pixel-level information. Different levels of information place different requirements on the accuracy of registration. Knowledge-level information consists of higher-level cues, e.g. building region information, or color and texture information of imagery, which require only low registration accuracy. Huertas et al. [Hue00] extracted shape cues from IFSAR images, and then used these cues to guide the analysis of panchromatic (PAN) images. Their results showed that combining different data sources at the knowledge level can significantly reduce the computational complexity of the fusion algorithm. Pixel-level information is lower-level information, such as edges, footprints, and corners. To use pixel-level information, the registration error should be less than the minimum resolution of the data sources. For example, if the resolution of the LiDAR data is 1 meter, and that of the aerial image is 0.5 meter, then the registration error should be less than two pixels. Otherwise it is
meaningless to use information such as edges from the images to improve the accuracy of the LiDAR models.

Figure 5.3 shows a LiDAR range image of the Purdue campus and the associated high-resolution aerial image. The resolution of the aerial image is around 4 times higher than that of the range image. Since the aerial image is orthographic, we manually selected 12 pairs of point correspondences to find the affine registration transform between the range and aerial images.

Figure 5.3 (a) Low quality range image from LIDAR data; (b) high resolution aerial image.
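The affine registration between the range image and the orthographic aerial image can be estimated from such manually picked correspondences by linear least squares. The sketch below is a generic 2D affine estimation written under that assumption; the sample correspondences and names are purely illustrative.

```python
import numpy as np

def estimate_affine(src, dst):
    """Least-squares 2D affine transform mapping src (N x 2) onto dst (N x 2).
    Returns a 2 x 3 matrix M such that dst ~ M @ [x, y, 1]."""
    src = np.asarray(src, float)
    dst = np.asarray(dst, float)
    A = np.hstack([src, np.ones((len(src), 1))])      # N x 3 homogeneous coords
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)       # 3 x 2 solution
    return M.T                                        # 2 x 3 affine matrix

def apply_affine(M, pts):
    pts = np.asarray(pts, float)
    return (M @ np.hstack([pts, np.ones((len(pts), 1))]).T).T

# Hypothetical range-image / aerial-image correspondences (4x scale plus offset).
range_pts = [[10, 10], [10, 50], [60, 10], [60, 50]]
aerial_pts = [[45, 38], [45, 198], [245, 38], [245, 198]]
M = estimate_affine(range_pts, aerial_pts)
print(np.round(apply_affine(M, [[35, 30]]), 2))       # maps into the aerial image
```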
Our goal in using the aerial image is twofold: employing texture and color information from the imagery to automate the process of extracting buildings from LiDAR, and using edges extracted from the high-resolution aerial image to refine the accuracy of the LiDAR models.

5.2.2 Image based classification
A first step is to segment buildings from the background, i.e., classify the data into two sets: a building set and a terrain set. The range image from LiDAR contains a depth for each point, so it is natural to use the depth for data classification. A simple depth filter is used: objects below a certain height threshold are classified as terrain; otherwise, they must belong either to buildings or to vegetation. Figure 5.4 (a) shows the result of using this simple depth filter to classify the range image of Figure 5.3. The dark parts denote terrain, while the white parts are either buildings or vegetation. However, it is difficult to further separate vegetation from buildings using only the height information. Hans [Han99] presents a technique using height texture to tackle this problem; however, it is difficult to apply in general cases, especially when the vegetation is close to buildings. Our solution to this problem is to use the color information provided by the aerial image. We first map the white points (buildings or vegetation in Figure 5.4) in the range image to the aerial image, then classify those points into buildings or vegetation based on the color information. Currently we use an RGB color classification, i.e., if a point's color is green, it is classified
into the vegetation set; otherwise it belongs to the building set. The refined result is shown in Figure 5.4 (b), in which the vegetation areas have been correctly removed.

Figure 5.4 (a) Data classification result based on pure depth information, and (b) refined result using color information from the aerial image.
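A minimal sketch of this classification step, assuming the aerial image has been registered and resampled to the range-image grid and using a simple "green dominance" test for vegetation; the threshold and the greenness rule are illustrative assumptions rather than the exact rule used here.

```python
import numpy as np

def classify_points(range_img, aerial_rgb, height_thresh=3.0):
    """Label each range-image cell as 'terrain', 'vegetation' or 'building'.
    range_img: H x W heights; aerial_rgb: H x W x 3 registered color image."""
    labels = np.full(range_img.shape, 'terrain', dtype=object)
    high = range_img >= height_thresh                 # buildings or vegetation
    r = aerial_rgb[..., 0].astype(float)
    g = aerial_rgb[..., 1].astype(float)
    b = aerial_rgb[..., 2].astype(float)
    green = (g > r) & (g > b)                         # simple greenness test
    labels[high & green] = 'vegetation'
    labels[high & ~green] = 'building'
    return labels

# Toy example: one tall green cell (tree) and one tall gray cell (roof).
heights = np.array([[0.5, 8.0], [0.4, 9.0]])
colors = np.zeros((2, 2, 3), dtype=np.uint8)
colors[0, 1] = (40, 180, 60)      # green pixel  -> vegetation
colors[1, 1] = (120, 120, 125)    # gray pixel   -> building
print(classify_points(heights, colors))
```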
5.2.3 Edge extraction from the aerial image

Next, we need to extract edges from the high-resolution aerial image. The Canny algorithm is used to extract building edges from the aerial image; the resulting edge map is shown in Figure 5.5 (a). As we can see, there are too many edge points in the map, making it hard to find the correct building edges. We use several shape cues extracted from the range image to refine the result. The building region information from the segmented range image (Figure 5.4 (b)) is used as a cue to filter the edge map so that only the edges near the building regions are kept. Figure 5.5 (b) shows the refined edges near a cube-shaped building.

Still, the number of edges is large, which makes the number of matching hypotheses large. To further refine this, the building boundary information obtained from the range image is used to filter the edge map so that only the edges near the building boundary are kept (Figure 5.5 (c)). Using the cues from the range image, we dramatically reduce the number of edges, and hence the number of hypotheses for building extraction.

Figure 5.5 (a) Detected edges from the aerial image, (b) refined edges near one building, and (c) refined edges near the building boundary.
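The edge filtering described here can be sketched with OpenCV: detect Canny edges in the aerial image, then keep only the edges inside a dilated building mask derived from the segmented range image. The mask construction, thresholds, and dilation radius are illustrative assumptions, not those of the actual system.

```python
import cv2
import numpy as np

def building_edges(aerial_gray, building_mask, band=7):
    """Canny edges restricted to a band around segmented building regions.
    aerial_gray: uint8 grayscale aerial image; building_mask: uint8 0/255 mask
    from the classified range image, already registered to the aerial image."""
    edges = cv2.Canny(aerial_gray, 50, 150)
    # Dilate the mask so edges just outside the rough building region survive.
    kernel = np.ones((band, band), np.uint8)
    region = cv2.dilate(building_mask, kernel)
    return cv2.bitwise_and(edges, edges, mask=region)

# Toy example: a bright 40x40 block (building) on a dark background.
img = np.zeros((100, 100), np.uint8)
img[30:70, 30:70] = 200
mask = np.zeros_like(img)
mask[28:72, 28:72] = 255
print(int(np.count_nonzero(building_edges(img, mask))))   # edge pixels kept
```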
5.2.4 Hypothesis and modeling
To extract a building shape, we need to group the detected edges. The general feature grouping problem is a classic, but still open, question in computer vision research. However, we can reduce the difficulty by focusing on a set of building shapes, such as rectangles. A hypothesis-verification strategy is used to group the building edges. We first build several hypotheses for the building edges; specifically, for a rectangular shape, each two pairs of parallel edges form a hypothesis. We then verify the hypotheses using the information provided by the range image.

Figure 5.6 (a) Edges after applying a linking filter; (b) edges after the slope filter; and (c) combined edges as hypothesis candidates.
The number of possible hypotheses is quadratic in the number of edges. In order to further reduce the number of hypotheses, we use several filters to reduce the number of edges. First, we link the edges extracted from the aerial image in an anti-clockwise manner, and edges with lengths less than a threshold are removed (Figure 5.6(a)). Second, since most of the buildings are parallel to each other, there exists a global building direction; we use a slope filter to further reduce the number of edges (Figure 5.6(b)). In the last step, edges are combined according to geometric proximity. We then estimate the size of the building boundary based on the range image, and use it as another length filter to remove short combined edges. The final result is shown in Figure 5.6(c), where the red edges are the edges before combination, and the white edges are the resulting candidate edges for hypothesis formation.
In the hypothesis formation step, each two pairs of parallel edges are selected to form one rectangular hypothesis. In Figure 5.6(c), for example, there are 5 edges, which can form 3 rectangle hypotheses (Figure 5.7(a)). To verify the hypotheses, each rectangle is mapped back to the range image (Figure 5.7(b)). The one with the maximum overlapping area with the range image is selected as the building boundary. At this point, we have extracted the building outline from the aerial image for building modeling. We then extract the cube's surface information, its height, from the LiDAR data as described in Chapter 4. The two sources of information are then combined to obtain the complete model parameters. The final modeling result is shown in Figure 5.7 (c).
Figure 5.7 (a) Rectangle hypotheses; (b) rectangle mapped back to the range image; (c) modeling result.
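The verification step amounts to scoring each candidate rectangle by its overlap with the building region of the range image. The sketch below is a minimal illustration under the assumption that a binary building mask in range-image coordinates is available; the mask name and the candidate format are hypothetical.

```python
import numpy as np

def best_rectangle(hypotheses, building_mask):
    """Select the rectangle hypothesis with the maximum overlapping area
    against the segmented building region of the range image.
    hypotheses: list of (x0, y0, x1, y1) in range-image pixel coordinates.
    building_mask: 2D boolean array, True inside the building region."""
    best, best_overlap = None, -1
    for (x0, y0, x1, y1) in hypotheses:
        x0, x1 = sorted((int(x0), int(x1)))
        y0, y1 = sorted((int(y0), int(y1)))
        overlap = int(building_mask[y0:y1, x0:x1].sum())  # overlapping pixel count
        if overlap > best_overlap:
            best, best_overlap = (x0, y0, x1, y1), overlap
    return best, best_overlap

if __name__ == "__main__":
    mask = np.zeros((100, 100), dtype=bool)
    mask[20:60, 30:80] = True
    candidates = [(28, 18, 82, 62), (10, 10, 40, 40), (35, 25, 75, 55)]
    print(best_rectangle(candidates, mask))
```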
5.2.5 Discussion and Results
Integrating information from the LiDAR data and the aerial image for building modeling has several advantages. First, it improves the model accuracy. For example, the resolution of the aerial image we used is around 4 times that of the range image, and the edges extracted from the aerial image are much more accurate than those detected from the LiDAR data. Second, it helps to automate the modeling process. For example, in our previous system the user was required to select two seed points (two mouse clicks) to model a cube-shaped building, which is automated by the above approach. Finally, it reduces the computational complexity. For example, by using the shape cues from the range image, we dramatically reduce the number of hypotheses (by more than 90 percent), and hence significantly improve the overall performance.

However, there are several issues we should pay attention to when combining information from multiple sensors. First, the different data sources should be spatially registered. For knowledge-level information, such as cues about building regions, the registration precision requirement is not that strict. For pixel-level information, such as edges and corners, pixel-level registration precision is required. Our work uses both knowledge-level and pixel-level information for modeling, so registration precision is critical. Second, images contain more information than LiDAR, such as textures and colors, but this richer information also makes the analysis more complicated. One problem is that there are too many edges, which makes extracting the correct edge information difficult. Another problem is that grouping edges to form hypotheses for complex building shapes lacks a general solution. This is also the reason that most current automatic systems can only handle simple model shapes like rectangles.
The automated hybrid modeling system is tested on the Purdue campus model. Although the primitive-based modeling technique is flexible enough to model complex buildings, the automated method can only extract outlines for simple shapes like rectangles. Figure 5.8 shows the results of applying the system to model the LiDAR data of the Purdue campus. Due to the lack of actual measurements of the buildings, quantitative evaluation of modeling accuracy is not feasible. We use two methods for evaluation. The first method is embedding the refined model into the original LiDAR model (Figure 5.8(c)). The second method is using imagery geo-referencing to verify the accuracy of the models (Figure 5.8(d)). Both results confirm the accuracy of the proposed modeling system.

Figure 5.8 Complete models of the Purdue dataset. (a) Original LiDAR data, (b) refined models, (c) refined models embedded in the original models, and (d) refined models with aerial image projection for accuracy verification.

5.3 An Interactive Approach
As aforementioned, the main limitation of the automated method is that it can only model simple shapes. In this section, we present an interactive hybrid modeling system that can model complex shapes with curved surfaces. The modeling technique is tested on the dataset of the USC campus.

The hybrid system combines LiDAR and a single aerial image for the rapid creation of accurate building models, and enhances these models with ground view images to create fully textured models (details about integrating ground images are described in Chapter 7). We first interactively extract outlines (including curves) for complex building shapes from a high-resolution aerial image (Figure 5.2 (a)); then we automatically fit parameterized surface information with a primitive based method from the LiDAR data (Figure 5.2 (b)); finally we integrate ground view images with automated pose recovery to generate fully textured CAD models (Figure 5.2 (c)).

The hybrid method benefits from the merits of each dataset, and creates accurate, complex, photorealistic models with minor user interaction, thus making it suitable for large-scale urban modeling. The hybrid modeling method is superior to techniques based on a single dataset. Compared with results from LiDAR only, the hybrid model has accurate edge information and more building details. Compared with results from stereo aerial images, the hybrid model has detailed surface information (such as slopes and curved surfaces), and the height is more accurate thanks to the accuracy of the LiDAR data. Our system creates the complex building compound (Figure 5.2 (c)) in only a few minutes.

5.3.1 Registration with Non-Orthographic Image
We need to register the two datasets to fuse different information into an integrated framework. The aerial image of the USC campus (Figure 5.2 (a)) is non-orthographic, so we cannot directly use an affine transformation to register it to the LiDAR data. We opt to use a rectification step followed by an affine registration procedure. Since our goal in using the aerial image is to extract footprints for buildings, it is necessary to estimate the pose of the aerial image and rectify it to create an orthographic image. The rectified image can then be registered to the LiDAR using an affine transformation.

Because the aerial image (Figure 5.2 (a)) is downloaded from the Internet [Goo], we do not have access to the camera's internal parameters, so vanishing points [Cip99] are used to estimate the camera's internal parameters and orientation. The aerial image only offers the vanishing points of the x-y directions, while the vanishing point of the z direction is inferred by assuming that the principal point is at the camera's center [Cip99]. We automatically detect parallel lines (Figure 5.9 (a) shows part of the lines), and estimate the camera's focal length and orientation using the three vanishing points. Since the aerial image is taken far from the ground, the difference in the depth of buildings can be ignored. We project it onto a plane with the recovered camera pose to generate the rectified image (Figure 5.9 (b)).
Figure 5.9 (a) Line detection, (b) image rectification.
An affine model (Equation 5.1) is used to register the rectified aerial image to the 2D range image generated from the LiDAR data. Since the 2D range image is low-resolution, it is very difficult to select the exact corners, so we use the fitted corners of the refined model from LiDAR (Chapter 4). We manually selected 18 pairs of point correspondences to find the registration transform between the LiDAR and the aerial image. Although a full perspective transformation could be used to register the original aerial image to the 3D LiDAR data, experiments show that it fails for aerial images close to orthographic, which causes singular cases while using SVD to compute the pose.

\begin{bmatrix} I_{range,x} \\ I_{range,y} \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} I_{aerial,x} \\ I_{aerial,y} \end{bmatrix} + \begin{bmatrix} t_x \\ t_y \end{bmatrix}    (5.1)
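Given the manually selected correspondences, the six affine parameters of Equation 5.1 can be estimated by ordinary least squares. The sketch below is a minimal illustration, not the thesis implementation; the point arrays and the synthetic ground-truth transform in the demo are hypothetical.

```python
import numpy as np

def fit_affine_2d(aerial_pts, range_pts):
    """Least-squares fit of the 2D affine model of Equation 5.1:
    [x_r, y_r]^T = A [x_a, y_a]^T + t, from N >= 3 point correspondences."""
    aerial_pts = np.asarray(aerial_pts, float)
    range_pts = np.asarray(range_pts, float)
    n = len(aerial_pts)
    # Design matrix M so that M @ [a, b, c, d, tx, ty] stacks [x_r; y_r] per point.
    M = np.zeros((2 * n, 6))
    M[0::2, 0:2] = aerial_pts
    M[0::2, 4] = 1.0
    M[1::2, 2:4] = aerial_pts
    M[1::2, 5] = 1.0
    rhs = range_pts.reshape(-1)
    params, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    return params[:4].reshape(2, 2), params[4:]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    src = rng.uniform(0, 100, (18, 2))                # e.g. 18 correspondences
    A_true = np.array([[0.25, 0.02], [-0.01, 0.26]])  # made-up ground truth
    dst = src @ A_true.T + np.array([5.0, -3.0])
    A, t = fit_affine_2d(src, dst)
    print(np.round(A, 3), np.round(t, 3))
```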
5.3.2 Outline Extraction from Aerial Image
Similar to the Purdue dataset, we first try to automatically extract lines from the original aerial image to avoid noise introduced by the rectification process. Edges are detected using the Canny edge detector and lines are extracted using the Hough Transform as a preprocessing step. These lines are then filtered with a user-defined length threshold and a slope threshold around the global building orientation. We then cluster the lines into two groups, x and y directions, based on their slopes (Figure 5.9 (a)).

The automatic method works well for simple shapes (as mentioned in Section 5.2); however, complex buildings are more difficult to analyze. As shown in Figure 5.9(a), many building outlines are missing, long lines tend to break into short pieces, curves are very hard to extract, and shadows and textures introduce unnecessary lines. Although performance can be improved by using cues from range images, it often fails for complex shapes such as those addressed here.
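The preprocessing step just described (Canny edges, Hough line segments, length/slope filtering, and clustering into x and y groups) can be sketched with OpenCV as below. The thresholds and the assumed building direction are illustrative placeholders, not the values used in the thesis.

```python
import cv2
import numpy as np

def extract_xy_lines(gray, min_len=40, building_dir_deg=0.0, slope_tol_deg=15.0):
    """Detect line segments and split them into x- and y-direction groups."""
    edges = cv2.Canny(gray, 50, 150)
    segs = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=60,
                           minLineLength=min_len, maxLineGap=5)
    x_group, y_group = [], []
    if segs is None:
        return x_group, y_group
    for x1, y1, x2, y2 in segs[:, 0, :]:
        ang = (np.degrees(np.arctan2(y2 - y1, x2 - x1)) - building_dir_deg) % 180.0
        if min(ang, 180.0 - ang) <= slope_tol_deg:      # near the x direction
            x_group.append((x1, y1, x2, y2))
        elif abs(ang - 90.0) <= slope_tol_deg:          # near the y direction
            y_group.append((x1, y1, x2, y2))
    return x_group, y_group

if __name__ == "__main__":
    img = np.zeros((200, 200), np.uint8)
    cv2.line(img, (10, 50), (190, 50), 255, 2)
    cv2.line(img, (100, 10), (100, 190), 255, 2)
    xs, ys = extract_xy_lines(img)
    print(len(xs), len(ys))
```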
Due to the difficulty of the automated method, we instead use a primitive-based method with user interaction to extract every outline for buildings with curved surfaces. Similar to the LiDAR modeling system, a number of primitives are designed to represent the outlines of building shapes, and they are classified into linear and non-linear primitives. Linear primitives include planes, cubes, wedges, etc., while non-linear primitives include quadrics and higher-order curves.

Two principles are enforced in designing our algorithms for outline extraction: minimizing the amount of user interaction and improving accuracy. To minimize user interaction, we allow the user to click only a few points for each primitive rather than every vertex. For a cube primitive (Figure 5.10(a)), only two user clicks along the diagonal are necessary since the other two vertices can be inferred given the building's orientation. For a roof with four sloped surfaces (Figure 5.10(b)), only four rather than six input points are necessary. To improve accuracy, we automatically find the building corner close to the user click; if there are no detected corners within a distance threshold, the user click is used instead. For curves (Figure 5.10(c)), we allow the user to click a few control points (generally fewer than 10), then a quadric function is used to fit a curve to the input data points, and the curve is discretized with the user-defined resolution.

Figure 5.10 Outline extraction. Yellow dots: user clicks.

Our primitive-based outline extraction method has many advantages over the manual method. As shown in the top row of Figure 5.11, the extracted outlines keep the regularity of buildings, such as parallelism and orthogonality, which holds for man-made objects. On the other hand, the manual method (Figure 5.11, bottom row) tends to extract slanted outlines, and the results depend entirely on the user clicks, which causes co-linear lines to become non-co-linear and orthogonal lines to become non-orthogonal. For the complex building with 20 primitives, only 52 user clicks are needed, while the manual method needs 134 clicks (a 61% reduction). Our method also extracts curves and discretizes them according to a predefined resolution, while the manual method can only approximate them with polygons.

Figure 5.11 Outlines comparison.
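The two accuracy aids described above, snapping a user click to the nearest detected corner and fitting a quadric curve to a few control points, can each be written in a few lines. The sketch below is illustrative only; the distance threshold, the assumption that the curve is a function of x, and the demo data are all hypothetical.

```python
import numpy as np

def snap_to_corner(click, corners, max_dist=5.0):
    """Replace a user click by the nearest detected corner if one is close enough."""
    corners = np.asarray(corners, float)
    if len(corners) == 0:
        return np.asarray(click, float)
    d = np.linalg.norm(corners - np.asarray(click, float), axis=1)
    i = int(np.argmin(d))
    return corners[i] if d[i] <= max_dist else np.asarray(click, float)

def fit_quadric_curve(control_pts, resolution=20):
    """Fit y = a*x^2 + b*x + c to a few clicked control points and sample it
    at the requested resolution."""
    pts = np.asarray(control_pts, float)
    coeffs = np.polyfit(pts[:, 0], pts[:, 1], 2)
    xs = np.linspace(pts[:, 0].min(), pts[:, 0].max(), resolution)
    return np.column_stack([xs, np.polyval(coeffs, xs)])

if __name__ == "__main__":
    print(snap_to_corner((10.2, 7.8), [(10, 8), (40, 8)], max_dist=3))
    print(fit_quadric_curve([(0, 0), (2, 3), (5, 6), (8, 3), (10, 0)], resolution=5))
```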
5.3.3 Surface Extraction from LiDAR Revisited
As described in Chapter 4, we present a primitive based fitting technique to extract surfaces from the reconstructed LiDAR models. According to the shapes of building rooftops (flat roof, slope roof, dome roof, etc.), a set of geometric primitives are defined: linear fitting primitives (plane, cube, wedge, etc.) and a nonlinear fitting primitive (superquadrics). Here we show some results omitted in Chapter 4.

For linear primitives, we automatically segment the surface points from the LiDAR model based on a height filter and connectivity properties. Figure 5.12 (a) shows the segmented surfaces of a tower shape, where different colors encode points of different surfaces. The surfaces are fit using the points of each face with a linear least squares technique to obtain the optimal parameters (Figure 5.12 (b)). A region growing technique is used to segment surface points for curved rooftops (Figure 5.13 left, purple points), then the Levenberg-Marquardt (LM) algorithm is used to fit the non-linear parameters of the superquadrics (Figure 5.13 right). Finally, the fitted local models are assembled into a complete building model.
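For the linear case, fitting a roof face by least squares amounts to solving a small linear system on the segmented points. The sketch below is a minimal illustration of such a plane fit, with made-up demo data; it is not the thesis code.

```python
import numpy as np

def fit_plane_z(points):
    """Least-squares fit of z = a*x + b*y + c to segmented rooftop points.
    For a flat roof, a and b come out near zero and c is the roof height."""
    pts = np.asarray(points, float)
    A = np.column_stack([pts[:, 0], pts[:, 1], np.ones(len(pts))])
    (a, b, c), *_ = np.linalg.lstsq(A, pts[:, 2], rcond=None)
    return a, b, c

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    xy = rng.uniform(0, 30, (500, 2))
    z = 0.05 * xy[:, 0] + 12.0 + rng.normal(0, 0.05, 500)   # gently sloped roof
    print(np.round(fit_plane_z(np.column_stack([xy, z])), 3))
```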
5.3.4 Modeling and Processing
We fuse the interactively extracted outlines from the aerial image and the automatically extracted surfaces from the LiDAR model to reconstruct the complete model.

For each primitive, we map the outlines to the LiDAR model with the pre-computed affine transformation matrix to find the surface information. The surface for a flat cube is just one height, which is fitted from the LiDAR surface points. We assign the same height to each vertex on top of the cube, use the predefined ground height for the bottom vertices, and then triangulate the vertices to create the model. The same method is used for other flat-roof primitives, such as polyhedrons and flat cylinders. For a roof with slopes, we find the corresponding height for each roof vertex using the parameterized slope and the mapping function, then triangulate them to generate the model.

Figure 5.12 Linear primitive fitting.
Figure 5.13 Non-linear primitive fitting.

The situation is more complicated for a curved roof surface with curved outlines. Since the surface is fitted from LiDAR data without considering the constraints of the outlines from the aerial image, the fitted surface will generally not pass through the mapped outlines. We use a non-linear optimization technique considering both surface and outline constraints to solve this problem. A weighted energy function is defined as in Equation 5.2, where f is the non-linear surface function, Error is the energy function defined as the distance of each point to the fitted surface, P_i is a surface point, P'_i is an outline point, and w is the weight for the outlines. The z values of the outline points are found by mapping the outlines from the aerial image to the LiDAR data. We set w to a high value to penalize surfaces that deviate from the extracted outlines. Again, the LM algorithm is used to find the best solution, and we tessellate the surface to create the model.

E(f) = \sum_{P_i \in Surface} Error(f(P_i(x, y, z))) + w \sum_{P'_i \in Outline} Error(f(P'_i(x, y, z)))    (5.2)
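A weighted fit of this kind can be expressed as a single residual vector handed to a Levenberg-Marquardt solver. The sketch below is a minimal illustration of Equation 5.2 under the assumption that the curved surface is a simple quadric height field z = f(x, y); the surface parameterization, weight value, and demo data are hypothetical, not the thesis implementation.

```python
import numpy as np
from scipy.optimize import least_squares

def fit_constrained_surface(surface_pts, outline_pts, w=100.0):
    """Jointly minimize residuals on LiDAR surface points and (heavily
    weighted) mapped outline points with the LM algorithm."""
    S = np.asarray(surface_pts, float)
    O = np.asarray(outline_pts, float)

    def f(p, x, y):
        return (p[0] + p[1] * x + p[2] * y +
                p[3] * x * x + p[4] * x * y + p[5] * y * y)

    def residuals(p):
        r_surf = f(p, S[:, 0], S[:, 1]) - S[:, 2]
        r_out = f(p, O[:, 0], O[:, 1]) - O[:, 2]
        return np.concatenate([r_surf, np.sqrt(w) * r_out])

    res = least_squares(residuals, x0=np.zeros(6), method="lm")
    return res.x

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    xy = rng.uniform(-10, 10, (400, 2))
    z = 20.0 - 0.05 * (xy[:, 0] ** 2 + xy[:, 1] ** 2) + rng.normal(0, 0.1, 400)
    surface = np.column_stack([xy, z])
    outline = np.array([[10.0, 0.0, 15.0], [-10.0, 0.0, 15.0],
                        [0.0, 10.0, 15.0], [0.0, -10.0, 15.0]])
    print(np.round(fit_constrained_surface(surface, outline), 3))
```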
All the primitives are then assembled together to create the final model, and we map each vertex to the
aerial image to find the texture coordinate for texture mapping. Figure 5.14 shows the result rendered using
a VRML viewer with roof textures from the aerial image.
Model processing
To improve the quality, a series of processes are applied to the model, including normal correction, primitive intersection, and redundant polygon removal.

Normals are important for correct lighting during real-time rendering; an incorrect triangle vertex winding order will reverse the normal directions and hence cause wrong rendering results (Figure 5.16(a)). Although models with reversed normals can be rendered correctly by turning on two-sided lighting in OpenGL, this method dramatically decreases the performance.

In our system, we do not require the user to click the corners in a specific order while extracting outlines. This makes the system flexible and easy to use; however, the normals of the generated models may be reversed. We need an algorithm to generate correct normals for triangles with randomly ordered vertices in both clockwise and counter-clockwise orders.

Observing that a primitive is a closed object, we propose a region growing algorithm to compute the correct normals for each primitive. As illustrated in Figure 5.15, given a seed triangle ∆ABC, and supposing its vertex winding order is correct as in counter-clockwise order, we can generate the correct normals for the whole primitive.

Figure 5.14 Hybrid modeling result (cube, slope, and curved surface primitives).
Figure 5.15 Compute normal.
We first find the three neighbor triangles of ∆ABC, each of which shares an edge with triangle ∆ABC, and correct their vertex winding order based on ∆ABC. For example, the vertex winding order of ∆BCD should be CBD because ∆ABC visits vertex B first, then C. We then push the three neighbor triangles onto a stack, find their neighbors, and fix their vertex winding order. This growing process is continued until all triangles have been visited. Now the only problem is to find a seed triangle with a correct vertex winding order. This is accomplished by initializing the seed with a triangle at the bottom of the primitive, since its normal direction always points in the negative Z direction. The algorithm is summarized as the following pseudo code:

1. For each primitive, find a bottom triangle.
2. If the z component of its normal vector is positive, reverse the vertex winding order, and push it onto the stack.
3. Pop a node from the stack, and find its three neighbors.
4. Correct the vertex winding order of the three neighbors based on the popped node, and push them onto the stack.
5. Loop steps 3 and 4; return when all triangles have been visited.

The algorithm is guaranteed to fix the vertex winding order for all triangles since each primitive is closed and all triangles in a primitive are connected.
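The region growing step can be sketched directly from the pseudo code above. The code below is an illustrative Python version, assuming triangles are given as vertex-index triples of one closed primitive and that the caller has already oriented the seed (the first triangle) correctly; it is not the thesis implementation.

```python
from collections import defaultdict

def fix_winding(triangles):
    """Propagate a consistent vertex winding order from a seed triangle."""
    tris = [tuple(t) for t in triangles]
    edge_to_tris = defaultdict(list)              # undirected edge -> triangle ids
    for ti, (a, b, c) in enumerate(tris):
        for e in ((a, b), (b, c), (c, a)):
            edge_to_tris[frozenset(e)].append(ti)

    def consistent(t0, t1):
        # Consistently wound neighbors traverse their shared edge in opposite
        # directions; sharing a directed edge means t1 is flipped.
        edges0 = {(t0[i], t0[(i + 1) % 3]) for i in range(3)}
        edges1 = {(t1[i], t1[(i + 1) % 3]) for i in range(3)}
        return len(edges0 & edges1) == 0

    visited, stack = {0}, [0]
    while stack:
        ti = stack.pop()
        a, b, c = tris[ti]
        for e in ((a, b), (b, c), (c, a)):
            for tj in edge_to_tris[frozenset(e)]:
                if tj in visited:
                    continue
                if not consistent(tris[ti], tris[tj]):
                    x, y, z = tris[tj]
                    tris[tj] = (x, z, y)          # flip the neighbor's winding
                visited.add(tj)
                stack.append(tj)
    return tris

if __name__ == "__main__":
    # A tetrahedron with one face deliberately flipped relative to the others.
    print(fix_winding([(0, 1, 2), (0, 3, 1), (1, 2, 3), (0, 2, 3)]))
```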
Another process is the primitive intersection. Our system is a primitive based modeling system, where complex buildings are represented by a number of different primitives. These primitives may intersect; however, there are no vertices or lines at the intersection. We need to find the intersection, split the polygons, and insert vertices. There are several reasons for this process. First, vertices and lines can be used for 3D-to-2D correspondences to compute a camera's pose for façade texture mapping with ground view images (Chapter 7). Another reason is that traditional texture mapping techniques also need vertices to assign texture coordinates. On the other hand, the intersection polygons of two primitives are completely invisible, which makes them redundant. It is necessary to find these redundant polygons and remove them to improve the rendering performance.
Figure 5.16 Normal correction and polygon intersection.

Since general 3D object intersection is complex, we simplify the problem to 2D based on the characteristics of our models. Building roofs can be texture mapped with a single aerial image, so we only focus on the intersection of facade walls, which are all vertical in our models. We project all the walls onto the x-y plane and find the intersections of lines in 2D space. Figure 5.16 (c) shows that an intersection line is found and two vertices are inserted. This 2D method will not work for models hanging in the air; however, such cases are rare in our models.

After the primitive intersection process, we detect all the polygons completely inside a primitive and remove these redundant polygons, which generally reduces the number of triangle meshes by 20%. The model is then exported in VRML format (Figure 5.14).

5.3.5 Results and Comparison
As described in Chapter 4, we have used the primitive based modeling system to model a whole university campus and surrounding area with more than 100 buildings within two days. The original LiDAR data has more than 4 million triangle meshes, while the refined model has only 20 thousand triangles with much better visual quality.

An aerial image is first integrated into the original LiDAR modeling system for two purposes: generating top-view textures and verifying the model accuracy. We download the aerial image that covers the whole USC campus from the Internet [Goo], then integrate it into the refined model based on the LiDAR data with the techniques described in Section 5.3.1. Figure 5.18 shows two overview rendered images of the result of the USC campus with top-view textures. Two close-up views of several complex building compounds are shown in Figure 5.19, which demonstrates the accuracy of the LiDAR data modeling.
To demonstrate the effectiveness of the hybrid modeling technique described in Section 5.3, we have selected a complex building compound from the LiDAR data (Figure 5.2 (b)) and downloaded a high-resolution aerial image from the Internet (Figure 5.2 (a)) [Goo]. We then modeled the building compound within five minutes. Figure 5.20 (a) shows an overview of the building compound; one close-up view is also shown in Figure 5.20 (b). Further details about ground textures will be described in Chapter 7.

Our hybrid modeling system is fast, accurate, and the rendering result is realistic. Since the surface information is fitted from the LiDAR data points, it is faithful to the original data. Manual user verification shows that the error of the building heights is within several centimeters compared with the LiDAR data. The precision of the outlines depends on the aerial image, and is within one foot for the image we used.

The hybrid modeling technique is superior to techniques based on aerial images or LiDAR only. As shown in Figure 5.17, the models from stereo aerial images [Lee00] have only flat roofs (a), and the height precision depends on the resolution of the aerial image and the camera calibration precision, which generally cannot achieve centimeter accuracy. The models from LiDAR data (the result of Chapter 4) have slope roof information and accurate heights (b), but lack outline details. The hybrid modeling result (c) benefits from both datasets, with accurate surfaces and detailed outlines. We also found that fusing information from the two datasets helps reduce errors in the modeling process of each dataset. Buildings with slightly different heights are hard to distinguish in aerial images, but such information is precisely maintained in the LiDAR data. On the other hand, color information from the aerial image also helps the user to select the correct primitives in the LiDAR data fitting.

Compared with the façade system [Deb96], our system is global and generates visually similar results in a much shorter time; hence it is applicable to large-scale urban modeling. Compared with other commercial software, such as Maya, our system is much faster and more accurate. Although an artist can generate a model visually similar to our results, the manually generated model is often inaccurate. Furthermore, it is very time consuming to measure every aspect of the buildings and adjust every vertex to the correct position, especially for irregular shapes with curved surfaces.
5.4 Conclusion
This chapter presents a hybrid modeling system using two datasets, LiDAR data and an aerial image. An automated method is presented, which improves the edge accuracy and reduces the computational complexity, but works well only for simple shapes. We then present an interactive method that generates realistic, complex models for large-scale environments with accurate surfaces and detailed outlines.
Figure 5.17 (a) Stereo aerial images result; (b) LiDAR data result; (c) hybrid modeling result.
Figure 5.18 Two overviews of the modeling results of USC campus.
Figure 5.19 Two close-up views of the modeling results of USC campus.
Figure 5.20 Hybrid modeling results with ground view images.
Chapter 6
Integrating Ground Images - Part I: Vanishing Hull
Chapter 5 shows some results of façade textures, which are generated from ground images. To extract
textures from ground images, we need to estimate the camera’s pose, which is achieved using vanishing
point techniques. Before going into the details of pose estimation, this chapter presents some theory, called the
Vanishing Hull, to estimate and analyze vanishing points. The concept is then used to estimate vanishing
points and compute the camera’s poses for texture mapping in Chapter 7.
6.1 Introduction
A vanishing point is defined as the intersection point of a group of image lines that correspond to the
projection of parallel lines in 3D with an ideal pin-hole camera model. The position of a vanishing point in
the image plane is only determined by the camera center and the orientation of the 3D lines in the camera
system. Vanishing points are valuable in many vision tasks. A traditional application of vanishing points is
building detection in aerial images [Shu99]. Vanishing points can be used to group image lines, which are
then used to form hypotheses of building edges. Cameras can be calibrated using three orthogonal
vanishing points [Cap90, Cip99], and pose can be recovered [Ant00, Tel03]. Other applications of
vanishing points include robot navigation [Sch93] and 3D reconstruction from a single image [Cri00,
Gui00, Lie99]. The goal of this chapter is to present a new consistent framework for vanishing points
detection, and stability and accuracy analysis.
6.1.1 Motivation
Much research has been conducted on accurately identifying the position of vanishing points. This varies from simple line grouping [Heu98] to more complicated methods using statistical models [Col90]. Most previous research work focuses on finding the group of lines corresponding to valid vanishing points, or the vanishing points detection problem, and the performance is often evaluated empirically [Shu99]. However, there is very little research focused on finding a theory to quantitatively analyze the stability and accuracy of vanishing points estimation. This is the main motivation of this work.

Different from most previous work [Alm03, Bri91, Bos03, Bur00], we attack this problem from a geometric viewpoint. Observing that the region spanned by lines sampled from an edge error model is a fan region (Figure 6.1, Section 6.2.1), we intersect all the fan regions to form a convex polygon, called the Vanishing Hull. This chapter shows that the vanishing hull has some interesting properties and that it can lead us to an optimal estimation of vanishing points. In more detail, the vanishing hull gives the region of the true vanishing point, and its distribution gives the probability of the vanishing point. The expectation of the vanishing hull, the centroid for a uniform distribution, gives the optimal solution of the vanishing point under statistical meaning, its variance defines the accuracy of the vanishing point, and its shape determines the stability of the vanishing point. Hence, the vanishing hull concept provides a theoretical framework to quantitatively analyze the region, optimal solution, stability and accuracy of vanishing points. Besides a framework for analyzing vanishing points with image noise, we also present a novel geometric edge grouping method based on edge error models.

Before continuing, we would like to point out that we are not the first to explore the idea of estimating the probability distribution of geometric entities in computer vision. Similar work has been done by Wolfgang Forstner and Stephan Heuel [Heu01a, b]. Forstner and Heuel combine geometry and statistical uncertainty [Cla98, Kan92, Tor02] for line grouping [Heu01b] and 3D reconstruction [Heu01a, Sei99]. While their work converts the geometric problems of joins and unions into algebra problems using double algebra [Bar85, Car94], we focus more on the geometric entities, such as the shape and centroid of the vanishing hull. More significantly, we avoid the use of covariance propagation, which becomes cumbersome when the number of lines is large, a typical situation for vanishing points estimation. Forstner and Heuel's method is suitable for 3D reconstruction with pairs of lines, while our method is more suitable for vanishing points estimation with a large number of lines.¹
6.1.2 Related work
There are two key problems in identifying vanishing points, finding the group of image lines that
correspond to a true vanishing point and computing the position of the vanishing point with the presence of
image noises. According to the methodology used in the two steps, we classify different methods into two
classes, clustering methods and voting methods.
Clustering methods first find possible clusters using the intersection of all pairs of lines [Lie98] or
image gradient orientations [Mcl95]. Then a criterion of distance or angle is used to assign each line to
different clusters. The drawbacks of clustering methods are the high computational complexity and that a
hard threshold is needed to group lines into clusters. Liebowitz and Zisserman [Lie98] first group the image
lines, then use a histogram to find the dominant vanishing points. A Maximum-Likelihood estimator is used to compute the position of the vanishing point. McLean and Kotturi [Mcl95] integrate edge detection and line clustering into the process of vanishing points detection, and then use a non-linear method to compute
the position of vanishing points with a statistical edge error model.
Voting methods can be classified into image space methods and Gaussian sphere methods according to
the space they use for voting. Rother [Rot00] accumulates the votes of image lines in the image space, then
searches for the vanishing points using knowledge such as orthogonality and camera criteria. However, this method is computationally expensive.
A popular method is using the Hough Transform in Gaussian sphere space [Bar83], which is a global
feature extraction method [Cap90]. Many improvements have been made to address the shortcomings of
Hough-based approaches[Col90, Lut94]. Shufelt [Shu99] uses the knowledge of primitive models to reduce
spurious maxima, and an edge error model to improve the robustness to image noises. The drawback of
voting in Gaussian sphere is that the accuracy is limited to the discretization of the accumulator space,
hence it is hard to achieve the precision that the image can provide. Antone and Teller [Ant00, Tel03,
Bos03] use a hybrid method of Hough Transform and least square to detect and compute vanishing points.
¹ Our experiments show that the larger the number of lines, the more accurate the vanishing point estimation.
G. Schindler and F. Dellaert [Sch04] use an EM method to compute vanishing points in a Manhattan world
[Cou03].
Recently, Almansa et al. [Alm03] propose a system using vanishing regions of equal probability to detect vanishing points without any a priori information. A vanishing region is different from a vanishing hull: the former is an accumulation space for vanishing points detection, while the latter is a tool to analyze the stability and accuracy of vanishing points.

Various edge error models have been presented. McLean and Kotturi [Mcl95] use a statistical model to represent both the error of the line centroid and the error of its orientation. Other models using both geometry and statistics can be found in [Heu01]. Shufelt [Shu99] presents a simple but effective edge error model. Inspired by the idea of a fan edge region, we derive the concept of the vanishing hull from the intersection of all these regions. We first adapt this simple edge error model to our vanishing hull framework, and then extend the concept to general edge error models (Section 6.5).

The rest of the chapter is organized as follows. We first introduce the vanishing hull concept and its properties based on a simple end-points edge error model (Section 6.2), then we present some novel methods for vanishing points detection based on the edge error model (Section 6.3). Section 6.4 presents an algorithm to determine the vanishing hull and analyze vanishing points estimation, and we extend the vanishing hull concept to general edge error models in Section 6.5. The performance of our method is extensively analyzed with both simulation and real data, and quantitatively compared with one state-of-the-art technique [Lie98] (Section 6.6). Finally, we conclude the chapter in Section 6.7.

6.2 Vanishing Hull
Since the idea of the vanishing hull is derived from the intersection of edge regions, we first present a simple edge error model, and then introduce the definition and properties of the vanishing hull.
6.2.1 Edge error model
Figure 6.2. The vanishing hull (the intersection of the fan regions).
Consider the representation of a line segment using two endpoints, and assume that the two end points have one-pixel precision; then two fan regions with a rectangular region in the middle can be formed by moving the two end points freely in the pixel squares (Figure 6.1). This region is a concave region, so we cannot guarantee that the intersection of such regions will be convex. Fortunately, a true vanishing point cannot lie on the image line segment (Section 6.3.1), so the rectangular region has no effect on the shape of the intersection of edge regions. We simply take the middle point of the edge, and form two fan regions with the two end points. Furthermore, a true vanishing point can only lie in one direction of the edge, so we just take one of the fan regions, which is a convex region (Figure 6.1).

Figure 6.1. Edge error model.

6.2.2 Vanishing hull definition and properties
The Vanishing Hull is defined as the intersection of the fan-shape edge regions under a given edge error model (Figure 6.2). Figure 6.11 shows a vanishing hull of a real image. We first present the properties of the vanishing hull; the computation and algorithm details are presented in Section 6.4.

Property I. A vanishing hull is not empty. Proof: according to the grouping method that will be presented shortly in Section 6.3.1, an edge is assigned to a cluster only if its edge region covers the intersection point of the cluster, so the vanishing hull contains at least the intersection point.

Property II. A vanishing hull is convex. Proof: the intersection of convex regions (edge regions are fan-shape regions, hence convex) is convex.

Property III. A true vanishing point lies inside the region of its vanishing hull under the assumption that the edge error model is correct (the true edge lies inside the fan region). Proof: by definition, the true vanishing point must lie inside the union of all the edge regions. Now assume the vanishing point VP lies
outside of the vanishing hull but inside the union of the edge regions. Then there must exist some edge, say L, whose edge region does not cover VP. Hence the edge error model of edge L is wrong, which is contradictory with our assumption. Hence, VP must be inside the vanishing hull. This property is important: it tells us where to find the true vanishing point.

Property IV. The centroid of a vanishing hull is the optimal estimation of the vanishing point under a uniform distribution model. Proof: the optimal estimation of the vanishing point is the expectation of the probability distribution of the vanishing points inside the vanishing hull under statistical meaning. With a uniform distribution, the expectation of a vanishing hull is its centroid.

Property V. The variance of a vanishing hull determines the accuracy of the estimated vanishing point. Proof: this follows directly from probability theory.

Property VI. The shape of a vanishing hull determines the stability of the estimation of the vanishing point.

A vanishing hull can be open, or it can be a closed non-trivial convex polygon, a line segment or a point (Figure 6.3). When the image lines are parallel, the vanishing hull is an open convex hull (Figure 6.3 (a)); the centroid is undetermined, which means the estimation of the vanishing point is unstable. This is reasonable because edges have noise, and any non-zero noise will be enlarged to infinity when the vanishing point is at infinity, which makes the estimation unreliable. An open vanishing hull indicates a vanishing point at infinity, which corresponds to a point on the great circle parallel to the image plane in the Gaussian sphere. We handle the open vanishing hull case by setting the vanishing point to infinity. When the vanishing hull is a closed non-trivial convex polygon (Figure 6.3 (b)), the vanishing point can be estimated using the centroid, with the variance of the distribution as the estimation accuracy. When the vanishing hull shrinks to a line segment (Figure 6.3 (c)), the uncertainty is along just one direction, and the vanishing point can be precisely computed when the vanishing hull degenerates to a point (Figure 6.3 (d)), which corresponds to an error-free edge model.

Figure 6.3. The shape of a vanishing hull. From left to right: open, closed, a line, and a point.
6.3 Vanishing Points Detection
Since we are interested in finding the intersection regions of all the edges, it is necessary to identify the image lines that can form a possible vanishing point, i.e., we need to group lines into different clusters to detect vanishing points. We describe the grouping process in both image space and the Gaussian sphere.

Figure 6.4. Grouping with an edge error model.

6.3.1 Image space

Finding clusters
The Canny edge detector is used to find sub-pixel edges, then the Hough Transform is used to find possible lines, and nearby lines are connected using a user-defined threshold. The lines are represented using two end points.

The intersections of all pairs of line segments are computed to find all possible clusters. The computational complexity is O(n^2), where n is the number of lines. Grouping lines into different clusters takes O(n) time, so the overall time complexity is O(n^3), which is expensive for a large number of lines. We will reduce the complexity using a filtering step and the RANSAC algorithm later.

Grouping
After finding the clusters, we need a criterion to assign lines to different clusters. A distance criterion gives priority to close vanishing points, while an angle criterion gives priority to far vanishing points. A reasonable threshold is to use a tuple of both distance and angle, or to use a normalized angle error [Mcl95]. However, all these methods need a hard threshold, which may be inconsistent with the edge error model. We use a geometric grouping method that is consistent with the edge error model without any hard thresholds. For each cluster of two lines, we can find the intersection region A of the edges, and a test edge is assigned to this cluster when its edge region overlaps with region A. Furthermore, we use a stronger constraint for clustering (Figure 6.4): an edge is assigned to a cluster only if its edge region covers the intersection point of the cluster. This guarantees that the intersection region of the edge regions in each cluster is not empty (Property I in Section 6.2.2). The normalized length of each edge is accumulated in its assigned cluster, and the maximum clusters are chosen to compute potential vanishing points. A sketch of this covering test is given after this subsection.

Filtering spurious vanishing points
Most of our testing images are outdoor building images with heavy occlusion by trees (Figure 6.9 (a)), which causes many spurious vanishing points. Knowledge of the image and the vanishing points is used to filter spurious vanishing points. First we roughly classify the extracted lines into x and y groups² according to the line orientation to reduce the number of lines. Then vanishing points are filtered using the following three filters.

1) Iterative line length. According to the edge error model, longer lines are more reliable; however, we would also like to keep shorter lines. So we first filter the lines using a large length threshold and estimate the possible vanishing points, and these points are then used to find more short line supporters according to the grouping method.

2) Covering area. Another observation is that edges of trees only cover a small part of the image region, so the ratio of the covering area against the image area is also used to filter spurious vanishing points.

3) Valid vanishing point. Vanishing points are the intersection of image lines that correspond to parallel lines in 3D. So by definition, a valid vanishing point will not lie on the image segment in the image space. This filter is very effective in reducing spurious clusters.

RANSAC
Even though we classify lines into two groups to reduce the number of lines, and use filters to reject spurious clusters, the number of clusters may still be large. Since we are interested in finding vanishing points that correspond to dominant directions, the RANSAC algorithm is used to find the maximum cluster of lines. Generally there exist three orthogonal vanishing points for images of outdoor buildings. We first find the dominant vanishing points for the x and y directions. The vanishing point of the z direction is estimated using the orthogonality property of the three directions, and its supporting lines are found using our grouping method to refine the position using the vanishing hull.

² Each group may contain more than one vanishing point.
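The covering test used by the grouping step above, whether the fan region of an edge covers a candidate vanishing point, can be sketched as a small geometric predicate. The code below is illustrative and approximates the one-pixel endpoint square by a disc of radius 0.5; the edge and candidate formats are assumptions, not the thesis implementation.

```python
import numpy as np

def fan_covers_point(edge, candidate_vp, endpoint_radius=0.5):
    """Does the fan region of an edge (midpoint fixed, far endpoint uncertain
    within a disc) cover a candidate vanishing point?"""
    p1, p2 = np.asarray(edge[0], float), np.asarray(edge[1], float)
    vp = np.asarray(candidate_vp, float)
    mid = 0.5 * (p1 + p2)

    d = vp - mid
    if np.linalg.norm(d) < 1e-9:
        return False                      # a vanishing point cannot sit on the edge
    d = d / np.linalg.norm(d)

    # Pick the endpoint on the vanishing-point side of the midpoint.
    end = p2 if np.dot(p2 - mid, d) > np.dot(p1 - mid, d) else p1
    if np.dot(end - mid, d) <= 0:
        return False                      # VP lies on the wrong side of the edge

    # Perpendicular distance from the endpoint to the line mid -> VP.
    offset = end - mid
    perp = abs(offset[0] * d[1] - offset[1] * d[0])
    return perp <= endpoint_radius

if __name__ == "__main__":
    edge = ((0.0, 0.0), (10.0, 0.2))
    print(fan_covers_point(edge, (1000.0, 15.0)))   # roughly along the edge direction
    print(fan_covers_point(edge, (0.0, 500.0)))     # far off the edge direction
```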
6.3.2 Gaussian sphere
The geometric grouping method can also be adapted to the Gaussian sphere space. According to the edge error model, each line casts a swath on the Gaussian sphere rather than a great circle. Similar to Shufelt [Shu99], we can rasterize the swath into the Gaussian sphere using a polygon boundary fill algorithm. The three heuristic filters can be adapted as well. For the iterative line length filter, we first scan only lines with length above some threshold into the Gaussian sphere, then discretize the sphere and collect votes weighted with line length; we then find the dominant cell and scan more short lines to find more supporting lines. For the other two filters, we find the corresponding lines for each dominant cell, compute the covering area and filter them with a threshold, and finally test whether the dominant cells lie outside each line segment.

The Gaussian sphere method has the advantage of treating each vanishing point equally, including infinite vanishing points. However, in experiments with real images, we opt to use the clustering method in image space rather than the Gaussian sphere for several reasons. First, the accuracy of the Gaussian sphere is limited to the discretization accuracy, hence it is hard to achieve the precision that an image can offer. Secondly, the intersection of the fan regions of the edges that belong to the maximum cell in the Gaussian sphere may be empty, which makes the vanishing hull meaningless, while in the image space we can guarantee the vanishing hull to be non-empty using the aforementioned grouping method. The last reason is that the vanishing hull projected onto the Gaussian sphere is not a polygon anymore, so it is hard to compute the probability distribution and analyze the stability and accuracy. Furthermore, we argue that we treat finite and infinite vanishing points equally even when using the image space grouping method. This is due to Property VI of the vanishing hull. When the vanishing point is finite, it can be well determined by the vanishing hull. When the vanishing point is at infinity, the vanishing hull is open, which can be easily detected. We set the orientation angle for an infinite vanishing point to zero.
6.4 Determining the Vanishing Hull
6.4.1 Half-plane intersection
Given the group of lines that form a vanishing point, let us consider how to find the vanishing hull. A fan-shape edge region can be considered as the intersection of two half-planes (Figure 6.1), so the problem of finding the intersection of the edge regions can be cast as the problem of half-plane intersection. A naïve way to solve the problem is to add one half-plane bound line at a time to compute the intersection region, which takes O(n^2) time. However, a more elegant algorithm with O(n lg n) time can be obtained by utilizing the properties of dual space [Ber].

There exists an interesting property called duality between lines and points [Ber]. Given a non-vertical line L, it can be expressed using two parameters (k, b). We can define a point with k as the x coordinate and b as the y coordinate. This point is called the dual point (L*) of line L. The space of the lines is called the prime space, while the space of the points is called the dual space. The half-plane intersection problem in the prime space can be mapped to the problem of finding the convex hull of the points in the dual space.

We first divide the bound lines of the half-planes into two sets: an upper set (half-planes lie above the bound lines) and a lower set (half-planes lie below the bound lines). Let us consider the upper set first. For each line L in the upper set, we can find a dual point L*(k, b). The intersection of all the half-planes in the upper set can be mapped to finding the lower convex hull (the edges of the lower boundary of the convex hull) of the dual points. The proof is out of the scope of this chapter; readers may refer to [Ber]. Similarly, we can find the intersection of all the half-planes in the lower set by determining the upper convex hull of the dual points. Note that according to our dual mapping, the upper convex hull and the lower convex hull will not intersect although the upper and lower half-planes do intersect, and that is the reason we split the half-planes into two sets. Finally, the two regions are merged to find the vanishing hull. The algorithm is summarized as follows:

1. Split the half-planes into two sets, an upper set and a lower set.
2. Map each set into the dual space and find the corresponding upper and lower convex hull (note that the two convex hulls are different).
3. Map the two half convex hulls back to the prime space, and merge them to find the vanishing hull.

Finding the convex hull of a point set is a well-defined problem [Ber], which takes O(n lg n), with n as the number of points. The mapping and merging take linear time, so the overall time complexity is O(n lg n).
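In practice the same intersection can also be obtained with an off-the-shelf half-space intersection routine. The sketch below uses scipy.spatial.HalfspaceIntersection rather than the dual-space construction described above, and assumes a strictly interior point is already known (in the grouping scheme it can be taken near the cluster's seed intersection point); the demo half-planes are hypothetical.

```python
import numpy as np
from scipy.spatial import HalfspaceIntersection

def vanishing_hull_vertices(halfplanes, interior_point):
    """Vertices of the intersection of half-planes a*x + b*y + c <= 0.
    halfplanes: (n, 3) array of rows [a, b, c]; interior_point must lie
    strictly inside the intersection."""
    hs = HalfspaceIntersection(np.asarray(halfplanes, float),
                               np.asarray(interior_point, float))
    return hs.intersections            # (m, 2) array of polygon vertices (unordered)

if __name__ == "__main__":
    # Unit square as four half-planes: x >= 0, y >= 0, x <= 1, y <= 1.
    planes = [[-1, 0, 0], [0, -1, 0], [1, 0, -1], [0, 1, -1]]
    print(vanishing_hull_vertices(planes, (0.5, 0.5)))
```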
6.4.2 Expectation and variance of vanishing hull
As aforementioned, we can estimate the position of the vanishing point using the expectation of the vanishing hull and analyze the accuracy using its variance. Now let us consider the computation of the expectation and variance of the vanishing hull. With a uniform probability distribution, the expectation and variance of the position of a true vanishing point can be computed using Equation 6.1, where D is the region of the vanishing hull and A is its area. We only show the x coordinate here; the y coordinate can be computed in a similar way. Given a list of vertices of the vanishing hull, it is easy to show that the mean can be computed using the coordinates of these vertices (Equations 6.2, 6.3).

\mu(x) = \frac{1}{A} \iint_D x \, dA, \qquad \mathrm{var}(x) = \frac{1}{A} \iint_D (x - \mu(x))^2 \, dA    (6.1)

A = \frac{1}{2} \sum_{i=0}^{n-1} (x_i y_{i+1} - x_{i+1} y_i)    (6.2)

\mu(x) = \frac{1}{6A} \sum_{i=0}^{n-1} (x_i + x_{i+1})(x_i y_{i+1} - x_{i+1} y_i)    (6.3)

The variance of the vanishing hull can also be represented as a simple expression of the coordinates of the vertices. According to Green's theorem [Wei], an integral over the region can be converted to an integral on the boundary:

\iint_D \left( \frac{\partial g}{\partial x} - \frac{\partial f}{\partial y} \right) dx \, dy = \oint_{\partial D} f(x, y) \, dx + g(x, y) \, dy    (6.4)

Let f = 1 and g = \frac{1}{3}(x - \mu(x))^3; we have

\frac{\partial g}{\partial x} - \frac{\partial f}{\partial y} = (x - \mu(x))^2    (6.5)

So:

\mathrm{var}(x) = \frac{1}{A} \iint_D (x - \mu(x))^2 \, dA = \frac{1}{A} \oint_{\partial D} f(x, y) \, dx + g(x, y) \, dy    (6.6)

Let us consider the integral on the line segment from (x_i, y_i) to (x_{i+1}, y_{i+1}); we can parameterize the points on the segment using t, such that:

x = x_i + t (x_{i+1} - x_i), \quad y = y_i + t (y_{i+1} - y_i), \quad t \in [0, 1]    (6.7)

Let a_i = x_i - \mu(x), b_i = x_{i+1} - x_i, and c_i = y_{i+1} - y_i; it is easy to show that the variance of the x coordinate can be computed as:

\mathrm{var}(x) = \frac{1}{3A} \sum_{i=0}^{n-1} c_i \left( a_i^3 + \frac{3}{2} a_i^2 b_i + a_i b_i^2 + \frac{1}{4} b_i^3 \right)    (6.8)

Similarly, we can compute the variance of the y coordinate.
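The centroid formulas of Equations 6.2-6.3 are straightforward to evaluate from the hull vertices; the variance can be checked numerically as well. The sketch below computes the area and centroid analytically and uses Monte Carlo sampling as a simple numerical stand-in for the closed form of Equation 6.8; the rectangle in the demo is a made-up example.

```python
import numpy as np

def hull_centroid(verts):
    """Polygon area and centroid from ordered vertex coordinates (Eqs. 6.2-6.3)."""
    v = np.asarray(verts, float)
    x, y = v[:, 0], v[:, 1]
    xn, yn = np.roll(x, -1), np.roll(y, -1)
    cross = x * yn - xn * y
    A = 0.5 * cross.sum()
    cx = ((x + xn) * cross).sum() / (6.0 * A)
    cy = ((y + yn) * cross).sum() / (6.0 * A)
    return A, np.array([cx, cy])

def hull_variance_mc(verts, n_samples=200_000, seed=0):
    """Monte Carlo estimate of var(x), var(y) for a uniform distribution over
    a convex polygon with counter-clockwise vertex order."""
    v = np.asarray(verts, float)
    lo, hi = v.min(axis=0), v.max(axis=0)
    pts = np.random.default_rng(seed).uniform(lo, hi, (n_samples, 2))
    inside = np.ones(n_samples, dtype=bool)
    for i in range(len(v)):
        a, b = v[i], v[(i + 1) % len(v)]
        inside &= ((b[0] - a[0]) * (pts[:, 1] - a[1]) -
                   (b[1] - a[1]) * (pts[:, 0] - a[0])) >= 0
    return pts[inside].var(axis=0)

if __name__ == "__main__":
    hull = [(0, 0), (4, 0), (4, 2), (0, 2)]        # CCW rectangle
    print(hull_centroid(hull))                     # area 8, centroid (2, 1)
    print(hull_variance_mc(hull))                  # approx. (16/12, 4/12)
```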
6.5 Vanishing Hull for General Edge Error Model
The vanishing hull concept can be easily extended to general statistical edge error models. First we show an augmented vanishing hull considering the full edge region of the simple edge error model, then we show that a general vanishing hull can be derived in a similar way.

In Section 6.2.1 we ignored the rectangular region and used one of the edge fans to derive the vanishing hull concept. We claimed that the shape of the vanishing hull will not change with this approximation, which is true. However, the probability distribution in the vanishing hull is not a uniform distribution. For a full edge span region, an angle (called the extreme angle θ) is formed by the vanishing point VP and two extreme points P_1 and P_2 (Figure 6.5). According to the edge error model, the two end points have equal probability inside the one-pixel-size square. The probability of a true edge passing through the vanishing point VP is determined by the overlapping area of the extreme angle and the pixel squares.

Figure 6.5. Compute probability distribution for a full edge region.

Let p(l_i, VP) be the probability of a true edge l_i passing through VP, e_1 and e_2 be the two end points of l_i, and S_1 and S_2 be the overlapping regions of the extreme angle with the two pixel squares; then:

p(l_i, VP) = p(e_1 \in S_1 \,\&\, e_2 \in S_2)    (6.9)

Assuming the two end points are independent with probability density functions (PDF) f(x, y) and g(x', y') respectively, the joint PDF is f(x, y) \cdot g(x', y'), which is a 4D uniform distribution. Then p(l_i, VP) is the integral of the joint PDF over the regions S_1 and S_2:

p(l_i, VP) = \iint_{e_1 \in S_1 \,\&\, e_2 \in S_2} f(x, y) \cdot g(x', y') \, dS_1 \, dS_2    (6.10)

Note that points e_1, e_2 and VP are collinear. We can use a line-sweeping method to compute the integral in a region. Now consider a sweep line L_i that passes through VP (Figure 6.5) and intersects the two squares with line segments l_1 and l_2 (note that L_i is a line inside the extreme angle, while l_i is a true edge inside the whole edge region). Line L_i sweeps the whole overlapping region when its angle varies from 0 to θ; the integral of the joint PDF over the two line segments is:

p(e_1 \in l_1 \,\&\, e_2 \in l_2) = \int f(x, y) \cdot g(x', y') \, dl_1 \, dl_2    (6.11)

Denote the end points of line segment l_1 as (x_1, y_1), (x_2, y_2) and those of l_2 as (x'_1, y'_1), (x'_2, y'_2); they form a line segment in a 4D space (because we have three linear constraints: l_1, l_2, and they have the same slope). The integral of a uniform distribution over a line segment in 4D is just the length of the line segment, hence:

p(e_1 \in l_1 \,\&\, e_2 \in l_2) = \left[ (x_2 - x_1)^2 + (y_2 - y_1)^2 + (x'_2 - x'_1)^2 + (y'_2 - y'_1)^2 \right]^{1/2}    (6.12)

Now we can integrate over the angle to get the region integral:

p(l_i, VP) = \int_0^{\theta} p(e_1 \in l_1(\vartheta) \,\&\, e_2 \in l_2(\vartheta)) \, d\vartheta    (6.13)

The full expression of the analytical PDF of the vanishing hull for a full edge region is complicated, and the distribution function may not be continuous over the entire region. Even when the distribution for a single edge is continuous, the overall PDF is of very high order due to the large number of edges. An analytical solution to the integral of such a high-order PDF is very complicated, and may not exist. We use a discretizing method to solve this problem. The discretization process can achieve high precision (one pixel) because the vanishing hull region is bounded. Consider the center point VP(x, y) of a cell with one-pixel size; we can find the extreme angle relative to edge l_i, and then compute the probability p(l_i, VP(x, y)) according to Equation 6.13. The probability of the point VP(x, y) over all the edges is then computed and normalized (Equation 6.14). The expectation and variance can easily be computed in the discretized vanishing hull.

P(x, y) = \prod_{i=0}^{n-1} p(l_i, VP(x, y)), \quad P^{*}(x, y) = \frac{P(x, y)}{\sum_{D} P(x, y)}    (6.14)

Such a vanishing hull considering the full edge span region is called the "augmented vanishing hull". In practice, we found that the vanishing hull often consists of only a few vertices (less than 10 vertices for 1000 lines), which means that the probability of a vanishing point being close to the edge region's boundary is very low. Since vanishing points close to the middle of the edge region have similar overlapping areas with the pixel squares, it is reasonable to assume a uniform distribution for the vanishing hull formed of full edge regions.

We can extend the vanishing hull concept to general edge error models in a way similar to the augmented vanishing hull. A general edge error model often models the error of the edge centroid and orientation, the two end points, or the edge pixels using a Gaussian distribution. The edge span region is still a fan shape, so the intersection of the edge regions is a convex hull. Assuming the PDF of line l_i passing a vanishing point (x, y) is f_i(x, y), then the PDF of the vanishing hull over all the lines is \prod_i f_i(x, y). Again, since this PDF is a high-order non-linear function, we discretize the vanishing hull and compute its mean and variance.
6.6 Analysis and Results
We have extensively analyzed the vanishing hull concept using both synthetic and real data. The
performance of our method is also compared to one state-of-the-art algorithm [Lie98].
6.6.1 Simulation data
We first analyze the vanishing hull concept using synthetic data. The goal of the simulation is to show that a vanishing hull gives the region of the true vanishing point, its expectation is the optimal solution, its shape determines the stability, and its variance determines the accuracy of the estimation. The simulation is designed as follows. A group of 3D parallel lines are projected by an ideal pin-hole camera to an image plane, then random noise with a specified magnitude is added to the end points. The vanishing point is estimated using the centroid of the vanishing hull assuming a uniform distribution. We extensively analyze our Vanishing Hull (VH) algorithm with different parameter settings (Table 6.1). For each of the five groups of parameter settings, we sample the space with 100 evenly distributed intervals; the other four parameters are held constant while one parameter varies in its range to test the performance relative to each parameter.

Table 6.1 Parameter settings.
Parameter | Range | Other parameter settings
1. line orientation angle (degree) | θ ∈ [0.01, 40] | fov = 40, l = 50, ε = 0.5, n = 200
2. camera field of view (degree) | fov ∈ [20, 80] | θ = 1, l = 50, ε = 0.5, n = 200
3. image line length (pixel) | l ∈ [10, 100] | θ = 10, fov = 40, ε = 0.5, n = 200
4. image noise magnitude (pixel) | ε ∈ [0.05, 0.5] | θ = 1, fov = 40, l = 50, n = 200
5. number of image lines | n ∈ [20, 1000] | θ = 5, fov = 40, l = 50, ε = 0.5
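The simulation setup above (parallel 3D lines, an ideal pin-hole projection, and bounded endpoint noise) can be generated with a few lines of numpy. The sketch below is only a rough stand-in that roughly follows Table 6.1; the scene geometry, focal-length convention, and sampling ranges are all assumptions rather than the values used in the thesis experiments.

```python
import numpy as np

def synthetic_lines(theta_deg=5.0, fov_deg=40.0, line_len=50.0, n_lines=200,
                    noise=0.5, image_size=1000, seed=0):
    """Noisy image line segments from 3D parallel lines seen by a pin-hole camera.
    Returns an (n_lines, 2, 2) array of endpoint pairs in pixels."""
    rng = np.random.default_rng(seed)
    f = 0.5 * image_size / np.tan(np.radians(fov_deg) / 2.0)  # focal length (pixels)
    t = np.radians(theta_deg)
    direction = np.array([np.cos(t), 0.0, np.sin(t)])         # common 3D direction

    def project(p):
        return np.array([f * p[0] / p[2] + image_size / 2.0,
                         f * p[1] / p[2] + image_size / 2.0])

    segments = np.empty((n_lines, 2, 2))
    for k in range(n_lines):
        p0 = np.array([rng.uniform(-5, 5), rng.uniform(-5, 5), rng.uniform(20, 40)])
        a, b = project(p0), project(p0 + direction)
        d = b - a
        d = d / (np.linalg.norm(d) + 1e-9)
        seg = np.stack([a - 0.5 * line_len * d, a + 0.5 * line_len * d])
        seg += rng.uniform(-noise, noise, seg.shape)          # endpoint noise
        segments[k] = seg
    return segments

if __name__ == "__main__":
    print(synthetic_lines().shape)
```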
Vanishing hull is the true vanishing point region
The simulation shows that all the true vanishing points lie inside the vanishing hull. This is logical because the maximum noise magnitude is specified, so the edge error model exactly predicts the region of the true edge; hence the true vanishing points lie inside the vanishing hull.

Expectation is the optimal solution
The result of the VH method is compared with two other methods, the Least Squares (LS) method and the Maximum Likelihood (ML) method [Lie98], to show that the expectation is the optimal solution. The comparison criterion is the recovered orientation angle error relative to the ground truth. The LS method uses least squares to find the vanishing point closest to all lines, and the ML method uses a non-linear method to minimize the distance of the line that passes through the vanishing point and the mid-point to the two end points. According to our implementation, the difference between the ML and LS methods is often several pixels, so the angle difference is very small. This is because the ML method uses a non-linear optimization method, which often gives a local minimum close to the result of the LS method. We show only the results of the VH and ML methods to keep the figures clear (Figure 6.6).
The parameters θ and fov are related to the perspective effect of images. The result (Figure 6.6 (a), (b)) shows that the ML method gives large errors for weak perspective images (up to 40 degrees when the orientation angle is less than 0.1 degrees), while the VH method performs reliably with a maximum angle error less than 0.5 degrees and an average less than 0.1 degrees. When the perspective effect is strong (orientation angle larger than 10 degrees), the vanishing hull shrinks to several dozens of pixels, and both methods perform well.

The parameters l and ε are related to the quality of edges. The simulation (Figure 6.6 (c), (d)) shows that the ML method gives large orientation errors for poor quality edges (l < 30 or ε > 0.2), and the maximum angle error is more than 14 degrees. The VH method performs significantly better than the ML method and very reliably over the whole range (maximum angle error 0.3 degrees, and average angle error less than 0.1 degrees).

The last parameter (Figure 6.6 (e)) compares the performance against the number of image lines. The simulation shows that the number of lines has no strong effect on the performance of the ML method; however, it affects the performance of the VH method. When the number is more than 500, the VH method performs very reliably; when the number drops to 100, it still performs significantly better than the ML method. However, when the number of lines drops below 50, the result is mixed. There are several reasons for this. First, the region of a vanishing hull shrinks with an increasing number of lines, so it is more reliable for more lines. Second, the expectation of the vanishing hull is the optimal solution for a vanishing point under statistical meaning; however, when the sample (number of lines) is small, the true value of the vanishing point may deviate from the statistical value. The last reason is that we use a uniform distribution, which is an approximation as we showed in Section 6.5.

In general, the VH method performs significantly better than the other two methods, especially for weak perspective images, and the performance of the VH method is very reliable with several hundreds of lines, a reasonable number for high-resolution images of buildings and aerial images. This shows that the expectation of the vanishing hull is the optimal solution of vanishing points estimation.
Figure 6.6. Performance comparison of the VH and ML methods on synthetic data with random noise. Panels (a)-(f) plot the error of orientation (degrees) against: (a) orientation angle (degrees), (b) camera FOV (degrees), (c) line length (pixels), (d) noise magnitude (pixels), (e) number of lines, and (f) number of lines for the unstable (near-infinite) vanishing point case.
Stability and accuracy

We can also predict unstable vanishing points using the VH method. For most of the cases, the vanishing hull is closed, which indicates that the vanishing point is stable. When the vanishing hull is open, it indicates that the vanishing point is at infinity. The algorithm simply sets the orientation angle to zero for infinite vanishing points. We visualize unstable vanishing points with blue color in Figure 6.6 (f), where the ground truth is θ = 0.01. By setting the orientation of unstable vanishing points to zero, the VH method has a small error of 0.01 degrees, while the ML method gives a large error (more than 10 degrees). Note that the VH method achieves an average error of less than 0.5 degrees even for such ill-conditioned cases. The error of the VH method is also within the magnitude of the variance for all five groups of parameters, which implies that the variance determines the accuracy of the estimation. The graph of the variance is not shown here.
Since the noise in real images is generally Gaussian, we have also tested the performance of our algorithm with Gaussian noise. Similar to the random noise case, we project a group of 3D parallel lines with an ideal pin-hole camera onto an image plane, then add Gaussian noise with zero mean and specified variance to the end points. Again, we test the performance with the five groups of parameter settings (Table 6.1).
Figure 6.7. Performance comparison of the VH and ML methods on synthetic data with Gaussian noise. Panels (a)-(e) plot the error of orientation (degrees) against the orientation angle, camera FOV, line length, noise magnitude, and number of lines.
The vanishing point is estimated using the centroid of the vanishing hull rather than discretizing the vanishing hull and computing the exact distribution. It is interesting to note that, although the distribution inside the vanishing hull is not uniform with Gaussian noise, the centroid is still a very good approximation to the optimal solution. The reason might be that the optimal solution is close to the centroid for zero mean Gaussian noise. Of course, one can also compute the exact optimal solution using Equation 6.14; however, we find that the centroid approximation is good enough and much simpler to compute. We also compared the performance of the VH method with the ML method (Figure 6.7); the result is similar to that of the random noise case.

6.6.2 Real data

The vanishing hull concept is also extensively tested with real images to generate textures (Chapter 7). However, comparing the performance of different algorithms in computing vanishing points for real images is difficult. The main challenge is that the ground truth positions of vanishing points for real images are unknown. Furthermore, the errors may come from different sources, such as the error of the edge model, grouping errors, camera lens distortion and camera calibration errors. Here we compare the performance of the VH and ML methods with two sets of images, indoor and outdoor images. More applications of the VH method for real images are presented in Chapter 7.

Indoor images

For the indoor case, we print a pattern of parallel lines, and carefully place the camera with a tripod so that the image plane is parallel to the pattern plane. We translate the camera to take 15 different pictures while keeping the orientation unchanged. Figure 6.8 shows one of these images. By carefully controlling the camera motion, we can assume that the orientation ground truth is close to zero (careful user verification using the images shows that the variance is within 0.5 degrees). We have also tried to control the camera motion to take images with a slanted angle; however, it is very hard to measure the exact orientation angle. Furthermore, as shown in the graph of Figure 6.6 (a), when the rotation angle is more than 10 degrees, although the VH method still performs better than the ML method, the difference is so small that it will be covered by the measurement error.
To reduce the error from the camera, we calibrate the camera and correct the lens distortion for all 15 images. As aforementioned, the vanishing hull concept is derived from the edge error model, so its performance depends on the edge error model. For real images, we model the noise magnitude of the end points of edges using an empirical model: ε = c / l, where l is the length of the edge, and c is set to 3.5 pixels for all the tested images. This model shows a better result than setting the noise magnitude to a constant value for all lines. Then, the vanishing hull is computed using the dual-space algorithm, and its centroid is used as the approximate optimal estimation of the vanishing point. Figure 6.8 compares the results of the VH and ML methods. As predicted by the graph of Figure 6.6 (a), the advantage of the VH method over the ML method is apparent for small angle images. Note that, even for images with such small angles, the VH method gives very good results without using the Gaussian sphere.
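The two ingredients used here, the empirical end-point noise magnitude ε = c / l and the centroid of the convex vanishing hull, are straightforward to compute. The sketch below shows one way to do so with the standard polygon-centroid formula; the names and array layouts are assumptions for illustration, not the dissertation's code.

    import numpy as np

    def edge_noise_magnitude(p0, p1, c=3.5):
        """Empirical end-point noise model: epsilon = c / l for an edge of length l."""
        length = np.linalg.norm(np.asarray(p1) - np.asarray(p0))
        return c / length

    def convex_polygon_centroid(vertices):
        """Area-weighted centroid of a convex polygon given by ordered vertices (N x 2)."""
        v = np.asarray(vertices, dtype=float)
        x, y = v[:, 0], v[:, 1]
        xn, yn = np.roll(x, -1), np.roll(y, -1)
        cross = x * yn - xn * y
        area = cross.sum() / 2.0
        cx = ((x + xn) * cross).sum() / (6.0 * area)
        cy = ((y + yn) * cross).sum() / (6.0 * area)
        return np.array([cx, cy])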
Figure 6.8. Real data comparison, indoor images. Left: one of the 15 images; right: performance comparison. The error of the VH method (red stars) is much smaller than that of the ML method (green circles), which shows the advantage of the VH method.
Outdoor images
For outdoor images, since it is hard to control the camera’s motion in outdoor environments, we opt to
use a manual verification by measuring the horizontal and vertical line angle error of the rectified images.
We discuss two typical cases here, small and large rotation angle.
The first case is a strong perspective image with a large rotation angle (Figure 6.9 (a)). This image is also one of the typical test cases, where images have tree occlusions and small-scale textures. Our vanishing point detection method is robust to these difficulties. Thanks to the heuristic filters, spurious vanishing points caused by trees and small-scale textures (Figure 6.9 (a)) are filtered out (Figure 6.9 (b)), which shows the effectiveness of the filter. The vanishing hull is found using the dual-space algorithm, then the vanishing point is computed as the centroid assuming a uniform distribution, and finally the image is rectified (Figure 6.9 (c)). Careful manual user verification shows that the horizontal angle error of the VH method is less than 0.1 degrees (the standard deviation of the vanishing hull is σ = 0.12 degrees), and the vertical angle error is less than 0.15 degrees (σ = 0.17), while the ML method gives 0.2 and 0.3 degrees angle error respectively. Both methods perform well, though the VH method is slightly better.

Figure 6.9. Outdoor images: a strong perspective image case. Our vanishing point detection method is robust to tree occlusions and small-scale textures. (a) Clustering result before filtering, (b) the spurious vanishing points are filtered using the heuristic filters, and the image is rectified (c).

Figure 6.10. Comparison of the rectified images of the VH and ML methods for a weak perspective image. Left: ML method, the line that should be horizontal is slanted. Right: VH method, the line is correctly rectified.
The second image is a weak perspective image with a small rotation angle (Figure 6.10). The result of the VH method shows that the x direction vanishing point is at 2×10⁶, and the y direction at −4×10⁴. Careful manual user verification shows that the horizontal angle error of the VH method is less than 0.3 degrees (σ = 0.35), and the vertical angle error is less than 0.2 degrees (σ = 0.22), while the ML method gives 1.94 and 0.55 degrees angle error respectively³. Figure 6.10 compares the rectified images produced by both methods; the VH method is apparently superior for this case, which is consistent with the graph of Figure 6.6 (a).

The comparison results are summarized in Table 6.2. Both typical cases show that the VH method gives better performance (optimal solution), and the user-verified orientation error is within the range of the variance (accuracy). Hence the true vanishing points are within the region of the vanishing hull. All the vanishing points are stable because the vanishing hulls are all closed convex polygons. Both simulation and real data show that our method is superior.
Table 6.2. Outdoor real images comparison (orientation error, degrees).

Method    Strong Perspective Image (X, Y)    Weak Perspective Image (X, Y)
VH        0.1, 0.15                          0.3, 0.2
ML        0.2, 0.3                           0.55, 1.94
6.7 Conclusion
Vanishing points are valuable in many vision tasks such as orientation and pose estimation. This chapter
defines the concept of the vanishing hull from a geometric viewpoint, which is the intersection of the edge
regions. Based on the edge error model, we also present a novel geometric image grouping method for
vanishing points detection. The vanishing hull gives the region of the true vanishing point, and its
probability distribution determines the property of vanishing points. The expectation of the vanishing hull
is the optimal solution of the vanishing point, its variance defines the accuracy of the estimation, and its
shape determines the stability of the vanishing point. Extensive simulation and real data experiments show
that our method is superior to one state-of-the-art technique. We will present many applications using the
vanishing hull method in the following chapters.
³ For a line with length 570 pixels, a 0.1 degree angle error corresponds to a 1 pixel error.
Figure 6.11. Vanishing hull of a real image. Each line (short yellow lines) in the image forms a fan-shaped edge region bounded by a red and a green line. These fan regions intersect at a convex polygon (yellow polygon), which is the vanishing hull. An enlarged image of the y direction vanishing hull is also shown, where the yellow dot in the center is the centroid of the vanishing hull.
Chapter 7
Integrating Ground Images - Part II: Pose Recovery
This chapter applies the vanishing hull concept to estimate vanishing points and recover the camera’s
poses, then generates textures from ground images with the poses.
7.1 Introduction
The models generated from LiDAR and aerial images only have top-view information; the results need to be enhanced with façade details. Texture mapping is a successful technique to generate scenes that appear both detailed and realistic while using relatively simple geometry. Ground images can be used to generate textures to improve the visual effects, which requires camera calibration and pose recovery.

Automatic camera calibration and pose recovery is a challenging task for outdoor building images. The camera orientation can be estimated using vanishing points, which can be estimated using the vanishing hull concept. However, heavy tree occlusion of outdoor urban buildings makes line grouping a difficult problem. Furthermore, given a camera's orientation, recovering its translation (relative to the models) is often an under-constrained problem when using a single image, due to the lack of features in rough building models and the narrow field of view of a camera. Lastly, it is very hard to capture an image that covers a whole building due to the small field of view of a camera and the physical barriers of narrow streets. Multiple images are often necessary to generate textures for a whole building face, which causes other problems such as illumination variation and parallax between images. The goal of this chapter is to generate high quality textures for given urban building models (especially rough models) by automatic camera calibration and pose recovery.
Previous work [Fru03] fixes the image sensor with the range sensor to get the texture data, and the camera pose is recovered by a simple calibration relative to the range sensor. This technique has the advantage of simple calibration, but lacks flexibility. Stamos and Allen [Sta01] use a freely moving camera to capture the image data. The camera pose is computed by fitting rectangular features from dense 3D range data, which is not applicable for coarse urban models. Sunchun et al. [Lee02] propose a system to register ground images to urban models using vanishing points to estimate the orientation and 2D to 3D feature correspondences to estimate the translation. The under-constrained translation problem is solved by manually inferring more 3D points from registered images, which is not applicable when all images are under-constrained.

This chapter presents new techniques to solve the challenges in generating textures for urban models. Our algorithm uses an edge error model to group image lines and knowledge-based filters to extract correct vanishing points under heavy tree occlusion for camera calibration and orientation estimation, and we derive equations for camera orientation estimation with infinite vanishing points. Two techniques are presented to compute the translation when a single image does not provide enough constraints. The final textures are generated using a color calibration method and blending techniques to solve the illumination and parallax problems between images.

7.2 Pose Estimation

7.2.1 Camera model

The camera projection matrix is modeled using Equation 7.1, where K is the internal parameter matrix, and RT is the camera's external parameter matrix. The focal length and principle point can be estimated given three orthogonal vanishing points [Cip99]. When only two vanishing points are available, the principle point is assumed to be at the image center.
P = K [R T],    K = [ α  0  u0 ;  0  α  v0 ;  0  0  1 ]        (7.1)

We decompose the camera's external matrix into an orientation and a translation. The orientation of the camera is estimated using vanishing points extracted by automatic line clustering, the translation is computed using 3D to 2D corner correspondences, and multiple images are used if one image does not provide enough correspondences. A global registration algorithm is used to refine the pose.

7.2.2 Orientation estimation

Vanishing points extraction

Vanishing points are used to estimate the camera's orientation due to the lack of features in 3D urban models. As described in Chapter 6, we use an automatic line clustering method in image space. Edges are detected using the Canny edge detector and lines are extracted using the Hough Transform as a preprocessing step for vanishing points extraction. The intersections of all pairs of line segments are then computed to find all possible vanishing points for line grouping. We then group all the lines based on the edge error model without any hard thresholds and filter them with some heuristics. The position of the vanishing point is then computed using the vanishing hull concept assuming a uniform distribution edge error model.

Rotation estimation

The rotation matrix and the camera's focal length and principle point can be estimated given three orthogonal vanishing points. Cipolla et al. [Cip99] derive the following equations:

[ λ1·u1  λ2·u2  λ3·u3 ;  λ1·v1  λ2·v2  λ3·v3 ;  λ1  λ2  λ3 ] = [ α  0  u0 ;  0  α  v0 ;  0  0  1 ] · R        (7.2)

R = [ λ1(u1 − u0)/α   λ2(u2 − u0)/α   λ3(u3 − u0)/α ;
      λ1(v1 − v0)/α   λ2(v2 − v0)/α   λ3(v3 − v0)/α ;
      λ1               λ2               λ3 ]        (7.3)

where λ1, λ2 and λ3 are scale factors, (u0, v0) is the principle point, (ui, vi), i = 1, 2, 3 are the three orthogonal vanishing points, and α is the focal length. When only two vanishing points are available, the third vanishing point can be inferred by assuming that the principle point is at the camera center [Lee02].
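Equation 7.3 can be evaluated directly once the three orthogonal vanishing points, the focal length and the principle point are available: each λi simply normalizes its column to unit length. The following sketch is an illustration under that reading, not the dissertation's code; it builds R this way and re-orthogonalizes it to absorb measurement noise.

    import numpy as np

    def rotation_from_vanishing_points(vps, focal, principal=(0.0, 0.0)):
        """Rotation matrix from three orthogonal finite vanishing points (Equation 7.3)."""
        u0, v0 = principal
        cols = []
        for (u, v) in vps:                              # one column per vanishing point
            col = np.array([(u - u0) / focal, (v - v0) / focal, 1.0])
            cols.append(col / np.linalg.norm(col))      # lambda_i absorbs the scale
        R = np.column_stack(cols)
        # Project onto the closest orthonormal matrix to absorb noise in the inputs.
        U, _, Vt = np.linalg.svd(R)
        return U @ Vt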
Equation (7.3) gives the solution of the rotation matrix for finite vanishing points; however, when the vanishing points are at infinity (lines are parallel in the image space), the solution becomes unclear.

We derive the equations to compute the rotation matrix for infinite vanishing points using Euler angles. A rotation matrix can be represented using three Euler angles (we ignore the singularity cases of 90 degrees, which can be treated specifically):

R = [ cy·cz               −cy·sz               sy ;
      cx·sz + sx·sy·cz     cx·cz − sx·sy·sz    −sx·cy ;
      sx·sz − cx·sy·cz     sx·cz + cx·sy·sz     cx·cy ]        (7.4)

where x, y, z are the three Euler angles, cx stands for cos x, and sx stands for sin x. We classify the situation into two cases according to the number of infinite vanishing points, and derive the solutions respectively.

1. One infinite vanishing point, two finite vanishing points. Suppose the x direction vanishing point is at infinity; thus the y direction rotation is zero, so:

R = [ cz  −sz  0 ;  cx·sz  cx·cz  −sx ;  sx·sz  sx·cz  cx ]        (7.5)

Substituting Equation (7.5) into Equation (7.2), we have:

A ≡ [ λ1·u1  λ2·u2  λ3·u3 ;  λ1·v1  λ2·v2  λ3·v3 ;  λ1  λ2  λ3 ]
  = [ α·cz + u0·sx·sz      −α·sz + u0·sx·cz      u0·cx ;
      α·cx·sz + v0·sx·sz    α·cx·cz + v0·sx·cz   −α·sx + v0·cx ;
      sx·sz                 sx·cz                 cx ]        (7.6)

Assume cz ≠ 0 (the rotation around the z axis is not 90 degrees; otherwise we have just two variables, x and α, which is trivial to solve). After some manipulations on both sides, we have:

A11/A21 = u1/v1 = (α·ctg z + u0·sx) / (α·cx + v0·sx)
A12/A32 = u2 = (−α·tg z + u0·sx) / sx        (7.7)
A22/A32 = v2 = α·ctg x + v0

Without loss of generality, we assume the principle point is at the image center, so u0 = v0 = 0. Let k1 = u1/v1 (although u1 and v1 are at infinity, the slope of the image lines is still finite since the z rotation is not 90 degrees) and k2 = u2/v2; it is easy to find the solution of Equation (7.7) as:

x = cos⁻¹( [1/(k1·k2)]^(1/2) )
α = (v2 − v0)·tg x        (7.8)
z = tg⁻¹( (u0 − u2)·sx / α )

We can derive the solution in a similar way when the y or z direction vanishing point is at infinity.

2. Two infinite vanishing points, one finite vanishing point. Suppose the x and y direction vanishing points are at infinity; then the x and y direction rotations are zero, so we have:

[ λ1·u1  λ2·u2  λ3·u3 ;  λ1·v1  λ2·v2  λ3·v3 ;  λ1  λ2  λ3 ] = [ α·cz  −α·sz  0 ;  α·sz  α·cz  0 ;  0  0  1 ]        (7.9)

Thus z = ctg⁻¹(u1/v1), where u1/v1 is the slope of the image lines that correspond to the x direction vanishing point. The focal length cannot be recovered in this case. The other two cases, when the z-y or x-y direction vanishing points are at infinity, can be derived in a similar way.

The accuracy of the rotation angle depends on the accuracy of the vanishing points. Given an edge error model, the vanishing hull gives the optimal estimation. However, for images taken at a skewed angle, only lines close to the camera center will be detected, while far lines cannot be detected due to foreshortening effects (Figure 7.1 left). This causes the vanishing points to be biased, hence the rectified texture of far building surfaces is skewed (Figure 7.1 right top).

A hard constraint technique is used to solve the bias problem. As shown in Figure 7.1, we first compute the vanishing point using the detected lines (yellow lines). Then the user interactively indicates a far line (the blue line in the red box) as a hard constraint. We define an error function as in Equation 7.10, where d_i is the distance of the vanishing point to each line, d_h is the distance to the hard constraint line, and w is its weight. We set a high value of w to penalize vanishing points far from the hard constraint line. A non-linear technique (the LM algorithm) is used to find the optimal solution. Figure 7.1 right bottom shows the texture generated after the hard constraint optimization.

error = Σ_i d_i + w·d_h        (7.10)

Figure 7.1 Hard constraints optimization.
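A minimal sketch of the weighted cost of Equation 7.10 is given below. It assumes each line is stored as (n_x, n_y, d) with a unit normal, and it uses scipy's generic least-squares routine with the LM method as a stand-in for the Levenberg-Marquardt optimization mentioned above; note that the solver minimizes squared residuals, whereas Equation 7.10 sums the distances directly.

    import numpy as np
    from scipy.optimize import least_squares

    def line_distance(vp, line):
        """Signed distance of the point vp = (x, y) to a line (n_x, n_y, d), unit normal."""
        nx_, ny_, d = line
        return nx_ * vp[0] + ny_ * vp[1] + d

    def refine_with_hard_constraint(vp0, lines, hard_line, w=100.0):
        """Refine a vanishing point by minimizing Equation 7.10 in a least-squares sense."""
        def residuals(vp):
            res = [line_distance(vp, l) for l in lines]
            res.append(w * line_distance(vp, hard_line))   # heavily weighted hard constraint
            return np.array(res)
        return least_squares(residuals, np.asarray(vp0, float), method="lm").x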
7.2.3 Translation estimation

Given the orientation of the camera, the translation relative to the 3D models (generated from Chapter 4) can be computed using two 2D to 3D point correspondences [Lee02]. However, due to physical limitations (such as a camera's small field of view and narrow streets), sometimes only one 3D corner or no corner at all is visible in a single image, which makes the translation estimation an under-constrained problem. We use two techniques, multiple images at different viewpoints and a mosaic technique, to solve this problem.

Multiple images

Let us first consider two images, each of which covers only one 3D model corner (Figure 7.2). Given only one 2D to 3D point correspondence, the position of the camera's projection center has an ambiguity: it lies along the line through the 3D point and the 2D image point (Figure 7.2, line i1 P1). The ambiguity can be fixed for both images given one extra 2D image point correspondence between the two images. We give a geometrical explanation for this, and the exact analytical solution can also be derived.
Figure 7.2 Translation estimation.
As shown in Figure 7.2, O1 and O2 are the projection centers of the two images, P1 and P2 are two 3D model vertices, with i1 and i2 as their image points respectively, and (i3, i3') is a given image correspondence pair. The position of O1 is along the line i1 P1. Since the orientation is fixed, the line i3 O1 forms a plane while its end point O1 moves along the line i1 P1. The plane i3 O1 P1 intersects the model plane at a line l1 (or a curve for a curved model surface). Similarly, i3' O2 forms a plane while its end point O2 moves along the line i2 P2, and the plane i3' O2 P2 intersects the model plane at a line l2. So the 3D model point P3 is uniquely determined by the intersection of lines l1 and l2. Hence the 3D position of O1 is fixed by the intersection of lines i1 P1 and i3 P3. Similarly, the 3D position of O2 is fixed by the intersection of lines i2 P2 and i3' P3. Thus we can compute the 3D translation for both images.

When there is an image that does not cover any 3D model corners, we use two 2D image point correspondences to fix its translation relative to a chosen base image, and compute the translation in a concatenated way. The translation ambiguity of the base image is solved when two or more 3D model points are visible from the image sequence.

The pose computed for a single image using several point correspondences is not robust, so a global registration process is employed to refine the pose. 2D image corners are extracted for each image, and they are matched automatically using the estimated pose. We first initialize the correspondences with a large matching threshold, then iteratively refine the correspondences by reducing the threshold and updating the transformation. Finally the matched corners are used to find a transformation between the images, and to refine the pose and registration.
Figure 7.3 A mosaic image.
Mosaic images

Images at far-apart viewpoints have color difference and parallax problems, which cause the generated textures to be blurred. We generate mosaic images taken at almost the same viewpoint to alleviate this problem and to solve the under-constrained pose problem.

There exists a homography between images taken at the same viewpoint or images of planar structures. Four 2D correspondences are enough to compute the homography. Corners can be used to find the initial homography, and more correspondences can be found to refine the result. Figure 7.3 shows a generated mosaic image using [Ptgui]. Szeliski and Shum [Sze97] use 360 degree panoramic images to create environment maps, while our algorithm does not require the mosaic image to be a full panorama. As long as the mosaic image covers at least two 3D model points, we can recover the pose.

A mosaic image has more image lines than a single image, which makes the vanishing point extraction more reliable. It also covers more building corners, which makes the translation estimation for a single mosaic image possible. Furthermore, images taken at the same viewpoint have no parallax and the color is often consistent, which improves the texture quality.
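For reference, the homography mentioned above can be estimated from four (or more) 2D correspondences with the standard direct linear transform; the sketch below is a generic illustration and does not reproduce the stitching software used to build the mosaics.

    import numpy as np

    def homography_from_points(src, dst):
        """Estimate H (3x3, h33 = 1) such that dst ~ H * src, from >= 4 correspondences."""
        src = np.asarray(src, dtype=float)
        dst = np.asarray(dst, dtype=float)
        rows = []
        for (x, y), (u, v) in zip(src, dst):
            rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
            rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
        A = np.array(rows)
        _, _, vt = np.linalg.svd(A)          # null-space solution of the DLT system
        H = vt[-1].reshape(3, 3)
        return H / H[2, 2]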
7.3. Texture Generation and Mapping

A base buffer (Chapter 8) is allocated for each façade of the building model to store the final textures. Each image is warped to the base buffer, and multiple images are blended.

For images taken at different viewpoints, the illumination variation gives the textures different colors, which causes visual inconsistency. A color rectification process is used to solve the problem. We first choose an image as the base color image, then pixels with constant colors in a window of user defined size (we use 5 by 5 in our implementation) are extracted, and these pixels are automatically matched with the other images using the estimated pose; finally each image is color rectified with an affine color model [Min02]:

[ r′ ; g′ ; b′ ] = [ a11  a12  a13 ;  a21  a22  a23 ;  a31  a32  a33 ] [ r ; g ; b ] + [ O_r ; O_g ; O_b ]        (7.11)
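The affine color model of Equation 7.11 can be fitted by ordinary least squares from the matched constant-color pixels. The sketch below is a minimal illustration assuming N x 3 arrays of matched RGB samples; it is not the system's implementation.

    import numpy as np

    def fit_affine_color_model(src_rgb, dst_rgb):
        """Fit dst = A * src + O (Equation 7.11) from matched N x 3 color samples."""
        src = np.asarray(src_rgb, dtype=float)
        dst = np.asarray(dst_rgb, dtype=float)
        X = np.hstack([src, np.ones((len(src), 1))])      # append 1 for the offset
        coeffs, *_ = np.linalg.lstsq(X, dst, rcond=None)  # 4 x 3 least-squares solution
        A = coeffs[:3].T                                  # 3 x 3 color matrix
        O = coeffs[3]                                     # offset (O_r, O_g, O_b)
        return A, O

    def apply_affine_color_model(image_rgb, A, O):
        """Color-rectify an H x W x 3 image with the fitted model."""
        img = np.asarray(image_rgb, dtype=float)
        out = img.reshape(-1, 3) @ A.T + O
        return out.reshape(img.shape)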
Images taken at different viewpoints cause a parallax problem. Blending the entire overlapping area will create ghost effects. We solve the problem by automatically finding the best blending area based on the histogram of the 2D matching corners between the images. The size of the area is a user-defined value, which is set to a small value for strong parallax images. This method helps to reduce visual defects caused by parallax for highly structured textures such as building bricks and windows. However, a more complicated algorithm using dynamic programming to find the best cut is necessary for unstructured textures [Efr01].

Buildings generally have repeated texture patterns, so we use the same rectified texture image for different faces with similar texture patterns. This has several advantages. We only need to capture images of the distinct patterns of the building faces rather than capturing every face of the building, which is not possible under some circumstances; e.g., we could not capture images of the southeast part of the building (Figure 7.9 (a)) because field construction was going on there. Another advantage is that less texture memory is required to render the scene using the repeated texture pattern technique, and the visual effect is almost the same as real photos.

We have developed an interactive tool to map textures to building facades. The system has four selection modes: buildings, primitives, vertical walls and vertices. For texture mapping to vertical walls, the system needs only one mouse click; it then automatically finds the connected vertical walls in the same plane, and computes the texture UV for each vertex. The system also allows the user to use the shift key to select multiple walls and map them to the same texture. The other modes allow the user to map textures to non-rectangular shapes. We can map textures to a complex building (Figure 7.9 (a)) with 160 faces within five minutes.

7.4. Results

We have generated many textures using the two techniques: multiple images at different viewpoints and mosaic images. The 3D models are generated using the methods described in Chapters 4 and 5. Pictures are taken with an un-calibrated camera at different times, and the focal length and illumination vary from image to image.
7.4.1 Multiple images taken at different viewpoints

For images taken at different viewpoints, several different textures are generated using the described method. Figures 7.4 (a) and (b) demonstrate the effectiveness of the vanishing points extraction method using the filters described in Chapter 6. Many false vanishing points (yellow points) are detected before the filtering due to the heavy occlusion of trees and small-scale textures (Figure 7.4 (a)). These spurious vanishing points are filtered using our algorithm, and only the correct dominant vanishing points are identified (Figure 7.4 (b)). The x direction vanishing point for the image in Figure 7.4 (a) is at infinity, and its orientation is recovered using Equation 7.8. The final texture created from four separate images is shown in Figure 7.4 (i).

Two of the four original images taken at different viewpoints used to create the texture for another building are shown in Figure 7.4 (c) and (d). Each of the four images covers only one corner of the building, so the translation is an under-constrained problem. We first compute the orientation for each image using automatically estimated vanishing points, then combine the four images to compute the translation, and the final poses are refined using global registration. The four images are color corrected and warped to the base buffer using the refined poses. The best blending area is automatically detected (Figure 7.4 (e)) to reduce parallax effects, and the final texture is generated using blending techniques (Figure 7.4 (f)). More textures are generated using the described technique (Figure 7.4 (h), (i)), and integrated into a 3D environment (Figure 7.4 (g)).

Remove Occlusions

Images at different viewpoints can also be used to solve occlusion problems. As shown in Figure 7.5, both images (left two images) are occluded by a pole, but at different places. The user interactively marks the pole using a rectangle (note that we do not need to mark the exact shape of the pole); then the algorithm automatically recovers the poses for each image, and generates the rectified texture (Figure 7.5 right) by copying pixels into the red rectangle area and blending in the other areas. Note that the pole is removed.

7.4.2 Mosaic Images

More images taken at the same viewpoint are used to generate mosaic images. One of the original images is shown in Figure 7.6 (a), and the images are stitched into a mosaic (b) using [Ptgui]. The pose is
automatically estimated, and the mosaic is rectified to generate a high quality texture with a resolution of 4000 by 3000 pixels (Figure 7.6 (c)). Textures generated from mosaic images are often of better quality since they are free from color and parallax problems.

We have generated textures for a campus center area (Figure 7.6, second and third rows) and a dorm area (Figure 7.6, bottom row). More textures for the USC campus are shown in Figure 7.7. For the campus center area (Figure 7.6), more than 100 ground view images were taken to generate 30 façade textures with automatically recovered camera poses. These textures are mapped to the buildings using our tool within five minutes. Figure 7.8 (a) shows an overview of the campus environment, and two close-up views are shown in (b) and (c). For the dorm area, more than 80 ground view images were taken to generate 24 façade textures. These textures are mapped to the building with more than 160 faces using our tool within five minutes. Some of the textures are reused for building faces with similar texture patterns. Figure 7.9 (a) shows the overview of the building compound, and two close-up views are shown in (b) and (c). Although the 3D models do not have facade details, the rendering results are very realistic thanks to the high quality textures.

7.5 Conclusion

This chapter presents new techniques to address the challenges in generating textures for urban models. We decompose the camera pose into orientation and translation, and use an edge error model and knowledge-based filters to estimate correct vanishing points. We derive equations to compute the orientation with infinite vanishing points, estimate the translation by combining information from multiple images when each single image does not provide enough constraints, and generate the final textures with color calibration and blending. The rendering results look very realistic; however, occlusions from trees and vegetation will cause problems in the final textures.
Figure 7.4. Results of generated textures.
Figure 7.5. Remove occlusions.
Figure 7.6. Results of generated textures.
Figure 7.7. Results of generated textures.
Figure 7.8. Rendering results of USC central area.
Figure 7.9. Rendering results of a dorm area in USC.
Chapter 8
Integrating Ground Images - Part III: Image Repair
This chapter presents new techniques to remove trees and other occlusions from a single image.
8.1. Introduction
As demonstrated in previous chapters, high quality textures have the advantage of producing a realistic
image without the tedium of modeling small-scale 3D structures and surface reflectance parameters. Real
images are often used as a texture resource to generate realistic visual effects. However, problems arise
when raw photos are used as the texture resource. Tree occlusion is the major challenge for images of buildings in residential areas and university campuses. The goal of this chapter is to develop a tool that
requires minimum user interactions to semi-automatically remove occlusions from images and generate
rectified images for texture mapping. Specifically speaking, the user only needs to indicate the occluded
image part with a rectangle (one mouse click and drag). Then, the system automatically removes the
occlusion and generates a visually seamless rectified texture image (Figure 8.1).
Figure 8.1. User drags several quadrilaterals around trees (left). The system automatically removes
the trees using patches from the same image (middle) and rectifies the image for texture mapping
(right).
8.1.1 Motivation
The problem of image disocclusion is also defined as image retouching or inpainting [Ber00]. The key idea
of image inpainting is to propagate the boundary information into the occluded part. Hence, the propagation
order is critical to the final results. A wealth of research has been conducted on this topic [Dor03].
However, the image inpainting technique is generally slow, and thus takes a long time to restore a high-
resolution image. More importantly, this technique will not work for images with complicated structures,
such as images of buildings where a window is partially or completely occluded (Figure 8.1). Recent work
by Sun et al. [Sun05] uses belief propagation to propagate structure information along curves. However,
this technique needs the user to manually complete the structure, which is tedious for complex structures.
In general, new techniques still need to be explored to repair images with well-structured patterns, which is
the main motivation of this chapter.
Rather than traditional image inpainting techniques, we opt to use a different approach. Observing that
images of architectures and other man-made objects often preserve repetitive structures, such as windows
(Figure 8.1), we copy an existing non-occluded image patch that contains a similar structure and paste it
into the destination to remove occlusions. Based on this idea, we have developed a system called Intelligent
Copy and Paste tool (ICP). Our system explicitly estimates the perspective effects in the image using the
Vanishing Hull technique. After the user interactively indicates the part where occlusions need to be
removed, the system automatically finds a source patch that matches the destination patch and transfers it
into the destination. Finally, the graph cut technique is used to clean up the boundary of the source and
destination patches to generate a seamless blending image.
Compared to the traditional image inpainting technique, the ICP technique has many advantages. First,
ICP is very fast, achieving interactive speed for patches up to 500 by 500 pixels. Second, ICP needs fewer
user interactions. The user only needs to grab the occluded part with one mouse click and drag, while
image inpainting techniques generally require an exact segmentation of the occlusion part. Most
importantly, ICP works very robustly for heavy occlusions and highly structured images, while most image
inpain
within ds.
ting techniques fail. Figure 8.1 shows the repaired results of a high-resolution image using ICP
20 secon
106
8.1.2 Related Work

According to the size of image patches, image repair techniques can be classified as pixel-based techniques and example-based techniques. Bertalmio et al. [Ber00] propose a PDE-based method to propagate image pixels from the boundary. This technique achieves good results for images with relatively small holes, and motivates much research on this topic. They later combine structure and texture synthesis for image repair [Ber03] and extend the idea to fluid and video inpainting [Ber00]. Levin [Lev03] repairs images using texture synthesis techniques [Efr99] based on global image statistics. Pixel based techniques work well for small holes and thin structures. However, they will generate artifacts for large holes and highly structured images.

Example-based methods fill in the hole by directly copying patches from the source image. This technique has also been used for texture synthesis [Efr01; Kwa03]. The size of the patch and the filling order are the key to a successful filling [Dro03]. Perez [Per04] analyzes the impact of the patch size on the image repair results. Drori et al. [Dro03] use a confidence map to guide the filling order, and fill the holes using fragments. This technique works well for large holes, but it is slow and does not work well for highly structured images. Jia et al. [Jia03] explicitly segment the image and complete the structure and texture using tensor voting techniques. Sun et al. [Sun05] use belief propagation to propagate structure information along curves. However, as aforementioned, this technique needs the user to manually complete the structure, and does not work well when the structure is completely occluded.

Our method is an example-based method. However, compared with previous example-based methods, our method is different and novel in the following aspects. First, we explicitly estimate the perspective effects. Hence, our method works well for images with strong perspective effects (Figure 8.11), and we can replace structures using image patches with very different perspective effects (Figure 8.13). Second, we explicitly use the knowledge of repetitive structures in the image. We use a patch that contains the whole similar structure (e.g., a window) to replace the destination patch, so our method works well even when the whole structure is occluded. Finally, we use graph cut to remove seams such that the result looks seamless.

The rest of the chapter is organized as follows. We first estimate the perspective effects in Section 8.2. The Intelligent Copy and Paste tool is described in Section 8.3, and results are shown in Section 8.4. Finally, we conclude this chapter in Section 8.5.
8.2. Perspective Effects Estimation
Figure 8.2. Line extraction for
vanishing points detection.
Perspective effects are very important in image editing. Copying and pasting an image patch in a
perspective image will generally cause structure seams [Per04], which cannot be eliminated with graph cut
techniques. Simple techniques such as scaling image patches [Kwa03] will not work for images with strong
perspective effects, especially when the destination and source patch have different perspective effects. We
first explicitly estimate the perspective effects using vanishing point techniques.
As described in Chapter 6, we use an automatic line clustering
method in image space. Edges are detected using the Canny edge
detector and lines are extracted using Hough Transform as a
preprocess step for vanishing points extraction. The intersections of
all pairs of line segments are then computed to find all possible
vanishing points for line grouping. We then group all the lines based on
the edge error model without any hard thresholds and filter them with some heuristics (Figure 8.2). The
position of the vanishing point is then computed using the vanishing hull concept assuming a uniform
distribution edge error model.
When there exist multiple planes (multiple groups of horizontal and vertical lines), we allow the user to
interactively indicate the plane using a quadrilateral. The user does not need to exactly align the edges of
the quadrilateral with the lines in the image. The system automatically extracts lines in the selected plane,
estimates the vanishing points using the vanishing hull, and aligns the edges with the image lines (Figure
8.3). When there are only a few lines or the image has severe noises, the edges of the quadrilateral may not
be perfectly aligned with the image lines. We allow the user to adjust the edges to the correct positions.
8.2.2 Image Rectification

Given two orthogonal vanishing points, we can estimate the camera's focal length and orientation with the assumption that the camera's center is at the image center. Then we can project the image onto a plane using the camera's orientation, and rectify the image (Chapter 7). A camera's internal parameters and orientation are important for applications such as reconstruction from a single image. However, for image editing applications, it is unnecessary to estimate the camera's internal and external parameters. We can use a homography to rectify the image.

A homography (Equation 8.1) has eight parameters, which can be computed using 4 pairs of 2D point correspondences. As shown in Figure 8.3, to rectify the selected image part, we need to warp the quadrilateral passing through the two orthogonal vanishing points into a rectangle. We can compute the homography using the 4 vertices of the quadrilateral and the rectangle. However, the aspect ratio of the rectangle in the rectified image has an ambiguity, due to the unknown aspect ratio of the ground truth. This ambiguity can be solved given the model of the 3D buildings. When the 3D model is not available, we simply use the width and height of the quadrilateral as the size of the rectified rectangle. We claim that this ambiguity will not affect the image editing results as long as the source and destination image patches have the same perspective effects, i.e., they are from the same plane. When this is not the case, we allow the user to interactively indicate a scale to match the destination image patch. After computing the homography, we rectify the source image, and use bilinear interpolation to remove artifacts (Figure 8.3).

I_rectify = H × I_original,    H = [ h1  h2  h3 ;  h4  h5  h6 ;  h7  h8  1 ]        (8.1)

Figure 8.3. Left: the user interactively indicates a quadrilateral (green), and the system automatically finds the correct quadrilateral (red) that aligns with the image lines. Right: rectified image.
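To make the rectification step concrete, the sketch below warps a rectified-space pixel grid back through H (inverse mapping) and samples the source image with bilinear interpolation, as described above. The function name and the row/column conventions are assumptions for illustration, not the system's implementation.

    import numpy as np

    def rectify_with_homography(image, H, out_w, out_h):
        """Warp `image` into an out_h x out_w rectified buffer via inverse mapping."""
        img = np.asarray(image, dtype=float)
        if img.ndim == 2:                       # promote gray images to one channel
            img = img[:, :, None]
        Hinv = np.linalg.inv(H)
        xs, ys = np.meshgrid(np.arange(out_w), np.arange(out_h))
        dst = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T
        src = Hinv @ dst                        # rectified pixel -> original pixel
        sx, sy = src[0] / src[2], src[1] / src[2]
        x0 = np.clip(np.floor(sx).astype(int), 0, img.shape[1] - 2)
        y0 = np.clip(np.floor(sy).astype(int), 0, img.shape[0] - 2)
        fx = np.clip(sx - x0, 0.0, 1.0)[:, None]
        fy = np.clip(sy - y0, 0.0, 1.0)[:, None]
        c00, c10 = img[y0, x0], img[y0, x0 + 1]     # bilinear interpolation
        c01, c11 = img[y0 + 1, x0], img[y0 + 1, x0 + 1]
        out = (c00 * (1 - fx) * (1 - fy) + c10 * fx * (1 - fy)
               + c01 * (1 - fx) * fy + c11 * fx * fy)
        return out.reshape(out_h, out_w, -1)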
8.3. Intelligent Copy and Paste

The main idea of our image repairing technique is to use a source patch containing a similar structure to replace the destination structure. Figure 8.4 shows the data flow and transformations of our algorithm.

First, the user indicates the destination patch, D, where occlusions need to be removed by dragging a rectangle in the perspective space (P space). The system automatically computes a quadrilateral that aligns with the image lines based on the user input and the estimated vanishing points. Then the system warps the image patch D into the rectified space (R space) as D′ using the homography H. We automatically search for a source patch S′ in R space to match the patch D′, compute the affine transform A between the two patches, and paste the patch S′ into D′. Since the boundaries of patches D′ and S′ generally do not match, the graph cut algorithm is then used to find a seamless cut, and an adaptive blending step is employed to further reduce seams. Finally, the patch S′ is warped back to P space using H⁻¹. The whole process is summarized in Equation 8.2.

D = (H⁻¹ A H) × S        (8.2)

Figure 8.4. Data flow and transformations.
1
8.3.1 Auto Source Patch Selection
Given a destination patch D in P space, we warp it into R space as D′, and automatically search a
source patch S′ that matches the destination patch D′ in R space. Rather than exhaustively searching the
whole image for the best match, we explicitly use the knowledge of repetitive structures in the image.
Observing the fact that images of buildings generally have repetitive patterns in the horizontal and vertical
direction (Figure 8.1), and that the image is rectified in R space, we only search along the x and y directions
for the source patch. Since the image may not be perfectly rectified due to image noise and camera lens
distortions, we use a window with size w while searching in each direction.
Figure 8.5. a) The user drags a rectangle to indicate the destination patch. b) The system automatically searches for the source patch, and pastes it in the correct location. Note that the boundary does not match. c) Graph cut is used to clean up the boundary, with the cutting path shown in green. Note that the boundaries match very well. d) Final result.

A similarity function is necessary to match the source and destination patches. Generally speaking, there are two types of similarity functions, pixel-based and feature-based functions. We first tried to extract corner features and use them to match the source and destination patches. However, we find that this is not practical for our purposes. First, many false feature points are extracted due to the heavy occlusion of trees. Second, although the source and destination patches may look similar to humans, their feature points are actually quite different, and sometimes the structures and textures are very different too. Finally, since the number of feature points is limited, false matches may cause the result to be non-robust. Sand et al. [San04] use an EM-like feature based method to match videos taken from almost the same viewpoint. Their method can handle occlusions and illumination changes. However, since we do not know the exact location of the source patch, their method is not applicable here. Wu and Yu [Wu04] use curve features for texture
synthesis. This method may be applied to our application. However, trees will generate many curves and cause false matches. Even if we can segment the trees, this method may fail when the structures of the source and destination patches are different.

We opt to use a pixel based similarity function due to the aforementioned reasons. Since single pixel
matching is not robust, we compute the SSD of the whole source and destination patches to find the best
match. However, trees in the destination patch will jeopardize the matching if the whole patch is used to
compute the SSD. A direct solution for this is to segment the trees and only match the background part.
Exact segmentation of objects is a difficult task that often demands tedious user interactions, especially for
trees. We have implemented one state-of-the-art segmentation technique, the grabcut [Rot05]. However,
this still needs many user interactions for trees segmentation, which are the major objects we would like to
remove. Furthermore, segmentation is unnecessary for other applications such as texture and structure
replacement using our technique (Figure 8.13).
To find the best match without explicit segmentation and to reduce the effects of trees, we opt to match
the source and destination patches over the boundary (Figure 8.6). Specifically speaking, for each of the
four boundaries, we choose a region D with size w (w=1/6 of the image patch size for our implementation).
Then we match the four boundary regions between the source and destination patches using SSD (Equation
8.3).
SSD = Σ_(i,j)∈D ( I_src(i, j) − I_dst(i, j) )²        (8.3)
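A minimal sketch of the boundary-region SSD of Equation 8.3, combined with the axis-aligned search for a source patch described in Section 8.3.1, could look like the following; the band width, step size and names are illustrative assumptions rather than the tool's actual parameters.

    import numpy as np

    def boundary_ssd(src_patch, dst_patch, band):
        """SSD of Equation 8.3 computed only over the four boundary bands of width `band`."""
        a = np.asarray(src_patch, dtype=float)
        b = np.asarray(dst_patch, dtype=float)
        diff2 = (a - b) ** 2
        mask = np.zeros(diff2.shape[:2], dtype=bool)
        mask[:band, :] = mask[-band:, :] = True      # top and bottom bands
        mask[:, :band] = mask[:, -band:] = True      # left and right bands
        return diff2[mask].sum()

    def search_source_patch(image, dst_xy, patch_hw, step, max_offset, band):
        """Search along the x and y axes in the rectified image for the best source patch."""
        x0, y0 = dst_xy
        h, w = patch_hw
        dst_patch = image[y0:y0 + h, x0:x0 + w]
        best, best_xy = np.inf, None
        for off in range(-max_offset, max_offset + 1, step):
            for (sx, sy) in ((x0 + off, y0), (x0, y0 + off)):   # x direction, y direction
                if off == 0 or sx < 0 or sy < 0:
                    continue
                cand = image[sy:sy + h, sx:sx + w]
                if cand.shape != dst_patch.shape:
                    continue
                score = boundary_ssd(cand, dst_patch, band)
                if score < best:
                    best, best_xy = score, (sx, sy)
        return best_xy, best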
The four boundary regions matching method assumes that the occlusion is in the middle of the destination patch. This is also consistent with the later graph cut process, which finds a seamless cut in the boundary regions and replaces the middle of the destination patch. Our experiments show that the algorithm finds a very good match as long as the user selects enough background image (Figure 8.5).

Other than the automatic search for the source patch, we also allow the user to interactively select a source patch by dragging a rectangle. This is useful when the user wants a specific source patch to replace the destination patch, or for texture replacement applications.

Figure 8.6. Boundary regions for match and cut.

8.3.2 Auto Patch Match

The goal of auto match is to find the best location to paste the source patch and to resize it to match the destination patch. Finding the best location is particularly important when the user selects a source patch and only roughly indicates the destination location.

We use an affine transformation (Equation 8.4) to match the source and the destination patch. To find the best transformation, an exhaustive search technique is used. We search around the destination location within a predefined window size and a predefined scale range to minimize the SSD (Equation 8.3) in the four boundary regions. After finding the best affine transformation A, we then warp the source image and paste it into the destination location (Figure 8.5 (b)).

[ I_x,dst ; I_y,dst ] = [ s_x  0 ;  0  s_y ] [ I_x,src ; I_y,src ] + [ t_x ; t_y ]        (8.4)

Note that both the auto search and match processes are performed in R space. This has several advantages. First, it dramatically reduces the dimension of the search space. Without explicitly estimating the perspective effects, we would need to search for the source patch in P space. Since a homography has eight parameters, we would need to search in an 8D space, which is practically not possible. Searching in R space is a 2D problem and auto matching in R space is a 4D problem (3D if the scales of the x and y directions are the same). Furthermore, transforming with a homography is a non-linear problem, which is not robust using searching techniques or other gradient descent techniques, while searching in R space is a linear problem, hence very robust.
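The exhaustive affine match of Equation 8.4 can likewise be sketched as a small grid search over scale factors that minimizes the same boundary SSD (translation offsets can reuse the axis-aligned search shown earlier). The scale values and the nearest-neighbour resize helper are placeholders for illustration, not the system's implementation.

    import numpy as np

    def resize_nearest(patch, scale_x, scale_y):
        """Nearest-neighbour resize used only to keep this sketch dependency-free."""
        h, w = patch.shape[:2]
        ys = np.clip((np.arange(int(h * scale_y)) / scale_y).astype(int), 0, h - 1)
        xs = np.clip((np.arange(int(w * scale_x)) / scale_x).astype(int), 0, w - 1)
        return patch[np.ix_(ys, xs)]

    def best_affine_match(src_patch, dst_patch, boundary_ssd, band,
                          scales=(0.9, 1.0, 1.1)):
        """Pick the (s_x, s_y) of Equation 8.4 minimizing the boundary SSD against dst_patch."""
        h, w = dst_patch.shape[:2]
        best, best_params = np.inf, None
        for sx in scales:
            for sy in scales:
                cand = resize_nearest(src_patch, sx, sy)[:h, :w]
                if cand.shape[:2] != (h, w):
                    continue
                score = boundary_ssd(cand, dst_patch, band)
                if score < best:
                    best, best_params = score, (sx, sy)
        return best_params, best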
Although searching and matching in R space is a low-dimension problem, it is still slow when the destination patch size is large. We use the image pyramid [Hee95] technique to speed up the process. We first down-sample the original image and the destination image, and search/match using the low-resolution image. Then we pass the parameters from the low-resolution image to the high-resolution image, and refine the parameters. This dramatically speeds up the process, and enables the system to achieve interactive speed for image patches up to 500 by 500 pixels with an original image size of 3K by 2K.

8.3.3 Graph Cut

After auto matching, the result already looks better than most image repair techniques (Figure 8.5 (b)). However, the boundaries of the source and destination patches generally do not match. Efros [Efr01] uses dynamic programming to reduce the seams, while we use the graph cut technique to clean up the boundary.
Graph cut, also known as maximum-flow/minimum-cut, is a tool to minimize certain energy functions. Boykov [Boy99] discusses what type of energy function can be minimized using the graph cut technique, and we use their approach. The basic idea of graph cut is to find the cutting path with the minimum sum of edge weights that separates the source nodes from the destination nodes (Figure 8.7). So the key to using graph cut is to define an energy function (edge weight function). Suppose I(i, j) and I(i+1, j) are two adjacent pixels in the source and destination patches. We define the energy function between the two pixels using the color differences:

F = ‖I_src(i, j) − I_dst(i, j)‖ + ‖I_src(i+1, j) − I_dst(i+1, j)‖        (8.5)

Figure 8.7. Graph cut to find a cut path.
Kwatra et al. [Kwa03] use gradient to consider the effects of frequencies, which is computationally
more expensive. We find that the function in Equation 8.5 satisfies our purposes.
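To illustrate how the weights of Equation 8.5 drive the cut, the sketch below builds a small grid graph whose edge capacities follow that energy and runs a standard max-flow/min-cut solver. networkx is used purely for illustration; the dissertation's implementation follows [Boy99].

    import numpy as np
    import networkx as nx

    def seam_cut(src_overlap, dst_overlap):
        """Left-to-right min-cut through an overlap band using Equation 8.5 weights.

        src_overlap, dst_overlap: (H, W) gray-value arrays of the overlap (W >= 2).
        Returns the set of (row, col) pixels assigned to the source patch.
        """
        h, w = src_overlap.shape
        diff = np.abs(np.asarray(src_overlap, float) - np.asarray(dst_overlap, float))
        G = nx.DiGraph()

        def link(a, b, cap):
            # An undirected capacity is modeled as two directed arcs.
            G.add_edge(a, b, capacity=float(cap))
            G.add_edge(b, a, capacity=float(cap))

        for y in range(h):
            for x in range(w):
                if x + 1 < w:   # Equation 8.5 between horizontal neighbours
                    link((y, x), (y, x + 1), diff[y, x] + diff[y, x + 1])
                if y + 1 < h:   # and between vertical neighbours
                    link((y, x), (y + 1, x), diff[y, x] + diff[y + 1, x])
            # The left column must come from the source patch, the right column from
            # the destination; terminal arcs without a 'capacity' attribute are unbounded.
            G.add_edge("SRC", (y, 0))
            G.add_edge((y, w - 1), "DST")

        _, (source_side, _) = nx.minimum_cut(G, "SRC", "DST")
        return source_side - {"SRC"}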
After defining the energy function, we need to initialize the graph, i.e., to indicate the pixels coming directly from the source and destination patches. For image repair purposes, the pasted source image will be surrounded by the destination patch and the background image. A straightforward method is to indicate one interior pixel directly from the source image and all the boundary pixels directly from the destination image [Kwa03]. However, we find that this method tends to give biased results towards short path cuts, or cuts with fewer edges. In some extreme cases, only one pixel will be copied from the source image. To reduce this bias and enforce that at least some part of the source image will be copied, we divide the cutting problem into four sub-cut problems. Specifically speaking, we use the regions along the four boundaries (Figure 8.6), and find a cut in each region. For each of the boundary regions, two opposite sides are assigned to the source and destination patches, which guarantees an equal number of pixels coming directly from the source and destination. We then cut in each region, and take the union of the four cuts as the final cut. In doing so, we eliminate the bias and guarantee that the middle part of the source image will be copied.

We also allow the user to add some hard constraints indicating that pixels should come from the source image. The hard constraints are the four boundary constraints. This is applicable when the boundary of the destination patch is the boundary of the original image, or when it is covered by trees and the user would like to remove it by copying pixels from the source image. Figure 8.5 (c) shows the result with the left boundary constraint.
8.3.4 Adaptive Blending

The graph cut is powerful for finding the best path along the source and destination patches. However, seams will still be visible if a good path does not exist. A blending technique (or feathering technique) can be used to reduce the seams. However, blending along the cutting path with a constant window size may cause artifacts when the source and destination patches are different (Figure 8.8). We use an adaptive blending window to solve this problem.

For each point I on the cutting path, given a predefined blending window size D, we find the range Di within which the source and destination patches have consistent color. Then we filter the different window sizes along the cutting path with a median filter so that the transition is smooth. Finally we blend the source and destination patches using a weight that is inversely proportional to the distance to the cutting path. Figure 8.8 compares the results of constant and adaptive blending.

Figure 8.8. Blending with constant window size and adaptive blending.
8.4. Results and Discussion
We have applied our ICP tool for a number of applications, including removing occlusions, replacing
structures and textures, and generating new structures.
8.4.1 Remove Occlusions
The ICP tool is very powerful in removing occlusions in images of buildings. Our tool works extremely well for large holes or even when the structure is completely occluded. We have tested our system with dozens of images and high-resolution mosaic images (typically 3K by 2K). Mosaic images cover more building facades and more repetitive structures. The ICP tool also works well for a single image as long as there exist repetitive structures.

The typical process of occlusion removal using the ICP tool is as follows. The user selects the part that needs to be removed by dragging a rectangle. The system automatically computes the vanishing points, and finds a quadrilateral that aligns with the image lines. Then the system automatically finds a source image patch that best matches the destination patch and pastes it there. Finally, the graph cut algorithm is used to clean up the boundary and adaptive blending is used to further reduce seams. The user can also select a source patch and roughly indicate the destination location. The system then automatically finds the best location to paste the image, and computes the best cut to seamlessly blend the two patches. The output of our algorithm is both the repaired perspective image and the rectified image. The latter can be directly used for texture mapping (Figures 8.1 and 8.11).

For texture generation applications, we can first rectify the perspective image using vanishing points, and then use the ICP tool to remove occlusions in the rectified image only. Figure 8.12 shows the results. The processing time for each image depends on the number of occlusions, the size of the destination patch, and the size of the original image. For the results shown in Figures 8.11 and 8.12, it takes less than one minute to process each of the images. Figure 8.17 shows the rendering results with the textures mapped to the 3D models; note that the trees are removed.
8.4.2 Replace Structures and Textures
We can also use the ICP tool to replace the existing structures and textures. The user first indicates the structure that needs to be replaced using a rectangle. The system then automatically finds a similar structure and seamlessly replaces it. The user can also specify the source patch, and the system uses it to seamlessly replace the destination patch (Figure 8.13). The ICP tool allows the user to copy and paste image patches with very different perspective effects (Figure 8.13), which cannot be achieved using previous techniques. The user can even copy and paste textures between different buildings (Figure 8.14). However, as aforementioned, since there is an aspect ratio ambiguity, we need the user to interactively adjust the scale to match the source and destination patches.
8.4.3 Modify and Generate New Structures
One limitation of copying and pasting a whole patch is that the repaired result looks repetitive. To reduce
the tiling effects, we can make up new patches using the ICP tool. Rather than selecting a large patch that
contains a whole window, we select small patches that contain only a portion of the window. We then use
these patches to make a new window in a way similar to patch based texture synthesis [Efr01]. Figure 8.15
shows a new window generated from three existing windows. We can also modify the structures in the
image to generate some interesting results. Figure 8.16 shows two examples of extending a building into a
skyscraper.
8.4.4 Discussion and Comparison
The ICP tool is a powerful means that works very well for images of buildings where repetitive structures appear. Compared with commercial software, such as Photoshop, the ICP tool works automatically, saving hours of tedious manual work and achieving better results. Figure 8.9 shows that another commercial software package, Microsoft Digital Image Pro 10, took five minutes but failed to repair a well structured image (Figure 8.5 (a)), while our result is much faster (5 seconds) and better (Figure 8.5 (d)). Compared with previous example-based techniques, the ICP tool works well for images with strong perspective effects and patches with different perspective effects, where most previous work fails. Furthermore, since we explicitly use the high level knowledge of repetitive structures, our method works very well for large holes and can achieve visually pleasing effects even when the whole structure is completely occluded.

Figure 8.9. Result of Microsoft Digital Image Pro 10.

The limitation of our technique is that the result may look repetitive, which can be alleviated by copying small patches and generating new structures. However, for removing small objects, such as power lines and text, the image inpainting technique achieves better results than ours. As discussed in [Per04], the application of different image repair techniques depends on the patch size (Figure 8.10). Pixel-level image inpainting techniques [Ber00] work well for small holes and thin structures. Fragment [Dro03] and patch based methods [Per04] work well for relatively large holes, where the structure and texture information can be propagated from the boundary. Our method considers the global structure of the image, hence works well for highly structured images and holes where the structure is completely occluded. A more powerful image repair tool can be achieved by combining these different level-of-detail techniques.

Figure 8.10. Patch size in different image repair techniques (pixel-level inpainting, fragment-based, patch-based, and ICP, ordered by increasing patch size).
8.5. Conclusion
This chapter presents an intelligent copy and paste tool to remove occlusions in images with repetitive structures. We explicitly estimate the perspective effects in the editing image using vanishing points. To remove occlusions, the user simply drags a rectangle indicating the destination. Then, the system automatically finds a source patch and transfers it to the destination. Finally, these two patches are seamlessly blended together using the graph cut and adaptive blending techniques. The ICP tool works very well for images of architecture, especially for heavy occlusions and highly structured images.
Figure 8.11. Left: original image; middle: repaired image; right: rectified image with occlusion
removed.
Figure 8.12. Left: original image; middle: original image rectified; right: rectified image with occlusion
removed.
Figure 8.13. Structure replacement. Left: original image; middle: source image; right:
replacement result. Note the perspective differences in the source and destination patches.
Figure 8.14. Texture replacement between buildings with different perspective effects.
Figure 8.15. Generate new patches. Left three images: original windows. Right: newly generated window and its rectified version.
Figure 8.16. Extend buildings. a) Original image; b) Extended result; c) Original image; d) Trees removed; e) Building extended into a skyscraper.
Figure 8.17 Rendering results of a dorm area in USC with trees removed.
Chapter 9
Integrating Videos
9.1 Introduction
Static textures from ground images are useful in creating realistic imagery of a virtual environment; however, they are difficult to employ in applications that require rapid and dynamic updates of the environment. Videos, on the other hand, can be used as a texture resource to create dynamic textures that reflect the most up-to-date changes in the environment imagery.
This chapter presents a novel technique, called texture painting from video, to cope with the
aforementioned limitations of the static texture mapping and visualizations. Live video is used as texture
resource and mapped dynamically onto the 3D models of scenes to reflect the most recent changes of the
environments. The video streams can be acquired from stationary or moving cameras such as handheld
camcorder, and their projections onto the 3D model are achieved in real-time. Unlike the traditional texture
mapping in which each texture image is a priori associated with, and mapped onto, patches of the geometric
models, our approach dynamically creates the associations between the model and image as a result of
image projection during the rendering process. In this case, we can automate the texture mapping process.
As new images arrive, or as the scene changes, we simply update the camera pose and image information rather than repeating the time-consuming process of finding mapping transformations, which makes it possible to handle live video streams in real time.
9.2 Related Work
Texture synthesis is a popular technique for texture creation. Such an approach is able to take a sample texture and generate new images that, while not exactly like the original, will be perceived by human beings to be the same texture. The parametric model based approach uses a number of parameters to describe a variety of textures ([Hee95, Por00]). The non-parametric, or example based, methods generate textures by directly copying pixels from input textures [Efr99]. Recently, [Ash01] suggested an approach to synthesize textures using whole patches of input images. While the texture synthesis technique has been demonstrated successfully in certain applications to be a useful tool for texture generation, the results of the synthesized textures are not photo-realistic and lack texture details.

Texture painting is an alternative way to create textures. A number of interactive texture painting systems have been suggested. [Iga01] presented a 3D painting system that allows users to directly paint texture images on a 3D model without a predefined UV-mapping. [Ber94] employed the Haar wavelet decomposition of images for multi-resolution texture painting. [Per95] painted multi-scale procedural textures on 3D models. Most current texture painting systems are interactive, allowing users to easily design and edit textures to achieve desired effects. The results can be aesthetically pleasing, but it is hard to make them photo-realistic.
A straightforward way to produce realistic textures is to use real world images as texture resources. To
create a complete texture map covering the entire scene being textured, multiple images are used. [Ber01]
used high resolution images captured from multiple viewpoints to create high quality textures. [Roc99]
stitched and blended multiple textures for texture creation. [Ofe97] suggested a quadtree approach to represent multi-resolution textures extracted from image sequences. This method requires the user to mark the
texture area to be extracted, and then manually track the area for a short image sequence (less than 16
images).
Recently, several works have suggested using video clips as texture resources. [Sch00] proposed the idea of "video textures". From an input clip of limited length, they can generate an infinitely long image sequence by rearranging and blending the original sequence. [Soa01] presented the idea of dynamic textures, which are sequences of images with a certain stationary property in time. By learning a model from the input sequence, they synthesize new dynamic textures. Both of the above methods use textures in a non-traditional way to achieve the desired effects and goals of their applications. Their systems deal only with special textures of repeated patterns, such as sea waves, smoke plumes, etc. In our approach, however, we deal with general image textures, i.e., a multidimensional image that is mapped to a multidimensional space. Rather than synthesizing new textures, we directly use the original video captured from any real world scene to produce an accurate and realistic appearance of the environment.

9.3 Texture Storage

Our goal is to generate a photorealistic virtual environment with textures from real images rather than generating texture patterns or synthesizing textures on-the-fly while rendering. An efficient way to store textures for a large-scale environment is therefore important, especially when videos are used as a texture resource, which leads to infinite texture storage.

Given the fact that only limited scene geometry can be visible from a viewpoint, we propose the idea of a base texture buffer, which is a texture buffer associated with a model patch or a group of neighboring patches visible from the viewpoint. The base texture buffer is first initialized as a white texture, and then dynamically updated with static images (static textures, Chapter 7) or videos (dynamic textures).

9.3.1 Base-buffer allocation

Given the 3D models of a large-scale environment, we need to allocate base buffers for texture storage. The goal of base-buffer allocation is to find an optimal allocation that covers all the geometry of the environment and satisfies three criteria: minimum number of base buffers, minimum number of clipped polygons, and optimal texture quality.

The number of base buffers corresponds to the number of texture units in scene rendering using graphics hardware, and is also proportional to the size of texture memory. Current graphics cards have a limited number of texture units and memory, so limiting the number of base buffers is necessary.

Figure 9.1 Polygon clipping.
Since the number of clipped polygons will increase the total number of polygons and affect the rendering speed, we also want to minimize this number. Polygon clipping is necessary when a base buffer only covers part of a polygon. As shown in Figure 9.1, we need to clip the triangle in order to compute UV texture coordinates for the shown base buffer. Assigning base buffers that cover entire polygons will avoid the polygon-clipping problem.

The last issue with base-buffer allocation is the texture quality problem. Texture quality depends on the video resolution, the angle of the polygon relative to the camera view direction, and the angle of the polygon relative to the base buffer. A good allocation for a base buffer is one where textures will not be compressed or stretched during the texture mapping process. This means the base buffer should be oriented parallel to most of its associated polygons. Simple algorithms that combine nearby polygons into one base buffer often lead to low texture quality (Figure 9.2 left), since polygon orientation also has to be taken into account.

Figure 9.2 Texture quality. Left: low texture quality due to slanted angle of polygon relative to base buffer; right: high texture quality when the base buffer is parallel to the polygon plane.
9.3.2 Previous Work
The problem of selecting optimal base buffers that cover the whole environment is analogous to the problem of selecting a minimal number of camera positions that cover the entire surface of a given 3D model. The latter problem has been shown to be NP-complete. When the environment model has a large number of polygons, the exact solution for optimal base-buffer allocation becomes prohibitive. Matsushita [Mat99] solves the camera-positioning problem using a heuristic that gives priority to viewpoints covering large model regions. This algorithm works well for an object, but has disadvantages in an environment with multiple buildings, such as redundant texture storage and possible poor texture quality.
9.3.3 Our Solution
We summarize the three criteria of base-buffer allocation using a cost function shown in Equation 9.1, where f(bn) is a function of the number of buffers (bn), g(pn) is a function of the number of clipped polygons (pn), and h is a function of the texture quality. The design of the functions f(bn) and g(pn) is straightforward; however, the measurement of texture quality is more complex. Our measure is the ratio of the model surface area to the texture image area, as given in Equation 9.2. We aim to find a buffer allocation method that minimizes the cost of Equation 9.1.

F(buffer) = f(bn) + g(pn) + h(texture_quality)        (9.1)
h(texture_quality) = S(polygon) / S(texture)          (9.2)

Since the exact allocation solution is prohibitive, we attack this problem by leveraging the characteristics of the building models in the environment to find a near-optimal solution. Noting that buildings usually have four major planar sides, we opt for using a box with four rectangular texture-maps for each building. To allocate the four rectangular buffers, we first divide the scene model into separate buildings. Then we find the four major directions based on building surface normals and allocate a base buffer for each direction (Figure 9.3). To find the relationship between the model and base buffer, we project the vertices onto the base buffer with an orthogonal projection to compute the UV coordinate for each model vertex.

Figure 9.3 Base-buffer allocation.
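As a rough illustration of this allocation strategy, the sketch below groups near-vertical building faces by the horizontal direction of their normals and evaluates the cost terms of Equations 9.1 and 9.2; the function names, weights, and data layout are assumptions, not the thesis code.

import numpy as np

def allocate_four_buffers(face_normals):
    # Assign each building face to one of four base buffers according to the
    # horizontal direction (+X, +Y, -X, -Y) its outward normal is closest to.
    dirs = np.array([[1, 0], [0, 1], [-1, 0], [0, -1]], dtype=float)
    n_xy = face_normals[:, :2]
    n_xy = n_xy / (np.linalg.norm(n_xy, axis=1, keepdims=True) + 1e-9)
    return np.argmax(n_xy @ dirs.T, axis=1)   # buffer index 0..3 per face

def buffer_cost(num_buffers, num_clipped, polygon_area, texture_area,
                w_buffers=1.0, w_clipped=1.0):
    # F(buffer) = f(bn) + g(pn) + h(texture_quality), Equation 9.1, with
    # simple linear f and g and h = S(polygon) / S(texture), Equation 9.2.
    h = polygon_area / texture_area
    return w_buffers * num_buffers + w_clipped * num_clipped + h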
Four rectangular buffers work well for our campus model. The base buffers are automatically allocated, yielding a near-optimal solution according to the energy function in Equation 9.1. The campus model has 100 buildings and 40,000 polygons, covered by 400 base buffers. The texture quality (Figure 9.2 right) is maximized for most polygons because the buffers are parallel to most of the polygon planes. Since each base buffer covers one whole side of a building, polygon clipping is avoided.
9.4 Texture Painting from Video
Using live video as the texture resource can offer us many benefits in reproducing a real scene. However, we also encounter several technical barriers that need to be overcome. First, the video streams need to be acquired continually and updated, which may lead to infinite texture storage. A straightforward approach of using a certain amount of texture memory is unable to keep old texture data from where the projection was a few moments ago. Such texture retention requires approaches that are able to persist the projection of each new video frame onto a surface area.

Second, we want to paint the dynamic videos onto the surface of the 3D model. Only if the camera positions and orientations are known can these data be projected onto the scene model correctly; therefore, highly precise tracking of the camera pose and alignment between image frames is required.

Third, due to the lack of models for dynamic objects, the moving foreground objects need to be segmented from the input video so that only the background textures are projected onto the surfaces of the scene.

Figure 9.4 Overview of the texture painting from video system.
Proposed Approach
We present our approaches towards the system requirements essential to texture painting from video. We propose novel methodologies for rapid creation of dynamic textures from live video streams and their data retention, storage management, and texture refinement. We also implement a prototype of a real-time 3D video painting system based on the methodologies we proposed.

Figure 9.4 illustrates the main structure of our texture painting from video system. As stated above, the main challenge of using live video as texture-maps is how to effectively handle the infinite video streams with a certain amount of texture buffers so that the rendering algorithm can persist the texture projection for each new frame onto a surface area. Given the fact that only limited scene geometry can be visible from a viewpoint, we propose the idea of a "base texture buffer", which is a texture buffer associated with a model patch or a group of neighboring patches visible from the viewpoint (Figure 9.5). The base texture buffer is first initialized as a white texture, and then dynamically updated with the newly arriving frames.
To update the base texture buffer for each group of visible patches, we transform each new frame to the base texture buffer. Let the projection matrix of the base texture buffer be P_t and the projection matrix of the new incoming frame at a viewpoint be P_v. For every new frame, the transformation between the new frame and the base texture buffer is

P = P_t * P_v^-1        (9.3)

By using Equation 9.3, we are able to dynamically warp every new frame from a different viewpoint to the common base texture buffer, and project the updated content onto the visible surface of the scene. In this way, we overcome the problem of infinite texture storage and also avoid the time consuming process of polygon clipping for every video frame.

To achieve accurate image alignment, the 3D model associated with the base texture buffer is generated and refined as described in Chapters 4 and 5, and we recover the camera pose using a robust tracking approach proposed in [Neu03]. Then we refine the recovered alignments in the 2D image domain to achieve seamless texture images. Several other core steps, including selective texture painting, base buffer selection, and occlusion detection, are also suggested and will be detailed in the following sections.
9.5 Approach Details

9.5.1 Model Based Image Warping

A key part of texture painting from video is to dynamically update the base texture buffer, which is based on a process of model based image warping. First, we select a base texture buffer. The pose of the base buffer relative to the 3D model of the scene is computed. The correspondence of each pixel in the base buffer to the 3D model is computed using Equation 9.4, where I_b denotes the pixel in the base buffer, P_b is the projection matrix of the base buffer, and M is the corresponding point on the 3D model. Next, when a new video frame arrives, if the model correspondence to the base buffer is visible from the current camera position, we update the base buffer with the current new frame. This is done by a model based image warping operation.

M = P_b^-1 I_b        (9.4)
I_v = P_v M           (9.5)
I_b = I_v             (9.6)

Figure 9.5 Model based image warping.
Figure 9.6 Result of the model based image warping. Left: original image; right: image warped to base buffer.

As indicated in Figure 9.5, for each pixel I_b in the base texture, we can find its correspondence M on the 3D model. Given the tracked camera pose and the projection matrix denoted P_v, we project the 3D point M back to the image plane to find its corresponding pixel I_v using Equation 9.5. We then update its color information in the base texture buffer using Equation 9.6. This warping process is repeated for every pixel contained in the base texture buffer.
The 3D model based approach is flexible, allowing the camera to move freely in any 3D environment. It requires, however, highly precise camera tracking, which is usually hard to achieve, especially in an outdoor environment. We compensate for the tracking errors by employing a 2D image registration approach in Section 9.5.5. Figure 9.6 illustrates the result of the model based warping approach.
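A minimal sketch of this per-pixel warp is shown below, assuming the 3D point M for a base-buffer pixel is already available (from the model or a depth lookup) and that the camera is described by a 3x4 projection matrix; nearest-neighbour sampling is used for brevity and all names are illustrative.

import numpy as np

def project(P, M):
    # Project a 3D point M (length 3) with a 3x4 projection matrix P and
    # return pixel coordinates (u, v), in the spirit of Equation 9.5.
    m = P @ np.append(M, 1.0)
    return m[0] / m[2], m[1] / m[2]

def update_base_pixel(base_buf, video_frame, xb, yb, M, P_v):
    # Equations 9.4-9.6: for base-buffer pixel (xb, yb) with 3D point M,
    # find the corresponding video pixel I_v and copy its color into I_b.
    u, v = project(P_v, M)
    xi, yi = int(round(u)), int(round(v))
    h, w = video_frame.shape[:2]
    if 0 <= xi < w and 0 <= yi < h:
        base_buf[yb, xb] = video_frame[yi, xi]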
9.5.2 Improve Warped Image Quality

Direct back-warping of the image to the base buffer may result in aliasing. To improve the final texture-map quality, we use bilinear interpolation for anti-aliasing. As indicated in Figure 9.7, the coordinates of each pixel I(xb, yb) in the base texture buffer are treated as real numbers. Using Equations (9.4-9.6), we can find its corresponding pixel in the video frame, the coordinates of which, I(xv, yv), are also real numbers. Usually, I(xv, yv) will not fall onto an integer grid in the image. We therefore interpolate the color information using a four-neighbor bilinear interpolation. Figure 9.7 shows the result of applying the approach to Figure 9.6, which clearly improves the image quality.

Figure 9.7 Using bilinear interpolation to improve warped image quality.

9.5.3 Occlusion Detection

Under some circumstances, although the 3D model is visible from the base buffer, it may be occluded from the current camera viewpoint. The model based warping and texturing will then project part of the frame onto the occluded model areas (Figure 9.8 left). This occlusion problem is solved using depth maps. From each camera viewpoint, we render the 3D model to obtain an estimated depth map. When doing the image warping, we first find the corresponding 3D point for an image point being warped. We then project it back to the image plane, and compare its depth value with the estimated depth map. Finally, we keep the pixel projection only if its depth value is less than the corresponding depth value in the depth map. The result is shown in Figure 9.8 right.

Figure 9.8 Occlusion processing. Left: texture painting before depth test. Right: after depth test. Note that the occluded texture part in the ellipse is removed.
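The two refinements above can be sketched as follows (an illustrative NumPy version, not the thesis code; the tolerance value is an assumption):

import numpy as np

def bilinear_sample(image, x, y):
    # Four-neighbour bilinear interpolation at the real-valued location (x, y)
    # as in Section 9.5.2; image is H x W x C.
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, image.shape[1] - 1), min(y0 + 1, image.shape[0] - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * image[y0, x0] + fx * image[y0, x1]
    bottom = (1 - fx) * image[y1, x0] + fx * image[y1, x1]
    return (1 - fy) * top + fy * bottom

def passes_depth_test(point_depth, x, y, depth_map, eps=1e-3):
    # Section 9.5.3: keep the projection only if the warped point is not
    # farther than the surface recorded in the rendered depth map; the small
    # tolerance accounts for the point lying exactly on that surface.
    return point_depth <= depth_map[int(round(y)), int(round(x))] + eps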
9.5.4 Selective Texture Painting
Texture painting from video has the advantage of dynamic texture updating to capture the most recent environment changes. However, if we do not select the content to be textured, the system will project everything in the video sequence onto the 3D environment, and consequently give an undesired result. Figure 9.9 (left) shows a case where moving objects are painted onto the 3D model as part of the background textures.

We can view that there are three types of texture information in a video sequence: background textures, foreground textures such as trees, and dynamic objects such as moving people or vehicles. The background textures are the only part we are interested in and intend to process, since they have a corresponding 3D model to be textured.

Figure 9.9 Selective texture painting. Left: texture painting without background learning. Notice that the moving yellow sphere is painted as part of the background texture. Right: after background learning. Only the static scene is painted as background texture.

By employing a background learning approach, we segment and remove the undesired objects from the input video. Given a number of training frames, we first learn the images to estimate a background model based on the median technique. The general median method will only work for static cameras. To deal with moving cameras, we first warp each new frame onto the base texture buffer (which is static relative to the 3D model), then we average the warped frames using the median filter over the base texture buffer. After that, we segment the objects from the background using an image subtraction approach. Figure 9.9 (right) shows the result after the background learning. Another option for selective texture painting is to track and model the dynamic objects, and paint textures on the dynamic objects (Chapter 9).
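A compact sketch of this background learning step is shown below, assuming the training frames have already been warped onto the (static) base texture buffer; the threshold is illustrative.

import numpy as np

def median_background(warped_frames):
    # Per-pixel temporal median over the warped training frames; because the
    # frames are in base-buffer space, a moving camera behaves like a static one.
    return np.median(np.stack(warped_frames, axis=0), axis=0)

def foreground_mask(warped_frame, background, thresh=25.0):
    # Segment moving objects by subtracting the learned background.
    diff = np.abs(warped_frame.astype(float) - background.astype(float))
    if diff.ndim == 3:
        diff = diff.mean(axis=-1)
    return diff > thresh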
9.5.5 Refine Texture Alignment
As mentioned above, inaccurate camera pose tracking will result in misalignments between the
textured images as shown in Figure 9.10 (left). The alignment needs to be refined to improve the
visualization value. Our approach to this problem is to perform the refinement in the 2D image domain, i.e., we
first register the video frames based on the camera tracking data and 3D model, and then re-align the
registered image with the base buffer using a 2D image registration approach.
Figure 9.10 Refining texture alignment. Left: texture painting using only 3D model
based warping, which results in significant misalignment due to inaccurate camera pose.
Right: texture painting after refining the texture alignment.
Motion Model

An affine model is used for estimating the transformation parameters between the warped video frame and the base texture buffer:

[I_xb]   [a11  a12] [I_xv]   [t_x]
[I_yb] = [a21  a22] [I_yv] + [t_y]        (9.7)

Feature Matching

Corner features of the scene are used as matching primitives to find the correspondences between the video frame and the base texture buffer. The Harris algorithm is employed to detect image corners, and the SSD (Sum of Squared Differences) approach is used for feature matching. Since the images have already been aligned based on the camera tracking data, the SSD matching space is greatly reduced.

Parameter Optimization

While three pairs of correspondences are sufficient to compute a unique solution for the affine transformation, we use all matching corners to guarantee accuracy. A least squares approach is used to estimate the optimal parameters.

Once the affine parameters are estimated, the 3D-warped image is warped again using the affine transformation to the base texture buffer. Figure 9.10 (right) shows the refined result using this approach.
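A minimal sketch of this estimation step is shown below, assuming the matched corner coordinates are available as N x 2 arrays; it is an illustrative least-squares solution of Equation 9.7, not the thesis code.

import numpy as np

def estimate_affine(src_pts, dst_pts):
    # Solve dst ~= A @ src + t in the least-squares sense from N >= 3 matches.
    n = src_pts.shape[0]
    X = np.hstack([src_pts, np.ones((n, 1))])          # rows [x  y  1]
    params, _, _, _ = np.linalg.lstsq(X, dst_pts, rcond=None)
    A = params[:2].T                                    # 2x2 linear part [a11 a12; a21 a22]
    t = params[2]                                       # translation (t_x, t_y)
    return A, t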
9.6 Implementation Details

Given the pose of the video, obtained either with the pose recovery method for each frame as described in Chapter 7 or with a hardware tracking technique such as GPS and inertial sensors, the key issue in generating dynamic textures is to update the base buffer in real time. Updating the textures for all base buffers at the same time is infeasible for a real-time system, so we dynamically detect a visible base buffer (the active base buffer), update the buffer using a software technique, and then accelerate the process with the features of graphics hardware.
9.6.1 Active buffer selection

A two-step strategy is employed to find the current active base buffer (Figure 9.11). First, we find all the buildings visible from the current camera viewpoint. To speed up this process, bounding boxes are computed for each building. Then the bounding boxes are intersected with the camera frustum to determine their visibility. Among all the visible buildings, we find the closest one to the current viewpoint. In a second step, we compute the angle between the buffer normal and the camera optical axis for all four base buffers of the closest building, and set the buffer with the minimum angle as the current active buffer. If all the visible buffers in the closest building exceed an angle threshold, the next closest building is considered. A weighted average of the normalized building distance and buffer view angle is used to select the active buffer in order to ensure the texture quality (Equation 9.8):

w = α * Dist + β * Angle        (9.8)

where Dist is the minimum distance from the polygons of a building to the current viewpoint, Angle is the minimum angle between the buffers' normals and the view direction, and α and β are the weights. Dist and Angle are normalized respectively using the minimum distance and angle among all the visible buildings.

Figure 9.11 Detect active buffer. The yellow ray identifies the closest visible building. The red edge of the rectangle above the building denotes the current active buffer; the three yellow edges are the other three buffers of the active building.

9.6.2 Software Implementation

We update the base buffers pixel by pixel with the texture painting from video technique. As aforementioned, the key process in generating dynamic textures is a 3D model-based image warping process (Figure 9.5); therefore we need to know the corresponding 3D model point for every pixel in each base buffer to do the image warping. In order to continuously update textures in real time, we pre-compute the depth map for each base buffer, and use it as a lookup table to find the 3D point for every pixel in the buffer.

As a video frame is processed (Figure 9.5), we first find the active base buffer according to the aforementioned algorithm. For each pixel I_b in the active buffer, we find the corresponding 3D point M based on the depth map. If the point M passes the occlusion test (i.e., it is visible from the camera viewpoint), we project it into the current camera image plane to find the corresponding pixel I_v. We then update the base buffer pixel with the pixel I_v from the video frame. Textures are segmented if moving objects are present, and the warped image is registered with the base buffer if the 3D pose is inaccurate. The system can reach up to 12 fps for a video size of 256 x 256 (Table 9.1), given accurate pose.
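The buffer-selection rule of Equation 9.8 can be sketched as below; the candidate list (built after the bounding-box and frustum tests described above) and the equal weights are assumptions for the example.

import numpy as np

def select_active_buffer(candidates, alpha=0.5, beta=0.5):
    # candidates: list of (building_distance, buffer_view_angle) pairs for the
    # buffers of visible buildings.  Distances and angles are normalized by
    # their minima among the visible buffers, then combined as in Equation 9.8;
    # the buffer with the smallest score is chosen as the active buffer.
    dists = np.array([c[0] for c in candidates], dtype=float)
    angles = np.array([c[1] for c in candidates], dtype=float)
    scores = alpha * dists / dists.min() + beta * angles / max(angles.min(), 1e-6)
    return int(np.argmin(scores))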
9.6.3 Acceleration with graphics hardware

The software implementation achieves real time for a small-sized texture image, such as 256 x 256. However, we need to pre-compute and store a depth map for each base buffer, which costs additional memory. For our university campus with 100 buildings, each building having four buffers with a depth map of 512 x 512, more than 400 MB are required to store the depth maps.

Generating high quality dynamic textures in real time is hard to achieve with the software implementation, and the storage of depth maps is a heavy burden on the system for a large-scale environment. We need a more efficient implementation that uses graphics hardware features to improve the performance and reduce the memory requirement.

Compared to 2D image warping, which can be easily implemented using GPU programming, image warping based on a 3D model is more complex to implement using graphics hardware. If the 3D model were just a plane, the relationship between the base buffer and the camera image plane could be described using a homography. However, image warping based on homographies is impractical for our purpose because most building geometry is much more complex than a few planes. We cannot afford to segment the video frames and warp them according to different homographies in a time-critical system.
To fully benefit from graphics hardware, we use the technique of projective texture mapping [Mar78]. We first set the current active buffer as the rendering buffer, then render the current scene using projective texture mapping; we solve the visibility problem using a hardware register combination based on a depth map. In order to update the buffer with new textures and keep the old textures, we use an alpha channel to mask out all the pixels outside the camera frustum and those pixels without textures. The process is summarized as the following pseudo code:
• Dynamically detect the active base buffer.
• Render a depth map from the camera viewpoint. (The depth map here is used to solve the visibility problem, not to find the correspondence between model points and pixels in the base buffer.)
• Load the image of the current base buffer (with textures already painted) into the rendering buffer.
• Render the scene using projective texture mapping with the current video frame as the textures, and solve the visibility problem based on a depth map from the camera's viewpoint. Set the alpha value of the pixels without textures, or invisible pixels, to zero.
• Enable the alpha test; only render pixels with an alpha value greater than zero into the base buffer.
• Read the rendering buffer, and set it as the texture for the models corresponding to the active base buffer.

We gain many benefits from the graphics hardware implementation. The frame rate increases dramatically, and we can achieve real time for high-resolution textures of 1024 x 1024. Using graphics hardware features, depth map storage for base buffers is not necessary, which saves a great deal of memory. Table 9.1 compares the frame rate at different sensor resolutions and the memory requirements (for a fixed buffer resolution of 512 x 512 and 100 buildings with 40,000 polygons) for the software implementation and graphics hardware acceleration on a Dell machine with a 3.4 GHz CPU and an NVIDIA Quadro 980 graphics card.
Table 9.1 Frame rate and memory requirements of software implementation and graphics hardware acceleration (frame rate at different sensor resolutions).

Method      256       512       1024      Memory
Software    12 fps    4 fps     1 fps     700 M
Hardware    30 fps    20 fps    10 fps    300 M

9.7 Results

Simulation Data

We first tested the presented system using simulation data. Figure 9.12 shows the results from our simulation experiment. A 3D cube model covered by a real image is used to mimic a 3D environment (Figure 9.12(a)), and a synthetic camera (shown with a red frustum) moving freely within the environment is used to "capture" the scene. The image captured from a viewpoint by the camera is shown in Figure 9.12(b), which is used to simulate the input video. We then applied the proposed approach to this scenario. Figure 9.12(e) is the input image warped to the base buffer, and Figure 9.12(d) shows the images painted onto the 3D model. Since the synthetic camera pose is perfect in this simulation experiment, no further 2D image registration is needed. The whole system runs in real time, achieving ~30 fps on a 1.1 GHz DELL workstation. It is worth mentioning that the camera motion is completely arbitrary, and the captured images are painted persistently onto the correct surface areas, as shown in Figure 9.12(c) and Figure 9.12(f).

The system is further tested on a large-scale environment, the USC campus, with synthetic textures and camera poses. The campus model was generated using the techniques described in Chapters 4 and 5, and we texture-mapped it with some brick textures to create a simulation environment (Figure 9.13 left image). A simulated camera (shown with a red frustum) moved freely to capture the video, which was then painted onto the 3D model using the presented approach. In the simulation data, again, since the camera pose was perfect, no further 2D image registration was needed. The whole system was real time using a 3.4 GHz DELL machine (Table 9.1). Actually, we found that most of the time was spent on rendering the whole
scene with texture mapping rather than painting. When we only rendered the current building being painted, the frame rate reached up to 60 fps, so we could paint multiple video streams at the same time in real time. Figure 9.13 middle image shows the result of tens of buildings painted in several minutes.

Unlike traditional 2D image registration, which requires image overlap, 3D model-based image registration requires no constraints on camera motion. With some specific camera motion paths, we designed some special shapes using the given texture. Figure 9.13 right image shows the PHE letters painted onto the surface of buildings using this technique. Other irregular shapes were also painted, which is hard to achieve using the traditional texture mapping method. This feature also makes the video painting technique a potential method for special effects and artistic design.
Real Data
We also used real video streams captured from hand-held cameras to further test the algorithms. While
painting textures that cover a whole simulated environment is easy (we accomplished it in less than half an
hour), capturing video data that covers a whole real environment (with the same models) is much harder
even with hand-held cameras. More research is needed to explore techniques for camera orienting and path
design to efficiently capture data, which is out of the scope of this research.
The strength of our system is automatic texture mapping and real time texture updating. To show this, we captured three video streams in a university campus with different motions. The motion of the first video stream is mainly rotation, the second one translation, and the third one both translation and rotation. A portable tracking and video system [Neu03] was used to collect camera tracking data and video streams. The video frames were first aligned to a temporary buffer based on the tracked camera pose; then the 2D affine transformation between the warped image and base frame was computed to refine the alignment; and finally the warped image was re-warped back to the base buffer using the computed 2D transformation. Figure 9.14 shows the result. The whole system is fully automatic and real time (29 fps thanks to the hardware implementation).
9.8 Conclusion
Texture is a crucial element in today's graphics oriented applications. Traditional static texture-maps are limited in capturing a dynamic and up-to-date picture of the environment. This chapter presents a new technique of texture painting from video. By employing live video as a texture resource, we are not only able to create an accurate and photo-realistic appearance of the rendered scene, but can also support dynamic spatio-temporal updates in the structure of the texture model, database, and rendering system. We present our approach towards the system requirements and experimental results for both simulation and real datasets.
Figure 9.12 Texture painting with simulation dataset. A cube model covered by a real image is used as a simulated environment (a). A synthetic camera moving inside the environment is shown as a red frustum. The image viewed from the camera's viewpoint, which is used as a simulation of the input video (b). The video frame is warped based on the 3D model onto the base buffer (e), then painted as texture onto the same 3D model (d). (c) and (f) show the painting results of simulated data.
Figure 9.13. Simulated data for a large-scale environment. Left: simulation environment of the USC campus.
Middle: tens of buildings painted in several minutes. Right: user-painted letters on building surfaces in real-
time.
Figure 9.14 Dynamic textures generated from three real video streams. Left column: selected frames
from the three video streams. Right column: results painted in real time.
Chapter 10
Applications
The photorealistic representation of large-scale environments has many applications, including
applications in government agencies for development planning, academic studies in climate, air quality, fire
propagation, and public safety; and commercial applications such as entertainment and virtual tours. This
chapter shows one such application in a system called Augmented Virtual Environment (AVE).
10.1 Introduction
Three-dimensional Virtual Environments (VEs) are used for engineering, training simulations,
entertainment, and planning. In many of these cases, the value of the VE is increased with dynamic
geometry and appearance changes that model moving people and vehicles within the scene. The goal of
the AVE system is to create an enhanced VE model that augments current static VE models with
representations of the dynamic activities in the corresponding real world scene. The inclusion of dynamic
data makes such AVE models suitable for complex event visualization and comprehension in command
and control and surveillance applications. Our approach is to derive the dynamic aspects of the model from
real-time video imagery (or other sensor data) gathered from multiple and possibly moving sources in the
scene.
Figure 10.1 depicts the main components of the AVE system: (1) data acquisition to collect geometry from range and image sensors; (2) model reconstruction to produce static 3D surface models; (3) model refinement to segment structures and extract dominant and newly observed scene features; (4) sensor motion tracking to provide sensor pose and motion data for registration and data fusion; (5) object detection and tracking to facilitate the modeling of dynamic scene elements; and (6) data fusion to combine all the models, images, and video with annotation for a coherent visualization that supports scene comprehension and dynamic scene analysis.

Figure 10.1 AVE system components.

The contribution of this thesis to the AVE system is mainly in four parts: static scene modeling, video projection, dynamic object visualization, and integration and 3D visualization.
10.2 Static Scene Modeling
An accurate model of the scene is essential for fusing image data from various viewpoints to resolve
occlusions and create realistic visualizations. For efficiency and visualization context, we create a static
model off-line so that only the dynamic scene elements have to be modeled at run-time and the entire scene
model is visible regardless of sensor activity and coverage.
As described in Chapter 4, we employ a LiDAR system in an aircraft to quickly collect a 3D point cloud for the University Park area with an accuracy of centimeters in height and sub-meter in ground position. The model is refined using the primitive based modeling system; part of the models are detailed with high-resolution aerial images, and ground view images are used to improve the visual quality. We
modeled the entire University Park area including the USC campus, LA Natural History Museum, Science
Museum complex, LA Coliseum, and Sports Arena (Figure 10.2).
10.3 Video Projection
The AVE system produces visualizations such as that shown in Figure 10.3. This example has three
tracked cameras whose viewing frustums are depicted by the wire-frames. (A static texture is projected
Figure 10.2 University Park models
Figure 10.3 An AVE system screen snapshot showing
three video projections within a campus area.
only onto the ground terrain.) The images from each camera are projected onto the scene model,
effectively inverting the camera imaging process. In this example, the cameras view portions of the
buildings and surrounding grounds. Note that the building textures visible in Figure 10.3 arise only from
projected video textures. Users observe the dynamic movement of cars and people in the projected
textures. Additional cameras could be placed in the scene to increase the area over which dynamic events
are visible.
10.4 Dynamic Object Visualization
The AVE system analyzes video imagery to segment and track moving objects in the scene. The
segmented regions are used to determine an approximate model that is dynamically positioned to capture
the projection of the segmented imagery, thereby creating a pseudo-3D model that enhances the
visualization of moving objects. The dynamic object analysis approach is a background subtraction
detection method followed by a pseudo-tracking algorithm [Neu03]. The outputs of the detection and
tracking system are the four-corner coordinates bounding the moving object regions in the 2D image plane. Figure 10.4 illustrates results of applying the approach to track moving vehicles and people around the USC campus.
Figure 10.4 Examples of tracked vehicles and people
Figure 10.5 Parameters for position, orientation, and size are derived from a tracked object to define a dynamic model polygon.

We consider the case of ground-level cameras and use the observation that tracked objects (people and vehicles) rest on the ground. The camera images in Figure 10.4 show examples of tracked objects we wish to model. Figure 10.5 illustrates how a dynamic polygon model is constructed for such cases. Three parameters define the dynamic model polygon: position, orientation and size. Point A is the optical center of the camera, and the red box in the image plane is the bounding box of the tracked moving object. Point B is the mid-point of the bottom edge of the bounding box. Using points A and B, we cast a ray AB (or vector v) to intersect with the ground, and label the intersection point C. With the assumption that moving objects rest on the ground, point C defines the 3D position of the model polygon. The orientation of the model surface is denoted by vector n, which is in the same vertical plane as v, and n is always perpendicular to the scene model up-vector. We then project two corners of the bounding box onto the model plane to determine the size of the model polygon.

As shown in Figure 10.6, a video sensor captures the dynamic movements of a person or a car in the scene. The video cameras are near ground level, so their projections of a person or a car appear distorted without dynamic 3D scene models.
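For illustration, the placement of point C can be computed with a simple ray-plane intersection, assuming a flat ground plane at a known height; the names and the flat-ground assumption are for the example only.

import numpy as np

def dynamic_polygon_position(cam_center, ray_dir, ground_z=0.0):
    # Cast the ray from the optical center A through the bottom mid-point B of
    # the tracked bounding box (direction v = ray_dir) and intersect it with
    # the ground plane z = ground_z; the hit point is C (Figure 10.5).
    if abs(ray_dir[2]) < 1e-9:
        return None                      # ray parallel to the ground plane
    t = (ground_z - cam_center[2]) / ray_dir[2]
    if t <= 0:
        return None                      # ground is behind the camera
    return np.asarray(cam_center, dtype=float) + t * np.asarray(ray_dir, dtype=float)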
Figure 10.6 Image projection of a person (left) and a car (right) without dynamic models produces distorted presentations.
Figure 10.7 Dynamic models greatly improve the
comprehension of video textures projected onto the
model
The person and car appear to be "smeared" over the road and part of the building when viewed from the raised viewpoints in Figure 10.6. To reduce these distortions in the AVE visualization, we add dynamic models in Figure 10.7. The paths of the walking person and the moving car are depicted by a red line to show their current positions and orientations in the 3D world. The video texture is projected onto the model as before and the visualization viewpoints are the same as in Figure 10.6. Users can freely navigate around the complete model.
10.5 Dynamic Fusion and Imagery Projection
Traditional texture maps require that portions of each texture image are a-priori associated with, and
mapped onto, patches of the geometric model(s) before visualization can begin. In contrast to this fixed
image-to-model association, the AVE system dynamically associates texture images with the sensor and its
pose within the model. The mapping between the model surfaces and imagery is computed dynamically as
a result of texture projection during the rendering process. Changing the sensor pose automatically changes
the mapping function. The projection process is implemented using projective texture mapping technique
[Mar78] and the visibility problem is solved using hardware shadow maps (similar to the algorithm
described in Chapter 8 for acceleration with graphics hardware).
The current AVE implementation achieves real time (~25 Hz) visualizations on a 2.2 GHz Pentium-4 workstation, supporting up to ten live fire-wire video streams and high-resolution aerial photograph projections onto our entire USC campus model containing over 200 building models embedded in LiDAR terrain and vegetation models. In addition to the video and model geometry, annotation can be added to the scene by manual or automatic analysis.

Figure 10.8 shows an AVE quad-window presentation. The top-left window is the user's browsing window. It shows a user-controlled view of the full 3D geometric model and all camera data. Wireframe frustums are added to illustrate the camera positions and poses in the scene. Users typically fly around the scene using this window and select regions (or sensors) of interest. The other three windows display three selected camera viewpoints. These views can be locked to the selected camera (sensor) pose, making the images very similar to those obtained by the cameras. The four windows are simultaneous views and facilitate a user's interactive control during visualization sessions.
Our display system consists of an 8x10 foot screen, back-projected by a sequential-frame stereo video-
projector. A 3rdTech ceiling tracker is used to couple the rendering viewpoint to user’s head position. A
hand-tracker also facilitates mouse-like interactions. The overall system provides the user with a high
performance AVE visualization environment [Neu03].
We presented our methodologies and novel prototype of an augmented virtual environment (AVE) that
supports dynamic fusion of imagery with 3D models. The core contributions of this thesis to the AVE
system are model construction, video projection, dynamic model visualization, and dynamic fusion and 3D
visualization.
Figure 10.8 Quad-window AVE display
Chapter 11
Conclusions
A wealth of datasets from different sensors exists for environment representation. The key observation of
this thesis is that the different datasets are complementary, hence fusing information from these
complementary datasets reduces errors in processing each dataset. In addition, the fusion method benefits
from the merit of each dataset, thus it helps us to represent large-scale environments in an efficient and
accurate way.
This thesis presents a hybrid representation that fuses information from four complementary datasets,
LiDAR data, aerial images, ground images and videos, to photorealistically represent large-scale
environments. We first fuse information from both LiDAR and an aerial image to create urban models with
accurate surfaces and detailed edges. We then enhance the models with high-resolution façade textures, and
update them with dynamic textures from videos to capture the most up-to-date environment information.
The primary contribution of this thesis is the hybrid representation that fuses information from four datasets
to create a detailed, accurate and photorealistic representation of a large-scale dynamic environment. In
each of the steps, we also have novel contributions, as follows.
• A practical primitive based hybrid modeling system. Primitives are used in both outline extraction
from an aerial image and surface fitting from LiDAR data. Our system minimizes the user input
and creates urban models with accurate surfaces (up to ten centimeters) and edges (up to a foot) in
a fast and convenient way. A wide range of complex buildings in different large-scale urban areas
have been modeled using our system, including the USC campus, part of Washington DC and the
Carson City. This shows the system’s capability and efficiency in modeling large-scale urban
buildings.
• A novel framework, the vanishing hull. This new concept is used to quantitatively analyze the stability and accuracy of vanishing point estimation.
• Novel techniques to solve the challenges in generating textures from ground images: heavy tree occlusions, a camera's narrow field of view, and the under-constrained translation problem.
• An intelligent image-editing tool, the ICP tool. The ICP tool is very powerful in removing occlusions for highly structured perspective images. It also has broad applications, including replacing textures, extending structures, and generating rectified images for textures.
• A video painting system that directly paints videos onto 3D models as textures in real-time.
• An application in the Augmented Virtual Environment.

In conclusion, the hybrid representation benefits from the merit of each dataset and generates results with accurate surfaces, detailed edges, overview colors, seamless ground view textures and real-time imagery information.
References
[Ahm02] F.E. Ahmed, S.B. James, "Building Extraction Using Lidar Data," ASPRS-ACSM Annual Conference and FIG XXII Congress, April 22-26, 2002.
[Aga04] A. Agarwala et al., "Interactive digital photomontage," Proceedings of SIGGRAPH 2004.
[Air] http://www.airborne1.com.
[Alm03] A. Almansa, A. Desolneux and S. Vamech, "Vanishing point detection without any a priori information," PAMI, 25(4): 502-507, 2003.
[And80] A. Lippman, "Movie-maps: An application of the optical videodisc to computer graphics," International Conference on Computer Graphics and Interactive Techniques, Seattle, Washington, United States, 1980, pp. 32-42.
[Ant00] M. Antone and S. Teller, "Automatic recovery of relative camera rotations for urban scenes," In CVPR, pp. 282-289, 2000.
[Arm02] G. Armin, W. Xinhua, "Integration of Landscape and City Modeling: The pre-hispanic site Xochicalco," IAPRS, Kunming China 2002, Vol. 34, Part 5/W3.
[Ash01] M. Ashikhmin, "Synthesizing natural textures," ACM Symposium on Interactive 3D Graphics, pp. 217-226, 2001.
[Azu97] R. Azuma, "A Survey of Augmented Reality," Presence: Teleoperators and Virtual Environments, Vol. 6, No. 4, pp. 355-385, 1997.
[Bar83] S. Barnard. “Interpreting perspective images,” Artificial Intelligence, vol.21, pp.435-462,1983.
[Bar85] M. Barnabei, A.Brini and G-C. Rota, “On the exterior calculus of invariant theory,” Journal of
Algebra, 96 pp.120-160, 1985.
[Bar02] A. Barret, and A. Cheney. “Object-based image editing,” Proceedings of SIGGRAPH 2002,
pp.777–784.
[Bat01] M. Batty, D. Chapman, S. Evans, M. Haklay, S. Küppers, N. Shiode, A. Smith, and P.M. Torrens,
“Visualizing the city: communicating urban design to planners and decision-makers,” In: Brail R.
& Klosterman R. (eds), Planning Support Systems. ESRI Press and Center Urban Policy Research,
Rutgers University Press, New Brunswick, NJ, 2001.
[Bel02] C.J. Bellman, M.R. Shortis, “A Machine Learning Approach to Building Recognition in Aerial
Photographs,” PCV’02, Part A, 2002, pp. 50-55.
[Ber] M. Berg etc. “Computational geometry algorithms and applications.” ISBN: 3-540-65620-0,
published by Springer.
[Ber94] Berman, D.F. etc. “Multiresolution painting and compositing.” Proceedings of SIGGRPAH 94,
1994.
[Ber00] M. Bertalmio, G. Sapiro et al. “Image inpainting.” Proceedings of SIGGRAPH 2000. pp. 417–424.
[Ber01] F. Bernardini, I.M. Martin and H. Rushmeier, "High-quality texture reconstruction from multiple
scans.” IEEE Visualization and Computer Graphics, Volume: 7, Issue: 4 pp. 318-332,2001.
[Ber03] M. Bertalmio, L. Vese, G. Sapiro and S. Osher, "Simultaneous structure and texture image
inpainting.” CVPR, II. pp. 707–714.
[Bert01] M. Bertalmio et al. “Navier-stokes, fluid dynamics, and image and video inpainting.” CVPR, I. pp.
355–362.
[Bos03] M. Bosse, R. Rikoski, J. Leonard, and S. Teller, "Vanishing points and 3D lines from omnidirectional video," The Visual Computer, Special Issue on Computational Video, 19(6), pp. 417-430, 2003.
[Boy99] Y. Boykov et al., "Fast approximate energy minimization via graph cuts," In ICCV, pp. 377-384, 1999.
[Bra95] C. Braun, T. H. Kolbe, F. Lang, W. Schickler, A. B. Cremers, W. Förstner, and L. Plümer, "Models for photogrammetric building reconstruction," Computer & Graphics 19(1), 1995, pp. 109-118.
[Bri91] B. Brillault-O'Mahoney, "New method for vanishing point detection," Computer Vision, Graphics, and Image Processing, Vol. 54, pp. 289-300, 1991.
[Bur00] C. Burchardt and K. Voss, "Robust vanishing point determination in noisy images," International Conference of Pattern Recognition, 2000.
[Cap90] B. Caprile and V. Torre, "Using vanishing points for camera calibration," International Journal of Computer Vision, pp. 127-140, 1990.
[Car94] S. Carlsson, "The double algebra: an effective tool for computing invariants in computer vision," in J. Mundy, A. Zisserman and D. Forsyth, editors, Applications of Invariance in Computer Vision, number 825 in LNCS, Springer, 1994.
[Cha01] T. Chan and J. Shen, "Non-texture inpaintings by curvature-driven diffusions," J. Visual Comm. Image Rep. 12, 4, pp. 436-449.
[Che95] S. E. Chen, "QuickTime VR - an image-based approach to virtual environment navigation," Computer Graphics (SIGGRAPH '95), August 1995, pp. 29-38.
[Chr01] C. Frueh, A. Zakhor, "3D Model Generation for Cities Using Aerial Photographs and Ground Level Laser Scans," Conference on Computer Vision and Pattern Recognition, Kauai, USA, 2001, vol. 2.2, pp. 31-38.
[Cip99] R. Cipolla, T. Drummond and D.P. Robertson, "Camera calibration from vanishing points in images of architectural scenes," In Proceedings of British Machine Vision Conference, pp. 382-391, 1999.
[Cla00] B. Claus, "Towards Fully Automatic Generation of City Models," IAPRS, 2000, Vol. 33, Part 3, pp. 85-92.
[Cla98] J.C. Clarke, "Modelling uncertainty: A primer," Technical Report 2161, University of Oxford, Dept. Engineering Science, 1998.
[Cli01] F. Clive, B. Emmanuel, and G. Armin, "3D Building Reconstruction from High-Resolution IKONOS Stereo Imagery," Third International Workshop on Automatic Extraction of Man-Made Objects from Aerial and Space Images, 2001.
[Col90] R. T. Collins and R. S. Weiss, "Vanishing point calculation as statistical inference on the unit sphere," In ICCV, pp. 400-403, 1990.
[Cou03] J.M. Coughlan and A.L. Yuille, "Manhattan world," Neural Computation, 2003.
150
[Cri00] A. Criminisi, I.D.Reid, and A. Zisserman, “Single view metrology,” International Journal of
Computer Vision, Vol.40 (2) pp.123-148, 2000.
A. Criminisi, P. Perez and K. Toyama, “Object removal by exemplar-based inpainting,” CV [Cri03]] PR,
[Cur96] umetric method for building complex models from range
[Cyb] http://www.cybercity.tv
ojective Texture-Mapping”, 9th Eurographics Workshop on Rendering, pp. 105-116, 1998.
[Ela02] , J.S. Bethel, and E. Mikhail, “Reconstructing 3D Building Wireframes from Multiple
[Efr99] -parametric sampling.” ICCV, pp. 1033-1038,
[Efr01] ing for texture synthesis and transfer.” Proceedings of
[Fau95] geometry and algebra of the point and line correspondences
[Fau98]
fold of trifocal tensors,” In Trans. of the Royal Society A,
[Fol90] raphics: principles and practice”, Addison-Wesley, Reading,
[For99] tomatic acquisition methods,”
[Fru01]
EEE Conference on Computer Vision and Pattern Recognition”, 2001.
2003.
99, vol. 32, part 3-2w5, pp. 87-92.
Surface Mapping and Characterization Using Laser Altimetry,
[Geo01]
pp. 417–424.
B. Curless, and M. Levoy, “A vol
images”, SIGGRAPH’96, pp. 303-312, 1996.
[Deb96] P. Debevec and C. Taylor and J. Malik, “Modeling and Rendering Architecture from Photographs:
A Hybrid geometry- and image –based approach,” Proceedings of SIGGRAPH 96, 1996, New
Orleans, Louisiana, pp. 11-20.
[Deb98] P .E. Debevec, Y. Yu, and G.D. Borshukov, “Efficient View-Dependent Image-Based Rendering
with Pr
[Dro03] I. Drori, D. Cohen-Or and H. Yeshurun, “Fragment-based image completion,” Proceedings of
SIGGRAPH, pp.303–312.
A. Elaksher
Images,” PCV02, Part A, 2002, pp. 91.
A. Efros and T. Leung. “Texture synthesis by non
1999.
A. Efros and W.T. Freeman. “Image quilt
SIGGRAPH 2001, pp. 341–346,2001.
O. Faugeras and B. Mourrain, “On the
between N images,” Technical Report 2665, INRIA, 1995.
O. Faugeras and T. Papadopoulo, “Grassmann-cayley algebra for modeling systems of cameras
and the algebraic equations of the mani
Vol.365, pp.1123–1152, 1998.
J. D. Foley et al. “Computer G
Massachusetts, 1990.
W. Forstner, “3D-city models: automatic and semiau
Photogrammetric Week, 1999, pp. 291-303.
C. Fruh and A. Zakhor, “3D Model Generation for Cites Using Aerial Photographs and Ground
Level Laser Scans”, I
[Fru03] C. Früh and A. Zakhor, “Constructing 3D City Models by Merging Ground-Based and Airborne
Views,” to appear in Conference on Computer Vision and Pattern Recognition 2003, Madison,
Wisconsin,
[Geo99] V. George, “Building Reconstruction Using Planar Faces in very High Density Height Data,”
IAPRS, 19
[Geo01] V. George and D. Sander, “3D Building Model Reconstruction from Point Clouds and Ground
Plans,” ISPRS workshop on Land
2001, XXXIV-3/W4, pp. 37-43.
G. Vosselman and I. Suveg, “Map Based Building Reconstruction from Laser Data and Images,”
Ascona01, 2001, pp. 231-239.
[Geo02] V. George, “Fusion of Laser Scanning Data, Maps and Aerial Photographs for Building
Reconstruction,” IGARSS, 2002.
[Goo] http://earth.google.com/
151
[Gra04] L. Grammatikopoulos, G.Karra and E. Petsa. “Camera calibration combining images with two
vanishing points.” ISPRS, 2004.
[Gre93] N. Greene, M. Kass, and G. Miller, “Hierarchical Z-Buffer Visibility.” SIGGRAPH’ 93, 231-238,
1993.
[Gru98]A Gruen and R. Nevatia (editors), “Special Issue on Automatic Building Extraction from Aerial
Images”, Computer Vision and Image Understanding, November 1998.
aux, E. Maisel. and K. Bouatouch. “Using vanishing points for camera
[Gul95] rom digital images,” Contributions to 2nd Course in Digital
[Hei99] el, “Realistic, Hardware-accellerated Shading and Lighting,”
[Heu98] Vanishing point detection for architectural photogrammetry.” International
[Heu01a g and grouping 3D lines from multiple views
[Hu06b] “Vanishing hull,” Third International Symposium on 3D Data
[Hu06c] . “Real time video painting for a large-scale environment,” USC
[Gui00] E. Guillou, D. Meneve
Calibration and coarse 3D reconstruction from a single image.” The Visual Computer, pp.396-
410, 2000
E. Gulch, “Cartographic features f
Photogrammetry, Bonn, Royal Institute of Technology, Department of Geodesy and
Photogrammetry, Stockholm, Sweden, 1995.
[Gul99] E. Gulch, H. Muller, and T. Labe, “Integration of Automatic Processes into Semi-automatic
Building Extraction,” IAP, 1999, Vol.32, Part 3-2W5, pp.177-186.
[Guo01] Y. Guo, H.S. Sawhney, R. Kumar, and S. Hsu, “Learning-Based Building Outline Detection from
Multiple Aerial Images,” CVPR01, pp. 545-552.
[Hae93] P. Haeberli and M. Segal, “Texture mapping as a fundamental drawing primitive”, Fourth
Eurographics Workshop on Rendering, pp. 259-266, 1993.
[Han99] G.M. Hans, “Fast Determination of Parametric House Models from Dense Airborne Laser Scanner
Data,” IAPS 1999, Vol 32, 1999, Part 2W1, pp.1-6.
[Han01] A.R. Hanson, M. Marengoni, H. Schultz, F. Stolle, and E.M. Riseman, “Ascender II: A
Framework for Reconstruction of Scenes from Aerial Images,” Ascona01, pp. 25-34.
[Hee95] D.J. Heeger and J.R. Bergen. “Pyramid based texture analysis/synthesis.” Proceedings of
SIGGRPAH 95, pp.229-238, 1995.
W. Heidrich and H.-P. Seid
SIGGRAPH '99, 1999
F.A. van den Heuvel. “
archives of photogrammetry and remote sensing. 32(5):652–659,1998.
] S. Heuel, W. Forstner, “Matching, reconstructin
using uncertain projective geometry,” International Conference on Computer Vision and Pattern
Recognition, 2001, Vol. II, pp.517-524, 2001.
[Heu01b] S. Heuel, “Points, lines, and planes and their optimal estimation,” Lecture Notes In Computer
Science, Vol. 2191, Proceedings of the 23rd DAGM-Symposium on Pattern Recognition, 2001,
pp.92 – 99.
[His96] Y. Hsieh, “SiteCity: A semi-automated Site Modeling System”, IEEE Conference on Computer
Vision and Pattern Recognition, pp. 499-506, 1996.
[Hu06a] J. Hu, S. You and U. Neumann, “Automatic pose recovery for high-quality textures generation,”
International Conference of Pattern Recognition, HongKong, 2006.
J. Hu, S. You and U. Neumann,
Processing, Visualization and Transmission, University of North Carolina, Chapel Hill, 2006.
J. Hu, S. You, Ulrich Neumann
Technique Report 05-861, 2005.
152
[Hu06d] J. Hu, S. You and U. Neumann, “Integrating LiDAR, aerial image and ground images for
complete urban building modeling,” Third International Symposium on 3D Data Processing,
[Hu05] J . You, and U. Neumann. “Texture painting from video.” Journal of WSCG, ISSN 1213–
[Hu04] magery,” ASPRS 2004.
[Hue00] , Z. Kim, and R. Nevatia, “Multisensor Integration for Building Modeling,” CVPR00,
[Hui01]
Cameras,” ICVS’2001, pp. 284-297.
1-976.
.
5-180.
67–775, 1992.
International Workshop on Automatic Extraction of Man-Made Objects
[Ken02] Airborne Laser Elevation
[Ker02]
S 56, 2002, pp. 167-176.
[Lia01] L. Liang, C. Liu, Y.Q. Xu, B. Guo and H. Shum, “Real time texture synthesis by patch-based
sampling,” ACM Transactions on Graphics, 20, 3, pp. 127–150.
Visualization and Transmission, University of North Carolina, Chapel Hill, 2006.
. Hu, S
6972, Volume 13,pp. 119–125, 2005.
J. Hu, S. You, U. Neumann, “Building modeling from LiDAR and aerial i
[Hu03] J. Hu, S. You, U. Neumann, “Approaches to Large-Scale Urban Modeling,” in IEEE Computer
Graphics and Applications, 2003.
A. Huertas
2000, pp. 203-210.
Z. Huijing and S. Ryosuke, “Reconstructing Textured CAD Model of Urban Environment Using
Vehicle-Borne Laser Range Scanners and Line
[Iga01] T. Igarashi and D. Cosgrove. “Adaptive unwrapping for interactive texture painting.” ACM
Symposium on Interactive 3D Graphics. 2001.
[Jay97] C.O Jaynes, M. Marengoni, E.M. Riseman, and H.J. Schultz, “Knowledge Directed Reconstruction
from Multiple Aerial Images,” DARPA97, pp. 97
[Jia01] B. Jiang, U. Neumann, “Extendible Tracking by Line Auto-Calibration,” International Symposium
on Augmented Reality, pp.97-103, New York, October 2001
[Jia03] J. Jia and C. K. Tang, “Image repairing: robust image synthesis by adaptive ND tensor voting,”
CVPR, pp.643–650.
[Joc00] S. Jochen, “Combining Geometrical and Semantical Image Information for the Improvement of
Digital Elevation Models,” Proceedings of the 20th EARSEL-Symposium, 2000, pp.17
[Kan92] K. Kanatani, “Statistical Analysis of focal-length calibration using vanishing points,” IEEE Trans.
Robotics and Automation, 8 (6) pp. 7
[Kar01] K. Karner, J. Bauer, A. Klaus, F. Leberl, and M. Grabner, “VIRTUAL HABITAT: Models of the
Urban Outdoors,” Third
from Aerial and Space Images, 2001.
F. Kensaku and A. Tomohiko, “Urban Object Reconstruction Using
Image and Aerial Image,” GeoRS(40), No. 10, October 2002, pp. 2234-2240.
M. Kerry and K. Amnon, “Integration of Laser-derived DSMs and Matched Image Edges for
Generating an Accurate Surface Model,” ISPR
[Kwa03] V. Kwatra etc., “Graphcut textures: image and video synthesis using graph cuts,” Proceedings of
SIGGRAPH.
[Lee00] S. C. Lee, A.Huertas, and R. Nevatia, “Modeling 3D Complex Buildings with User Assistance,”
IEEE Workshop on Application of Computer Vision, 2000, pp. 170-177.
[Lee02] S.C. Lee, S.K. Jung, and R. Nevatia, “Integrating ground and aerial views for urban site
modeling,” ICPR02, pp. 107-112.
[Lee02 b] S. C. Lee, S. K. Jung and R. Nevatia, “Automatic Integration of Façade Textures into 3D
Building Modelings with Projective Geometry Based Line Clustering”, EUROGRAPHIC’02,
2002.
[Lev03] A. Levin, A. Zomet and Y. Weiss, “Learning how to inpaint from global image statistics,” ICCV,
II. pp. 305–313.
153
[Lie99] D. Liebowitz, A.Criminisi and A. Zisserman, “Creating Architectural Modeling From Images”,
EUROGRAPHIC’99, pp. 39-50, 1999.
. Liebowitz and A. Zisserman. “Metric rectification for perspective images of pla [Lie98] D nes.” In CVPR,
[Lin98] ction and Description from a Single Intensity Image,”
ns. Pattern Analysis and Machine Intelligence, 16 (4) pp.430–
[Mag84] J.K. Aggarwal, “Determining vanishing points from perspective images,”
[Mar78] 8, pp. 270–
[Mar97] -rendering 3D warping.” ACM Symposium on
[Mat99] re mapping on 3d surfaces.” In Proc. of
[Mat90] Based Aerial Image Understanding
[May99] atic Object Extraction from Aerial Imagery - A Survey Focusing on
[Mcl95] 90-
[Mcm95 eling: an image-based rendering system,”
[Mic98]
[Mor00] ne Laser Scanning Dat,”
[Nei93] , D.J. Sandin, and T.A. DeFanti, “Surround-Screen Projection-Based Virtual
[Neu03] l Environments (AVE):
[Nev98] vatia, A. Huertas, “Knowledge-Based Building Detection and Description,” 1997-1998,
[Nor97a] . Claus, “Generation of 3D city models from Airborne Laser Scanning Data,”
pp.482–488, 1998.
C. Lin, R. Nevatia, “Building Dete
CVIU(72), No. 2, November 1998, pp. 101-121.
[Lut94] E. Lutton, H. Maitre, and J. Lopez-Krahe, “Contribution to the determination of vanishing points
using hough transform,” IEEE Tra
438, 1994.
M.J. Magee and
Computer Vision, Graphics, and Image Processing, 26 pp. 256–267, 1984.
S. Mark. et al. “Fast Shadows and lighting effects using texture mapping.” Siggraph’7
274, 1978.
W. R. Mark, L. Mcmillan and G. Bishop. “Post
Interactive 3D Graphics, pp. 7–16, 1997.
K. Matsushita and T. Kaneko. “Efficient and handy textu
Eurographics, pp.349-358, 1999.
T. Matsuyama, and V. S.-S. Hwang, “SIGMA -A Knowledge-
System,” Plenum, New York, 1990.
H. Mayer, “Autom
Buildings,” Computer Vision and Image Understanding Vol. 74, No. 2, May 1999, pp. 138–149.
G. Mclean and D. Kotturi. “Vanishing point detection by line clustering.” PAMI, 17(11):10
1095, 1995.
] L. McMillan and G. Bishop, “Plenoptic mod
SIGGRAPH 95, Los Angeles, California, August 6-11, 1995, pp.39-46.
C. Michele and C. Bruno, “Optical and Radar Data Fusion for DEM Generation,” IAPRS Vol 32,
Part 4, 1998.
[Min02]F. Mindru, L. V. Gool and T. Moons. “Model estimation for photometric changes of outdoor planar
color surfaces caused by changes in illumination and viewpoint,” ICPR, 2002.
[Mit] http://city.lcs.mit.edu/
M. Morgan and K. Tempeli, “Automatic Building Extraction from Airbor
Proceeding of the 19th ISPRS Congress, Amsterdam, Book 3B, pp. 616-623.
C. Cruz-Neira
Reality: The Design and Implementation of the CAVE”, SIGGRAPH’93, 1993.
U. Neumann, S. You, J. Hu, B. Jiang, and J. W. Lee, “Augmented Virtua
Dynamic Fusion of Imagery and 3D Models,” IEEE Virtual Reality 2003, Los Angeles,
California, March 2003, pp. 61-67.
[Neu99]U. Neumann. and S. You, "Natural Feature Tracking for Augmented-Reality", IEEE Transactions
on Multimedia. Vol. 1. No.1, 1999.
R. Ne
DARPA98, pp.469-478.
H. Norbert and B
3rd EARSEL Workshop on LIDAR remote sensing of land and sea, 1997.
154
[Nor97b] H. Norbert and K. Anders, “Acquisition of 3D Urban Models by Analysis of Aerial Images,
Digital Surface Models and Existing 2D Building Information,” SPIE Conference on Integrating
[Nor99] Claus, “Extraction of Building and Trees in Urban Environments,” ISPRS,
[Nor01] ling of Buildings from Multiple Aerial Images,”
[Ofe97]
s, Volume:17, Issue:2 pp.18-29, 1997.
, 1995.
R-TR-2004-04.
ternational Conference on 3-D Digital Imaging and Modeling,
[Pol03] wen, K. Cornelis, F. Verbiest, and J. Tops, “3D recording for
[Ree87] ows with Depth Maps”,
[Ren01] Development of a Bare Ground DEM and Canopy Layer in NW Forestlands Using
[Rib02] ible Cities,” IEEE
& Applications, Vol. 22, No. 4, 2002, pp10-15.
[Rob01] y and Albedo by
[Roc99] g on 3D objects.”
[Rot00]
of SIGGRAPH 2005.
[San04] pp.595–599.
Photogrammetric Techniques with Scene Analysis and Machine Vision III, 1997, pp. 212-222.
H. Norbert and B.
1999, pp. 130-137.
S. Noronha and R. Nevatia, “Detection and Mode
PAMI(23), No. 5, May 2001, pp. 501-518.
E. Ofek etc. “Multiresolution textures from image sequences.” IEEE Computer Graphics and
Application
[Oga98] Y. Ogawa, K. Iwamura, and S. Kakumoto, “A Map-based Approach to Extracting Object
Information from Aerial Images,” MVA98, pp. 220-223.
[Per95] K. Perlin. “Live paint: painting with procedural multiscale textures.” Proceedings of SIGGRAPH,
pp.153–160
[Per04] P. Perez, M. Gangnet and A. Blake, “Patchworks: example-based region tiling for image editing,”
Technical Report, Microsoft Research, MS
[Pol99] M. Pollefeys, R. Koch, M. Vergauwen, and L. Van Gool, “Hand-held acquisition of 3D models
with a video camera,” Second In
1999, pp. 14 –23.
M. Pollefeys, L. Van Gool, M. Vergau
archaeological fieldwork,” IEEE Computer Graphics and Applications, Vol. 23, Issue: 3, 2003 pp.
20 –27.
[Por00] J. Portilla and E.P. Simoncelli. “A parametric texture model based on joint statistics of complex
wavelet coefficients.” IJCV 40, 1(Oct.) pp. 49-70, 2000.
[Ptg] http://www.ptgui.com/
W. Reeves, D. Salesin and R. Cook, “Rendering Antialiased Shad
Computer Graphics, Volume 21, Number 4, July 1987.
M. Renslow, “
High Performance LIDAR,” ESRI international user conference, 2001.
W. Ribarsky, T. Wasilewski, and N. Faust, “From Urban Terrain Models to Vis
Computer Graphics
[Ric] E.W. Richard and C. G. Rafael, “Digital Image Processing”, Prentice Hall PTR, 2nd Edition.
D. M. Robin, V. T. Udo, and C. Peter, “High Resolution Surface Geometr
Combing Laser Altimetry and Visible Images,” ISPRS, 2001.
C. Rocchini, P. Cignoni and C. Montani. “Multiple textures stitching and blendin
In Eurographics Rendering Workshop, 1999.
C. Rother. “A new approach for vanishing point detection in architectural environments.” In
BMVC, 2000.
[Rot02] F. Rottensteiner and J. Jansa, “Automatic Extraction of Buildings from LIDAR Data and Aerial
Images,” ISPRS’2002.
[Rot05] C. Rother, V. Kolmogorov and A. Blake, “Grabcut-interactive foreground extraction using iterated
graph cuts,” Proceedings
[Rou] J. O’Rourke. “Art Gallery Theorems and Algorithms,” Oxford University Press, New York, 1987.
P. Sand and S. Teller, “Video Matching,” Proceedings of SIGGRAPH 2004.
155
[Sch93] R. Schuster, N. Ansari and A. Bani-Hashemi. “Steering a robot with vanishing points.” IEEE
Transactions on Robotics and Automation, 9(4): 491-498, 1993.
A. Schodl, R. Szeliski, D.H. Salesin and I. Essa. “Video textures.” Proceedings of SIGGRAPH [Sch00] ,
[Sch04] F. Dellaert, “Atlanta world: an expectation maximization framework for
Computer Vision and Pattern Recognition, 2004.
hop on Video Surveillance, November 2003,
[Sei99] d scene reconstruction from probability
[Shu99] ance evaluation and analysis of vanishing point detection techniques.” PAMI,
[Shu99]
, pp14-21.
[Soa01] ” In Proceeding of IEEE International
uter Vision, II, pp. 439–446, 2001.
[Sze97] osaics and Environment
[Tel01] N. Master, “Calibrated
nal of Computer Vision, 53 (1) pp.
[Tok] ht
tional Journal of Computer Vision, 50 (1) pp.35-61, 2002.
[Ven89] ellappa, “Intelligent Interpretation of Aerial Images,” Technical Report
”
pp. 489–498, 2000.
G. Schindler and
simultaneous low-level edge grouping and camera calibration in complex man-made
environments,” International Conference on
[Seb03] I.O. Sebe, J. Hu, S. You, U. Neumann, “3D Video Surveillance with Augmented Virtual
Environments,” in ACM Multimedia 2003 Works
Berkeley
S. M. Seitz and P. Anandan, “Implicit representation an
density functions,” International Conference on Computer Vision and Pattern Recognition, 1999,
pp. 28–34.
[Ser00] M. Seresht and A. Azizi, “Automatic Building Recognition from Digital Aerial Images,”
Proceeding of the 19th ISPRS Congress, 2000, Book 3B, pp. 792-798.
[Shi01] N. Shiode, “3D urban models: recent developments in the digital modelling of urban environments
in three-dimensions,” 2001 GeoJournal 52 (3), pp. 263-269.
J. Shufelt. “Perform
21(3):282–288, 1999.
H.-Y. Shum and R. Szeliski, “Stereo reconstruction from multiperspective panoramas,” In
ICCV’99
[Sim] http://www.simcenter.org/index.html
S. Soatto, G. Doretto and Y. Wu “Dynamic textures.
Conference on Comp
[Sta01] I. Stamos and P. Allen. “Automatic registration of 2D with 3D imagery in urban environments.”
ICCV, pp.731–736, 2001.
[Sun05] J. Sun, “Image Completion with structure propagation,” Proceedings of SIGGRAPH 2005.
R. Szeliski and H-Y. Shum. “Creating Full View Panoramic Image M
Maps.” In Proc. Of SIGGRAPH, pp.251-258, 1997.
S. Teller, M. Antone, Z. Bodnar, M. Bosse, S. Coorg, M. Jethwa, and
registered images of an extended urban area,” CVPR 2001. Vol. 1, pp. 813-820.
[Tel03] S. Teller, M. Antone, Z. Bodnar, M. Bosse, S. Coorg, M. Jethwa, and N. Master, “Calibrated,
registered images of an extended urban area,” International Jour
93–107, 2003.
tp://www.webscape.com/
[Tor02] P.H.S. Torr, “Bayesian model estimation and selection for epipolar geometry and generic manifold
fitting,” Interna
[Ucl] http://www.ust.ucla.edu/ustweb/ust.html
V. Venkateswar and R. Ch
USC-SIPI-137, Signal and Image Processing Institute, University of Southern California, Los
Angeles, CA, 1989.
[Van98] H. Vanden and A. Frank, “3D reconstruction from a single image using geometric constraints,
PandRS (53) , No. 6, December 1998, pp. 354-368.
156
[Wan02] X.Wang, S. Totaro, F. Taillandier, A. Hanson and S.Teller, “Recovering Facade Texture and
Microstructure from Real-World Images,” in Proc. 2nd International Workshop on Texture
[Wei] reen's theorem.” From MathWorld, http://mathworld.wolfram.com/
[Wei00]
[Wol99] t, V. Sequeira, K. Ng, S. Butterfield, J.G.M. Gonlves, and D. Hogg, “Hybrid Approach
[Wu04]
2004.
ic Modeling CGGM'2003, Montreal,
[Yu99] . Giraudon, “Toward Robust Analysis of Satellite Images Using Map
[Zha99] ation of building detection in satellite images by combining multispectral
andRS(54), No. 1, 15-February 1999, pp. 50-60.
ligence, Vol. 22, No.11, pp. 1330-1334, 2000.
from
[Zyd91] mulator for Virtual World Exploration and
Analysis and Synthesis, in conjunction with ECCV 2002, June 2002, pp. 145-149.
E. Weisstein. “G
GreensTheorem.htm.
L. Y. Wei and M. Levoy. “Fast texture synthesis using tree structured vector quantization.”
Proceedings of SIGGRAPH, pp.479–488, 2000.
[Wil92] H. William, A. Saul, T. William, and P. Brian, “Numerical Recipes in C,” Cambridge University
Press, 1992.
E. Wolfar
to the Construction of Triangulated 3D Models of Building Interiors,” CVS99, pp. 489.
Q. Wu and Y. Yu, “Feature matching and deformation for texture synthesis,” Proceedings of
SIGGRAPH
[You99]S. You et al. "Orientation Tracking for Outdoor Augmented Reality Registration", IEEE Computer
Graphics & Applications, Vol. 19, No. 6, Nov. 1999.
[You03] S. You, J. Hu, U. Neumann, and P. Fox, “Urban Site Modeling From LiDAR,” Second
International Workshop on Computer Graphics and Geometr
CANADA, May 2003.
S. Yu, M. Berthod, G
Information: Application to Urban Area Detection,” GeoRS(37), No. 4, July 1999, pp.19-25.
Y. Zhang, “Optimis
classification and texture filtering,” P
[Zha00] B. Zhao and J. Trinder, “Ingegrated Approach Based Automatic Building Extraction,” 19th ISPRS
Congress, 2000, Book 3B, pp. 1026-1032.
[Zhan00] Z. Zhang, “A flexible new technique for camera calibration”, IEEE Transactions on Pattern
Analysis and Machine Intel
[Zis02] A. Zisserman, E. Schaffalitzky, T. Werner, A. Fitzgibbon, “Automated reconstruction
multiple photographs,” ICIP02, Vol. III, pp. 517-520.
M. J. Zyda and D.R. Pratt, “NPSNET: A 3D Virtual Si
Experience”, Tomorrow’s Realities Galley, Visual Processings of SIGGRAPH91, 1991.
157
Abstract
A wealth of datasets from different sensors exists for environment representation. The key observations of this thesis are that these datasets are complementary and that fusing information across them reduces the errors incurred when processing each dataset alone. In addition, a fusion method benefits from the strengths of each dataset and therefore supports efficient and accurate representation of large-scale environments.