Point Cloud Data Fusion of RGB and Thermal Information for
Advanced Building Envelope Modeling in Support of Energy Audits
for Large Districts
by
Yu Hou
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
Civil Engineering
August 2021
Copyright 2021 Yu Hou
TABLE OF CONTENTS
List of Tables ............................................................................................................................................... iv
List of Figures ............................................................................................................................................... v
Abstract ........................................................................................................................................................ vi
Chapter 1. Introduction and Motivation ........................................................................................................ 1
Chapter 2. Background and Literature Review ............................................................................................. 7
2.1 Traditional Energy Audit Approaches .......................................................................... 7
2.2 Drone-based and Other Data Collection ....................................................................... 8
2.3 Photogrammetry and Thermal Mapping for Energy Audits ....................................... 10
2.4 Data Fusion Approaches ............................................................................................. 12
2.5 Computer Vision Approaches – Feature Extraction and Neural Network .................. 14
2.6 Computer Vision Approaches - Image Data Fusion ................................................... 15
Chapter 3. Research Objectives and Questions .......................................................................................... 18
3.1 Research Objective 1 .................................................................................................. 18
3.2 Research Objective 2 .................................................................................................. 19
3.3 Research Objective 3 .................................................................................................. 19
Chapter 4. An Innovative Approach to RGB Point Cloud and Thermal Information Data Fusion for
Building Thermal Map Modeling Using Aerial Images: Fusion Performance Results under Different
Experimental Conditions ............................................................................................................................ 20
4.1 Introduction and Motivation ....................................................................................... 20
4.2 Research Methods ....................................................................................................... 22
4.2.1 Research Architecture .................................................................................... 22
4.2.2 Step 1: Data Collection .................................................................................. 22
4.2.3 Step 2: Camera Calibration and Registration ................................................. 24
4.2.4 Step 3: Data Fusion Process ........................................................................... 26
4.2.5 Step 4: Evaluation of the Data Fusion Approach ........................................... 30
4.3 Experiment Analysis ................................................................................................... 35
4.3.1 Experiment One: Altitude Comparison .......................................................... 37
4.3.2 Experiment Two: Camera Angle Comparison ............................................... 41
4.3.3 Experiment Three: Flight Path Comparison................................................... 42
4.3.4 Experiment Four: Building Style Comparison ............................................... 42
4.3.5 Points Illustrated with SD values ................................................................... 44
4.4 Discussions ................................................................................................................. 45
4.5 Conclusions and Future Work .................................................................................... 50
Chapter 5. Semantic Segmentation on Energy Audit Related Building Components and Object
Information Extraction Framework ............................................................................................................ 52
5.1 Motivation and Introduction: ...................................................................................... 52
5.2 Research Methods ....................................................................................................... 53
5.2.1 Data Preprocessing ......................................................................................... 53
5.2.2 Implementation on Mask RCNN ................................................................... 53
5.2.3 Implementation on PSPNet ............................................................................ 54
5.2.4 Implementation on DeepLab V3+ .................................................................. 55
5.2.5 Common Configurations (Hyper-parameters) for Performance Comparison 58
5.3 Case Studies and Results ............................................................................................ 58
5.3.1 Performance Evaluation ................................................................................. 58
5.3.2 Evaluation for PSPNet and DeepLab V3+ ..................................................... 60
5.3.3 Evaluation for MaskRCNN ............................................................................ 62
5.4 Discussion ................................................................................................................... 65
5.5 Conclusion and Future Studies ................................................................................... 67
Chapter 6. A Novel Building Temperature Simulation Approach Driven by Expanding Semantic
Segmentation Training Datasets with Synthetic Aerial Thermal Images ................................................... 69
6.1 Introduction ................................................................................................................. 69
6.2 Research Methods ....................................................................................................... 71
6.2.1 Simulation Domains and Dataset ................................................................... 71
6.2.2 As-built Building Envelope Thermal Image Rendering................................. 72
6.2.3 Evaluation Metrics ......................................................................................... 74
6.3 Results and Discussions .............................................................................................. 76
6.3.1 Simulation Result Assessment ....................................................................... 78
6.3.2 Comparison of Current Methods .................................................................... 80
6.4 Conclusions and Future Work .................................................................................... 81
Chapter 7. Conclusions and Limitations ..................................................................................................... 84
Chapter 8. Future Research Tasks............................................................................................................... 85
8.1 Thermal Bridge Detection on District Level Using Aerial Images............................. 85
8.2 Energy Consumption Simulations .............................................................................. 87
Chapter 9. Intellectual Merit and Broader Impacts ..................................................................................... 91
References ................................................................................................................................................... 92
List of Tables
Table 1 Summary of experiments ............................................................................................................... 36
Table 2 Statistics of evaluation criteria of different experiments conducted under varying conditions ..... 37
Table 3 Statistics of evaluation criteria of different experiments conducted under varying conditions ..... 43
Table 4 Number of instances ...................................................................................................................... 58
Table 5 Performance evaluation for PSPNet and Deeplab V3+ ................................................................. 60
Table 6 Performance evaluation for MaskRCNN ....................................................................................... 63
Table 7. Total average MSE and SSIM values in each experiment. ........................................................... 76
Table 8 Bounding box regression metrics on the test images dataset ......................................................... 86
List of Figures
Figure 1 Image data fusion categories ........................................................................................................ 16
Figure 2 Research method workflow .......................................................................................................... 22
Figure 3 Camera calibration and image registration ................................................................................... 26
Figure 4 Illustration of the data fusion process ........................................................................................... 28
Figure 5 Test distribution and ideal distribution ......................................................................................... 32
Figure 6 Illustration of thermal models on a campus and in a city area ..................................................... 37
Figure 7 The distributions of SD values for different experiments (bin size = 0.01) ................................. 40
Figure 8 The distributions of SD values for different experiments ............................................................ 44
Figure 9 Illustration of RGB models and model with SD values for each point ........................................ 45
Figure 10 Comparison between experiments conducted on campus and in city areas ............................... 47
Figure 11 Illustration of fusion approach on Mask R-CNN ....................................................................... 54
Figure 12 Illustration of fusion approach on PSPNet ................................................................................. 55
Figure 13 Illustration of fusion approach on DeepLab V3 ......................................................................... 57
Figure 14 Trendline of evaluations on PSPNet and DeepLabV3+ ............................................................. 61
Figure 15 Trendline of evaluations on MaskRCNN ................................................................................... 64
Figure 16 Object instances detected by algorithms..................................................................................... 67
Figure 17. Illustration of the experiment locations ..................................................................................... 72
Figure 18. Examples that explain thermal image rendering ....................................................................... 74
Figure 19. Selected images with highest MSE and lowest SSIM in each evaluation. ................................ 77
Figure 20. Multivariate distribution figures for both the same and cross experiment evaluations. ............ 80
Figure 21 Example of thermal bridge annotations in the dataset ................................................................ 86
Figure 22 Research method workflow ........................................................................................................ 87
Abstract
Auditing building envelopes’ energy performance and improving their energy efficiency to reduce
energy consumption have become more important. However, current studies only focus on a single building
and do not extend their research to a district level or a group of buildings. Additionally, researchers face a
laborious series of processes to index energy loss from building envelopes, because the models do not contain semantic
information to identify objects. This study is designed to improve accuracy and efficiency for 3D model
reconstruction and semantic information identification. Using a data fusion method, a 3D model that has
both thermal and RGB information can be reconstructed for a group of buildings in a large district. Different
configurations of a drone-based data collection method are tested. Next, with artificial intelligence (AI) and
computer vision algorithms, researchers can classify different objects from datasets that fuse both thermal
and RGB information together for a better semantic segmentation performance. After receiving the
semantic information, researchers can easily locate and index all building envelopes in a large district to
support their energy audit tasks. Considering the demand for large datasets, this thesis also provides
a synthetic thermal image generation method for semantic segmentation tasks whose datasets have multiple
channels.
Chapter 1. Introduction and Motivation
In recent years, researchers’ attention has been drawn to the reduction of greenhouse gasses and
energy consumption. The European Commission set goals of cutting greenhouse gas emissions by at least
40% (from 1990 levels) and improving energy efficiency by at least 32.5% before 2030 [1][2][3].
The UK passed the 2008 Climate Change Act [4], which is one of the world’s first comprehensive
frameworks of laws on climate change [5]. Germany also drafted the Climate Action Law in 2019 to
guarantee that Germany fulfils its national and European climate targets [6]. These movements are an urgent
sign that measures are being taken to improve energy efficiency based on real research [7][6].
For the building sector, the German Ministry of Economics and Technology has promised an
ambitious energy efficiency target to reduce buildings’ primary energy consumption by 80% by 2050
[8][1][2]. The energy use of buildings makes up almost 25% of all primary energy generated [9]. About
80% of building energy use is consumed by single-family homes, 15% by multi-family
homes (e.g., apartments, condos, etc.), and 5% by mobile homes [10][11]. Additionally, the majority of the
energy consumption in buildings comes from heating, ventilation and air conditioning (HVAC) systems, which
account for 43% of the total energy consumption in buildings in the U.S. [12][9][13]. Specifically, space
cooling and space heating, which consumed 214 and 207 billion kWh respectively, were the top two major
end uses of total U.S. residential sector energy consumption in 2018 [10]. It has been
confirmed that leakage from windows, doors, and roofs is a major reason for energy loss from buildings [14].
Therefore, auditing the energy performance of buildings and improving the energy efficiency of building
envelopes to reduce energy consumption have become more important.
To reduce energy consumption in buildings, it is important to understand (1) the building envelopes
that separate a building’s conditioned and unconditioned environments [15][16] and to understand (2) the
district heating network that distributes heat, generated in a centralized location, through insulated pipes to
buildings. From a single building’s perspective, an efficient envelope can maintain the conditioned
environment. A building envelope is “the physical separator between the conditioned and unconditioned
environment of a building including the resistance to air, water, heat, light, and noise transfer”[15][17]. It
consists of external walls, windows, roofs, and floors of a building. An efficient envelope can maintain the
condition of an environment. For example, an envelope with fewer leaks, through which air may flow, can
reduce the load on the heating system and prevent heat loss through walls. It has been confirmed that
improving the efficiency of the building envelope is a low-cost, high-return strategy [18]. Improving a
building envelope requires that residential and commercial building owners take retrofit measures. There
are tremendous opportunities for retrofit energy savings, but capturing the savings potential within
established buildings can be a challenge. For example, homeowners cannot find the exact locations where
retrofits are needed. Homeowners face a series of daunting processes and competing priorities to make
decisions when investing in improvements.
From a district heating perspective, it is common to deploy district heating networks for residential
and commercial buildings in many countries. For example, the International Energy Agency (IEA) predicts
that the market investment for district heating networks in the EU will increase to 30% in 2030 and even to
50% in 2050 [19][20]. District heating has many benefits, such as saving building space, saving life-cycle
costs, and causing less harm to the environment [21][22]. However, due to the topology of district heating
networks, disadvantages are also obvious. As the topology shows, the plant generates heat, which is
conveyed to subnetworks. Several groups of buildings in the topology are connected to each subnetwork and
consume heat. Therefore, buildings connected to different subnetworks are not influenced by each other,
but buildings connected to the same subnetwork are mutually affected. If one of the subnetworks
is running on a poorly maintained and unrepaired system, unexpected energy leaks potentially exist. Energy
can leak not only through buildings in the subnetworks, but also through pipelines. It has been confirmed that most
transmission and distribution (T&D) lines were not originally designed to meet today’s increased demand,
but were built in the 1950s and ‘60s with a 50-year expected service life [21]. Residents who are not aware
of a heating network’s state of disrepair still tend to turn on the thermostat in the winter. Due to energy
leaks, residents might request more heat once they feel cold, causing the heat-generating station to convey
more heat to meet residents’ demands. However, residents whose buildings are between the leaks and the energy
generators are forced to shut down heating distribution terminals because of the overwhelming extra heat,
or even worse, to open the windows. In the end, the problems remain unsolved, and energy is wasted.
Just as homeowners struggle with decision making when it comes to building envelope retrofits, facility managers
at a district level likewise have problems finding energy leaks in district heating networks. From a macro
perspective, energy audits need a global view and should take entire communities into
consideration when improving efficiency.
There are many approaches to energy audits that help reduce a
building’s energy consumption, such as ultrasound, fan pressurization (blower door test), and sensor
monitoring [23][13][24][25][26]. Unfortunately, these approaches have limitations in their scope of
use and fail to audit the energy performance of individual buildings at a district level. Therefore, to
broaden the survey scope and reduce the labor and cost, we propose the use of automated fly-past surveys
for thermal mapping. This deploys unmanned aircraft systems (UAS), also known as drones, with mounted
infrared thermal (IRT) cameras to collect data. Images taken from UAS can be used to reconstruct 3D
models with photogrammetry technology, allowing managers to fully understand an entire district,
construct 3D as-is building energy models, and then conduct simulations and diagnostics [12]. Thus,
reconstructing a thermal 3D model for thermal mapping and energy audits at a district level is necessary.
Currently, utilizing optical images (RGB images) from UAS to reconstruct a 3D point cloud or
mesh model in a large area is successful, but utilizing thermal images for model reconstruction has not been
developed enough to be as successful. The traditional method was to use photogrammetry algorithms to
reconstruct 3D thermal mapping models only based on thermal images. Thermal photogrammetry
especially over a large area, faces several challenges: (1) the lower resolution of thermal images; (2) the trade-off between
accuracy and efficiency; (3) environmental conditions for data collection [27][28]; and (4) the distance from
the object to the cameras. These challenges make trade-offs between efficiency and accuracy for thermal
photogrammetry too important to neglect, especially when mapping a large area. Flight duration is not a
significant influence on RGB photogrammetry, since buildings’ appearances do not change over short
spans of time, but thermal images representing the surface temperatures of buildings can change in short
spans of time. Thus, UAS should fly faster in this case. However, shorter flight durations would result in
fewer images captured by thermal cameras and would influence the data quality. Moreover, thermal
cameras usually do not capture images at a high resolution. Therefore, new frameworks for using infrared
thermography (IRT) mounted on UAS to reconstruct 3D thermal models by thermal image photogrammetry
for energy audits need to be investigated. Furthermore, it is important to investigate methods of balancing
the quality and efficiency of thermal models on a district level by evaluating how the different factors in
data collection and data processing phases influence the performance of thermal models’ reconstruction.
We proposed to test different factors related to flight configurations and data processing strategies and
evaluated the performance of thermal mapping model reconstruction influenced by these different factors.
The influences of different factors in the data collection and data processing phases are summarized in
Chapter 4.
Chapter 4 shows that current studies split and separately process RGB information from thermal
information, and thus do not make full use of the high definition RGB images. It also confirms that a
detailed 3D model can be created by high-resolution RGB images, and then thermal textures can be
projected on top of a 3D model. This finding motivates me to make full use of RGB images and fuse thermal
information with RGB information in a 3D mapping model, allowing for more accurate tie point detection
and for finer point cloud/mesh model reconstruction during the mapping process. First, compared
to only using thermal images, the same or a smaller number of high-resolution RGB images can boost the
performance of thermal photogrammetry [29][30]. The reduced need for images lowers the data collection and
data processing time, which can better meet the efficiency requirements of energy audits for an entire
district level investigation. Second, ambient temperature has less influence on RGB photogrammetry if the
sunlight and shade do not dramatically change. It has been demonstrated that some components of buildings
may have a wider-ranging color in RGB images, but may have monotonous thermal values [30][31]. The
monotonous thermal value of a large building component (e.g., flat roof) is a significant problem for model
reconstruction. In thermal images, the monotonous thermal values that cause large uniform areas can result
in reconstructing a sparse point cloud model with large gaps. The sparse point cloud model cannot provide
sufficient information for energy audits. Based on these discussions, the leveraging of high-resolution RGB
images to improve the performance of thermal model reconstruction should be considered, and it is
important to fuse thermal information with RGB information.
To reconstruct a 3D model, the photogrammetry algorithm detects all potential feature points in all
images. A common feature point which can be clearly identified in two or more images can then define the
location of an object point in 3D. This is called a reference point or a tie point (TP). Tie points are decisively
important because they contain important feature information for reconstructing a 3D mapping model, such
as geographic coordinates, relationship between images and 3D models, the color information of objects,
and thermal information when reconstructing a thermal mapping model using only thermal images.
Therefore, studies on tie point data fusion of thermal and RGB information are crucial. Due to the
drawback of using thermal images to detect tie points, for example, their low resolution compared to RGB
images, we proposed to use high-resolution RGB images to create a mapping model, and then add thermal
information to all tie points in the RGB mapping model. However, existing studies on tie point data fusion remain insufficient.
After detecting all tie points, photogrammetry can utilize them to reconstruct a 3D RGB point cloud model
and an RGB mesh model. We also proposed to fuse RGB information of all points in a point cloud model
with thermal information. A point cloud model fusing both thermal and RGB information can be used in
many fields for auditing energy and improving energy efficiency. For example, energy consumption
simulations need a high-quality thermal 3D model. To understand how to fuse thermal information with
RGB information, we evaluated tie point data fusion and complete point data fusion in Chapter 4.
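The following minimal sketch illustrates, under simplified and assumed data layouts, what such a fused point cloud can look like once both fusion steps are complete: each point keeps its 3D coordinates and RGB color from photogrammetry and gains a thermal attribute contributed by the thermal images. The array layout and the temperature threshold are illustrative only and are not the implementation evaluated in Chapter 4.

```python
import numpy as np

# Illustrative fused point cloud: each row is one point carrying geometry,
# color from the RGB reconstruction, and a thermal value added by the fusion
# step (this layout is an assumption for illustration only).
point_dtype = np.dtype([
    ("xyz", np.float64, 3),       # 3D coordinates from photogrammetry
    ("rgb", np.uint8, 3),         # color sampled from the RGB images
    ("temperature", np.float32),  # thermal value projected from thermal images
])

points = np.zeros(4, dtype=point_dtype)
points[0] = ((12.3, 4.5, 30.1), (180, 176, 170), 18.6)

# An energy auditor can then query the model directly, e.g. flag all points
# on the envelope that are warmer than a chosen threshold.
warm = points[points["temperature"] > 25.0]
print(len(warm), "points exceed the threshold")
```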
As our research objects are buildings in a large district, it is inevitable that thermal
information from equipment, such as cars and other components on the ground, will be accidentally captured. Although roofs
and façades are energy audit targets, vehicles and equipment are heat sources and possible causes of
disturbances for energy audits. To better detect heat loss sources, I need to differentiate
components so that the heat loss that comes from buildings can be determined. The objective here
is to remove heat sources which are not building elements. To achieve this goal, relative studies and
experiments are conducted in Chapter 5. This task is defined as semantic segmentation in which
components’ semantic information, such as buildings, cars, and equipment, can be identified. Many computer
vision algorithms have been implemented to achieve this task. Some of them can draw a bounding box around objects,
and others can predict objects at a pixel level, which means they can classify each pixel in the images into
categories. In my semantic segmentation task, I focused on differentiating façades, roofs, cars, roof
equipment, and ground equipment. Traditional semantic segmentation tasks are mostly based on visible light
images, but in my task, I added another form of measurement, thermal images, to improve the performance
of segmentation tasks. Thermal images can detect more information under various lighting conditions. I
implemented different algorithms on my datasets to examine the algorithms’ performance. Discussion and
conclusions are introduced in Chapter 5.
Despite the great success achieved by the previously described studies, deep learning algorithms
are quite data hungry as demonstrated in many studies [32][33]. Data hunger refers to the size of the
training dataset required for generating a model with good predictive accuracy [34]. Although my research
team and I traveled to Boston, USA and Karlsruhe, Germany many times to collect image data, the
dataset is still not large enough. Collecting more data could be a solution, but it is time-consuming
and labor-intensive. Additionally, other researchers are often unwilling to share data, or their
data formats are not compatible with mine. Therefore, I focused on synthetic datasets. This is a research
task introduced in Chapter 6 to solve data hunger issues. I used thermal information to improve the
segmentation of aerial images of building outdoor scenes, and I used a generative adversarial network (GAN)
approach to generate synthetic thermal images to support segmentation tasks.
I also include future studies’ preliminary results in Chapter 8.
Chapter 2. Background and Literature Review
2.1 Traditional Energy Audit Approaches
Energy analysis can be categorized into the physical-based method and the data-driven method
[22][35]. The physical model-based method requires a complex modeling process and is only suitable for
theoretical analysis. The coefficients and parameters of the physical models must be frequently and
continuously adjusted corresponding to changes in the models’ conditions [36]. The traditional data-driven
method is limited by sensor deployment, in terms of sensor locations and the optimal
number of sensors. This method also requires high accuracy despite inadequate and limited monitoring
data [37][38]. Currently, sensors are not limited to contact sensors, such as temperature transducers that
must be attached to the object. Some portable sensors, such as handheld thermal cameras, are also used for
energy analysis [39].
There are many reliable technologies for energy audits of buildings. Ultrasound is frequently used
for detection of missing insulation between the building envelopes and the window frames. Locating leak
paths by using ultrasound is called a “tone test” [40]. Heavy leaks are easy to detect using this method, but
minor leaks are harder to detect [41][42]. Additionally, an artificial ultrasonic source must be created in
some situations [25][40]. Another technique to measure air leakages through building envelopes is the fan
pressurization method (Blower Door Test) [24]. Unfortunately, fan pressurization is usually used on new
buildings before they are occupied. This approach cannot be implemented in a building once it is
already inhabited because it causes inconvenience for users. Additionally, this approach is usually time-consuming
and labor-intensive, which results in extremely large inspection costs. It is not possible to
conduct the fan pressurization method to examine every room in every building in a whole district. Finally, this
method needs to be integrated with other tools to determine air leakage pathways [43]. For instance, fan
pressurization is usually employed with infrared thermography (IRT), which can detect infrared energy
emitted from an object and convert it to a temperature value [44]. It has been proven that when a pressure
difference is created between indoor and outdoor spaces, a cooler area near the leakage spot can be detected,
which is proportional to the growth of the pressure difference [45][46].
IRT is a widely utilized non-destructive approach. Many studies have proposed the use of IRT to
evaluate thermal performance or energy efficiency in buildings [44][47][48][49]. However, IRT modeling
quality is influenced by the data collection methods. There are generally two types of approaches to
collecting data for IRT, including active thermography and passive thermography [50]. Active
thermography consists of pulsed thermography (PT), lock-in thermography (LT), and laser spot
thermography (LST) [27][51]. Active thermography employs an external stimulus to produce an extreme
thermal difference (heating or cooling) between the objects and the environment. These approaches can
detect hidden defects in detail, but they assume that auditors are fully aware of where potential thermal
anomalies are located. Therefore, active approaches cannot be adopted in auditing for a whole district. On
the contrary, passive thermography does not rely on man-made stimuli. Passive thermography
concentrates on observing thermal patterns based on temperature values and is influenced by the weather
and thermal emissivity (ε). Passive approaches can be categorized into aerial survey, street pass-by and
drive-in survey, perimeter walk-around survey, and automated fly-past survey depending on which tools
are applied. Particularly, the automated fly-past survey (drone-based approach) has become more popular
recently.
2.2 Drone-based and Other Data Collection
The automated fly-past survey deploys unmanned aircraft systems (UAS), also known as drones,
with mounted thermal cameras for data acquisition. There are several advantages to this method. (1) When
compared to aerial surveys which require a large amount of labor and time, a complex manned aircraft is
unnecessary. For example, Iwaszczuk extracted textures from oblique airborne thermal infrared images
taken from helicopters [52][53]. (2) UAS allow the collection of thermal information from different angles
and different altitudes [54]. (3) When compared to street pass-by and drive-in survey methods which cannot
deal with energy loss from tall walls, soffit, and roofs, UAS can capture images from horizontal, oblique,
and vertical angles around a building, and fly at different altitudes between two buildings to capture fine
details. (4) UAS can reduce time and labor when compared to walk-around and walk-through surveys which
usually require handheld devices and are more suitable for HVAC and electrical systems rather than energy
audits over a whole district. To date, UAS-based IRT has been one of the most popular methods used in
energy audits. New products available on the market today, such as dual cameras that capture
RGB images and thermal images simultaneously from the same angles and altitudes, provide a
complementary method for model reconstruction using both thermal and RGB images.
Although UAS-based IRT can reduce the time and complexity of data collection and capture
detailed images throughout a whole district, better flight configurations remain undefined, and more
questions need to be answered to determine more efficient data processing methods. Data collections from
an infrared thermal camera and an optical
camera differ from each other. (1) Infrared thermal cameras usually have a lower display resolution of
640x512 compared to the optical cameras’ display resolution of 4000x3000. Lower resolution cameras
cannot capture enough thermal information and do not allow algorithms to detect enough common features
from overlapped images to reconstruct high-definition 3D models. (2) Infrared thermal cameras are more
likely influenced by ambient conditions. For instance, collecting data in the early morning with wind speed
lower than 5 m/s is ideal [27]. (3) The distance from the object to the cameras [13], as well as the incident
angle (the angle between a ray incident on a surface and the normal line) and the image viewing angle, should
also be considered [55]. This is because thermal images facing the same target but taken from distinct angles
might capture different temperature values, and might introduce errors [30][13]. (4) The time and duration
when the thermal cameras are used likely have an influence on the results. Many studies suggest that data
collection should be conducted in the early morning and late afternoon on cloudy days [56][51]. It also
requires a temperature difference of at least 10 °C (18 °F) to 15 °C (27 °F) between indoors and outdoors
[28]. (5) Just as the performance of RGB model photogrammetry is influenced by light and shade, the
performance of thermal model photogrammetry is influenced by objects’ temperature values. Buildings’
appearances do not change over short time spans, but thermal images representing the surface
temperatures of buildings change quickly, so relevant thermal images must be taken as quickly as possible.
These differences emphasize the need to consider trade-offs between efficiency and accuracy when conducting
drone-based data collection using thermal images and to consider the factors that can influence the
performance of reconstructing a thermal mapping model, especially when surveying a large district.
Energy audit procedures based on drones can be divided into three stages [50]: (1) pre-flight path
design; (2) during-flight data gathering; and (3) post-flight 3D modeling. Empirical parameters of flight
path planning with software and distance from the target surface for energy audits have been examined in pre-
flight path design [57]. Weather conditions and flight tips have been explored in during-flight data gathering
[58][59], and data processing has also been explored in post-flight 3D modeling [50]. However, these
studies only focused on a single building. They did not extend their research to a district level with a group
of buildings. It is crucial to investigate the impacts of potential factors in all three stages of energy audit
procedures because the trade-off between efficiency and accuracy cannot be ignored when auditing the
energy performance of a group of buildings within a district using UAS. The potential factors include: (1)
different flight paths in pre-flight stage; (2) bearing angle and overlap of pictures in during-flight stage [30];
(3) factors related to data processing strategies in post-flight 3D modeling stage; and (4) factors related to
thermal image presentation strategies in the post-flight 3D modeling stage [60].
Aside from UAS, mobile mapping systems (MMS) are also a popular method for thermal mapping.
They scan buildings from a vehicle, which is defined as an unmanned ground vehicle (UGV) [61][62][63].
Researchers have also combined UGV and UAV techniques [64]. Kim and Park [64] demonstrated that
construction site areas were difficult to explore from the ground level. Therefore, they first used a UAV
system to fly over a construction site to generate a model, and then used the model to generate an optimal
path for a mobile robot. Their approaches were designed to reduce human intervention and to support
comprehensive data collection and frequent monitoring and updating for decision-making.
2.3 Photogrammetry and Thermal Mapping for Energy Audits
UAS and UGV based data collection has been widely used with a focus on RGB models. The data
collected by drones and other tools can be processed by photogrammetry, a reverse-engineering process, to
reconstruct a 3D point cloud/mesh model using a series of paired images for further analysis. In the
photogrammetry process, in a step called finding correspondence, common feature points in different but
overlapping pictures can be detected by using two widely used algorithms, scale-invariant feature transform
(SIFT) and speeded-up robust features (SURF). Common features are the attributes that describe the same
objects in distinct images. The SIFT algorithm recognizes an object in one of the reference images and
compares each feature from it to other images to find candidate matching features based on the Euclidean
distance of their feature vectors. SURF conducts the same process of finding candidate matching features
but is based on different calculations. SIFT performs better than SURF across images at different scales, while SURF
handles blurry images better and is roughly three times faster than SIFT [65]. After candidate
matching features are found, random sample consensus (RANSAC) is utilized to remove outliers which are
incorrectly matched. Furthermore, as enough common feature points are matched, the location
determination problem (LDP) can be solved, and positions of points can be determined in a 3D space. Since
these 3D points tie the images together, we also define them as tie points (TP). This procedure for
reconstructing a 3D spatial point cloud model from two-dimensional image sequences is known as structure
from motion (SfM) [66][67][68]. If a mesh model is needed, the triangulation algorithm can convert the
given set of points into a consistent polygonal (mesh) model [69].
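As an illustration of the finding-correspondence step described above, the following minimal sketch uses OpenCV’s SIFT detector, a ratio test over the Euclidean distances of feature vectors, and RANSAC to discard incorrectly matched pairs. The image paths are placeholders, and the thresholds are common defaults rather than values tuned for aerial building imagery.

```python
import cv2
import numpy as np

# Load two overlapping aerial images (placeholder paths).
img1 = cv2.imread("view_a.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view_b.jpg", cv2.IMREAD_GRAYSCALE)

# Detect SIFT keypoints and descriptors in both images.
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Match descriptors by Euclidean distance and keep the best candidates
# using Lowe's ratio test.
matcher = cv2.BFMatcher(cv2.NORM_L2)
candidates = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in candidates if m.distance < 0.75 * n.distance]

# Use RANSAC on the fundamental matrix to remove incorrectly matched pairs.
pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
pts2 = np.float32([kp2[m.trainIdx].pt for m in good])
F, inlier_mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.999)
inliers = [m for m, keep in zip(good, inlier_mask.ravel()) if keep]
print(f"{len(inliers)} inlier correspondences out of {len(good)} candidates")
```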
Photogrammetry has been used with a focus on RGB photogrammetry. For example, Golparvar-
Fard et al. [70] created a construction model from unsorted site photos for construction monitoring. Armesto
[71] reconstructed structural models of buildings. Brilakis [72] reconstructed 3D models for infrastructures.
Rashidi [73] utilized videos to reconstruct as-is 3D models of bridges [74][75]. Hamledari et al. [76]
implemented computer vision algorithms to automatically detect the state of drywall work from drone
images, and updated 4D BIM schedule and progress information.
Besides photogrammetry, there exists another 3D model reconstruction approach, light detection
and ranging (LIDAR). Unlike photogrammetry that reconstructs a 3D model based on images, LIDAR can
directly measure distances (ranging) by illuminating the target with laser light, then reconstruct a 3D model
based on assessed distances [77]. There are many studies focusing on point cloud semantic classification
using various algorithms that assign categories to points for further data processing [78][79][80].
There are fewer studies related to thermal mapping modeling for energy audits at a large district
level. Clarkson [81] compared a reconstructed model based on thermal images with a reconstructed model
based on RGB images. However, he took pictures of a handcrafted architectural model, not a real
building. Laguela [82] used laser scanning to create building models and then texturize the building models
with thermography, and Borrmann [83] conducted a similar study mutually calibrating laser scanners and
thermal sensors. However, their research only focused on buildings’ facades because the laser scanner was
located on the ground.
Thermal mapping for an energy audit can be used to detect thermal leakages from thermal models
[84][85][51][27]. Researchers’ studies include quantitative and qualitative approaches. The quantitative
approaches consists of (1) R-value and U-value measurements [86]; (2) moisture content identification [87];
and (3) the calculation of the percentage of areas with thermal anomalies. Meanwhile, the qualitative
approaches are (1) classification of walls, glazing, and windows; (2) thermal bridge identification; (3) air
leakage inspection [88]; (4) moisture inspection; and (5) HVAC and electrical systems presentation [51].
2.4 Data Fusion Approaches
It has been demonstrated that fusing thermal information and RGB information has many potential
benefits for 3D model classification and energy simulation [89][50]. The research on fusion of thermal
information and RGB information has produced three primary methods to perform geometric registration
between thermal images and 3D models: 2D-2D (image to image) matching, 2D-3D (image to model)
matching, and 3D-3D (model to model) matching [90]. The related studies are as follows:
2D-2D: Ham and Golparvar-Fard [59] tested several traditional algorithms, including scale-
invariant feature transform (SIFT), affine-SIFT (ASIFT) and speeded up robust feature (SURF) to register
visible images with thermal images captured from different cameras. These algorithms were commonly
used for thermal-thermal or visible-visible pairs. However, their studies showed that these traditional
algorithms had a relatively poor performance for thermal-visible pairs. To improve performance,
Weinmann’s research group proposed a shape-based matching algorithm for 2D-2D image registration [91].
2D-3D: Weinmann et al. conducted a random sample consensus (RANSAC) for registering 2D-3D
correspondences [91]. Their research showed relatively good performance indoors. Iwaszczuk and Stilla
used many airborne thermal images and considered uncertainties in 3D models to perform linear feature matching when
conducting 2D-3D matching. Lin and Jarzabek-Rychard [90] pointed out the lack of research focusing on
data acquisition by a handheld thermal camera. They focused on the fusion of thermal images with RGB
3D point cloud models for façade energy audits. They proposed an approach to reduce thermal
disagreements, caused by misalignments, for the same object points provided by several associated images.
Their methods showed good results for both geometric registration accuracy and thermal registration
accuracy. Lin and Jarzabek-Rychard [92] also focused on fusion of select thermal textures with 3D mesh
models. Their research showed that the Gaussian filter had a better performance than a texture voting
strategy. However, the handheld approach was not suitable for mapping a larger area.
3D-3D: Wiens and Kleiber [93] summarized two important tasks, surface estimation and point
cloud registration, when processing point cloud data. They proposed a new approach, the likelihood method
that competed against the popular iterative closest point method. Their method reduced the predictive mean
squared error by 18%. Hoegner and Stilla [94][88] focused on registering a thermal point cloud onto an
RGB point cloud, which can meet the prerequisite that thermal and RGB point clouds should register at the
closest points. However, it was hard to minimize the distance between two point cloud models since the
thermal image had a lower resolution compared to the RGB images. Hoegner and Tuttas et al. [95] also
fused thermal data with an RGB 3D model. They measured the quality of their data fusion approach by comparing
the differences of the back projection of points in the images used to reconstruct models. Additionally, two
reconstructions for thermal point clouds and RGB point clouds were based on two mutually exclusive series
of thermal images and RGB images respectively, which cannot guarantee the accuracy of registration for
two mutually independent point clouds. Alternatively, Truong and Yamaguchi [96] used a fixed relative
pose relation between RGB cameras and thermal cameras as a reference point for the iterative closest point
(ICP) to enhance the registration between the thermal point cloud and the RGB point cloud.
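The following minimal sketch illustrates the 3D-3D registration idea discussed above using the iterative closest point (ICP) method as implemented in the Open3D library; it is not the pipeline of any of the cited studies. The file names, voxel size, and correspondence threshold are assumptions for illustration.

```python
import open3d as o3d
import numpy as np

# Load the two independently reconstructed clouds (placeholder files).
rgb_cloud = o3d.io.read_point_cloud("rgb_model.ply")
thermal_cloud = o3d.io.read_point_cloud("thermal_model.ply")

# Downsample so that the sparser thermal cloud and the dense RGB cloud
# have comparable point spacing before registration.
rgb_down = rgb_cloud.voxel_down_sample(voxel_size=0.2)
thermal_down = thermal_cloud.voxel_down_sample(voxel_size=0.2)

# Point-to-point ICP: iteratively match closest points and solve for the
# rigid transform that minimizes their distances.
threshold = 1.0   # maximum correspondence distance in meters (assumed)
init = np.eye(4)  # identity initial pose; a coarse prior helps in practice
result = o3d.pipelines.registration.registration_icp(
    thermal_down, rgb_down, threshold, init,
    o3d.pipelines.registration.TransformationEstimationPointToPoint())

print("fitness:", result.fitness, "inlier RMSE:", result.inlier_rmse)
thermal_cloud.transform(result.transformation)  # bring thermal cloud into the RGB frame
```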
2.5 Computer Vision Approaches – Feature Extraction and Neural Network
There are many feature-extraction approaches that can be used to identify images. Traditional
approaches are all linear processing approaches. Due to the fast development of deep neural networks, the
convolutional neural network (CNN) has been introduced for feature extraction. In classic CNN models,
convolution and fully connected (FC) layers perform linear transformations on their inputs. However,
because of the implementation of activation and pooling layers, non-linearity is also added into the feature
extraction tasks.
Alex Krizhevsky and Geoffrey Hinton’s AlexNet can be considered the groundbreaking work after a long ice
age in the computer vision field [97]. Later, many deep neural network architectures were designed based on
their work, such as ZFNet [98] and VGGNet. In particular, VGGNet shows significant improvement over
other networks. The biggest difference between VGGNet and other networks is the size of the filters. Both
AlexNet and ZFNet use large-size filters, such as 11x11 [97] and 7x7 [98] respectively, but VGGNet
reduces the parameters by using 2 layers of 3x3 filters (equivalent to covering a 5x5 effective area) or by using 3 layers of
3x3 filters (equivalent to covering a 7x7 effective area). Due to the reduction of the parameters to learn,
VGGNet can converge faster and avoid overfitting problems. There are many different versions of VGGNet;
the most popular ones are VGG-16 and VGG-19 because of their accuracy and speed.
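The parameter saving behind VGGNet’s filter-size choice can be checked directly; the following illustrative PyTorch snippet compares one 5x5 convolution with two stacked 3x3 convolutions that cover the same 5x5 effective receptive field. The channel count is arbitrary and not taken from VGGNet itself.

```python
import torch.nn as nn

channels = 64  # arbitrary channel count for illustration

# One 5x5 convolution versus two stacked 3x3 convolutions: both cover a
# 5x5 effective receptive field, but the stacked version has fewer weights
# and an extra non-linearity between the layers.
single_5x5 = nn.Conv2d(channels, channels, kernel_size=5, padding=2)
stacked_3x3 = nn.Sequential(
    nn.Conv2d(channels, channels, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(channels, channels, kernel_size=3, padding=1),
)

def count_params(module):
    return sum(p.numel() for p in module.parameters())

print("5x5 parameters:   ", count_params(single_5x5))   # 64*64*25 + 64
print("2x 3x3 parameters:", count_params(stacked_3x3))  # 2*(64*64*9 + 64)
```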
Another famous architecture is GoogLeNet, which makes the network much deeper and is named
“Inception”. The first technique is to introduce a 1x1 convolution as a dimension-reduction module that
reduces the computation bottleneck, thereby allowing the depth and width of the network to be increased. It
has also been demonstrated that this technique can reduce the overfitting problem. The second technique
concerns the FC layers. GoogLeNet’s FC layers are different from other networks’ architectures, in which the
whole feature maps are fully connected to the FC layers. In GoogLeNet, global average pooling is
used instead, averaging each feature map to a single value that feeds the corresponding element of the FC layer.
Those two techniques allow GoogLeNet to reach 22 layers in total, which is a very deep architecture
compared with AlexNet, ZFNet, and VGGNet.
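The following minimal sketch illustrates the two techniques just described: a 1x1 convolution used as a dimension-reduction module before an expensive convolution, and global average pooling in place of a large fully connected layer. The tensor shapes and channel counts are illustrative assumptions, not GoogLeNet’s actual configuration.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 256, 28, 28)  # illustrative feature map: 256 channels, 28x28

# Technique 1: a 1x1 convolution as a dimension-reduction module.  Reducing
# 256 channels to 64 before a 5x5 convolution cuts the computation compared
# with applying the 5x5 convolution to all 256 channels directly.
reduce_then_conv = nn.Sequential(
    nn.Conv2d(256, 64, kernel_size=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 128, kernel_size=5, padding=2),
)
y = reduce_then_conv(x)  # shape: (1, 128, 28, 28)

# Technique 2: global average pooling.  Each feature map is averaged to a
# single value, so the classifier sees one number per channel instead of a
# flattened 128*28*28 vector.
pooled = nn.AdaptiveAvgPool2d(1)(y).flatten(1)  # shape: (1, 128)
logits = nn.Linear(128, 10)(pooled)             # small FC head, 10 classes assumed
print(y.shape, pooled.shape, logits.shape)
```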
With the fast development of neural networks, it has become known that deeper layers might cause
the problem of vanishing or exploding gradients. To solve this problem, the skip connection (or shortcut
connection) is implemented in the ResNet architecture by adding the input to the output of a few skipped layers. To
reduce the computational complexity, ResNet also uses a bottleneck design technique, which adds extra
1x1 convolution layers to the start and the end of each residual block. Despite the increase in layers, the complexity
is not increased; for example, it is noted that ResNet-152 still has lower complexity than VGG-16/19.
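The following minimal PyTorch sketch shows the pattern described above: a bottleneck residual block with 1x1 convolutions at its start and end and a skip connection that adds the input to the output of the skipped layers. It omits batch normalization and other details of the actual ResNet blocks.

```python
import torch
import torch.nn as nn

class BottleneckBlock(nn.Module):
    """Bottleneck residual block: 1x1 reduce -> 3x3 -> 1x1 expand, plus a skip."""

    def __init__(self, channels, bottleneck):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, bottleneck, kernel_size=1),   # reduce depth
            nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck, bottleneck, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck, channels, kernel_size=1),   # restore depth
        )

    def forward(self, x):
        # The input is added to the output of the skipped layers, so gradients
        # can flow through the identity path even in very deep networks.
        return torch.relu(self.body(x) + x)

block = BottleneckBlock(channels=256, bottleneck=64)
out = block(torch.randn(1, 256, 56, 56))
print(out.shape)  # (1, 256, 56, 56)
```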
2.6 Computer Vision Approaches - Image Data Fusion
Researchers have found that fusing multiple sensor data in addition to RGB data can potentially
improve object classification performance on images [99][100][101]. The most commonly used fusion approaches can be
categorized into four types: input fusion, early fusion, late fusion, and multi-scale fusion. As shown in
Figure 1 (a), RGB data and data provided by an extra channel (e.g., depth, thermal, or other information) are
integrated as a joint input fed into a neural network. This fusion approach is called input fusion. As shown
in Figure 1 (b) and (c), RGB and extra channel data are first fed into different network streams, and then
their feature representations, extracted at a lower level or a higher level, are combined as joint feature
representations for the next level of decision. These fusion approaches are called early fusion and late fusion
respectively. As shown in Figure 1 (d), RGB and extra data are separately fed into two streams. Different
from late fusion, the extracted features (yellow blocks and green blocks) are also connected along the way rather than being
kept isolated until a single fusion in the last step. This fusion approach is called multi-scale fusion.
The extra data depicted in Figure 1 can be presented by different data types such as depth, thermal, and
other features.
Figure 1 Image data fusion categories
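As a concrete illustration of the input fusion variant in Figure 1 (a), the following minimal sketch concatenates a registered thermal channel with the RGB channels and feeds the joint four-channel tensor into a single network stream. The backbone here is a placeholder, not one of the segmentation networks evaluated in Chapter 5.

```python
import torch
import torch.nn as nn

# Illustrative batch: 3-channel RGB images and 1-channel thermal images
# that have already been registered to the same pixel grid.
rgb = torch.randn(4, 3, 256, 256)
thermal = torch.randn(4, 1, 256, 256)

# Input fusion: stack thermal as a fourth channel and feed the joint tensor
# into a single network stream whose first convolution accepts 4 channels.
fused_input = torch.cat([rgb, thermal], dim=1)  # shape: (4, 4, 256, 256)

backbone = nn.Sequential(                        # placeholder stream, not PSPNet/DeepLab
    nn.Conv2d(4, 32, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
)
features = backbone(fused_input)
print(features.shape)  # (4, 64, 256, 256)
```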
RGB-Thermal: Thermal data is used to improve semantic segmentation. For example, Laguela et
al. [85] researched how to fuse infrared data with RGB images into a high-quality thermographic building
envelope image. Luo et al. [102]’s hybrid tracking framework is an early fusion method. Li et al. [103]’s
two-stream CNN of RGB-Thermal object tracking is a typical late fusion method. Researchers have also
implemented multi-scale fusion methods, such as Zhai et al. [104]’s cross-modal correlation filters and Jiang
et al. [105]’s cross-modal multi-granularity attention networks. As illustrated in Figure 1 (d), the ordinary
approach builds two streams, one for RGB data and one for extra data. However, Jiang et al. [105] built two
streams for RGB data and two streams for thermal data. One stream for RGB data had the same network
structure as one for thermal data, and so did the other RGB stream and the other thermal stream.
The authors fused the four streams together to make final predictions. The benefit of their method could be
learning more features from the data, but it also increased the computing burden. Researchers have similarly
explored using synthetic data for RGB-Thermal segmentation [106].
Chapter 3. Research Objectives and Questions
The objective of this thesis is (1) to support efficient thermal data collection, fusion, and analysis
by using infrared thermography (IRT) mounted on unmanned aircraft systems (UAS), (2) to help facility
managers to adequately identify building objects and index building envelopes in a large district area, and (3)
to provide a novel synthetic thermal image generation approach for enlarging training datasets.
3.1 Research Objective 1
The objectives include understanding how potential factors affect the quality of data collection and
the traditional thermal mapping model reconstruction by only using thermal images.
• Research Question 1: What impacts do flight configurations, such as bearing angles, altitudes, and
flight patterns have on traditional thermal mapping model reconstruction?
• Research Question 2: How can the quality of different thermal mapping models reconstructed by
different flight configurations be compared? What are the suggestions for thermal mapping data
collection?
Further objectives are to explore the construction of an accurate and robust data fusion approach by making full use of
high-resolution RGB images, and to understand the performance of the data fusion approach under different
conditions with different potential factors.
• Research Question 3: What are the resulting changes in the performance of data fusion using
different UAS flight configurations such as flight paths, camera angles, and altitudes?
• Research Question 4: How does the data fusion performance differ when surveying different
landscapes and different building styles? For example, districts loosely built up with detached
modern buildings on campus versus densely built traditional European buildings in a city area.
• Research Question 5: How does the data fusion performance differ when investigating different
components of thermal mapping models? For example, the roofs and façades.
3.2 Research Objective 2
The objectives include understanding the performance of extracting components that belong to
different categories of outdoor scenes from both thermal and RGB fused data. Categories include ground,
roof, façade, vehicle, and equipment. Roofs and façades are energy audit targets, while vehicles and
equipment are heat sources and possible causes of disturbances for energy audits. The objective here is to
remove heat sources which are not building elements.
• Research Question 1: How can RGB information be combined with thermal information to improve
the performance of semantic segmentation?
• Research Question 2: In which categories does a thermal channel boost (or deteriorate) the
performance? In which categories do the segmentation approaches perform well, and in which
categories do they fail?
• Research Question 3: Comparing different segmentation approaches, which neural network
segmentation framework is superior to the others for which category?
3.3 Research Objective 3
The objectives include solving data hunger problems by generating synthetic thermal images to
improve segmentation performance.
• Research Question 1: How can RGB images of buildings and surrounding environments be used to
generate thermal images?
• Research Question 2: How does the training data of captured RGB images affect simulation results?
In particular, how does a generation model established on one building style perform when used to generate thermal
images of another?
• Research Question 3: What are the differences between the current approaches and proposed
approaches for generating thermal images?
Chapter 4. An Innovative Approach to RGB Point Cloud and Thermal Information Data
Fusion for Building Thermal Map Modeling Using Aerial Images: Fusion Performance
Results under Different Experimental Conditions
4.1 Introduction and Motivation
Studies have confirmed that improving the performance of building envelopes is a low-cost
and high-return strategy [107][15][17]. An efficient envelope can maintain a conditioned environment. In
order to detect defects in envelopes, researchers have explored many approaches to auditing energy and
improving efficiency, such as fan pressurization (blower door test), ultrasound (tone test), and
thermography [23][13][24][25]. These approaches have their limitations. Particularly, thermography
detects energy loss based on thermal values received by thermal sensors that are easy to use and more
flexible than the other two approaches. However, thermography is limited to energy audits of
single buildings. Performing large-scale building analyses plays a crucial role in comprehensive energy
audits [108], and studies have shown that such analyses should be considered for complex interconnection
and interdependencies between buildings (e.g., shading, heat exchange) as well as in urban environments
[108]. It is not possible or efficient for inspectors to examine many buildings in one area using these current
approaches. Specifically, certain building areas are difficult to reach such as the roof and facades of tall
buildings. To investigate a large district efficiently, unmanned aircraft systems (UASs) should be utilized.
Mounted on UASs (drones), cameras can capture RGB images and reconstruct RGB 3D models by
using photogrammetry techniques for a district. An automated fly-past survey can reduce labor costs and
extend the survey scope efficiently. In the same way, thermal cameras can be mounted on drones to capture
thermal images to reconstruct thermal 3D models. A well-developed thermal map model can represent
building-envelope thermal information for energy audits. However, the result of using thermal images for
model reconstruction is not satisfactory because thermal image photogrammetry, especially in large-area
surveying, faces the following challenges. First, thermal camera resolution is usually low, which makes
common feature detection (the same attributes that two different images share) between thermal images
more difficult. This subsequently makes model reconstruction more difficult [109][48]. Second, the angle
between the normal ray (a ray that is perpendicular to the surface) of the cameras and the normal ray of the
object influences thermal image accuracy in that different images might assign different thermal values to
the same object. Third, the trade-off between efficiency and accuracy is more important for thermal model
reconstruction. Thermal images are supposed to be taken as quickly as possible because the building surface
temperatures can change rapidly. However, surface temperature changes during the survey affect model
reconstruction accuracy, and capturing images too quickly can lead to blurry thermal images. Fourth, a high-quality point cloud model for a flat roof, which
normally has a single thermal value, is difficult to reconstruct since the photogrammetry algorithm cannot
detect common features between the thermal images that contain large uniform areas [30][31].
Based on these challenges, many studies have explored methods to improve the performance of
thermal 3D model reconstruction [110][111]. However, these studies have not taken full advantage of high-
resolution RGB images that can improve the performance of common feature detection for model
reconstruction and avoid the accuracy-efficiency dilemma to some extent. Therefore, I propose a two-part
thermal and RGB data fusion approach to first make full use of high-resolution RGB images to reconstruct
3D point cloud models and then project information from thermal images onto the RGB 3D models. In this
way, a high-quality thermal 3D mapping model for a large area can be obtained instead of the use of only
lower-resolution thermal images to reconstruct a thermal mapping model. This proposed model
simultaneously contains both RGB and thermal information, which is more efficient and effective in
obtaining energy audits. To understand how the data fusion approach performs under varying conditions, I
designed four experiments with four test factors including (1) different camera altitudes (60 meters and 35
meters), (2) distinct camera angles (45 degrees and 30 degrees), (3) diverse flight path designs (mesh grid
and Y path), and (4) various building styles (buildings on a university campus and buildings in a central
city area).
To evaluate whether different test factors would influence the data fusion approach, this study
sought to verify the performance of the proposed data fusion approach from different perspectives. The
study was designed to answer the following questions. First, what is the data fusion performance when
using different UAS flight configurations, such as camera altitudes and angles, and flight paths? Second,
how does the data fusion performance differ when surveying various building styles? For example, building
styles would include districts constructed with detached modern buildings on campuses versus densely built
European cities with traditional buildings. Third, how does the data fusion performance differ when
investigating the diverse components of thermal mapping models? For example, roofs and façades would
represent different component types.
4.2 Research Methods
4.2.1 Research Architecture
This study includes four steps: (1) data collection, (2) camera calibration and image registration,
(3) data fusion, and (4) evaluation. Figure 2 illustrates the research method workflow.
Figure 2 Research method workflow
4.2.2 Step 1: Data Collection
4.2.2.1 Brief Summary of the Deployed Hardware and Software
In this study, the UAS is composed of three parts: (1) the main body, (2) a data collection system,
and (3) the controller. The main body consists of an aircraft and a gimbal. The gimbal can continuously
yaw on the vertical axis, pitch on the lateral axis, and roll on the longitudinal axis. There is also a connection
between the gimbal and the data collection system. The FLIR DUO Pro R, a package that integrates thermal
and optical cameras, is used in the data collection system. The benefit is that the gap between the thermal lens
and the RGB lens in the FLIR Duo Pro R is smaller than when using two separate cameras. Both thermal and RGB
images can be obtained within one flight task, which can reduce the uncertainties for image registration.
The RGB image has a resolution of 4000 × 3000 pixels, and the thermal image has a resolution of 640 × 512 pixels.
Radiometric JPEG is the data format that provides both thermal data and RGB data. The third part is the
controller. It can remotely control the GPS systems in both the aircraft and cameras. The GPS system is not
only used to guide automatic flight, but also to record images’ georeferencing information that enables the
photogrammetry algorithm to precisely re-calculate the images’ positions in the 3D space and accurately
recreate 3D models [112][113]. I selected “DJI GS Pro” software for data collection, which was used on a
tablet to plan the flight and control the drone. The program’s automatic flight planning and flight control
allows the drone pilot to efficiently collect data.
4.2.2.2 Tested Factors During Data Collection
There were four factors of data collection: (1) different camera altitudes, (2) distinct camera angles,
(3) diverse flight paths, and (4) various types of building styles. First, 35 meters and 60 meters were tested
as camera altitudes to examine how such altitude influences the data fusion approach. To determine the
appropriate test flight altitude, the tallest building in the area was used as a calculation base for the minimum
altitude. The tested altitude, 35 meters, was 1.5 times the highest building, while 60 meters was 3 times the
highest building over the tested area. Second, two typical camera angles of 45 and 30 degrees were
evaluated. The angles were defined as the angle between the camera view line and the vertical axis (see
step one in Figure 2). A third factor was the definition of the flight path, which included the mesh grid flight
path and the Y path. The mesh grid is a commonly used flight path in which the drone flies facing four
orthogonal directions: north, south, east, and west. The Y path was designed to save data
collection time because, in this case, the drone only flew in three directions including north, southeast, and
southwest [30]. Finally, the last comparison in the data collection step was between various building styles.
One was a university campus in Germany where modern buildings, separated by lawn and roads were not
close in proximity. The contrasting scene was a dense city area in Germany where traditional European
buildings were located close together.
Images were taken under a variety of conditions with the abovementioned four factors, and the
proposed data fusion approach was conducted using these images. Four types of experiments driven by the
four factors were introduced in the Experiment Analysis section of this study, in which I compared and analyzed
the performance of the proposed data fusion approach.
4.2.3 Step 2: Camera Calibration and Registration
There were two required processes to be completed before the deployment of the proposed data
fusion approach: camera calibration and image registration.
First, distortions on thermal and RGB images affect the quality of mapping. Therefore, using
camera calibration to solve the distortion problem is necessary. Assume that $[X, Y, Z, 1]$ is a world point in
3D space and $[x, y, 1]$ is an image point in an image. Equation 1 explains the projection from the 3D point
to the 2D point, where $W$ is called the scale factor and $p$ is called the camera matrix. Camera calibration is the
process of estimating $p$, the camera matrix. In Equation 2, $p$ can be rewritten in terms of $[R \mid t]$, the extrinsic
rotation and translation, a linear transformation matrix denoting the rotation and translation from the
3D world coordinate system to the 3D camera coordinate system, and $K$, the intrinsic matrix, which denotes the
transformation from the 3D camera coordinate system to the 2D image coordinate system [114][115]. The
intrinsic matrix can also be written as Equation 3, where $\alpha = kf$ and $\beta = lf$ ($f$ being the focal length).
The symbols $k$ and $l$ are used to handle the pixel unit transformation, and the symbol $\theta$ represents the angle
between the x-axis and the y-axis of the image coordinate system, since the x-axis is not always
perpendicular to the y-axis in the image coordinate system.

\[ W \, [x, \; y, \; 1] = [X, \; Y, \; Z, \; 1] \, p \]
Equation 1

\[ p = [R \mid t] \, K \]
Equation 2

\[ K = \begin{bmatrix} \alpha & -\alpha \cot\theta & x_0 \\ 0 & \beta / \sin\theta & y_0 \\ 0 & 0 & 1 \end{bmatrix} \]
Equation 3
Images sometimes display objects incorrectly because the camera matrix does not consider
tangential and radial distortion. This problem can be solved with the use of a chessboard when collecting
calibration images. The camera's RGB and thermal image distortion errors can then be corrected before
mapping. When collecting calibration images with the chessboard, it is important to take a set of images
that covers the entire field of view, and especially the edges of the view (see Figure 3 (a) and (b): the
process of calibrating the camera with selected images). Since different materials emit different levels
of infrared light for the thermal camera to capture, the chessboard used for thermal image calibration is
different: it is constructed from a hollow tin foil sheet. The hollow areas appear lighter than the solid areas,
so the chessboard can be detected (see Figure 3 (a) camera calibration on thermal images, and (b) camera
calibration on RGB images).
Figure 3 Camera calibration and image registration
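To make this calibration step concrete, the following minimal Python sketch uses OpenCV to detect chessboard corners in a set of calibration images, estimate the intrinsic matrix and distortion coefficients, and undistort a flight image. The board dimensions and file paths are illustrative assumptions, not the exact configuration used in this study.

# Minimal chessboard calibration sketch (OpenCV); board size and paths are placeholders.
import glob
import cv2
import numpy as np

BOARD = (9, 6)  # number of interior chessboard corners (columns, rows); assumed value
template = np.zeros((BOARD[0] * BOARD[1], 3), np.float32)
template[:, :2] = np.mgrid[0:BOARD[0], 0:BOARD[1]].T.reshape(-1, 2)  # planar board coordinates

obj_pts, img_pts = [], []
for path in glob.glob("calibration_images/*.jpg"):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, BOARD)
    if found:
        obj_pts.append(template)
        img_pts.append(corners)

# Estimate the intrinsic matrix K and the distortion coefficients, then undistort an image.
ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(obj_pts, img_pts, gray.shape[::-1], None, None)
undistorted = cv2.undistort(cv2.imread("flight_images/img_0001.jpg"), K, dist)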
Second, image registration should overlap each pixel in thermal images onto its corresponding
RGB image pixel (see Figure 3 (c)). Although the FLIR Duo Pro R used for this research had built-in dual
cameras (thermal and optical lenses), allowing thermal and RGB images to be taken
simultaneously from the same angle and altitude, a gap remained between the two lenses. Thus, registration
between thermal and RGB images was necessary. Image registration also ensured accuracy when fusing
the corresponding RGB information with the thermal information during the data fusion process. To
complete the registration, the chessboard used for camera calibration was also used to detect common
features on both the thermal and RGB images, as shown in Figure 3 (c). These blue dots are features
detected on paired thermal and RGB images and used for image registration.
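The registration step can be sketched as follows, assuming the chessboard corners are visible in a paired thermal and RGB frame: the matched corners define a homography that warps the thermal image onto the RGB pixel grid. The file names are placeholders, and this is one possible implementation rather than the exact pipeline used here.

# Sketch of thermal-to-RGB registration for one paired frame; inputs are illustrative.
import cv2
import numpy as np

BOARD = (9, 6)
rgb = cv2.imread("pairs/rgb_0001.jpg")
thermal = cv2.imread("pairs/thermal_0001.tiff", cv2.IMREAD_UNCHANGED)
thermal_8bit = cv2.normalize(thermal, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

ok_rgb, pts_rgb = cv2.findChessboardCorners(cv2.cvtColor(rgb, cv2.COLOR_BGR2GRAY), BOARD)
ok_th, pts_th = cv2.findChessboardCorners(thermal_8bit, BOARD)

if ok_rgb and ok_th:
    # The matched corners define the homography from thermal pixels to RGB pixels.
    homography, _ = cv2.findHomography(pts_th, pts_rgb, cv2.RANSAC)
    thermal_on_rgb = cv2.warpPerspective(thermal, homography, (rgb.shape[1], rgb.shape[0]))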
4.2.4 Step 3: Data Fusion Process
There are a variety of well-established software programs for processing images and reconstructing
3D mapping models, including Pix4D, Agisoft, and DroneDeploy. However, none of them provide the
needed API for supporting users’ extended developments, such as extracting intermediate parameters for
3D-pose estimation. ContextCapture recorded all intermediate files for extended programming, including
a list of matched images, camera focal length, and camera sensor size. These files and parameters were
fundamental to allow the replacement of RGB images with thermal images and to accurately project pixels
from thermal images onto the model.
As previously mentioned, RGB images had higher resolution than thermal images, as well as other
advantages. Therefore, RGB images were used to construct the 3D mapping model and as such, the points
in the point cloud model included more accurate spatial information and clearer surface color information
of objects. ContextCapture could detect all potential common feature points from sequential images to
reconstruct a 3D RGB model. The created point cloud consisted of many points with specific spatial, as
well as color information. The created mesh model was a consistent polygonal model converted from the
given point cloud model by using a triangulation algorithm. With the point cloud model, it was easy to store,
retrieve, and analyze information; however, gaps existed between neighboring points. The mesh model can
fill these gaps by using depth buffering. As the pixels were projected onto the RGB model, it was
necessary to use an algorithm to search for a point’s nearest neighbors. A K-d tree is a data structure that
partitions and organizes points in a k-dimensional space. When a ray projected from the pixels in the thermal
image reached the 3D RGB model, it could find the nearest neighbors and then add thermal information
onto the RGB information. As the detailed figure in the circle in Figure 4 shows, the blue dots represent
points in the RGB point cloud model, and the red point demonstrates where the pixels in the thermal image
were projected and defined as the projected point. The projected point searched for its nearest point, the
yellow dot in the point cloud model, which was defined as the target point. If the projected point found its
nearest point—the target point, it would fuse the target points’ RGB and thermal information. Each
projected ray can locate its nearest point and assign a thermal value to its first nearest point. This step was
repeated for all pixels in all thermal images so that every point in the RGB model was assigned several
thermal values.
Figure 4 Illustration of the data fusion process
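A minimal sketch of this nearest-neighbor fusion step is shown below, assuming the projected thermal pixels and the RGB point cloud are already available as coordinate arrays; the array file names and the 0.3 m threshold are illustrative assumptions.

# Every projected thermal pixel searches its nearest RGB point with a k-d tree and, if close
# enough, contributes its thermal value to that target point. File names are placeholders.
import numpy as np
from scipy.spatial import cKDTree

rgb_points = np.load("rgb_point_cloud_xyz.npy")      # (n, 3) coordinates of the RGB point cloud
proj_points = np.load("projected_pixels_xyz.npy")    # (p, 3) coordinates of projected thermal pixels
proj_temps = np.load("projected_pixels_temp.npy")    # (p,)   thermal value carried by each pixel

tree = cKDTree(rgb_points)                           # spatial index over the RGB point cloud
dist, target_idx = tree.query(proj_points, k=1)      # nearest target point for every projected point

threshold = 0.3                                      # distance threshold in metres; None disables it
keep = dist <= threshold if threshold is not None else np.ones(len(dist), dtype=bool)

# Accumulate every accepted thermal value on its target point for later averaging.
fused_values = [[] for _ in range(len(rgb_points))]
for point_id, temperature in zip(target_idx[keep], proj_temps[keep]):
    fused_values[point_id].append(temperature)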
Assume there are $m$ RGB images $\{V^{(1)}, V^{(2)}, \ldots, V^{(m)}\}$ and corresponding $m$ thermal images
$\{T^{(1)}, T^{(2)}, \ldots, T^{(m)}\}$. Assume there are $n$ points reconstructed from the RGB images in a 3D point
cloud file, noted $\{P^{(1)}, P^{(2)}, \ldots, P^{(n)}\}$, and every point in the 3D point cloud contains pixel information
from several RGB images and their corresponding thermal images, noted as point-image pairs
$\{P^{(1)}: (V_1, T_1), \ldots, (V_{k^{(1)}}, T_{k^{(1)}})\}$, $\{P^{(2)}: (V_1, T_1), \ldots, (V_{k^{(2)}}, T_{k^{(2)}})\}$, $\ldots$,
$\{P^{(n)}: (V_1, T_1), \ldots, (V_{k^{(n)}}, T_{k^{(n)}})\}$. Since every point has a different number of associated
images, $k^{(1)}, k^{(2)}, \ldots, k^{(n)}$ are used to distinguish these numbers. Additionally, the associated images
for each point can begin with any image index.

Figure 4 illustrates the data fusion process. As all images are run through the data fusion process, the color
information of each point is fused with the relevant thermal information from the source images. Each point then
receives several thermal values from different relevant thermal images; these values are supposed to be the same, as
the point represents an identical object. $\{P^{(1)}: (V_1, T_1), \ldots, (V_{k^{(1)}}, T_{k^{(1)}})\}$ is an example of
one point in the blue point cloud model. $P^{(1)}$ is reconstructed from $k^{(1)}$ RGB images,
represented as $\{V_1, \ldots, V_{k^{(1)}}\}$. Because all RGB images are aligned with the thermal images
$\{T_1, \ldots, T_{k^{(1)}}\}$, the point receives different thermal values from the relevant thermal images. These
thermal values are pixels in the thermal images, noted as $\{pix_{P^{(1)}}^{(1)}, \ldots, pix_{P^{(1)}}^{(k^{(1)})}\}$.
Therefore, the mean value, noted as $\bar{x}_{P^{(1)}}$, and the standard deviation (SD), noted as $SD_{P^{(1)}}$, of the
thermal values received by $P^{(1)}$ are calculated as elaborated in Equation 4 and Equation 5. The mean value
$\bar{x}_{P^{(1)}}$ is set as the point's finalized thermal value (FTV). Additionally, as discussed, the various relevant
thermal values are supposed to be the same; in other words, $\{pix_{P^{(1)}}^{(1)}, \ldots, pix_{P^{(1)}}^{(k^{(1)})}\}$
are supposed to be identical, and $SD_{P^{(1)}}$ is therefore supposed to be 0.

\[ \bar{x}_{P^{(1)}} = \frac{\sum_{i=1}^{k^{(1)}} pix_{P^{(1)}}^{(i)}}{k^{(1)}} \]
Equation 4

\[ SD_{P^{(1)}} = \sqrt{\frac{\sum_{i=1}^{k^{(1)}} \left( pix_{P^{(1)}}^{(i)} - \bar{x}_{P^{(1)}} \right)^2}{k^{(1)}}} \]
Equation 5

Finally, the mean values, i.e., the points' finalized thermal values (FTVs) $\{\bar{x}_{P^{(1)}}, \bar{x}_{P^{(2)}}, \ldots, \bar{x}_{P^{(n)}}\}$,
and the SD values $\{SD_{P^{(1)}}, SD_{P^{(2)}}, \ldots, SD_{P^{(n)}}\}$ for all $n$ points in a point cloud file were calculated,
and the frequency distribution of the SD values was then plotted in a graph.
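The per-point computation in Equations 4 and 5 can be sketched as follows; the input files holding the fused (target index, thermal value) pairs are hypothetical names used only for illustration.

# Sketch of Equations 4 and 5: the FTV of a point is the mean of the thermal values it
# received, and the per-point SD measures how much those values disagree.
import numpy as np

target_idx = np.load("fusion_target_index.npy")      # target point index per accepted thermal pixel
temps = np.load("fusion_thermal_values.npy")         # corresponding thermal values
n_points = int(target_idx.max()) + 1

count = np.bincount(target_idx, minlength=n_points).astype(float)
sum_t = np.bincount(target_idx, weights=temps, minlength=n_points)
sum_t2 = np.bincount(target_idx, weights=temps ** 2, minlength=n_points)

with np.errstate(invalid="ignore", divide="ignore"):
    ftv = sum_t / count                              # Equation 4: mean thermal value per point
    variance = np.maximum(sum_t2 / count - ftv ** 2, 0.0)
    sd = np.sqrt(variance)                           # Equation 5: population SD per point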
In order to comprehensively investigate the feasibility of the proposed data fusion approach, the
data fusion process was run using data sets with distinct impact factors as discussed in Research
Architecture Section and deployed using the approach described in Data Fusion Process Section. The
different cases included altitude comparison (60 m and 35 m), camera angle comparison (45 degrees and
30 degrees), flight path design comparison (mesh grid and Y path) and building architectural style
comparison (campus area with sparsely spaced buildings and a densely packed city area). The results
included corresponding SD-value distribution for each experiment.
4.2.5 Step 4: Evaluation of the Data Fusion Approach
The performance of the point data fusion approach described in Data Fusion Process was evaluated
based on multiple SD distributions obtained under varying conditions. I evaluated the performance from
four perspectives: (1) the simple average value, (2) the earth mover’s distance, (3) the adjustable parameters,
and (4) the evaluation of components.
4.2.5.1 Simple Average Value (SAV)
SD values $\{SD_{P^{(1)}}, SD_{P^{(2)}}, \ldots, SD_{P^{(n)}}\}$ for all $n$ points in a point cloud model were calculated,
and a frequency distribution of the SDs for one experiment was generated in a graph. Next, the
performance of this approach was evaluated based on the generated distribution. From the first evaluation
perspective, the total average of all SD values was calculated, as shown in Equation 6. The SD values are
supposed to be zeros, so the total average is also supposed to be zero. However, since unintended errors were
introduced in the fusion process, the SD values were not always zeros and, correspondingly, the total average
was a non-zero number.

\[ \sigma = \frac{\sum_{i=1}^{n} SD_{P^{(i)}}}{n} \]
Equation 6
Therefore, an ideal situation was defined in which the total average is supposed to be zero, noted as
$\sigma_{ideal}$. In addition, a realistic situation was defined in which one experiment is conducted under one
condition, defined as a test situation and noted as $\sigma_{test}$. Comparing $\sigma_{test}$ to $\sigma_{ideal}$, a greater $\sigma_{test}$
demonstrates that the method that fuses RGB information with thermal information in a model performs
worse under the corresponding conditions.
4.2.5.2 Earth Mover’s Distance (EMD)
Given a fixed total data set range in a distribution, the number of equal parts into which the data
set can be divided is flexible. These equal parts are known as bins or class intervals. Note that no overlap
is allowed between the bins, and any observation occupies one and only one bin.
In this evaluation, I defined the frequency SD-value distributions under varying conditions as test
distributions, while I defined distributions that consisted of ideal SD values that were supposed to be 0 for
all points as an ideal distribution. Figure 5 (a), (b), and (c) are examples of test distributions, while Figure
5 (d) is an example of an ideal distribution. The horizontal axes represent all observed possible SD values
in a distribution, and the vertical axes represent the number of points with the given values. As Figure 5 (a),
(b), and (c) shown, there was the same test distribution of SD values in the experiment, but with distinct
bin sizes. The statistics became more precise as the bin size decreased. There were 84 points with an SD
value of 0, which is more notable in Figure 5 (c) than in Figure 5 (a), as the blue arrows point outward,
because there is a lower bin size in Figure 5 (c) than in Figure 5 (a).
The distribution comparison metric introduced in this section is the earth mover’s distance (EMD)
[116], also known as the Wasserstein metric. This evaluation measures the discrepancy (similarity) between
two distributions that requires they have the same frequency (probability) space. In other words, the two
distributions should have the same number of bins. If the observed SD values in the test distribution have
a certain number of points with the given observed SD values, these should also have a non-zero number
of points. However, since the ideal distribution was comprised of all ideal SD values, only one observed
SD value—which was 0—has a certain number of points. The number of points having other observed SD
values except 0 was null. The red arrow shown in Figure 5 (c) indicates a red bar demonstrating the number
of points having an SD value of 0. Based on this principle, the ideal distribution needed to be slightly
adjusted to ensure that the ideal distribution and the test distribution have the same frequency space or, in
other words, the same number of bins. The adjustment was done by adding one to the number of points of
the given SD value in the ideal distribution, except when the SD value was 0. Possible SD values have at
least one point (see Figure 5 (d)). Thus, the number of points that obtained possible SD values (except the
SD value of 0) are all 1s instead of 0s. This approach is similar to the statistical method of add-one
32
smoothing (also called Laplace smoothing) [59][60]. Varying conditions yielded varying test SD-value
distributions, which were then compared with the ideal distribution to determine the performances.
Figure 5 Test distribution and ideal distribution
The earth mover’s distance can be seen as the minimum amount of “work” required to transport
the mass of earth in the bins of one distribution to the bins of the other distribution. The “work” is calculated
as the amount of earth weight that must be moved, multiplied by the distance it must be moved.
Assume that there is a test distribution A and an ideal distribution B. A has $m$ bins of observed
SD values, noted as $A = \{(\alpha_1, w_{\alpha_1}), (\alpha_2, w_{\alpha_2}), \ldots, (\alpha_m, w_{\alpha_m})\}$, where $\alpha_i$ is a blue bin and $w_{\alpha_i}$
represents the weight of the mass of earth in the associated bin (see Figure 5 (c)), whereas B has $n$ bins of
observed SD values, $B = \{(\beta_1, w_{\beta_1}), (\beta_2, w_{\beta_2}), \ldots, (\beta_n, w_{\beta_n})\}$, where $\beta_j$ represents a red bin
and $w_{\beta_j}$ represents the capacity of the red bin for the mass of earth. Let $D = [d_{i,j}]$ be the distance of
transporting earth between bin $\alpha_i$ and bin $\beta_j$. Next, let $F = [f_{i,j}]$ be a set of flows, where $f_{i,j}$ is the weight
of earth moved between $\alpha_i$ and $\beta_j$. Therefore, the objective function that minimizes the overall cost is
defined as the product of the distances $d_{i,j}$ and the flows $f_{i,j}$, as noted in Equation 7.

\[ \min \sum_{i=1}^{m} \sum_{j=1}^{n} f_{i,j} \, d_{i,j} \]
Equation 7

Equation 7 is subject to the following constraints, from Equation 8 to Equation 11. First,
earth can only be transported from the full blue bins of distribution A to the empty red bins of distribution B,
not vice versa, as demonstrated in Equation 8.

\[ f_{i,j} \geq 0, \quad 1 \leq i \leq m, \; 1 \leq j \leq n \]
Equation 8

Second, the amount of earth moved from a bin of A to a bin of B should not exceed the
capacity of the corresponding bin of A or the corresponding bin of B, as shown in Equation 9 and Equation 10.

\[ \sum_{j=1}^{n} f_{i,j} \leq w_{\alpha_i}, \quad 1 \leq i \leq m \]
Equation 9

\[ \sum_{i=1}^{m} f_{i,j} \leq w_{\beta_j}, \quad 1 \leq j \leq n \]
Equation 10

Next, the total amount of earth moved from all bins of A to all bins of B should equal the
minimum of the total capacities of the bins of A and of B, as shown in Equation 11.

\[ \sum_{i=1}^{m} \sum_{j=1}^{n} f_{i,j} = \min \left\{ \sum_{i=1}^{m} w_{\alpha_i}, \; \sum_{j=1}^{n} w_{\beta_j} \right\} \]
Equation 11

Finally, the optimal flow $F = [f_{i,j}]$ can be found by linear optimization, and the EMD is
calculated as the weighted average of the moving work normalized by the total flow, as shown in
Equation 12. A higher score means that more earth must be moved or, in other words, that there are
larger differences between the two distributions.

\[ EMD(A, B) = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} f_{i,j} \, d_{i,j}}{\sum_{i=1}^{m} \sum_{j=1}^{n} f_{i,j}} \]
Equation 12
EMD is a commonly used distribution comparison metric. A lower score implies that the test
distribution had more similarities to the ideal distribution and the data fusion approach performs better
under corresponding conditions with related factors.
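A sketch of how the SAV and an EMD-style score could be computed is given below. It uses SciPy's one-dimensional Wasserstein distance as a convenient stand-in, which normalizes the bin weights internally and therefore simplifies the partial-matching formulation above; the bin size and the input file name are assumptions.

# Sketch of the SAV (Equation 6) and an EMD-style comparison between the test SD
# distribution and the add-one-smoothed ideal distribution.
import numpy as np
from scipy.stats import wasserstein_distance

sd = np.load("per_point_sd.npy")                        # per-point SD values from the previous step
valid_sd = sd[~np.isnan(sd)]
sav = valid_sd.mean()                                   # Equation 6

bin_edges = np.arange(0.0, valid_sd.max() + 0.01, 0.01) # bin size 0.01, as in Figure 7
test_weights, edges = np.histogram(valid_sd, bins=bin_edges)
centers = 0.5 * (edges[:-1] + edges[1:])

ideal_weights = np.ones_like(test_weights)              # add-one (Laplace) smoothing for empty bins
ideal_weights[0] = test_weights.sum()                   # all ideal mass sits at SD = 0

emd_score = wasserstein_distance(centers, centers,
                                 u_weights=test_weights, v_weights=ideal_weights)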
4.2.5.3 Parameter Evaluation
The third evaluation was regarding parameter examination in the data fusion approach. Considering
that a point cloud file with 80–120 million points would consume 2.5–3.8 GB, I down-sampled the
experiment files so that the minimum spacing between points was 0.5 m.
As previously described, pixels in the thermal images were projected onto the 3D model through rays
searching their nearest points in the point cloud model. For instance, as demonstrated by the detailed figure
in the circle in Figure 4, the red point shows where the pixel in the thermal image was projected, defined as a
projected point. This projected point searched its nearest point, the yellow dot in the point cloud model,
defined as the target point. The RGB information in the target point was fused with the thermal information
in the projected point. It was important to investigate the distance between projected points and their target
points. In some cases, the distance between them was too far to meet the accuracy requirement. For example,
one projected ray might have searched its nearest point with a distance greater than 0.5m (the down
sampling distance). A threshold was set, and if a certain number of points were searched with a distance
greater than the threshold, the result was not acceptable. Therefore, the influence of the distance threshold
between the projected points and their target points could be investigated. In this study, test cases were set
as 0.1 m, 0.2 m, 0.3 m, 0.4 m, 0.5 m, and no threshold because the distances of each point were down
sampled to 0.5 m. If the distances exceeded the threshold such as 0.1 m, 0.2 m, 0.3 m, 0.4 m, and 0.5 m,
the projected points were not added onto any target points. No threshold meant that as long as the projected
points found their nearest target points, they fused their thermal information with the target points' RGB
information.
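As an illustration of the down-sampling and the threshold sweep described in this section, the following sketch uses Open3D, one possible point-cloud tool rather than the software used in this study, to down-sample a point cloud to roughly 0.5 m spacing and then counts, for each candidate threshold, how many projected points would be accepted; the file names are placeholders.

# Down-sample the RGB cloud to ~0.5 m spacing and sweep the candidate distance thresholds.
import numpy as np
import open3d as o3d
from scipy.spatial import cKDTree

cloud = o3d.io.read_point_cloud("campus_rgb_model.ply")
cloud_ds = cloud.voxel_down_sample(voxel_size=0.5)       # roughly 0.5 m spacing between points
points = np.asarray(cloud_ds.points)

proj_points = np.load("projected_pixels_xyz.npy")        # projected thermal pixels, as in Step 3
dist, target_idx = cKDTree(points).query(proj_points, k=1)

for threshold in (0.1, 0.2, 0.3, 0.4, 0.5, None):
    keep = dist <= threshold if threshold is not None else np.ones(len(dist), dtype=bool)
    # ... rerun the fusion with this subset, then recompute the SAV and EMD values ...
    print(f"threshold={threshold}: {keep.sum()} projected points accepted")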
4.2.5.4 Components Evaluation
The fourth evaluation was intended to understand how the data fusion approach performed with
diverse components or element subgroups in the 3D point cloud model. As there were different components
in the outdoor scenes, a 3D point cloud was segmented into various categories, such as façades and roofs.
Since these components had distinct shapes and colors, the components’ reconstruction using RGB images
might perform differently. Additionally, since different components in a 3D model have distinct distances
from the cameras, and various angles exist between the normal ray of the components’ surfaces and the
normal ray of the cameras, sensors might inadvertently capture various thermal values. Dividing 3D models
into subgroups allowed for the investigation of the sensitivity of the data fusion approach performance on
different components of outdoor scenes.
In summary, datasets were tested with various testing factors following each step. The distinct flight
configurations included the altitude (60 meters and 35 meters), camera angle (45 degrees and 30 degrees),
flight path design (mesh grid and Y path) and building style (campus area with sparsely spaced buildings
and a city area with compactly packed buildings).
4.3 Experiment Analysis
This research was conducted on a college campus and a city center in Germany during the winter
since room temperatures are higher and outside ambient temperatures are lower. The average recommended
temperature in Germany is 17°C (63 °F) for indoor spaces. The outdoor temperature when the research was
conducted was -5 °C (23°F) at 7 AM. The experiments conducted in this study are summarized in Table 1
and are abbreviated as (1) “Campuse_45°_Mesh_35m”, (2) “Campuse_45°_Mesh_60m”, (3)
“Campuse_30°_Mesh_60m”, (4) “Campuse_30°_Y_60m”, (5) “City_45°_Mesh_60m”. Experiment 1
compared altitudes in (1) and (2). Experiment 2 compared camera angles in (2) and (3). Experiment 3
compared flight paths in (3) and (4), and experiment 4 compared building styles in (2) and (5).
Table 1 Summary of experiments
Abbreviation of each experiment   Campuse_45°_Mesh_35m   Campuse_45°_Mesh_60m   Campuse_30°_Mesh_60m   Campuse_30°_Y_60m   City_45°_Mesh_60m
Abbr. in Experiment 1             35 meters              60 meters              -                      -                   -
Abbr. in Experiment 2             -                      45°                    30°                    -                   -
Abbr. in Experiment 3             -                      -                      Mesh Grid Path         Y Path              -
Abbr. in Experiment 4             -                      Campus                 -                      -                   City
To compare the data fusion performance in areas of different building styles, I chose a campus area,
where modern buildings are built relatively distant from each other, and an urban area where traditional
European urban buildings are built in proximity. Four buildings were included in the campus area (Figure
6 (a) is an RGB point cloud model, and (b) is a thermal model where all points in the RGB model were
replaced by thermal information). The thermal model is represented by the iron color palette. Purple
represents a lower temperature, while yellow represents a higher temperature. Similarly, four buildings were
included in the city area (Figure 6 (c) is an RGB point cloud model, and (d) is a thermal model based on
the RGB model).
Figure 6 Illustration of thermal models on a campus and in a city area
4.3.1 Experiment One: Altitude Comparison
Experiment one tested the different altitudes of 35 meters and 60 meters. The evaluations, the SAV
and EMD values for the two flight altitudes were calculated and are summarized in Table 2. There are six
rows (row 1, 5, 9, 13, 17, and 21) for the 35 meters altitude and six rows (row 2, 6, 10, 14, 18, and 22) for
the 60 meters altitude. These six rows correspond to six evaluations for different point cloud objects that
include building 1, building 2, building 3, and building 4 (see Figure 6 (a)), all four buildings and all
buildings with their surrounding environments (such as roads, grass, and trees) summarized from subgroup
one to subgroup six (see Table 2). Columns represent the threshold constraining the distance between target
points and projected points. As described in Data Fusion Process Section, pixels in the thermal images were
projected onto the 3D model through rays, which searched for their nearest points in the point cloud model to
fuse with; the distances to these nearest points were calculated during the search. If the
distances exceeded thresholds such as 0.1 m, 0.2 m, 0.3 m, 0.4 m, or 0.5 m, the projected points were not
fused with any target points. Column “N” represents cases in which there was no threshold. The remainder
of the rows was reserved for the additional experiments described in Experiments Two and Three.
Table 2 Statistics of evaluation criteria of different experiments conducted under varying conditions
(For each subgroup, the six SAV columns and the six EMD columns correspond to distance thresholds of 0.1, 0.2, 0.3, 0.4, and 0.5 m, and no threshold (N); SAV values are in °C.)

Subgroup One: Building 1
Row  Experiment               SAV: 0.1 / 0.2 / 0.3 / 0.4 / 0.5 / N            EMD: 0.1 / 0.2 / 0.3 / 0.4 / 0.5 / N
1    Campuse_45°_Mesh_35m     4.964  5.008  5.063  5.074  5.094  5.095        2.476  2.502  2.529  2.534  2.545  2.545
2    Campuse_45°_Mesh_60m     5.575  5.708  5.769  5.799  5.808  5.808        2.778  2.854  2.883  2.899  2.903  2.903
3    Campuse_30°_Mesh_60m     3.265  3.342  3.379  3.397  3.401  3.401        1.632  1.672  1.689  1.698  1.701  1.701
4    Campuse_30°_Y_60m        4.963  4.994  5.026  5.040  4.994  4.963        2.481  2.496  2.514  2.521  2.496  2.481

Subgroup Two: Building 2
Row  Experiment               SAV: 0.1 / 0.2 / 0.3 / 0.4 / 0.5 / N            EMD: 0.1 / 0.2 / 0.3 / 0.4 / 0.5 / N
5    Campuse_45°_Mesh_35m     5.309  5.385  5.457  5.498  5.507  5.511        2.646  2.687  2.724  2.744  2.748  2.752
6    Campuse_45°_Mesh_60m     5.485  5.595  5.686  5.728  5.756  5.758        2.723  2.794  2.843  2.863  2.878  2.879
7    Campuse_30°_Mesh_60m     3.196  3.273  3.324  3.356  3.365  3.366        1.594  1.636  1.661  1.678  1.682  1.682
8    Campuse_30°_Y_60m        4.145  4.206  4.257  4.282  4.286  4.286        2.068  2.103  2.128  2.140  2.145  2.143

Subgroup Three: Building 3
Row  Experiment               SAV: 0.1 / 0.2 / 0.3 / 0.4 / 0.5 / N            EMD: 0.1 / 0.2 / 0.3 / 0.4 / 0.5 / N
9    Campuse_45°_Mesh_35m     3.953  4.001  4.032  4.055  4.062  4.063        1.969  1.997  2.015  2.028  2.030  2.030
10   Campuse_45°_Mesh_60m     4.458  4.571  4.639  4.679  4.685  4.684        2.222  2.282  2.317  2.337  2.341  2.340
11   Campuse_30°_Mesh_60m     2.890  2.936  2.958  2.966  2.968  2.968        1.444  1.468  1.480  1.484  1.484  1.484
12   Campuse_30°_Y_60m        3.802  3.852  3.871  3.882  3.886  3.886        1.898  1.924  1.934  1.939  1.941  1.941

Subgroup Four: Building 4
Row  Experiment               SAV: 0.1 / 0.2 / 0.3 / 0.4 / 0.5 / N            EMD: 0.1 / 0.2 / 0.3 / 0.4 / 0.5 / N
13   Campuse_45°_Mesh_35m     3.446  3.541  3.626  3.653  3.710  3.732        1.715  1.763  1.809  1.822  1.851  1.863
14   Campuse_45°_Mesh_60m     4.381  4.354  4.382  4.394  4.403  4.403        2.170  2.164  2.182  2.184  2.189  2.189
15   Campuse_30°_Mesh_60m     2.784  2.824  2.837  2.845  2.850  2.850        1.378  1.406  1.413  1.417  1.419  1.419
16   Campuse_30°_Y_60m        3.309  3.343  3.385  3.405  3.415  3.416        1.650  1.671  1.693  1.703  1.708  1.709

Subgroup Five: All Buildings
Row  Experiment               SAV: 0.1 / 0.2 / 0.3 / 0.4 / 0.5 / N            EMD: 0.1 / 0.2 / 0.3 / 0.4 / 0.5 / N
17   Campuse_45°_Mesh_35m     4.504  4.558  4.610  4.637  4.654  4.657        2.243  2.274  2.303  2.316  2.324  2.326
18   Campuse_45°_Mesh_60m     5.046  5.148  5.212  5.244  5.256  5.256        2.508  2.570  2.603  2.619  2.625  2.624
19   Campuse_30°_Mesh_60m     3.056  3.116  3.146  3.161  3.165  3.165        1.525  1.558  1.573  1.580  1.583  1.583
20   Campuse_30°_Y_60m        4.159  4.202  4.235  4.251  4.256  4.256        2.076  2.101  2.118  2.126  2.129  2.129

Subgroup Six: All Buildings with Their Surrounding Environment
Row  Experiment               SAV: 0.1 / 0.2 / 0.3 / 0.4 / 0.5 / N            EMD: 0.1 / 0.2 / 0.3 / 0.4 / 0.5 / N
21   Campuse_45°_Mesh_35m     4.062  4.133  4.181  4.216  4.234  4.237        2.024  2.064  2.089  2.108  2.117  2.119
22   Campuse_45°_Mesh_60m     4.805  4.895  4.957  4.990  4.999  4.999        2.386  2.443  2.477  2.493  2.499  2.499
23   Campuse_30°_Mesh_60m     2.945  2.995  3.024  3.039  3.042  3.042        1.469  1.499  1.514  1.522  1.523  1.523
24   Campuse_30°_Y_60m        3.748  3.789  3.813  3.828  3.832  3.832        1.870  1.895  1.907  1.915  1.917  1.917
As described in the research method, SAV and EMD values were calculated based on the SD values
distributions. To intuitively illustrate the distributions, they are plotted in Figure 7. The bin size of the
distributions is 0.01. Figure 7 (a), (b), and (c) show distributions for the experiment,
“Campuse_45°_Mesh_35m”, and Figure 7 (d), (e), and (f) are distributions for the experiment
“Campuse_45°_Mesh_60m.” Figure 7 (g)–(l) are reserved for additional experiments as described in
Experiments Two and Three. All the distributions in Figure 7 pertain to experiments with no distance limitation
between target points and projected points. Other situations, such as thresholds of 0.1–0.5 m, were not plotted. The
distributions in the first column in Figure 7 are for the points belonging to all buildings with their
surrounding environments. The x-axis represents possible SD values, and the y-axis represents the number
of points having the given SD values. Distributions presented in the second column in Figure 7 are for the
same experiments, but in this case the distributions are divided and marked by various colors based on the
point properties belonging to different subgroups including building 1, building 2, building 3, building 4,
and the surrounding environment. The x-axes and y-axes for distributions in the second column have the
same meaning as the distributions in the first column. Distributions in the third column in Figure 7 are
separately plotted based on different subgroups instead of accumulatively stacked in one distribution similar
to distributions in the second column. The y-axes are converted to density of points instead of number of
points, as points for different subgroups were not balanced. The x-axes still represent possible SD values.
Since distributions of points belonging to different subgroups are plotted separately in figures in the third
column, the SAVs of the SD values for different distributions based on the subgroups they belong to are
separately calculated. These SAVs are shown as vertical lines perpendicular to the x-axis, and they are
marked with various colors based on the subgroups.
Figure 7 The distributions of SD values for different experiments (bin size = 0.01)
First, when comparing Figure 7 (a), (b), and (c) with Figure 7 (d), (e), and (f), it was found that the
experiment with a flight altitude of 35 meters had a more concentrated distribution than the experiment
with a flight altitude of 60 meters. There was only one peak in the distribution in Figure 7 (a) but three
peaks in the distribution in Figure 7 (d), which means that the SD value distribution in the former
experiment was more stable than the distribution in the latter experiment. Second, regardless of the
subgroup and the threshold configured for the distance between the target point and projected point, the
SAV and EMD values of the former experiment were always smaller than those of the latter experiment, as can be
seen in Table 2. Additionally, the vertical lines representing SAVs for distinct subgroups were all closer to
the y-axis in the former experiments than in the latter experiments (see Figure 7 (c) and Figure 7 (f)). Third,
the SAV and EMD values both decreased as the threshold shrank in both the former and latter experiments.
4.3.2 Experiment Two: Camera Angle Comparison
Distinct camera angles of 45 degrees and 30 degrees were tested in experiment two. The evaluations
of SAV and EMD values for these two angles were calculated and summarized in Table 2. There are six
rows (row 2, 6, 10, 14, 18, and 22) for the 45 degrees experiment and six rows (row 3, 7, 11, 15, 19, and
23) for the 30 degrees experiment. Figure 7 (d), (e), and (f) show the distributions for the experiment
“Campuse_45°_Mesh_60m,” and Figure 7 (g), (h), and (i) show the distributions for the experiment
“Campuse_30°_Mesh_60m.”
First, in comparing Figure 7 (d), (e), and (f) with Figure 7 (g), (h), and (i), I can observe that the
45-degree camera angle experiment has a more scattered distribution than the 30-degree camera angle experiment
when other flight configurations were fixed. Additionally, distribution in the latter experiment is obviously
closer to the y-axis than the distribution in the former experiment, which means the number of points that
obtained a lower SD value in the latter experiment is greater than the number of such points in the former
experiment. Second, regardless of the subgroup and the threshold configured for distance between target
point and projected point, the SAV and EMD values of the former experiment were always greater than those of the
latter experiment, according to Table 2. Third, the vertical lines representing SAVs for distinct subgroups
are all further away from the y-axis in the former experiments than in the latter experiments (see Figure 7
(f) and Figure 7 (i)). Specifically, the vertical lines in the latter experiment are more concentrated than those
in the former experiment. Fourth, the SAV and EMD values both decreased as the threshold shrank in both
the former and the latter experiments. This phenomenon was similar to that observed in Experiment One.
4.3.3 Experiment Three: Flight Path Comparison
Experiment three tested the diverse flight paths and included a cross-mesh flight path and a Y flight
path. The evaluations of SAV and EMD values for these two paths were calculated and summarized in
Table 2. There are six rows (row 3, 7, 11, 15, 19, and 23) for the cross-mesh flight and six rows (row 4, 8,
12, 16, 20, and 24) for the Y flight path. Figure 7 (g), (h), and (i) show the distributions for the experiment
“Campuse_30°_Mesh_60m,” and Figure 7 (j), (k), and (l) show the distributions for the experiment
“Campuse_30°_Y_60m.”
First, when I compare Figure 7 (g), (h), and (i) with Figure 7 (j), (k), and (l), it is shown that the
experiment with a mesh grid flight path had a more concentrated distribution than the experiment with a Y
flight path when other flight configurations were fixed. The distribution in the latter experiment was very
unstable. Some subgroup distributions had two or three peaks. Second, regardless of the subgroup and the
threshold configured for distance between the target point and projected point, the SAV and EMD values
of the former experiment were always smaller than those of the latter experiment, as shown in Table 2. Third, as
shown in Figure 7 (i) and Figure 7 (l), the vertical lines representing SAVs for distinct subgroups are all
closer to the y-axis in the former experiments than in the latter experiments. Fourth, the same phenomenon
of a decrease in SAV and EMD values as the threshold shrank was also observed in this experiment.
4.3.4 Experiment Four: Building Style Comparison
In experiment four, buildings on a campus area and buildings in an urban area were tested.
Evaluations of the SAV and EMD values for these two building styles and densities were calculated and
summarized in Table 2 and Table 3. The experiment for the campus area, “Campuse_45°_Mesh_60m,” is
summarized in Table 2, and the experiment for the urban area, “City_45°_Mesh_60m,” is summarized in
Table 3. There are six rows (row 2, 6, 10, 14, 18, and 22 in Table 2) for the campus area and six rows (row
1, 2, 3, 4, 5, and 6 in Table 3) for the urban area. Data fusion performance evaluations are summarized in
the different tables, because buildings in the city area and buildings in the campus area have distinct
architectural styles. There were also four buildings investigated as targets in the city area. Similarly, the six
rows respectively correspond to the six evaluations of different point cloud subgroups, building 1, building
2, building 3, building 4, all buildings, and all buildings with their surrounding environments (such as roads,
grass, and trees). These four buildings are illustrated as 3D models in Figure 6 (c) and (d).
Table 3 Statistics of evaluation criteria of different experiments conducted under varying conditions
(The six SAV columns and the six EMD columns correspond to distance thresholds of 0.1, 0.2, 0.3, 0.4, and 0.5 m, and no threshold (N); SAV values are in °C.)

Row  Subgroup                                            SAV: 0.1 / 0.2 / 0.3 / 0.4 / 0.5 / N           EMD: 0.1 / 0.2 / 0.3 / 0.4 / 0.5 / N
1    Building 1                                          3.890  3.968  4.011  4.058  4.038  4.038       1.921  1.977  2.002  2.028  2.017  2.017
2    Building 2                                          3.373  3.438  3.480  3.502  3.505  3.504       1.682  1.722  1.743  1.754  1.756  1.756
3    Building 3                                          3.821  3.891  3.934  3.955  3.954  3.954       1.897  1.941  1.965  1.976  1.976  1.975
4    Building 4                                          3.407  3.457  3.487  3.501  3.505  3.505       1.700  1.731  1.747  1.754  1.756  1.756
5    All Buildings                                       3.616  3.682  3.722  3.747  3.744  3.744       1.803  1.847  1.869  1.883  1.881  1.881
6    All Buildings with Their Surrounding Environment    3.633  3.696  3.733  3.757  3.755  3.754       1.814  1.855  1.877  1.889  1.888  1.888
To intuitively illustrate the distribution characteristics, similar figures are plotted in Figure 8. Figure
8 (a), (b), and (c) show the distributions for experiment “City_45°_Mesh_60m.” These figures were
compared to Figure 7 (d), (e), and (f), the distributions for experiment “Campuse_45°_Mesh_60m.”
Similarly, I only plotted the experiments with no distance threshold between target points and projected points,
and did not plot the other situations, such as thresholds of 0.1–0.5 m.
Figure 8 The distributions of SD values for different experiments
Building 1 on campus and building 1 in the city area were not comparable because they had totally
different architectural styles, and their building volumes were different. Only the performance of the data
fusion approach between the campus and the city in general was studied. When comparing Figure 7 (d), (e),
and (f) with Figure 8 (a), (b), and (c), I observe that the experiment conducted on campus had a more
scattered distribution than the experiment conducted in the city area when other flight configurations were
fixed. SAV and EMD values of the former experiment were always greater than in the latter experiment
when surveying all buildings and all buildings with their surrounding environment, as shown in Table 2.
The vertical lines representing SAVs for distinct subgroups are scattered and further away from the y-axis
in the former experiments than in the latter experiments (see Figure 7 (f) and Figure 8 (c)). When conducting
the experiment in the city area, SAV and EMD values also decreased as the threshold shrank.
4.3.5 Points Illustrated with SD values
Figure 6 shows point cloud models with RGB information and thermal information in a 3D space.
To visualize each point’s SD value in point cloud models, Figure 9 (a) was generated for the campus
experiment and Figure 9 (b) for the city experiment. The color information of each point in the point cloud
model marks the SD value of each point. The dark purple color represents a lower SD value, and the lighter
yellow color represents a higher SD value. Results demonstrate that higher SD values were usually located
on the roofs of buildings. Specifically, points belonging to roof edges had the highest SD values in the
campus experiment (see Figure 9 (a)). In the city experiment, points belonging to roofs and points belonging
to façades both had high SD values. However, in both experiments, points belonging to categories such as
road and grass in the environment subgroup had lower SD values.
After visualizing each point's SD value in the point cloud models, it was found that the SD values of points
belonging to building façades varied significantly for different camera angles. The experiment
with a camera angle of 30 degrees is shown in Figure 9 (c), and the experiment with a camera angle of 45
degrees is shown in Figure 9 (d). Points belonging to the façade had relatively higher SD values in the
experiment with a 45-degree camera angle than in the experiment with a 30-degree camera angle, since the
colors of such points are relatively brighter in Figure 9 (d) than in Figure 9 (c).
Figure 9 Illustration of RGB models and model with SD values for each point
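The visualization in Figure 9 can be reproduced in spirit with the short sketch below, which colors each point by its SD value using a dark-purple-to-yellow colormap; the library choice and the input file names are illustrative assumptions.

# Colour each point by its SD value (dark purple = low SD, yellow = high SD).
import numpy as np
import open3d as o3d
from matplotlib import cm

points = np.load("rgb_point_cloud_xyz.npy")                 # fused point coordinates
sd = np.nan_to_num(np.load("per_point_sd.npy"), nan=0.0)    # per-point SD values

normalized = sd / sd.max() if sd.max() > 0 else sd
colors = cm.viridis(normalized)[:, :3]                      # RGB colours, alpha channel dropped

cloud = o3d.geometry.PointCloud()
cloud.points = o3d.utility.Vector3dVector(points.astype(np.float64))
cloud.colors = o3d.utility.Vector3dVector(colors)
o3d.visualization.draw_geometries([cloud])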
4.4 Discussions
To compare all the results of the different tested factors, the SAV and EMD values for the data
fusion performance are shown in Figure 10. The SAV values (Figure 10 (a)) and EMD values (Figure 10
(b)) for all experiments conducted in this study are plotted respectively. There was a total of five
experiments with the different tested factors: (1) “Campuse_45°_Mesh_35m,” (2)
“Campuse_45°_Mesh_60m,” (3) “Campuse_30°_Mesh_60m,” (4) “Campuse_30°_Y_60m,” and (5)
“City_45°_Mesh_60m.” Each experiment is presented in two lines representing two cases. One represents
the data fusion performance for points consisting of all buildings, which was subgroup 5 in Table 2 for the
campus area and row number 5 in Table 3 for the city area, while the other line represents the data fusion
performance for points consisting of all buildings with the surrounding environment, which was subgroup
6 in Table 2 and row number 6 in Table 3. Therefore, there are ten lines in Figure 10 marked with various
colors.
The y-axis in Figure 10 (a) represents the SAV values, and the y-axis in Figure 10 (b) represents
the EMD values. The x-axes in Figure 10 (a) and (b) both represent the parameter evaluations. There are
six situations on the x-axes, corresponding to the six threshold columns in Table 2 and Table 3, representing distance
thresholds between target points and projected points. "N" means that there was no threshold. As the
projected points found their nearest target points, the thermal information fused with the target points’ RGB
information. The values, emphasized as dots, on these 10 lines refer to the corresponding values in Table 2
and Table 3. The solid lines crossing the dots show a downward trend from a no-distance threshold to a
distance threshold of 0.1 meters. Dashed regression lines for the relative experiment fit the first four dots
counting from left to right, showing a downward trend from a no-distance threshold to a distance threshold
of 0.3 meters.
Figure 10 Comparison between experiments conducted on campus and in city areas
(1) Both SAV and EMD values decrease as the distance thresholds tightened. Figure 10 (a) and (b)
show a similar downward trend for all 10 lines. In particular, the distance threshold of 0.3 marked a turning
point. In other words, lines after the distance of 0.3 dropped suddenly. This trend demonstrates that the data
fusion performance improved as the threshold narrowed, since the SAV and EMD values were smaller.
This downward trend also demonstrates that target points received less perturbation, so the points’ SD
values were more stable. However, if the thresholds were too small, some target points might have received
less thermal information from thermal images for fusion, which made finalized thermal values (FTVs) of
the points inaccurate. Therefore, the selection of an appropriate distance threshold was necessary for
improving the performance of the data fusion approach. As shown in Figure 10, the slope of lines before
0.3 is gentler than the slope of lines after 0.3; as such, 0.3 was the turning point and considered the
appropriate threshold for the distance between target points and projected points.
(2) As described, each experiment contained two lines representing two cases. One case
represented the data fusion performance of points consisting of all buildings, while the other case
represented the same information for all buildings with their surrounding environments. Lines for the
latter case were found to always be below lines for the former case for all experiments. As the visualization
of points’ SD values show in Figure 9, the points belonging to the environment subgroup are darker than
points belonging to the building subgroup. Darker points represent smaller SD values and indeed smaller
errors. This phenomenon demonstrates that categories such as roads and grass in the environment subgroup
performed better for data fusion. The vertical lines in Figure 7 (c), (f), (i), and (l) also confirm this
phenomenon. Vertical lines representing the SAV of points in the environment subgroup are always closer
to the y-axis than the other vertical lines representing the SAV of points in the building subgroups. Thus,
as the points belonging to the environment subgroup were added, the SAV and EMD values reduced. The
potential reason the data fusion performance was better in the environment subgroup is that the shapes
and characteristics of the road and grass categories do not vary much, as opposed to windows and doors, and thus
the subgroup's SD values tended to be stable.
Two lines for “all buildings with surrounding environment” and “all buildings” are closer to each
other for the experiment “City_45°_Mesh_60m” (lighter and darker green lines) than for other experiments
conducted on the campus area (see Figure 10). A potential explanation is that the proportion of points
belonging to the surrounding environment subgroup is greater than the proportion of points belonging to
the building subgroups in all campus experiments. To be precise, the shadow area of the surrounding
environment subgroup was obviously larger than the shadow areas of other building subgroups in Figure 7
(b), (e), (h), and (k). In contrast, the total number of points belonging to the surrounding environment
subgroup and the building subgroups in the city experiment were balanced as shown in the second column
in Figure 8. Thus, even as the points of the surrounding environment subgroup were added, two lines for
“all buildings with surrounding environment” and “all buildings” were equivalent in the experiment
“City_45°_Mesh_60m,” however, sufficient gaps existed between two lines in the experiments conducted
on the campus area.
(3) The experiments compared in Figure 10 were based on four aspects of flight altitudes, camera
angles, flight paths, and building style comparisons. First, higher altitudes could cause a loss in data fusion
performance, demonstrated by the results of the experiment “Campuse_45°_Mesh_60m,” which show
higher SAV and EMD values when compared to the experiment “Campuse_45°_Mesh_35m” (see Figure
10). A potential reason is that the higher flight altitude increased the distance between the cameras and the objects,
and therefore the quality of the captured thermal information declined as well. Second, regarding the
camera angles, 45-degree angles performed worse than 30-degree camera angles in the data fusion
performance, since “Campuse_45°_Mesh_60m” has higher SAV and EMD values than the experiment
“Campuse_30°_Mesh_60m” (see Figure 10). A potential explanation is that the 45-degree camera angle
captured more details on the façade areas than the 30-degree camera angle. Façade components are more
complex than roof components. Therefore, the SD values are not stable for the experiment
“Campuse_45°_Mesh_60m.” The visualizations of SD values also validated the performance of the data
fusion approach on the building façades (see Figure 9 (c) and (d)). The SD values of points belonging to
facades are higher in the “Campuse_45°_Mesh_60m” experiment than in the “Campuse_30°_Mesh_60m”
experiment. Furthermore, the SD value distribution in the experiment “Campuse_45°_Mesh_60m” is more
scattered, while the distribution in the experiment “Campuse_30°_Mesh_60m” is more concentrated (see
Figure 7 (d) and (g)). Additionally, the distribution in the experiment “Campuse_30°_Mesh_60m” is closer
to the vertical axis than the distribution in the experiment “Campuse_45°_Mesh_60m.” Third, regarding
flight paths, the Y path performed worse than the cross-mesh flight path in data fusion, since
“Campuse_30°_Y_60m” had higher SAV and EMD values than the experiment
“Campuse_30°_Mesh_60m” (see Figure 10). In addition, a scattered distribution in Figure 7 (j) for the
experiment “Campuse_30°_Y_60m” is also observed, but distribution for the experiment
“Campuse_30°_Mesh_60m” was more centered (see Figure 7 (g)). Fourth, regarding building style
comparison, the data fusion approach had a better result when mapping a city area than a campus area, since
“Campuse_45°_Mesh_60m” had higher SAV and EMD values than the experiment “City_45°_Mesh_60m”
(see Figure 10). Also, distribution for the experiment conducted in the city area, as shown in Figure 8 (a) is
more concentrated than distribution for the experiment conducted in the campus area (see Figure 7 (d)).
(4) Figure 10 (a) shows that the SAVs in all experiments were around 3 °C to 5 °C, which means
the overall accuracy of the data fusion approach was between 3 °C and 5 °C. The SAVs for the
“Campuse_30°_Mesh_60m” experiment are the lowest in this study (see Figure 7 (i)); the accuracy of the
data fusion approach in that experiment is around 2.8 to 3.4 °C. The accuracy of my approach does not reach that
of the study by Lin et al. (2019), in which a data fusion approach was implemented for building façades with an
accuracy of around 1.5 °C [90]. However, my approach maps a larger survey area rather than just building façades,
and their research only measured eight selected check points, all located on the lower part of the façades, rather
than the whole façade.
4.5 Conclusions and Future Work
Fusing thermal information and RGB information could help homeowners and city managers
identify potential areas that need retrofitting and potentially improve building energy simulations. However,
existing building energy research has only focused on single buildings. In this study, a point-level data fusion method was proposed and tested under varying conditions with four test factors: flight altitude, camera angle, flight path, and the building style of the survey area. The goal of this study is not only to understand the performance of the proposed approach under different conditions, but also to provide suggestions for data collection with this approach so that better 3D thermal mapping models can be reconstructed.
There are several important conclusions to be drawn from this study. First, the research validates that a high flight altitude can introduce more errors than a low flight altitude: the camera is much farther from the buildings at a high flight altitude, so when projecting pixels from thermal images onto the 3D model, it is more difficult for the projected points to locate the nearest target points. Second, the 45-degree experiment performed worse than the 30-degree experiment in this data fusion approach. A 45-degree camera angle captures façade information more easily than a 30-degree camera angle, but façade components are more complex than other components; thus, the data fusion approach performed worse for the experiment with the 45-degree camera angle. Third, the Y flight path
performed worse than the mesh grid flight path in the point data fusion approach. Fourth, the data fusion approach performed better when surveying traditional European buildings in a city area than when surveying modern buildings on a campus. Fifth, 0.3 meters is an appropriate distance threshold for this data fusion approach: if the distance between a projected point and a target point is greater than 0.3 meters, the projected point should be discarded instead of fusing its thermal information with the target point's RGB information (a minimal matching sketch is given below). Sixth, the overall accuracy of the data fusion approach was between 3 °C and 5 °C.
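To make the fifth conclusion concrete, the following is a minimal sketch of the 0.3-meter rejection rule, assuming the RGB point cloud and the points projected from the thermal images are already expressed in the same coordinate frame; the array names are illustrative, not the exact variables of my implementation.

```python
import numpy as np
from scipy.spatial import cKDTree

def fuse_thermal(target_points, projected_points, projected_temps, max_dist=0.3):
    """For each projected thermal point, find the nearest target (RGB) point;
    fuse its temperature only if the match is within max_dist meters."""
    tree = cKDTree(target_points)                    # KD-tree over the RGB point cloud
    dist, idx = tree.query(projected_points, k=1)    # nearest target point per projected point
    keep = dist <= max_dist                          # discard projections farther than 0.3 m
    temps = np.full(len(target_points), np.nan)      # unfused target points stay NaN
    temps[idx[keep]] = projected_temps[keep]
    return temps
```

Projected points farther than 0.3 meters from any target point are simply left unfused in this sketch.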
There were also some limitations in this study. Due to the complexity and cost of the experiments,
the drones could only be flown for limited periods of time. Fortunately, current virtual environments allow 3D models to be simulated in Unity3D and other game engines. In a future study, campus and city environments can be digitized into a virtual environment, and a virtual camera can be used as an artificial drone camera to collect synthetic data for continuous investigation of these factors.
Chapter 5. Semantic Segmentation on Energy Audit Related Building Components and
Object Information Extraction Framework
5.1 Motivation and Introduction
Understanding the sources of energy loss in a building is necessary for increasing the efficiency of building energy usage. Mounting cameras and multiple sensors on a UAS, as introduced in previous sections, can improve this efficiency: data collection time can be shortened and camera views can be broadened [117]. However, thermal information from other objects, such as cars and equipment, is also captured, and this is not the heat loss of interest. To better detect the sources of energy loss, the different components in the images need to be differentiated so that researchers can determine whether the observed energy loss is actually building energy waste [3]. Differentiating the semantic information of components in images, also known as semantic segmentation, has been addressed by many computer vision algorithms, especially deep learning approaches, such as Mask R-CNN [118], the YOLO family [119], and the DeepLab family [120]. Semantic segmentation algorithms allow object detection at the pixel level, which means each pixel in an image can be classified into a category of interest or into the background.
The traditional semantic segmentation task is mostly based on visible light (RGB) imagery, which makes the task intrinsically challenging in certain situations [121][122][123]. For example, it is difficult to precisely distinguish objects with similar colors. To overcome this intrinsic limitation of RGB data and further improve segmentation performance, other measurements, such as depth and thermal information, can be added to the RGB input. Unlike visible light imaging, thermal cameras can see objects that generate heat under various lighting conditions. Therefore, adding thermal information may help applications that need high segmentation precision in certain situations [124][125].
In my semantic segmentation task, I focused on differentiating components for which monitoring energy loss is important, such as façades and roofs, from components whose heat signatures are not of interest but may interfere with energy audits, such as cars and equipment. It is necessary to detect where the heat loss comes from. Therefore, the five categories classified in this research were façade, roof, cars, equipment, and background.
This study is designed to answer the following questions: (1) How do RGB images influence the performance of semantic segmentation? (2) Does combining thermal information with RGB information improve or deteriorate the performance of semantic segmentation?
5.2 Research Methods
5.2.1 Data Preprocessing
The images were taken with the FLIR Duo Pro R camera mounted on a DJI Matrice 600, as introduced in previous sections. To explore the performance of different algorithms and to understand the performance of my data fusion strategies, I applied input fusion to three semantic segmentation frameworks: Mask R-CNN, the Pyramid Scene Parsing Network (PSPNet), and DeepLab V3+.
5.2.2 Implementation on Mask RCNN
Figure 11 (a) illustrates the Mask R-CNN algorithm implemented only on RGB imagery datasets. Formally, the task is defined as follows: given a set X containing input images x_i ∈ R^(H×W×C) with image height H, width W, and channels C (in this case, H = 512, W = 512, and C = 3), and a corresponding annotation set Y that contains bounding boxes y_{i,box} ∈ R^(N×4), where 4 is the number of coordinates that define a bounding box, class labels y_{i,cls} ∈ R^N, and masks y_{i,mask} ∈ R^(N×H×W), where N is the number of annotated objects in the given image, I represent the learned mapping as F: X → Y, where F denotes a neural network.
Figure 11 (b) illustrates the Input Fusion approach on Mask R-CNN. Input fusion means directly concatenating the thermal channel onto the end of the RGB channels, which does not change the architecture of the algorithm. First, the thermal channel is concatenated after the RGB channels, yielding a matrix x_i ∈ R^(H×W×C) with H = 512, W = 512, and C = 4 (see the sketch below). Second, the feature extraction block, ResNet-50, extracts feature maps from this four-channel input. The remaining steps are the same as in the standard Mask R-CNN approach.
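As a minimal sketch of this input fusion step, assuming the RGB and thermal frames are already co-registered 512 x 512 arrays (the normalization choice here is an assumption, not necessarily the one used in my implementation):

```python
import numpy as np

def input_fusion(rgb, thermal):
    """Concatenate one thermal channel after the RGB channels,
    producing the 512 x 512 x 4 input described above."""
    rgb = rgb.astype(np.float32) / 255.0                            # H x W x 3 in [0, 1]
    thermal = thermal.astype(np.float32)
    thermal = (thermal - thermal.min()) / (np.ptp(thermal) + 1e-8)  # scale thermal to [0, 1]
    return np.concatenate([rgb, thermal[..., None]], axis=-1)       # H x W x 4
```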
In both Figure 11 (a) and (b), Mask R-CNN has two stages. In the first stage, a convolutional backbone extracts feature maps and a Region Proposal Network (RPN) proposes candidate regions of interest (ROIs) on them. In the second stage, the network predicts the class, bounding box, and mask for each ROI. To compare the different semantic segmentation algorithms' performances, the feature extraction block is ResNet-50 in Mask R-CNN and in the other algorithms. Mask R-CNN uses a multi-task loss on each ROI, defined as L = L_cls + L_box + L_mask, where L_cls is the cross-entropy loss over the five classes plus background, L_box is the bounding-box regression loss over the predicted box coordinates, and L_mask is the average binary cross-entropy loss over the pixels in the mask.
Figure 11 Illustration of fusion approach on Mask R-CNN
5.2.3 Implementation on PSPNet
The second algorithm I used was PSPNet [126], as shown in Figure 12. PSPNet has a different network framework from Mask R-CNN: it does not propose ROIs or bounding boxes. In its first stage, it directly performs regression training on the feature maps extracted by ResNet-50. In its second stage, it applies a pyramid pooling module. In the last stage, the pooled layers are upsampled and concatenated with the earlier feature maps to generate the final feature representation, which is then fed into a convolution layer to obtain the final pixel-wise semantic prediction.
Figure 12 (a) represents the PSPNet algorithm implemented only on RGB imagery datasets, and Figure 12 (b) illustrates the Input Fusion approach on RGB plus thermal datasets using PSPNet. The loss function of this algorithm combines an auxiliary loss and a master branch loss. The auxiliary loss is attached to the ResNet stage and helps optimize the learning process, while the master branch loss supervises the whole network. PSPNet weights these two losses with 0.4 for the auxiliary loss and 0.6 for the master branch loss, as sketched below.
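A minimal sketch of how these two losses could be combined with the stated 0.4/0.6 weights is shown below; the tensor shapes and the plain cross-entropy criterion are assumptions about the implementation.

```python
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

def pspnet_loss(main_logits, aux_logits, target, w_aux=0.4, w_main=0.6):
    """Weighted sum of the auxiliary-branch loss (attached to the ResNet stage)
    and the master-branch loss over the final prediction."""
    return w_main * criterion(main_logits, target) + w_aux * criterion(aux_logits, target)
```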
Figure 12 Illustration of fusion approach on PSPNet
5.2.4 Implementation on DeepLab V3+
The third algorithm is from the DeepLab family [120][127][128]; I used its latest version, DeepLab V3+. Like the other DeepLab algorithms, DeepLab V3+ uses a pyramid pooling idea similar to PSPNet. However, unlike PSPNet, the DeepLab family uses an innovative pooling structure called atrous spatial pyramid pooling (ASPP). This pooling structure can capture multi-scale information by adjusting the filter's field of view and, unlike a traditional pooling structure, it also considers the hidden relationship between disconnected pixels in the imagery, as shown in Figure 13. Additionally, the several parallel ASPP convolution layers that DeepLab V3+ uses have different dilation rates from the pooling layers used in PSPNet (see the sketch below).
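The following is a minimal PyTorch sketch of an ASPP-style block with parallel atrous (dilated) convolutions; the dilation rates and channel sizes are illustrative and not necessarily those used in my experiments.

```python
import torch
import torch.nn as nn

class ASPP(nn.Module):
    def __init__(self, in_ch, out_ch=256, rates=(6, 12, 18)):
        super().__init__()
        # one 1x1 branch plus one 3x3 atrous branch per rate; dilation enlarges
        # the filter's field of view without adding parameters
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, 1)] +
            [nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates]
        )
        self.project = nn.Conv2d(out_ch * (len(rates) + 1), out_ch, 1)

    def forward(self, x):
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))
```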
Within the DeepLab family, the biggest difference of the latest version is that DeepLab V3+ extends DeepLab V3 with an encoder-decoder structure. This structure has been shown to speed up feature extraction because it does not enlarge the neural network, and it can also recover sharp segmentation boundaries in the decoder path.
Figure 13 (a) illustrates the algorithm implemented on datasets with only RGB images. I used ResNet-50 on both RGB images and thermal images for performance comparison, then overlapped the extracted features for the remaining steps, which were similar to the typical DeepLab V3+ approach. Figure 13 (b) illustrates the Input Fusion approach on DeepLab V3+. First, I simply concatenate the thermal channel after the RGB channels to obtain 4-channel input data. To guarantee performance while using an acceptable computational capacity, I used an image size of 512 x 512, which is commonly used in other applications. Second, the integrated images were fed into the typical DeepLab V3+ network; only the first input layer was adjusted from 3 to 4 channels to conveniently process the 4-channel input images, as sketched below.
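A minimal sketch of this first-layer adjustment for a pretrained ResNet-50 backbone is shown below; copying the pretrained RGB filters and zero-initializing the thermal slice is one common choice and an assumption here, not necessarily the exact strategy used in my experiments.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

backbone = resnet50(pretrained=True)   # newer torchvision versions use the weights= argument
old = backbone.conv1                   # Conv2d(3, 64, kernel_size=7, stride=2, padding=3)
new = nn.Conv2d(4, old.out_channels, kernel_size=7, stride=2, padding=3, bias=False)
with torch.no_grad():
    new.weight[:, :3] = old.weight     # keep the pretrained RGB filters
    new.weight[:, 3:] = 0.0            # start the thermal channel from zero
backbone.conv1 = new                   # the backbone now accepts 4-channel input
```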
The loss function of the DeepLab family is the sum of cross-entropy terms over each spatial position in the network, with equal weight assigned to each term.
Figure 13 Illustration of fusion approach on DeepLab V3+
5.2.5 Common Configurations (Hyper-parameters) for Performance Comparison
To fairly compare the performance of the different semantic segmentation algorithms, I kept several variables constant. First, the input RGB and thermal images are all 512 x 512. The training dataset contains 4,190 images and the testing dataset contains 1,000 images, which approximately meets an 8:2 ratio between training and testing datasets. Table 4 shows the number of instances in the training and testing datasets; the ratios of the different instance types are similar between the two datasets. Additionally, the numbers of roof, façade, and roof equipment instances are greater than the numbers of the other instance types. Second, the backbone feature extraction network in all three tested segmentation algorithms is ResNet-50. To reduce the training time and improve accuracy, I used a fine-tuning method in which a pretrained ResNet-50 model is used as the initialization for a new model trained on my dataset; the same pretrained ResNet-50 model is used in each algorithm for a fair comparison. Third, the training configuration settings are also the same for each algorithm: the batch size is 2, there are 5,000 iterations per epoch, and the maximum number of epochs is 200. Fourth, all algorithms use a polynomial learning rate schedule that decays from 0.01 at the beginning of training to 0.0001 at the end (a sketch is given below). Last, the same GPU, an NVIDIA Tesla P100, is used for training the models.
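A minimal sketch of such a polynomial schedule is given below; the decay exponent (power) is an assumption, since only the start and end learning rates are stated above.

```python
def poly_lr(iteration, max_iterations, base_lr=0.01, end_lr=0.0001, power=0.9):
    """Polynomial decay of the learning rate from base_lr to end_lr."""
    frac = min(iteration / max_iterations, 1.0)
    return (base_lr - end_lr) * (1.0 - frac) ** power + end_lr
```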
Table 4 Number of instances

Category                          Roofs            Cars           Facades         Ground equipment   Roof equipment    Total
Instances in training datasets    10,170 (27.5%)   3,135 (8.5%)   9,177 (24.8%)   3,651 (9.9%)       10,798 (29.2%)    36,931
Instances in testing datasets     2,448 (27.5%)    804 (9.0%)     2,177 (24.4%)   880 (9.9%)         2,606 (29.2%)     8,915
5.3 Case Studies and Results
5.3.1 Performance Evaluation
There are several evaluation metrics, including precision (Equation 13), recall (Equation 14), Jaccard index / intersection-over-union (IoU) (Equation 15), accuracy (ACC) (Equation 16), and the F1 score (Equation 17). In these equations, true positive (TP) represents the area of overlap between the predicted segmentation and the ground truth, and true negative (TN) represents the area that does not belong to the class and is correctly not predicted as that class. False positive (FP) represents the area that does not belong to the class but that the algorithms incorrectly predict as belonging to it, and false negative (FN) represents the area that belongs to the class but that the algorithms fail to detect. Using TP, TN, FP, and FN, I can calculate the evaluation metrics. Precision, also known as positive predictive value, is the fraction of correctly classified area among the total predicted area. Recall, also called sensitivity, is the fraction of correctly classified area among the ground-truth area of the class. Accuracy (ACC) is easy to understand: it is simply the ratio between the correctly predicted area and the whole area of an image. However, accuracy is sometimes not robust enough to evaluate an algorithm's performance. Therefore, I introduce IoU, the fraction of the correctly classified area over the union of the ground-truth and predicted areas. Last, F1 is the harmonic mean of precision and recall.
Precision = TP / (TP + FP)          Equation 13

Recall = TP / (TP + FN)          Equation 14

IoU = TP / (TP + FP + FN) = Area(predicted ∩ truth) / Area(predicted ∪ truth)          Equation 15

Accuracy (ACC) = (TP + TN) / (TP + TN + FP + FN)          Equation 16

F1 = 2TP / (2TP + FP + FN) = 2 × (Precision × Recall) / (Precision + Recall)          Equation 17
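For a single class, these five metrics can be computed directly from binary prediction and ground-truth masks; the sketch below assumes boolean NumPy arrays of the same shape and adds a small epsilon only to avoid division by zero.

```python
import numpy as np

def segmentation_metrics(pred, truth, eps=1e-8):
    """Precision, recall, IoU, ACC, and F1 (Equations 13-17) from binary masks."""
    tp = np.logical_and(pred, truth).sum()
    fp = np.logical_and(pred, ~truth).sum()
    fn = np.logical_and(~pred, truth).sum()
    tn = np.logical_and(~pred, ~truth).sum()
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    iou = tp / (tp + fp + fn + eps)
    acc = (tp + tn) / (tp + tn + fp + fn + eps)
    f1 = 2 * tp / (2 * tp + fp + fn + eps)
    return {"precision": precision, "recall": recall, "iou": iou, "acc": acc, "f1": f1}
```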
5.3.2 Evaluation for PSPNet and DeepLab V3+
In this study, I used precision, recall, IoU, and F1 scores to evaluate the performance of the PSPNet and DeepLab V3+ algorithms. The results are summarized in Table 5. I also plotted the trendlines of these evaluation metrics over all 200 epochs, as shown in Figure 14. The x-axes are epochs and the y-axes are the metric values. The scattered dots represent the evaluation values at the corresponding epochs, and the trendlines are drawn from the patterns of these dots using the locally weighted scatterplot smoothing (LOWESS) method.
Table 5 Performance evaluation for PSPNet and DeepLab V3+
(Columns: Roofs | Facades | Roof_equipment | Cars | Ground_equipment | Average of all classes)

PSPNet_RGB
F1          0.973   0.941   0.848   0.906   0.635   0.861
IoU         0.947   0.888   0.736   0.828   0.466   0.773
Precision   0.967   0.952   0.855   0.915   0.736   0.885
Recall      0.978   0.930   0.841   0.897   0.559   0.841

PSPNet_RGB+Thermal
F1          0.973   0.940   0.842   0.903   0.636   0.859
IoU         0.948   0.887   0.727   0.823   0.466   0.770
Precision   0.967   0.942   0.861   0.922   0.765   0.891
Recall      0.980   0.939   0.823   0.885   0.544   0.834

DeepLabV3+_RGB
F1          0.974   0.943   0.852   0.913   0.677   0.872
IoU         0.949   0.892   0.743   0.841   0.512   0.787
Precision   0.970   0.950   0.850   0.922   0.766   0.892
Recall      0.978   0.935   0.855   0.905   0.607   0.856

DeepLabV3+_RGB+Thermal
F1          0.975   0.946   0.861   0.912   0.709   0.881
IoU         0.952   0.897   0.755   0.838   0.549   0.798
Precision   0.973   0.946   0.862   0.920   0.804   0.901
Recall      0.978   0.946   0.859   0.904   0.634   0.864
Figure 14 Trendline of evaluations on PSPNet and DeepLabV3+
Both PSPNet and DeepLabV3+ achieved similar performance on the RGB-only datasets and the RGB-plus-thermal datasets across the different object classes, as the scores in Table 5 show. F1, IoU, precision, and recall all indicate that the segmentation algorithms performed relatively better on the roof, façade, and car classes; in particular, the IoU score is as high as 95.2% for roofs. However, these algorithms performed relatively worse on the roof equipment and ground equipment classes; in particular, the IoU score is as low as around 46.6% for ground equipment.

Next, let us analyze the influence of the thermal channel. First, for PSPNet, fusing RGB with the thermal channel increased the recall values for the roof and façade classes and increased the precision values for the roof equipment, car, and ground equipment classes; for the other evaluation values, the thermal channel did not obviously improve the performance. Second, for DeepLab V3+, almost every object class's segmentation performance increased, except the precision of the façade and car classes. Third, compared with PSPNet, DeepLab V3+ performed better on every object class. Additionally, with the thermal channel, DeepLab V3+ performed the best among all configurations.
According to Figure 14, the trendlines of the average F1 score, IoU, precision, and recall can be observed. DeepLab V3+ performed worse than PSPNet at the beginning, but its performance later exceeded that of PSPNet, especially for DeepLab V3+ with RGB datasets fused with the thermal channel. Additionally, Figure 14 also demonstrates that the thermal channel can improve segmentation performance, and that it works better with the DeepLab V3+ algorithm.
5.3.3 Evaluation for MaskRCNN
Instance segmentation can provide multiple predictions for one class, so it is sometimes difficult to match predictions to the ground truth. Therefore, I set up an IoU threshold to check the match between each prediction and the ground truth. Since MaskRCNN is an instance segmentation algorithm, its performance evaluation differs from that of the other algorithms; Table 6 presents MaskRCNN's performance. First, I need to determine which predicted bounding boxes correspond to correct predictions, so I used IoU, as in Equation 15, to measure the overlap between predicted and ground-truth bounding boxes. For a given IoU threshold, predicted bounding boxes whose IoU with an annotated object's bounding box exceeds the threshold are considered true positives, while annotated objects without such a prediction are considered false negatives (a matching sketch is given below). As in Equation 13, I calculated precision for the predicted bounding boxes and segmentations that meet the IoU threshold requirement. Table 6 shows precision values in various situations, such as precision values for individual classes, an IoU threshold greater than 0.5, an IoU threshold greater than 0.75, and object classes with small, medium, and large areas. Small, medium, and large areas correspond to objects of area less than 32^2 pixels, between 32^2 and 96^2 pixels, and greater than 96^2 pixels, respectively.
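The sketch below illustrates this IoU-threshold matching for axis-aligned boxes given as (x1, y1, x2, y2); the greedy one-to-one matching is an assumption about the bookkeeping, not a description of the exact evaluation code.

```python
def box_iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-8)

def match_predictions(pred_boxes, gt_boxes, iou_thresh=0.5):
    """Ground-truth boxes matched above the threshold are true positives;
    unmatched ground truths count as false negatives."""
    matched, tp = set(), 0
    for p in pred_boxes:
        best, best_iou = None, iou_thresh
        for j, g in enumerate(gt_boxes):
            iou = box_iou(p, g)
            if j not in matched and iou >= best_iou:
                best, best_iou = j, iou
        if best is not None:
            matched.add(best)
            tp += 1
    fn = len(gt_boxes) - len(matched)
    return tp, fn
```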
Table 6 Performance evaluation for MaskRCNN (values are average precision)

Per-class precision and total average:
                                       Roofs     Facades   Roof_equipment   Cars      Ground_equipment   Total average
Mask RCNN-BBOX-RGB                     84.714    82.961    80.275           89.841    76.471             82.852
Mask RCNN-Segmentation-RGB             81.715    75.776    73.597           85.349    70.629             77.413
Mask RCNN-BBOX-RGB+Thermal             84.922    83.034    79.414           88.750    76.030             82.430
Mask RCNN-Segmentation-RGB+Thermal     81.615    75.951    73.289           84.678    71.373             77.381

Precision by IoU threshold and object size:
                                       Precision @ IoU>=0.5   Precision @ IoU>=0.75   AP-small   AP-medium   AP-large
Mask RCNN-BBOX-RGB                     92.369                 89.072                  64.076     84.025      90.789
Mask RCNN-Segmentation-RGB             92.173                 87.463                  58.717     79.377      85.107
Mask RCNN-BBOX-RGB+Thermal             92.413                 89.351                  63.202     84.150      90.151
Mask RCNN-Segmentation-RGB+Thermal     92.491                 87.344                  58.523     79.090      85.874
I also plotted the trendlines of the precision values for MaskRCNN, as shown in Figure 15. Table 6 and Figure 15 show that fusing RGB with the thermal channel increased the precision values for the roof, façade, and ground equipment objects.
Figure 15 Trendline of evaluations on MaskRCNN
5.4 Discussion
To analyze the performance of the algorithms, I illustrate representative cases in Figure 16. In case one, the algorithms implemented on the different datasets perform well; however, most of them cannot separate two roof equipment instances that are close to each other, as the arrow in the figure indicates. Only MaskRCNN can separate these two roof equipment instances. Another important observation is that the roof pixels around the roof equipment are not detected in this case. In case two, DeepLab V3+ mistakenly detects ground equipment as roof equipment; a potential reason is that some equipment on the ground looks similar to equipment on roofs. In case three, all the algorithms wrongly detect a façade instance on top of a roof instance. The algorithms implemented on datasets with thermal channels perform better, but they still need to be improved. The fourth and fifth cases also illustrate that thermal images can help detect and separate roof, roof equipment, and ground equipment instances. For example, PSPNet detects two ground equipment instances from the datasets that have thermal channels. Additionally, there is a small roof instance (a false positive) wrongly detected by PSPNet implemented on the RGB-only dataset; PSPNet implemented on the RGB plus thermal dataset does not detect that instance, and DeepLab V3+ and MaskRCNN do not detect that false positive instance either.
Figure 16 Object instances detected by algorithms
5.5 Conclusion and Future Studies
Several important conclusions can be drawn from this study. First, adding thermal channels improves semantic segmentation performance; in practice, DeepLab V3+ benefits the most from the thermal channel. Second, the thermal channel performs differently for different objects, and its contribution also differs between algorithms. For the semantic segmentation algorithms PSPNet and DeepLab V3+, adding thermal channels increases precision for segmenting roof equipment, car, and ground equipment objects, but it does not increase precision for segmenting roofs and façades. For instance segmentation, adding thermal channels increases precision for segmenting roof, façade, and ground equipment objects, but it does not obviously improve the segmentation of car and roof equipment objects. Third, MaskRCNN, as an instance segmentation algorithm, can individually predict small objects such as roof equipment and ground equipment; it does not predict all instances of a class as one group of pixels in an image, as the semantic segmentation algorithms do. The benefit of using an instance segmentation algorithm is that it can help energy auditors distinguish different roof objects, which allows auditors to index heat loss more conveniently. Fourth, in terms of time complexity, PSPNet and DeepLab V3+ outperform MaskRCNN, since these two semantic segmentation approaches do not need to propose ROIs as MaskRCNN does, and their networks are simpler than MaskRCNN's.
There are some drawbacks to this study. First, according to Table 4, the numbers of instances for some classes, such as cars and ground equipment, are smaller than those of the other objects in my datasets. This imbalance might cause the inaccuracy in segmenting ground equipment objects, so the ratio of different objects in the dataset needs to be balanced. Second, the size of the dataset should be enlarged because of the data hunger issue. There are insufficient open-source data shared between civil engineering projects for energy audits using thermal images, yet object segmentation tasks need a large dataset to improve segmentation accuracy. I plan to focus on synthetic thermal imagery data to enlarge the dataset.
In the future, I plan to improve my studies in these areas. First, I plan to investigate semantic segmentation on 3D models. As I have discovered, instance segmentation allows researchers to index objects, but it still does not relate heat loss to locations on the building; directly segmenting objects from 3D models can provide an alternative approach. Second, I only explored the input fusion approach in this study, and I plan to implement other fusion approaches for segmentation.
Chapter 6. A Novel Building Temperature Simulation Approach Driven by Expanding
Semantic Segmentation Training Datasets with Synthetic Aerial Thermal Images
6.1 Introduction
Photogrammetric technology, which maps images acquired by drones onto a 3D model, provides analytics such as distance and dimension measurement. Integrated with other tools and applications, a photogrammetry-recreated 3D model can detect not only structural damage but also heat loss from buildings and district heating networks. Such a 3D model can also be used to locate roads and classify their materials to precisely calculate driving times for route planning. All these examples emphasize the need for extracting semantic information from photogrammetry-recreated models.
To extract semantic information, also known as semantic segmentation, from images or photogrammetric models, many computer vision algorithms, especially deep learning approaches, have been applied, such as MaskRCNN [118], PSPNet [119], and the DeepLab family [120], as used in Chapter 5. Early studies used images or 3D models with only the color (RGB) information obtained by an image sensor. However, segmentation based on single-sensor images is insufficient when facing complex scenarios; thus, for more accurate classification and segmentation, researchers have added more channels and features to RGB images [102]. For example, Chen et al. [129] added texture, point density, local surface, and open-source features, while Liu et al. [130] added depth information to improve photogrammetric point cloud segmentation. I also implemented algorithms to improve segmentation by adding thermal information, as described in Chapter 5 [102][104].
Despite the great success of the previously described studies, deep learning algorithms are quite data hungry, as demonstrated in many studies [32][33]. Data hunger refers to the size of the training dataset required to generate a model with good predictive accuracy [34]. For example, 5193 images were used to train the semantic segmentation model in Chapter 5; without pretrained parameters, the accuracy might not have been satisfactory. It is difficult for individual research groups to expand training datasets because researchers are often unwilling to share data or their data formats are incompatible, so researchers are forced to collect more data on their own. However, collecting data usually takes several days for a large district and is labor-intensive, costly, and inefficient [131]. Additionally, annotating newly acquired training datasets also requires many hours of labor and inspection for annotation accuracy; for instance, the datasets used in Chapter 5 required about five students and six months to manually code the ground truth. To solve the data hunger problem, some researchers have used synthetic data. For example, Chen et al. [132] designed a framework to generate synthetic images from a 3D virtual environment: they simulated drone flight paths over a synthetic virtual environment with annotated ground, buildings, and trees to render synthetic images with corresponding annotations. In their framework, depth images, which can be obtained by Lidar, and RGB images, which can be obtained by color cameras, in the real world were instead generated virtually. Data hunger also occurs with images that fuse RGB with thermal information. For example, Li et al. [133] used thermal images to segment pedestrians, cars, tables, lamps, and other objects captured outdoors and indoors on the ground, taking advantage of the thermal camera's ability to capture information in dark and hazy environments; they also introduced synthetic thermal images to improve segmentation. Inspired by Li's studies, I planned to use thermal information to improve the segmentation of aerial images of buildings in outdoor scenes, because it would allow me to capture the different thermal signatures of each part of a building and its surroundings [134][100][32].
There are several approaches to simulating thermal information. For instance, physics-based building surface thermal simulation enables the precise quantification of energy fluxes and simulates building surface temperatures by using heat equations [135][136][137]. Many recent studies have used 3D models to simulate heat transfer [138][139]; however, these studies are limited by their level of detail (LOD), which reduces accuracy and effectiveness [140][141][135][142]. To be precise, there is no surface temperature simulation based on an as-built model (the highest-LOD model) due to the computational complexity and the inherent uncertainties caused by the many default parameters and assumptions used in a simulation process [108]. Furthermore, physics-based simulation works only for buildings and not for the surrounding environment.
Unlike the aforementioned approaches, the approach I designed focuses on simulating temperature information for generating synthetic aerial thermal images. My approach learns features and extracts information from historical drone-based images instead of relying on a physics-based thermal simulation. It avoids using default configurations when detailed system information, such as building materials and users' behaviors, is not available. Furthermore, my approach is not limited by geometric models' LOD, whereas current approaches depend on the 3D models' precision for their accuracy. My approach implements computer vision algorithms to translate RGB images acquired by drones over a large-scale area into thermal images, which also enables them to be fused with RGB images for segmentation with multi-sensor data. My study is designed to answer the following questions: (1) How can RGB images of buildings and surrounding environments be used to generate thermal images? (2) How does the training data of captured RGB images affect the simulation results? In particular, how does a generation model established with one building style perform when generating thermal images of another? (3) What are the differences between the current approaches and my proposed approach for generating thermal images? This study focuses only on thermal image generation performance, evaluating the generated results against the ground truth; the performance of deep learning using the generated images will be evaluated in a future study.
6.2 Research Methods
6.2.1 Simulation Domains and Dataset
To easily detect the thermal contrast of building envelopes, I collected data in the winter in Karlsruhe, Germany, when the temperature difference between indoors and outdoors is obvious. The experiments were conducted in two outdoor scenes. One was a college campus in a suburban area, where modern buildings are separated by lawns and roads and are not close to one another. The other was a dense city area in Germany where traditional European buildings are located close together. The reasons for selecting these two scenarios are that (1) the heat island effect is more obvious in city areas than in suburban areas, and (2) the architectural styles of buildings differ between city areas and suburban areas. Conducting experiments in both areas allowed me to comprehensively explore my approach.
In this study, I designed four experiments. Thermal and corresponding RGB images were taken in two separate areas of the campus for experiments one and two, as shown in Figure 17 (a); these two experiments are abbreviated as "Camp1" and "Camp2". Images were also taken in two separate areas of the city for experiments three and four, as shown in Figure 17 (b), abbreviated as "City1" and "City2".
(a) Two experiments on campus (b) Two experiments in a city center
Figure 17. Illustration of the experiment locations
6.2.2 As-built Building Envelope Thermal Image Rendering
The algorithm used in this study was based on Isola et al.'s previous work "pix2pix," an image-to-image translation method based on a GAN that is not task specific. The network architecture used inside the algorithm is a fully convolutional network, U-Net. The basic idea of image translation in this study is to directly convert RGB images to thermal images.

To keep the dataset sizes balanced, there were around 20,000 images in each experiment. Each image had a resolution of 2048x2048 and was resized to 256x256 for computation. It is common to divide datasets into training and testing sets; the training dataset accounts for 70% (14,000 images) of the whole dataset, and the testing dataset accounts for the rest. The datasets in these four experiments were all divided into training and testing datasets accordingly (a data preparation sketch is given below). The ratio of training to testing data differs from the ratio used in Chapter 5 because sufficient data were available for this research.
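A minimal sketch of this data preparation is given below; the directory layout and matching file names are assumptions for illustration only.

```python
import glob
import random
import cv2

def build_pairs(rgb_dir, thermal_dir, size=256, train_ratio=0.7, seed=0):
    """Load paired RGB/thermal images, resize from 2048x2048 to 256x256,
    and split them into training and testing sets."""
    pairs = []
    for rgb_path in sorted(glob.glob(f"{rgb_dir}/*.jpg")):
        name = rgb_path.split("/")[-1]
        thermal_path = f"{thermal_dir}/{name}"            # assumed identical file names
        rgb = cv2.resize(cv2.imread(rgb_path), (size, size))
        thermal = cv2.resize(cv2.imread(thermal_path, cv2.IMREAD_GRAYSCALE), (size, size))
        pairs.append((rgb, thermal))
    random.Random(seed).shuffle(pairs)
    split = int(train_ratio * len(pairs))
    return pairs[:split], pairs[split:]
```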
In a commonly used neural network workflow, training datasets are used to train a learning model. The model learns rules and features from the training data by comparing predicted results with the ground truth. After the model learns features by continuously adjusting its parameters, it processes the testing dataset with its updated parameters. In this study, the datasets consisted of pairs of RGB and thermal images captured by cameras. The RGB images in the training split were fed into the initial GAN model. The model then converted RGB images into simulated thermal images and updated its internal parameters to reduce the discrepancies between the simulated thermal images and the captured thermal images, thereby improving simulation performance. After many rounds of parameter updates (200 epochs in this study), the GAN model was ready to process the RGB images in the testing split. Since I had four experiments, as shown in Figure 17, I had four training datasets and four testing datasets. In this study, each GAN model built from one training dataset was used to process not only the testing dataset from the same experiment but also the testing datasets from the other experiments. This cross evaluation between every pair of experiments allowed me to observe how a generation model established with one building style performs when generating thermal images of another. As the examples in Figure 18 show, the GAN model converts the RGB images in Figure 18 (a) and (d) into the simulated thermal images in Figure 18 (b) and (e), while Figure 18 (c) and (f) are the captured thermal images. The training and testing datasets in Figure 18 are from the same experiment, so the simulated and captured thermal images look nearly identical. Other cases with cross evaluation are illustrated and discussed in the results section, where some discrepancies between simulated and captured thermal images are observed.
Figure 18. Examples that explain thermal image rendering: (a) RGB image on campus; (b) simulated thermal image on campus; (c) captured thermal image on campus; (d) RGB image in the city; (e) simulated thermal image in the city; (f) captured thermal image in the city.
6.2.3 Evaluation Metrics
The performance of each experiment and of the cross evaluation between every pair of experiments was measured by comparing the rendered thermal images (R) generated from RGB images by the GAN model with the real captured thermal images (C), using two image similarity criteria: mean squared error (MSE) and the structural similarity index (SSIM), as shown in Equation 18 and Equation 19 [143]. For example, Figure 18 (b) and (e) are simulated thermal images generated from (a) and (d), respectively, while (c) and (f) are the real captured thermal images.
MSE(R, C) = (1 / (m n)) Σ_{x=0}^{m−1} Σ_{y=0}^{n−1} [R(x, y) − C(x, y)]^2          Equation 18

SSIM(R, C) = [(2 μ_R μ_C + α_1)(2 σ_RC + α_2)] / [(μ_R^2 + μ_C^2 + α_1)(σ_R^2 + σ_C^2 + α_2)]          Equation 19
In Equation 18, R represents a rendered image and C represents a captured image; the resolution of both images is 256 by 256 pixels. The pair (x, y) denotes the same pixel coordinate in both the rendered and the captured thermal image. The difference between every pair of corresponding pixels is squared, the squared differences are summed, and the sum is divided by the total number of pixels (256x256). An MSE of 0 indicates that the two compared images are completely identical, and an MSE greater than 0 indicates that they differ. The larger the MSE value, the more the two compared images differ, which means that the generation model renders an image with more error relative to the captured image. However, MSE does not always agree with human subjective assessment [144]; therefore, SSIM was selected as a complementary evaluation metric.
SSIM compares the structural information of images. In Equation 19, R represents a rendered image and C represents a captured image. The symbols μ_R and σ_R represent the mean and standard deviation of the pixels in a rendered image, as shown in Equation 20 and Equation 21, and μ_C and σ_C are the corresponding values for a captured image. σ_RC represents the covariance of the rendered and captured images, as shown in Equation 22, and (x, y) again indicates the same pixel coordinate in the two compared images. In Equation 19, α_1 and α_2 are constants used to stabilize the equation when μ and σ are extremely small. The SSIM value ranges between -1 and 1, where 1 represents perfect identity.
μ_R = (1 / (m n)) Σ_{x=0}^{m−1} Σ_{y=0}^{n−1} R(x, y)          Equation 20

σ_R = sqrt( (1 / (m n − 1)) Σ_{x=0}^{m−1} Σ_{y=0}^{n−1} [R(x, y) − μ_R]^2 )          Equation 21

σ_RC = (1 / (m n − 1)) Σ_{x=0}^{m−1} Σ_{y=0}^{n−1} [R(x, y) − μ_R][C(x, y) − μ_C]          Equation 22
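A minimal sketch of Equations 18-22 is given below, computing a global (single-window) MSE and SSIM between a rendered image R and a captured image C; treating the grayscale images as 8-bit and deriving the stabilizing constants from that data range are assumptions, not part of the original description.

```python
import numpy as np

def mse(R, C):
    """Equation 18: mean squared error between rendered and captured images."""
    R, C = R.astype(np.float64), C.astype(np.float64)
    return np.mean((R - C) ** 2)

def ssim_global(R, C, data_range=255.0):
    """Equations 19-22: single-window SSIM using global means, standard
    deviations, and covariance of the two images."""
    R, C = R.astype(np.float64), C.astype(np.float64)
    a1, a2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2   # stabilizing constants
    mu_r, mu_c = R.mean(), C.mean()
    sigma_r, sigma_c = R.std(ddof=1), C.std(ddof=1)
    cov_rc = ((R - mu_r) * (C - mu_c)).sum() / (R.size - 1)
    return ((2 * mu_r * mu_c + a1) * (2 * cov_rc + a2)) / \
           ((mu_r ** 2 + mu_c ** 2 + a1) * (sigma_r ** 2 + sigma_c ** 2 + a2))
```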
6.3 Results and Discussions
There were four experiments in this study, abbreviated as "Camp1", "Camp2", "City1", and "City2". Evaluations were conducted on the testing datasets both within the same experiment and between different experiments, as shown in Table 7. Each row represents a GAN model built from the training dataset of the corresponding experiment, and each column represents the testing dataset that this GAN model processes. The cell colors in Table 7 encode the values: according to the evaluation metrics, higher MSE values or lower SSIM values represent worse performance, so red represents higher MSE and lower SSIM values (worse performance), while green represents lower MSE and higher SSIM values (better performance).
Table 7. Total average MSE and SSIM values in each experiment.

Total average MSE:
                    Camp1 (testing)   Camp2 (testing)   City1 (testing)   City2 (testing)
Camp1 (training)    2.5779308         60.38710          81.72707          132.63658
Camp2 (training)    56.352844         4.675767          60.85602          138.23941
City1 (training)    107.46189         72.40453          3.70587           159.21241
City2 (training)    79.49741          66.63147          88.94927          2.33137

Total average SSIM:
                    Camp1 (testing)   Camp2 (testing)   City1 (testing)   City2 (testing)
Camp1 (training)    0.927918          0.809343          0.803144          0.789476
Camp2 (training)    0.823244          0.914039          0.788834          0.75134
City1 (training)    0.837777          0.838546          0.944457          0.885718
City2 (training)    0.803554          0.800675          0.834157          0.943066
To investigate the poor performances, I selected the cases with the highest MSE and lowest SSIM values in each evaluation, both within the same experiment and in the cross evaluations. In the same way that the horizontal and vertical headers are organized in Table 7, each row in Figure 19 represents the training dataset used to build a GAN model, and each column represents the testing dataset that such a GAN model processes. The titles "Real captured", "Simulated", and "RGB" in Figure 19 indicate captured thermal images, thermal images rendered by a GAN model, and the corresponding RGB images, respectively.
Figure 19. Selected images with highest MSE and lowest SSIM in each evaluation.
6.3.1 Simulation Result Assessment
As described in the methods section, I evaluated the GAN simulation approach based on MSE and SSIM values. As shown in Table 7, if the MSE and SSIM evaluations are treated as two matrices, their color patterns are essentially similar. First, green appears along the diagonals of both the total average MSE and SSIM matrices; in other words, good performance is observed when the training and testing datasets come from the same experiment. Second, using a GAN model built from city training datasets to render campus testing datasets performs better than the inverse, since the values in the upper triangular entries are higher than those in the lower triangular entries of the MSE matrix, and the values in the upper triangular entries are lower than those in the lower triangular entries of the SSIM matrix. A potential explanation is that the building styles in city centers are more complex than those on campuses, which allows a GAN model to learn more hidden features from building envelopes in a city center. Additionally, as Figure 17 (a) shows, buildings on campus are sparsely located and separated by lawns and roads, so there is less building envelope information for a GAN model to learn. Therefore, a GAN model established from city datasets is more capable of simulating building envelope thermal information. Third, although the color patterns of the MSE and SSIM matrices are similar, there is an outlier in one entry of the MSE matrix (City1 training dataset with the City2 testing dataset), which one might expect to be small, whereas the corresponding entry in the SSIM matrix is normal.
Figure 19 illustrates the selected cases with poor performance in terms of MSE and SSIM values. As the MSE and SSIM metrics suggest, the simulated thermal images with the highest MSE values show large color differences (grayscale intensity represents temperature) from the real captured thermal images, and the simulated images with the lowest SSIM values contain more image noise and have difficulty representing building envelope structures. Campus building envelopes are not as complex as city building envelopes; therefore, the campus testing datasets are intuitively simulated better than the city testing datasets, even though the simulated images shown in Figure 19 are the cases with the highest MSE and lowest SSIM. Additionally, the observation that models built from city datasets perform better is also validated in Figure 19.
To understand the relationship between MSE and SSIM across all images in an individual evaluation, I plotted multivariate distribution figures for both the same-experiment and cross-experiment evaluations. These distribution figures, arranged in the same way as the headers in Table 7, are shown in Figure 20. In each distribution figure, the x-axis represents the MSE value and the y-axis represents the SSIM value. Each image in the testing dataset has an MSE and an SSIM value, and a red point with this pair of MSE-SSIM coordinates is drawn in the distribution; darker red areas represent concentrated points, while lighter red areas represent scattered points. Several patterns can be observed in Figure 20. First, most figures illustrate negative correlations between MSE and SSIM values. The figures on the diagonal show robust negative correlations, with red areas as thin as a line and a strong negative coefficient; in the other figures, however, the red points are scattered, which means the performance is not stable. Second, looking at the MSE and SSIM values separately, the MSE values follow a long-tailed distribution while the SSIM values follow a Gaussian distribution in most evaluations. Third, the distributions for the cases in which a GAN model built with a city training dataset processes a campus testing dataset are more stable than those of the inverse cases, since the distributions in Figure 20 (i), (j), (m), and (n) are more stable than those in Figure 20 (c), (d), (g), and (h).
Figure 20. Multivariate distribution figures for both the same and cross experiment evaluations.
6.3.2 Comparison of Current Methods
Several simulation tools have been developed to generate synthetic thermal images for growing deep learning training datasets, and my approach differs from the current approaches in several ways. First, as Henon et al. [145] described, their approach omitted some small structures (appliances and chimneys) on roofs. In this study, there are many traditional European city buildings with appliances and chimneys on complex roofs; since my approach is directly implemented on captured images, these features are not omitted. Second, the evaluation metrics are different. For example, in Aguerre et al.'s [146] experiment, the simulations were based on building models, so their evaluation did not include the surrounding environment; in contrast, my approach covers both buildings and their surroundings. In addition, Aguerre et al. compared simulation results from selected areas of building envelopes with real thermal information. Such a comparison cannot cover areas that the building model does not represent in the simulation, and their evaluation did not address this issue. In my approach, I compared the simulated thermal errors by evaluating MSE values and, on top of that, I also compared the building envelope structures in the simulated images by evaluating SSIM values. I observe that an image translation approach is more feasible than a physics-based approach for generating synthetic thermal images for segmentation datasets. Third, if a physics-based approach is used to generate thermal images, researchers must configure a virtual camera that is consistent with the camera used for capturing the RGB images in terms of camera position, focal length, and point of view (POV), but such a virtual camera is difficult to configure accurately. My image translation approach avoids these procedures because it directly converts RGB images to thermal images.
On the other hand, my approach also has drawbacks compared to current approaches. As Aguerre et al. [146] described, they can simulate surface temperatures at different times of the day by adjusting parameters. In contrast, my datasets must be captured during the same time span of the day; for example, datasets captured in the morning cannot be used to simulate envelope surface temperatures at night.
6.4 Conclusions and Future Work
Thermal information can be used to improve the segmentation of aerial images of outdoor scenes. I proposed an innovative image translation approach to simulate temperature information, and I validated that such an approach is more feasible than a physics-based approach for generating synthetic thermal images for segmentation. Compared to current approaches, my approach has several main benefits: (1) It avoids acquiring detailed system information, such as building materials, and does not require default configurations. (2) It is not limited by the precision or LOD of geometric models, since the image data used in my approach are taken from a drone view that directly captures the as-built building envelopes. (3) It can simulate the thermal information of the buildings' surrounding environment, elements that are simplified as boxes in physics-based approaches. (4) Since it directly converts RGB images to thermal images, it does not need to align a virtual camera that renders thermal images with the real camera that captures RGB images.
My approach also has some limitations. Since the simulation process is based on historical training datasets instead of the laws of physics, the time and season in which the data were collected matter. For example, training datasets collected in the morning or in summer do not allow for simulating buildings' envelope thermal information in the evening or in winter, and vice versa. In contrast, a physics-based approach is based on building materials and the laws of thermodynamics; it can simulate building surface temperatures at different times of day and seasons of the year by adjusting the corresponding parameters.
In this study, I only evaluated the GAN model's performance in simulating thermal images by implementing my approach on different datasets, using two evaluation metrics, MSE and SSIM. The former evaluates the ability to simulate building envelope thermal information, and the latter evaluates the ability to simulate envelope appearance. As described in the results section, the important conclusions are as follows: (1) Plotting each image's pair of MSE and SSIM values shows a negative relationship between MSE and SSIM, that is, one increases while the other decreases. When MSE and SSIM are examined separately, a long-tailed distribution and a Gaussian distribution respectively describe the MSE and SSIM value distributions. (2) Using a model established with one building style to generate thermal images of another is not ideal. Both Table 7 and Figure 20 demonstrate that cases in which the training and testing datasets come from the same experiment (either city or campus) perform better than cases in which the datasets come from different experiments. It is wiser to use a training dataset that is similar to the testing dataset when training image translation models. (3) A GAN model built from city datasets performs better than a model built from campus datasets, because the city datasets contain more complex buildings and richer features for the model to learn. These conclusions suggest that researchers use datasets in which building information is richer and envelope structures are more complex as training datasets.
As described above, the performance of deep learning using the simulated images was not evaluated in this study. In future work, I plan to further evaluate the segmentation performance using simulated images. I will also consider integrating image generation with physics-based approaches to avoid their respective drawbacks.
Chapter 7. Conclusions and Limitations
In this thesis, I establish an RGB and thermal data fusion framework for advanced building envelope modeling. First, this approach makes full use of high-resolution RGB images to generate an accurate 3D point cloud model and fuses thermal information onto that model. I test different flight altitudes, camera angles, flight paths, and building styles to measure the influence of flight configurations on data fusion performance. Each point in the resulting thermal mapping model includes 3D coordinates, RGB, and thermal information, and this rich information allows researchers to use a good-quality thermal mapping model for energy audits. Second, I demonstrate that adding thermal channels can increase the performance of building component semantic segmentation. The semantic information allows researchers to observe where heat loss is located and to eliminate the heat generated by vehicles, equipment, and other irrelevant objects. Third, to address the data hunger problem, I use an image generation approach to simulate synthetic thermal images. These synthetic thermal images allow researchers to expand their training datasets and to achieve better segmentation results. In summary, this thesis establishes an advanced data fusion framework that is applicable to many fields. In particular, it provides fundamental research for energy audits using district-level thermal mapping models.
However, there are some limitations to this work. First, the observations obtained from the data fusion approach are limited to the cameras used in this thesis. The proposed framework can be generalized for data fusion in other research, but the camera calibration, image registration, and other parameters need to be recalculated. Due to the complexity and cost of the experiments, I did not test all commercial thermal cameras or all possible combinations of flight configurations. In a future study, I propose to digitize campus and city buildings in a virtual environment such as Unity3D or other game engines and to use an artificial drone camera to collect rendered images to investigate more flight configurations. Second, this thesis primarily improves the performance of thermal mapping model reconstruction and introduces approaches for classifying building components. However, to comprehensively detect heat loss and take measures to prevent it, more research needs to be conducted in the future.
Chapter 8. Future Research Tasks
8.1 Thermal Bridge Detection on District Level Using Aerial Images
A thermal bridge is an area of the building envelope that conducts heat easily, transporting heat from the warmer inside to the colder outside faster than the adjacent areas do. This is caused by different thermal conductivities of the materials used or by the geometry of the construction; air leaks can also be subsumed under the term thermal bridge. Thermal bridges cause high energy losses, which can account for up to one third of the transmission heat loss of an entire building. Additionally, they lead to the accumulation of moisture, which in the long term degrades the building fabric. A thermal bridge can be seen in a thermographic image as an area with increased thermal radiation relative to adjacent areas.

In this study, I analyze how drone-based thermal images can be used for a simple analysis of the thermal quality of building envelopes on a district scale. To do so, I investigate the quality of thermal panorama images obtained by drones and analyze how artificial intelligence can help to automatically detect thermal bridges. I focus on thermal bridges on rooftops because they are difficult to access with conventional thermography from terrestrial images.
I demonstrate a method to automatically detect thermal bridges on building rooftops in aerial thermal images using a neural network. I employ existing solutions from the domain of object detection to learn to identify the size and location of thermal bridges within each image. For this, I created a dataset of drone images with annotations of thermal bridges on building rooftops. Each image of the dataset consists of a combination of a thermal image, an RGB image recorded from the same angle and converted to the same format, and height information for each pixel (a sketch of assembling such a sample is given below). I select a training dataset for the neural network composed of a subset of the images and validate the results on the remainder of the dataset.
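As a minimal sketch, with illustrative normalization choices that are assumptions rather than the exact preprocessing used, one such sample can be assembled by stacking the co-registered channels into a single array:

```python
import numpy as np

def stack_sample(rgb, thermal, height):
    """rgb: H x W x 3, thermal: H x W, height: H x W  ->  H x W x 5 network input."""
    rgb = rgb.astype(np.float32) / 255.0
    thermal = (thermal - thermal.min()) / (np.ptp(thermal) + 1e-8)
    height = (height - height.min()) / (np.ptp(height) + 1e-8)
    return np.concatenate([rgb, thermal[..., None], height[..., None]], axis=-1)
```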
The annotated images of the TBBR dataset contain a total of 6,895 annotations. The annotations include only thermal bridges that are easily identifiable, so the images also contain thermal bridges that are not annotated. Because of the image overlap, each thermal bridge is annotated on average about 20 times from different angles. An example image with annotations is shown in Figure 21: (a) is a thermal image, (b) is an RGB image, (c) is a depth image, and (d) is an image with thermal bridge labels.
Figure 21 Example of thermal bridge annotations in the dataset
I present a preliminary result in Table 8. To determine which predicted bounding boxes correspond to correct predictions, the Intersection-over-Union (IoU) introduced in Chapter 5 is measured between the predicted and ground-truth boxes. For a given IoU threshold, predicted bounding boxes that have an IoU with an annotated thermal bridge's bounding box above the threshold are considered true positives; any annotated thermal bridge without such a prediction is considered a false negative. Table 8 shows the metric scores for several common variants of the recall metric. An IoU range (i.e., IoU=0.5:0.95) indicates that the recall is averaged over the given interval. Medium and large areas correspond to objects with an area between 32^2 and 96^2 pixels, and greater than 96^2 pixels, respectively. Max. detections indicates the score given the N highest-confidence predictions.
Table 8 Bounding box regression metrics on the test images dataset
Metric Area Max. detections Score
AR @ IoU=0.50:0.95 All 1 0.052
AR @ IoU=0.50:0.95 All 10 0.142
AR @ IoU=0.50:0.95 All 100 0.142
AR @ IoU=0.50:0.95 Medium 100 0.114
AR @ IoU=0.50:0.95 Large 100 0.196
We note immediately the comparatively low scores, which we attribute to the low number of
annotated examples relative to the large image sizes and the sparsity and small size of thermal bridges. Notably,
the network performs better at larger scales, likely because larger thermal bridges are less easily
confused with non-thermal-bridge heat spots in an image.
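To make the matching rule above concrete, the following is a minimal sketch of IoU computation and recall counting for axis-aligned boxes; the box format (x_min, y_min, x_max, y_max) and the helper names are illustrative and are not the exact evaluation code behind Table 8:

    def iou(box_a, box_b):
        # Boxes given as (x_min, y_min, x_max, y_max) in pixel coordinates.
        ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
        ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        return inter / (area_a + area_b - inter)

    def recall_at(gt_boxes, pred_boxes, threshold=0.5):
        # A ground-truth thermal bridge is a true positive if some prediction
        # overlaps it with IoU above the threshold; otherwise a false negative.
        hits = sum(1 for gt in gt_boxes
                   if any(iou(gt, p) >= threshold for p in pred_boxes))
        return hits / len(gt_boxes) if gt_boxes else 0.0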
8.2 Energy Consumption Simulations
The goal of this task is to understand how a thermal model can be used to improve the performance
of energy consumption simulation. The aim is to estimate the energy consumption of several buildings in a district so
that managers can make better building retrofit decisions. This study consists of four steps: (1) camera
calibration and registration; (2) model reconstruction; (3) data collection; and (4) energy consumption
simulation and evaluation. The workflow of the research method is illustrated in Figure 22.
Figure 22 Research Method Workflow
Step 1:
The first goal is to calibrate the thermal cameras and evaluate the consistency of their data collection. I
plan to use three cameras to collect data from the ground, from the air, and from inside the buildings. As shown in
Figure 22, Step 2, these thermal cameras are, from left to right, the FLIR Tau2 (FT), the FLIR XT2
R (FX), and the FLIR Duo Pro R (FD). Due to the complexity and cost of this study, we cannot use three
identical thermal cameras. Fortunately, these three thermal cameras have the same 13 mm lens
and the same resolution of 640x512. However, we still need to examine how consistently they
record thermal data, despite all being manufactured by FLIR. We designed two experiments to run
before collecting building data with these thermal cameras. In the first, objects are placed inside a building,
while in the second, the objects are placed outdoors. The cameras are placed in front of the objects, and the
thermal data captured by the cameras are then analyzed and compared. As shown in Figure 22,
Step 1, the image in the middle illustrates the experiment conducted indoors, and the image at the bottom
illustrates the experiment conducted outdoors.
Three different target objects are needed to test the thermal cameras' ability to record thermal
information: (1) a crumpled piece of aluminum foil, (2) black tape with a known thermal
emissivity (ε = 0.95), and (3) a chessboard with hollow tin-foil edges. First, the aluminum foil is used to
capture the reflected ambient temperature. After each of the three cameras has imaged the piece of foil, the
average temperature value over the foil target is calculated per camera, and the differences are compared to
assess how consistently the cameras record reflected ambient temperature. Second, the black tape is a special
adhesive tape with a known emissivity (ε = 0.95) that can be used as a reference material for measuring the
emissivity of other objects. The temperature values of the black tape measured by the three cameras should
therefore also agree, so we compare the readings from the cameras and analyze the discrepancies. Third, because
the three cameras are placed along a horizontal line, there is inevitably an offset between adjacent cameras,
and objects therefore appear displaced between their thermal images. To accurately locate the same object
targeted by all three cameras, a chessboard is used to calibrate the cameras and register the thermal images.
However, a chessboard made of a single material shows no thermal contrast in a thermal image. Therefore, a
hollow tin-foil sheet is placed on the chessboard; the foil-covered areas appear cooler than the hollow areas, so
the thermal cameras can detect the shape of the chessboard.
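A minimal sketch of the foil-target comparison, assuming each camera's radiometric image has been exported as a per-pixel temperature array and a boolean mask marking the target region has been drawn after the chessboard-based registration; all file names below are illustrative:

    import numpy as np

    # Hypothetical radiometric exports in degrees Celsius, one per camera,
    # plus a boolean mask of the foil (or black-tape) target pixels.
    cameras = {"FT": "ft_temps.npy", "FX": "fx_temps.npy", "FD": "fd_temps.npy"}
    mask = np.load("target_mask.npy")

    means = {}
    for name, path in cameras.items():
        temps = np.load(path)
        means[name] = float(temps[mask].mean())   # mean temperature on the target

    # Pairwise discrepancies between the three cameras' readings of one target.
    names = list(means)
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            a, b = names[i], names[j]
            print(f"{a} vs {b}: {abs(means[a] - means[b]):.2f} degC")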
Step 2:
In the second step, after the three cameras have been calibrated and their performance compared, they are
used to collect data from buildings. As shown in Figure 22, Step 2, cameras FX and FD have both thermal
and optical lenses, so each can simultaneously capture a thermal image
(640 x 512) and a high-resolution RGB image (4000 x 3000) in a single integrated package. Additionally,
the dual lenses of FX and FD help generate a well-established 3D thermal and RGB model. Therefore,
FX is mounted on a drone to collect data from the air, and FD is set on a tripod to collect data from the
ground. FT captures only thermal information and is therefore used to capture thermal information from
inside buildings, where RGB images are not needed.
Step 3:
The goal of the third step is to understand whether different flight configurations during data
collection influence the models used for the final energy simulation, and thereby to determine an appropriate
data collection setting. The flight configurations vary along three factors: (1) camera
altitude, (2) camera angle, and (3) flight path. As shown in Figure 22, camera
altitude includes 30 and 60 meters; camera angle includes 30 and 45 degrees; and flight path includes mesh-grid
and Y flight paths. After collecting data under the different combinations of these factors, we
compare and analyze the performance of the energy simulation based on the thermal data collected under
each configuration.
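To make the test matrix explicit, the eight flight configurations implied by these factors can be enumerated as follows (a sketch; the labels are illustrative):

    from itertools import product

    altitudes_m = [30, 60]              # camera altitude
    angles_deg = [30, 45]               # camera angle
    paths = ["mesh_grid", "y_path"]     # flight path

    # Full-factorial combination: 2 x 2 x 2 = 8 data collection settings.
    configurations = [
        {"altitude_m": alt, "angle_deg": ang, "path": path}
        for alt, ang, path in product(altitudes_m, angles_deg, paths)
    ]
    for config in configurations:
        print(config)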
Step 4:
The fourth goal is to reconstruct a 3D thermal mapping model for buildings at the district level and
extract parameters for an energy consumption simulation. In this step, several models are reconstructed
using the different flight configurations based on the test factors introduced in Step 3. There are many
commercial software programs for 3D model reconstruction, including Pix4D, Agisoft, and DroneDeploy.
However, these programs do not offer an API to support users' extended development, for example,
fusing thermal and RGB information. Fortunately, ContextCapture from Bentley can export intermediate
files, such as image orientations, tie points, and intrinsic and rotation matrices. These intermediate files allow
RGB 3D models to be created from the RGB images and their RGB information to be fused with thermal
information projected from the corresponding thermal images, so that a well-established RGB and thermal
3D model can be obtained. Next, parameters such as the U-value can be analyzed and extracted from the thermal
model with a programming language (Python in this task). The extracted parameters are then fed into
energy simulation software such as Autodesk Revit and EnergyPlus. Parameters extracted from the different
models built under the various flight configurations are used for energy consumption simulations, and the
simulation results are compared with the ground truth, the real building energy consumption recorded
by meters, to determine which configuration produces results closest to the ground truth.
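As a minimal sketch of the projection step that attaches thermal values to the RGB point cloud, assuming the intrinsic matrix K, rotation matrix R, and camera center have been parsed from the exported orientation files; this is illustrative pinhole-camera code, not ContextCapture's API:

    import numpy as np

    def project_point(point_xyz, K, R, camera_center):
        # World point -> camera frame -> pixel coordinates (pinhole model).
        p_cam = R @ (np.asarray(point_xyz) - np.asarray(camera_center))
        if p_cam[2] <= 0:            # point lies behind the camera
            return None
        u, v, w = K @ p_cam
        return u / w, v / w

    def sample_temperature(point_xyz, K, R, camera_center, thermal_image):
        # Assign the point the radiometric value of the pixel it projects onto.
        uv = project_point(point_xyz, K, R, camera_center)
        if uv is None:
            return None
        col, row = int(round(uv[0])), int(round(uv[1]))
        h, w = thermal_image.shape[:2]
        if 0 <= row < h and 0 <= col < w:
            return float(thermal_image[row, col])
        return None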
Chapter 9. Intellectual Merit and Broader Impacts
This research designs a framework for 3D RGB and thermal model reconstruction for a group
of buildings at the district level, and it provides a framework for the classification and semantic segmentation
of fused thermal and RGB images of outdoor scenes. The proposed framework offers a novel way to
reconstruct a high-quality thermal mapping model by fusing thermal and RGB information, and it
introduces a novel semantic segmentation approach on RGB images fused with thermal images to
identify building components for energy audits.
In addition, this thesis contributes: (1) relatively efficient and accurate drone-based data
collection for thermal mapping model reconstruction at the district level; (2) a novel approach to reconstructing
a high-quality thermal mapping model; (3) a novel approach to fusing thermal and RGB information;
(4) a novel approach to classifying components in an outdoor scene; and (5) an approach to generating and
simulating synthetic thermal images to improve the accuracy of semantic segmentation tasks.
This study also has several broader impacts. The ability to fuse thermal and RGB information is
valuable and applicable to many fields, for example, the segmentation and classification not only of images but
also of photogrammetry-generated point clouds. The current research focused only on the RGB
photogrammetry-generated point cloud model. In a future study, I plan to explore segmentation performance on 3D
models that fuse both RGB and thermal information. Thermal information provides an additional
feature dimension for the points and can potentially improve the performance of segmentation
and classification.